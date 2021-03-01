



According to Sergey Levine and collaborators, the seemingly simple task of grasping objects from a large cluster of different kinds of objects is “one of the most serious open problems in robotics.” Grasping is a good example of the problems that plague real-world machine learning, such as latency that disrupts the expected order of events and goals that can be difficult to specify.

Most artificial intelligence is developed in an idealized environment: a computer simulation that avoids the unevenness of the real world. Whether it’s DeepMind’s MuZero program for Go, chess, and Atari, or OpenAI’s GPT-3 for language generation, the most sophisticated deep learning programs have all benefited from a pared-down set of constraints within which the software is improved.

As such, the most difficult and perhaps most promising task for deep learning may lie in the realm of robotics, where AI runs into real-world constraints that cannot be fully anticipated.

That is one takeaway from a recent report by researchers at the University of California, Berkeley and Google that summarizes years of experiments with robots using what is called reinforcement learning.

“I think that, in general, real-world tasks are both the greatest challenges and the greatest opportunities for reinforcement learning,” Sergey Levine, an assistant professor of electrical engineering and computer science at Berkeley, said in an email exchange.

This month, Levine, who also holds an appointment with Google’s robotics program, along with fellow researchers Julian Ibarz, Jie Tan, Chelsea Finn, Mrinal Kalakrishnan, and Peter Pastor, published a review titled “How to Train Your Robot with Deep Reinforcement Learning; Lessons We’ve Learned.” It is posted on the arXiv preprint server.

The paper describes a number of experiments that Levine and others have performed over the years using reinforcement learning, and summarizes where those experiments ran into obstacles.

The experiments involve the most basic tasks in robotics, such as grasping an object with a robotic arm and moving it from one place on a table to another. Even such very simple tasks reveal fascinating challenges.

Reinforcement learning is an approach to machine learning that has existed for decades. It was most famously used by Google’s DeepMind unit to develop AlphaZero, a program that, by playing games against itself repeatedly, learned to beat the world’s top Go, chess, and shogi players without any information about human play. DeepMind has since extended the program to MuZero, which masters Atari games with the same approach.

The basic idea of reinforcement learning is that possible actions and their outcomes are searched and then stored in memory, and two functions, called the value function and the policy function, are combined to select the next action at any point in the task, based on which parts of the search history proved most fruitful. All of the calculations are driven by the notion of an ultimate reward, such as winning a game of chess.
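To make the interplay of value function and policy concrete, here is a minimal sketch using tabular Q-learning on a five-state toy task. The task, the constants, and all function names are assumptions made for illustration; this is not the setup used by DeepMind or by Levine's group.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor
ALPHA = 0.5           # learning rate


def step(state, action):
    """Deterministic toy dynamics: reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=200, seed=0):
    """Learn value estimates q[state][action_index] from random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap the episode length
            a = rng.randrange(2)  # explore with uniformly random actions
            nxt, reward, done = step(state, ACTIONS[a])
            # Target = immediate reward + discounted best value of next state.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Policy: in each state, pick the action with the highest value."""
    return [ACTIONS[max(range(2), key=lambda a: q[s][a])]
            for s in range(N_STATES)]


q_table = train()
policy = greedy_policy(q_table)
```

The value table stores what the search has found to be fruitful, and the policy simply reads the best action off that table. In every non-goal state the learned policy moves toward the goal, because only that direction leads to the final reward.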

Levine and colleagues point out that robotics breaks some of the most basic assumptions of the reinforcement learning paradigm.

For one thing, situations in robotics don’t unfold as cleanly as in strategy games such as Go and chess. The traditional model of reinforcement learning is called the Markov decision process, in which one state is followed by another in an orderly manner, depending on the action taken. All reinforcement learning assumes that you can measure how an action leads from one discrete state to another.

However, in the world of robotics, there is latency, a delay between one state and the next. As Levine and co-authors explain, “latency means that the next state of the system does not depend directly on the measured state, but instead on the state after a delay following the measurement, which is not observable.”

As a result, “latency violates the most basic assumption of the MDP [Markov decision process], and can therefore cause some RL [reinforcement learning] algorithms to fail,” the authors point out. They give an example of an otherwise successful reinforcement learning program that breaks down when latency delays the expected Markov state transitions.
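A small sketch can show why latency breaks the Markov assumption. The one-dimensional system below is hypothetical: the command issued now is only queued, and the command issued on the previous step is the one that actually moves the system. The state-augmentation workaround at the end is a common remedy for delayed systems, shown here as an assumption rather than as the authors' method.

```python
def delayed_step(position, pending_action, new_action):
    """One step of a hypothetical 1-D system with one step of actuation latency.

    The previously issued command (pending_action) moves the system now;
    the command issued now (new_action) is queued for the next step.
    """
    next_position = position + pending_action
    return next_position, new_action  # new_action becomes the pending command


# Same measured state (position 0) and same chosen action (+1) ...
next_a, _ = delayed_step(position=0, pending_action=+1, new_action=+1)
next_b, _ = delayed_step(position=0, pending_action=-1, new_action=+1)

# ... yet the next states differ, because the outcome depends on a command
# still "in flight" that the measured position does not reveal.
print(next_a, next_b)


def augmented_step(aug_state, new_action):
    """Workaround sketch: treat (position, pending command) as the state.

    With the pending command included, the next state is again a function
    of the current state and action only.
    """
    position, pending_action = aug_state
    return (position + pending_action, new_action)
```

From the learner's point of view, the pair (position 0, action +1) leads to two different outcomes, so a value function defined over the measured position alone cannot be consistent.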

There is another, perhaps bigger, problem that arises in robotics: the concept of goals and rewards.

Traditionally, reinforcement learning has assumed that goals are clearly defined and that every available action can be evaluated by the program’s value function, which indicates whether the program is moving closer to or further from the goal. In chess, Go, and Atari games, the goal of winning is clear, and each move advances the player in a measurable way toward that goal.

“In a simulation or video game environment, the reward function is usually easy to specify because one has full access to the simulator or game state and can determine whether the task is complete or can access the game score,” Levine and collaborators write.

“But in the real world, assigning scores to quantify how well a task is completed can itself be a challenging perceptual problem.”

Think of a robot arm reaching out to open a door, the authors say. The learning robot may try to optimize by getting closer to the doorknob. However, if it gets too close to the doorknob at an angle that does not allow it to grab the knob, it will actually frustrate its ultimate goal. This is an example of how optimizing a subtask, such as approaching an object, can set back the larger goal.
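The doorknob example can be sketched as a mismatch between a proxy reward and the true success condition. Every name and threshold below is hypothetical and chosen only to illustrate the point; none of it comes from the paper.

```python
from dataclasses import dataclass


@dataclass
class GripperPose:
    distance_to_knob: float    # metres from gripper to knob (assumed unit)
    approach_angle_deg: float  # 0 means squarely aligned with the knob


def proxy_reward(pose):
    """Shaped subtask reward: closer to the knob always scores higher."""
    return -pose.distance_to_knob


def can_grasp(pose, max_distance=0.05, max_angle_deg=20.0):
    """Assumed success test: close enough AND well enough aligned."""
    return (pose.distance_to_knob <= max_distance
            and abs(pose.approach_angle_deg) <= max_angle_deg)


near_but_misaligned = GripperPose(distance_to_knob=0.01, approach_angle_deg=60.0)
farther_but_aligned = GripperPose(distance_to_knob=0.04, approach_angle_deg=5.0)

# The proxy prefers the pose that cannot actually grasp the knob.
proxy_prefers_misaligned = (
    proxy_reward(near_but_misaligned) > proxy_reward(farther_but_aligned)
)
```

Here the distance-based reward ranks the misaligned pose higher even though only the aligned pose passes the grasp test, which is the kind of backward step the authors describe.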

Various unsupervised learning approaches can yield strategies for robots to grasp objects, an area where goals and policies are difficult to specify.

Kalashnikov et al.

This is an example of how reinforcement learning suffers when the reward is “sparse”, that is, when there are few clues provided to the robot.

“This is definitely a big challenge,” Levine told ZDNet in an email. “And it’s one of the places where the standard RL [reinforcement learning] problem statement, which assumes that rewards are simply ‘provided’ to the agent in some way (for example, as a piece of code), deviates from real-world requirements.”

As the paper explains, solutions include approaches such as having a person demonstrate the task by performing the action. Standard reinforcement learning is not set up to accommodate such demonstration-based goal specifications.

“To be able to handle this kind of ‘natural’ task specification, we need to extend them,” Levine told ZDNet. Another approach is to accumulate a large amount of data in advance through simulation and supply it to the robot. But, again, the complexity of the real world eludes the reductive nature of simulation. This is what Levine and company call the “reality gap.” So simulation can be useful, but only up to a point.

All of these challenges matter all the more because they exist in many areas of life.

Levine and collaborators considered how to use demonstrations to specify goals. Here, the left frame shows the demonstration and the right frame shows the task being completed.

Xie et al. 2019

The complexity of robotics is emblematic of many “real-world problems,” including “power grid control, road network regulation, HVAC control, and even more complex applications in logistics, inventory management, and economics,” Levine told ZDNet.

“Robotics is the most physically concrete instantiation of these tasks, and the one that is easiest for humans to relate to, because we all share the experience of controlling our own bodies and can therefore relate more easily to a robot that is trying to control its body,” he said.

Ultimately, Levine tends to see these difficulties as a virtue.

“Reinforcement learning is both a challenge and an opportunity here,” he told ZDNet. By not being tied to a simulator, he said, robots can learn a richer vocabulary of skills.

“In games like chess and Go, the RL policy will be as good as the ‘simulator’ in which it exists,” Levine said. “It may be able to play a game of chess, but it can’t learn anything else in that ‘world,’ because that world doesn’t contain anything other than chess.”

In the real world, in contrast, “robots can experience much of what we experience, and by confronting the world in all its complexity, they may end up surprising us, perhaps even learning to do things that we cannot.”

“I think this is really exciting,” Levine said.

