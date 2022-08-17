



If you’ve used a smart voice assistant such as Alexa, Siri or whatever Google calls its smart assistant, you’ve probably noticed that the technology is getting smarter every day. Google can wait on hold for you, Siri can speak in a gender-neutral voice, and Alexa can read you bedtime stories in the voice of your late grandmother. As we covered at last month’s Robotics event, robotics is also evolving by leaps and bounds. The gap between the two, voice commands and autonomous robotics, has been huge for a variety of reasons. Last week, I went to Google’s robotics lab in Mountain View to see how that might change in the near future.

Teaching a robot to perform repetitive tasks in a controlled space that humans are not allowed to enter is difficult, but it is more or less a solved problem. A recent tour of Rivian’s factory was a good reminder of that, and industrial robots are ubiquitous in manufacturing.

General-purpose robots that can solve different tasks based on voice commands in human-populated spaces are a much harder problem. “But what about the Roomba,” you might say, but everyone’s favorite robot vacuum is generally programmed to avoid touching anything other than the floor and whatever is on it.

“You may be wondering why we play ping pong. One of the big challenges in robotics today is the intersection of being fast, accurate and adaptable. Being fast without being adaptable is not a problem; that’s fine in an industrial environment. But being fast, adaptable and accurate is a huge challenge. Ping pong is a really great microcosm of this problem. It requires accuracy and speed. You can learn from the people you play with, and it’s a skill that you acquire through practice,” Vincent Vanhoucke, a distinguished scientist and head of robotics at Google Research, told me. “It’s not a skill where you can read the rules and become a champion overnight.”

Speed and accuracy are one thing, but what Google is really trying to crack in its robotics lab is the intersection of human language and robotics. It has made some impressive leaps in how well robots understand the natural language that humans might use. “If you have time, could you bring me a drink from the counter?” is a very simple request to make of a human being. To a machine, however, that sentence packs a lot of knowledge and understanding into one question. Let’s break it down: The phrase “if you have time” might be a mere figure of speech that means nothing at all, or it might be a genuine request for the robot to finish what it is doing first. If the robot is too literal, the “correct” answer to “could you bring me a drink” might just be the robot saying “yes.” It can, and it has confirmed that it is able to grab a drink. But as the user, you didn’t explicitly ask the robot to do so. And if we’re being very pedantic, you didn’t explicitly tell the robot to bring the drink to you.

These are some of the problems Google is tackling with its natural language processing system, the Pathways Language Model, or PaLM among friends: accurately processing and absorbing what a human actually wants, rather than doing literally what they say.

The next challenge is recognizing what the robot is actually able to do. The robot might understand perfectly well when you ask it to grab a bottle of cleaner from the top of the refrigerator, where it is kept safely out of reach of children. The problem is that the robot can’t reach that high. The big breakthrough is what Google calls “affordances”: what the robot can actually do with some degree of success. This can range from simple tasks (“move forward a meter”), to slightly more advanced tasks (“go find a Coke can in the kitchen”), to complex, multi-step actions that require the robot to show considerable understanding of its own abilities and of the world around it. (“Hmm, I spilled a can of Coke on the floor. Can you mop it up and bring me a healthy drink?”)

Google’s approach uses the knowledge contained in the language model (“Say”) to determine and score actions that are useful for high-level instructions. It also uses affordance functions (“Can”) that provide real-world grounding and determine which actions are possible in a given environment. Because it uses the PaLM language model, Google calls the system PaLM-SayCan.
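To make the “Say” and “Can” split concrete, here is a minimal sketch of how the two scores might be combined. It is an illustration rather than Google’s code: the skill names, the numbers, the `pick_skill` helper and the choice to multiply the two scores are all assumptions made for this example.

```python
# Illustrative only: every skill name and number below is made up.

def pick_skill(say_scores, can_scores):
    """Return the skill with the highest combined score, plus all combined scores."""
    combined = {
        # Assumption: combine the two scores by multiplying them.
        skill: say_scores[skill] * can_scores.get(skill, 0.0)
        for skill in say_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# "Say": how useful a language model might rate each skill for the request
# "Bring me the bottle of cleaner from the top of the refrigerator."
say = {
    "pick up the bottle on top of the refrigerator": 0.6,
    "ask a person to hand over the bottle": 0.3,
    "go to the table": 0.1,
}

# "Can": how likely the robot is to succeed at each skill right now.
can = {
    "pick up the bottle on top of the refrigerator": 0.05,  # out of reach
    "ask a person to hand over the bottle": 0.9,
    "go to the table": 0.95,
}

best, combined = pick_skill(say, can)
print(best)
```

With these invented numbers, the skill the language model likes best loses out because the robot is very unlikely to succeed at it, and the pick falls to a less obvious skill that the robot can actually carry out.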

To solve the more advanced command above, the robot needs to break it down into a number of individual steps. One example might be:

Come to the speaker.
Look at the floor, find the spill and remember where it is.
Search the drawers, cabinets and kitchen counters for a mop, sponge or paper towels.
Once a cleaning tool is found (there is a sponge in the drawer), pick it up.
Close the drawer.
Go to the spill.
Clean up the spill, monitoring whether the sponge can absorb all the liquid. If not, wring it out in the sink and come back.
Once the spill is clean, wring out the sponge again.
Turn on the faucet, rinse the sponge, turn off the faucet and wring out the sponge one last time.
Open the drawer, put the sponge away, close the drawer.
Identify the drinks in the kitchen and somehow decide which ones are “healthier” than a Coke.
Find a bottle of water in the refrigerator, pick it up and bring it to the person who asked for it. That person may have moved since asking, because the robot had to go back and forth to the sink 14 times after deciding that a small kitchen sponge was a better way to wipe up 11 ounces of liquid than paper towels.

Anyway, I’m joking here, but you get the point. Even instructions that sound relatively simple can involve numerous steps, logic and decisions along the way. Should the robot find the healthiest drink available, or is the goal anything healthier than a Coke? Would it make sense to fetch the drink first, so the human can quench their thirst while the robot figures out the rest of the task?

The key here is to teach robots what they can and cannot do, and what makes sense in different situations. On a tour of Google’s robotics lab, I saw more than 30 robots, both from Everyday Robots and more purpose-built machines, playing ping pong, catching lacrosse balls, stacking blocks, opening refrigerator doors and learning to behave “politely” while sharing a space with humans.

An interesting challenge facing robotics is that language models are not inherently grounded in the physical world. They are trained on huge libraries of text, but text libraries don’t interact with their environment, and they don’t have to worry too much about things going wrong. It’s kind of funny when you ask Google to show you the nearest coffee shop and Maps mistakenly plots a 45-day hike and a three-day swim across a lake. In the real world, silly mistakes have real consequences.

For example, given the prompt “I spilled my drink, can you help me?” the language model GPT-3 responds by suggesting a vacuum cleaner. That makes some sense: for certain messes, vacuuming is a good choice, and it’s no surprise that a language model associates vacuuming with cleaning. If a robot actually did that, it would likely fail. Vacuum cleaners are not suited to spilled drinks, and water and electronics do not mix. So at best your vacuum would break, and at worst the appliance could catch fire.

Google’s PaLM-SayCan-enabled robots are placed in a kitchen setting and trained to get better at different aspects of helping out in a kitchen. When given an instruction, the robot tries to answer two questions: “How likely is it that I’ll succeed at what I’m about to try?” and “How likely is it that this will help?” Somewhere between those two considerations, the robots are getting smarter.
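Asked over and over, those two questions can turn one request into a sequence of steps. The sketch below shows one way that loop might look. It assumes a simple greedy strategy of picking the best-scoring skill at every step, and the skill list, the `say_score` and `can_score` stand-ins and all the numbers are hypothetical; a real system would query a language model and learned affordance functions instead.

```python
# Hypothetical stand-ins for a language model and learned affordance functions.
IDEAL_ORDER = [
    "find a sponge",
    "pick up the sponge",
    "go to the spill",
    "wipe the spill",
    "done",
]
SKILLS = ["go to the spill", "wipe the spill", "done",
          "find a sponge", "pick up the sponge"]


def say_score(history, skill):
    """Toy "Say": favor the skill that comes next in a sensible order."""
    expected = IDEAL_ORDER[min(len(history), len(IDEAL_ORDER) - 1)]
    return 0.8 if skill == expected else 0.05


def can_score(history, skill):
    """Toy "Can": some skills only work once a precondition has been met."""
    if skill == "pick up the sponge" and "find a sponge" not in history:
        return 0.1
    if skill == "wipe the spill" and "pick up the sponge" not in history:
        return 0.0
    return 0.9


def plan(skills, max_steps=10):
    """Greedily pick the best-scoring skill until "done" wins or steps run out."""
    history = []
    for _ in range(max_steps):
        best = max(skills,
                   key=lambda s: say_score(history, s) * can_score(history, s))
        if best == "done":
            break
        history.append(best)
    return history


for step in plan(SKILLS):
    print(step)
```

Each pass through the loop re-scores every skill against what has already been done, which is why the toy robot does not try to wipe the spill before it is holding a sponge.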

Affordances, the ability to do something, are not an either/or. Balancing three golf balls on top of each other is very difficult, but not impossible. Opening a drawer is almost impossible for a robot that has not been shown how drawers work. As Google suggested, an untrained robot may not be able to grab a bag of potato chips from a drawer. But a few instructions and a few days of practice can greatly increase its chances of success.
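One way to picture an affordance as a matter of degree is as a success rate that shifts with practice. The sketch below is a simple counting illustration with invented trial outcomes, not how Google’s robots estimate their abilities; the `success_rate` helper and its smoothing parameters are assumptions made for the example.

```python
def success_rate(outcomes, prior_successes=1, prior_failures=1):
    """Smoothed fraction of successful attempts, always between 0 and 1.

    The two prior counts keep the estimate away from a hard 0 or 1 when
    there are only a few attempts to go on.
    """
    successes = sum(outcomes)
    total = len(outcomes) + prior_successes + prior_failures
    return (successes + prior_successes) / total


# Invented outcomes for "grab a bag of chips from a drawer" (True = success).
before_practice = [False, False, False, True, False, False, False, False]
after_practice = [True, True, False, True, True, True, True, True]

print(success_rate(before_practice))
print(success_rate(after_practice))
```

A score like this would plug into the “Can” side of the decision: a skill is never simply available or unavailable, only more or less likely to work.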

Of course, all of this training data is scored as the robot tries things out. Sometimes a robot “solves” a task in a surprising way, but doing it that way may actually be “easier” for a robot.

Decoupling the language model from the affordances also means the robots can “understand” commands in a number of languages. The team demonstrated that in the kitchen, too, when head of robotics Vincent Vanhoucke asked the robot for a can of Coke in French. “We got the language skills for free,” the team said, emphasizing that the neural networks used to train the robots are flexible enough to open new doors (both literally and figuratively) for accessibility and universal access.

Neither the robots nor the technology is currently available, or necessarily destined for commercial products.

“It’s all research right now. As you can see from our current skill level, we’re not quite ready to deploy it in a commercial environment yet. We’re a research outfit and love working on things that don’t work,” Vanhoucke says. “That’s kind of the definition of research, and we keep pushing. We like to work on things that don’t have to scale, because it’s a way of learning how things scale, so we can see trends for where things are going in the future.”

It will take some time to see what the long-term commercial implications of the experiments at Google’s robotics lab will be, but even the relatively simple demos shown in Mountain View last week made it clear that natural language processing and robotics both win as teams at Google build deeper skills, knowledge and vast datasets on how to train robots.

