Google has announced a new approach that uses large language models (LLMs) to show how robots can write their own code based on instructions from humans.

The latest work builds on Google’s PaLM-SayCan model, which lets robots understand open-ended human prompts and respond reasonably and safely in physical space. It also builds on OpenAI’s GPT-3 LLM and related work on automatic code completion, such as GitHub’s Copilot feature.

“What if a robot could autonomously write its own code to interact with the world when given instructions from a human?” the Google researchers asked. The latest generation of language models, such as PaLM, are capable of complex reasoning and have been trained on millions of lines of code, they said. “Given natural language instructions, our current language model is very good at writing not only generic code, but also code that can control robot behavior.”


Google Research calls its new development “Code as Policies” (CaP), arguing that code-writing LLMs can be repurposed to write robot policy code in response to natural language commands.

“Given examples of natural language commands (formatted as comments) followed by the corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code for each,” the Google researchers wrote in their paper, “Code as Policies: Language Model Programs for Embodied Control.”
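As a rough illustration of the prompt format the researchers describe, the sketch below assembles a few-shot prompt in which each command appears as a comment followed by its policy code, and the new command is appended as a trailing comment for the model to complete. The example commands, the robot API names (`get_obj_names`, `put_first_on_second`) and the `build_prompt` helper are assumptions made for illustration, not Google's actual prompts or API, and no language model is called here.

```python
from typing import List, Tuple

# Hypothetical few-shot examples: (natural language command, policy code).
# The robot API names used in the code strings are illustrative assumptions.
EXAMPLES: List[Tuple[str, str]] = [
    (
        "put the red block on the blue block",
        'put_first_on_second("red block", "blue block")',
    ),
    (
        "move every block into the green bowl",
        "for name in get_obj_names():\n"
        '    if "block" in name:\n'
        '        put_first_on_second(name, "green bowl")',
    ),
]


def build_prompt(examples: List[Tuple[str, str]], new_command: str) -> str:
    """Format examples as comment + code pairs, then add the new command as a comment."""
    parts = [f"# {command}\n{code}\n" for command, code in examples]
    # The prompt ends on the new command; a code-writing model would continue from here.
    parts.append(f"# {new_command}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(EXAMPLES, "stack the blocks in the empty bowl"))
```

Under these assumptions, the text a model returns after the final comment would be treated as the new policy code for that command.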

In the examples given, the user might say “stack the blocks in the empty bowl” or “place the blocks in a horizontal line near the top of the 2D square.” Google’s language model then writes Python code that tells the robot exactly how to carry out the command. The generated code relies on Python programming constructs, but it can also use third-party libraries such as Shapely, which in this case is used for spatial-geometric reasoning.
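To give a sense of the geometry such generated code has to work out, here is a minimal sketch that turns “a horizontal line near the top of the square” into target coordinates. It is not output from Google's model: the function name, the `inset` parameter and the bounds format are assumptions, and it uses plain Python arithmetic rather than Shapely to stay dependency-free.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed format


def horizontal_line_near_top(bounds: Bounds, n_blocks: int, inset: float = 0.1) -> List[Point]:
    """Return n_blocks evenly spaced targets on a horizontal line near the top edge.

    `inset` is the fraction of the square's size kept as a margin from the
    top, left and right edges (a hypothetical tuning parameter).
    """
    if n_blocks <= 0:
        return []
    xmin, ymin, xmax, ymax = bounds
    width, height = xmax - xmin, ymax - ymin
    y = ymax - inset * height
    left, right = xmin + inset * width, xmax - inset * width
    if n_blocks == 1:
        # A single block goes in the middle of the line.
        return [((left + right) / 2, y)]
    step = (right - left) / (n_blocks - 1)
    return [(left + i * step, y) for i in range(n_blocks)]


if __name__ == "__main__":
    for target in horizontal_line_near_top((0.0, 0.0, 1.0, 1.0), 3):
        print(target)
```

A robot policy would then move each block to one of the returned points; how that motion is executed depends on the robot's own control API, which is not shown here.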

The improvement Google claims is that having a language model write code may suit this task better than directly learning robot tasks and outputting actions in natural language.

“CaP extends our previous work, PaLM-SayCan, by allowing the language model to use the full expressiveness of general-purpose Python code to complete more complex robotic tasks. We propose using language models to write robot code directly through few-shot prompting,” said Google Research.

In addition to generalizing to new instructions, Google says the model can translate vague descriptions such as “faster” or “to the left” into precise values, such as speed. CaP also supports instructions in languages other than English, as well as emojis.

According to Google, the model can write code that tells the robot to push blocks of different colors around a 2D square, but it cannot translate more complex instructions such as “build a house with the blocks” because it lacks a 3D reference.

Also, while CaP makes robots more flexible, it introduces potential risks because “the synthesized program (unless manually checked at each runtime) can cause unintended behavior on the physical hardware,” Google warned.

