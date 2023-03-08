



Demonstration video of a robotic arm controlled by PaLM-E reaching for a bag of chips.

Google Research

On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that integrates vision and language for robotic control. They claim it is the largest VLM ever developed and that it can perform a wide variety of tasks without the need for retraining.

According to Google, when given a high-level command such as “bring me the rice chips from the drawer,” PaLM-E can generate a plan of action for a mobile robot platform with an arm (developed by Google Robotics) and execute the actions by itself.

PaLM-E does this by analyzing data from the robot’s camera without the need for a preprocessed scene representation. This eliminates the need for humans to preprocess and annotate data, enabling more autonomous robot control.

In a demo video provided by Google, PaLM-E carries out the instruction “bring me the rice chips from the drawer,” which involves multiple planning steps as well as visual feedback from the robot’s camera.

PaLM-E is also resilient and can react to its environment. For example, the model can guide a robot to retrieve a bag of chips from a kitchen. With PaLM-E integrated into the control loop, the robot can withstand interruptions that occur during the task. In the video example, a researcher grabs the chips from the robot and moves them, but the robot locates the chips and grabs them again.

In another example, the same PaLM-E model autonomously controls a robot through a complex sequence of tasks that previously required human guidance. Google’s research paper explains how PaLM-E turns instructions into actions:

We demonstrate the performance of PaLM-E on challenging and diverse mobile manipulation tasks. We largely follow the setup in Ahn et al. (2022), where the robot needs to plan a sequence of navigation and manipulation actions based on an instruction by a human. For example, given the instruction “I spilled my drink, can you bring me something to clean it up?”, the robot needs to plan a sequence containing “1. Find a sponge, 2. Pick up the sponge, 3. Bring it to the user, 4. Put down the sponge.” Inspired by these tasks, we develop 3 use cases to test the embodied reasoning abilities of PaLM-E: affordance prediction, failure detection, and long-horizon planning. The low-level policies are from RT-1 (Brohan et al., 2022), a transformer model that takes RGB image and natural language instruction, and outputs end-effector control commands.
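The division of labor described in the paper, a high-level planner that picks the next step and a low-level policy that carries it out, can be pictured as a simple closed loop. What follows is only a toy sketch under stated assumptions, not Google's implementation: `run_task`, `toy_planner`, `toy_policy`, the `world` dictionary, and the step strings are all hypothetical stand-ins, and a hand-written rule table takes the place of the model. The point it illustrates is that re-planning from a fresh observation on every iteration lets the loop recover when someone takes the object away mid-task.

```python
def run_task(instruction, planner, policy, observe, max_steps=20):
    """Alternate planning and acting until the planner reports completion."""
    log = []
    for _ in range(max_steps):
        observation = observe()                   # fresh look at the scene
        step = planner(instruction, observation)  # high-level: choose next step
        if step == "done":
            break
        succeeded = policy(step)                  # low-level: try to execute it
        log.append((step, succeeded))
    return log


# Hypothetical toy world standing in for the robot and its camera.
world = {"sponge_found": False, "holding": False, "at_user": False, "delivered": False}
interruptions = {"remaining": 1}  # how many times a person takes the sponge away


def observe():
    return dict(world)


def toy_planner(instruction, observation):
    """Rule-based stand-in for the model; it ignores the instruction text."""
    if observation["delivered"]:
        return "done"
    if not observation["holding"]:
        if not observation["sponge_found"]:
            return "find a sponge"
        return "pick up the sponge"
    if not observation["at_user"]:
        return "bring it to the user"
    return "put down the sponge"


def toy_policy(step):
    """Stand-in for a low-level controller; returns False when a step fails."""
    if step == "find a sponge":
        world["sponge_found"] = True
    elif step == "pick up the sponge":
        world["holding"] = True
    elif step == "bring it to the user":
        if interruptions["remaining"] > 0:
            # Simulated disturbance: the sponge is taken and moved elsewhere.
            interruptions["remaining"] -= 1
            world["holding"] = False
            world["sponge_found"] = False
            return False
        world["at_user"] = True
    elif step == "put down the sponge":
        world["holding"] = False
        world["delivered"] = True
    return True


log = run_task("I spilled my drink, can you bring me something to clean it up?",
               toy_planner, toy_policy, observe)
for step, succeeded in log:
    print(f"{step}: {'ok' if succeeded else 'failed'}")
```

Because the toy planner decides from the current observation rather than from a fixed script, the failed delivery simply leads it to search for the sponge again.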

PaLM-E is a next-token predictor, and it is called “PaLM-E” because it is based on Google’s existing large language model (LLM) called “PaLM” (which is similar to the technology behind ChatGPT). Google has made PaLM “embodied” by adding sensory information and robotic control.

Because it is based on a language model, PaLM-E takes continuous observations, such as images or sensor data, and encodes them into a sequence of vectors that are the same size as language tokens. This allows the model to “understand” sensory information in the same way it processes language.
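To make that idea concrete, here is a minimal sketch of how observation vectors might be spliced into a sequence of text-token embeddings. It is an illustration under assumptions, not the model's actual code: the names `project_observation` and `build_input_sequence`, the `<obs>` placeholder, the tiny embedding width, and the random weights are all hypothetical, and a plain linear map stands in for whatever learned encoder the real system uses.

```python
import random

EMBED_DIM = 4  # hypothetical token-embedding width; real models use far larger values


def project_observation(patches, weights):
    """Map each sensor feature vector into the token-embedding space.

    `weights` has EMBED_DIM rows, each as long as one feature vector, so every
    projected vector has the same size as a text-token embedding.
    """
    return [[sum(w * x for w, x in zip(row, patch)) for row in weights]
            for patch in patches]


def build_input_sequence(prompt, table, observations, weights):
    """Replace each "<obs>" placeholder with its projected observation vectors."""
    obs_iter = iter(observations)
    sequence = []
    for token in prompt:
        if token == "<obs>":
            sequence.extend(project_observation(next(obs_iter), weights))
        else:
            sequence.append(table[token])
    return sequence


rng = random.Random(0)
vocab = ["what", "is", "in", "the", "image", "?"]
table = {tok: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for tok in vocab}

# Stand-in for an image encoder's output: two patch feature vectors of length 6.
image_patches = [[0.2, 0.7, 0.1, 0.9, 0.4, 0.3],
                 [0.5, 0.1, 0.8, 0.2, 0.6, 0.0]]
weights = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(EMBED_DIM)]

prompt = ["what", "is", "in", "the", "image", "<obs>", "?"]
sequence = build_input_sequence(prompt, table, [image_patches], weights)
print(len(sequence), "vectors, each of length", len(sequence[0]))
```

Once every element of the sequence has the same width, the language model can process image-derived vectors and word embeddings with the same machinery.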

A Google-provided demo video showing a robot guided by PaLM-E following the instruction “Bring me a green star.” The researchers say the green star “is an object that this robot was not directly exposed to.”

In addition to the RT-1 robotics transformer, PaLM-E draws from Google’s previous work on ViT-22B, a vision transformer model revealed in February. ViT-22B has been trained on a variety of visual tasks, including image classification, object detection, semantic segmentation, and image captioning.

Google Robotics is not the only research group working on robotic control with neural networks. This particular work resembles Microsoft’s recent “ChatGPT for Robotics” paper, which experimented with a similar combination of visual data and large language models for robotic control.

Aside from robotics, Google researchers observed several interesting effects that appear to stem from using a large language model as the core of PaLM-E. For one, it exhibits “positive transfer,” meaning that knowledge and skills learned from one task can be transferred to another, resulting in “significantly higher performance” compared to single-task robot models.

They also observed a trend with model scale: “The larger the language model, the more it maintains its language capabilities when training on visual-language and robotics tasks. Quantitatively, the 562B PaLM-E model nearly retains all of its language capabilities.”

PaLM-E is the largest VLM ever reported. We observe emergent capabilities such as multimodal chain-of-thought reasoning and multi-image inference, despite being trained on only single-image prompts. Although not the focus of our work, PaLM-E sets a new SOTA on the OK-VQA benchmark.

Danny Driess (@DannyDriess), March 7, 2023

The researchers also claim that PaLM-E exhibits emergent capabilities such as multimodal chain-of-thought reasoning (allowing the model to analyze a sequence of inputs that includes both language and visual information) and multi-image inference (using multiple images as input to make an inference or prediction), even though it was trained on only single-image prompts. In that sense, PaLM-E seems to continue the trend of surprises emerging as deep learning models grow more complex over time.

Google researchers plan to investigate more applications of PaLM-E in real-world scenarios such as home automation and industrial robotics. And they hope PaLM-E will inspire more research on multimodal reasoning and embodied AI.

“Multimodal” is a buzzword we will hear more and more as companies reach for artificial general intelligence that can ostensibly perform general tasks like a human.

Sources: 1/ https://Google.com/ 2/ https://arstechnica.com/information-technology/2023/03/embodied-ai-googles-palm-e-allows-robot-control-with-natural-commands/
