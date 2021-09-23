



Posted by Jing Yu Koh, Research Engineer, and Peter Anderson, Senior Research Scientist, Google Research

When people navigate unfamiliar buildings, they use many visual, spatial, and semantic cues to help them reach their goals efficiently. For example, even in an unfamiliar home, seeing the dining area lets you make intelligent predictions about the likely locations of the kitchen and lounge areas, and thus the expected locations of common household items. For robotic agents, taking advantage of semantic cues and statistical regularities in new buildings is difficult. A typical approach is to learn implicitly, end to end, what these cues are and how to use them for navigation tasks through model-free reinforcement learning. However, navigation cues learned this way are expensive to learn, difficult to test, and difficult to reuse in another agent without re-learning from scratch.

People moving through unfamiliar buildings can use visual, spatial, and semantic cues to predict what they might find around a corner. A computational model with this capability is a visual world model.

An attractive alternative is for robot navigation and planning agents to use world models that encapsulate rich and meaningful information about their surroundings. This allows the agent to make specific predictions about the feasible outcomes of actions in its environment. Such models have attracted widespread interest in robotics, simulation, and reinforcement learning, with impressive results that include discovering the first known solution for a simulated 2D car racing task and achieving human-level performance in Atari games. However, game environments are still relatively simple compared to the complexity and variety of the real world.

In "Pathdreamer: A World Model for Indoor Navigation", published at ICCV 2021, we introduce a world model that generates high-resolution 360-degree visual observations of areas of a building not yet seen by an agent, using only limited seed observations and a proposed navigation trajectory. As shown in the video below, the Pathdreamer model can synthesize an immersive scene from a single viewpoint, predicting what the agent might see if it moved to a new viewpoint or even to a completely unseen area, such as around a corner. Beyond potential applications in video editing and bringing photos to life, solving this task promises to codify knowledge about human environments and benefit robot agents navigating the real world. For example, a robot tasked with finding a particular room or object in an unfamiliar building could use the world model to run simulations and identify likely locations before physically searching anywhere. World models such as Pathdreamer can also increase the amount of training data for agents, by training the agents inside the model.

Pathdreamer takes a single observation (RGB, depth, and segmentation) and a proposed navigation trajectory as input, and synthesizes high-resolution 360-degree observations up to 6-7 meters away from the original location, including around corners. See the full video for more results.

How does Pathdreamer work?

Pathdreamer takes a sequence of one or more previous observations as input and generates predictions for a trajectory of future locations. The trajectory may be provided up front, or iteratively by an agent that interacts with the returned observations. Both inputs and predictions consist of RGB, semantic segmentation, and depth images. Internally, Pathdreamer uses a 3D point cloud to represent surfaces in the environment. Points in the cloud are labeled with both their RGB color value and their semantic segmentation class, such as wall, chair, or table.

To predict visual observations at a new location, the point cloud is first re-projected into 2D at that location to provide a “guidance” image. From this image, Pathdreamer generates realistic high-resolution RGB, semantic segmentation, and depth images. As the model “moves”, new observations (either real or predicted) are accumulated in the point cloud. One benefit of using a point cloud for memory is temporal consistency: revisited areas are rendered in a way that is consistent with previous observations.
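The re-projection step can be illustrated with a minimal sketch. It assumes an equirectangular 360-degree image, a simple "nearest point wins" rule per pixel, and semantic labels only; the function name `project_to_panorama`, the coordinate convention, and the tiny image size are illustrative assumptions and are not taken from the Pathdreamer code.

```python
import math

def project_to_panorama(points, camera, width, height):
    """Project labeled 3D points into an equirectangular guidance image.

    points: list of (x, y, z, label) tuples in world coordinates.
    camera: (x, y, z) position of the new viewpoint.
    Returns a height x width grid of labels; None marks unseen pixels.
    """
    labels = [[None] * width for _ in range(height)]
    nearest = [[math.inf] * width for _ in range(height)]
    cx, cy, cz = camera
    for x, y, z, label in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist == 0.0:
            continue  # a point at the camera centre has no direction
        azimuth = math.atan2(dy, dx)                           # [-pi, pi]
        elevation = math.asin(max(-1.0, min(1.0, dz / dist)))  # [-pi/2, pi/2]
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
        if dist < nearest[row][col]:  # keep only the closest surface per pixel
            nearest[row][col] = dist
            labels[row][col] = label
    return labels

# A toy cloud: the chair lies in the same direction as the wall, but closer.
cloud = [
    (2.0, 0.5, 0.3, "wall"),
    (1.0, 0.25, 0.15, "chair"),
    (0.5, 3.0, -0.5, "table"),
]
guidance = project_to_panorama(cloud, camera=(0.0, 0.0, 0.0), width=8, height=4)

# "Moving" to a new viewpoint re-projects the same cloud from there.
moved = project_to_panorama(cloud, camera=(1.6, 0.0, 0.0), width=8, height=4)
```

From the origin, the chair hides the wall behind it and most pixels stay `None`. Those empty pixels are the regions a generator would have to fill in.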

Internally, Pathdreamer represents surfaces in the environment with a 3D point cloud that contains both semantic labels (top) and RGB color values (bottom). To generate a new observation, Pathdreamer “moves” through the point cloud to a new location and uses the re-projected point cloud image as guidance.

Pathdreamer works in two stages to convert the guidance image into a plausible and realistic output. The first stage, the structure generator, creates the segmentation and depth images, and the second stage, the image generator, renders them to RGB output. Conceptually, the first stage provides a plausible high-level semantic representation of the scene, and the second stage renders it into a realistic color image. Both stages are based on convolutional neural networks.

Pathdreamer works in two stages. The first stage, the structure generator, creates segmentation and depth images, and the second stage, the image generator, renders them to RGB output. The structure generator is conditioned on a noise variable so that the model can synthesize diverse scenes in areas of high uncertainty.

Diverse generation results

In areas of high uncertainty, such as around corners or in unseen rooms, many different scenes are possible. Incorporating ideas from stochastic video generation, Pathdreamer's structure generator is conditioned on a noise variable that represents the stochastic information about the next location that is not captured in the guidance image. By sampling multiple noise variables, Pathdreamer synthesizes different scenes, allowing the agent to sample multiple plausible outcomes for a given trajectory. These diverse outputs are reflected not only in the first-stage outputs (semantic segmentation and depth images), but also in the generated RGB images.
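The effect of the noise variable can be shown with a toy sketch. Here the two convolutional stages are replaced by trivial stand-ins: a seeded random choice fills unseen pixels, and a fixed palette "renders" labels to colors. The names `structure_generator`, `image_generator`, `sample_scenes`, and `PALETTE` are hypothetical, depth is omitted, and nothing here reflects the real model's architecture; the point is only that seen pixels stay fixed while unseen ones depend on the sampled noise.

```python
import random

# Hypothetical class-to-color palette used by the stand-in renderer.
PALETTE = {"wall": (200, 200, 200), "chair": (150, 80, 40), "table": (90, 60, 30)}

def structure_generator(guidance, noise_seed):
    """Stage 1 stand-in: fill unseen pixels (None) with a sampled class."""
    rng = random.Random(noise_seed)  # the seed plays the role of the noise variable
    classes = sorted(PALETTE)
    return [label if label is not None else rng.choice(classes)
            for label in guidance]

def image_generator(semantics):
    """Stage 2 stand-in: render each semantic label as an RGB color."""
    return [PALETTE[label] for label in semantics]

def sample_scenes(guidance, noise_seeds):
    """Return one (semantics, rgb) pair per noise sample."""
    scenes = []
    for seed in noise_seeds:
        semantics = structure_generator(guidance, seed)
        scenes.append((semantics, image_generator(semantics)))
    return scenes

# A flattened guidance image: None marks pixels the agent has not seen.
guidance = ["wall", "wall", None, None, None, None, "chair", None]
scenes = sample_scenes(guidance, noise_seeds=[0, 1, 2])
```

Each entry of `scenes` is one plausible completion of the same guidance image. Reusing a seed reproduces the same completion, which is convenient when comparing candidate trajectories.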

Pathdreamer can generate multiple diverse and plausible images for areas of high uncertainty. The guidance images in the leftmost column represent the pixels that the agent has previously seen. Black pixels represent previously unseen areas, for which Pathdreamer renders a variety of outputs by sampling multiple random noise vectors. In practice, the generated output can be informed by new observations as the agent navigates the environment.

Pathdreamer is trained on images and 3D environment reconstructions from Matterport3D, and can synthesize realistic images as well as continuous video sequences. Because the output images are high-resolution and 360-degree, they can be readily converted for use by existing navigation agents with any camera field of view. To learn more and try Pathdreamer yourself, we recommend checking out the open source code.

Application to Visual Navigation Tasks

As a visual world model, Pathdreamer shows strong potential to improve performance on downstream tasks. To demonstrate this, we apply Pathdreamer to the Vision-and-Language Navigation (VLN) task, in which an embodied agent must follow a natural language instruction to navigate to a location in a realistic 3D environment. Using the Room-to-Room (R2R) dataset, we run an experiment in which an instruction-following agent plans ahead by simulating many possible navigable trajectories through the environment, ranking each against the navigation instruction, and selecting the best-ranked trajectory to execute. Three settings are considered. In the Ground-Truth setting, the agent plans by interacting with the real environment, that is, by moving. In the Baseline setting, the agent plans ahead without moving by interacting with a navigation graph that encodes the navigable routes within the building but does not provide any visual observations. In the Pathdreamer setting, the agent plans ahead without moving by interacting with the navigation graph, and also receives the corresponding visual observations generated by Pathdreamer.
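The plan-ahead loop described above (enumerate trajectories, rank each against the instruction, execute the best one) can be sketched as follows. This is a toy under stated assumptions: the navigation graph, the `observe` lookup standing in for world-model predictions, and the keyword-overlap `score` are all hypothetical and much simpler than a real VLN agent.

```python
def enumerate_paths(graph, start, steps):
    """List every path of exactly `steps` moves along graph edges, no revisits."""
    paths = [[start]]
    for _ in range(steps):
        paths = [p + [n] for p in paths for n in graph[p[-1]] if n not in p]
    return paths

def plan_ahead(graph, start, steps, observe, score):
    """Pick the path whose predicted observations best match the instruction."""
    best_path, best_score = None, float("-inf")
    for path in enumerate_paths(graph, start, steps):
        observations = [observe(node) for node in path]
        s = score(observations)
        if s > best_score:
            best_path, best_score = path, s
    return best_path, best_score

# Hypothetical navigation graph: nodes are viewpoints, edges are navigable moves.
graph = {
    "hall": ["kitchen", "lounge"],
    "kitchen": ["hall", "pantry"],
    "lounge": ["hall", "balcony"],
    "pantry": ["kitchen"],
    "balcony": ["lounge"],
}

# Stand-in for world-model output: object classes predicted at each viewpoint.
predicted = {
    "hall": {"door"},
    "kitchen": {"sink", "table"},
    "lounge": {"sofa"},
    "pantry": {"shelf"},
    "balcony": {"plant"},
}

# Toy instruction: "walk past the sink and stop by the shelf".
keywords = {"sink", "shelf"}

def score(observations):
    """Count instruction keywords that appear along the path."""
    return sum(len(obs & keywords) for obs in observations)

best_path, best_score = plan_ahead(graph, "hall", 2, predicted.get, score)
print(best_path, best_score)  # ['hall', 'kitchen', 'pantry'] 2
```

Without any observations, every candidate path would score the same in this toy, which mirrors why a navigation graph alone gives the planner less to work with.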

When planning ahead for three steps, the VLN agent achieves a navigation success rate of 50.4% in the Pathdreamer setting, which is significantly higher than the 40.6% success rate in the Baseline setting without Pathdreamer. This suggests that Pathdreamer encodes useful and accessible visual, spatial, and semantic knowledge about real indoor environments. As an upper bound illustrating the performance of a perfect world model, the agent's success rate in the Ground-Truth setting (planning by moving) is 59%. Note, however, that this setting requires the agent to spend considerable time and resources physically exploring many trajectories, which would likely be prohibitively costly in a real-world setting.

We evaluate several planning settings for an instruction-following agent using the Room-to-Room (R2R) dataset. Planning ahead using a navigation graph with corresponding visual observations synthesized by Pathdreamer (Pathdreamer setting) is more effective than planning ahead using the navigation graph alone (Baseline setting), and captures about half the benefit of planning ahead with a world model that perfectly matches reality (Ground-Truth setting).
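The "about half" figure follows directly from the three success rates reported above, as this short calculation shows. The variable names are illustrative; the numbers are the ones given in the text.

```python
# Success rates (%) reported in the text for the three planning settings.
baseline, pathdreamer, ground_truth = 40.6, 50.4, 59.0

gain = pathdreamer - baseline        # improvement from predicted observations
max_gain = ground_truth - baseline   # improvement with a perfect world model
fraction = gain / max_gain

print(f"{gain:.1f} of {max_gain:.1f} points = {fraction:.0%} of the possible gain")
# prints "9.8 of 18.4 points = 53% of the possible gain"
```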

Conclusions and future work

These results show the promise of using world models such as Pathdreamer for complex embodied navigation tasks. We hope that Pathdreamer will help unlock model-based approaches to challenging embodied navigation tasks such as navigating to specified objects and VLN.

Applying Pathdreamer to other embodied navigation tasks such as Object-Nav, continuous VLN, and street-level navigation is a natural direction for future work. We also envision further research into improved architectures and modeling directions for the Pathdreamer model, as well as testing on more diverse datasets, including but not limited to outdoor environments. For more information on Pathdreamer, visit the GitHub repository.

Acknowledgments

This project is a collaboration with Jason Baldridge, Honglak Lee, and Yinfei Yang. Thanks to Austin Waters, Noah Snavely, Suhani Vora, Harsh Agrawal, David Ha, and others for their feedback throughout the project. We also thank the Google Research team for their general support. Finally, thanks to Tom Small for creating the animation for the third figure.

Source: http://ai.googleblog.com/2021/09/pathdreamer-world-model-for-indoor.html
