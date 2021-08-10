



What does the future of internet search look like? Google imagines it will feel like a casual conversation with a friend.

Google’s search engine has been online for more than 20 years, and the technology behind it is constantly evolving. Recently, the company announced a new artificial intelligence system called MUM, which stands for Multitask Unified Model. MUM is designed to capture the subtle nuances of human language on a global scale, with the aim of making it easier for users to find the information they’re looking for and to ask more abstract questions.

Google has already used MUM in one internal task, learning more about the different ways people refer to COVID vaccines, but says the new technology is not yet part of its search system. There is currently no timeline for when MUM-powered features will roll out in live search, but the team is actively developing other one-off tasks for MUM to complete.

Here’s what you need to know about what MUM is and how it differs from its predecessor.

Solving the COVID vaccine name game

When vaccines became available earlier this year, Pandu Nayak, Google’s vice president of search, and his colleagues designed an experience to give people information about COVID vaccines when they searched: how the vaccines work, for example, and where to get them. This experience stitched all of that important, relevant information together and pinned it to the top of the first page of search results. But first, the team had to program it so that it appeared only when a query was actually about COVID vaccines. That can be tricky, because people around the world refer to COVID vaccines in different ways and by different names.

Last year, the team spent hundreds of hours combing through resources to identify all the different names for COVID itself. This year, they had MUM. According to Nayak, a very simple experiment with MUM generated more than 800 names for 17 vaccines in 50 different languages within seconds. Google has many language tasks to solve, such as classification, ranking, and information extraction, and in the short term it plans to use MUM to improve each of them. Nayak says this won’t lead to new features or new experiences; rather, existing features and experiences will work much better.
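Once a list of name variants exists, the triggering step described above comes down to checking whether a query mentions any of them. The sketch below is a hypothetical illustration, not Google’s implementation: the functions `normalize` and `triggers_vaccine_panel` and the handful of entries in `VACCINE_NAMES` are assumptions made for the example, and real query understanding involves far more than phrase matching.

```python
# Hypothetical sketch: decide whether a query should trigger a pinned
# COVID-vaccine information panel by matching it against name variants.
# The variants below are illustrative examples only, not MUM output.
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Vacuna' and 'vacúna' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical variant list; per the article, MUM generated 800+ such names.
VACCINE_NAMES = {
    "covid vaccine",
    "covid-19 vaccine",
    "comirnaty",
    "vacuna covid",
    "impfstoff corona",
}


def triggers_vaccine_panel(query: str, names=VACCINE_NAMES) -> bool:
    """Return True if any known variant appears as a whole phrase in the query."""
    q = normalize(query)
    for name in names:
        pattern = r"\b" + re.escape(normalize(name)) + r"\b"
        if re.search(pattern, q):
            return True
    return False


print(triggers_vaccine_panel("Where can I get the Comirnaty shot?"))  # True
print(triggers_vaccine_panel("dónde ponerme la vacuna COVID"))        # True
print(triggers_vaccine_panel("covid symptoms in kids"))               # False
```

Accent stripping and whole-phrase matching keep the example short; the hard part the article describes, collecting hundreds of variants across dozens of languages, is exactly what the short list here glosses over.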

Meeting MUM at Google I/O

I first heard about MUM at Google’s I/O developer conference in the spring, when Prabhakar Raghavan, a senior vice president at Google, announced it.

The new technology is a natural evolution of the machine-learning-based search that Google has been refining over the last decade. Google touts MUM’s ability to acquire in-depth knowledge of the world, to understand and generate language, and to train across 75 languages at once. Internal pilots are also testing whether MUM can be multimodal, that is, understand different forms of information, such as text, images, and video, at the same time.

All of this complexity can be illustrated with a simple example that Google described at the conference and in a blog post. Suppose you asked Google: “I’ve hiked Mt. Adams and now want to hike Mt. Fuji next fall. What should I do differently to prepare?” Most people wouldn’t bother typing a query like this today, because they understand that it’s generally not how you search for information online.

This is the kind of question you would casually ask a friend, but it is so conversational and nuanced that today’s search engines can’t answer it directly, Raghavan explained at I/O. Ideally, though, MUM would understand that you’re comparing two mountains, and that preparing could involve fitness training for the terrain, hiking gear for fall weather, and so on. It would be able to break your question down into a series of queries, learn about each aspect of your problem, and then put the answers back together. Users could click through to search results related to each aspect of the question, or read a comprehensive passage explaining how the original query was answered.

An experience like that is a long-term goal for MUM’s engineers, and it isn’t yet clear how long it will take to get there. In the medium term, Google engineers are training MUM to recognize relationships between words and images. Nayak says MUM did a very good job when asked to generate a new image from text, such as “Siberian husky.”

A brief history of search

Since its founding in 1998, Google has continuously mapped the web, collecting an overwhelming amount of content and building an index to organize all of that information.

You can think of Google’s search index as working like the index at the back of a book: it shows every page on which a particular word appears. On the internet, though, there are two important differences. The first is that a book might have 300 to 1,000 pages, which is modest compared with the trillions of pages on the web. The second is that a book’s index looks up one word at a time, whereas web searches look up combinations of words. Google receives billions of queries a day from around the world, Nayak says, and because of this scale and this combinatorial explosion, 15 percent of the searches it gets every day are ones it has never seen before. The query stream, he says, has an incredible amount of novelty.
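A book index answers single-word lookups, while a web query combines words. A minimal sketch of that difference, assuming a toy whitespace tokenizer and an in-memory dictionary (an inverted index), intersects the sets of pages that contain each query word. The page texts and function names are hypothetical; this is not how Google actually stores its index.

```python
# Toy inverted index: maps each word to the set of page IDs containing it,
# like the index at the back of a book. Pages and tokenizer are simplified.
from collections import defaultdict


def build_index(pages: dict[int, str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return pages containing every word in the query (a word combination)."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's pages and intersect with each other word's.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    1: "hiking boots for autumn weather",
    2: "mount fuji hiking season",
    3: "autumn weather in japan",
}
idx = build_index(pages)
print(search(idx, "autumn weather"))  # {1, 3}
print(search(idx, "fuji hiking"))     # {2}
```

A real engine would also rank the matching pages rather than return an unordered set; the sketch only shows the lookup.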

Part of that novelty comes from new ways of misspelling words, Nayak adds. Part of it comes from the fact that the world is constantly changing, and there are new (and sometimes very specific) things people want to find.

To narrow all the possible information on the web down to what is really relevant to a query, Google uses algorithms that weigh factors such as freshness, location, and how pages link to one another, and it ranks the information it considers most useful at the top. By far the most important class of factors relates to language understanding, Nayak says. Language understanding is really at the center of search: you need to understand what the query means, you need to understand what the documents mean, and you need to understand how the two match.

Of course, software can’t truly understand language, with all its subtle nuances, the way we do. But programmers can develop strategies that try to approximate how we understand it. Some 16 years ago, Google built the first version of its synonym system, which accounts for the fact that different words can mean the same thing in certain contexts: “change” means “adjust” when you’re talking about laptop brightness, for example. Without that understanding, many relevant pages would be excluded from search results simply because they use different wording.
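To make the brightness example concrete, here is a minimal sketch of context-dependent query expansion. The `SYNONYMS` table, its entries, and `expand_query` are hypothetical; Google has not published how its synonym system is structured. The point is only that “adjust” is added for “change” when “brightness” is also in the query, so a page that says “adjust brightness” can still match.

```python
# Hypothetical context-sensitive synonym table: a synonym applies only when
# a context word is also present in the query. Entries are illustrative.
SYNONYMS = {
    # (word, required context word) -> synonyms valid in that context
    ("change", "brightness"): {"adjust"},
    ("change", "money"): {"exchange"},
}


def expand_query(query: str) -> set[str]:
    """Return the query's words plus any synonyms whose context is satisfied."""
    words = query.lower().split()
    expanded = set(words)
    for (word, context), alternatives in SYNONYMS.items():
        if word in words and context in words:
            expanded |= alternatives
    return expanded


print(sorted(expand_query("change laptop brightness")))
# ['adjust', 'brightness', 'change', 'laptop']
print(sorted(expand_query("change laptop battery")))
# ['battery', 'change', 'laptop']
```

Keying each synonym on a context word is what keeps “change” from being expanded to “adjust” in queries where that would be wrong.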

Then, about 10 years ago, the company created the Knowledge Graph. The idea behind it was that words in queries and documents can mean more than just strings of letters when they refer to people, places, or things in the world. If you don’t understand what real-world thing a particular string refers to, Nayak explains, you don’t fully understand what the words mean. Entities such as people, places, things, and companies are stored in a database, and the Knowledge Graph links the relationships among them. It also provides brief summaries of quick facts about entities such as celebrities and landmarks.

For example, if you search for Marie Curie, Google’s Knowledge Graph will tell you when and where she was born, whom she married, where she went to college, and what she is known for. This is a convenient way to display information beyond the list of page results that Google shows after a search.
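A knowledge graph can be pictured as entities connected by typed relationships. The sketch below stores (subject, relation, object) triples in memory and groups an entity’s edges into a quick-facts summary like the Marie Curie panel. The class, its methods, and relation names such as `born_in` and `known_for` are hypothetical; Google’s Knowledge Graph schema and storage are far richer.

```python
# Minimal knowledge-graph sketch: entities linked by (subject, relation, object)
# triples, plus a helper that assembles a quick-facts summary.
# Relation names are hypothetical; the facts are a small, well-known sample.
from collections import defaultdict


class KnowledgeGraph:
    def __init__(self) -> None:
        # subject -> list of (relation, object) edges
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def facts(self, entity: str) -> dict[str, list[str]]:
        """Group an entity's outgoing edges by relation for a summary panel."""
        summary: dict[str, list[str]] = defaultdict(list)
        for relation, obj in self.edges.get(entity, []):
            summary[relation].append(obj)
        return dict(summary)


kg = KnowledgeGraph()
kg.add("Marie Curie", "born_in", "Warsaw")
kg.add("Marie Curie", "born_on", "1867-11-07")
kg.add("Marie Curie", "spouse", "Pierre Curie")
kg.add("Marie Curie", "educated_at", "University of Paris")
kg.add("Marie Curie", "known_for", "radioactivity research")
kg.add("Pierre Curie", "spouse", "Marie Curie")  # edges link entities both ways

for relation, values in kg.facts("Marie Curie").items():
    print(f"{relation}: {', '.join(values)}")
```

Because “Pierre Curie” is itself an entity with its own edges, a search panel can follow the link from one entity to the next rather than treating the name as a plain string.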

Machine learning gets hot

About six years ago, Google released its first machine-learning-based search system. The company kept improving it, drawing on accumulated research in the deep learning community into natural language algorithms that work out a word’s meaning from the context in which it is used and learn which parts of that context to pay attention to. In 2019, Google brought the BERT architecture to search. Its training algorithm was, in effect, a series of fill-in-the-blank exercises: take common phrases, hide random words, and ask the network to predict what those words are. This approach is also known as a masked language model.
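The fill-in-the-blank training setup can be sketched in a few lines. This hypothetical helper, `make_masked_example`, hides a random subset of words and records the hidden words as the targets a model would learn to predict. The 15 percent default follows the rate described in the original BERT paper; whitespace tokenization and the literal `[MASK]` string are simplifications, and the sketch omits BERT’s refinement of sometimes substituting a random or unchanged token instead of the mask.

```python
# Sketch of building masked-language-model training examples: hide a random
# subset of words and keep the hidden words as prediction targets.
import random
from typing import Optional

MASK = "[MASK]"


def make_masked_example(sentence: str, mask_rate: float = 0.15,
                        seed: Optional[int] = None) -> tuple[list[str], dict[int, str]]:
    """Return (masked_tokens, targets), where targets maps position -> hidden word."""
    tokens = sentence.split()
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one word so every example has something to predict.
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets: dict[int, str] = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


masked, targets = make_masked_example(
    "can i get medicine for someone at a pharmacy", seed=0
)
print(" ".join(masked))  # the sentence with one word replaced by [MASK]
print(targets)           # {position: original word}
```

Training then amounts to showing a network many such `(masked, targets)` pairs and penalizing wrong guesses; the network itself is outside the scope of this sketch.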

Take the query “can I get medicine for someone at a pharmacy?” Previously, searchers got results about getting a prescription filled at a pharmacy. BERT understood that the question was not just about getting a prescription, but about picking one up for someone else, such as a friend or family member. According to Nayak, subtle questions like this were previously handled poorly, and BERT produced more relevant results.

Like BERT, MUM can understand language, but it can also generate it. MUM is also much larger and more capable than BERT; according to Google, it is 1,000 times more powerful. MUM is trained on a high-quality subset of the public web corpus, across all the different languages Google serves. The language MUM learns is, hopefully, good in some respects, because the search team filters out low-quality content, adult and explicit content, and hateful language. Training on all languages at once also lets MUM generalize from languages with plenty of data to languages with less, filling in the gaps where little training data is available.

Still, Nayak acknowledges that large language models like MUM come with real challenges, which the team is actively working on. One is bias. Because MUM is trained on a web corpus, there are concerns about whether it reflects or amplifies biases that exist on the web, Nayak says. Training on a high-quality subset of the corpus will, he hopes, eliminate some of the worst biases. Google will also continue to use search quality raters and other evaluation processes to review results and look for problematic patterns. That won’t solve every problem, Nayak says, but it is a significant mitigation.

MUM builds on a collection of innovative features that Google has experimented with to improve search. Today, when people come to search, they don’t have a perfectly formed query in their heads; they come looking broadly for things happening in their lives, Nayak says. They have to take that vague need, translate it into one or more queries they can issue to Google, learn about the various aspects of the problem, and put it all together.

Features such as autocomplete are meant to ease that process somewhat, but MUM could open up new possibilities. As Nayak puts it, the real question for any search tool is a practical one: it’s a tool, so is it useful, even if it isn’t perfect?

