Date: 2021-05-16



Imagine a collection of books, perhaps millions or even billions of them, haphazardly tossed by publishers into a heaping pile in a field. Every day, the pile grows exponentially.

Those books are full of knowledge and answers. But how would a seeker find them? Without organization, the books are useless.

This is the raw internet in all its unfiltered glory. That is why most of our quests for knowledge online begin with Google (and yes, there are other search engines too). Google’s algorithmic tentacles scan and index every book in that ungodly pile. When someone enters a query in the search bar, the search algorithm consults its indexed version of the internet, retrieves matching pages, and presents them in a ranked list of top hits.
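The index, retrieve, and rank steps can be illustrated with a toy sketch. This is not how Google’s system works; it is a minimal illustration that assumes a tiny in-memory corpus, whitespace tokenization, and a simple term-count score. The function names and the sample pages are hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberate simplification)."""
    return text.lower().split()

def build_index(pages):
    """Build an inverted index: term -> Counter of {page_id: occurrences}."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term][page_id] += 1
    return index

def search(index, query, top_k=3):
    """Retrieve pages containing any query term, then rank by summed term counts."""
    scores = Counter()
    for term in tokenize(query):
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    # Highest score first; ties broken alphabetically by page id
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked[:top_k]]

# Hypothetical sample corpus, for illustration only
PAGES = {
    "wine-health": "red wine health benefits and risks of red wine",
    "tea-guide": "earl grey tea brewing guide",
    "wine-regions": "wine regions of the world",
}

INDEX = build_index(PAGES)
RESULTS = search(INDEX, "red wine benefits")
print(RESULTS)
```

The searcher still gets back a list of pages, not an answer, which is the limitation the rest of the article turns on.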

This approach is incredibly useful. So useful, in fact, that it hasn’t fundamentally changed in over 20 years. But now, AI researchers at Google, the very company that set the bar for search engines in the first place, are sketching a blueprint for what might come next.

In a paper on the arXiv preprint server, the team suggests that the technology to make the internet even more searchable is at our fingertips. They say that large language models, machine learning algorithms such as OpenAI’s GPT-3, could wholly replace today’s system of indexing, retrieval, and ranking.

Is AI the search engine of the future?

When seeking information, the authors write, most people would love to ask an expert and get a nuanced, trustworthy response. Instead, they Google it. This can work, or it can go terribly wrong, like getting sucked into a panicked, health-related rabbit hole at 2 o’clock in the morning.

Search engines surface (hopefully high-quality) sources that contain at least part of the answer, but it falls to the searcher to scan, filter, and read through the results and piece the answer together as best they can.

Search results have improved dramatically over the years. Still, this approach is far from perfect.

There are question-and-answer tools such as Alexa, Siri, and Google Assistant. However, these tools are brittle and have a limited (though ever-growing) repertoire of questions they can field. Large language models like GPT-3 have drawbacks of their own (discussed in detail below), but they are much more flexible and can construct novel responses in natural language to any query or prompt.

The Google team suggests that the next generation of search engines could combine the best of all worlds, folding today’s top information retrieval systems into large-scale AI.

It’s worth noting that machine learning is already at work in classic index-retrieve-then-rank search engines. However, the authors suggest that machine learning could wholly replace that system, rather than merely augment it.

“What if we got rid of the notion of the index altogether and replaced it with a large pre-trained model that efficiently and effectively encodes all of the information contained in the corpus?” Donald Metzler and coauthors write in the paper. “What if the distinction between retrieval and ranking went away and instead there was a single response generation phase?”

One ideal result they envision is a bit like the computer of the starship Enterprise in Star Trek. Seekers of information pose questions, and the system answers conversationally, that is, with the kind of natural language response you would expect from an expert, and includes authoritative citations in its answer.

In the paper, the authors sketch what they call an aspirational example of what this approach might look like in practice. A user asks, “What are the health benefits of red wine?” The system returns a nuanced answer in clear prose drawn from multiple authoritative sources, in this case WebMD and the Mayo Clinic, highlighting the potential benefits and risks of drinking red wine.

But it doesn’t have to end there. The authors note another advantage of large language models: they can learn many tasks with only a little tweaking (this is known as one-shot or few-shot learning). So they may be able to perform all the same tasks that current search engines accomplish, and dozens more as well.
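To make the idea of learning from a few examples concrete, here is a minimal sketch of how a task can be specified by placing a handful of worked examples in the prompt itself, rather than by retraining the model. The prompt layout, the `build_few_shot_prompt` helper, and the example pairs are all assumptions for illustration. No particular model’s API is implied; the resulting string would simply be passed to whatever language model is in use.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a prompt: task description, worked examples, then the new input."""
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final entry is left open for the model to complete
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

# Hypothetical examples of a query-intent classification task
EXAMPLES = [
    ("How tall is Mount Everest?", "factual question"),
    ("Best pizza near me", "local search"),
]

PROMPT = build_few_shot_prompt(
    "Classify each search query by intent.",
    EXAMPLES,
    "What are the health benefits of red wine?",
)
print(PROMPT)
```

Swapping in a different task description and a different pair of examples would point the same model at a different job, which is the flexibility the authors have in mind.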

Still just a vision

Today, this vision is out of reach. Large language models are what the authors call “dilettantes.”

Algorithms like GPT-3 can produce prose that is, at times, nearly indistinguishable from human-written text, but they are still prone to nonsensical replies. Worse, they heedlessly reflect the biases embedded in their training data, have no sense of contextual understanding, and cannot cite sources to justify their answers (or even separate high-quality sources from low-quality ones).

“They are perceived to know a lot, but their knowledge is skin deep,” the authors write. The paper also lays out the breakthroughs needed to bridge the gap. Indeed, many of the challenges they outline apply to the field at large.

A key advance would be to move beyond algorithms that model only the relationships between terms (such as individual words) to algorithms that also model the relationship between the words in an article, for example, and the article as a whole. Such algorithms would also model the relationships between many different articles across the internet.

Researchers also need to define what constitutes a quality response. This in itself is no easy task. But for starters, the authors suggest that high-quality responses should be authoritative, transparent, unbiased, accessible, and include diverse perspectives.

Even today’s most cutting-edge algorithms don’t come close to this bar. And it would be unwise to deploy natural language models at this scale until these problems are solved. But if they are solved (and work is already being done to address some of these challenges), search engines won’t be the only applications to benefit.

“Earl Grey, Hot”

It’s an enticing vision. Combing through web pages in search of answers while trying to determine what is trustworthy and what isn’t can be exhausting.

Undoubtedly, many of us don’t do the job as well as we could or should.

However, it’s also worth speculating how an internet accessed in this way would change the way people contribute to it.

If we consume information primarily by reading prose responses synthesized by algorithms, rather than by opening and reading the individual pages themselves, would creators publish as much work? And how would Google and other search engine makers compensate creators who are, in essence, making the information that trains the algorithms themselves?

Plenty of people would still read the news, and in those cases search algorithms would need to serve up lists of stories. But I wonder whether a subtle shift might occur in which smaller creators contribute less, and in doing so leave the web less rich in information, weakening the very algorithms that depend on that information.

There is no way to know. Often, speculation is rooted in the problems of today and proves innocent in hindsight. In the meantime, the work will no doubt continue.

Perhaps we will solve these challenges, and others as they arise, and in the process arrive at the all-knowing, pleasantly chatty Star Trek computer we have long imagined.

Image Credit: JD X / Unsplash

