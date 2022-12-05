



Some thoughts on how LLMs might (and might not) disrupt Google.

I asked a more general version of this question last year, when I wrote about how to beat Google Search. The funny thing is that I had written about an open-source alternative to GPT-3 (GPT-J) only two days earlier, and at the time I treated the two as unrelated. But now, as LLMs have become more sophisticated, more and more people are using prompts to query specific knowledge.

Why can’t Google do this? Much of the AI research behind LLMs originated at Google. There is no lack of talent or sophistication there when it comes to this technology. Here are some of the reasons why Google might be disrupted by the AI it helped create.

1. Innovator’s dilemma.

LLMs change the nature of search in a way that significantly reduces the number of ads shown. What if a search engine results page (SERP) doesn’t make sense for a set of queries? Google launched its Knowledge Graph (the box that summarizes information for a query) in 2012. Chrome also answers some queries in the omnibox itself. If LLMs dramatically increase the types and number of queries answered this way, it could have a significant impact on SERP real estate and thus on advertising revenue.

Wikipedia’s traffic growth probably leveled off around the time the Knowledge Graph became popular. LLMs probably threaten information-rich sites like Wikipedia more than they threaten Google.

It’s also possible that a startup could gain distribution by offering a service at small scale before the unit economics make sense. OpenAI can afford to lose $0.001 per query in this phase. Google can’t. I believe the cost of inference will drop dramatically in the future as inference is optimized.
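To make the unit-economics point concrete, here is a rough back-of-the-envelope sketch. The $0.001 loss per query is the figure from the paragraph above; the two daily query volumes are hypothetical round numbers I picked for illustration, not real traffic data for any company.

```python
def daily_loss(loss_per_query: float, queries_per_day: int) -> float:
    """Return the total daily loss when every query is served at a loss."""
    return loss_per_query * queries_per_day


# Figure taken from the text: dollars lost per query.
LOSS_PER_QUERY = 0.001

# ASSUMPTION: hypothetical daily volumes, not real traffic numbers.
SMALL_SERVICE_QUERIES = 1_000_000
LARGE_INCUMBENT_QUERIES = 1_000_000_000

small_loss = daily_loss(LOSS_PER_QUERY, SMALL_SERVICE_QUERIES)
large_loss = daily_loss(LOSS_PER_QUERY, LARGE_INCUMBENT_QUERIES)

print(f"small service:   ${small_loss:,.0f} per day")
print(f"large incumbent: ${large_loss:,.0f} per day")
```

The per-query loss is the same in both cases; only the volume differs. A subsidy that is cheap at small scale grows linearly with traffic, which is why the same pricing is much harder to sustain at search-engine scale.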

2. Reputational risk.

OpenAI allows anyone to query its models. Stability AI has open-sourced its models and weights. Meta briefly launched its Galactica demo before shutting it down. Google publishes papers covering its LLM development, but releases nothing for people to play with. Why?

There is a significant reputational risk for Google in allowing public access to models that may produce racist, biased, or offensive output (after all, these models are trained on internet data). Companies with little to lose have more freedom to launch these models and gain distribution.

Even replacing an existing product with an LLM carries significant reputational risk. Consumers trust Google to deliver relevant results. A few bad results or a search that takes too long can seriously undermine decades of user trust in Google. LLMs confidently hallucinate information and present it as fact. LLM failure modes are not fully understood, and outputs are still unpredictable.

Counterpoints.

Distribution often trumps product. Google has the best distribution on the internet you could ask for: hardware (Pixel, Chromebooks), operating systems (Android, ChromeOS), web browsers (Chrome, Chromium), and other distribution paths (such as default search on iOS and the Google/Apple agreement for Maps). How much better is GPT-3 than Google Search?

Google has the best access to its own large datasets and compute. GPT-3 was trained primarily on non-proprietary data (see the commoditization of large language models), while other models, such as GitHub Copilot, rely on specific data.

