



A few weeks ago at Google I/O, we announced that we were opening up AI Overviews to everyone in the US.

User feedback shows that with AI Overviews, people are more satisfied with their search results and ask longer, more complex questions, knowing that Google can help with them. People use AI Overviews as a jumping-off point for visiting web content, and the clicks to web pages are higher quality: users are more likely to stay on a page, because we've done a better job of finding the right information and helpful web pages for them.

Last week, some odd and erroneous overviews (along with a very large number of faked screenshots) were widely shared on social media. We know that people trust Google Search to provide accurate information, and they've never been shy about pointing out oddities or errors when they come across them, whether in our rankings or in other Search features. We hold ourselves to a high standard, as do our users, so we expect and appreciate the feedback, and we take it seriously.

Given the attention AI Overviews have received, we wanted to explain what happened and the steps we've taken.

How AI Overviews Work

For years we've been building search features to help people find the information they're looking for as quickly as possible. AI Overviews take that a step further: they are designed to help with more complex questions that might previously have required multiple searches or follow-ups, and they include prominent links to learn more.

AI Overviews work very differently from chatbots and other LLM products that people may have tried. They don't simply generate an output based on training data. They are powered by a customized language model that is integrated with our core web ranking systems and designed to carry out traditional search tasks, like identifying relevant, high-quality results from our index. That's why AI Overviews don't just provide text output; they also include relevant links so people can explore further. Because accuracy is paramount in Search, AI Overviews are built to show only information that is backed up by top web results.

This means that AI Overviews generally don't hallucinate or make things up the way other LLM products might. When AI Overviews get it wrong, it's usually for other reasons: misinterpreting the query, misinterpreting a nuance of language on the web, or not having much good information available. (These are challenges that occur with other Search features too.)

This approach is highly effective. Overall, our tests show that the accuracy rate of AI Overviews is on par with that of Featured Snippets, another popular Search feature that also uses AI systems to identify and surface key information with links to web content.

Strange Results

In addition to designing AI Overviews to optimize for accuracy, we tested the feature extensively before launch, including through robust red-teaming efforts, evaluations with samples of typical user queries, and tests on a proportion of search traffic. But there's nothing quite like having millions of people use the feature with many novel searches. We've also seen nonsensical new searches that seemed aimed at producing erroneous results.

Separately, a large number of faked screenshots have been widely shared. Some of these faked results are obvious and silly; others implied that we returned dangerous results for topics like leaving dogs in cars, smoking while pregnant, and depression. Those AI Overviews never appeared, so if you come across these screenshots, we encourage you to run the search yourself to check.

That said, we did see some odd, inaccurate, or unhelpful AI Overviews. These generally appeared for queries that people don't commonly make, but they highlighted some specific areas where we needed to improve.

One area we identified was our ability to interpret nonsensical queries and satirical content. Take one example: "How many rocks should I eat?" Before these screenshots went viral, practically no one asked Google that question. You can see that for yourself in Google Trends.

There also isn't much web content that seriously contemplates this question. This is what is often called a "data void" or "information gap": a topic on which only a limited amount of high-quality content exists. In this case, however, satirical content on the topic happened to be republished on a geological software provider's website. So when someone put that question into Search, an AI Overview appeared that faithfully linked to one of the only websites that tackled the question.

In other examples, AI Overviews featured sarcastic or trolling content from discussion forums. Forums are often a great source of authentic, first-hand information, but in some cases they can lead to less-than-helpful advice, like using glue to get cheese to stick to pizza.

In a small number of cases, AI Overviews misinterpreted language on web pages and presented inaccurate information. We worked quickly to address these issues, either through improvements to our algorithms or through established processes for removing responses that don't comply with our policies.

Improvements

As with any search improvement, rather than fixing every single query, we're working on updates that will address a wider range of queries, including new ones we haven't seen yet.

Looking at examples from the past few weeks, we identified patterns where we didn't get it right, and we've made more than a dozen technical improvements to our systems. Here's a sample of what we've done so far:

- We built better detection mechanisms for nonsensical queries that shouldn't show an AI Overview, and we limited the inclusion of satire and humor content.
- We updated our systems to limit the use of user-generated content in responses that could offer misleading advice.
- We added triggering restrictions for queries where AI Overviews were not proving to be as helpful.
- For topics like news and health, we already have strong guardrails in place. For example, we aim not to show AI Overviews for hard news topics, where freshness and factuality are important. For health, we launched additional triggering refinements to strengthen our quality protections.

In addition to these improvements, we've been vigilant in monitoring feedback and external reports, and we've taken action on the small number of AI Overviews that violated our content policies. These are overviews that contain information that's potentially harmful, obscene, or otherwise in violation of our policies. We found a content policy violation on less than one in every 7 million unique queries on which AI Overviews appeared.

At the scale of the web, with billions of queries coming in every day, there are bound to be some oddities and errors. Over the past 25 years, we've learned a lot about how to build and maintain a high-quality search experience, including how to learn from these errors to make Search better for everyone. We'll keep improving when and how we show AI Overviews and strengthening our protections, including for edge cases. We're very grateful for your ongoing feedback.

