How machine learning improves the Chrome address bar on Windows, Mac, and ChromeOS

Used billions of times every day, the Chrome address bar (we call it the Omnibox) makes it easy to search the web, quickly find tabs and bookmarks, and navigate back to the web pages you want to visit. It's a powerful tool for returning to pages you previously visited or searching for information.

The latest release of Chrome (M124) integrates machine learning models to power the Omnibox on desktop, making web page suggestions more accurate and relevant to you. In the future, these models will also help improve the relevance scoring of search suggestions. Here, we take a closer look at some of the key insights that helped our team build this integration and where we hope the new models will take us.

How we got here

As the engineering lead for the team responsible for the Omnibox, I find that every product launch feels special, but this one is truly near and dear to my heart. When I first started working on the Omnibox, I asked for ideas on how we could make it better for users. The number one answer I heard was “improving the scoring system.” The problem wasn't that the scoring was bad. In fact, the Omnibox often feels like magic in its ability to surface the URL or query you want. The problem was that it lacked flexibility. The hand-built and hand-tuned set of formulas worked well, but was difficult to improve or adapt to new scenarios. As a result, the scoring system remained largely untouched for a long time.

An ML-trained scoring model was the obvious path forward, but it took many false starts to finally get here. We couldn't tackle this challenge for so long because it is difficult to replace the core mechanism of a feature that is used literally billions of times every day. Software engineering projects are sometimes described as “building an airplane while you fly it.” This project felt like “replacing every seat in every airplane in the world while it's still flying.” At that scale, any change is felt directly by every user.

This ambitious initiative would not have been possible without the work of a talented and dedicated team. There were bumps in the road, walls to break through, and unforeseen issues that slowed us down, but the team was driven by a sincere belief in the impact of getting this right for users.

Surprising insights

One of the fun things about using ML systems is that training takes all of the data into account at once, at a scale that would be difficult or impossible for an individual or a team. And that can lead to surprising insights.

The coolest example of this phenomenon in this project came when we looked at the scoring curve for one particular signal: time since last navigation. The expectation for this signal is that the smaller its value (the more recently you navigated to a particular URL), the more it should contribute towards a higher relevance score.

And in fact, that's what the model learned. But when we looked closer, we noticed something surprising. When the time since last navigation was very short (seconds instead of hours, days, or weeks), the model was decreasing the relevance score. It turned out that the training data reflected a pattern where users would navigate to a URL that wasn't what they actually wanted, then quickly return to the Omnibox and try again. In that case, the URL they just visited is almost certainly not what they want, so it should receive a lower relevance score on the second attempt.
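To make the shape of that curve concrete, here is a minimal Python sketch. It is not the model Chrome ships: the function name, the constants, and the formula are all assumptions invented for illustration. It only shows how a contribution can rise with recency in general while still dipping when the last navigation was just seconds ago.

```python
import math

# Hypothetical constants, chosen only to illustrate the curve's shape.
QUICK_RETURN_WINDOW_S = 30.0          # assumed "came straight back" window
DECAY_HALF_LIFE_S = 7 * 24 * 3600.0   # assumed one-week half-life


def recency_contribution(seconds_since_last_navigation: float) -> float:
    """Toy contribution in [0, 1] of the time-since-last-navigation signal."""
    t = max(seconds_since_last_navigation, 0.0)
    # Usual expectation: the more recent the visit, the larger the contribution.
    decay = 0.5 ** (t / DECAY_HALF_LIFE_S)
    # The surprising part: a visit from only seconds ago is discounted,
    # because the user probably bounced straight back to try again.
    ramp = 1.0 - math.exp(-t / QUICK_RETURN_WINDOW_S)
    return decay * ramp


if __name__ == "__main__":
    for label, seconds in [("5 seconds", 5), ("1 hour", 3600),
                           ("1 day", 86400), ("30 days", 30 * 86400)]:
        print(f"{label:>10}: {recency_contribution(seconds):.3f}")
```

In this toy curve, a URL visited five seconds ago contributes less than one visited an hour ago, which in turn contributes more than one visited a month ago. A hand-tuned formula would need an explicit rule for the quick-return case, whereas a trained model can pick it up from the data.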

In retrospect, this is obvious. And if we hadn't moved to ML scoring, I'm pretty sure we would have added a new rule to our old system to reflect this scenario. But no one had any idea that this could be happening until the training system observed and learned the pattern.


We believe the new ML models open up many possibilities to improve the user experience by incorporating new signals, such as the time of day, to improve relevance. We would also like to explore training specialized versions of the model for specific environments (such as mobile, enterprise, academic users, or different locales).

Furthermore, we have observed that the way users interact with the Omnibox changes over time, and we believe that relevance scoring should change along with them. With the new scoring system, we can now easily collect fresher signals and periodically retrain, evaluate, and deploy new models.

Written by Justin Donnelly, Chrome Software Engineer




