2022-08-09



Most of us are familiar with Chrome’s Safe Browsing feature, which warns you when you try to visit a dangerous page. Most of us aren’t actively looking for malware or phishing sites, but every once in a while a Reddit link, an email, or a deep search rabbit hole takes us somewhere nasty, and Chrome is kind enough to let us know that the page is sketchy and that we shouldn’t continue. I never gave it much thought, but I always assumed the system worked because Google knew which pages I was visiting through Chrome and checked them against a list. That’s partly correct, but it misses one important and interesting fact: the Safe Browsing system doesn’t actually tell Google which pages you’re on, so it protects your privacy a little bit.

I want to emphasize that this discussion deliberately sets aside another very effective means of privacy invasion: the advertising boogeyman. Nor am I saying that other systems, such as tab-synced browsing history, don’t complicate the question of what Google does and doesn’t know. But Safe Browsing doesn’t simply stream a continuous list of visited URLs to Google’s servers. In fact, according to a recently published blog post describing the APIs and systems used, Chrome keeps a local list on the device for comparison. But it’s not just a list of URLs.

A plain list of URLs may seem intuitive, but it would quickly become a storage and resource hog: Google protects us from more pages than you might think. Instead, your computer keeps a list of so-called “hashes”. A hash is a cryptographically generated, semi-unique string of letters and numbers, created from each unique URL.
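To make the idea of a hash concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash function and uses made-up example URLs; it illustrates the concept and is not Chrome’s actual code.

```python
import hashlib

def url_hash(url: str) -> str:
    """Return the SHA-256 digest of a URL string as hex text."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

# Hypothetical URLs, used only for illustration.
good = url_hash("example.com/")
bad = url_hash("example.com/malware")

print(good)
print(bad)
print(len(good))  # 64 hex characters, i.e. 32 bytes
```

Even a small change to the URL produces a completely different string, and the hash cannot practically be turned back into the URL it came from.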

Google has built safeguards into the logic behind this system to ensure that it can’t easily be circumvented by minor URL changes, so malicious actors can’t avoid detection by making subtle, random adjustments to subdomains and subpages. The list is also updated roughly every 30 minutes. Even so, a list of full hashes would grow too large past a certain point, so Google’s list actually stores only a truncated version of each hash: its first four bytes.
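One way to picture how a check can survive small URL tweaks is to test several host and path combinations for each URL and keep only a short prefix of each hash. The sketch below is a simplified assumption of that idea: the function names are hypothetical, the prefix length is set to four bytes, and real canonicalization rules (query strings, limits on the number of combinations, and so on) are left out.

```python
import hashlib
from urllib.parse import urlsplit

def candidate_expressions(url: str) -> list[str]:
    """Simplified host-suffix and path-prefix combinations for one URL."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # The full host plus shorter suffixes, down to the last two labels.
    hosts = [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]
    segments = [s for s in parts.path.split("/") if s]
    # The root, each intermediate directory, then the full path.
    paths = ["/"]
    for i in range(1, len(segments)):
        paths.append("/" + "/".join(segments[:i]) + "/")
    if segments:
        paths.append(parts.path)
    return [h + p for h in hosts for p in paths]

def hash_prefix(expression: str, length: int = 4) -> bytes:
    """Keep only the first `length` bytes of the SHA-256 digest."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:length]

for expr in candidate_expressions("http://a.b.example.com/x/y.html"):
    print(expr, hash_prefix(expr).hex())
```

Because `example.com/` is among the combinations, a site flagged at that level would still be caught when someone adds a random subdomain or subpage in front of or behind it.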

This is enough unique data that most safe URLs won’t be accidentally caught, but some still can be. Two different pages can generate the same truncated hash in this system, which is a problem: Chrome may recognize a good page as a potentially known bad page.
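A rough back-of-the-envelope calculation shows why truncation makes collisions possible. The sketch below assumes hash values are uniformly distributed and uses a purely hypothetical list size of one million entries; the real list size is not stated here.

```python
import math

def false_positive_rate(list_size: int, prefix_bytes: int = 4) -> float:
    """Chance that one unrelated URL matches at least one stored prefix,
    assuming uniformly distributed hash values."""
    space = 2 ** (8 * prefix_bytes)
    # 1 - (1 - 1/space) ** list_size, computed in a numerically stable way.
    return -math.expm1(list_size * math.log1p(-1.0 / space))

# Hypothetical list size, for illustration only.
print(false_positive_rate(1_000_000))                   # truncated prefixes
print(false_positive_rate(1_000_000, prefix_bytes=32))  # full 32-byte hashes
```

Under these assumptions, the chance for any single URL is tiny, but it is not zero, and it adds up across the enormous number of pages people visit.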

A more visual representation of how Safe Browsing’s hash comparison system works.

If Safe Browsing used only this system, you would definitely get some false positives, given that there are now over 1.88 billion websites and old estimates put the number of unique pages indexed by Google in the trillions. Thankfully, there is one more step. When Chrome detects that the partial hash of a page matches an entry on one of its lists of partial hashes, such as the lists for malware or adware, it asks Google for all of the full hashes corresponding to that partial hash so it can compare them locally.
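The two-step check can be sketched as follows. Everything here is a stand-in: the "server" is an in-memory set, the URLs are invented, and the function names are hypothetical. It is meant only to show the order of operations, not the real API.

```python
import hashlib

def full_hash(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()

# Hypothetical server-side data: full hashes of known-bad pages.
SERVER_BAD_HASHES = {full_hash("evil.example/malware.html")}

def server_full_hashes(prefix: bytes) -> set[bytes]:
    """Stand-in for the network request: every full hash sharing the prefix."""
    return {h for h in SERVER_BAD_HASHES if h.startswith(prefix)}

# Local list kept by the browser: short prefixes only.
LOCAL_PREFIXES = {h[:4] for h in SERVER_BAD_HASHES}

def is_dangerous(expression: str) -> bool:
    digest = full_hash(expression)
    prefix = digest[:4]
    if prefix not in LOCAL_PREFIXES:
        return False  # no local match, so nothing is sent to the server
    # Possible match: fetch the candidates and finish the comparison on-device.
    return digest in server_full_hashes(prefix)

print(is_dangerous("evil.example/malware.html"))
print(is_dangerous("good.example/"))
```

Note that the server only ever receives the short prefix, and the final yes-or-no decision is made by `is_dangerous` on the device.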

This means that your browser can make a second determination locally, where Google can’t see it, by comparing the much longer hashed identifier of the site you’re visiting against the matching entries from Google’s much larger list. It also saves you the headache of actually storing that larger list entirely locally, where it would take up a lot of space. And it means that Google’s servers and the APIs that handle this Safe Browsing feature don’t know exactly which page you’re actually viewing. The comparison is done on-device.

Admittedly, Google’s servers still learn which sites you might be visiting, since the page must be among those on that new, much smaller list, but even that isn’t guaranteed. Chrome does this continuously for every resource on every page you visit, so you may well be on a “good” base site whose page contains embeds, images, or other resources that point to dangerous locations. This means a higher degree of privacy protection than a browsing security system that sends plain URLs.
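To illustrate what the server could observe while a page loads, here is a small sketch. The resource names and the local prefix list are invented, and the four-byte prefix length is an assumption carried into the code.

```python
import hashlib

def prefix_of(expression: str, length: int = 4) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()[:length]

# Hypothetical local prefix list containing one known-bad resource.
LOCAL_PREFIXES = {prefix_of("cdn.evil.example/payload.js")}

def prefixes_sent_to_server(resources: list[str]) -> list[bytes]:
    """What the server would see: only prefixes that matched the local list."""
    return [prefix_of(r) for r in resources if prefix_of(r) in LOCAL_PREFIXES]

# A "good" page that embeds one dangerous resource.
page_resources = [
    "news.example/article.html",
    "news.example/logo.png",
    "cdn.evil.example/payload.js",
]

seen = prefixes_sent_to_server(page_resources)
print([p.hex() for p in seen])
```

In this sketch the server receives a single short prefix, which could belong to the embedded resource or to any other URL that happens to share it, and it learns nothing about the article or the logo.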

The best possible solution from a privacy-first point of view would be to always store the complete list locally, but for resource reasons that isn’t feasible. This is the second-best approach, and it still partially obfuscates from Google’s Safe Browsing system which pages you are accessing.

Maintaining privacy on the internet is very difficult, and this is definitely not a perfect solution, but it is a good, privacy-preserving way to implement a URL-based malware detection system. Now we just need to make sure that the other systems, cookies, third-party libraries, and all the software on the myriad machines between you and the websites you visit do their part to respect you.

