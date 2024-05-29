



What's in the news: Over 2,500 pages of Google API (Application Programming Interface) documentation containing 14,014 API attributes were reportedly leaked on GitHub on March 27, 2024, and remained on the site until May 7. The documents appear to reveal what Google considers important when ranking websites in its search engine. The leak was discovered by Erfan Azimi, founder of search engine optimization (SEO) company EA Eagle Digital, and reported by SEO expert Rand Fishkin.

Based on their analysis of the leaked data, the two SEO experts said Google uses click data (including good, bad and long clicks) in systems such as NavBoost and Glue. According to testimony given by Pandu Nayak, Google's vice president of search, in the US antitrust case against Google, these systems help rank the content that ultimately appears on the search engine results page. The data suggests that Google has ways of filtering out clicks it doesn't want to count in its ranking system and including those it does. The company also appears to measure click length and impressions.

The use of click data here is interesting, given that Google has previously denied that it factors click data into search rankings. However, it's important to note that Google has not acknowledged any data leaks. We've reached out to Google and will update this article if we hear back from the company.

Other key findings from the leaked data:

Google creates sitelinks based on the most clicked URLs in Google Chrome. Sitelinks are sublinks that appear under a site's main listing in the search results. For example, a search for MediaNama shows sitelinks beneath the main result.

Fishkin's analysis of the leaked data revealed that the sitelinks created by Google also take into account clicks on pages in the Chrome browser.

Google whitelists certain authorities and sites. If you search for travel in the leaked data, you'll find a model dedicated to quality travel sites. Fishkin argues that this suggests the company is creating a whitelist for travel sites. Similarly, the leaked documentation contains attributes that flag local authorities related to COVID-19 and elections, so one could argue that the company maintains other whitelists as well.

You can read Fishkin's full analysis of the leak, and of his efforts to verify that the data belongs to Google, here.

