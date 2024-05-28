



Google's search algorithm is perhaps the most important system on the Internet, determining which sites survive and what content appears on the web. But exactly how Google ranks websites has long been a mystery, pieced together by journalists, researchers, and people working in search engine optimization.

Now, a shocking leak of thousands of pages of internal documents seems to offer an unprecedented glimpse into how search works — and suggests that Google hasn't been completely honest about it for years. So far, Google has not responded to multiple requests for comment on the documents' legitimacy.

Rand Fishkin, who has worked in the SEO industry for more than a decade, said he received 2,500 pages of documents from a source who hoped that reporting on the leak would counter the “lies” Google employees had told about how its search algorithm works. Fishkin said the documents outline Google's search API and break down the information available to employees.

The details Fishkin shared are dense and technical, likely easier for developers and SEO professionals to read than for the general public. And the leak doesn't necessarily prove that Google uses any particular data or signals for search rankings. Rather, the documents outline what data Google collects from web pages, sites, and searchers, giving SEO professionals indirect hints about what Google values, SEO expert Mike King wrote in his summary of the documents.

The leaked documents cover topics such as what data Google collects and uses, which sites Google prioritizes on sensitive topics like elections, and how Google treats small websites. According to Fishkin and King, some information in the documents appears to contradict public statements made by Google representatives.

“‘Lie’ is harsh, but it's the only accurate word to use here,” King writes. “While I don't necessarily blame Google representatives for trying to protect their company's proprietary information, I do take issue with the company's efforts to actively discredit people in marketing, technology, and journalism who have published reproducible findings.”

Google did not respond to The Verge's requests for comment on the documents, including a direct request to deny their legitimacy. In an email to The Verge, Fishkin said the company has not disputed the veracity of the leak, but that Google employees have asked him to change some of the wording in his posts about how certain events were characterized.

Google's secretive search algorithm has spawned an entire industry of marketers who closely follow Google's public guidance and apply it for millions of businesses around the world. These pervasive and often annoying tactics have fed a common perception that Google search results are getting worse, crowded with junk that website owners feel they must produce to get their sites seen. In response to The Verge's past reporting on SEO-driven tactics, Google representatives have often fallen back on a familiar defense: that's not what Google's guidelines say.

However, some details in the leaked documents call into question the accuracy of Google's public statements about how search works.

One example Fishkin and King gave was whether Google Chrome data was used at all in rankings. Google representatives have repeatedly stated that they don't use Chrome data to rank pages, but Chrome is specifically mentioned in the section on how websites appear in search. In the screenshot I captured below as an example, the links that appear below the main URL for vogue.com may have been created in part using Chrome data, the document states.

Chrome is mentioned in the section on how additional links are created. Image: Google

Another question is what role EEAT plays in rankings. EEAT stands for Experience, Expertise, Authoritativeness, and Trustworthiness, and is a concept Google uses to evaluate the quality of search results. Google representatives have previously said EEAT is not a ranking factor. Fishkin notes that he found very few references to EEAT by name in the documents.

However, King detailed how Google does collect author data from pages, including a field for whether an entity on a page is its author. Part of the document King shared states that the field was mainly developed and tuned for news articles, but is also populated for other content (such as scientific articles). While this doesn't confirm that bylines are an explicit ranking metric, it does indicate that Google is at least tracking this attribute. Google representatives have previously said that author bylines are something website owners should add for their readers, not for Google, because bylines don't affect rankings.

While the documents aren't conclusive evidence, they offer a detailed and revealing look into a closely guarded black box system. The US government's antitrust lawsuit against Google over search has also led to the release of internal documents that provide further insight into how the company's flagship products work.

Google's general secrecy about how search works has led to websites looking alike, as SEO marketers try to outsmart Google based on the hints the company itself provides. Fishkin also calls out publications for accepting Google's official claims and presenting them as truth without much further analysis.

“Historically, some of the search industry's loudest voices and most prolific publishers have been content to uncritically repeat Google's public statements. They write headlines like ‘Google says XYZ is true’ rather than ‘Google claims XYZ, but the evidence suggests otherwise,’” Fishkin writes. “Please, do better. If this leak and the DOJ trial can produce just one change, I hope it is this.”

