



Google has confirmed that a massive leak of about 2,500 internal documents related to its search engine is authentic, with one expert saying the trove shows that what Google says and what it does are different when it comes to the company's mysterious algorithm.

Despite its enormous influence over the flow of information, traffic and advertising revenue online, the tech giant has remained secretive about how its search engine works.

Some details in the documents appear to contradict past public statements by Google employees about which factors are and aren't used to calculate rankings.

For example, a Google Search employee said in 2016 that the company didn’t have a website authority score.

The company also explicitly denied that it uses Chrome data for its search rankings.

However, information in the documents suggests that Google considers click-through rates, data from the Chrome web browser, website size, and a factor called domain authority (a measure of a website's importance or relevance on a particular subject) in its rankings.

“The key takeaway here is that what Google says and what it does are different,” Michael King, CEO of iPullRank, which published the first analysis of this trove of data, told The Washington Post.

“These documents make that very clear,” King added. “We don't have the recipes that Google uses to search, but it's now very clear what the ingredients are.”

Some experts, including those at industry publication Search Engine Land, have noted that the documents mention modules that suggest Google could implement whitelists for certain topics, such as searches related to elections (IsElectionAuthority) and the COVID-19 pandemic (IsCovidLocalAuthority).

King said the references were likely Google's attempt to identify quality sources of information on particular subjects.

Few details have emerged about how such whitelists would work, but Google has long faced accusations of left-wing bias. A recent analysis by media company AllSides found that 63% of Google News articles came from left-leaning outlets, and just 6% came from right-leaning outlets.

An analysis by the right-wing watchdog Media Research Center detailed 41 cases of alleged election interference by the online search giant since 2008.

The report cites data from Dr. Robert Epstein, who previously testified before the Senate Judiciary Committee that biased search results generated by Google's search algorithms tipped at least 2.6 million votes in favor of Hillary Clinton.

Google has long denied any bias against conservative views and has maintained that Epstein's research has been widely debunked.

The leaked search documents allegedly contain more than 14,000 ranking factors that Google considers when sorting through a range of websites, from news organizations like The Washington Post to small businesses.

The internal data was reportedly published on the online code repository GitHub in March but went largely unnoticed until search engine optimization (SEO) experts Rand Fishkin and Michael King independently obtained it and posted breakdowns of it.

Google tacitly acknowledged that the documents were authentic but warned that they lacked important context and should not be used by the public to gain insight into how search works.

“We want to be careful not to make inaccurate inferences about searches based on information that is out of context, out of date or incomplete,” Google spokesman Davis Thompson said in a statement.

“We are also working to share extensive information about how search works and the types of factors our system weighs, and to protect the integrity of our search results from manipulation,” the statement added.

Google also warned that the documents are not a comprehensive, relevant or up-to-date view of its search ranking algorithms.

It remains to be seen whether Google has actually implemented all of the ranking factors detailed in the documents, or whether it was simply testing and experimenting with them. Some factors may never have been used at all.

Even if they were used, it is essentially impossible to assess how important they are in creating what users see in search results.

The documents do not clarify how the ranking features are weighted.

According to Barry Schwartz, a well-known SEO expert and owner of web consultancy RustyBrick, the leaked documents offer an interesting but incomplete look at the inner workings of the company's search engine.

Schwartz said the documents are best viewed as an indication of what Google is thinking about online search.

“How Google handles certain factors like links, quality of content, authority, authorship — those are all included in there,” Schwartz said. “The problem is, we don't know how those weightings work, how important these signals are, or if they're even being used. And that's the problem.”

Still, King said the documents represent Google's biggest search leak to date.

“This is the largest, most transparent investigation into Google's capabilities that we've ever seen,” King said.

