



Google Search is often referred to as the gateway to the internet, the first place most people look for information online. But Google has said little about how it organizes the internet, making Search a giant black box that dictates what we know and don't know. This week, a 2,500-page leak, first reported by search engine optimization (SEO) veteran Rand Fishkin, opened up the 26-year-old mystery of Google Search to the world.

“I think the biggest takeaway is that what Google representatives say and what Google's search engine does are two different things,” Fishkin said in an emailed statement to Gizmodo.

These documents provide a more detailed look at how Google Search controls the information we consume. Getting the right webpage to appear on your computer is not a passive task; thousands of editorial decisions are made on your behalf by a secretive group of Googlers. For the SEO industry, which succeeds or fails on Google's algorithms, the leaked documents are a disaster. It's as if NFL referees rewrote the rules of football mid-season and teams only found out in the middle of the Super Bowl.

Multiple SEO experts told Gizmodo that the leak lists 14,000 ranking factors that, at a minimum, provide a blueprint for how Google organizes everything on the web. These factors include Google's determination of a website's authority on a particular subject, the size of the website, the number of clicks a web page receives, and more. Google had previously denied that it uses some of these ranking factors in its searches, but the company confirmed that the documents, although incomplete, are genuine.

In an email to Gizmodo, a Google spokesperson urged people not to make inaccurate inferences about Search based on out-of-context, out-of-date, or incomplete information. The spokesperson said Google shares extensive information about how Search works and the types of factors its systems weigh, while also working to protect the integrity of its search results from manipulation.

Google would not confirm what these documents get right or wrong. The company told Gizmodo that it would be a mistake to assume this is comprehensive information about Search, and that revealing too much could encourage malicious behavior. Ultimately, we don't know what goes into determining these factors, or how much weight Google Search gives to each one.

“We're just looking at the different variables that are being considered,” SEO expert Mike King, who first analyzed the leak, told Gizmodo in an interview. He said the documents show how Google goes about looking at websites.

The leak was first noticed by SEO expert Erfan Azimi, who found the API documentation publicly available on GitHub. It's unclear whether the documents were actually leaked or if Google accidentally published them in a quiet corner of the web. Azimi brought the documents to Fishkin last week with the aim of making them public. Fishkin asked King to try to make sense of the documents.

King noted that one of the home page ranking features, PagerankNs, suggests that the prominence of a website's home page can influence all content it publishes. Fishkin wrote that the leak refers to a system called NavBoost, which was first mentioned by Google's vice president of search, Pandu Nayak, in his Justice Department testimony. The system is said to measure clicks to boost rankings in Google searches. Many in the SEO industry see these documents as confirmation of what the industry has long suspected: websites that Google deems popular may receive higher search rankings for queries, even if lesser-known sites have better information.

In recent months, some smaller publishers have seen their Google search traffic disappear. When The Verge's Nilay Patel asked Google CEO Sundar Pichai about it last week, Pichai said he wasn't sure if this was a uniform trend. One of the ranking features King points to seems to categorize these smaller sites as a group.

“There is a feature there called smallPersonalSite. I don't know how it is used, but it is [Google] trying to figure out if these are smaller sites,” King said. “Since a lot of them are going under right now, [Google is] doing nothing to counterbalance the signals of these big brands.”

Notably, Pichai later said in an interview with The Verge that Google has also funneled traffic to smaller sites at times. These ranking features could be indicative of the levers Google has available. As more national media organizations license their content to ChatGPT, Google Search also appears to be biased toward larger publishers. Overall, this could have a stifling effect, narrowing what most people hear to just mainstream media organizations.

The impact of the leaked Google documents has been widespread. Kristen Ruby, CEO of Ruby Media Group, who has worked in digital PR and SEO for over 15 years, told Gizmodo that she received an ominous email on Monday night: “Something big is going to happen to Google tomorrow.”

Ruby quickly spotted the leak and noted that two ranking features stood out to her: isElectionAuthority and isCovidLocalAuthority. These features appear to be ways that Google ranks the trustworthiness of webpages that provide relevant information about elections and COVID-19, respectively. In 2019, Ruby wrote at length about how Google's assessment of trustworthy webpages (which Google calls E-E-A-T, an acronym for Experience, Expertise, Authoritativeness, and Trustworthiness) is inherently political. She noted that Google's assessment of these factors tends to be politically biased:

“What's troubling to me is that Google doesn't provide context for key items in its data like isElectionAuthority and isCovidLocalAuthority. How does Google define these important domain authorities?” Ruby said in an emailed statement. “We shouldn't have to guess what the answers are. Google should be upfront about them.”

While Google is a company and has a right to proprietary information, Ruby argues that Google has an obligation to answer questions about these ranking features that shape the world around us. In their articles on the leak, King and Fishkin also look at isCovidLocalAuthority and isElectionAuthority, noting that both are important in helping search engines boost quality information.

“Whether you like it or not, Google is effectively a public service, so I think it's really important that they provide that information discernment,” King said. “They'll probably resist me saying that, but we think of Google as the primary source for information on the web.”

The way Google ranks information in these examples is a microcosm of the entire search ecosystem. Millions of questions arise every day about which information to amplify and which to silence. Google and several other tech companies have long tried to present their products as opinion-free algorithms, but these ranking features show that this is not the case. Many more examples of ranking features are revealed in the 2,500-page leak.

Finding answers in Google's algorithm

Google has not gone into detail about these documents, telling Gizmodo that revealing too much information could encourage malicious activity, so it's up to SEO experts to unravel this on behalf of everyone who uses Google Search. Some of the 14,000 ranking features identified last week are ones that Google has, over the years, explicitly said it does not use.

In a 2016 video, a Google Search representative declared, “We don't have a website authority score.” In a 2015 interview, another Googler said, “Using clicks directly to rank is a mistake.” Given the leaked documents and Google's response, it's hard to make sense of these comments.

Fishkin said Google's response is a perfect example of why people don't like or trust the company: It doesn't address the leak, it doesn't offer any value, and it was likely written by an AI trained on the most soulless corporate messaging of the past few decades.

In the age of AI answers, Ruby noted, how Google ranks web pages is more important than ever. Google's new AI summary feature may give you one clear answer instead of a barrage of links to different perspectives. But we've seen 10-year-old Reddit posts gain a strange amount of authority, encouraging some users to put glue on their pizza. How Google selects authority is increasingly important, since now only the top results may have a say.

“We're switching gears. We're moving from one search system to another,” Ruby said. “AI is having a major impact on search results.”

Ultimately, we don't know what Google is actually doing with these ranking features. What we do know is that Google created these classifiers, and likely many more, to rank websites across the Internet. These rankings clearly involve judgment, providing further evidence that Google Search is not an objective experience, but rather a series of editorial choices made by people inside Google.

