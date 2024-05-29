



Over the US holiday period, several posts were shared about an alleged leak of Google ranking-related data. The initial posts about the leak focused on “confirming” Rand Fishkin's long-held beliefs, with little attention paid to the context of the information or what it actually means.

Context Matters: The Document AI Warehouse

The leaked documents relate to a public Google Cloud platform called Document AI Warehouse, which is used to analyze, organize, search, and store data. Its public documentation is titled An Overview of Document AI Warehouse. A Facebook post states that the “leaked” data is an “internal version” of the publicly available Document AI Warehouse documentation. This is the context of the data:

Screenshot: Document AI Warehouse

@DavidGQuaid tweeted:

“I think it's pretty clear, as the name suggests, that this is an external-facing API for building document warehouses.”

This appears to pour cold water on the idea that the “leaked” data represents inside information about Google Search.

From what we know so far, the “leaked data” is similar to that found on the public Document AI Warehouse page.

Internal search data leak?

SparkToro's original post does not state that the data originated from Google Search; it says that the person who sent the data to Rand Fishkin made that claim.

One of the things I admire about Rand Fishkin is how precise he is in his writing, especially when it comes to caveats. Rand correctly points out that it is the person providing the data who claims that it originated from Google Search. There is no evidence, just a claim.

He writes:

“I received an email from someone claiming to have access to a large amount of leaked API documentation from Google's search division.”

Fishkin himself did not claim that former Googlers confirmed that the data came from Google Search; he wrote that the person who sent the data in the email made that claim.

“The email further claimed that the leaked documents had been verified as authentic by former Google employees, who shared additional personal information about Google's search operations.”

Fishkin writes about a subsequent video conference call in which the leaker revealed that their contact with the former Googlers came from meeting them at a search industry event. Again, we must take the leaker's word regarding the former Googlers, including that their statements came after a careful review of the data and were not off-the-cuff remarks.

Fishkin wrote that he contacted three former Googlers about the matter. Notably, the ex-Googlers did not explicitly confirm that the data was internal to Google Search. They only confirmed that the data resembles internal Google information, not that it originated from Google Search.

Fishkin writes that he heard the following from the former Googlers:

“I didn't have access to this code when I worked there, but this certainly looks genuine.”

“It has all the hallmarks of an internal Google API.”

“It's a Java-based API, and someone has spent a lot of time ensuring it conforms to Google's own internal standards for documentation and naming.”

“I need more time to be sure, but this is consistent with the internal documentation I'm familiar with.”

“From my quick review, there's nothing to indicate this isn't genuine.”

Saying something originated from Google Search is one thing, but saying it originated from Google is another thing entirely.

Keep an open mind

It's important to keep an open mind, as a lot about the data is unknown. For example, we don't know whether this is documentation from an internal search team, so it's probably not a good idea to derive actionable SEO advice from it.

Additionally, we don't recommend analyzing the data specifically to confirm long-held beliefs, as this leads to confirmation bias.

Confirmation bias definition:

“Confirmation bias is the tendency to search for, interpret, prefer, and recall information in a way that confirms or supports one's prior beliefs and values.”

Confirmation bias can leave us unable to acknowledge things that are empirically true. For example, the theory that Google automatically keeps new sites from ranking, known as the sandbox, has been around for decades. Yet every day, people report that their new site or new page ranks in the top 10 of Google Search almost instantly.

But a fervent believer in the sandbox will dismiss such directly observable experiences, no matter how many people report them.

Brenda Malone, a freelance Senior SEO Technical Strategist and Web Developer, messaged me about the claims made about the sandbox.

“I know from practical experience that the sandbox theory is wrong. I indexed a personal blog with two posts in just two days. According to the sandbox theory, a small site with two posts should never be indexed.”

The point here is that even if the documents turn out to have originated from Google Search, searching them for confirmation of a long-held belief is the wrong way to analyze the data.

What is the Google data leak about?

There are five things to consider regarding leaked data:

The context of the leaked information is unclear. Is it related to Google Search? Is it for other purposes?

The purpose of the data. Was the information used for actual search results? Or was it used internally for data management and manipulation?

Ex-Googlers have not confirmed that the data is specific to Google Search, only that it appears to come from Google.

Keep an open mind. What do you think happens when you try to find justification for a long-held belief? You find it everywhere. This is called confirmation bias.

Evidence suggests that the data is related to an external-facing API for building a document warehouse.

What others are saying about the “leaked” documents

Ryan Jones, someone with not only extensive experience in SEO but also a deep understanding of computer science, shared some rational insights on the so-called data leak.

Ryan tweeted:

“I don't know if this is for production or testing. My guess is that it's mainly for testing potential changes.

I don't know what's used for the web or other areas, some may just be used for Google Home, news, etc.

I don't know what the inputs are to the ML algorithm and what is used to train it. I would guess that clicks are not direct inputs but are used to train a model that predicts click likelihood (other than trend boosting).

I would also guess that some of these fields only apply to the training data set and not to all sites.

Am I saying Google isn't lying? Absolutely not. But let's take an open-minded and critical look at this leak.”

@DavidGQuaid tweeted:

“I don't know if this is for Google search or Google cloud docs search.

The API seems selective. I don't think the algorithm is executed this way. What if an engineer wants to skip the quality checks altogether? This seems like what would happen if you wanted to build a content warehouse app for an enterprise knowledge base.”

Does the “leaked” data relate to Google Search?

At this point, there is no solid evidence that this “leaked” data actually comes from Google Search. There is a lot of ambiguity as to what the purpose of the data is. Notably, there are hints that the data is merely “an external-facing API for building a document warehouse, as the name suggests,” and has nothing to do with ranking websites in Google Search at all.

While it is not yet conclusive that this data did not originate from Google Search, the evidence seems to point in that direction.

Featured image: Shutterstock/Jaaak

Source: https://www.searchenginejournal.com/google-data-leak-clarification/517711/
