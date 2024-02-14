



Within 24 hours of Google Gemini's public release, someone noticed that Gemini chats were publicly visible in Google's search results. Google quickly responded to what appeared to be a leak. How this happened is surprising, and less sinister than it first seems.

@shemiadhikarath tweeted:

“Hours after @Google Gemini was released, search engines like Bing indexed public conversations from Gemini.”

They posted a screenshot of a site: search for gemini.google.com/share/.

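However, the screenshot shows a message reading, “We'd like to display a description here, but the site doesn't allow it.” That message typically appears when a search engine has indexed a URL that robots.txt prevents it from crawling, so it has no page content to build a description from.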

By the early morning hours of Tuesday, February 13th, Gemini chats had started disappearing from Google's search results, with only three results still showing up. By the afternoon, the number of leaked Gemini chats in the search results had dropped to just one.

How were the Gemini chat pages created?

Gemini provides a way to create a link to a public version of a private chat.

Google doesn't automatically create web pages from private chats. Users create a shared chat page themselves, using the share link at the bottom of each chat.

Screenshot of how to create a shared chat page

Why were the Gemini chat pages indexed?

The obvious explanation for why the chat pages were crawled and indexed would be that Google forgot to place a robots.txt file at the root of the Gemini subdomain (gemini.google.com).

The robots.txt file tells crawlers which parts of a website they may access. Publishers can block specific crawlers using directives standardized in the Robots Exclusion Protocol.
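Here is a minimal Python sketch of how a crawler might evaluate robots.txt rules, using the standard library's urllib.robotparser. The rules are hypothetical and only illustrate blocking Google's and Bing's crawlers; they are not a copy of the actual gemini.google.com/robots.txt file, and SomeOtherBot and the /share/example URL are made up. Python's parser applies rules in file order, while RFC 9309 gives precedence to the most specific (longest) match, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real gemini.google.com/robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://gemini.google.com/share/example"  # made-up share URL

# can_fetch() answers "may this user agent crawl this URL?"
for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    print(f"{agent}: {parser.can_fetch(agent, url)}")
# Googlebot: False
# Bingbot: False
# SomeOtherBot: True
```

Note that can_fetch() only answers whether crawling is allowed. Nothing in robots.txt tells a search engine to keep a URL out of its index.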

I checked robots.txt on February 13th at 4:19am and found the following:

I then checked the Internet Archive to see how long the robots.txt file had been in place and found that it dates back to at least February 8th, when the Gemini app was announced.

Internet Archive screenshot

So the obvious explanation for why the chat pages were crawled is not the correct one.

The Gemini subdomain had a robots.txt file that blocked both Bing's and Google's crawlers, so how did those search engines end up indexing these pages?

Two ways the private chat pages could have been discovered and indexed

The most likely explanation is that a public link exists somewhere. A less likely possibility is that the URLs were discovered through browsing history linked to cookies.

It is likely that there is a public link.

I asked Bill Hartzer (@bhartzer) about this and he found a public link to one of the indexed pages.

This suggests that public links are the likely reason these Gemini chat pages were discovered and indexed.

Bill Hartzer made this observation:

“Even though the Gemini URL is blocked in the robots.txt file, it is indexed because a blog comment links to it.

This shows that Google will still index URLs that robots.txt blocks from being crawled.

If Google really wanted to make sure Gemini URLs weren't indexed, it would allow them to be crawled in the robots.txt file and add a noindex meta tag to the pages. Perhaps Google should follow its own advice here?”
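Hartzer's suggestion can be sketched as a tiny WSGI app in Python: robots.txt allows crawling, and each shared page carries both a robots meta tag with noindex and an X-Robots-Tag: noindex response header. The paths, page markup, and the app function are hypothetical; this is not how gemini.google.com is actually served. The design choice is the point: the crawler has to be allowed to fetch the page, because a noindex directive on a page it is blocked from fetching is never seen.

```python
# Hypothetical robots.txt: crawling is ALLOWED so crawlers can fetch
# shared pages and read their noindex directives.
ROBOTS_TXT = "User-agent: *\nAllow: /\n"

# Hypothetical markup for a shared chat page.
SHARE_PAGE = """<!doctype html>
<html><head>
<meta name="robots" content="noindex">
<title>Shared chat</title>
</head><body><!-- chat transcript --></body></html>"""


def app(environ, start_response):
    """Minimal WSGI app serving robots.txt and noindexed /share/ pages."""
    path = environ.get("PATH_INFO", "/")
    if path == "/robots.txt":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [ROBOTS_TXT.encode("utf-8")]
    if path.startswith("/share/"):
        # Header-level noindex as well as the meta tag inside the page.
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("X-Robots-Tag", "noindex"),
        ])
        return [SHARE_PAGE.encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() would serve it on port 8000.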

Why did the chat pages start disappearing from search results?

But if public links exist, why did Google start removing the chat pages altogether? Did Google create an internal rule telling its crawler to exclude pages in the /share/ folder from the search index, even when public links point to them?

Insights into how Bing and Google index content

Now comes the really interesting part for all the search geeks out there who are curious about how Google and Bing index content.

Bing's index handled the Gemini content differently from Google's. In the early morning hours of February 13th, Google still showed three search results, while Bing showed only one result from the subdomain. What got indexed, and how much of it, seemed somewhat random.

Why did the Gemini chat pages leak?

The known facts are:

- Google's robots.txt file on the Gemini subdomain dates back to at least February 8th.
- Google and Bing both indexed pages on the gemini.google.com subdomain.
- Both Google and Bing may have discovered links to the chats and indexed them as a result.
- The search engines indexed the content and then began dropping it, regardless of robots.txt.

So, back to the question of why these pages started disappearing from both Google's and Bing's search results. My guess is that the Gemini chat pages are low-quality pages that aren't worth showing for what is essentially a long-tail query (site:gemini.google.com/share/). There's really no useful reason to show these pages in search results.

Content blocked by robots.txt can still be discovered through links and end up in the search index, even though the crawler never fetches it. Such pages can even rank while they are considered useful, and drop out once they are not. I think that may be what happened here.
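To make the crawling-versus-indexing distinction concrete, here is a deliberately simplified model in Python. The UrlState fields and the can_end_up_indexed rule are hypothetical, written for illustration; they are not how Google or Bing document their internals, and they ignore ranking, quality signals, and removals entirely. What the model encodes is that robots.txt controls crawling, noindex controls indexing, and a crawler blocked from a page never gets to read its noindex.

```python
from dataclasses import dataclass


@dataclass
class UrlState:
    """Hypothetical, simplified description of a URL's situation."""
    disallowed_by_robots: bool  # robots.txt blocks crawling of this URL
    is_discoverable: bool       # e.g. a public link, like the blog comment Hartzer found
    page_has_noindex: bool      # the page itself carries a noindex directive


def can_end_up_indexed(state: UrlState) -> bool:
    """Toy rule: may this URL end up in the index? (assumption, not a documented algorithm)"""
    if not state.is_discoverable:
        # The search engine never learns the URL exists.
        return False
    if state.disallowed_by_robots:
        # The page is never fetched, so its noindex is never seen;
        # the bare URL can still be indexed from the link alone.
        return True
    # Crawling is allowed, so the crawler reads and honors noindex.
    return not state.page_has_noindex


blocked = UrlState(disallowed_by_robots=True, is_discoverable=True, page_has_noindex=True)
crawlable = UrlState(disallowed_by_robots=False, is_discoverable=True, page_has_noindex=True)
print("blocked by robots.txt, linked, noindex:", can_end_up_indexed(blocked))
print("crawlable, linked, noindex:", can_end_up_indexed(crawlable))
```

Under this model, Hartzer's suggestion corresponds to switching disallowed_by_robots to False while keeping noindex on the page, which is the only combination that keeps a linked URL out of the index.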

