Search engines other than Google can't show recent Reddit results

Recent discussions on Reddit are no longer appearing in search engine results other than Google because Reddit's content policy has been updated to prohibit crawling the site without agreeing to Reddit's rules, which prohibit using Reddit content for AI training without Reddit's explicit consent.

404 Media reports that searching for Reddit content in non-Google search engines like Bing, DuckDuckGo, and Mojeek returns few or no results for Reddit from the past week. Ars Technica ran searches in these and other search engines and confirmed the results. Brave, for example, sometimes returns some results for Reddit, but not as many as Google shows for the same query. One standout is Kagi, a paid engine that pays Google for a portion of its search index and still returns recent Reddit results.

As 404 Media noted, Reddit's Robots Exclusion Protocol file (robots.txt) blocks bots from scraping the site. The file also states that “Reddit believes in an open Internet, but not in the misuse of public content.” Reddit has approved scrapers from the Internet Archive and several research-focused organizations.
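For readers unfamiliar with how a robots.txt file turns crawlers away, the following Python sketch checks a URL against a set of rules using the standard library's urllib.robotparser module. The sample rules, the is_allowed helper, and the "ExampleBot" user agent are illustrative assumptions, not a copy of Reddit's actual file or of any search engine's crawler code. Note that robots.txt is advisory: it only works when a crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents that disallow every crawler.
# This is an illustrative stand-in, not Reddit's actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as an iterable of lines,
    # so no network request is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.reddit.com/r/technology/"
    for agent in ("bingbot", "ExampleBot"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

A crawler that respects the standard would run a check like this before each request and skip any URL the rules disallow.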

Reddit announced the changes to its robots.txt file on June 25. Prior to the change, the company said it had “seen a rise in overtly commercial entities scraping Reddit and claiming they are not bound by our terms and policies. Even worse, they hide behind robots.txt and claim they can use Reddit content for any purpose they like.”

Last month, Reddit said any “person of goodwill” could contact it to work with the company, linking to an online form. But Mojeek CEO Colin Hayhurst said in an email that he reached out to Reddit after being blocked and that Reddit “did not respond to many messages and emails.” He noted that Reddit CEO Steve Huffman has reached out since 404 Media's report.

Google Search Stranglehold Strengthened

Because Google is, at least for now, effectively the only search engine that can surface Reddit's most recent results, Reddit has unintentionally helped tighten Google's grip on the search industry. The change comes amid recent quality concerns about Google's results, where SEO and AI spam farms, ads, and e-commerce links have been ranked higher than more relevant results. There are also concerns about Google's AI profile.

Asked for comment, Reddit spokesman Tim Rathschmidt said in an email that the company is in discussions with “multiple search engines.”

“We were unable to reach agreements with all of the companies because some were unable or unwilling to make enforceable commitments regarding the use of Reddit content, including for use in AI,” Rathschmidt said.

After Reddit declared war on making its content available for free to train AI (a fight that saw the company raise the price of API access and led to the shutdown of many third-party Reddit apps), it reportedly signed a deal to allow Google to use Reddit data for AI training for $60 million per year. It was expected that Reddit would try to strike a similar deal with Microsoft, but it appears the two companies were unable to come to an agreement that was in line with Reddit's content policies, which also include rules on user privacy and removed content.

A Microsoft spokesperson said:

“Microsoft respects the robots.txt standard and respects instructions from websites that do not want content on their pages to be used by our generative AI models. Bing stopped crawling Reddit after Reddit implemented an updated robots.txt file on July 1 that prevents the site from being crawled altogether.”

In October, The Washington Post, citing anonymous sources, reported that Reddit was considering blocking Bing's search crawler if it couldn't reach a deal with Microsoft.

As 404 Media noted, Reddit's guide to accessing its data lists “search or website ads” as a paid commercial use. It's unclear how much it would cost other search engines to be allowed to scrape the platform. Rathschmidt said Reddit is “open to working with partners, big and small.”

“It's bad for the health of the internet when commercial companies can collect our content without restriction and use it for other purposes, including [training] an AI model,” he said.

For now, Google can continue to rely on Reddit to make its search results more relevant. Google did not respond to Ars' request for comment.

Meanwhile, alternative search engines may find it harder to compete.

“Our proprietary ranking algorithms have meant that users often find different pages on Reddit than they could find on Google or other sites,” Mojeek's Hayhurst said.

The CEO added that being blocked by Reddit “isn't a big deal,” but he was concerned about the precedent it sets: “Search engines are the primary source of traffic for most websites, and if this behavior becomes widespread, traffic will decrease even more, and smaller sites will be affected even more than larger sites,” he said.

Advance Publications, which owns Ars Technica's parent company Condé Nast, is Reddit's largest shareholder.

This article has been updated with additional comment from Microsoft.




