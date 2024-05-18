



Google has revealed details of two new crawlers optimized for scraping image and video content for “research and development” purposes. Although the documentation does not say so explicitly, we believe that rankings will not be affected if a publisher decides to block the new crawlers.

Note that the data scraped by these crawlers is not explicitly intended as AI training data. That is what Google-Extended is for.

GoogleOther Crawlers

The two new crawlers are versions of Google's GoogleOther crawler, which was launched in April 2023. The original GoogleOther crawler was also designated for use by Google product teams for research and development, in what Google describes as one-time crawls, and its description offers a clue as to what the new GoogleOther variants will be used for.

The purpose of the original GoogleOther crawler is officially described as follows:

“GoogleOther is a general-purpose crawler that can be used by various product teams to retrieve publicly accessible content from a site. For example, it might be used for a one-time crawl for internal research and development.”

Two GoogleOther Variants

There are two new GoogleOther crawlers.

GoogleOther-Image

GoogleOther-Video

The new variants are for crawling binary data, meaning data that is not text. HTML is generally treated as a text file, that is, an ASCII or Unicode file. If a file can be read in a text viewer, it is a text (ASCII or Unicode) file. Binary files are files that cannot be opened in a text viewer app, such as images, audio, and video.
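The text/binary distinction above can be illustrated with a rough Python sketch. It is only a heuristic under stated assumptions: the function name `looks_like_text` is hypothetical, and "text" here is taken to mean "contains no NUL byte and decodes as UTF-8", which is a simplification and not how any crawler is documented to classify content.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic (an assumption, not a standard): text has no NUL bytes
    and decodes cleanly as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A small HTML document encoded as UTF-8 (text).
html_bytes = "<html><body>Hello</body></html>".encode("utf-8")

# The eight-byte signature that starts every PNG file (binary).
png_signature = b"\x89PNG\r\n\x1a\n"

print(looks_like_text(html_bytes))
print(looks_like_text(png_signature))
```

The HTML bytes pass the check, while the PNG signature fails because its first byte is not a valid start of a UTF-8 sequence.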

The new GoogleOther variants are for image and video content. Google lists user agent tokens for both new crawlers, which can be used in robots.txt to block them.

1. GoogleOther-Image

User agent token:

GoogleOther-Image

GoogleOther

Complete user agent string:

GoogleOther-Image/1.0

2. GoogleOther-Video

User agent token:

GoogleOther-Video

GoogleOther

Complete user agent string:

GoogleOther-Video/1.0
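As a rough illustration of how these tokens might be used for blocking, the Python sketch below parses a sample robots.txt with the standard library's `urllib.robotparser` and checks which user agents it would turn away. The robots.txt contents, the `build_parser` helper, and the `example.com` URL are assumptions made for the example. Python's parser also has its own matching rules, which may differ from how Google's crawlers interpret the file, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only the two new crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GoogleOther-Image
Disallow: /

User-agent: GoogleOther-Video
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/media/photo.jpg"  # placeholder URL

for token in ("GoogleOther-Image", "GoogleOther-Video", "GoogleOther", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(token, url) else "blocked"
    print(f"{token}: {verdict}")
```

With this sample file, Python's parser reports the two new tokens as blocked, while GoogleOther and Googlebot fall through to the catch-all group and stay allowed.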

Newly Updated GoogleOther User Agent Strings

Google has also updated the user agent strings for the regular GoogleOther crawler. For blocking purposes, you can continue to use the same user agent token as before (GoogleOther). The new user agent strings are simply the data sent to servers to identify the crawler in full, in particular the technology it uses. In this case the technology is Chrome, and the version number is updated periodically to reflect the version in use (in the strings below, W.X.Y.Z is a placeholder for the Chrome version number).

Complete list of GoogleOther user agent strings:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; GoogleOther)

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/W.X.Y.Z Safari/537.36

GoogleOther Family of Bots

These new bots may appear in your server logs from time to time. This information can help you identify them as genuine Google crawlers, and it is useful for publishers who want to opt out of having their images and videos scraped for research and development purposes.
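To spot these crawlers in server logs, one option is to match the user agent field against the GoogleOther tokens. The following Python sketch shows one possible approach. The function name `classify_googleother`, the regular expression, and the sample strings are assumptions for illustration. In particular, the Chrome version `1.2.3.4` and the Firefox string are made up.

```python
import re
from typing import Optional

# Matches "GoogleOther", optionally followed by "-Image" or "-Video".
_GOOGLEOTHER = re.compile(r"\bGoogleOther(?:-(Image|Video))?\b")


def classify_googleother(user_agent: str) -> Optional[str]:
    """Return the GoogleOther token a user agent string claims, or None."""
    match = _GOOGLEOTHER.search(user_agent)
    if match is None:
        return None
    variant = match.group(1)
    return f"GoogleOther-{variant}" if variant else "GoogleOther"


samples = [
    "GoogleOther-Image/1.0",
    "GoogleOther-Video/1.0",
    # 1.2.3.4 is a made-up stand-in for the Chrome version number.
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) "
    "Chrome/1.2.3.4 Safari/537.36",
    # Hypothetical non-Google browser string, included as a negative case.
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

for ua in samples:
    print(classify_googleother(ua), "<-", ua)
```

Any client can send an arbitrary user agent string, so a match only shows what a request claims to be. It does not prove that the request came from Google.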

Read the updated Google crawler documentation:

GoogleOther-Image

GoogleOther-Video

Featured image by Shutterstock/ColorMaker

