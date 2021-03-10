



Google uses predictive techniques to detect duplicate content based on URL patterns. This can cause pages to be misidentified as duplicates.

To prevent unnecessary crawls and indexing, Google will try to predict when a page may contain similar or duplicate content based on the URL.

If Google crawls pages with similar URL patterns and detects that they contain the same content, it may determine that all other pages with that URL pattern also have the same content.

Unfortunately for site owners, this means pages with unique content can be written off as duplicates because they share a URL pattern with pages that really are duplicates. Those pages are then left out of Google’s index.

This topic came up in the Google Search Central SEO hangout recorded on March 5. Site owner Ruchit Patel asked Mueller about an event website where thousands of URLs are not being indexed correctly.

One of Mueller’s theories for why that is happening involves the predictive method used to detect duplicate content.

Google’s John Mueller on Duplicate Content Prediction

Google has multiple levels of determining when a web page contains duplicate content.

One is to look directly at the content of the page, and the other is to predict whether a page is a duplicate based on its URL.

“What tends to happen on our side is that we have multiple levels of trying to understand when there is duplicate content on a site. One is when we look at the page’s content directly and see that this page has this content, this page has different content, and we should treat them as separate pages.

The other is a broader predictive approach where we look at the URL structure of a website. We see that in the past, when we looked at URLs that look like this, they had the same content as URLs that look like that. We then essentially learn that pattern and say that URLs that look like this are the same as URLs that look like that.”

Mueller further explains that the reason Google does this is to save resources on crawling and indexing.

When Google decides that a page is a duplicate version of another page because it has a similar URL, it will not even crawl that page to see what the content actually looks like.

“Even without looking at the individual URLs, we can sometimes save ourselves some crawling and indexing and just focus on these assumed or very likely duplication cases. I have seen that happen with things like cities.

Automobiles are another case where we have seen that happen. Essentially, our systems recognize that what you specify as a city name is not so relevant for the actual URLs. And we usually learn that kind of pattern when a site provides a lot of the same content under alternate names.”

Mueller then describes how Google’s predictive method of detecting duplicate content can affect event websites.

“So with an event site, and I don’t know if this is the case for your website, it could happen that you take one city and a city that is maybe 1 km away, and the event pages you show there are exactly the same because the same events are relevant for both places.

And you take a city maybe 5 km away and you show exactly the same events again. From our side, that could easily end up in a situation where we say: we checked 10 event URLs, and this parameter that looks like a city name is actually irrelevant, because all 10 of them showed the same content.

And that’s where our systems can then say, well, maybe the city name overall is irrelevant and we can just ignore it.”
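The reasoning Mueller describes can be illustrated with a toy heuristic. The sketch below is not Google's system; it only mimics the logic of "we checked 10 URLs and they all showed the same content." The function names, the `city` query parameter, the `example.com` URLs, and the threshold of 10 samples are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlparse


def mask_parameter(url: str, param: str) -> str:
    """Return the URL with the value of one query parameter replaced by '*'."""
    parts = urlparse(url)
    query = [(k, "*" if k == param else v) for k, v in parse_qsl(parts.query)]
    return parts._replace(query=urlencode(query, safe="*")).geturl()


def parameter_looks_irrelevant(pages, param, min_samples=10):
    """Hypothetical heuristic: for each URL pattern, report True when at least
    `min_samples` crawled pages differ only in `param` yet share identical content.

    `pages` maps each crawled URL to its page body.
    """
    hashes_by_pattern = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        hashes_by_pattern[mask_parameter(url, param)].append(digest)
    return {
        pattern: len(digests) >= min_samples and len(set(digests)) == 1
        for pattern, digests in hashes_by_pattern.items()
    }


# Ten city URLs that all return the same event listing.
crawled = {
    f"https://example.com/events?city=city{i}": "same event listing"
    for i in range(10)
}
verdict = parameter_looks_irrelevant(crawled, "city")
```

Once a pattern is flagged this way, a crawler following the same logic would skip other URLs matching it, which is how a page with unique content could be missed. Adding a single page with different content to `crawled` flips the verdict for that pattern to `False`.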

What can site owners do to fix this issue?

As a potential solution to this problem, Mueller suggests looking for real-world cases of duplicate content and limiting it as much as possible.

“So what I would try to do in a case like this is to see if you have these kinds of situations where you have strong overlaps of content, and to try to find ways to limit that as much as possible.

That could be by using something like a rel canonical on the page and saying, well, this small city is right outside the big city, so I’ll set the canonical to the big city because it shows exactly the same content.

That way, for every URL that we crawl and index on your website, we can see that this URL and its content are unique, and that it is important for us to keep all of these URLs indexed.

Or we see clear information that this URL is supposed to be the same as this other one. You may have set up a redirect or a rel canonical there, and we can just focus on those main URLs and still understand that the city aspect is critical for your individual pages.”
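As a rough illustration of that advice, the sketch below groups a site's own pages by identical content and produces a rel canonical tag for each one, pointing duplicates at a preferred "main" URL. It is only a sketch under stated assumptions: the function name `canonical_tags`, the `example.com` URLs, and exact-match hashing as the definition of "same content" are all hypothetical, and a real site would likely need a looser similarity check.

```python
import hashlib
from collections import defaultdict
from html import escape


def canonical_tags(pages, preferred):
    """Map each URL to a <link rel="canonical"> tag.

    `pages` maps URL -> page body. Pages with identical bodies form one group.
    Each group points at the first URL from `preferred` that it contains,
    or at its first-seen URL when none is listed. Unique pages point at themselves.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)

    tags = {}
    for urls in groups.values():
        target = next((u for u in preferred if u in urls), urls[0])
        for url in urls:
            tags[url] = f'<link rel="canonical" href="{escape(target, quote=True)}">'
    return tags


site = {
    "https://example.com/events/smalltown": "Events: concert, market",
    "https://example.com/events/bigcity": "Events: concert, market",
    "https://example.com/events/faraway": "Events: regatta",
}
tags = canonical_tags(site, preferred=["https://example.com/events/bigcity"])
```

Here the small-town page is canonicalized to the big-city page because both show the same events, while the page with unique content keeps a self-referencing canonical.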

Mueller hasn’t addressed this aspect of the issue, but it’s worth noting that there are no penalties or negative ranking signals associated with duplicate content.

At most, Google will not index the duplicate content, but that does not reflect negatively on the site as a whole.


