Date: 2022-11-24



Google recently changed something in its deduplication system, and as I see it, the change was a mistake.

In October, we began to see a spike across multiple client websites in the number of pages reported as “Duplicate, Google chose different canonical than user.”

I know it’s not because these websites made drastic changes: they are Onely’s clients. It must be a change on Google’s side.

Why do I say this is not a positive change? See for yourself:

Unfortunately, we are unable to share specific pages from these websites. But just by looking at the URL paths, you can tell that Google made a mistake. The pages are not the same:

I wouldn’t mind a bug if it affected just one page. But look at this chart:

21 million pages were incorrectly marked as duplicates last month, even though there were no major structural changes on the client’s side. These pages are not indexed and do not drive traffic to the site. Google’s mistake is costing this business millions of dollars.

At first I thought this was related to Google’s October spam update. That update went live on October 19th and coincided with a spike in the number of pages reported as “Duplicate, Google chose different canonical than user.” But then I found other websites that were affected in a similar way a few weeks before the update rolled out. Here are two examples:

Worst of all, most of the time there is absolutely no logic in how Google chooses the canonical! It is fairly common for Google to pick a different canonical among pages that really are duplicates.

In these cases, however, Google chooses product A as the canonical page for product B. It’s as if Google chose a Samsung Galaxy S20 product page as the preferred canonical for a JBL speaker product page! Again, zero logic.

On one of these other websites, Google canonicalized the women’s clothing category to the men’s clothing category.

It’s unclear which algorithms Google uses to detect duplicate content, but many duplicate-detection algorithms rely on shared phrases. For example, both men’s and women’s category pages mention t-shirts, sportswear, and jeans. But this doesn’t mean they are duplicates. Far from it.
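To illustrate how phrase-based matching can go wrong, here is a minimal sketch of one common textbook technique: word shingling combined with Jaccard similarity. This is not Google’s algorithm, which is not public. The page copy, the function names, and the shingle size are all assumptions made up for the example.

```python
import re


def shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared items divided by all distinct items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical category-page copy, invented for illustration only.
mens = "Shop men's t-shirts, sportswear and jeans. Free delivery on all orders."
womens = "Shop women's t-shirts, sportswear and jeans. Free delivery on all orders."

similarity = jaccard(shingles(mens), shingles(womens))
print(f"Shingle similarity: {similarity:.2f}")
```

The two snippets differ by a single word, so most of their word pairs are shared and the score comes out high, even though the pages serve different audiences. A detector that relied on a signal like this alone could plausibly confuse the two categories.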

What does this mean?

Google has had similar issues in the past, so this one may eventually be resolved. But so far there’s no indication that Google is aware of the problem or working on a solution.

For now you have to remember:

Google may have decided that URLs on your site are duplicates and deindexed them, even if they aren’t. Check the Page indexing report in Google Search Console to see how many URLs are reported as “Duplicate, Google chose different canonical than user” or “Duplicate without user-selected canonical.” See if there has been a recent spike.
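If you note down the duplicate counts from the report day by day, a spike can be flagged with a simple rule of thumb. The sketch below compares each day with the median of the preceding week. The counts are invented, and the window and threshold are arbitrary assumptions, not values recommended by Google.

```python
from statistics import median


def find_spikes(counts: list[int], window: int = 7, factor: float = 2.0) -> list[int]:
    """Return the indexes of days whose count exceeds `factor` times
    the median of the preceding `window` days."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = median(counts[i - window:i])
        if baseline > 0 and counts[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical daily counts of URLs reported as duplicates (invented numbers).
daily_duplicates = [1000, 1020, 990, 1010, 1005, 1015, 1000, 1030, 2600, 2700]

spike_days = find_spikes(daily_duplicates)
print(spike_days)  # indexes of the days flagged as spikes
```

Using the median rather than the mean keeps one unusual day from distorting the baseline for the days that follow it.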

Next steps are:

1. Go through the pages that matter most to your business.
2. Find examples of Google choosing the wrong canonical.
3. Add unique content to those pages to signal to Google that they have changed.
4. Request indexing in Google Search Console.

Know when Google deindexes your pages

We understand that Google may deindex some pages to free up space for higher quality documents.

I have also noticed that Google deindexes many pages during core updates. Onely’s Ziemek Bućko wrote an article about what happened after one of the recent updates.

My solution for this is to use ZipTie.

ZipTie can proactively monitor whether existing pages are indexed over time.

You can easily see which URLs have been deindexed by Google.

You can then use Google Search Console to find out why Google deindexed them. Without ZipTie, you risk being surprised by a steady decline in traffic as your pages are deindexed without your knowledge.
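The monitoring idea described above comes down to comparing snapshots of which URLs are indexed on different dates. The following is a minimal sketch of that comparison, not ZipTie’s implementation. It assumes you already have, from some source, a set of indexed URLs per date; the function name, the dates, and the example.com URLs are hypothetical.

```python
def first_missing(snapshots: dict[str, set[str]]) -> dict[str, str]:
    """Map each URL that dropped out of the index to the first snapshot
    date on which it was missing.

    `snapshots` maps ISO date strings to the set of URLs found indexed
    on that date.
    """
    dropped: dict[str, str] = {}
    dates = sorted(snapshots)  # ISO dates sort chronologically as strings
    for prev, curr in zip(dates, dates[1:]):
        for url in snapshots[prev] - snapshots[curr]:
            # Keep the earliest drop date if a URL disappears more than once.
            dropped.setdefault(url, curr)
    return dropped


# Hypothetical weekly snapshots (invented URLs and dates).
snapshots = {
    "2022-11-01": {"https://example.com/a", "https://example.com/b", "https://example.com/c"},
    "2022-11-08": {"https://example.com/a", "https://example.com/c"},
    "2022-11-15": {"https://example.com/a"},
}

deindexed = first_missing(snapshots)
for url, date in sorted(deindexed.items()):
    print(f"{url} first seen missing on {date}")
```

Knowing the date a URL dropped out makes it easier to line the loss up with a Google update or a change on your own site.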

Sources 1/ https://Google.com/ 2/ https://ziptie.dev/google-duplicate-detection-algorithm-is-broken/

