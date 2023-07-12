



Google’s John Mueller answered whether removing pages from a large site would help solve the problem of pages that Google has discovered but not crawled. He also offered general guidance on how to resolve the issue.

Discovered – Currently Not Indexed

Search Console is a service provided by Google for communicating search-related issues and feedback.

Indexing status is an important part of Search Console because it tells publishers which parts of their site are indexed and considered for ranking.

You can check the indexing status of your web pages in the Page Indexing report in Search Console.

A report that a page has been discovered by Google but not indexed often indicates a problem that needs to be addressed.

There are multiple reasons why Google may discover a page but decline to index it, although Google’s official documentation lists only one.

“Discovered – currently not indexed. The page was found by Google, but not crawled yet.

Typically, Google wanted to crawl the URL but this was expected to overload the site; therefore Google rescheduled the crawl.

This is why the last crawl date is empty on the report.”

Google’s John Mueller offered further explanation of why pages are discovered but not indexed.

Can de-indexing non-indexed pages improve indexing across your site?

The idea is that removing certain pages will result in fewer pages to crawl, making it easier for Google to crawl the rest of your site.

A common belief is that Google allots every site a limited crawl capacity (crawl budget).

Googlers have repeatedly stated that there is no such thing as a crawl budget in the way that SEOs perceive it.

Google weighs many factors when deciding how many pages to crawl, including the capacity of the website’s server to handle heavy crawling.

An underlying reason why Google is selective about how much it crawls is that it does not have enough capacity to store every web page on the internet.

As a result, Google tends to index pages that have some value (if the server can handle the crawling) and to leave out pages that do not.

For more information on crawl budgets, see Google Shares Crawl Budget Insights.

The question asked was:

“Would de-indexing and aggregating 8 million used products into 2 million unique indexable product pages help improve crawlability and indexability (the Discovered – currently not indexed problem)?”

Google’s John Mueller first acknowledged that it was impossible to address the person’s specific problem, and then offered general recommendations.

He replied:

“It’s impossible to say.

I’d recommend reviewing the large site’s guide to crawl budget in our documentation.

For large sites, crawling more is sometimes limited by how your website can handle more crawling.

In most cases, though, it’s more about overall website quality.

Are you significantly improving the overall quality of your website by going from 8 million pages to 2 million pages?

Unless you focus on improving the actual quality, it’s easy to just spend a lot of time reducing the number of indexable pages without actually making the website better, and that wouldn’t improve things for search.”

Mueller gives two reasons for the discovered but not indexed problem

Google’s John Mueller gives two reasons why Google discovers a page but declines to index it.

Server capacity

Overall website quality

1. Server capacity

Mueller said Google’s ability to crawl and index web pages may be “limited by whether the website can handle more crawls.”

As a website gets bigger, it takes more bot activity to crawl it. To further complicate matters, Google is not the only bot crawling large sites.

Other legitimate bots, such as those from Microsoft and Apple, are also trying to crawl the site. In addition, there are many other bots, some legitimate and others related to hacking and data scraping.

This means that on a large site, especially in the evening, thousands of bots could be using the server’s resources to crawl it.

So one of the first questions I ask publishers having indexing issues is the state of their servers.

Websites with millions or even hundreds of thousands of pages generally require a dedicated server or cloud host (because cloud servers provide scalable resources such as bandwidth, CPU, and RAM).

In your hosting environment, you may need to allocate more memory to a process, for example by raising PHP’s memory limit, so that the server can handle high traffic and avoid 500 error responses.

Server troubleshooting includes analyzing server error logs.
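As a starting point for that kind of log analysis, the following is a minimal sketch rather than a complete tool. It assumes the server writes access logs in the common "combined" format used by Apache and nginx, and it groups requests by a few user agent substrings that are chosen here for illustration. The sample lines are made up. The sketch counts how many requests each group of bots made and how many of those ended in a 5xx server error.

```python
import re
from collections import Counter

# Assumes the "combined" access log format (Apache/nginx default style).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative substrings for grouping user agents; extend as needed.
BOT_MARKERS = ("Googlebot", "bingbot", "Applebot")


def classify_agent(agent):
    """Return the first matching bot marker, or "other"."""
    lowered = agent.lower()
    for marker in BOT_MARKERS:
        if marker.lower() in lowered:
            return marker
    return "other"


def summarize(lines):
    """Return (requests per agent group, 5xx responses per agent group)."""
    requests = Counter()
    server_errors = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        group = classify_agent(match.group("agent"))
        requests[group] += 1
        if match.group("status").startswith("5"):
            server_errors[group] += 1
    return requests, server_errors


# Made-up sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [12/Jul/2023:21:15:02 +0000] "GET /p/1 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [12/Jul/2023:21:15:03 +0000] "GET /p/2 HTTP/1.1" 500 312 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '157.55.39.1 - - [12/Jul/2023:21:15:04 +0000] "GET /p/3 HTTP/1.1" 503 0 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '203.0.113.9 - - [12/Jul/2023:21:15:05 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

requests, server_errors = summarize(sample)
for group in requests:
    print(group, requests[group], "requests,", server_errors[group], "server errors")
```

User agent strings can be spoofed, so a count like this is only a rough indication of which crawlers are hitting the server when errors occur. A high share of 5xx responses during heavy crawling suggests the server capacity problem described above.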

2. Overall website quality

This is an interesting reason why not enough pages get indexed. Overall site quality is like a score or judgment that Google assigns to a website.

Any part of the website can affect the quality of the entire site

John Mueller says that website sections can influence quality decisions for the site as a whole.

Mueller said:

“…for some things, we look at the quality of the site overall.

And when we look at the quality of the site overall, if significant portions are lower quality, it doesn’t matter to us why they are lower quality.

…if we see that significant parts are lower quality, we might think that overall this website is not as fantastic as we thought.”

Definition of site quality

Here’s how Google’s John Mueller defines site quality in another Office Hours video:

“Content quality is not just the text of an article.

It’s actually the quality of the website as a whole.

It includes everything from layout to design.

For example, how information is presented on the page, how images are integrated, how you handle speed, all of these factors come into play there.”

How long it takes to determine the quality of the entire site

Another point about site quality is how long Google takes to determine it. In some cases it may take several months.

Mueller said:

“It takes a lot of time to understand how a website fits in with the rest of the internet.

…and that can easily take months, six months, or even more than six months…”

Site optimization for crawling and indexing

Optimizing an entire site or a section of a site is a high-level way to look at the problem. In practice, it often comes down to optimizing individual pages at scale.

Especially for e-commerce sites with thousands of products, optimization can take many forms.

Points to note:

Main menu

Make sure your main menu is optimized to direct users to the important sections of your site that interest most users. The main menu can also link to the most popular pages.

Linking to popular sections and pages

The most popular pages and sections can also be linked to from a prominent section on the home page.

This gives users access to the pages and sections that are most important to them, but it also tells Google that these pages are important pages that should be indexed.
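One way to check whether important pages actually receive internal links is to count inbound links per page. The sketch below assumes you already have a map of each page to the internal URLs it links to, for example from a site crawl. The URLs and the `link_graph` structure are hypothetical, and nothing here reflects how Google itself weighs links.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count how many distinct pages link to each page.

    link_graph maps a page URL to the list of internal URLs it links to.
    Self-links and repeated links from the same page are counted once or
    not at all, so the result is "number of other pages linking here".
    """
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Hypothetical site structure, for illustration only.
link_graph = {
    "/": ["/category/mugs", "/category/kettles", "/p/best-seller"],
    "/category/mugs": ["/", "/p/best-seller", "/p/red-mug"],
    "/category/kettles": ["/", "/p/kettle"],
    "/p/best-seller": ["/"],
    "/p/red-mug": ["/"],
    "/p/kettle": ["/"],
    "/p/forgotten-product": ["/"],
}

counts = inbound_link_counts(link_graph)
orphans = sorted(page for page, n in counts.items() if n == 0)
print("Pages with no inbound internal links:", orphans)
```

Pages that matter to users but show few or no inbound links are candidates for a link from the main menu or a prominent home page section.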

Improving thin content pages

Thin content is basically pages that have little useful content or are mostly duplicates of other pages (template content).

Simply filling a page with words is not enough. Words and sentences should have meaning and relevance to site visitors.

For products, include dimensions, weight, available colors, suggestions for other products to pair with it, brands that pair best with the product, links to manuals, FAQs, ratings, and other information that users may find valuable.
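To find candidates for this kind of improvement at scale, a rough first pass can flag pages that have very little text or that are nearly identical to another page. The sketch below is only a heuristic under stated assumptions: the page texts, the `min_words` threshold, and the `max_similarity` threshold are all hypothetical, and neither word count nor text overlap is Google’s definition of quality. It compares pages by the overlap of their three-word sequences (Jaccard similarity).

```python
from itertools import combinations


def word_shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 0))}


def jaccard(a, b):
    """Share of shingles two pages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_thin_pages(pages, min_words=40, max_similarity=0.8):
    """Return {url: [reasons]} for pages that look thin or duplicated."""
    flags = {url: [] for url in pages}
    for url, text in pages.items():
        if len(text.split()) < min_words:
            flags[url].append("few words")
    # Pairwise comparison: fine for a sample, too slow for millions of pages.
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        if jaccard(word_shingles(text_a), word_shingles(text_b)) >= max_similarity:
            flags[url_a].append("near-duplicate of " + url_b)
            flags[url_b].append("near-duplicate of " + url_a)
    return {url: reasons for url, reasons in flags.items() if reasons}


# Hypothetical page texts, for illustration only.
pages = {
    "/p/red-mug": "Ceramic mug. Ships in two days. Free returns within thirty days.",
    "/p/blue-mug": "Ceramic mug. Ships in two days. Free returns within thirty days.",
    "/p/kettle": (
        "Stainless steel kettle with a 1.7 litre capacity, weighing 900 grams, "
        "available in black and silver. Pairs well with the ceramic mug range. "
        "Includes a manual, a two year warranty, and answers to common "
        "descaling questions."
    ),
}

flagged = flag_thin_pages(pages, min_words=20)
for url, reasons in flagged.items():
    print(url, reasons)
```

A flagged page is only a prompt for human review. As noted above, padding a page with words does not fix it; the goal is to add information that visitors actually need.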

Solve crawled and unindexed issues and grow your online sales

In physical stores, it feels like just putting products on the shelves is enough.

But the reality is that it often takes a knowledgeable salesperson to get those items off the shelf.

A web page can act as a knowledgeable salesperson, telling Google why the page should be indexed and helping customers make product choices.

Watch the Google SEO Office Hours segment at the 13:41 mark.

Featured image by Shutterstock/Rembolle

