Date: 2024-04-05



Gary Illyes, a search team engineer at Google, detailed in a recent video how search engines evaluate the quality of web pages when indexing.

This information is timely, as Google is steadily raising the bar for “high quality” content.

Quality: A key factor in indexing and crawl frequency

Illyes described the indexing stage, which includes analysis of a page's text content, tags, attributes, images, and videos.

At this stage, Google also calculates various signals that help determine the quality of the page and therefore its ranking in search results.

Illyes explains:

“The final step in indexing is deciding whether to include the page in Google's index. This process, called index selection, is highly dependent on the quality of the page and previously collected signals.”

This detail is especially important for publishers and SEO professionals who are struggling to get their content indexed.

From a technical point of view, everything may be done correctly. However, if a page does not meet certain quality thresholds, it will not be indexed.

Additionally, Google has previously confirmed that high-quality content is crawled more frequently, which is important for staying competitive in search results.

One of Google's goals this year is to conserve crawl resources by prioritizing pages that are “deserving” of crawling, underscoring the urgency of meeting Google's quality standards.

Handling signals and duplicate content

Illyes touched on how Google analyzes signals.

Some signals are simple, such as a rel="canonical" annotation, while others are more complex, such as a page's importance on the internet.
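As an illustration of the simpler kind of signal, the sketch below reads the rel="canonical" annotation out of a page's HTML using only Python's standard library. The class and function names are hypothetical and chosen for this example; it shows how a site owner might inspect their own pages, not how Google processes the annotation.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects href values of <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before comparing.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return every canonical URL declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals
```

A page that returns more than one URL here is declaring conflicting canonicals, which is worth cleaning up before looking at anything more subtle.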

Google also employs “duplicate clustering,” where similar pages are grouped together and a single canonical version is selected to represent the content in search results. The canonical version is determined by comparing the quality signals collected for each duplicate page.
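The idea of grouping duplicates and choosing one representative can be sketched in a few lines of Python. This is a deliberately simplified model and rests on two assumptions that do not come from the video: pages count as duplicates only when their text matches exactly after normalizing case and whitespace, and each page carries a single hypothetical numeric “quality” score. Google has not published how its own clustering or scoring works.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash of the text with case and whitespace normalized."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_canonicals(pages):
    """pages: list of dicts with 'url', 'text', and a hypothetical 'quality' score.

    Returns {canonical_url: [urls of the other pages in the same cluster]}.
    """
    clusters = defaultdict(list)
    for page in pages:
        clusters[fingerprint(page["text"])].append(page)

    result = {}
    for members in clusters.values():
        # The highest-scoring member stands in for the whole cluster.
        best = max(members, key=lambda p: p["quality"])
        result[best["url"]] = [p["url"] for p in members if p is not best]
    return result
```

The practical takeaway matches the article: when several URLs carry the same content, the one with the strongest signals is the one that should be expected to appear in results.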


Additional insights about indexing

Illyes shared the following notable details along with his insights on quality assessment:

HTML parsing and semantic issues: Illyes explained how Google parses the HTML of a web page and fixes any semantic issues it encounters. Using unsupported tags inside the <head> element can cause indexing issues.

Identifying the main content: Illyes said that Google focuses on the “main content or centerpiece of the page” when analyzing it. This suggests that optimizing the core content of a web page is more important than incremental technical changes.

Index storage: Illyes revealed that Google's search database is spread across thousands of computers. This is interesting context regarding the scale of Google's infrastructure.
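The point about unsupported tags in <head> lends itself to a quick self-check. The sketch below flags any element that opens inside <head> and is not one of the metadata elements the HTML standard allows there. The allowed set and all names are assumptions made for this example; the video does not list which tags Google treats as unsupported. The check is also naive: it flags elements nested inside a <noscript> or <template> in the head as well.

```python
from html.parser import HTMLParser

# Metadata elements the HTML standard permits inside <head>.
ALLOWED_IN_HEAD = {
    "title", "meta", "link", "script", "style", "base", "noscript", "template",
}


class HeadChecker(HTMLParser):
    """Records start tags seen between <head> and </head> that are not allowed."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.unexpected = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in ALLOWED_IN_HEAD:
            self.unexpected.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def unexpected_head_tags(html):
    """Return the names of disallowed elements found inside <head>, in order."""
    checker = HeadChecker()
    checker.feed(html)
    checker.close()
    return checker.unexpected
```

Running this over page templates may catch stray elements such as a <div> or <img> injected into the head by a plugin or tag manager.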


Why SEJ is paying attention

Google continues to prioritize high-quality content in its indexing and ranking process, so SEO professionals need to be aware of how Google measures quality.

SEO experts who know the factors that influence indexing, such as relevance, quality, and signal calculation, will have a better idea of what to aim for in order to meet Google's indexing thresholds.

How this helps

To ensure your content meets Google's quality standards, consider these actionable steps.

Focus on creating comprehensive content that addresses your audience's needs and pain points.

Identify current search demand trends and align your content with those topics.

Make sure your content is well-structured and easy to navigate.

Implement schema markup and other structured data to help Google better understand context.

Review and update your content regularly to keep it relevant and valuable.
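For the structured data step, one common approach is a schema.org description embedded as JSON-LD. The sketch below builds such a snippet for an article. The function name and the sample values are placeholders for illustration, and the three properties shown are a minimal selection rather than a complete or required set.

```python
import json


def article_jsonld(headline, date_published, author_name):
    """Build a schema.org Article description wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "author": {"@type": "Person", "name": author_name},
    }
    # Escape "</" so a value can never close the surrounding script element;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values, not taken from any real page.
print(article_jsonld("Example headline", "2024-04-05", "Jane Doe"))
```

The resulting snippet would be placed in the page's HTML, and it is worth validating the output with a structured data testing tool before relying on it.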

By prioritizing quality, relevance, and search demand, you may be able to increase the number of pages that are indexed and how often they are crawled.

FAQ

What does Google's “index selection” process involve?

The index selection process is the final step in Google's indexing and determines whether a page should be included in the search index.

This decision is based on page quality and various signals collected during the initial evaluation.

If your pages don't meet the quality standards set by Google, they run the risk of not being indexed. Therefore, it is important to focus on producing high-quality content to ensure visibility in Google's search engine.

How does Google handle duplicate content, and what role do quality signals play in this process?

Google handles duplicate content through a process called “duplicate clustering,” which groups similar pages together. A canonical version is then selected to represent the group in search results.

The canonical version is selected based on quality signals associated with each duplicate page. These signals may include attributes such as the appropriate use of the rel="canonical" tag, or more complex factors such as the perceived importance of the page on the internet.

Ultimately, the canonical version chosen reflects Google's assessment of which page is most likely to provide the best value to users.

Featured image: YouTube.com/GoogleSearchCentral, April 2024.

