Google’s Gary Illyes discussed the concept of “centerpiece content,” how Google identifies it, and why soft 404s are a critical error that gets in the way of indexing content. The discussion took place at the recent Google Search Central Deep Dive event in Asia, as summarized by Kenichi Suzuki.
Main Body Content
According to Gary Illyes, Google goes to great lengths to identify the main content of a web page. The phrase “main content” will be familiar to those who have read Google’s Search Quality Rater Guidelines. The concept of “main content” is first introduced in Part 1 of the guidelines, in a section that teaches how to identify main content, which is followed by a description of main content quality.
The quality guidelines define main content (aka MC) as:
“Main Content is any part of the page that directly helps the page achieve its purpose. MC can be text, images, videos, page features (e.g., calculators, games), and it can be content created by website users, such as videos, reviews, articles, comments posted by users, etc. Tabs on some pages lead to even more information (e.g., customer reviews) and can sometimes be considered part of the MC.
The MC also includes the title at the top of the page (example). Descriptive MC titles allow users to make informed decisions about what pages to visit. Helpful titles summarize the MC on the page.”
Google’s Illyes referred to main content as the centerpiece content, saying that it is used for “ranking and retrieval.” The content in this section of a web page has greater weight than the content in the footer, header, and navigation areas (including sidebar navigation).
Suzuki summarized what Illyes said:
“Google’s systems heavily prioritize the “main content” (which he also calls the “centerpiece”) of a page for ranking and retrieval. Words and phrases located in this area carry significantly more weight than those in headers, footers, or navigation sidebars. To rank for important terms, you must ensure they are featured prominently within the main body of your page.”
Content Location Analysis To Identify Main Content
This part of Illyes’ presentation is important to get right. Gary Illyes said that Google analyzes the rendered web page to locate the content so that it can assign the appropriate amount of weight to the words located in the main content.
This isn’t about identifying the position of keywords in the page. It’s just about identifying the content within a web page.
Here’s what Suzuki transcribed:
“Google performs positional analysis on the rendered page to understand where content is located. It then uses this data to assign an importance score to the words (tokens) on the page. Moving a term from a low-importance area (like a sidebar) to the main content area will directly increase its weight and potential to rank.”
Insight: Semantic HTML is an excellent way to help Google identify the main content and the less important areas. Semantic HTML makes web pages less ambiguous because it uses HTML elements to identify the different areas of a web page, like the top header section, navigational areas, footers, and even to identify advertising and navigational elements that may be embedded within the main content area. This technical SEO process of making a web page less ambiguous is called disambiguation.
Related:
- Google Answers If Semantic HTML Element Has An Impact
- What Semantic HTML Is And Why It’s Good For SEO
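To make the idea of location-based weighting concrete, here is a minimal sketch in Python. It is not Google's method: Suzuki's summary only says that words in the main content carry more weight than words in headers, footers, and sidebars. The weights, the `RegionWeigher` class, and the `weigh_tokens` function are all hypothetical, and the sketch relies on semantic HTML elements to tell the regions apart.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights chosen for illustration only; Google has not published values.
REGION_WEIGHTS = {"main": 1.0, "header": 0.2, "nav": 0.1, "aside": 0.1, "footer": 0.1}
DEFAULT_WEIGHT = 0.5  # text that sits outside any recognized semantic element


class RegionWeigher(HTMLParser):
    """Accumulates a weight per token based on the semantic region it appears in."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open semantic regions, innermost last
        self.scores = Counter()  # token -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag in REGION_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Use the innermost open region, or the default when there is none.
        weight = REGION_WEIGHTS[self.stack[-1]] if self.stack else DEFAULT_WEIGHT
        for token in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[token] += weight


def weigh_tokens(html):
    """Return a Counter mapping each token to its location-weighted score."""
    parser = RegionWeigher()
    parser.feed(html)
    parser.close()
    return parser.scores
```

With this toy scoring, a term that appears only in `<nav>` or `<footer>` ends up with a much lower score than the same term inside `<main>`, which mirrors the point about moving a term from a sidebar into the main content.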
Tokenization Is The Foundation Of Google’s Index
Because of the prevalence of AI technologies today, many SEOs are aware of the concept of tokenization. Google also uses tokenization to convert words and phrases into a machine-readable format for indexing. What gets stored in Google’s index isn’t the original HTML; it’s the tokenized representation of the content.
See also: Introduction To LLMs For SEO With Examples
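As a rough illustration of storing tokens instead of HTML, the sketch below strips markup, splits the text into lowercase tokens, and builds a tiny inverted index. Google's actual tokenizer and index format are not described in the source, so `tokenize`, `build_index`, and the regular expression used here are assumptions made purely for the example. The sketch also does not filter out script or style content.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and discards the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def tokenize(html):
    """Return lowercase word tokens from the visible text of an HTML string."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts)
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for token in tokenize(html):
            index[token].add(url)
    return index
```

Note that the index holds only tokens and URLs. The original markup is gone once a page has been tokenized, which is the point the section makes about what gets stored.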
Soft 404s Are A Critical Error
This part is important because it frames soft 404s as a critical error. Soft 404s are pages that should return a 404 response but instead return a 200 OK response. This can happen when an SEO or publisher redirects a missing web page to the home page in order to conserve their PageRank. Sometimes a missing web page will redirect to an error page that returns a 200 OK response, which is also incorrect.
Many SEOs mistakenly believe that the 404 response code is an error that needs fixing. A 404 needs fixing only when the URL itself is broken, meaning it was supposed to point to a different URL that is live with actual content.
But in the case of a URL for a web page that is gone and is likely never returning because it has not been replaced by other content, a 404 response is the correct one. If the content has been replaced or superseded by another web page, then it’s proper in that case to redirect the old URL to the URL where the replacement content exists.
The point of all this is that, to Google, a soft 404 is a critical error. That means that SEOs who try to fix a non-error event like a 404 response by redirecting the URL to the home page are actually creating a critical error by doing so.
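The rule described above can be written as a small decision function. This is a sketch under stated assumptions: the article only says "redirect," so the choice of a 301 status is an assumption here, and `HOME_PAGE` and `response_for_missing_url` are hypothetical names.

```python
from typing import Optional, Tuple

HOME_PAGE = "/"  # hypothetical path for the site's home page


def response_for_missing_url(replacement_url: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for a URL whose page was removed."""
    # The home page is not a real replacement for the missing content;
    # the article describes that redirect pattern as a cause of soft 404s.
    if replacement_url and replacement_url != HOME_PAGE:
        return 301, replacement_url  # assumption: a permanent redirect
    # Content is gone and nothing supersedes it, so a plain 404 is correct.
    return 404, None
```

For example, `response_for_missing_url("/new-guide")` yields a redirect to the replacement page, while passing `None` or the home page yields a 404.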
Suzuki noted what Illyes said:
“A page that returns a 200 OK status code but displays an error message or has very thin/empty main content is considered a “soft 404.” Google actively identifies and de-prioritizes these pages as they waste crawl budget and provide a poor user experience. Illyes shared that for years, Google’s own documentation page about soft 404s was flagged as a soft 404 by its own systems and couldn’t be indexed.”
Related: Google Warns Of Soft 404 Errors And Their Impact On SEO
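For auditing your own pages, a rough heuristic based on the description above might look like the following. It is not how Google detects soft 404s. The phrase list, the word threshold, and the function name `looks_like_soft_404` are all assumptions chosen for illustration.

```python
import re

# Hypothetical signals; real error pages vary widely in wording.
ERROR_PHRASES = ("page not found", "no longer available", "does not exist")
MIN_MAIN_CONTENT_WORDS = 50  # hypothetical threshold for "very thin" content


def looks_like_soft_404(status_code: int, main_text: str) -> bool:
    """Flag a 200 OK page whose main content is thin or reads like an error page."""
    if status_code != 200:
        return False  # a real 404 or other non-200 response is not a soft 404
    lowered = main_text.lower()
    words = re.findall(r"\w+", lowered)
    if len(words) < MIN_MAIN_CONTENT_WORDS:
        return True
    return any(phrase in lowered for phrase in ERROR_PHRASES)
```

A naive phrase check like this will also flag a long page that merely discusses "page not found" errors, which is worth keeping in mind given the anecdote about Google's own soft 404 documentation.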
Takeaways
- Main Content
Google gives priority to the main content portion of a given web page. Although Gary Illyes didn’t mention it, it may be helpful to use semantic HTML to clearly outline what parts of the page are the main content and which parts are not.
- Google Tokenizes Content For Indexing
Google’s use of tokenization enables semantic understanding of queries and content. The importance for SEO is that Google no longer relies heavily on exact-match keywords, which frees publishers and SEOs to focus on writing about topics (not keywords) from the point of view of how they are helpful to users.
- Soft 404s Are A Critical Error
Soft 404s are commonly thought of as something to avoid, but they’re not generally understood as a critical error that can negatively impact the crawl budget. This elevates the importance of avoiding soft 404s.