Chapter 2:  Best practices for auditing the HTML that Googlebot crawls today

Each time an end user requests a web page, inputs from multiple sources are brought together to create the experience that a modern web browser displays as a complete web page. In the second or so it takes to construct the final rendering, hundreds of tasks may be executed in a very intentional sequence. This blog post is part of an ongoing series about SEO. To read all the blogs in this series, click here.

These tasks are executed in two main places:

  • Server-side – the web servers where the web page is stored
  • Client-side – the end user’s web browser and the device on which it is being used

Web developers determine the sequence in which tasks are executed, and decide which are executed server-side and which client-side. The tasks executed client-side typically include loading images, videos, interactive elements, advertisements, reviews, recommendations, and other services managed by vendors.

From 1995 (the beginning of SEO time) until recently, search engines read and indexed only server-side content and code, which prevented them from connecting end users with a very significant percentage of the Internet’s content.

In October 2014, Google began indexing complete web pages, including both server-side and client-side content. Bazaarvoice data suggests that Google is now crawling and indexing complete web pages more than 80% of the time.

While this change is significant, Google has not provided any specifics about how the updated technologies work.  Many SEO tools have not yet been updated – even some tools owned and managed by Google.  The SEO industry is learning that historical SEO audit techniques are no longer adequate. View Source, once the go-to functionality for auditing HTML, is no longer sufficient.
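To make the gap concrete, here is a minimal Python sketch. Both HTML strings are invented for illustration: `server_html` stands in for what View Source shows, and `rendered_html` for what the browser holds after a hypothetical reviews script has run. The `visible_text` helper is our own name, not part of any SEO tool; it relies only on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page's text content as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical pages (assumptions for illustration, not real markup).
server_html = """<html><body>
<h1>Acme Blender</h1>
<div id="reviews"></div>
<script src="/reviews.js"></script>
</body></html>"""

rendered_html = """<html><body>
<h1>Acme Blender</h1>
<div id="reviews"><p>Great blender, five stars.</p></div>
<script src="/reviews.js"></script>
</body></html>"""

phrase = "five stars"
print("in View Source HTML:", phrase in visible_text(server_html))
print("in rendered HTML:   ", phrase in visible_text(rendered_html))
```

In this made-up case the review text exists only in the rendered version, which is exactly the content an audit based on View Source would miss.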

Today, auditors should use the Inspect Element functionality within a web browser like Safari or Chrome to see the complete web page that Google now crawls. At Bazaarvoice, Safari has become our preferred browser for SEO audits, because its developer tools, including Inspect Element, are always built in. To enable this functionality in Safari, open Safari preferences, click Advanced, then select Show Develop menu in menu bar.

[Image: Safari preferences, Advanced tab, with the Show Develop menu in menu bar option]

Using Inspect Element: Important steps to ensure success:

Using Inspect Element is similar to View Source, but a few additional steps are required.

Step 1:  Load the page. If already on the page, reload the page by clicking the reload icon.  It’s very important that you do not scroll or click on anything in the page. Such activities may cause content to be added that is not part of the initial page build.

Step 2:  After the page is reloaded, right-click on an area of whitespace in the page.  From the menu, click Inspect Element.

Step 3:  The Elements tab should be displayed with some HTML code visible.  If another tab is selected, select the Elements tab.

Step 4:  In the HTML code displayed, find the opening <HTML… tag.  Right-click on the <HTML… and then select Copy as HTML.  The HTML that Google reads is now on your clipboard.

Step 5:  Properly indent the HTML, but don’t use a tool that moves things around.  You may have functionality in your favorite text or HTML editor to do this.  If not, we’ve found a free tool called Tabifier (http://tools.arantius.com/tabifier) to work quite well.

Step 6: Paste your properly-indented HTML into your favorite text editor.
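If you would rather script the indentation in Step 5 than use an online tool, the following is a rough Python sketch of an indenter that keeps every tag, text node, and comment in its original order. It is not how Tabifier works; the `Indenter` class and `indent_html` function are our own illustrative names. It assumes every non-void element has an explicit closing tag, which is normally true of HTML copied from Inspect Element. It also trims whitespace inside text, so do not rely on it for <pre> content.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not deepen the indent.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Indenter(HTMLParser):
    """Re-emits tags, text, and comments one per line, in source order."""

    def __init__(self, indent="  "):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.indent = indent
        self.depth = 0
        self.lines = []
        self._text = []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def flush_text(self):
        text, self._text = "".join(self._text), []
        for line in text.splitlines():
            if line.strip():
                self._emit(line.strip())

    def handle_starttag(self, tag, attrs):
        self.flush_text()
        self._emit(self.get_starttag_text())  # original tag text, attributes untouched
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        self.flush_text()
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.flush_text()
        if tag not in VOID_TAGS:
            self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._text.append(data)

    def handle_entityref(self, name):
        self._text.append(f"&{name};")

    def handle_charref(self, name):
        self._text.append(f"&#{name};")

    def handle_comment(self, data):
        self.flush_text()
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.flush_text()
        self._emit(f"<!{decl}>")


def indent_html(html):
    parser = Indenter()
    parser.feed(html)
    parser.close()
    parser.flush_text()
    return "\n".join(parser.lines)


# Made-up sample input standing in for the HTML on your clipboard.
sample = '<div><img src="a.png"><p>Tom &amp; Jerry</p><!-- note --></div>'
print(indent_html(sample))
```

Because the sketch only adds line breaks and leading spaces, the order of elements and the text of each start tag stay as the browser produced them, which is the property Step 5 asks for.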

Now that you have a cleaned-up version of your HTML in your favorite text editor, you can audit your code as you would have previously.  The way Google reacts to this version of your HTML is consistent with how the search engine reacted to the server-side version, prior to October 2014.

When using Google’s Structured Data Testing Tool (https://developers.google.com/structured-data/testing-tool/), paste this version of your HTML into the tool.  Do not use the Fetch URL feature.  Fetch URL retrieves the server-side, View Source version of HTML; it has not yet been upgraded to retrieve HTML using methods that are consistent with Googlebot’s new functionality.

After completing the audit using this version of your HTML, also go back and audit using server-side, View Source HTML.  Bing, Yahoo, and other search engines still use the server-side HTML more than 95% of the time, and Google may still get the server-side version on occasion.  It’s wise to make sure you understand the difference between the versions, and especially to make sure that your structured data, like schema.org markup, validates in both versions.
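One way to spot differences between the two versions quickly is to list the schema.org microdata attributes found in each and compare the sets. The Python sketch below does only that: it does not validate anything, and it ignores other structured data formats such as JSON-LD. The function names and both HTML snippets are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class MicrodataScanner(HTMLParser):
    """Records every itemtype and itemprop value seen on any tag."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name in ("itemtype", "itemprop"):
            # Both attributes may hold several space-separated values.
            for value in (attrs.get(name) or "").split():
                self.found.add((name, value))


def microdata(html):
    scanner = MicrodataScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.found


def compare(server_html, rendered_html):
    """Group microdata by the version(s) of the page it appears in."""
    server, rendered = microdata(server_html), microdata(rendered_html)
    return {
        "only_in_server": sorted(server - rendered),
        "only_in_rendered": sorted(rendered - server),
        "in_both": sorted(server & rendered),
    }


# Hypothetical View Source version: product markup only.
server_html = (
    '<div itemscope itemtype="http://schema.org/Product">'
    '<span itemprop="name">Acme Blender</span></div>'
)

# Hypothetical Inspect Element version: a script has added rating markup.
rendered_html = (
    '<div itemscope itemtype="http://schema.org/Product">'
    '<span itemprop="name">Acme Blender</span>'
    '<div itemprop="aggregateRating" itemscope '
    'itemtype="http://schema.org/AggregateRating">'
    '<span itemprop="ratingValue">4.5</span></div></div>'
)

for group, items in compare(server_html, rendered_html).items():
    print(group, items)
```

Anything listed under only_in_rendered is markup that search engines reading the server-side HTML will not see, so it deserves a closer look in both versions with a proper validator.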

To learn more about Bazaarvoice solutions to drive traffic through SEO, visit the Spotlights pages of our website here.

  • http://www.elite-strategies.com/blog Patrick Coombe

    “Such activities may cause content to be added that is not part of the initial page build”

    really good point. If you are new, do it both ways. Normally Google will capture the ‘initial page build’ but these days with the DOM / JS that Google can read, who knows.

  • http://www.WTF.com/ FCN

    I noticed the jpeg above of the Safari “inspect preferences” has stop plug-ins ticked off.
    There are so many pages that demand you do NOT block flash and like products or you do not see the full value of the page. As a consumer this p*sses me off - forcing me to stop a running w/sound clip so many times. Adobe seems to be at war with Mozilla as well. Either that or don’t block anything and be forced to deal with scripts and everything being loaded. As an IT person - I prefer a minimalist view and do not desire multiple plug-ins to run while I am just trying to SHOP