What is Googlebot?

Googlebot is the web crawler software that Google uses to discover, browse, and collect information about webpages. Google then uses this information to build and update its search index.

Google uses many crawlers to crawl the web. However, the term ‘Googlebot’ is specific to two of those web crawlers:

  • Googlebot Desktop 
  • Googlebot Smartphone

As their names imply, Googlebot Desktop imitates a person visiting the site with a desktop browser, while Googlebot Smartphone imitates a person visiting the site on a smartphone browser.

However, both crawlers identify as Googlebot and use the ‘Googlebot’ user agent token. Google will crawl your site with both crawlers, though you will likely receive more visits from Googlebot Smartphone. 

Google also has other crawlers that could crawl a site using the ‘Googlebot’ user agent token. These crawlers are:

  • Googlebot Video — Used for crawling videos
  • Googlebot Image — Used for crawling images
  • Googlebot News — Used for crawling news articles for Google News
  • Google-InspectionTool — Used by the Rich Results Test tool and the URL inspection tool in Google Search Console

How Googlebot Works

The Googlebot crawling process can be divided into two distinct stages: URL discovery and fetching.

1 URL Discovery

URL discovery is the process by which Googlebot finds new and existing URLs online. Google discovers URLs from three sources:

  • URLs it has previously visited
  • URLs on the pages it previously visited
  • URLs in the sitemap you submitted to Google

Googlebot typically crawls the web using a list of URLs generated from previous crawls. While crawling a page, it notes the URLs on it and adds them to its list of known pages. 

Googlebot may also discover new URLs using the sitemap you submitted through the Google Search Console. However, Googlebot can find your pages without the sitemap if there is a link pointing to them from another page it has discovered.
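To make the discovery step concrete, here is a minimal Python sketch of the idea described above: a list of known URLs that grows with the links found on a crawled page and with the URLs in a sitemap. It illustrates the concept only and is not how Googlebot is actually implemented. The names `LinkExtractor` and `discover_urls` and the example.com URLs are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the crawled page.
                self.links.append(urljoin(self.base_url, value))


def discover_urls(known_urls, page_url, page_html, sitemap_urls=()):
    """Return the known URLs plus links found on one page and in a sitemap."""
    extractor = LinkExtractor(page_url)
    extractor.feed(page_html)
    return set(known_urls) | set(extractor.links) | set(sitemap_urls)


# Hypothetical page content and sitemap entries, for illustration only.
sample_html = '<a href="/about">About</a> <a href="https://example.com/blog/">Blog</a>'
known = discover_urls(
    known_urls={"https://example.com/"},
    page_url="https://example.com/",
    page_html=sample_html,
    sitemap_urls=["https://example.com/contact"],
)
print(sorted(known))
```

Note that the /contact page is only discovered through the sitemap here, while /about and /blog/ are discovered through links.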

2 Fetching

Googlebot does not crawl every URL it discovers. When it does decide to crawl a URL that it is seeing for the first time, it goes ahead and crawls it.

If it has crawled the page before, it reviews several signals to determine whether the page has been modified since its last visit. If the page has been modified, Googlebot may crawl it again.
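The decision described above can be sketched in a few lines of Python. This article does not name the signals Googlebot reviews, so the sketch uses a hypothetical last-modified timestamp as a stand-in. The function name `should_crawl` and both dictionaries are assumptions made for illustration.

```python
from datetime import datetime


def should_crawl(url, last_crawled, last_modified):
    """Decide whether a URL is worth fetching.

    last_crawled:  dict mapping URL -> datetime of the previous crawl.
    last_modified: dict mapping URL -> datetime the page last changed
                   (a hypothetical stand-in for the real signals).
    """
    if url not in last_crawled:
        return True  # first time seeing this URL
    changed_at = last_modified.get(url)
    # Crawl again only if the page changed after the previous visit.
    return changed_at is not None and changed_at > last_crawled[url]


crawled = {"https://example.com/": datetime(2024, 1, 10)}
modified = {"https://example.com/": datetime(2024, 2, 1)}
print(should_crawl("https://example.com/", crawled, modified))
print(should_crawl("https://example.com/new-page", crawled, modified))
```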

Google renders the page during the crawl. That is, it processes the page’s HTML and CSS and runs its JavaScript files. That way, it can see the page the same way a human visitor sees it.

Without rendering the page, some JavaScript files on the site would not run, and Googlebot would not be able to see every piece of content on the page.

The process of visiting the site and retrieving its data is called fetching. Googlebot uses algorithms to determine which URLs to crawl, how often to crawl them, and how many pages to fetch from a site.

After fetching the page, Googlebot then sends it to the indexing systems for indexing.  

Note that Googlebot only crawls the first 15 megabytes (MB) of each HTML, CSS, and JavaScript file on a page. If a file is larger than 15 MB, Googlebot stops crawling it and sends only the first 15 MB for indexing.
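As a rough illustration of the size limit, the Python sketch below checks whether a file would be cut off and returns the part that falls within the limit. It assumes 1 MB means 1,048,576 bytes, which this article does not specify. The function names are made up for this example.

```python
# Assumption: 15 MB is interpreted as 15 * 1024 * 1024 bytes.
GOOGLEBOT_FILE_LIMIT = 15 * 1024 * 1024


def exceeds_limit(content: bytes, limit: int = GOOGLEBOT_FILE_LIMIT) -> bool:
    """Report whether part of the file would fall outside the limit."""
    return len(content) > limit


def indexable_portion(content: bytes, limit: int = GOOGLEBOT_FILE_LIMIT) -> bytes:
    """Return the leading part of the file that fits within the limit."""
    return content[:limit]


page = b"<html><body>Hello</body></html>"
print(exceeds_limit(page))
print(len(indexable_portion(page)))
```

You could run a check like this over your largest HTML, CSS, and JavaScript files to see whether any content near the end of a file is at risk of being ignored.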

List of Googlebot User Agents

A Googlebot user agent is the information that allows a server to identify the Googlebot crawling the site. It contains two pieces of information, namely:

  • User agent token
  • User agent string

The user agent token allows you to identify the Googlebot crawling your site. You can use it to write crawl rules specific to that Googlebot. 

The user agent string, on the other hand, further identifies the Googlebot and provides more information about its browser, version, operating system, and machine. 

Below are the user agent tokens and user agent strings of the crawlers that identify as Googlebot.

1 Googlebot Smartphone

The user agent token of Googlebot Smartphone is Googlebot. The user agent string of Googlebot Smartphone is listed below.

Note: Googlebot uses the latest version of the Chrome browser to crawl your site. Since the browser is frequently updated, W.X.Y.Z is a placeholder for the version number of the Chrome browser used for the crawl.

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

2 Googlebot Desktop

The user agent token of Googlebot Desktop is Googlebot. Its user agent string is listed below. W.X.Y.Z is used as a placeholder for the version number of the Chrome browser used to fetch the page.

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36

While rare, Googlebot Desktop may also use any of the user agent strings below. 

Googlebot/2.1 (+http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

3 Googlebot News

The user agent tokens of Googlebot News are Googlebot and Googlebot-News. Its user agent string is the same as that of Googlebot Desktop.

4 Googlebot Image

The user agent tokens of Googlebot Image are Googlebot and Googlebot-Image. Googlebot will use one of the two user agent tokens when it crawls the images on your site.

The user agent string of Googlebot Image is listed below:

Googlebot-Image/1.0

5 Googlebot Video

The user agent tokens of Googlebot Video are Googlebot and Googlebot-Video. Its user agent string is listed below:

Googlebot-Video/1.0

6 Google-InspectionTool

The user agent tokens of the Google-InspectionTool are Googlebot and Google-InspectionTool. However, it uses different user agent strings for its mobile and desktop crawlers.

On mobile, Google-InspectionTool uses the user agent string listed below. W.X.Y.Z is a placeholder for the version number of the Chrome browser used to fetch the page.

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0;)

On desktops, the Google-InspectionTool uses the user agent string listed below:

Mozilla/5.0 (compatible; Google-InspectionTool/1.0;)
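If you want to spot these crawlers in your server logs, the user agent strings listed above can be matched with a simple substring check. The Python sketch below is one possible approach. The function name `identify_crawler` is made up for this example, and using the word "Mobile" to tell the smartphone crawler from the desktop one is a heuristic based only on the strings shown above. Googlebot News is left out because its string is the same as that of Googlebot Desktop.

```python
from typing import Optional

# More specific tokens come first, because "Googlebot-Image" and
# "Googlebot-Video" would otherwise also match the plain "Googlebot" token.
CRAWLER_TOKENS = [
    "Googlebot-Image",
    "Googlebot-Video",
    "Google-InspectionTool",
    "Googlebot",
]


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return a label for the Google crawler named in a user agent string."""
    for token in CRAWLER_TOKENS:
        if token in user_agent:
            if token == "Googlebot":
                # Heuristic: only the smartphone string contains "Mobile".
                if "Mobile" in user_agent:
                    return "Googlebot Smartphone"
                return "Googlebot Desktop"
            return token
    return None  # not one of the crawlers covered here


print(identify_crawler("Googlebot-Image/1.0"))
print(identify_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

Keep in mind that a matching user agent string only tells you what the visitor claims to be. It does not prove the request came from Google.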

How to Verify Googlebot

Some webmasters and developers create web crawlers that pretend to be Googlebot. These crawlers will use the Googlebot user agent token (Googlebot) when accessing your site. 

These crawlers use the Googlebot user agent token to bypass robots.txt rules that block other crawlers but allow Googlebot. Some also pretend to be Googlebot to avoid detection, scrutiny, and potential blocking from sites they crawl.

Fortunately, you can easily verify whether a bot is really Googlebot by comparing its IP address to the list of IP addresses used by Googlebot. If the so-called Googlebot's IP address does not appear on that list, then it is not Googlebot.

Speaking of IP addresses, Google often crawls sites from machines located close to the servers hosting the site. So, it is normal to receive visits from multiple Googlebot crawlers with different IP addresses.

However, Googlebot typically crawls sites from IP addresses based in the United States. If a site has blocked US-based IP addresses, Googlebot will attempt to crawl the site using an IP address in another country. 
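The IP comparison described above can be automated. The Python sketch below checks whether an address falls inside any range in a list. It assumes the list is available as a parsed JSON document with a "prefixes" array whose entries carry an "ipv4Prefix" or "ipv6Prefix" key, so adjust it if the list you use is shaped differently. The ranges in the example are reserved documentation ranges used as placeholders, not real Googlebot ranges.

```python
import ipaddress


def load_networks(ranges_document):
    """Turn a parsed ranges document into a list of network objects.

    Assumed shape: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}
    """
    networks = []
    for entry in ranges_document.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_listed_ip(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# Placeholder ranges reserved for documentation, NOT real Googlebot ranges.
sample_document = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}
networks = load_networks(sample_document)
print(is_listed_ip("192.0.2.10", networks))
print(is_listed_ip("198.51.100.7", networks))
```

In practice, you would load the current list each time you run the check rather than hardcode it, since the ranges can change.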

How to Block Googlebot From Crawling Your Pages

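You can block Googlebot from indexing a specific page on your site, and from following the links on it, by adding the robots meta tag below to the head tag of the HTML code of the page. Note that Googlebot still has to crawl the page to see this tag. If you want to stop Googlebot from crawling a page at all, use a Disallow rule in your robots.txt file instead.

<meta name="robots" content="noindex, nofollow">

Optionally, if you have Rank Math, you can refer to this article on setting your posts and pages to noindex.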

How to Reduce Your Google Crawl Rate

Googlebot typically visits a site once every few seconds, depending upon your site’s crawl budget. However, this may be too much for some sites. In this case, you may want to reduce Googlebot’s crawl rate, as excessive crawling could overload your server, causing it to slow down or crash. 

If you want to reduce the rate at which Googlebot crawls your site, you can:

  • Return a client or server error status code
  • Contact Google to reduce your crawl rate

1 Return a Client or Server Error Status Code

You can reduce your crawl rate by returning an HTTP 500, 503, or 429 status code from multiple URLs on your site.

The HTTP 500 and 503 status codes indicate a server error, while the 429 indicates a client error. Google will reduce the rate at which it crawls your site once it sees multiple pages return one of these status codes.

However, you should only use this as a temporary measure that lasts no longer than two days. Otherwise, Google may remove the affected pages from its index.
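How you return these status codes depends on your server or application. As one hedged illustration, the Python sketch below is a tiny WSGI application that answers with a 503 while a load check reports that the server is overloaded. The names `make_app` and `is_overloaded` are hypothetical, and the load check itself is left for you to supply.

```python
def make_app(is_overloaded):
    """Build a WSGI app that returns 503 while is_overloaded() is True."""

    def app(environ, start_response):
        if is_overloaded():
            # 503 tells the client the problem is temporary and server-side.
            start_response(
                "503 Service Unavailable",
                [("Content-Type", "text/plain")],
            )
            return [b"Temporarily unavailable\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello\n"]

    return app


# Hypothetical load check; replace it with a real measurement for your server.
app = make_app(lambda: False)

# To try it locally with the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Remember to switch the check off again once the load drops, so the error responses stay within the two-day window mentioned above.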

2 Contact Google to Reduce Your Crawl Rate

You can reduce your crawl rate by filing an overcrawling report with Google. Provide Google with some details about the Googlebot crawler, and Google will take it from there. 

Tips to Ensure Googlebot Can Crawl Your Site

If you run a site or publish content to the web, you will most likely want Googlebot to crawl your URLs. In that case, follow the tips below to ensure that Googlebot can always crawl your site and pages.

1 Don’t Disallow Googlebot in Your robots.txt File

You should ensure that your robots.txt file does not prevent Googlebot from crawling your site. So, review your robots.txt file and confirm it does not contain the rule below. If it does, remove it. 

User-agent: Googlebot
Disallow: /
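If you prefer to check this programmatically, Python's standard library includes a robots.txt parser. The sketch below tests whether a given set of rules lets the Googlebot user agent token fetch a URL. The function name `googlebot_allowed` is made up for this example, and the standard-library parser may not interpret every rule exactly as Google does, so treat the result as a quick sanity check.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str = "/") -> bool:
    """Return True if the rules let the Googlebot token fetch the URL."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


blocking_rules = """User-agent: Googlebot
Disallow: /
"""
print(googlebot_allowed(blocking_rules))
```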

2 Don’t Block Indexing Using a Robots Meta Tag

If Google is unable to index a page, review its HTML code to ensure you have not set it to noindex. If you have, you should remove the noindex directive so Google can index the page.
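Rather than reading the HTML by hand, you can scan it for a robots meta tag. The Python sketch below uses the standard-library HTML parser to look for a noindex directive. The names `RobotsMetaParser` and `has_noindex` are made up for this example, and it only inspects meta tags named robots in the HTML you pass in.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<head><meta name="robots" content="noindex, nofollow"></head>'
print(has_noindex(page))
```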

3 Use Internal Links on Your Site

Googlebot finds your URLs by following the links on pages it has previously crawled. So, use internal links on your site so that Googlebot can find the new and uncrawled pages on your site.
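One way to check your internal linking is to look for pages that no other page links to. The Python sketch below walks a map of internal links from the homepage and reports any pages it never reaches. The link map, the paths in it, and the function names are hypothetical. In practice, you would build the map from a crawl of your own site.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first walk over internal links, starting from one page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphan_pages(links, start):
    """Return pages in the link map that cannot be reached from start."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return all_pages - reachable_pages(links, start)


# Hypothetical site structure: page -> pages it links to.
site_links = {
    "/": ["/blog"],
    "/blog": ["/blog/post-1"],
    "/landing": [],  # nothing links here
}
print(orphan_pages(site_links, "/"))
```

Pages reported by a check like this rely on your sitemap or external links to be discovered, so they are good candidates for new internal links.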

4 Submit Your Sitemaps to Google

You should create a sitemap and submit it to Google using the Google Search Console.

While Googlebot can find your pages without a sitemap, submitting one may help it find them more quickly. You can refer to this guide on submitting your sitemaps to Google.
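If your site does not already generate a sitemap, the Python sketch below shows roughly what a minimal one looks like. It builds a urlset with one loc entry per URL, using the namespace from the sitemaps.org protocol. The function name `build_sitemap` and the example.com URLs are placeholders, and plugins such as Rank Math can generate a sitemap for you, so treat this purely as an illustration of the format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as & in the URL text.
        ET.SubElement(entry, "loc").text = url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["https://example.com/", "https://example.com/blog/"]))
```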
