What is Crawl Budget?
The crawl budget is the maximum number of webpages a search engine will crawl on a site within a given period.
Frequent crawls by search engine bots can degrade your server's performance or even overload it and take your site offline. To prevent this from happening, search engines estimate your server capacity and crawling requirements and assign a crawl budget to your site.
Crawl budget is not a ranking factor, and Google says most bloggers do not need to worry about their crawl budget. Google is probably crawling your posts normally if you have fewer than a thousand URLs or if Google crawls your content the same day it is published.
Crawl budget is typically only an issue for large sites with thousands of URLs and sites that autogenerate pages using URL parameters.
Factors That Affect the Crawl Budget
You do not have any control over the crawl budget Google assigns to your site. Instead, Google uses two factors to determine the crawl budget to assign to a site. They are:
- Crawl rate limit
- Crawl demand
1 Crawl Rate Limit
The crawl rate limit is the maximum number of pages that Googlebot will try to crawl on a site within a given timeframe. Google specifies a crawl rate limit because crawling uses up a site’s resources, and excessive crawling will hurt the user experience of other visitors.
The crawl rate limit is dependent on the server’s performance. Google can increase the crawl rate limit if the server responds quickly. Similarly, Google will reduce the crawl rate limit if the site responds slowly.
2 Crawl Demand
The crawl demand refers to how frequently a site needs to be crawled. Sites that publish large amounts of content will have a high crawl demand, while those that publish less content will have a lower crawl demand.
Google will also crawl popular webpages more often than other pages, so such webpages will have a high crawl demand even if the site does not publish a lot of content.
Google also wants to prevent its results from becoming stale, so it will crawl content more often if it determines that content is going stale. Crawl demand can also increase under certain circumstances, for example, when you move your content, create large-scale redirects, or change your URL structure.
How to Reduce Your Crawl Budget
In certain cases, you may want search engines to reduce the rate at which they crawl your site. In such cases, you can include a crawl delay rule in your robots.txt file. However, while this will reduce the rate at which search engines like Bing and Yahoo crawl your site, it has no effect on Google and Yandex.
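A crawl-delay rule might look like the snippet below; the 10-second value is illustrative, and as noted, Googlebot and Yandex ignore this directive:

```
# robots.txt — ask compliant crawlers to wait 10 seconds between requests
User-agent: *
Crawl-delay: 10
```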
In the case of Google, Google recommends that you temporarily return an HTTP 500 (Internal Server Error), 503 (Service Unavailable), or 429 (Too Many Requests) response code from multiple pages on your site.
You should only use this method to reduce your crawl budget for a few hours to two days at most. If you want to reduce your crawl budget for longer periods, Google recommends you file a request asking it to reduce the rate at which it crawls your site.
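As a minimal sketch of this temporary slowdown using Python's standard library, the handler below serves a 503 with a Retry-After header; the handler name and the one-hour retry value are illustrative assumptions, and in practice you would return these codes from your real server or CDN:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "back off" handler: while the site is overloaded, serve 503
# with a Retry-After header so crawlers temporarily slow their crawl rate.
class BackoffHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(503)                  # Service Unavailable
        self.send_header("Retry-After", "3600")  # suggest retrying in an hour
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Temporarily unavailable; please retry later.\n")

# To serve it (blocks until interrupted):
# HTTPServer(("", 8000), BackoffHandler).serve_forever()
```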
How to Increase Your Crawl Budget
You cannot instruct Google to increase your crawl budget. However, you can improve the signals that Google uses to determine your crawl rate limit. You can also manage your crawl demand and ensure that Google only crawls helpful content you want on search results pages.
1 Increase Your Server Capacity
Google assesses your crawl rate limit by observing your server performance. It reduces your crawl rate limit when your server returns too many errors and increases it when it returns fewer errors. So, improve your server capacity and ensure your site loads fast, is always accessible, and returns as few server errors as possible.
2 Prevent Google From Crawling Unhelpful Content
You should prevent Google from indexing your unimportant, unhelpful, and duplicate pages. You can do this by adding the noindex tag to such pages. You should also update your robots.txt files with rules that disallow Google from crawling those URLs.
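For instance, assuming a hypothetical /tag/ archive you want kept out of search results, the noindex tag on those pages would look like:

```html
<meta name="robots" content="noindex">
```

and the matching robots.txt rule would be:

```
User-agent: *
Disallow: /tag/
```

One caveat worth noting: Googlebot must be able to crawl a page to see its noindex tag, so if your goal is to remove a URL from the index, rely on noindex and do not also block that URL in robots.txt.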
You should also specify a canonical URL so that Google knows which page is the important one among a set of duplicate pages. You should also create an XML sitemap and populate it with your important pages. Do not include pages you do not want Google to crawl and index in the sitemap.
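As a sketch, a canonical tag on a duplicate page and a minimal sitemap entry look like this (example.com, the path, and the date are placeholders):

```html
<!-- On each duplicate page, point to the preferred version -->
<link rel="canonical" href="https://example.com/blue-widgets/">
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blue-widgets/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```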
3 Ensure Your Pages Are Accessible
Google uses up resources trying to crawl unavailable pages. The unavailability of these pages may also signal to Google that your site is not well maintained or does not require much crawling. This could cause Google to reduce your crawl demand and budget.
To prevent this, ensure your URLs do not return soft 404 or 404 (Not Found) errors. Your URLs should not have excessively long redirect chains either, as Google will stop following your redirects after too many hops.
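The redirect-chain problem can be illustrated with a small sketch. The function below follows a mapping of URL to redirect target so you can spot chains worth flattening into a single redirect; the sample URLs are hypothetical:

```python
# Minimal sketch: given a mapping from URL to its redirect target,
# measure chain length so long chains can be flattened to one hop.
def redirect_chain(redirects: dict[str, str], start: str) -> list[str]:
    """Follow redirects from `start`, returning the chain of URLs visited."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # redirect loop — stop following
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain

redirects = {
    "/old-post": "/blog/old-post",
    "/blog/old-post": "/blog/new-post",
}
print(redirect_chain(redirects, "/old-post"))
# → ['/old-post', '/blog/old-post', '/blog/new-post']
```

Any chain longer than two entries means a visitor (and Googlebot) is bouncing through an intermediate URL that could redirect straight to the final destination.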