Search engines use “crawlers” to discover pages on websites. These pages are then scored by hundreds of signals to determine if they should be “indexed” and where they should be “ranked” in the index for a given keyword.
Most technical SEO audits don’t analyze your website’s crawl, as off-the-shelf SEO tools don’t gather the proper data. This is a huge mistake, particularly for larger websites.
Mismanaging your website’s crawl can lead to:
This webinar will explore the importance of managing your website’s crawl for increased visibility in search engines.
“Recently, we’ve heard a number of definitions for ‘crawl budget’, however, we don’t have a single term that would describe everything that ‘crawl budget’ stands for externally.”
He goes on to break the budget itself into three components:
The number of simultaneous connections Googlebot may use and also the time it waits between each crawl.
If a site responds quickly to Googlebot, the bot will see the site as healthy and can increase the crawl rate. Conversely, if the site responds poorly, the crawl rate can go down.
If a site doesn’t have much content, or new content being added, or doesn’t have content linked to, the demand for Google to crawl that site reduces.
Hopefully you can see that there are some things here we can influence, and that some of the activity you are already doing has a bigger impact on your ability to rank than you thought.
We can use Google Search Console (or Bing Webmaster Tools) to get the data we need.
Just log in to Search Console > Domain > Crawl > Crawl Stats to see the number of pages crawled per day.
Google doesn’t give any guidance on what these reports mean, so let’s take a look at 3 client websites to give some context:
The website above is experiencing a steady, healthy crawl from Googlebot. The statistics to the right show the variance in daily crawls. These variances are normal; there’s generally a large swing between “high” and “low”.
The website above experienced a drastic drop in pages crawled after they misused a directive in their robots.txt file. The directive told search engines NOT to crawl a large number of pages on their website, causing a sudden drop in pages crawled. In this instance, this was a bad move, as the pages they blocked had value to searchers.
The website above is experiencing an increase in pages crawled per day due to an influx of inbound links from authority websites (causing Googlebot to visit the site more). You could also see an increase in pages crawled by passing equity to pages using internal links OR publishing more content on your website.
Any of the following factors could have a negative impact on your crawl budget.
For larger sites, this is a way to filter and display certain results. While great for users, it is not so great for Googlebot: it creates lots of duplicate content issues, sends Google the wrong message, and can mean Googlebot crawls pages we don’t want it to.
This can occur pretty easily on custom built eCommerce websites. For example:
The solution is to use canonicalization or dynamic URL parameters.
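To make the canonicalization idea concrete, here is a minimal sketch in Python. It assumes that parameters such as `color`, `size`, `sort` and `sessionid` only filter or track and never change the main content of the page; those parameter names and the example.com URLs are hypothetical, so swap in whatever your platform actually uses. The output is the kind of URL you might point a rel="canonical" tag at.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter/tracking parameters that do not change the page's main content.
IGNORED_PARAMS = {"color", "size", "sort", "sessionid"}


def canonical_url(url: str) -> str:
    """Return the URL with ignored parameters removed and the remaining ones sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Drop the fragment as well; it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Illustrative URLs only.
urls = [
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?sort=price&color=blue",
    "https://example.com/shoes?page=2&color=red",
]

for url in urls:
    print(url, "->", canonical_url(url))
```

The first two URLs collapse to the same canonical address, which is exactly the duplication a crawler would otherwise spend budget on. Whether pagination (`page`) deserves its own canonical URL is a judgment call for your site; the sketch simply keeps it.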
A soft error occurs when Google sees a 200 (OK) status returned for a page that doesn’t exist and should return a 404 error. Google doesn’t want to waste time crawling these pages, and over time this can negatively impact how often Google visits your site.
You can check for soft errors in the Crawl Errors report in Google Search Console.
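If you want to spot-check pages yourself, a rough heuristic is to look for responses that return 200 but read like an error page. The sketch below assumes you already have the status code and HTML body for each URL (from your own crawler or an export); the phrase list is hypothetical and will need tuning for your site's templates.

```python
# Hypothetical phrases that suggest an error page; tune these for your own templates.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "0 results")


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Flag responses that return 200 but whose body reads like an error page."""
    if status_code != 200:
        # A genuine 404 or 410 is the correct behavior, not a soft error.
        return False
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


print(looks_like_soft_404(200, "<h1>Page Not Found</h1>"))
print(looks_like_soft_404(404, "<h1>Page Not Found</h1>"))
```

This is only a first pass. A phrase match can produce false positives (for example, a blog post that discusses 404 pages), so treat the results as a list to review rather than a verdict.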
These happen when links go on and on. Google gives the example of a calendar with a “Next Month” link. Googlebot could theoretically follow those links forever.
Google wants to show users the best search results possible. Having a lot of low quality, thin or spammy content will cause Google to lose trust in your website.
The best way to audit your site’s crawl budget is to analyze your log files. Your server logs should be readily available and can normally be accessed via cPanel, Plesk or FTP.
Once you have downloaded your log files, you can use a tool such as Botify to understand where Googlebot is spending most of its time on your site.
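If you would rather take a quick look before reaching for a dedicated tool, a few lines of Python can give a rough picture. This sketch assumes your server writes the common "combined" access log format, and the sample lines are made up for illustration. It counts requests whose user agent mentions Googlebot, grouped by path and by status code.

```python
import re
from collections import Counter

# Matches the common "combined" access log format; your server's format may differ.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests with a Googlebot user agent, per path and per status code."""
    paths, statuses = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # The user agent can be spoofed; this sketch does not verify the requester.
        if match and "Googlebot" in match.group("agent"):
            paths[match.group("path")] += 1
            statuses[match.group("status")] += 1
    return paths, statuses


# Made-up sample lines for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Jan/2020:10:00:00 +0000] "GET /shoes?color=red HTTP/1.1" 200 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Jan/2020:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 128 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Jan/2020:10:00:09 +0000] "GET /shoes HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

paths, statuses = googlebot_hits(sample)
print(paths.most_common(10))
print(statuses)
```

On a real log file you would pass the open file object instead of `sample`. Paths with many hits but little value, and any cluster of error status codes, are the first places to look.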
You are looking to find out:
Once you have identified these pages and errors, you can start to look at increasing your crawl budget.
There are a number of things you can do to increase your crawl budget, covering both onsite and offsite elements. Make sure you check all of these methods.
The most common website errors are:
Redirect chains stop the flow of link equity around a site and dramatically reduce Google’s crawl rate.
In one of his webmaster help videos back in 2011, Matt Cutts recommended 1-2 hops as ideal and three if needed, but suggested that once you got into the 4-6 range, the success rate of Googlebot following those redirect hops was very low.
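To find chains worth fixing, you can count the hops for each redirecting URL. The sketch below assumes you have already exported your redirects as a simple source-to-target mapping (from a crawl or your server config); the paths are hypothetical, and the threshold of three hops simply follows the guidance quoted above.

```python
def redirect_chain(start, redirects, max_hops=10):
    """Follow a {source: target} redirect map and return the full list of URLs visited."""
    chain = [start]
    while chain[-1] in redirects and len(chain) - 1 < max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if chain.count(target) > 1:
            # The target was already visited: this is a redirect loop, so stop.
            break
    return chain


# Hypothetical redirect map for illustration.
redirects = {"/a": "/b", "/b": "/c", "/c": "/d", "/d": "/e"}

for source in redirects:
    chain = redirect_chain(source, redirects)
    hops = len(chain) - 1
    flag = "TOO LONG" if hops > 3 else "ok"
    print(f"{source}: {hops} hop(s), {flag}: {' -> '.join(chain)}")
```

The usual fix is to point every source URL straight at the final destination, so each redirect is a single hop.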
Make sure you are using your robots.txt file to block parts of your site that Google doesn’t need to spend its crawl budget on.
These could include:
Check out our full guide on using your Robots.txt file.
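Before you ship any robots.txt change, it is worth checking that you are not blocking pages that have value to searchers. Here is a minimal sketch using Python's built-in `urllib.robotparser`. The rules and URLs are hypothetical, and the standard library parser uses simple prefix matching, so it may not mirror exactly how Googlebot handles wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked_urls(urls, user_agent="Googlebot"):
    """Return the URLs that the parsed rules disallow for the given user agent."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs to check against the rules above.
urls_to_check = [
    "https://example.com/shoes",
    "https://example.com/cart/checkout",
    "https://example.com/search?q=boots",
]

for url in blocked_urls(urls_to_check):
    print("Blocked:", url)
```

Run a list of your most valuable pages through a check like this. If any of them show up as blocked, the directive is too broad.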
As Google touched on above, site speed is a factor in your crawl health – the faster your site, the better crawl health you will have and the more Google will crawl your site. You can see that Google has highlighted this as a major reason for reduced crawl budget.
Some of the major elements to investigate will be:
For those with eCommerce platforms that use dynamic URLs, Googlebot treats dynamic URLs that lead to the same page as separate pages. This means you may be wasting your crawl budget by having Google crawl the same page over and over because it thinks each URL is a different page.
Backlinks not only help to improve your organic rankings, they also push search crawlers to your site more often.
Based on our data (average of FTF client data)
Check out our list of link strategies that work in 2020.
While this is a part of technical SEO, it is something that gets overlooked by lots of SEOs. As you can now see, crawl budgets play a big part in making sure your site gets the crawl it needs.
If you don’t have the right amount of budget or are using it up sending Google to the wrong pages, your site will never drive you the traffic it should. As you can see from the above, this stuff works. And it can be a great way to unblock a site’s ability to drive organic search traffic.