Google has thousands of servers engineered to crawl the internet. When Googlebot lands on a website, it starts going through the pages one by one, but this does not mean it will crawl the entire website. This is because of what is known as the crawl budget.
Crawl budget is the number of links on a website that Googlebot crawls. How a crawl budget is allocated per website is based on several factors. Here are some crawl budget optimization tips courtesy of the Infinitypp experts:
1. Prevent infinite crawl on paginated data
If you are displaying paginated data, you should prevent the Google crawler from visiting pages that return no result with a 200 header. For example:
If the last page number of a search result is &page=20 and Google tries to crawl anything beyond page 20, your website should respond with a 404 status.
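As a minimal sketch of that rule, the function below decides which status a paginated listing should send. The names `page_status`, `total_items` and `per_page` are illustrative assumptions, not part of any particular framework; in a real site this check would live in whatever handler renders the listing.

```python
import math


def page_status(page: int, total_items: int, per_page: int = 10) -> int:
    """Return the HTTP status a paginated listing should send for `page`.

    Hypothetical helper: `total_items` is the number of results and
    `per_page` is how many results are shown on each page.
    """
    # An empty listing still has a valid (empty) first page.
    last_page = max(1, math.ceil(total_items / per_page))
    if page < 1 or page > last_page:
        return 404  # out of range: do not serve an empty page with a 200
    return 200


# 200 results at 10 per page means page 20 is the last valid page.
print(page_status(20, total_items=200))
print(page_status(21, total_items=200))
```

Returning 404 for out-of-range pages gives the crawler a clear signal to stop, instead of letting it follow &page=21, &page=22 and so on indefinitely.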
3. Reduce the number of URLs
As noted by Google, any URL on a page is counted in the crawl budget. When a crawler downloads the content of a page, it can find out how many URLs the page contains. These URLs count toward the website's crawl budget.
Here are a few tips to reduce the number of URLs:
- Find out the number of links you have on a single page using a link counter tool, then try to reduce it. The more links you remove, the more budget you save.
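If no link counter tool is at hand, a rough count can be sketched with Python's standard library. The class and function names below are illustrative assumptions; the sketch only counts `<a href="...">` links in an HTML string and does not fetch pages or resolve relative URLs.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def count_links(html: str) -> tuple[int, int]:
    """Return (total links, distinct links) found in `html`."""
    parser = LinkCounter()
    parser.feed(html)
    return len(parser.hrefs), len(set(parser.hrefs))


sample = '<a href="/a">A</a> <a href="/b">B</a> <a href="/a">A again</a>'
total, distinct = count_links(sample)
print(f"{total} links, {distinct} distinct")
```

Comparing the total with the distinct count is a quick way to spot pages that repeat the same link many times.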
4. Increase the website speed
When Google's crawlers notice that the response time is gradually getting worse after a certain number of crawls, they will reduce the number of crawls. This is to limit the impact on the website. The last thing Google wants is to be responsible for causing a website to become slow.
5. Reduce 500 error codes
50x error codes mean the problem is on the server side. This is a bad sign: Google's crawlers prefer to crawl healthy websites that return a 200 status code. Go through your server logs or use a service such as New Relic to discover the 500 error codes.
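Going through the server logs can be partly automated. The sketch below assumes access log lines in the common Apache/Nginx "combined" layout, where the quoted request is followed by the status code; the sample lines and the function name are made up for illustration, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g. "GET /checkout HTTP/1.1" 500
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def server_errors_by_path(lines):
    """Count 5xx responses per requested path."""
    errors = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status").startswith("5"):
            errors[match.group("path")] += 1
    return errors


sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /checkout HTTP/1.1" 500 512 "-" "ExampleAgent/1.0"',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 1024 "-" "ExampleAgent/1.0"',
    '203.0.113.9 - - [10/Oct/2023:13:56:02 +0000] "GET /checkout HTTP/1.1" 503 0 "-" "ExampleAgent/1.0"',
]
for path, count in server_errors_by_path(sample_log).most_common():
    print(path, count)
```

Sorting by count puts the pages that fail most often at the top, which is usually where to start fixing.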
6. Submit an XML sitemap
Google loves sitemaps. Submitting a sitemap is very easy: all you have to do is submit it through Google Webmaster Tools (now Google Search Console). If you have submitted a sitemap and find that not enough links are being crawled, make sure you have included the "last modified date" and "change frequency". These two are important because they give Googlebot a hint about how often it should visit these links.
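For illustration, a sitemap carrying both fields can be generated with Python's standard library. This is a minimal sketch: the `build_sitemap` name and the example.com URLs are assumptions, while the `urlset`, `url`, `loc`, `lastmod` and `changefreq` element names come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified, change_frequency) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2024-01-15
        ET.SubElement(url, "changefreq").text = changefreq  # e.g. daily, weekly
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages; replace with the site's real URLs and dates.
print(build_sitemap([
    ("https://example.com/", "2024-01-15", "daily"),
    ("https://example.com/about", "2023-11-02", "monthly"),
]))
```

The last modified date is only useful if it is accurate, so it is best taken from the real modification time of each page rather than the time the sitemap was generated.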
7. View the access log
The access log is a file that lists the links or files that users have requested from the website. An access log includes the name of the crawler, the IP address, and much more information. By viewing the access log, you can find out on which page the Google crawler starts, which pages it crawls, and how often. Apache Logs Views is a very handy tool to analyze access logs.
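The same analysis can be sketched in a few lines of Python. The example assumes the "combined" log format (IP, timestamp, request, status, size, referrer, user agent) and identifies the crawler only by the text "Googlebot" in the user agent, which can be spoofed; the sample lines and names are illustrative.

```python
import re
from collections import Counter

LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Return (first path requested by Googlebot, hits per path)."""
    hits = Counter()
    first_path = None
    for line in lines:
        match = LOG_RE.match(line)
        if match and "Googlebot" in match.group("agent"):
            if first_path is None:
                first_path = match.group("path")
            hits[match.group("path")] += 1
    return first_path, hits


BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog HTTP/1.1" 200 1024 "-" "{BOT}"',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    f'66.249.66.1 - - [10/Oct/2023:13:56:02 +0000] "GET /blog HTTP/1.1" 200 1024 "-" "{BOT}"',
    f'66.249.66.1 - - [10/Oct/2023:13:57:10 +0000] "GET /contact HTTP/1.1" 200 512 "-" "{BOT}"',
]
entry_page, per_path = googlebot_hits(sample_log)
print("Crawl started at:", entry_page)
print(per_path.most_common())
```

Grouping the same hits by date instead of by path would give the crawl frequency per day.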
8. Avoid Redirect Chains
When a link has a redirect chain, it wastes crawl budget. An example is a 301 redirect followed by a 302 redirect.
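One way to spot chains is to walk a table of known redirects, as in the sketch below. The `redirects` mapping (source URL to status code and target) is a hypothetical input that would be exported from server configuration or a crawl; the code does not make any network requests.

```python
def redirect_chain(start, redirects, max_hops=10):
    """Follow `start` through `redirects` and return every URL visited.

    `redirects` maps a URL to a (status_code, target_url) tuple.
    The walk stops at a URL with no redirect, on a loop, or after `max_hops`.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects and len(chain) <= max_hops:
        _status, target = redirects[current]
        chain.append(target)
        if target in seen:
            break  # redirect loop detected
        seen.add(target)
        current = target
    return chain


# Illustrative data: /old -> 301 -> /interim -> 302 -> /new
redirects = {
    "/old": (301, "/interim"),
    "/interim": (302, "/new"),
}
chain = redirect_chain("/old", redirects)
if len(chain) > 2:
    print("Redirect chain:", " -> ".join(chain))
```

Any chain longer than two entries has more than one hop; pointing the first URL straight at the final destination removes the wasted requests.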
9. Reduce 404 pages
Pages that return a 404 result consume your crawl budget. 404 pages are not harmful, but it is best not to have any.
10. Keep track of Googlebot activity
Google Webmaster has a few pages dedicated to crawling only. One of the most important is Crawl Stats. As SEO professionals, we need to know the number of pages Googlebot crawls on average, the days on which it crawled the least (and why: was it because of a server issue or a low Apdex score?), and the time spent downloading a page. Try to keep the time spent downloading a page as low as possible. The faster the better: Googlebot loves fast websites since it has to use fewer resources.