
How to Fix the “Indexed, though blocked by robots.txt” Error

If you’ve received an email alert or noticed the warning ‘Indexed, though blocked by robots.txt’ in your Google Search Console as shown below, this knowledgebase article will show you how to troubleshoot the warning and fix it.

Indexed, though blocked by robots.txt error in Google Search Console

1 What Does the Error ‘Indexed, Though Blocked by Robots.txt’ Mean?

The error simply means:

  • Google has found your page and indexed it in search results.
  • But Google has also found a rule in your robots.txt that instructs it not to crawl the page.

Since Google can’t tell whether you want the page indexed or not, it raises a warning in Google Search Console so that you can look into the issue and choose a course of action.

If you’ve blocked the page intending to keep it out of the index, you need to be aware that although Google respects robots.txt in most cases, that alone cannot prevent the page from being indexed. There could be several reasons for this; for example, an external site may link to your blocked page, eventually leading Google to index the page with the little information available.

On the other hand, if the page is supposed to be indexed, but accidentally got blocked by robots.txt, then you should unblock the page from robots.txt to ensure Google’s crawlers are able to access the page.

Now that you understand the basic idea behind this warning, note that the practical causes can be many, depending on your CMS and its technical implementation. Hence, in this article, we’ll go through a comprehensive way to debug and fix this warning.
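For illustration, a single Disallow rule like the one below is enough to trigger this warning for every URL under the blocked path, if Google discovers those URLs through links elsewhere (the path here is hypothetical; your own robots.txt will differ):

```
User-agent: *
Disallow: /private/
```

With this rule in place, Googlebot won’t crawl anything under /private/, but a URL in that folder can still end up indexed if another site links to it.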

2 Export the Report from Google Search Console

For small websites, you might have only a handful of URLs under this warning. But more complex websites and eCommerce sites are bound to have hundreds or even thousands of URLs. Since it isn’t feasible to go through all the links inside GSC, you can export the report from Google Search Console and open it with a spreadsheet editor.

To export, simply click the warning available under Google Search Console Dashboard > Coverage > Valid With Warnings.

Open the warning - Indexed, though blocked by robots.txt in Google Search Console

And on the next page, you’ll be able to export all the URLs pertaining to this warning by clicking the Export option available in the top-right corner. From the list of export options, you can choose to download and open the file with a spreadsheet editor of your choice.

Export coverage errors in Google Search Console
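If you prefer working with the export programmatically rather than in a spreadsheet, the URLs can be loaded with a few lines of Python. This is a minimal sketch that assumes a CSV export whose URL column is headed “URL” (the column name and sample contents are assumptions; check your actual export):

```python
import csv
import io

# Hypothetical export contents; in practice, open the downloaded CSV file instead.
EXPORT_CSV = """\
URL
https://example.com/private/page.html
https://example.com/blog/post.html
"""

with io.StringIO(EXPORT_CSV) as fh:
    # Collect every affected URL from the report into a plain list.
    urls = [row["URL"] for row in csv.DictReader(fh)]

print(urls)
```

For a real export, replace the `io.StringIO(...)` wrapper with `open("report.csv", newline="")`.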

Now that you’ve exported the URLs, the first thing to figure out from looking at them is whether each page should be indexed or not. The course of action depends on your answer.

3 Pages to Be Indexed

If you determine that the page is supposed to be indexed, then you should test your robots.txt and identify whether any rule is preventing the page from being crawled by Googlebot.

To debug your robots.txt file, you can follow the exact steps discussed below.

3.1 Open robots.txt Tester

First, head over to the robots.txt Tester. If your Google Search Console account is linked with more than one website, select your website from the list of sites shown in the top-right corner. Google will then load your website’s robots.txt file. Here is how it looks.

Select site in robots.txt Tester

3.2 Enter the URL of Your Site

At the bottom of the tool, you will find the option to enter a URL from your website for testing. Here, add a URL from the spreadsheet you downloaded earlier.

Enter URL for testing in robots.txt Tester

3.3 Select the User-Agent

From the drop-down available on the right side of the text box, select the user-agent you want to simulate (Googlebot in our case).

Select Googlebot as User-Agent in robots.txt Tester

3.4 Validate Robots.txt

Finally, click the Test button.

The tool will instantly validate whether the crawler has access to the URL based on your robots.txt configuration, and accordingly, the test button will read ACCEPTED or BLOCKED.

The code editor at the center of the screen will also highlight the rule in your robots.txt that is blocking access, as shown below.

Testing robots.txt file using robots.txt Tester
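If you have many exported URLs, testing them one by one in the tool can be tedious. As a rough local alternative, Python’s standard-library robots.txt parser can batch-check which URLs a given set of rules blocks for Googlebot. This is a sketch with hypothetical rules and URLs; note that Google’s own parser has some behaviors the stdlib doesn’t replicate, so treat the tester as the authority:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents -- replace with your site's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical URLs taken from the exported report.
URLS = [
    "https://example.com/private/page.html",
    "https://example.com/blog/post.html",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Print ACCEPTED/BLOCKED for each URL, mirroring the tester's verdicts.
for url in URLS:
    status = "ACCEPTED" if parser.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status}\t{url}")
```

Any URL reported as BLOCKED here is worth pasting into the robots.txt Tester to confirm which rule is responsible.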

3.5 Edit & Debug

If the robots.txt Tester finds any rule preventing access, you can try editing the rule right inside the code editor and then run through the test once again.

You can also refer to our dedicated knowledgebase article on robots.txt to understand more about the accepted rules, which will be helpful when editing the rules here.

If you manage to fix the rule, great. But please note that this is a debugging tool, and any changes you make here will not be reflected in your website’s robots.txt unless you copy and paste the contents into your website’s robots.txt file.

3.6 Export Robots.txt

So to add the modified rules to your robots.txt, head over to Rank Math > General Settings > Edit robots.txt inside your WordPress admin area. If this option isn’t available for you, ensure you’re using the Advanced Mode in Rank Math.

In the code editor available in the middle of your screen, paste the code you copied from the robots.txt Tester and then click the Save Changes button to apply the changes.

Copy and paste robots.txt content

4 Pages Not to Be Indexed

If you determine that the page is not supposed to be indexed, but Google has indexed it anyway, it could be for one of the reasons discussed below.

4.1 Noindex Pages Blocked Through Robots.txt

When a page should not be indexed in search results, this should be indicated with a Robots Meta directive, not a robots.txt rule.

A robots.txt file only contains instructions for crawling. Remember, crawling and indexing are two separate processes.

Preventing a page from being crawled ≠ Preventing a page from being indexed

So to prevent a page from getting indexed, you can add a No Index Robots Meta using Rank Math.

But if you add a No Index Robots Meta and simultaneously block search engines from crawling these URLs, you’re technically not allowing Googlebot to crawl the page and discover that it carries a No Index Robots Meta.

Ideally, you should allow Googlebot to crawl through these pages, and based on the No Index Robots Meta, Google will drop the page from the index.

Note: Use robots.txt only for blocking files (like images, PDF, feeds, etc.) where it isn’t possible to add a No Index Robots Meta.
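As a quick sketch of what these directives look like (Rank Math adds the meta tag for you, so you rarely need to write it by hand): the robots meta tag goes in the page’s HTML head, while the X-Robots-Tag response header covers non-HTML files like PDFs where no meta tag is possible.

```html
<!-- In the page's <head>: the page stays crawlable, but is kept out of the index -->
<meta name="robots" content="noindex">
```

```
# HTTP response header equivalent, for files such as PDFs or images
X-Robots-Tag: noindex
```

In both cases, Googlebot must be allowed to crawl the URL to see the directive at all.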

Pages that you have disallowed through robots.txt might still receive links from external sites, and Googlebot will eventually try to index them.

Since you’ve disallowed the bots from crawling the page, Google would index it with only the limited information available from the linking page.

To resolve this issue, you might consider reaching out to the external site and requesting that they change the link to a more relevant URL on your website.

5 Conclusion — Validate Fix in Google Search Console

Once you’ve fixed the issues with the URLs, head back to the warning in Google Search Console and click the Validate Fix button. Google will then recrawl these URLs and close the issue once the error is resolved.

Validate fix for Indexed, though blocked by robots.txt warning

And that’s it! We hope this article helped you fix the error. If you still have any questions, please feel free to reach our support team directly from here; we’re always happy to help.
