Offering PDFs (ebooks, guides, or anything else) as lead magnets can be a great way to build your audience. That said, you wouldn’t want these valuable resources to be easy to find through Google search, letting people skip your forms and download them directly.
The same goes for websites in other segments: documents stored as PDFs shouldn’t appear in search results unless you intend them to.
While you may be familiar with using a Robots Meta tag to prevent your pages from being indexed in search results, the same approach will not work for media files and PDF files.
So in this knowledgebase article, we will discuss how to prevent your PDF files from showing up in search results.
Why Can’t the Noindex Robots Meta Directive Be Added to PDF Files?
As we know, the Robots Meta tag is added to the head of the page’s HTML code. With Rank Math, you can easily add it to your posts, pages, archive pages, and even to your attachment pages because they’re all HTML.
But a PDF is not an HTML file and does not contain any HTML code, so you cannot add a NoIndex Robots Meta tag to it.
Even if that were possible by some means, your PDF files are not served through WordPress when they are accessed. They are served directly by your server, so your WordPress plugins have no control over them.
For these reasons, Rank Math (or any other SEO plugin) cannot add a Robots Meta directive to a PDF file.
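To make the distinction concrete, here is a minimal Python sketch that tells an HTML document (which has a head section where a Robots Meta tag could go) apart from a PDF. The function name and the sample byte strings are made up for illustration; the only fact relied on is that PDF files begin with the %PDF- signature.

```python
def can_hold_robots_meta(body: bytes) -> bool:
    """Return True if the payload looks like HTML with a <head> section.

    Hypothetical helper for illustration only: a rough heuristic, not a parser.
    """
    # PDF files start with the "%PDF-" signature and contain no HTML markup.
    if body.lstrip().startswith(b"%PDF-"):
        return False
    # A Robots Meta tag lives inside <head>, so look for that element.
    return b"<head" in body.lower()


# Made-up sample payloads.
html_page = b"<!DOCTYPE html><html><head><title>Guide</title></head><body></body></html>"
pdf_file = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(can_hold_robots_meta(html_page))  # True
print(can_hold_robots_meta(pdf_file))   # False
```

The PDF payload simply has no place to put a meta tag, which is why the instruction has to come from somewhere outside the file itself.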
That said, if your PDF files are already indexed in search results, or you want to prevent your PDF files from being crawled, the following methods will help you do so.
How to Prevent PDF Files from Being Crawled?
Raise a Search Removal Request
Raising a removal request in Google Search Console is perhaps the quickest way to get your PDF files dropped from the index. To raise a request, follow the steps below.
1 Open Search Console
If you have multiple accounts connected with your Google Search Console, make sure to choose the correct property.
2 Navigate to Removals Section
From the left sidebar, click the Removals section.
3 Choose New Request
On the Removals page, click New Request.
4 Submit New Request
In the popup that appears on the screen, enter the PDF URL that you want to remove and then click Next.
On the next screen, confirm removing the URL by clicking the Submit Request button.
Once the request has been submitted, you can check the removal status in the Google Search Console.
Note: This removal from the search index is only temporary. As Google states, the removal lasts for about six months. To keep the PDF out of Google’s search results beyond that, you’d need to block search engines from crawling the PDF using robots.txt.
Disallow PDF Files Using robots.txt
By using robots.txt, you can prevent your PDF files from being crawled. However, this only prevents crawling, not indexing.
If someone links to your PDF file from an external site, search engines may still index its URL even though they cannot crawl the file. Unless someone explicitly links to it, Google will not discover and index your PDF. Now, let’s look at how to prevent your PDF from being crawled using robots.txt.
1 Navigate to Your Robots.txt
Head over to WordPress Dashboard > Rank Math > General Settings > Edit Robots.txt. If the Edit Robots.txt option isn’t available for you, then ensure you’re using the Advanced Mode in Rank Math.
2 Edit Robots.txt
Then copy and paste the following code into the code editor. Make sure to replace yoursite.com with your domain name. The Disallow: *.pdf rule added here instructs search engines not to crawl PDF files.
User-agent: *
Disallow: /wp-admin/
Disallow: *.pdf
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yoursite.com/sitemap_index.xml
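If you manage several sites, a small script can fill in the domain for you. The following Python sketch simply reproduces the rules above with the domain substituted; the function name build_robots_txt and the example.com domain are placeholders of ours, not part of Rank Math.

```python
def build_robots_txt(domain: str) -> str:
    """Return the robots.txt rules from this article for the given domain.

    Hypothetical helper: it only substitutes the domain into the Sitemap line.
    """
    lines = [
        "User-agent: *",
        "Disallow: /wp-admin/",
        "Disallow: *.pdf",  # the rule that blocks crawling of PDF files
        "Allow: /wp-admin/admin-ajax.php",
        f"Sitemap: https://{domain}/sitemap_index.xml",
    ]
    return "\n".join(lines) + "\n"


# example.com is a placeholder; use your own domain name here.
print(build_robots_txt("example.com"))
```

Each directive ends up on its own line, which is how it should also appear once pasted into the Rank Math editor.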
3 Save Changes
Now click the Save Changes button at the bottom of the screen to apply the changes. You can then use the robots.txt Tester to confirm whether Googlebot is able to crawl and access your PDF file.
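For a quick local sanity check of the rules, here is a rough Python sketch of how wildcard rules like these are commonly evaluated. It is a simplified approximation of the matching behavior Google documents (* matches any run of characters, a trailing $ anchors the end, the longest matching rule wins, and ties go to Allow), not Google's actual implementation. Python's built-in urllib.robotparser is not used because, to our knowledge, it does not expand * wildcards. All function names and sample paths are our own.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex (simplified)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each "*" match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of ("allow" | "disallow", pattern) tuples.
    The longest matching pattern wins; on a tie, "allow" wins.
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed


# The rules from this article's robots.txt.
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "*.pdf"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

# Made-up sample paths.
for path in ["/wp-content/uploads/guide.pdf", "/blog/my-post/", "/wp-admin/admin-ajax.php"]:
    print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

With these rules the PDF path is reported as blocked while the blog post and admin-ajax.php remain allowed. Treat this only as a local check; the result shown in Google Search Console is what counts.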
And that’s it! We hope the above methods helped you prevent your PDF files from being crawled and get them dropped from the search index. If you still have any questions about indexing your content, please feel free to reach our support team directly from here. We’re always here to help.