Can I NoIndex PDF Files Using an SEO Plugin?

Offering PDFs (ebooks, guides, or anything else) as lead magnets can be a great way to build your audience. That said, you probably don’t want these valuable resources to be easy to find through Google search, where people can open them without downloading them through your forms.

The same goes for websites in other segments: documents stored as PDFs shouldn’t appear in search results unless you intend them to.

You may already be familiar with keeping pages out of search results using a Robots Meta tag, but the same approach will not work for PDFs and other media files.

So in this knowledge base article, we will discuss how to prevent your PDF files from showing up in search results.

Why Can’t the Noindex Robots Meta Directive Be Added to PDF Files?

As we know, the Robots Meta tag is added to the head of a page’s HTML code. With Rank Math, you can easily add it to your posts, pages, archive pages, and even your attachment pages, because they’re all HTML.

But a PDF is not an HTML file and does not contain any HTML code, so you cannot add a NoIndex Robots Meta tag to it.

Even if that were possible, PDF files are not served through WordPress when they are accessed. They are served directly by your server, so your WordPress plugins have no control over them.

For these reasons, Rank Math (or any other SEO plugin) cannot add a Robots Meta directive to a PDF file.
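
To make this concrete, here is a minimal Python sketch that scans a response body for a Robots Meta tag the way an HTML parser would. It is only an illustration and not part of Rank Math: the class name RobotsMetaFinder, the helper find_robots_meta, and both sample payloads are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives.append(attr_map.get("content", "").lower())


def find_robots_meta(payload: bytes) -> list[str]:
    """Return the robots meta directives found in a response body."""
    finder = RobotsMetaFinder()
    # latin-1 maps every byte to a character, so binary data never raises here.
    finder.feed(payload.decode("latin-1"))
    finder.close()
    return finder.directives


# Hypothetical payloads: a tiny HTML page and the skeleton of a PDF file.
html_page = b'<html><head><meta name="robots" content="noindex"></head><body>Guide</body></html>'
pdf_file = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"

print(find_robots_meta(html_page))  # ['noindex']
print(find_robots_meta(pdf_file))   # []
```

The HTML page exposes its noindex directive, while the PDF payload has no head section to carry one, which is why a plugin that works by printing tags into HTML has nothing to attach the directive to.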

That said, if your PDF files are already indexed in search results, or you want to prevent them from being crawled, the following methods will help.

How to Prevent PDF Files from Being Crawled?

Raise a Search Removal Request

Raising a removal request in Google Search Console is perhaps the quickest way to get your PDF files dropped from the index. To raise a request, follow the steps below.

1 Open Search Console

Head over to Google Search Console and then log in to your account. If you haven’t already connected and verified your website in Google Search Console, then you can let Rank Math do that for you.

If you have multiple websites connected to your Google Search Console, make sure to choose the correct property.

2 Navigate to Removals Section

From the left sidebar, click the Removals section.

3 Choose New Request

On the Removals page, click New Request.

4 Submit New Request

In the popup that appears on the screen, enter the PDF URL that you want to remove and then click Next.

On the next screen, confirm removing the URL by clicking the Submit Request button.

Once the request has been submitted, you can check the removal status in Google Search Console.

Note: This removal from the search index is only temporary. As Google states, the content will stay out of the index for about six months. To keep the PDF out of Google’s search results beyond that period, you’d need to block search engines from crawling the PDF using robots.txt.

Disallow PDF Files Using robots.txt

By using robots.txt, you can prevent your PDF files from being crawled. However, this only prevents crawling, not indexing.

If someone links to your PDF file from an external site, search engines can still index its URL even though they cannot crawl it. Unless someone explicitly links to it, Google will neither crawl nor index your PDF. Now, let’s look at how to prevent your PDF from being crawled using robots.txt.

1 Navigate to Your Robots.txt

Head over to WordPress Dashboard > Rank Math > General Settings > Edit Robots.txt. If the Edit Robots.txt option isn’t available for you, then ensure you’re using the Advanced Mode in Rank Math.

2 Edit Robots.txt

Copy and paste the following code into the code editor. Make sure to replace yoursite.com with your domain name. The Disallow: /*.pdf$ rule added here instructs search engines not to crawl any URL that ends in .pdf.

User-agent: *
Disallow: /wp-admin/
Disallow: /*.pdf$
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap_index.xml

3 Save Changes

Now click the Save Changes button at the bottom of the screen to apply the changes. You can then use a robots.txt testing tool to confirm that Googlebot is blocked from crawling your PDF files.
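
If you’d like to sanity-check the rules before relying on them, the following is a rough Python sketch of how wildcard rules are commonly matched: * matches any run of characters, a trailing $ anchors the end of the URL, and the longest matching rule wins, with Allow winning a tie. It is a simplified model for illustration only, not Google’s actual parser. The function names and the sample paths are hypothetical, and the PDF rule is written in its leading-slash, end-anchored form /*.pdf$.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under `rules`.

    The longest matching pattern wins; on a tie, Allow beats Disallow.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not rule_to_regex(pattern).match(path):
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


# The rules from the robots.txt in this article, for User-agent: *
RULES = [
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/*.pdf$"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

# Hypothetical paths on your site.
for path in [
    "/wp-content/uploads/2023/guide.pdf",
    "/blog/seo-tips/",
    "/wp-admin/admin-ajax.php",
    "/wp-admin/options.php",
]:
    print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

Running it should report the PDF and /wp-admin/options.php as blocked and the other two paths as allowed. Treat the result as a quick local check only; the search engine’s own tooling remains the authority on how your live file is interpreted.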

And that’s it! We hope the above methods helped you prevent your PDF files from being crawled and get them dropped from the search index. If you still have any questions about indexing your content, please feel free to reach out to our support team. We’re always here to help.
