1. Home
  2. Knowledge Base
  3. SEO Suite
  4. Can I NoIndex PDF Files Using an SEO Plugin?

Can I NoIndex PDF Files Using an SEO Plugin?

Offering PDFs (ebooks, guides, or anything else) can be great sources of lead magnets for building your audience. But that said, you wouldn’t want these valuable resources to be available through Google search for people to find them easily, against downloading them through your forms.

And the same goes for websites in other segments; documents stored in PDF shouldn’t be available in search results unless otherwise, you intend to do.

While we are familiar with preventing your pages from getting indexed in search results using a Robots Meta tag, the same approach will not work for media files and PDF files.

So in this knowledgebase article, we will discuss about preventing your PDF files from showing up in search results.

1 Why Noindex Robots Meta Directive Cannot Be Added to PDF Files?

The Robots Meta tag is added to the head of the page’s HTML code. With Rank Math, you can easily add it to your posts, pages, archive pages, and even your attachment pages because they’re all HTML.

But, PDF is not an HTML file and does not contain any HTML code; hence you cannot add a NoIndex Robots Meta tag to it.

If that’s even possible by any means — when your PDF files are accessed, they are not served through WordPress but directly by your server, and your WordPress plugins would not have control over it.

Hence, Rank Math (or any other SEO plugin) cannot add a Robots Meta directive to a PDF file.

But with that said, if your PDF files are already indexed in search results, or you want to prevent your PDF files from being crawled, the following methods will help you to do so.

2 How to Prevent PDF Files from Being Crawled?

2.1 Raise a Search Removal Request

Raising a removal request in Google Search Console is perhaps the quickest way to get your PDF files dropped from the index, and to raise a request, follow the exact steps we’ve discussed below.

2.1.1 Open Search Console

Head over to Google Search Console and then log in to your account. If you haven’t already connected and verified your website in Google Search Console, then you can let Rank Math do that for you.

Choose the correct property if you’ve multiple accounts connected with your Google Search Console.

Open Google Search Console

2.1.2 Navigate to Removals Section

From the left sidebar, click the Removals section.

removals section

2.1.3 Choose New Request

On the Removals page, click New Request.

new request

2.1.4 Submit New Request

In the popup that appears on the screen, enter the PDF URL that you want to remove and then click Next.

Submit new request for removal in Google Search Console

In the next screen, confirm removing the URL by clicking the Submit Request button.

Submit Request to remove URL from Google Search Console

Once the request has been submitted, you can check the removal status in the Google Search Console.

Temporary removal status in Google Search Console

Note: This removal from the search index is only temporary for a set time period. As Google states, the content will not be indexed for six months. But to keep this PDF permanently out of Google’s search index, you’d need to block search engines from crawling the PDF using robots.txt.

2.2 Disallow PDF Files Using robots.txt

By using robots.txt, you can prevent your PDF file from being crawled. However, this will only prevent your page from being crawled and not indexing.

Only when someone links to your PDF file from an external site search engines will go ahead to index your file. Unless someone explicitly links, Google will not crawl your PDF and index your file. Now, let’s look at how to prevent your PDF from being crawled using robots.txt.

2.2.1 Navigate to Your Robots.txt

Head over to WordPress Dashboard > Rank Math > General Settings > Edit Robots.txt. If the Edit Robots.txt option isn’t available, then ensure you’re using the Advanced Mode in Rank Math.

edit robots.txt

2.2.2 Edit Robots.txt

And then, copy and paste the following code into the code editor. Make sure to replace yoursite.com with your domain name. The Disallow: *.pdf rule added here will instruct search engines not to crawl PDF files.

User-agent: *
Disallow: /wp-admin/
Disallow: *.pdf
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap_index.xml
Paste code in robots.txt to disallow PDF files

2.2.3 Save Changes

Now click the Save Changes button at the bottom of the screen to reflect the changes. You can test and confirm using robots.txt Tester to see if Googlebot can crawl and access your PDF file.

Save Changes to robots.txt

2.3 Disallow PDF Files Using X-Robots-Tag

X-robots-tag is another method to prevent non-HTML files like PDFs from indexing. It is more complicated rather than the Robot Meta tag. 

To prevent PDF files using x-robots-tag, you’ll need to modify your website’s htaccess file, header.php, or need server-level modification.

If you’re using an Apache server, then add the following x-robots-tag in your website’s httpd.config file or htaccess file.

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, follow"
</Files>

And for the Ngnix server, add the following x-robots-tag.

location ~* \.pdf$ 
{
add_header X-Robots-Tag "noindex, follow";
}

Note: Make sure to check Google’s guidelines on x-robots-tag before making any modification to your website’s file.

And that’s it! We hope the above methods helped you prevent your PDF files from being crawled and get them dropped out of the search index. If you still have absolutely any questions about indexing your content, please feel free to reach our support team directly from here, and we’re always here to help.

Was this article helpful?

Still not using Rank Math?

Setup takes less than 5 minutes including the import from your old SEO Plugin!

Learn more about the PRO Version

Still need help?

?

Submit Your Question

Please give us the details, our support team will get back to you.

Open Ticket

Related Articles