Understanding Canonical URLs: The Definitive Guide

The rel="canonical" link element, often referred to as a canonical URL, is extremely important, yet it is still widely misunderstood and frequently misused, even by seasoned SEOs who’ve been in the industry for a while.

So, without further ado – in this guide, let’s take a close look at what canonical URLs are, how to use them, when to use them & more…

1 What Are Canonical URLs?

A canonical URL is the URL you declare, using a link element, as the master version of a page, so that search engines understand that a set of specific URLs are all related to that master page.

In short, they help you specify which version of a URL you would like to appear in search results. This is useful because, in some cases, where you may have content that is accessible via multiple URLs or different websites entirely, you can use canonical URLs to avoid duplicated content from negatively affecting rankings.

Technically speaking, a canonical URL is declared with an HTML link tag that uses the rel="canonical" attribute. Simply put, here’s how canonical URLs work:

A canonical URL is set by placing what is referred to as a canonical tag on a page. A canonical tag is just a snippet of HTML code that defines which page is the main (or master) version among duplicated or very similar pages.

In a whole range of scenarios (which we’ll cover in this guide), the same or very similar content is available under different URLs. In those cases, canonical tags should be used to specify which version is the main one and should, therefore, be indexed by search engines (such as Google).

2 Why Do Canonical URLs Matter?

Now that you know what canonical tags & URLs are, let’s take a look at why they matter and why you should consider setting different canonical URLs for certain posts and pages on your websites.

Just as you’d expect, Google isn’t a fan of duplicate content mainly because it makes it more difficult for them to rank pages. In other words:

How should Google know which version of a page to index and subsequently rank as well as how to distribute “link equity”?

Too much duplicate content can also affect your “crawl budget.” That means Google may end up wasting time crawling multiple versions of the same page rather than other important content on your website.

IMPORTANT SIDENOTE

Canonical tags actually aren’t new. Although it is possible that some people haven’t come across this concept (until now!), canonical tags were actually introduced all the way back in 2009.

Too much duplicate, as well as too much low-quality content, evidently isn’t good for your website.

Why would you want Google to waste time crawling multiple versions of the same page instead of focusing on the important pages of your website?

If new pages tend to be crawled the same day they’re published, crawl budget is not something webmasters need to focus on. Likewise, if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.

Google Webmaster Central Blog

While Google says that this usually isn’t an issue, the use of canonical tags can remedy all of these potential issues because they not only allow you to indicate [to Google] which version of a page should be indexed but also where link equity (colloquially referred to as “link juice”) should be consolidated.

3 Setting Canonical URLs for WordPress Posts & Pages

The Rank Math SEO WordPress plugin makes it easy to change the canonical URL using the meta box (as pictured below).

By default, Rank Math uses the current post/page URL as the canonical URL, so you only need to alter this setting if you wish to change it to something else.

Rank Math Canonical URL WordPress

This is also known as a self-referencing canonical, which we’ll cover later along with all the other scenarios in which canonicalization is beneficial.

4 Setting Canonical URLs Manually (Advanced)

If your website doesn’t run on WordPress (the content management system that currently powers over 40% of all websites on the internet, including sites like CNN, Bloomberg & more), here’s how you can manually set canonical URLs for pages on your website.

As mentioned earlier, a canonical URL is set using an HTML link element with the rel="canonical" attribute. Therefore, in order to set one on any page of a website, simply add the following code in the <head> </head> section of the webpage’s HTML source code:

<link rel="canonical" href="https://rankmath.com/about/">

And then, simply replace https://rankmath.com/about/ with the URL that you’d like to set as the canonical URL for the page that you’ve added the above code to. 
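For context, here is a minimal sketch of where the tag sits in a complete page. The title and body text are placeholders we made up for illustration; only the canonical URL comes from the example above.

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8">
	<title>About (placeholder title)</title>
	<!-- The canonical tag belongs inside <head>, not <body> -->
	<link rel="canonical" href="https://rankmath.com/about/">
</head>
<body>
	<p>Placeholder page content.</p>
</body>
</html>
```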

5 When Should You Use Canonical URLs?

5.1 301 Redirects vs Using rel=canonical

Not sure if it would make more sense to implement a redirect or make use of canonicalization? The easiest way to put it is: 

If it’s possible to use redirects to fix the problem, use a redirect. However, use canonical URLs if you still want both versions of the page to be accessible (just not in search results) and it simply wouldn’t be possible to use redirects to make that happen. In other words, if a webpage is identical or near-duplicate and serves no additional purpose in being accessible to the internet (i.e. users of your site or search engines), then simply redirect it to whichever you consider a priority. 

And when this isn’t possible because both pages still serve a valid & valuable purpose in being accessible, then use a canonical URL to specify which of the related pages you’d prefer search engines to consider as the original/master page. 

5.2 Do Pages Need a Self-Referencing Canonical URL?

In the image of the Rank Math SEO Meta Box that appeared earlier in this post, we didn’t link to another page by inserting a URL. Instead, the canonical URL was set to the current page itself.

It’s strongly recommended to have a rel=canonical link element on every page. This has been adopted as a best practice ever since Google confirmed that it is the best way to deal with duplicate versions of a URL.

The potential side-effect of not having self-referencing canonical URLs on pages that point to the plain version of the URL is running into duplicate content errors. That’s why adding a self-referencing canonical to URLs is good practice – and you’ll be pleased to hear that the Rank Math SEO plugin already does this so you don’t have to worry about it.

Most people assume that their website doesn’t have any duplicate content because they obviously haven’t published the same piece of content over and over on purpose. That being said, that isn’t necessarily true because search engines crawl individual URLs, not pages on your website.

Yes, this really means that they would see rankmath.com/blog/seo-audit and rankmath.com/blog/seo-audit?id=123 as unique pages despite being the same actual page with either very similar or exactly the same content.

https://rankmath.com/blog/seo-audit/
https://rankmath.com/blog/seo-audit/?utm_source=active%20users&utm_medium=email
http://rankmath.com/wordpress/seo-plugin/?utm_medium=twitter

URLs with query strings such as the ones shown above are known as parameterized URLs and can cause problematic duplicate content issues on websites, especially ones that allow filtering, such as eCommerce websites.

And this is why self-referencing canonicals are incredibly useful. People may often link to URLs with queries and UTM parameters – which means that when that happens, Google may start picking up the URL with parameters as the canonical version. Therefore, taking advantage of self-referencing canonicals helps avoid this situation by explicitly specifying which URL you consider the most important or primary version of that page. 
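As a rough illustration, the same page can be served at both the plain URL and the UTM-tagged URL from the list above. In both cases, the self-referencing canonical in its <head> would point at the plain version:

```html
<!-- Served at both of these addresses:
     https://rankmath.com/blog/seo-audit/
     https://rankmath.com/blog/seo-audit/?utm_source=active%20users&utm_medium=email -->
<head>
	<!-- Always names the parameter-free URL, whichever address was requested -->
	<link rel="canonical" href="https://rankmath.com/blog/seo-audit/">
</head>
```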

5.3 Cross-Domain Canonical URLs

In the event that you do have the same piece of content on multiple domains, you can also make use of canonicalization. A good example is a website that reshares content from other websites, possibly curating articles in a specific niche. If the canonical URL on the reshared copy is set to the original source of the content (where it was first published), then any links that point to the second version will count towards the original canonical version, increasing the original content’s chances of ranking.
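A minimal sketch of a cross-domain canonical, using made-up domains: assume the article was first published on example.com and later republished on example.org. The republished copy would carry the following tag:

```html
<!-- On the republished copy: https://example.org/curated/original-article/ (hypothetical URL) -->
<head>
	<!-- Points across domains to where the content was first published (hypothetical URL) -->
	<link rel="canonical" href="https://example.com/original-article/">
</head>
```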

5.4 Canonicalization for AMP Pages

AMP HTML documents are required to set a canonical URL on every AMP page that references the non-AMP equivalent (or self-references the AMP page if there is no equivalent available). The canonical tag is a mandatory HTML element for AMP content to be considered valid and, where possible, it is supposed to point to the original non-AMP version of the content.

Note: In such scenarios, the original non-AMP version of the content, which is used as the canonical URL on the AMP page equivalent, must itself be indexable (it can’t be made non-indexable by any method, including a 301 redirect, another canonical URL, etc.)

This is because this would send conflicting messages to search engines – making it highly likely that the AMP page wouldn’t show up in search results at all. 

TL;DR – The canonical tag is a mandatory element for AMP pages to be considered valid, and the canonical tag is supposed to point back at the original ‘non-AMP‘ version of the page. If the page is standalone AMP, then the canonical should be self-referential.
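Here is a sketch of the paired setup, using hypothetical example.com URLs. The rel="amphtml" link on the non-AMP page isn’t covered in this guide. It is the commonly used counterpart that lets search engines discover the AMP version, and we include it here only for completeness.

```html
<!-- Non-AMP page: https://example.com/article/ (hypothetical URL) -->
<head>
	<link rel="canonical" href="https://example.com/article/">
	<link rel="amphtml" href="https://example.com/article/amp/">
</head>

<!-- AMP page: https://example.com/article/amp/ (hypothetical URL) -->
<head>
	<!-- Points back at the original non-AMP version -->
	<link rel="canonical" href="https://example.com/article/">
</head>
```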

5.5 Different website versions for different devices

If you have a scenario with a website that has separate desktop and mobile pages – meaning two versions of the site, such as one at rankmath.com and a mobile version at m.rankmath.com – you should use canonical URLs and rel=alternate to indicate the similarity/relationship between these two pages.

Note: Google is the only search engine to officially support this implementation at this time.

In practice, here’s what this would look like on both the Desktop and Mobile version of a website:

Desktop

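On the desktop version of the page, the canonical URL and alternate URL in the <head> section look as follows. The media attribute tells search engines when the mobile version applies (the 640px breakpoint is only an example value):

<head>
	<link rel="canonical" href="https://rankmath.com/" />
	<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.rankmath.com/" />
</head>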

Mobile

While on the mobile version of the page, the canonical URL in the <head> section should appear as follows:

<head>
	<link rel="canonical" href="https://rankmath.com/" />
</head>

That way it’s easier for search engines to understand which version of the page to show for mobile devices and which version to show to Desktop searchers.

6 Common Canonical URL Myths & Misconceptions

Although it has been around for a while, canonicalization is difficult to understand and it’s easy to go wrong.

Here are some examples of common problems that you may run into as you use canonicalization on your websites:

6.1 Not Properly Using Canonicalization on Multilingual Websites

Multilingual websites typically use hreflang tags to tell search engines which version of a webpage to show based on a user’s language or geographical location.

When using hreflang tags you should specify a canonical page in the same language, or the best possible substitute language if a canonical doesn’t exist for the same language.

That being said, if you choose not to indicate a canonical URL, Google will identify what they think is the best version or URL.
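To make this concrete, here is a minimal sketch of the English version of a page that also has a German version. The example.com URLs and the /en/ and /de/ paths are assumptions for illustration only.

```html
<!-- English page: https://example.com/en/pricing/ (hypothetical URL) -->
<head>
	<!-- The canonical stays in the same language as the page itself -->
	<link rel="canonical" href="https://example.com/en/pricing/">
	<!-- hreflang annotations list every language version, including this one -->
	<link rel="alternate" hreflang="en" href="https://example.com/en/pricing/">
	<link rel="alternate" hreflang="de" href="https://example.com/de/pricing/">
</head>
```

The German page would mirror this, with its canonical pointing at the /de/ URL and the same pair of hreflang lines.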

If you use WordPress as your content management system of choice (which we obviously highly recommend) and serve content to website visitors in multiple languages, we highly recommend making use of the Weglot translation plugin for your website.

We’ve independently verified that they do handle canonicalization as outlined and officially recommended by Google. Another plugin that we can recommend is TranslatePress.

6.2 Canonicalizing Paginated Pages

Google’s John Mueller stated that canonicalizing all paginated pages to the first page in the series is considered improper use of the rel=canonical tag. Page 2 in the series cannot be considered to be equivalent to page 1 so making use of canonicalization in this situation would be incorrect.

6.3 Also Setting the Canonicalized URL to ‘Noindex’

Combining canonicalization with no-indexing doesn’t make sense. Simply no-indexing a page doesn’t tell Google which page you would like it combined with, or that ranking signals should be forwarded to that master page.

When Google sees two URLs from your site, they look the same, and you tell us your preference clearly, we’ll try to combine them and treat them as one (usually stronger) URL instead of separate ones. Redirects, rel=canonical, internal & external linking, sitemaps, hreflang, etc. all tell us your preferences, and the more you can align those, the more we’ll follow them and use them to pick a canonical out of that set (and forward all the signals to the canonical chosen).

On the other hand, noindex (alone) & robots.txt disallow (in general) are not clear signs for canonicalization. Just having a noindex on a page doesn’t tell us that you want to have it combined with something else, and that signals should be forwarded. A robots.txt disallow is even trickier, we don’t even know if the page matches anything else on your site, so we couldn’t even use it for canonicalization if we wanted to.

John Mueller, Webmaster Trends Analyst, Google

Simply put, you could say that rel=canonical does what a 301 redirect does: it attributes any links pointing to the non-canonical version to the canonical one, but without the redirect (since you want to be able to retain access to both pages).

Canonical URLs are for situations in which you just wouldn’t be able to (and shouldn’t) implement a 301 redirect.

Similarly, don’t do things like canonicalizing page A –> page B and then redirecting page B –> page A, or chaining canonical tags (such as pointing page A –> B, page B –> C, etc.). Sending clear signals is important because otherwise you often lead search engines to make bad decisions.

If you’ve ever considered both canonicalizing a URL and no-indexing it, you should consider using a 301 redirect. And if you can’t use a redirect then you should only use rel=canonical.
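The conflict is easier to see side by side. Below is a sketch with a hypothetical example.com URL. The first fragment sends mixed signals on the duplicate page, while the second keeps only the canonical:

```html
<!-- Avoid: noindex and a canonical to another page on the same duplicate -->
<head>
	<meta name="robots" content="noindex">
	<link rel="canonical" href="https://example.com/master-page/">
</head>

<!-- Prefer (when a 301 redirect isn't an option): the canonical on its own -->
<head>
	<link rel="canonical" href="https://example.com/master-page/">
</head>
```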

6.4 Only Indicating a Preferred Website Version in the Google Search Console

One option to set canonical URLs is to use the Google Search Console to specify your preferred canonical domain. There are a few reasons that this method is beneficial including that it is fast and extremely easy to implement.

However, there are also some known issues associated with using this method. It would, for example, be used to specify a preferred domain but you’d still need a plugin like Rank Math to easily specify canonical URLs for specific posts and pages on an individual basis when encountering various scenarios. 

And, of course, another downside to this approach is that specifying the preferred domain in Google Search Console only correctly sets the canonical variation for Google, but doesn’t do so for other search engines.

6.5 Are canonical URLs considered directives for search engines?

Canonical URLs are not considered directives; they are considered a signal to search engines. This means search engines may choose to ignore them, but they are still important and should be used because they help search engines understand a website’s content and how it relates to other content on your site.

6.6 Should you canonicalize paginated pages to the first page of the series?

No, this is a very common misconception. Each page within a paginated series should have its own self-referencing canonical URL. If you have been pointing every page at the first one, it’s likely that Google will simply pick up on that & ignore the signal (since it isn’t a directive).
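As a minimal sketch, assuming a blog archive paginated under hypothetical example.com URLs, page 2 of the series would reference itself rather than page 1:

```html
<!-- Page 2 of the series: https://example.com/blog/page/2/ (hypothetical URL) -->
<head>
	<!-- Self-referencing: NOT https://example.com/blog/ -->
	<link rel="canonical" href="https://example.com/blog/page/2/">
</head>
```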

6.7 Can you set canonical URLs as relative URLs? 

The link tag accepts relative URLs, so a relative canonical is technically valid HTML. However, using relative URLs in canonicals can lead to other issues, such as an incorrectly configured base URL, which would render the entire canonical setup invalid.

As a matter of fact, Google themselves have stated that some of the most common issues they see with canonicals actually come from the use of relative URLs.

In short, the point of a canonical URL is to state precisely, without ambiguity, which URL is preferred, and this is best achieved by using absolute URLs when setting canonical URLs on your website.
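For comparison, here are the two forms side by side, using the blog URL from earlier in this guide:

```html
<!-- Relative: resolved against whatever base URL the page happens to have -->
<link rel="canonical" href="/blog/seo-audit/">

<!-- Absolute: protocol, host and path are all stated explicitly -->
<link rel="canonical" href="https://rankmath.com/blog/seo-audit/">
```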

7 Conclusion – Proper Use of Canonical URLs is Important

And, that’s it! We hope we’ve been able to address absolutely every single question you’ve ever had about canonical URLs and how to use them in the situations you face & with websites you run.

One of the reasons we actually built Rank Math was to take the legwork out of repetitive SEO work just like this. Setting custom canonical URLs, as well as automatically setting self-referencing canonical URLs by default, couldn’t be easier thanks to Rank Math & WordPress.

If you have absolutely any questions and want to join the conversation – Tweet @rankmathseo! 💬