What is Duplicate Content?
Duplicate content refers to content that appears at different URLs on a site or on different sites across the web. It could be the exact same content or content that is similar enough to be considered a copy.
Is There a Duplicate Content Penalty in SEO?
Google has confirmed that there is no penalty for duplicate content. Google further clarified that duplicate content does not violate its Google Search Essentials guidelines or spam policies. In fact, Google expects sites to contain duplicate content.
However, this only applies to sites that unintentionally create duplicate content or sites that syndicate their content across multiple sites. Sites that publish thin content and doorway pages may be penalized for engaging in activities that violate Google Search policies.
Scraped content, which refers to content copied without permission, violates Google Search Essentials guidelines and may also amount to copyright infringement. Sites with a large number of legal removal requests may be demoted on search results pages.
Causes of Duplicate Content
Google has clarified that duplicate content could occur when:
- The same or similar content is published on multiple sites
- The same or similar content is published on multiple pages on the same site
- A single piece of content is accessible through multiple URLs on the same site
1 The Same or Similar Content is Published on Multiple Sites
This sort of duplicate content could occur with or without the knowledge or permission of the original publisher. It is actually quite common with syndicated content and press releases that are usually published and republished on multiple sites.
This could also occur without the publisher’s permission, for example, if another blogger copies content from one site and publishes it on another. Google considers both pieces of content to be duplicates, even if there are minor differences between them.
2 The Same or Similar Content is Published on Multiple Pages on the Same Site
This sort of duplicate occurs when a blogger publishes the same or very similar content on separate pages. Both pages will typically rank for the same or very similar keywords. In many cases, the blogger may not even know that the pages are duplicates.
3 A Single Piece of Content is Accessible Through Multiple URLs on the Same Site
Most sites are accessible through multiple URLs. For example, the URLs below will typically lead to the same location:
- yourdomain.com
- www.yourdomain.com
- https://yourdomain.com
- https://www.yourdomain.com
Some sites also have separate mobile versions and printer-friendly versions, which usually have their own distinct URLs. Some URLs could also contain URL parameters.
Google recognizes all of these as different webpages since they can technically contain different content. However, if they contain the same content, then they become duplicate pages.
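If you want to audit your own URLs for this kind of duplication, one option is to map each variant to a single form before comparing them. The sketch below is illustrative only. The names normalize_url and TRACKING_PARAMS are hypothetical, and the rules it applies (force https, strip www, drop tracking parameters) are assumptions that only hold if your site really serves the same content on all of those variants. It does not describe how Google itself compares URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters assumed not to change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def normalize_url(url):
    """Map common variants of a URL onto one form so they can be compared."""
    # Bare hosts such as "yourdomain.com" have no scheme, so add one.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)

    # Assumption: the www and non-www hosts serve the same content.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Keep only the parameters that may change what the page shows.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]

    # Treat "/recipes/" and "/recipes" as the same path.
    path = parts.path.rstrip("/") or "/"

    # Assumption: http and https serve the same content, so force https.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


variants = [
    "yourdomain.com",
    "www.yourdomain.com",
    "https://yourdomain.com",
    "https://www.yourdomain.com",
]
# All four variants collapse to a single normalized form.
print({normalize_url(v) for v in variants})
```

URLs that normalize to the same string are candidates for duplicate pages. You would still need to compare their content to confirm it.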
How Google Handles Duplicate Content
Google has confirmed that it groups duplicate pages into a single cluster. That is, it combines them and treats them as if they were one piece of content.
Google then analyzes the duplicate pages and selects the one that best represents the entire cluster. The selected URL is called the canonical URL and will be displayed on search results pages. Google will also pass the link popularity and link equity of the other pages in the cluster to the canonical URL.
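The idea of clustering can be illustrated on your own pages with a simple content hash. This is only a rough sketch of the concept. Google's actual systems are not public, this version catches only exact duplicates after whitespace and case are normalized, and the rule for picking a representative (the shortest URL) is a hypothetical stand-in for the many signals Google uses. The function name cluster_duplicates and the sample pages are made up for illustration.

```python
import hashlib
from collections import defaultdict


def cluster_duplicates(pages):
    """Group URLs whose text is identical after basic normalization.

    `pages` maps URL -> page text. Returns a dict that maps one
    representative URL per cluster to the sorted URLs in that cluster.
    """
    by_digest = defaultdict(list)
    for url, text in pages.items():
        # Ignore case and whitespace differences before hashing.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)

    clusters = {}
    for urls in by_digest.values():
        # Hypothetical rule: use the shortest URL as the representative.
        representative = min(urls, key=lambda u: (len(u), u))
        clusters[representative] = sorted(urls)
    return clusters


# Made-up sample data for illustration.
pages = {
    "https://yourdomain.com/yoga": "Introduction to Yoga",
    "https://yourdomain.com/yoga?print=1": "introduction  to yoga",
    "https://yourdomain.com/about": "About us",
}
print(cluster_duplicates(pages))
```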
What is Not Considered Duplicate Content
Google is clear about what qualifies and does not qualify as duplicate content. Specifically, Google has confirmed that localized websites and content containing quotes from other content are not duplicate content.
1 Localized Websites
A localized website refers to a site or page that contains content already published elsewhere in another language. For example, if you have an article titled “Introduction to Yoga” on your site, you can republish the same article on another page or a different site as long as it is in a different language.
2 Snippets and Quotes
Google says you can quote content from other sites in your content. These quotes are not considered duplicate content as long as you include your own original content along with them. Your content should be unique and helpful. It should not be thin content or a minor modification of the copied content.
Why is Duplicate Content Bad for SEO?
Duplicate content can cause a host of issues on the affected sites and even on search results pages. Some of the issues Google has identified include:
1 It Dilutes Your Link Popularity and PageRank
Duplicate content can dilute your link popularity and PageRank. For example, instead of having 100 links pointing to a single page on your site, you could have 10 links pointing to each of 10 duplicate pages. In this case, your link popularity and PageRank will be split across multiple pages.
2 It Could Cause Google to Use Up Your Site Bandwidth
Google treats different URLs as separate webpages. While it has systems to detect URLs that lead to duplicate pages, some URLs do not give off any indication that they lead to the same content. In such situations, Google may end up crawling the same pages through multiple URLs.
This could cause Google to use up more bandwidth and server resources than it should. It may also cause Google to use up your crawl budget quicker than usual. Once this happens, Google may not crawl other pages you want it to display on search results pages.
3 Google Does Not Want the Same Content on Search Results Pages
Google does not want to display the same content more than once on search results pages, irrespective of whether the content was published on the same or on different sites. So, Google only displays one version and leaves the rest out of search results pages.
In this case, if you are copying someone else’s content, there is the possibility that it will not rank highly compared to that of the original publisher. Similarly, if you have duplicate pages on your site, Google only wants to display one of those pages.
4 It Could Cause Google to Display Non-User-Friendly URLs
Some bloggers are particular about the URL Google displays on search results pages. They typically want Google to display clean URLs. However, when duplicate content exists, Google has to work out for itself which URL should be canonical.
While Google typically displays what it considers the best URL, this may be different from the URL that you want it to display. In some cases, Google may even display some long and not-so-user-friendly URLs as your canonical.
How to Check for Duplicate Content
You may need dedicated SEO tools to detect duplicate content on your site. However, if you do not have access to one, visit the webpage you think is a duplicate and copy a unique part of its text.
Next, go to Google and enter the site: operator followed by your domain name, for example, site:yourdomain.com. Then paste the text in double quotes (" "). Press Enter on your keyboard and review the content that shows up.
If that text appears in multiple results, review those pages and compare their content. If they are the same or too similar, then you have a duplicate content issue.
You can also search for the same content on the web, but this time, do not include the site: operator or your domain name. This will show you the sites copying your content.
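The two searches described above can be assembled programmatically if you need to check many snippets. The sketch below only builds the query strings and the corresponding search URLs for you to open in a browser. The function names build_queries and search_url are hypothetical, and the snippet text is a placeholder.

```python
from urllib.parse import quote_plus


def build_queries(snippet, domain):
    """Return (on-site query, web-wide query) for an exact-phrase search."""
    # Wrap the snippet in double quotes so it is searched as an exact phrase.
    phrase = '"' + snippet.strip().replace('"', "") + '"'
    on_site = f"site:{domain} {phrase}"   # duplicates on your own site
    web_wide = phrase                     # copies anywhere on the web
    return on_site, web_wide


def search_url(query):
    """Build a Google search URL for the query, to open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


on_site, web_wide = build_queries("a unique sentence from your page", "yourdomain.com")
print(on_site)
print(search_url(on_site))
print(search_url(web_wide))
```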
How to Fix Duplicate Content Issues
Google has provided some guidelines on fixing duplicate content issues involving your site. Most of the solutions require you to perform certain actions that allow Google to easily determine your canonical URL and understand which URL you want it to consider as canonical.
1 Specify a Canonical URL
Google recommends that you declare your preferred URL as the canonical URL. However, you should know that Google uses several signals to determine your canonical URL. So, specifying one does not guarantee that Google will use it, although it is a strong signal. You can refer to this guide to declaring your canonical URLs.
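One common way to declare a canonical URL is a link element with rel="canonical" in the page's head. As a minimal sketch, the helper below generates that element and a small parser checks which canonical URL a page declares. The names canonical_tag and CanonicalFinder are hypothetical, and the sample URL is a placeholder.

```python
from html import escape
from html.parser import HTMLParser


def canonical_tag(url):
    """Return a <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


class CanonicalFinder(HTMLParser):
    """Collect the href of every rel="canonical" link in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attributes.get("href"))


tag = canonical_tag("https://yourdomain.com/recipes")
print(tag)

finder = CanonicalFinder()
finder.feed("<html><head>" + tag + "</head><body></body></html>")
print(finder.canonicals)
```

A page that declares more than one canonical URL, or none, is worth a closer look.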
2 Use 301 Redirects
If you have restructured your site or changed your URL format, Google recommends that you create 301 redirects from the old URLs to the new ones. This is a strong signal indicating that the URL being redirected to should be considered canonical.
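In practice, redirects are usually configured in your web server or CMS, but the mechanism is simple enough to sketch with Python's standard library. The example below is a toy illustration and not a production setup. The REDIRECTS table, the redirect_for helper, and the paths in it are all hypothetical.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of old paths to their new locations.
REDIRECTS = {
    "/old-recipes": "/recipes",
    "/blog/2019/yoga-intro": "/yoga/introduction",
}


def redirect_for(path):
    """Return the new path for an old one, or None if no redirect applies."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_for(self.path)
        if target is not None:
            # 301 tells clients and crawlers the move is permanent.
            self.send_response(HTTPStatus.MOVED_PERMANENTLY)
            self.send_header("Location", target)
            self.end_headers()
        else:
            # This toy server has no real pages of its own.
            self.send_error(HTTPStatus.NOT_FOUND)


print(redirect_for("/old-recipes"))
```

To try it locally, the handler can be passed to http.server.HTTPServer and served on a port of your choice.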
3 Add the URL to Your Sitemap
Google recommends that you add your preferred URL to your sitemap. If you do not have a sitemap, you should create one. You can refer to this guide on creating and configuring your sitemaps.
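Most sites generate their sitemap with a plugin or CMS feature, but a minimal XML sitemap is easy to produce by hand. The sketch below builds one with Python's standard library using the sitemaps.org namespace. The function name build_sitemap and the URLs are placeholders, and the list should contain only the URLs you want treated as canonical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs: list only your preferred version of each page.
sitemap = build_sitemap([
    "https://yourdomain.com/",
    "https://yourdomain.com/recipes",
])
print(sitemap)
```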
4 Use Consistent Internal Linking
Google recommends that you ensure that your internal linking format remains consistent across your pages. Specifically, you should use your desired linking format.
For example, if you want visitors to access your site using yourdomain.com rather than www.yourdomain.com, then you should use URLs like yourdomain.com/recipes when linking to other content on your site.
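To spot internal links that use the wrong host variant, you can scan a page's HTML for links that point to the www version when you prefer the non-www one, or the reverse. This is a minimal sketch under that assumption. The names LinkCollector and inconsistent_links are hypothetical, and relative links such as /contact are left alone because they inherit whatever host the visitor is already on.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def inconsistent_links(html, preferred_host="yourdomain.com"):
    """Return links that use the www/non-www variant you do not prefer."""
    collector = LinkCollector()
    collector.feed(html)
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    # The variants of your own host other than the preferred one.
    unwanted = {bare, "www." + bare} - {preferred_host}
    return [h for h in collector.hrefs if urlsplit(h).netloc.lower() in unwanted]


page = (
    '<a href="https://yourdomain.com/recipes">Recipes</a>'
    '<a href="https://www.yourdomain.com/about">About</a>'
    '<a href="/contact">Contact</a>'
)
print(inconsistent_links(page))
```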
5 Ensure Other Sites Link Back to You
If you allow another site to republish your content, then Google recommends that you get them to include a backlink pointing to you. However, this may be impossible, particularly in situations where the blog is copying your content without permission.
Google mentioned that it has a reliable system for identifying which site originally published a piece of content, so you typically do not need to worry about this.
However, if you discover that the other content somehow ranks higher than yours in search results, then you should review your robots.txt file to ensure you have not blocked Google from crawling your site.
You should also review your sitemap to ensure that you have not changed the URL that points to the affected webpage. You should also confirm that your site is still available on Google and conforms to Google Search Essentials guidelines.
That said, Google added that you should not block duplicate URLs using your robots.txt file. If you do, Google will be unable to crawl the blocked URLs, cannot tell that they are duplicates, and will ultimately treat them as separate content.
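A quick way to review your robots.txt rules is to test your important URLs against them with Python's urllib.robotparser. The sketch below parses the rules from a string so it runs without network access. The function name blocked_urls, the sample rules, and the URLs are all made up for illustration, and Python's parser may not match Google's own robots.txt handling in every edge case.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Made-up rules that block a printer-friendly section of the site.
robots_txt = """\
User-agent: *
Disallow: /print/
"""

urls = [
    "https://yourdomain.com/recipes",
    "https://yourdomain.com/print/recipes",
]
print(blocked_urls(robots_txt, urls))
```

Any URL this reports is one that a crawler following the rules would not fetch, so it is worth checking whether the block is intentional.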