This video by Greg Grothaus, a Google Search Quality Engineer, was created on August 12, 2009 - so this is current information on the Duplicate Content Issue coming directly from the source.
The video is part of what they call "webmaster outreach" which they use to reach out to the webmaster community to explain how search quality works...
The first issue that Greg Grothaus discusses is the common myth about the Duplicate Content Penalty. He explains how they create a set of results for any given search query, and explains that there is actually no penalty.
They simply determine which of the duplicate pieces of content is most relevant to the actual search query, and omit the others. Pages that are omitted for one search query may very well show up for another query where they are more relevant. Example: a web page, and the print version of that same web page.
Greg says, "We recognize that most duplicate content is not deceptive in origin, so as a result we're not trying to penalize it, we're just trying to show in our search results content that is distinct and offer the searcher a variety of results. This is very much a per query thing." (paraphrased) He recommends that you read the Duplicate Content help file on Google, which states:
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don't follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.
And which also says:
If you find that another site is duplicating your content by scraping (misappropriating and republishing) it, it's unlikely that this will negatively impact your site's ranking in Google search results pages. If you do spot a case that's particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and request removal of the other site from Google's index.
The exception is what they consider spam, and Greg says that this is still not a penalty for Duplicate Content but rather a penalty for spam. This is defined as a web page where someone has intentionally copied content and marked it up for the purpose of manipulating the search results, which they will omit from their index and/or give a much lower ranking. The example given was a case where someone copies the entire exact content from a Wikipedia page, then publishes and optimizes it on a page of their own site.
Greg goes on to explain exactly what Duplicate Content is...
The first is a common problem: multiple URLs that all point to the exact same page or content. Examples would be url.com vs url.com/index.htm, or http://url.com vs http://www.url.com. All 4 of those URL structures pull up the exact same home page. He explains why this is a problem, and again states that there is no penalty associated with this issue.
The real issues in a case like that - again, not penalties - start with the fact that you are diluting your Link Popularity. The solution is to use only one instance of any given URL (link) and create a 301 redirect for any other instances (with www or without, for example).
Greg also says that in these cases, it causes inefficient crawling of your entire website, which could cause some of your new content to be missed.
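To make the 301 idea concrete, here is a minimal Python sketch of how a site might collapse those URL variants into one and decide when to redirect. It is only an illustration: the choice of http://www.url.com/ as the preferred form, the list of index file names, and the function names canonical_url and redirect_for are all assumptions made for this example, not anything shown in the video.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed site policy (hypothetical): http scheme, www host, no index file in the path.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.url.com"
INDEX_FILES = ("index.htm", "index.html")


def canonical_url(url: str) -> str:
    """Map any variant of a page URL to the single preferred form."""
    if "://" not in url:
        # Bare forms such as "url.com/index.htm" get a scheme so they parse cleanly.
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            # Drop the index file name but keep the trailing slash.
            path = path[: -len(name)]
            break
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


def redirect_for(host: str, path: str):
    """Return (301, target) when the requested URL is not the preferred one, else None."""
    requested = f"{PREFERRED_SCHEME}://{host}{path}"
    target = canonical_url(requested)
    return None if requested == target else (301, target)


if __name__ == "__main__":
    for variant in ("url.com", "url.com/index.htm", "http://url.com", "http://www.url.com"):
        print(variant, "->", canonical_url(variant))
    print(redirect_for("url.com", "/index.htm"))
    print(redirect_for("www.url.com", "/"))
```

With this policy all four example URLs resolve to the same address, so links and crawling concentrate on one URL instead of being split across four. On a real site the redirect itself would normally be configured in the web server rather than in application code.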
How to Fix These Common Duplicate Content Issues
The solution is to understand what they call "the canonical". This refers to the simplest and most significant form of your content - or the URL that you want to show for any given page of content. The URL you choose (example: with www or without) is considered your canonical URL.
Once you've picked your canonical URL there are several ways you can let Google know to use this URL, including:
- Link to your web pages consistently
- Use a 301 Redirect for all non-canonical URLs
- Go into Google's Webmaster Tools and specify www vs non-www
- New option: use the rel=canonical HTML tag
Greg goes on to explain how to use the canonical tag, and explains its similarity to the 301 redirect and your option to use either.
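As a rough illustration of the rel=canonical option, the Python sketch below builds the tag for a page and then reads it back out of some HTML, which is a handy way to check that a duplicate (such as a print version) points at the URL you chose. The page URL and the helper names canonical_link_tag and find_canonical are made up for this example. The tag belongs in the head section of the duplicate page.

```python
from html import escape
from html.parser import HTMLParser


def canonical_link_tag(url: str) -> str:
    """Build the <link> element that names the preferred URL for a page."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values; compare case-insensitively.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html_text: str):
    """Return the canonical URL declared in html_text, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


if __name__ == "__main__":
    # Hypothetical print version of a page, pointing back at the main version.
    tag = canonical_link_tag("http://www.url.com/page.htm")
    print_version = f"<html><head>{tag}</head><body>Print version</body></html>"
    print(tag)
    print(find_canonical(print_version))
```

Unlike a 301 redirect, the visitor stays on the page they requested. The tag only tells the search engine which URL you prefer to have shown.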
The last bit of the video covers Multiple Site Issues, such as different URLs for different audiences (by country or language): a .co.uk version of the same .com site, for example, or French and German versions of your site.
The statement made here was "Google thinks Multiple Domains are OK". Your only real concern is diluting Link Reputation. Google will choose the best page for any given query - not necessarily all of your domains.
An example is given of two domains with the same content targeting different countries, an Australian version (.com.au) and a British version (.co.uk) - obviously both in the English language. Google will attempt to serve the correct domain to the appropriate searcher, based on their location.
Greg suggests you help Google out by logging in to Google Webmaster Tools and setting a particular domain for a particular locale. But again, there is no penalty even though both domains contain the exact same content. In fact, they encourage it because users prefer to read content in their own language, and even on country-specific domains that relate specifically to their location.
I hope this helps clear up the "duplicate content scare" and gives you a better idea of what Google wants and expects with regard to your site structure and your use of duplicate web content.
Video shared by Christopher Hooper