Remove Duplicate Content From Your Blogger Blog And How To Handle It

Wednesday, May 16, 2012

Duplicate content means that two or more pages on a site share the same title, the same content, or both, and Google doesn't like it at all. It can come up when a publisher posts an article and later changes the title or content after Google has already indexed it; when the publisher has activated the "Archive" function on the site; when the publisher runs a mobile version of the site; or when anything else changes the URL of the original content, for example from http://j-smith-site.blogspot.com/example.html to http://j-smith-site.blogspot.com/example.html?m=0. In short, a single post is reachable under two or more different URLs on the same site.
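To make the "one post, several URLs" idea concrete, here is a minimal Python sketch that groups URL variants under one base address. It is only an illustration, not a tool from Google or Blogger: it assumes two URLs are duplicates when they differ only in the query string (such as ?m=0), which fits the Blogger example above but won't hold for every site. The function names and the other.html URL are made up.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment so URL variants compare equal."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_duplicates(urls):
    """Map each base URL to its variants, keeping only bases with 2+ variants."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return {base: variants for base, variants in groups.items() if len(variants) > 1}


# Sample list: the first two are the example from the post,
# the third is a hypothetical post with no duplicate.
urls = [
    "http://j-smith-site.blogspot.com/example.html",
    "http://j-smith-site.blogspot.com/example.html?m=0",
    "http://j-smith-site.blogspot.com/other.html",
]
duplicates = group_duplicates(urls)
print(duplicates)
```

Only example.html shows up in the result, because it is the only post in the sample list that is reachable under more than one URL.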

How To Check Whether A Site Has Duplicate Content

Go to Google Webmaster Tools (www.google.com/webmasters) > click on the site you want to check > Optimization > HTML Improvements. There you'll see whether your site has any duplicate content problems.

How To Clean Up And Handle This Problem

1. Remove The Content: Download or browse all the URLs listed on the HTML Improvements page in Google Webmaster Tools. Then open a new browser tab, go to Google Webmaster Tools (www.google.com/webmasters) again > click on the site whose duplicate content you want to remove > Optimization > Remove URLs, click "Create A New Removal Request" and follow the instructions. You'll have to decide whether to remove the content from the cached page only or from both the cached page and search; select "cached page and search". But note: remove the duplicate, not the original content. After that you'll have to wait before it takes effect. The errors listed on your "HTML Improvements" page won't clear right away; it could take a month or longer. For my site it took about a month.

2. Handle The Problem: After you've removed all the duplicate URLs, you need to deal with the cause so the problem doesn't come back. There may be several ways to do this, but I know just two: you can use Webmaster Tools, or you can create a "robots.txt" file. Here we go.

*Using Webmaster Tools To Prevent Google (Strictly Speaking, The Googlebot Crawler) From Indexing Specific Pages On Your Site

Go to Webmaster Tools > click on your site > Configuration > URL Parameters > Add Parameter. You'll find a field where you enter the parameter that should stop Google from indexing specific pages on your site. And here is the problem, or maybe just my problem: I don't really know which parameters to enter. The only one I know is the "m" parameter, which on a Blogger blog covers URLs ending with "?m=1", "?m=0" and the like. So if your Blogger blog has this problem, enter the single letter "m" in the field and select "Yes, changes or reorders" from the dropdown list below it. Then choose "Narrows" from the next dropdown list and pick "No URLs" from the options below it. However, if you have a different problem, I'm sorry, I don't know any other parameter to suggest.
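If you're not sure which parameter names actually show up in your duplicate URLs, you can count them from the list of URLs you got from Webmaster Tools. Here is a rough Python sketch using only the standard library. The function name is made up, and the sample URLs are just the example patterns mentioned in this post, not real data.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_counts(urls):
    """Count in how many URLs each query parameter name appears."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True also picks up bare parameters like "?comments".
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Illustrative URLs built from the patterns in this post.
sample_urls = [
    "http://j-smith-site.blogspot.com/example.html?m=0",
    "http://j-smith-site.blogspot.com/example.html?m=1",
    "http://j-smith-site.blogspot.com/example.html?comments",
    "http://j-smith-site.blogspot.com/example.html",
]
counts = parameter_counts(sample_urls)
for name, total in counts.most_common():
    print(name, total)
```

The names that come out on top are the candidates for the "Add Parameter" field.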

*Using A Robots.txt File To Prevent Google From Indexing Specific Pages On Your Site

Until recently this was a problem, because there was no way to add a robots.txt file to your site, neither from Google Webmaster Tools nor from Blogger itself. That's no longer the case. A few weeks ago, maybe a month, Blogger added some new features, and one of them lets you set your own robots.txt file, which you can use to handle your duplicate content issue. All you have to do is activate the feature and fill it with a few short rules, which in your case keep Google from crawling specific pages/URLs. Go to your Blogger dashboard, open Settings and click "Search Preferences". Then click "Edit" next to "Custom robots.txt" and, for the duplicate content issue, fill in the field with the rules below:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

User-agent: *
Disallow: /*.html
Allow: /*.html$

Sitemap: http://j-smith-site.blogspot.com/feeds/posts/default?orderby=updated

Now click save and you are done.

The Disallow: /*.html and Allow: /*.html$ rules let a crawler that understands the * and $ wildcards (as Googlebot does) fetch only URLs that end in ".html". Any longer URLs, such as .html?m=0 or .html?comments, are blocked.
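To see how these rules play out on actual paths, here is a small Python sketch of the matching logic. It assumes the precedence Google documents for its own crawler (the longest matching pattern wins, and Allow wins a tie), with * matching any run of characters and a trailing $ anchoring the end of the URL. Other crawlers may read the file differently, and the function names are made up for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regular expression."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape everything literally, then turn the escaped "*" back into ".*".
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_allowed(path: str, rules) -> bool:
    """rules is a list of (directive, pattern) pairs for one user-agent."""
    best = None  # (pattern length, is_allow) of the strongest matching rule
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty value matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            # Longer pattern wins; on equal length, Allow (True) beats Disallow.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The "User-agent: *" rules from the robots.txt in this post.
rules = [
    ("Disallow", "/search"),
    ("Allow", "/"),
    ("Disallow", "/*.html"),
    ("Allow", "/*.html$"),
]

for path in ["/", "/example.html", "/example.html?m=0", "/search", "/01_01_2012.html"]:
    print(path, "allowed" if is_allowed(path, rules) else "blocked")
```

Under these assumptions the home page and plain .html post URLs stay crawlable, while the ?m=0 variants and /search are blocked. Archive URLs such as /01_01_2012.html also end in .html, so these rules do not block them.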

Every Blogger blog has a robots.txt file by default, located at "Your Blog Url/robots.txt". Here is mine: John Smith's Blog | Robots.txt

3. Lastly, the feature that will give your posts duplicate content is "Archive". An archive generates a URL similar to the following: http://j-smith-site.blogspot.com/01_01_2012.html. To handle the problem, you can place a robots meta tag in the template's header. Here it is:

<meta content='noarchive' name='robots'/>

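Or the better one, which keeps only the archive pages out of the index (noarchive, by contrast, only asks search engines not to show a cached copy of a page):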

<b:if cond='data:blog.pageType == "archive"'><meta content='noindex' name='robots'/></b:if>

Also include this (additional meta)

<b:if cond='data:blog.pageType == "static_page"'><meta content='noindex' name='robots'/></b:if>

You can also deactivate the archive feature in your Blogger settings. You'll have to find the exact option yourself, because I forgot where it is.
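Once the template is saved, you may want to confirm that a page really carries the robots meta tag. Here is a minimal Python sketch, assuming you already have the page's HTML as a string (how you fetch it is up to you). The class and function names are made up, and the sample HTML is a hand-written snippet, not output from a real blog.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # A content value may hold several comma-separated directives.
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hand-written stand-in for an archive page rendered with the tag above.
sample_html = "<html><head><meta content='noindex' name='robots'/></head><body></body></html>"
found = robots_directives(sample_html)
print("noindex" in found)
```

If "noindex" is missing on an archive page, the conditional tag probably isn't in the part of the template that ends up inside the page's head.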

That's all, guys. With all of the above in place, your posts shouldn't run into duplicate content issues anymore; 99% of them won't. Perhaps..

Updated: Perfect Way To Get Rid Of Duplicate Content Issue

11 comments:

Shuvendu S Sahu, 6/02/2012
How do I add robots.txt and a sitemap to my blog site?
Unknown, 6/04/2012
I told you: just go to your Blogger dashboard, open Settings and click "Search Preferences". Then click "Edit" next to "Custom robots.txt" and insert your rules (robots.txt).

Example robots.txt and sitemap.

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://j-smith-site.blogspot.com/feeds/posts/default?orderby=updated
Unknown, 8/16/2012
Dear sir, I have the same problem with my Blogspot site. When I checked HTML suggestions I found 1056 duplicate tags, for example www.example.blogspot.com
http://example.blogspot.com/html?m=0

All the URLs end with ?m=0,
so I removed all URLs ending with ?m=0 using the Remove URLs tool in Webmaster Tools.
After that I added the m parameter, selected "Yes, changes or reorders" from the dropdown list and chose "No URLs".

But sir, I did not select the "Narrows" option. Is it important that I select this option too?

Unknown, 8/16/2012
@jay kay: That doesn't matter, brother. It's just a kind of reason you give Google.

By the way, I have a better idea for handling the duplicate content issue. Check this post: New Methode To Remove Duplicate contents
Daniel Nicolae, 2/13/2013
Hi
If you modify the robots.txt like you say, I believe that the homepage of your blog will no longer be indexed.
Unknown, 3/13/2013
Hello;

I've been looking for this solution. Thank you for sharing it here. Is it OK to set up only GWMT and not edit robots.txt? Or is it a must to set up both?
Unknown, 3/16/2013
It confuses me. How could it be available in Google search if we made it noindex?