Thursday, August 16, 2012
Duplicate Content Issue: What Is This Issue? Do My Articles Face It? How Do I Handle It? This is just my second post about duplicate content. If you want to know what duplicate content is, whether your content faces it, or anything else about this matter, you may find it in my previous post. Here I just want to give you a better suggestion than what I suggested a while ago. Here is my previous post: Remove Any Duplicate Content.

Well, let's get straight down to business.

#Step 1: Removing Duplicate Content

1. First, find and review all the duplicate content: Go to your Google Webmaster Tools dashboard > click the site you want to clean up > Optimization > HTML Improvements.

2. Removal request: Google Webmaster Tools dashboard > click the site you want to remove the duplicate content from > Optimization > Remove URLs > Create a new removal request, and follow the instructions.

Here you'll have to decide whether to remove the content from the cached page only, or from both the cached page and search results; select "cached page and search". But note: remove the duplicates, not the original content. It may take some time before this has any effect. The error information on your "HTML Improvements" page won't clear up right away; it could take a month or maybe longer.

#Step 2: Handling Duplicate Content

After you remove all the duplicate content, it's time to handle the problem so it won't ever come back. Follow the instructions below.

1. Activate Robots.txt File

Log in to your Blogger dashboard > Settings > Search Preferences > click "Edit" on "Custom robots.txt" and insert the following rules:

User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: Googlebot
Disallow: /*.html
Allow: /*.html$


Now save your preferences.

Note: change it to your own blog URL.

The Disallow: /*.html and Allow: /*.html$ pair tells Google's web crawler to index only URLs that end in .html; any longer variants will not be indexed, such as .html?m=0, .html?comments, and others.
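To see why the pair of rules behaves this way, here is a minimal Python sketch of how robots.txt wildcard patterns match (`*` matches any run of characters, `$` anchors the end of the URL path). The function names and sample paths are mine, for illustration only; Google's real matching logic is more involved:

```python
import re

def rule_to_regex(pattern):
    # Translate a robots.txt path pattern ('*' wildcard, '$' end anchor)
    # into a regular expression.
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.compile(regex)

disallow = rule_to_regex("/*.html")   # blocks anything containing .html
allow = rule_to_regex("/*.html$")     # re-allows paths ENDING in .html

def is_crawled(path):
    # Crawlers resolve Allow/Disallow conflicts in favor of the more
    # specific (longer) rule; here Allow ('/*.html$') is longer, so a
    # path matching it wins even though Disallow also matches it.
    if allow.match(path):
        return True
    return not disallow.match(path)

for path in ["/2012/08/post.html", "/2012/08/post.html?m=0", "/about"]:
    print(path, "->", "crawled" if is_crawled(path) else "blocked")
```

Running this shows that a clean post URL stays crawlable, while the same URL with ?m=0 (or any other query string) is blocked, which is exactly the duplicate the rules are meant to kill.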

Every Blogger blog has a robots.txt file by default; it's located at "". Take a look at mine for a demo: John Smith's Blog | Robots.txt

2. Activate Robots Meta Tag

Log in to your Blogger dashboard > Settings > Search Preferences > click "Edit" on "Custom robots header tags" > enable the custom robots header tags, and set the options by following the instructions below:

There are three sections here: Home page, Archive and search pages, and Default for posts and pages.

*Home page: Check nofollow, noindex, and noarchive.
*Archive and search pages: Check nofollow, noindex, and noarchive.
*Default for posts and pages: Check only nofollow and noarchive.

Now click Save Changes and you're done.
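If you want to double-check that the settings took effect, view your pages' HTML source and look for the robots directives. Depending on how Blogger applies them, they may appear as a robots meta tag in the page or as an X-Robots-Tag HTTP header; here is a small sketch that checks the meta-tag form. The class name and the sample HTML are mine, for illustration:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    # Collects the directives from any <meta name="robots"> tag
    # found while parsing an HTML document.
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives += [d.strip() for d in a.get("content", "").split(",")]

# Hypothetical snippet of an archive page after enabling the header tags:
sample = '<head><meta name="robots" content="noindex, nofollow, noarchive"></head>'
parser = RobotsMetaParser()
parser.feed(sample)
print(parser.directives)  # prints ['noindex', 'nofollow', 'noarchive']
```

Feed it the source of one of your archive pages; if the directives list comes back empty, the setting hasn't propagated (or the directives are being sent as an HTTP header instead).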

Congratulations, dude, your blog is now free from the duplicate content issue. Nothing to worry about. No more.. Just post a comment if you still face any duplicate content issue. I'll be glad to help.. :)

Have a nice day everybody. :)


  1. hello;

    about the header tags, can you tell us why you chose those options? Like the home page: why should we check nofollow, noindex, and noarchive? Does this mean that my home page will not be indexed by the search engine? Then I won't be able to see my home page in Google search? Correct me if I'm wrong.

    1. Yes, you have to choose it. If you're on Blogger, your blog homepage almost certainly has this duplicate content issue. Unless you think that one continuously updated piece of duplicate content is okay; in that case, you may choose to make it indexed.

      And that option won't stop Google from crawling your homepage. Google will still crawl it; the noindex and nofollow tags just tell Google not to include your homepage in its search index or follow its links. That's it..

      Are you new in blogger?

      Have a nice day, my friend.. :)