Guide to Preventing Content Blog Scraping in WordPress

Blogging techniques like RSS feeds were designed to benefit users, letting them aggregate content from multiple sites and read it in one place. Google even built Google Reader, an RSS feed aggregator product.

With misuse of RSS feeds through content-scraping scripts on the rise, many experts suggest that site owners do nothing about scrapers and simply keep producing better content, because dealing with scrapers is too much of a hassle.

Is content scraping bad for your site?

If your site has a much higher domain authority than the duplicate site, Google won’t trust the duplicate. This gives a cushion to high-authority sites like DZONE, Entrepreneur, and others.

Whether you are new to blogging or a professional blogger, these scrapers are incredibly difficult to ignore; no blogger wants a scraped copy to rank above the original content. There have also been instances where Google flagged the genuine site instead of the scraped one.

So search engines might not always be able to tell the original site from the duplicate.

How to deal with content scrapers

Content scraping is, in many ways, digital identity theft. It’s not a pleasant feeling to know that there is a website out in the wild that looks just like yours. It can be used against you or, worse, the copied site may rank higher than yours in search engines, negating all your efforts.

Being a part of many Facebook blogging groups, I see at least 2 such posts every week where someone’s blog has been fully copied.

What can you do against automated content scrapers

Most of these sites employ automated bot scripts to scrape content. There is no filtering or human intervention: whatever you post is posted on their site too. You can use this lack of filtering as a weapon against them.

Ways to take advantage of duplicated sites with scraped content

In this section, we share techniques you can use to siphon off their traffic. There is also a way to tell Google that yours is the original, authoritative site.

1. Internal Linking

If you never paid attention to internal linking, now you have the perfect reason to. If you interlink your articles, the same links are copied to the scraper's site along with the articles. When a visitor on the scraper's site clicks these links, they land on your site. Most websites will tell you that internal linking is important for reducing the bounce rate. But when your site gets scraped, interlinks also give you traffic and a free backlink.

The WordPress Hyperlink Editor lets you search for posts related to a specific keyword. Just highlight the word or phrase you want to link and click “Insert/edit link” in the “Insert” menu. You will see a textbox where you can paste an external link, or type a term and let WordPress search for internal posts to link.

WordPress Hyperlink Editor

Besides manual linking, there are plugins such as Internal Links Generator and a few similar ones.
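
To see whether your internal links actually survive on a copied page, a small script can list the links in that page that still point at your domain. This is only a rough sketch: `links_back_to`, the `example.com` domain, and the sample HTML are all made up for illustration, and you would paste in the real HTML of the scraped page yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html, my_domain):
    """Return the absolute links in `html` whose host is `my_domain` or a subdomain of it."""
    collector = LinkCollector()
    collector.feed(html)
    matches = []
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(href)
    return matches


# Hypothetical copy of a post as it might appear on a scraper's site.
scraped_html = """
<p>Read our <a href="https://example.com/internal-linking/">linking guide</a>
and the <a href="/about/">about page</a>, or visit
<a href="https://other.example.org/">another site</a>.</p>
"""

print(links_back_to(scraped_html, "example.com"))
```

Only the absolute link is reported. A relative link such as `/about/` resolves against the scraper's own domain once the post is copied, so absolute URLs are the ones that bring visitors back to you.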

2. Insert Affiliate Links

Now that we have seen how to steal the impostor’s traffic, why not take it up a notch? Mix a few affiliate links in with your internal links and let the visitors of scraper sites buy your products. It’s like another channel for your affiliate sales. The SEO Smart Links plugin helps you automatically insert affiliate links on chosen keywords.

3. RSS Footer Magic

Most bloggers might not know this, but the Yoast SEO plugin (free version) also has some secondary uses, such as adding a source credit to the footer of each post in your RSS feed. By putting a backlink in that footer, you signal to Google's crawler that the content first appeared on your website when it crawls the scraped site.

To enable this, install the Yoast SEO plugin and go to the SEO > Dashboard screen. Enable the “Advanced settings pages” option if it is currently disabled.

Now go to the SEO > Advanced screen and open the RSS tab. Scroll down and you will see that Yoast has already inserted some text into your RSS Feed Footer. You can customize this text using the variables listed at the end of this screen.

RSS Feed Footer
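
Once the footer is set, you can check that every item in your feed really carries a link back to your site. The sketch below is one rough way to do that, assuming a standard RSS 2.0 feed with the post body in `content:encoded` or `description`; the function name `items_missing_backlink`, the `example.com` URLs, and the sample feed are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the RSS "content" module for <content:encoded>.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def items_missing_backlink(rss_xml, site_url):
    """Return the titles of feed items whose body never mentions `site_url`."""
    root = ET.fromstring(rss_xml)
    missing = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        body = (item.findtext(CONTENT_ENCODED) or "") + (item.findtext("description") or "")
        if site_url not in body:
            missing.append(title)
    return missing


# Made-up feed: the first item has a credit line, the second does not.
sample_feed = """
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <content:encoded><![CDATA[<p>Body text.</p>
<p>The post <a href="https://example.com/first-post/">First post</a> appeared first on
<a href="https://example.com">Example Blog</a>.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <content:encoded><![CDATA[<p>Body text without any credit line.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
""".strip()

print(items_missing_backlink(sample_feed, "https://example.com"))
```

To run it against your own site, save your feed to a file and pass its contents in place of `sample_feed`.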

Ways to block content scraping completely

Alternatively, you can block scraping to a good extent. In this section, we cover certain tweaks that help prevent content scraping. However, these techniques have their own drawbacks.

1. Convert Your RSS feed into a summary feed

This step is effectively like closing the door to avoid creepy stalkers. Since the scrapers rely on your RSS feed, you can cut what they receive from the full text of each post down to just a summary.

To do so, go to Settings and choose the Reading option in your WordPress backend. Scroll down and you will see the setting below:

Full text to summary

Change the selected option from Full text to Summary. Keep in mind that this also disables full-text RSS feeds for your genuine readers.
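
After changing the setting, you may want to confirm that the feed no longer exposes full posts. The sketch below assumes that a full-text feed puts the post body in a `content:encoded` element and that a summary feed leaves it out; that matches how WordPress feeds usually look, but verify it against your own feed. The function name and sample feed are made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by the RSS "content" module for <content:encoded>.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def classify_feed_items(rss_xml):
    """Map each item title to 'full text' or 'summary'."""
    root = ET.fromstring(rss_xml)
    result = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        full_body = (item.findtext(CONTENT_ENCODED) or "").strip()
        # Assumption: a non-empty <content:encoded> means the full post is exposed.
        result[title] = "full text" if full_body else "summary"
    return result


# Made-up feed with one full-text item and one summary-only item.
sample_feed = """
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Full post</title>
      <description>A short excerpt of the post.</description>
      <content:encoded><![CDATA[<p>The entire article text.</p>]]></content:encoded>
    </item>
    <item>
      <title>Short post</title>
      <description>A short excerpt of the post.</description>
    </item>
  </channel>
</rss>
""".strip()

print(classify_feed_items(sample_feed))
```

If every item comes back as `summary`, a scraper reading the feed only gets excerpts.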

2. Disable Trackbacks on your site

Trackbacks are simply a way for other blogs to notify you that they have linked to your website. Scrapers use trackbacks to try to get a backlink from your site.

To turn trackbacks off on existing posts:

  • Go to the Posts > All Posts screen.
  • Expand the Screen Options panel in the top right corner of the screen. Set Pagination to 999 items per page. Click Apply.
  • Once you see all posts, select them all at once by ticking the checkbox next to the “Title” header.
  • From the Bulk Actions drop-down box, select Edit and click the Apply button.
  • In the Pings drop-down, select Do Not Allow and click the Update button.

Repeat this procedure in case you have more than 999 posts. This disables trackbacks and pingbacks on all existing posts.
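
If you would rather script this than click through the admin screens, the WordPress REST API exposes a `ping_status` field on posts that accepts `open` or `closed`. The sketch below only builds the request for one post and does not send it. It assumes the REST API is enabled on your site and that you authenticate with an Application Password (available since WordPress 5.6); the site URL, post ID, username, and password are placeholders.

```python
import base64
import json
import urllib.request


def build_close_pings_request(site_url, post_id, username, app_password):
    """Build (but do not send) a REST request that closes pings on one post."""
    endpoint = f"{site_url.rstrip('/')}/wp-json/wp/v2/posts/{post_id}"
    credentials = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    payload = json.dumps({"ping_status": "closed"}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with your own site and credentials.
req = build_close_pings_request("https://example.com", 42, "editor", "abcd efgh ijkl mnop")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Looping over your post IDs and sending one request each would have the same effect as the bulk edit above, without the 999-post limit per pass.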

To disable trackbacks on all future posts:

  • Go to Settings and choose the Discussion option in your WordPress backend.
  • In the Default article settings, uncheck the option that allows link notifications on new articles.

Default article settings

3. File a DMCA violation complaint

The first step toward resolution is to contact the webmaster through the site's Contact Us form and notify them that they have violated the DMCA (Digital Millennium Copyright Act). You can find material on the DMCA Wikipedia page to include in your message. Ask them to take all the content down immediately.

If they don’t have a Contact Us form, or if they ignore your request and the scraping continues, head to whois.com and look up the domain of the site that has copied your website.

Locate the name of the registrar and google “[Registrar Name] DMCA complaint” to find their DMCA complaint page. You can also contact customer support at the company hosting the scraper's site. A few third-party services, such as DMCA.com, offer paid takedown services.
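
If you save the WHOIS result as text, a few lines of script can pull out the registrar details you need for the complaint. This is a minimal sketch that assumes the common `Key: value` layout used for many domains; WHOIS formats vary, so some records will not match. The function name and the sample record are made up.

```python
def extract_registrar_info(whois_text):
    """Pick the registrar name and abuse contacts out of WHOIS text."""
    wanted = (
        "Registrar",
        "Registrar Abuse Contact Email",
        "Registrar Abuse Contact Phone",
    )
    info = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Keep the first non-empty value seen for each wanted field.
        if sep and key in wanted and value and key not in info:
            info[key] = value
    return info


# Made-up WHOIS record for illustration only.
sample_whois = """
Domain Name: SCRAPER-EXAMPLE.COM
Registrar URL: http://www.example-registrar.test
Registrar: Example Registrar, Inc.
Registrar Abuse Contact Email: abuse@example-registrar.test
Registrar Abuse Contact Phone: +1.5555550100
Name Server: NS1.EXAMPLE-HOST.TEST
"""

for field, value in extract_registrar_info(sample_whois).items():
    print(f"{field}: {value}")
```

The abuse contact email, when present, is usually the most direct address for a takedown request.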

4. Block them completely

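A DNS lookup on the scraper (impostor) site's domain will give you its IP address. Take this IP address and add a “Deny from” directive to the “.htaccess” file in your root folder through your cPanel File Manager. Note that this only helps when the scraping bot connects from that same IP address.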

It will look something like this (replace the IP with your impostor’s IP):

Deny from 203.0.113.45
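
If you have several addresses to block, a short script can validate them and print the lines to paste into “.htaccess”. This is a sketch under a few assumptions: the function name is made up, `203.0.113.45` and `198.51.100.7` are documentation-only addresses, and the optional second form uses the `Require not ip` syntax of Apache 2.4, where “Deny from” is kept only for compatibility. Check which form your host supports.

```python
import ipaddress


def build_block_rules(ip_strings, apache_24=False):
    """Validate each address and return .htaccess lines that block it.

    Raises ValueError if any entry is not a valid IP address.
    """
    ips = [ipaddress.ip_address(raw.strip()) for raw in ip_strings]
    if not apache_24:
        # Older "Deny from" syntax, as used in this article.
        return [f"Deny from {ip}" for ip in ips]
    # Apache 2.4 syntax: allow everyone except the listed addresses.
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {ip}" for ip in ips]
    lines.append("</RequireAll>")
    return lines


# Documentation-only addresses; replace them with the impostor's real IPs.
print("\n".join(build_block_rules(["203.0.113.45", "198.51.100.7"])))
print("\n".join(build_block_rules(["203.0.113.45"], apache_24=True)))
```

Validating first matters because a mistyped address in “.htaccess” can cause a server error for every visitor.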

Preventing Manual Scrapers

Some scrapers carefully copy content by hand and post it on their duplicate site. After setting up a blog, they use website builders to copy the design of your site as well.

If you want to prevent all visitors from copying your text, the WP Content Copy Protector plugin can do this job for you.

Summary of Content Scraping Prevention

Content scraping is a menace no blogger likes to face. After all, your site is a product of your sweat and blood; it takes years of effort to perfect a website's design and content. Using the methods described in this post, you can prevent scraping and even take advantage of those scumbag scrapers.

Do you know of any site whose content was duplicated through scraping? If so, how did the owners stop it?

Written by:

Catherrine Garcia is a passionate blogger and a freelance web developer. She and her group of freelance developers are experts at creating websites on CMS platforms.
