12 Ways to Prevent Your Blog Posts from Being Stolen
Content theft is a surprisingly common problem with content marketing. For every legitimate content marketer out there, there are dozens of spammers who would love to just steal your content, pack it with affiliate links or spam ads, pull in a few dollars off it until their site is reported, and repeat with a new target the next month.
Now, having your content stolen isn't always a bad thing. It's never a good thing, but a lot of the time, it won't actively harm your site. On rare occasions, you can even get a minor SEO boost, though that's exceedingly uncommon.
There are a few ways you can minimize blog post theft and a few that people recommend but that you should avoid. Let's talk about it!
30 Second Summary
You can spot stolen content using Google Alerts for unique phrases, plagiarism detectors like Copysentry, internal links that show up as backlinks and reverse image searches. If you find theft, you should first email the site owner to remove it. If that fails, you can file a DMCA notice with their web host or Google. To prevent future theft, you can show only RSS summaries, add Cloudflare protection, delay your RSS feed, watermark images and block scraper bot IPs.
Identifying Content Theft
First up, let's talk about identifying when your content has been stolen. See, 99% of the time, when your content is stolen, the person stealing your content is not going to be ranking on Google. It might not be in the first ten pages, and it will almost definitely not outrank your copy. Google scrapes websites consistently, and they also take the website's trust into consideration. If your website has no history of stealing content, and the person stealing your content has been stealing content for years, it's not going to be very hard for them to determine who is the original creator.
It's only if a big-name site steals your content that ranking becomes an issue. Most of the time, it's simply content stolen for PBN (private blog network) usage or other spam. Since Google is so effective at filtering out spam, these sites are difficult to find in the search results!
With that, these first four tips are to help you find when your content is being stolen so that you can take action to do something about it.
1. Set Up Google Alerts
Google Alerts is a service Google offers that takes advantage of its crawling and indexing power. You set up alerts about names, keywords, or key phrases, and when Google detects a new post using that keyword or name, they will alert you that the post exists. You can use this for a lot of different purposes, including inspiration, following the news, and watching for brand mentions, but you can also use it for content theft monitoring.
What you need to do is identify a unique phrase or sentence you use in your content. You can either seed a particular phrase into all of your content – something like "we here at BrandName" that other companies wouldn't use – or you can pick a unique sentence from each new post you write. Create an alert, and let it run. When Google finds new content that matches the alert, you'll get a notification, and can check to see if it's stolen content.
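If you go the sentence-per-post route, picking the sentence can be scripted. The sketch below is only one possible heuristic, not a Google Alerts feature: it assumes that a longer sentence is less likely to appear elsewhere by coincidence, and the function name `pick_alert_phrase` and its word limits are made up for illustration.

```python
import re

def pick_alert_phrase(post_text, min_words=8, max_words=20):
    """Return a quoted, exact-match phrase taken from the post text.

    Heuristic (an assumption, not a rule): the longest sentence within
    the word limits is the least likely to show up elsewhere by chance.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", post_text.strip())
    candidates = [s.strip() for s in sentences
                  if min_words <= len(s.split()) <= max_words]
    if not candidates:
        raise ValueError("no sentence within the word limits")
    best = max(candidates, key=len)
    # Drop trailing punctuation and inner double quotes so the phrase
    # can be wrapped in quotes for an exact-match alert.
    phrase = best.rstrip(".!?").replace('"', "")
    return f'"{phrase}"'
```

Paste the returned string, quotation marks included, into the alert's search box so that only exact matches trigger a notification.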
2. Use a Plagiarism Detector
There are a handful of tools out there that look for instances of plagiarism. They're usually meant to check writing you buy or commission to make sure it's not stolen, but you can use them proactively as well.
There are several options to choose from.
- Copysentry, by Copyscape, is the best-known tool. They scan and monitor the web and look for instances of duplicate content. They also have tools to help you deal with any stolen content results that they find.
- On that note, you can use Copyscape by itself without a subscription to test individual pages one at a time.
- Grammarly is generally used to check spelling and grammar errors in your text, but it will also run a scan to check for plagiarism. You can run your old content through the tool and see what copies come up.
- Plagium is a free or paid tool with varying levels of searching for copies, and it can scan potential content theft inside PDFs and other files, not just web-indexed content.
You can also just run a Google search for a unique phrase from your posts, wrapped in quotation marks so that Google looks for the exact wording. It's more manual, but it works well enough.
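Once a search or a tool turns up a suspicious page, it helps to put a number on how much of your post it reuses. The sketch below is a generic word-shingle comparison (Jaccard similarity). It is not how Copyscape, Grammarly, or Plagium work internally, which isn't documented here, and the function names and the five-word window are assumptions.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word windows of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(original, suspect, size=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(original, size), shingles(suspect, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 suggests a straight copy, while a lightly edited copy will land somewhere in between. Where to draw the line is a judgment call.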
3. Use Internal Links
Linking from one blog post to another is generally good SEO practice. It helps keep users on your site (when they click from one post to another). It helps Google index every page on your site. Internal links aren't really a link juice powerhouse or anything, but they're a useful tool.
So how does this help with content theft? Well, a lot of content thieves simply take the content from another site without editing it. I've even seen people who steal an entire site, design and all. It's happened to me before, on one of my older sites.
When you have internal links in the content, those links become external backlinks when someone steals your content and posts it as-is on another domain. You can see those links using a backlink monitoring tool, or even just using WordPress's default trackback/pingback system, or using Google Analytics and looking at new referrers. This means that when a WordPress site steals your content (and most "autoblogs" that steal content are built on WordPress), you'll be notified as soon as it happens by WordPress.
Whenever you see a new referring domain, check to see if it's spam or stolen content, and take appropriate action as necessary.
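Checking a suspicious page by hand gets tedious, so here is a minimal sketch of the check. It assumes you have already downloaded the page's HTML by whatever means you prefer. `LinkCollector` and `links_to_domain` are names invented for this example, and `example.com` in the usage stands in for your own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def links_to_domain(page_html, my_domain):
    """Return the links in page_html that point at my_domain or a subdomain."""
    parser = LinkCollector()
    parser.feed(page_html)
    matches = []
    for href in parser.hrefs:
        host = urlparse(href).hostname or ""  # relative links have no host
        if host == my_domain or host.endswith("." + my_domain):
            matches.append(href)
    return matches
```

If a page on someone else's domain contains several of your internal links in the same order as your post, that is a strong hint the post was copied as-is.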
4. Use Reverse Image Search
Your blog posts aren't the only content that can be stolen, and indeed, another piece of content is likely to be stolen much more often: your images. A lot of people seem to think that anything they find on Google Image Search is fair game to use when many images on there are not. So here's what you do.
First, determine if your images are yours. If you bought a stock photo license or if you've been using Creative Commons images, ignore them. Other people can license or use those same images, so searching for them won't do you any good. The same goes for screenshots; anyone can take a screenshot of the same thing, so unless it's something proprietary like your internal analytics, it's not something you can pursue.
If you license assets through a service like Canva, keep in mind that other people can use those assets too, so while the exact composition is yours, other people can make similar images and not violate your copyright.
On the other hand, if you pay for unique graphic design, take your own photos, or otherwise produce unique images, those are your copyright and you can defend them. Use Google's reverse image search or a service like TinEye to see if other people have used your images.
Dealing with Current Theft
If you have identified blog posts, content, site design, or images that have been stolen and are definitely your copyright, you can take action to get the stolen content removed.
1. Ask the Webmaster to Stop
In some cases, the stolen content wasn't willfully stolen. For example, I've seen instances where a blogger hires a freelancer for cheap to produce a piece of content for them. The freelancer "produces" a piece of content and the blogger publishes it, without ever checking to see if the content is original. Turns out, that was your content! Now, this other blogger has unwittingly published your content. This is actually the worst-case scenario, because that blogger might actually out-rank you for your own content, and that's bad.
In these cases, you can often just send an email to the site owner and notify them of the theft. As long as you can prove it's your content (generally by linking your own version, though you may have to prove publication dates as well) the blogger will likely be apologetic and remove it.
If they aren't, or if the content was stolen willfully, or if they simply don't respond, you can move on to playing hardball.
2. Issue a DMCA Takedown Notice
Hardball, in this case, is a takedown. There are a few ways you can go about this, in increasing severity.
First, issue a takedown notice to the site owner. You can find numerous guides on how to draft a DMCA notice online, but the main thing is it's basically just a legal threat. Either they take down the content in compliance with your notice, or you can pursue legal action.
If the site owner themselves doesn't remove the content, use a service like Who Is Hosting This to identify the web host of the offending site. Send them the notice – they likely have their own DMCA process and form you can fill out – and they should remove the offending content.
If the web host doesn't act (and there are some shady hosts that largely ignore legal threats), you can file the DMCA notice with Google, which has its own process for removing content from its services. You can do the same with Bing. If the content isn't indexed by the search engines, the spammer is likely not going to keep it up much longer.
Preventing Future Theft
All of the above is about finding and dealing with current theft, but what about preventing theft in the future? There are some steps you can take to prevent future content theft.
1. Make Your RSS Display Summaries
Most blogs have RSS feeds built into them. Many bloggers don't even know they have an RSS feed until they check. That's what a lot of content thieves prey upon; they use a bot to scrape the RSS feed, which is by default sharing the full text of the blog post in an easy to scrape format.
It's easy enough to change this to a summary mode, which will only show either your meta description or the first paragraph or so of the post. In WordPress, all you have to do is go to your admin console, open the Reading section of the settings, find the option for what each post in a feed should include, and change "Full text" to "Summary" (labeled "Excerpt" in newer WordPress versions). For other blog platforms, you may have to take different steps or use a third-party RSS management tool, but it's still going to be pretty easy.
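To confirm the change took effect, you can inspect the feed itself. The sketch below rests on one assumption: that a full-text RSS 2.0 feed carries each post body in a `content:encoded` element, as WordPress feeds commonly do, while a summary feed leaves it out. Fetching the feed is omitted here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module that defines <content:encoded>.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

def feed_exposes_full_text(rss_xml):
    """Return True if any <item> carries a non-empty <content:encoded>."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        encoded = item.find(f"{{{CONTENT_NS}}}encoded")
        if encoded is not None and (encoded.text or "").strip():
            return True
    return False
```

If this still returns True after you switch to summaries, a caching layer or a feed plugin may be serving the old feed.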
2. Use Cloudflare's Content Protection
Using third-party tools can work pretty well to help prevent content scraping bots, though very little can fully prevent a content scraper from doing it manually. Cloudflare, for example, offers "content scraping protection" as part of all of their plans, including their free plan. You can talk to them about setting up this protection, as well as the DDoS protection and other benefits that Cloudflare can bring to the table.
Their Scrape Shield setting itself doesn't actually prevent bots from stealing your content, but it does have a few extra features like email and hotlink protection. The standard Cloudflare firewall settings do most of the heavy lifting of preventing bot requests from ever hitting your servers.
There are, of course, other tools you can use to do the same thing. Radware lets you control bots, for example, though it's less automatic. If you don't like Cloudflare, you can always find an alternative that works well for you.
3. Use a Feed Delay
I mentioned up above that most of the time, scraped content isn't a big deal. There are two reasons for this. First, 99% of the time, the site that's stealing your content will never out-rank you, so you don't have to worry about it splitting your audience. Second, though, Google is very good at catching scraped content these days.
So how do they determine which content is the original and which is scraped? Primarily, they simply look at when they discovered it. If they find your content today and a scraped copy next week, chances are they'll trust your content more. Now, other factors do go into consideration here, like the relative quality levels of the sites and so forth, but generally, Google can identify when content is stolen versus syndicated versus backdated or whatever.
So, simply add a delay to when a bot can scrape your content. Keeping the RSS method of scraping in mind, you can set a delay on your RSS feed so that posts only show up in it some time after they're published. You can add the following to the functions.php file in your theme. As written, it waits 10 minutes; change $wait and $device to hold posts back for hours or a full day instead:
function publish_later_on_feed($where) {
    global $wpdb;

    if ( is_feed() ) {
        // timestamp in WP-format
        $now = gmdate('Y-m-d H:i:s');

        // value for wait; + device
        $wait = '10'; // integer

        // http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_timestampdiff
        $device = 'MINUTE'; // MINUTE, HOUR, DAY, WEEK, MONTH, YEAR

        // add SQL-syntax to default $where
        $where .= " AND TIMESTAMPDIFF($device, $wpdb->posts.post_date_gmt, '$now') > $wait ";
    }

    return $where;
}

add_filter('posts_where', 'publish_later_on_feed');
This gives Google time to index your content before the scrapers get to it.
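If you're not on WordPress, the same idea carries over to any platform where you control feed generation. The sketch below loosely mirrors the filter logic in plain Python. The `(title, published)` tuple layout and the function name are assumptions made for illustration, and the comparison is exact rather than rounded down to whole minutes the way the SQL version is.

```python
from datetime import datetime, timedelta, timezone

def visible_in_feed(posts, now, wait=timedelta(minutes=10)):
    """Return titles of posts published more than `wait` before `now`.

    posts: iterable of (title, published) tuples, where published is a
    timezone-aware datetime.
    """
    return [title for title, published in posts if now - published > wait]

# Usage: a post from an hour ago is listed, one from five minutes ago is not.
now = datetime(2021, 1, 13, 12, 0, tzinfo=timezone.utc)
posts = [
    ("old post", now - timedelta(hours=1)),
    ("fresh post", now - timedelta(minutes=5)),
]
feed_titles = visible_in_feed(posts, now)
```

A longer delay gives search engines more time to index the original, at the cost of subscribers seeing new posts later.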
4. Watermark Your Blog Images
Blog images are stolen far more often than blog content, so go ahead and watermark them. Watermarks come in a variety of forms, from barely-visible patterns to clearly visible logos to artist elements added to the designs to digital watermarks.
Watermarking doesn't specifically prevent your images from being stolen, but it makes it more obvious when they are. You can point to a watermark to prove that it's yours, and people who are aware that they're stealing may have to put the work in to remove the watermark. Since they don't want to have to do that, they'll be more likely to leave your images alone and look elsewhere for their theft needs.
5. Add a Copyright Notice
While it might not seem like adding a copyright notice to your website would stop scrapers, it works for some of them. There are some people out there who seem to believe that if there's no copyright notice, the content is fair game to take. That's 100% not true – once you publish something original, you own the copyright to it – but since copyright is a huge and complex topic, I can understand the confusion.
Adding a copyright notice to your site is easy enough. For WordPress, all you need to do is add a block of text to your footer that says something like this:
Copyright © 2020 SiteName, All Rights Reserved.
You can manually edit this once a year, or you can use <?php echo date('Y'); ?> to automatically pull the current year. You can also use "1999-2020" or whatever the date you founded your site is to make sure everyone knows that you've had the copyright the whole time. It doesn't really matter how exactly you phrase it, as long as you're stating that your website is copyrighted in a place that is visible to your users.
6. Block Scraper Bot IPs
Once you recognize that your content is being stolen, you can look for the IP addresses of the bots that are stealing and scraping the content. Bots have to access your site, which means they have IP addresses, and you can block those. You can ask bots to stay away through robots.txt, which addresses them by user agent rather than by IP, though shady bots might just ignore robots.txt directives. You can block them forcefully through .htaccess edits, or you can use a plugin to help you do it.
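To block an IP in your .htaccess, find the .htaccess file in the root directory of your website. Keep in mind that this is the older Apache 2.2 syntax, so on Apache 2.4 it only works if the mod_access_compat module is enabled. Add this line to the file (remember to replace the example IP with the scraper's IP address):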
Deny from 123.123.123.123
You should be extremely cautious here; don't block IP addresses that are too broad, that are associated with ISPs, or that are associated with good bots like Google or Bing. Blocking those can have devastating effects on your search traffic.
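Before you block anything, you need a list of candidates and a guard against blocking the wrong ones. The sketch below assumes an access log where each line starts with the client IP, as in the common Apache log format. The threshold and the `NEVER_BLOCK` list are placeholders: 192.0.2.0/24 is a documentation range standing in for whatever crawler or ISP ranges you have verified yourself.

```python
import ipaddress
from collections import Counter

# Hypothetical allowlist of networks that must never be blocked.
# Replace the documentation range with ranges you have verified.
NEVER_BLOCK = [ipaddress.ip_network("192.0.2.0/24")]

def deny_lines(log_lines, threshold=100, never_block=NEVER_BLOCK):
    """Return 'Deny from <ip>' lines for IPs with at least `threshold` hits."""
    hits = Counter()
    for line in log_lines:
        parts = line.split()
        if not parts:
            continue
        try:
            ip = ipaddress.ip_address(parts[0])  # first field is the client IP
        except ValueError:
            continue  # skip lines that don't start with a valid IP
        hits[ip] += 1

    rules = []
    for ip, count in hits.most_common():
        if count < threshold:
            continue
        if any(ip in net for net in never_block):
            continue  # allowlisted, never emit a rule for it
        rules.append(f"Deny from {ip}")
    return rules
```

Treat the output as a list of suggestions to review by hand rather than something to paste into .htaccess blindly, since a high request count alone doesn't prove that an IP belongs to a scraper.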
What Not to Do
In the process of researching this topic, you may have come across one or two recommendations for things to do that can stop manual scrapers. In fact, most of what I've written above is meant to stop automatic scraper bots, not people just copying and pasting your content or saving your images. There's a reason for that: stopping manual copying is nearly impossible.
If you want to stop someone from copying and pasting your content, you can disable right-clicks or disable text highlighting. There are plugins and scripts to do that. I highly recommend not doing that, however.
Why? Three reasons. First, it's hugely disruptive to normal users. Some people highlight as they read to mark their place. Some people want to copy and paste a snippet to save a quote or share a snippet with a friend. Some people use their right-click menu for other purposes. It's a huge usability issue.
Second, it's a huge hit to your social media. People love quoting and sharing posts they read, but if you disable the ability to copy a line to share, they're never going to do it. You lose out on all of that benefit.
Third, and most importantly, it doesn't work. Disabling right-click or disabling highlighting text has to be done with a script, and it's trivially easy to block scripts from the client side. There are even browser extensions that can do it for you automatically.
Anyone who cares enough to copy your content can do so, and blocking right-clicks will stall them for, at most, 30 seconds. Heck, they don't even have to block the script; they can just press Ctrl+S on their keyboard (or ⌘+S for us macOS folks) and save the entire webpage to the desktop, along with the images you're trying to protect. If there's content on your site, people can steal it, and scripts will only slow them down and hurt user experience.
Stick with more proactive ways of blocking scraping, and deal with scraping aggressively when it occurs. That's all you really need to do.
January 13, 2021
I have someone who I think is copying my articles, (well can't say that she is copying it but some parts of my articles were partially copied and she's just making some edits) is there anything I can do about it?
January 14, 2021
Hey Catherine!
Sorry to hear your content is being stolen. I get asked this a lot by my clients, and it's way more common than you think.
The first thing you should do before you do anything is to check what time your blog post was published. Then, check what time her post was published.
Also, check with Google and the Wayback Machine to see when each post was first discovered.
If your post was indexed on Google first, it's generally not an issue from an SEO standpoint.
It only becomes an issue if they're using an auto-blog plugin and grabbing it via RSS. In rare cases, Google will see the thief's version first and assume you're the one who stole their article. However, Google is very effective at determining who is the original author of an article, so this is unlikely to ever be the case.
To answer your question, the most realistic solution is to simply email the person who plagiarized your content.
If they aren't cooperative, you could send a Cease & Desist letter, and the next step after that is sending a copyright complaint to their hosting provider.
A simple email solves 99% of these issues though - try that first.
July 27, 2022
This is the first time it happened to me and I am lost. I'll try emailing them first and see how they respond.
July 29, 2022
Hey Bo, sorry that's happening to you. Good luck, starting by contacting the webmaster is always the right move. Then go from there if they refuse 🙂
September 24, 2024
Did email work for you?
October 01, 2024
Hey Lula!
Honestly email has been pretty hit or miss for me. Sometimes you get a quick response and action but other times not so much.
I usually like to use Google Alerts to track down stolen content. It's been pretty helpful for me.
Hope this helps! Do you have a blog too?
February 19, 2021
Hey James. Does the weekly protection of Copysentry include all the blog posts in my blog, or is there a limited number?
February 26, 2021
Hey Lara!
They charge you per page, starting at $0.25 per page.
If you ask me, I don't think plagiarism is something that should warrant a monthly recurring subscription. It's inevitable, and eventually, you get used to people stealing your blog posts. Google understands which blog post was the original creator of the content, and chances are pretty good that the site that stole your content has a history of stealing other people's content too. Their reputation is going to be hurt, not yours.
I think it can be valuable if you want to do a bulk scan of your entire site to check for plagiarism, and just sign up for 1 month so they'll do all of your pages in one go.
I hope this helps!
March 30, 2021
TinEye is very helpful for me. I was able to catch someone using my own photos. I don't actually understand why people need to steal others' work. Glad there are apps now that can help you determine if your property is being stolen.
March 31, 2021
Hey Kristin, happy it helped you! I've had better success with the Google Images reverse search but TinEye can catch some of these as well.
April 03, 2021
How do I protect myself when my content is being stolen word for word from my blog on my Facebook page? Everything I have read seems not to be relevant for my situation, and I can't find anything that applies. It is super frustrating and feels massively violating. Thanks for your time.
April 07, 2021
Thanks for your comment!
Have you tried reporting the post?
You click the three dots in the upper right-hand corner and click "Find support or report post". Explain the situation to them - hopefully, they'll do something about it.
You could also have some friends report it a couple of weeks later as well if they haven't acted on it.
April 14, 2021
Thanks for this. I'll try the free plan of Cloudflare first, then see if it needs to be upgraded to the paid version.
April 14, 2021
Hi June!
You should be just fine on the free plan.
It may not stop your content from being stolen altogether, though.
Usually disabling your RSS feed is the most effective solution, provided that you aren't using it for other purposes.
September 12, 2024
Thanks so much for the tip!
You know, I might just check if disabling the RSS feed is an option. Have you tried that yourself?
September 12, 2024
Hey Lillie!
Glad you found the tip helpful! You might find that disabling the RSS feed can surely help. I've tried it, and it worked well to reduce content theft.
Another trick is to use a plugin that prevents right-click copying. Have you thought about watermarking your images too? That could add an extra layer of protection.
Keep those great ideas coming!
Need more help? Just ask!