WordPress.com Blogs Feeds Scraped

In the most recent WordPress Wednesday News on the Blog Herald, I reported on the mass copyright violations and scrapings of WordPress.com blogs, as reported by Letters Home to You in “Has Anyone Stolen Your Writing Lately?” and “Please Help Me Get Google to Pull Their Ads from a Blogging Thief”.

There is also a WordPress Support Forums discussion on the subject, though it is now closed.

This is not the first time that WordPress.com blogs have been the target of scrapers. Last year, many of us fought against the giant blog scraper and splog, Bitacle, which unfortunately continues to scrape WordPress and WordPress.com blogs. Still, a united voice can carry a lot of weight. And almost two million bloggers on WordPress.com make for a tonnage of voices.

A recent article on CNet, ZDNet, and the New York Times, Please Don’t Steal This Web Content (print version), describes the type of copyright violation – yes, featuring me:

VanFossen isn’t referring to the kind of plagiarism in which a lazy college student copies sections of a book or another paper. This is automated digital plagiarism in which software bots can copy thousands of blog posts per hour and publish them verbatim onto Web sites on which contextual ads next to them can generate money for the site owner.

Such Web sites are known among Web publishers as “scraper sites” because they effectively scrape the content off blogs, usually through RSS (Really Simple Syndication) and other feeds on which those blogs are sent.

WordPress.com bloggers cannot put ads on their blogs. But scrapers are taking your WordPress.com blog content and making money with your words.

Like that?

I don’t.

I’ve written about what to do when someone steals your blog content, but here is the Reader’s Digest version of how to respond if your WordPress.com blog content is being used in violation of your copyright policy.

Report to WordPress.com: To report copyright violations on this massive scale to WordPress.com, you can use their Feedback link from within your Administration Panels, or the WordPress.com Support. Do not report isolated cases to them as there is nothing they can do about it unless the offending blog is on WordPress.com.

Report to Google: To report copyright violations, splogs (spam blogs), and scrapers to Google, you can submit the information through their form or the Webmaster Tools Report Spam Form.

At WordCamp, Google blogger Matt Cutts explained that if enough reports on a specific site came in through the online form, it would rise to the top of the list and Google would take action.

It’s time to make these scrapers’ names rise to the top of the list.
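
Before reporting a site, it can help to confirm that a suspect page really carries your post verbatim. Here is a minimal sketch of one way to check, assuming you have already saved the text of your post and the text of the suspect page as plain strings. The function names `shingles` and `copied_fraction` and the eight-word window are my own illustrative choices, not part of any WordPress or Google tool.

```python
import re


def shingles(text, size=8):
    """Return the set of overlapping runs of `size` consecutive words, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=8):
    """Return the share of the original's word runs that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        # The original is shorter than one window, so there is nothing to compare.
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A result close to 1.0 suggests the page reproduces the post nearly word for word, while a result close to 0.0 suggests little or no verbatim overlap. Where to draw the line between quoting and scraping is a judgment call this sketch does not make for you.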


Copyright Lorelle VanFossen, member of the 9Rules Network, and author of Blogging Tips, What Bloggers Won't Tell You About Blogging.

24 Comments

  1. Posted August 9, 2007 at 5:12 pm | Permalink

    Is there a comparable reporting for WordPress.com content scraped for WordPress.org blogs as there is for WP.com by WP.com?

  2. Posted August 9, 2007 at 8:07 pm | Permalink

    How can I tell if my blog’s been scraped?

  3. Posted August 9, 2007 at 9:53 pm | Permalink

    mpb: Any blog which has had a copyright violation can report through the Google options. WordPress cannot help, and there is no exclusive copyright service for only WordPress. The option for WordPress.com users applies only when there has been massive scraping, anywhere from 20 to hundreds or thousands of blogs, targeting WordPress.com blogs specifically.

    WordPress.com won’t take any individual action like contacting Google or the web host and such, but they may block the scraper’s access, if enough blogs are hit. At least, that is my hope.

    Jenny: Include intrasite links in your blog posts, like links to other posts on your blog, a common practice, and check your trackbacks to see what comes in the comment door. And check out the other methods in Finding Stolen Content and Copyright Infringements.

  4. Posted August 10, 2007 at 1:41 am | Permalink

    Lorelle,
    Thanks for the mention! I like your suggestion to Jenny about including intra-site links in your blog posts to see what lands in the comments pile. Some posts, however, don’t lend themselves to intra-site linking. For those, I’ve thought of simply adding a link into my copyright notice at bottom. Somewhat more elegant than: This Post Stolen from…..

  5. Posted August 10, 2007 at 3:34 am | Permalink

    A good plugin I use is Digital Fingerprint. I have found many of my articles posted on others’ websites from this.

  6. Posted August 10, 2007 at 7:06 am | Permalink

    One way to get back at them is by embedding links back to your own blog. I’ve found many sites don’t bother changing the link, so my blogs get that extra bit of juice! :)

  7. Posted August 10, 2007 at 7:16 am | Permalink

    This kind of scraping was the event that caused me to move my blog from WordPress.com, which was a shame. Something like the anti-leech plug-in to blacklist some IP numbers might be helpful?

  8. Posted August 10, 2007 at 7:34 am | Permalink

    Donncha: I assume you know that Google’s new Blog PageRank will take away “points” if blogs linked to or from your blog are classified as splogs, scrapers, etc. Who links to you still carries weight, as does the duplicate content issue. The new algorithm checks for duplicate content off your blog, as a sign of copyright violations through scrapings. That link juice you think you may be enjoying could hurt you.

    Alun: WordPress.com is not alone in being scraped. In many ways, it is more protected than a full version WordPress blog would ever be, from a lot more vicious stuff than scraping. EVERY blog can be scraped and have its content taken. The couple of instances of mass scraping of WordPress.com blogs in two years of service are nothing compared to what some full version WordPress blogs suffer.

    In the recommendations and links in my article, I point to discussions about Antileech and Digital FingerPrint WordPress Plugins. These are fabulous for full version WordPress blogs.

    Trackbacks and Google Alerts tend to be the most successful in tracking down scrapers FAST, though.

  9. Posted August 10, 2007 at 8:49 am | Permalink

    Please see my article on how to deal with these scraper sites. Since it is nearly impossible to contact them, I think this is the best approach. Overall, the main thing you want to do is keep them from duplicating your content. For one blog of mine I now have them scraping their OWN RSS feed, thus sending them into an infinite loop. Here’s how I did it.

  10. Posted August 10, 2007 at 9:19 am | Permalink

    I had a problem with a content thief a while back. It actually caused me to write my first WordPress plugin, called Block Countries WordPress Plugin, to get rid of the guy.

  11. Posted August 10, 2007 at 10:53 am | Permalink

    Lorelle,

    I can’t find the aggregates function on Bitacle anymore. I sent my “DMCA From Hell” to Adsense about a year ago over their abuse and got all ads removed from the site. They appear to have removed the aggregates function as well.

    Am I missing something? Did they rearrange the site?

    Please advise as you know you have my support in any action.

  12. Posted August 10, 2007 at 1:17 pm | Permalink

    Jonathan: I also submitted Bitacle to Google, Adsense, and a variety of other sources. I’m glad something happened. I need to check in with it again, but a couple months ago, I found my content on their site AGAIN, but tracking through to the full post took a lot of hopping, even though the trackback went directly to the scraped post.

    They are disguising their efforts well. I’ll check again and let you know.

    Thanks for helping us fight the good fight!

  13. Posted August 10, 2007 at 3:38 pm | Permalink

    It’s certainly a daily event for me, and yea, it’s getting annoying. I just wish I had more time to report people.

  14. Posted August 10, 2007 at 4:17 pm | Permalink

    Reporting these through Google’s Webmaster Tools Spam Report (which I have bookmarked right on my bookmark toolbar) takes less than 30 seconds each. I can do them even faster sometimes.

    The more we report those who scrape a lot of us, the sooner the message will get across to Google that this is a site that needs action.

  15. Posted August 10, 2007 at 5:40 pm | Permalink

    I recently had my content used by someone at WordPress.com where the user removed my byline. He did not just do this to me, but to several authors. I was pleased with how quickly WordPress staff handled the issue. Within a day the offending party’s blog was suspended.

  16. Posted August 10, 2007 at 6:08 pm | Permalink

    WordPress.com doesn’t mess around with this issue much. That’s why they continue to get the highest rating as an unfriendly-to-splogs free blog service, unlike others we could name, but we won’t. :D

  17. Posted August 10, 2007 at 7:11 pm | Permalink

    I hear you dear. No need to name names. lol :)

  18. sourasis
    Posted September 13, 2007 at 3:44 am | Permalink

    hi,
    I guess you don’t understand that to set up something like Google, one Mr. Larry and one Mr. Page scraped the whole internet! There’s nothing wrong with doing that, and I find even more nonsense in going over to Google support to catch a scraper who just snatched your blog! After all, Google is the biggest scraper in the world. Moreover, if somebody is making money with your blog, it does not amount to a crime anyway. After all, when you chose to write a blog, you wanted it to be a socialising medium or a medium of expression. If you want to make money out of it, buy a website and host your blogs. As far as copyright is concerned, if it exists, it exists for no good. I guess it’s fair enough to scrape and make money out of it. After all, somebody else is smart about making money while somebody is more focused on expressing themselves. Jealousy is unacceptable in this free web!
    -sourasis

  19. Posted September 13, 2007 at 8:09 am | Permalink

    I’m so glad your view is a tiny minority on the web. Yes, Google is a major copyright violator – and its day may come. Copyright law exists, one of the new international laws honored by countries around the world.

    If a writer works hours on writing and publishing an article, they own it. It’s theirs. And it’s their right to make money with their work as much as a mechanic, doctor, lawyer, dentist, truck driver, or ditch digger. If you buy property and build a house, you have the right to sell it. I can’t come in, take the property and house and sell it, making money on it, without compensating you, right? It’s no different. If I did that, you’d take me to court and possibly even have me arrested.

    Remember having your lunch or lunch money stolen when you were young in school? Or having your purse or wallet stolen as an adult? Or your car? Or your house broken into and something stolen? When you take someone’s content without permission, it’s no different. You are stealing something they worked hard to create. You are stealing something that is not yours. You are taking potential income from them, and for some, that’s the difference between paying the rent and not. Honest.

    Copyright laws are in place to protect the income and rights of artists, writers, photographers, and those who are often abused by a society which thinks that anything creative isn’t work. It is. It’s a skill. It requires training. It requires education. It deserves compensation when used, sold, or reprinted.

    Everything on the web is copyrighted. If it is published, it is copyrighted; the details of those rights depend on the country in which the copyright holder lives. It’s up to the copyright owner to decide how their content is to be used, so if they want to give it up for free usage, they can. If they don’t, they can do that. If they want limited restriction of how their content can be used, such as restricting content to only non-profit, non-commercial sites, they can. It’s their copyright. They control it.

    Assume everything is copyrighted. Ask first. It’s the nice thing to do.

  20. sourasis
    Posted November 25, 2007 at 3:05 pm | Permalink

    I guess you are right when you say that somebody who does real hard work creates something, a form of art, a thing that others are only allowed to marvel at. Again, you are right to say that one does not have the right to steal a piece of artistic creation attributed to the hard work of the creator. Society has never accepted that. A creation is a creation, whether it is a house, a lunch box, a Mona Lisa, or a blog. But unfortunately, as with every social value and social rule, it remains the same in the blogosphere: the powerful own the rules and the powerful are never wrong!! On the one hand, it is the powerful, I mean Google, who brought me to this blog of yours through their search engine, and whose disdainful ambassadors of the devil, called crawlers, are embraced by a nice robots.txt file which says GoogleBot is allowed to scrape me. Seriously, your blog against scraping also allows the crawler to scrape you, and you don’t protest anywhere in your blog lists, except on the one occasion when I pointed it out. Why? I will tell you why: because Google is the powerful and the powerful never do anything wrong. On the other hand is the general scraper [a whopping 99% of them are just fun-loving scrapers having fun and not really trying to make your blog a J.K. Rowling bestseller in their name]. And you are fighting these scrapers and the actual 1% of moneymaking scrapers, who cannot make much money out of it anyway [you know 99.99% of people’s open blogs, which are actually kind of not open, suck!! and are of no real monetary value]. I guess you should rethink the identity of your real enemy and fight the war accordingly. If Google stops, all the others will stop.

  21. sourasis
    Posted November 26, 2007 at 12:50 am | Permalink

    First of all, I request, with due respect, that you please stop giving me links to posts which are 80% moral values and the rest well-known facts. I am not here for a confession or a soul-searching mission.

    Blogs are currently subject to various forms of copyright infringement issues, like:

    1. Sites like Google aggregating the web, including blogs, into several indexes, and rendering online references to these indexes and content for reading. The marketing angle of this aggregation is very clear, and it is no hidden truth that your blogs make them money!

    2. The most disgusting attack is probably splogs, where your content is actually reproduced in a totally different place. This is something I can relate to as real copyright infringement and will obviously oppose.

    3. Automated content learning, where your content is again only meant to be read, but by a machine instead of a human being. The marketing angle of this is not really understandable, but as it sounds, it does not really amount to a reproduction of content anywhere else.

    To me, the copyright infringement issue is point 2; the other two should be debatable but definitely do not sound like they are on the other side of the law. I believe your allegations and protest should be directed more precisely at one of these categories: “read” or “stolen”.

    Here I am not really trying to discuss whether one should exercise his/her rights related to copyright in their online posts. I am trying to reason about the clauses in copyright, a subject which, I am not sure, interests you.

    To be a little loose in my thoughts, I always believe copyrights and IPR should be very much debated and reasoned about for the sake of social and moral values, which is more important than debating the issues of exercising the rights. Otherwise, there will be a day when you will be breaking a copyright or IPR clause every moment of your life unknowingly and get sued for that!

  22. Dan
    Posted January 8, 2008 at 9:16 pm | Permalink

    I have found our content copied from our website and posted on WordPress.com blogs. This was reported to WordPress a week ago, yet the content is still showing.

    Is this what blogs are becoming all about … a good place to lift and publish others’ work? I’m not impressed with WordPress over this. They publish rules against this but ignore reports.

  23. Posted January 8, 2008 at 9:51 pm | Permalink

    @ Dan:

    WordPress.com is exceptionally good at helping with copyright theft and shutting down splogs. If you had left an email address, I could have helped you more with this issue. I recommend that you report it again. And make sure you are doing so properly according to their terms of service for abuse.


18 Trackbacks/Pingbacks

  1. [...] Apparently scrapers are targeting WordPress.com now – Lorelle’s posted “WordPress.com Blogs Feeds Scraped”. [...]

  2. [...] guess folks at WordPress.com start talking about copyright violation only when it hits home. When we cried foul about WordPress themers ripping our work and calling it their own, no one cared [...]

  3. [...] has a post “WordPress.com Blogs Feeds Scraped,” where she tells bloggers how to respond if their WordPress.com blog content is being used in [...]

  4. [...] you post on your blog. Lorelle VanFossen touches on the problems of blog scraping in her post WordPress.com Blogs Feeds Scraped. As always, Lorelle includes a list of related articles she has written, well worth the price of [...]

  11. [...] You can get more exotic and create a page footer like we have here for example, the key is to not only name the blog but link to it, if you want to get really clever link to the home page and when published edit the link to be the post permalink. This means that at the very least the splog is providing a link back, just one small thing if your at wordpress.com and use the more tag your feed is cut at that point so you will want to put the link above that line or it won’t be picked up. There is some more pretty generic advice for wordpress.com users at lorelles’ blog [...]

  13. [...] you post on your blog. Lorelle VanFossen touches on the problems of blog scraping in her post WordPress.com Blogs Feeds Scraped and Content Theft and WordPress is a brand new post worth checking out. First, you noticed that I [...]

  15. [...] WordPress.com Blogs Feeds Scraped [...]

  16. [...] Fight Feed Scrapers CNET News article Sock money’s report of a DMCA violation for their blog. Lorelle On WordPress accounts and offers pro-active suggestions to defend your intellectual property. Take action: Find [...]

  17. [...] Lorelle On WordPress accounts and offers pro-active suggestions to defend your intellectual property. [...]
