
Introducing Scrape Rate – A New Link Metric

by Jon Cooper

Drop what you’re doing and go to Boing Boing. Go to the archives page and select three different posts that were published at least a month ago. Go to Google and type in intitle:"post title" (obviously, replace post title with the actual post title, and keep the quotes). Do this for each post, add the number of results together, divide by 3, and you have now calculated the scrape rate for Boing Boing.

After doing a quick check, I calculated Boing Boing’s scrape rate as 40.33. Your number will vary based on which 3 posts you check, but it should give you an overall feel for how often Boing Boing’s content is scraped.
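
The calculation itself is trivial. Here’s a minimal sketch of it in Python; the result counts still come from manual Google searches, and the three numbers below are hypothetical ones that happen to reproduce the 40.33 average:

    # A minimal sketch of the scrape rate calculation described above. The
    # result counts still come from manual Google searches; the three numbers
    # below are hypothetical and simply reproduce the 40.33 average.

    def intitle_query(post_title: str) -> str:
        # Build the Google query for one post, e.g. intitle:"post title"
        return f'intitle:"{post_title}"'

    def scrape_rate(result_counts: list) -> float:
        # Add the result counts together and divide by the number of posts.
        return sum(result_counts) / len(result_counts)

    print(intitle_query("some month-old post title"))  # paste into Google
    print(round(scrape_rate([52, 31, 38]), 2))         # -> 40.33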

Scrape rate is a guest blogger’s best friend. For those who guest blog, the links you get in the post itself only go so far. Having the content scraped, even though the copies are no longer original content, gives you more link equity.

Imagine you wrote a guest post on an average blog and it was scraped 100 different times. Now compare that link power to a guest post written on a more authoritative blog that only gets scraped once or twice. While the original content on the second option yields more quality & trust, you can’t beat the quantity of links the first option provides. Argue all you want, but in terms of link building, having your content scraped at that scale (as long as the links are intact) trumps the quality of the original source in most cases.

Here’s a good real-life example. Go to Open Site Explorer and paste in the URL of a recent blog post from the SEOmoz blog. Most of the posts don’t get an overwhelming number of high-quality links, save a few successful ones, so the majority of the link power comes from the content being scraped. The result? Most of the posts have a page authority of 60 or greater.

Note: I know a lot of that authority comes from the site it’s hosted on & the internal linking, but the point I’m making is that, all else being equal, scraped links can provide authority when found in great numbers.

When I guest posted on the SEOmoz blog in October, I had a targeted anchor text link back to a page I was trying to rank for a certain keyword. After the post had been live for a few days, I saw no change in the SERPs, but after a week or two, my content was scraped by roughly 30 sites, and I saw an immediate jump in the SERPs. I went from not even being in the top 50 for that keyword to the second page.

Why is scrape rate important?

If you’re guest blogging on a regular basis, you need to make sure you do your research. Guest blogging is more than taking an hour of your time to write up a post and throwing it at any blogger that’s willing to publish it; it’s about finding what resonates with the audience, interacting with the readers (i.e. via comments), and getting the most bang for your buck in terms of links. That’s where scrape rate comes in.

I’m not sold on sorting guest blogging prospects solely by domain authority and PageRank. Take it a step further. Go to Google and calculate the scrape rate (if someone creates a tool that does this automatically, let me know; I’ll happily send a few links your way). The best part about the metric is that some previously overlooked blogs that don’t get pitched as much for guest posts might actually be the ones that provide the most link power.
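
In the meantime, here’s a hypothetical sketch of what such a tool might look like. Google offers no stable public API for raw result counts, so result_count() below is a stub returning made-up numbers, and the blog names and post titles are made up too:

    # Hypothetical sketch of an automated scrape rate checker. result_count()
    # is a stub: Google offers no stable public API for raw result counts, so
    # swap in whatever search data source you actually have access to.

    HYPOTHETICAL_COUNTS = {  # made-up numbers for two made-up blogs
        'intitle:"post a"': 55, 'intitle:"post b"': 2, 'intitle:"post c"': 12,
        'intitle:"post d"': 1, 'intitle:"post e"': 0, 'intitle:"post f"': 2,
    }

    def result_count(query: str) -> int:
        # Stub standing in for a real "number of Google results" lookup.
        return HYPOTHETICAL_COUNTS.get(query, 0)

    def blog_scrape_rate(post_titles: list) -> float:
        # Average the result counts for intitle: searches on the sampled titles.
        return sum(result_count(f'intitle:"{t}"') for t in post_titles) / len(post_titles)

    # Rank guest post prospects by scrape rate, not just DA/PageRank.
    prospects = {"Blog X": ["post a", "post b", "post c"],
                 "Blog Y": ["post d", "post e", "post f"]}
    for blog, titles in sorted(prospects.items(),
                               key=lambda kv: blog_scrape_rate(kv[1]),
                               reverse=True):
        print(f"{blog}: {blog_scrape_rate(titles):.2f}")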

While everyone else in your niche is struggling to put together a post that gets published on blog X, you’re putting together a post for blog Y that you know has its content scraped and, in the end, passes more link juice.

The idea isn’t perfect, because a lot of blogs don’t get scraped at all, but this metric can help you identify, as I said, a few overlooked blogs that others have missed.

I’m not wedded to the exact method of finding 3 average month-old posts, counting the number of times each was scraped, and dividing by three, but I think it’s fairly accurate. Here’s why:

  • If you calculated it by looking at just one post, that could skew the results. That’s just basic statistics.
  • Requiring the post to be a month old means it has had enough time to be scraped. That’s the problem with just grabbing the most recent post on the blog: some sites might syndicate it, but it can take a week or so after it’s published for that to happen. (There’s a sketch for automating this age check after this list.)
  • Using the intitle: search yields pinpoint accuracy, but only if the title is unique. One problem I’ve run into is that some results come from FriendFeed, Tweetmeme, and other similar social sites; I count them anyway because it’s not worth the hassle of filtering them out individually.
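
To automate the month-old check, something like this sketch works, assuming the blog’s feed reaches back far enough (many feeds only expose the latest handful of posts, in which case you’d sample the archives page instead; the feed URL is just an example):

    # Sketch: pull sample post titles at least a month old from a blog's feed.
    # Assumes the feed reaches back that far; many feeds only expose the
    # latest handful of posts, in which case sample the archives page instead.
    import time
    import feedparser  # third-party: pip install feedparser

    def month_old_titles(feed_url: str, n: int = 3, min_age_days: int = 30):
        cutoff = time.time() - min_age_days * 86400
        feed = feedparser.parse(feed_url)
        return [e.title for e in feed.entries
                if e.get("published_parsed")
                and time.mktime(e.published_parsed) < cutoff][:n]

    # Example feed URL; substitute the blog you're evaluating.
    for title in month_old_titles("https://boingboing.net/feed"):
        print(f'intitle:"{title}"')  # run each query, note the result count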

Granted, this is a brand-new idea, so I want to hear your thoughts on this metric. I think it’s got potential to catch on in the SEO community, but I’m biased, because I’m the one who came up with it. Please leave me a comment; if you think it’s a bad idea and you see flaws, feel free to trash me. I can take it. At the same time, if you like the idea, I’d love the words of encouragement.

Thanks for reading! Make sure you follow me on Twitter and grab my RSS feed.

This post was written by...

Jon Cooper – who has written 119 posts on Point Blank SEO.

Jon Cooper is an SEO consultant based out of Gainesville, FL, who specializes in link building. For more information on him and Point Blank SEO, visit the about page. Follow him on Twitter.

27 Comments
  1. Darren says:

    Very interesting read, and makes sense. But to what degree does duplicate content rule out the value of it being scraped and the secondary links? Would this need to be factored in?

  2. Darren says:

    Re-reading my comment, it’s not to say scrape rate needs this factored in, and I think it’s a great metric that is very insightful. But the bigger question is how we apply scrape rate to the authority passed from these scrapes. For example, 5 scrapes of the same post do not equal 5 unique content posts. I saw another post somewhere complaining that a lot of companies just do PR releases and, because they get a load of articles published, assume it’s great link building, when in fact it’s the same sites scraping/publishing each release, plus duplicate content for all the links.

    A unique scrape rate, or a scrape rate which somehow removed the sh*t sites that have scraped the post, may be a good enhancement?

  3. Very, very interesting. I was analysing a competitor’s links through guest posting yesterday and noticed a trend of some posts being duplicated elsewhere. Whilst I saw the value of the extra links these scrape sites were generating, I hadn’t thought to formalise it and actually LOOK for this feature.

    What I found, after trying your method on the same stuff I looked at yesterday, was that not all sites republish the full page content, even if they do use the same title. However, all of those that I found at least linked back to the article on the original blog, increasing the value of that page (and any links coming off it).

    Where I think it gets really interesting is when you start to analyse the page metrics of the scrape-pages (don’t know what else to call them). A lot of them will be 1, that is true, but I was seeing some 10s, 20s, 30s etc… If you combine this into your scrape rate you could come back with a metric that measures value rather than just quantity.

    Using the MozBar in Firefox allows you to export results to CSV, with Page Authority. You could run the Google search (set to show 100 results per page) and add that to your measure – or use it as a secondary measure?
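
    A rough sketch of that combined measure (assuming the export has a “Page Authority” column – the exact header may differ):

        # Sketch: a value-weighted scrape score from a MozBar CSV export.
        # The "Page Authority" column name is an assumption; adjust it to
        # whatever header your export actually uses.
        import csv

        def weighted_scrape_value(csv_path: str) -> float:
            with open(csv_path, newline="") as f:
                pas = [float(row["Page Authority"]) for row in csv.DictReader(f)]
            # Sum rather than average, so thirty PA-1 scrapes still register
            # alongside the occasional PA-30 scrape page.
            return sum(pas)

        print(weighted_scrape_value("scrape_pages.csv"))  # hypothetical filename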

    Anyway, I’m gonna test it. Great stuff Jon.

    • Jon Cooper says:

      Patrick,

      Great insights man. Thanks for taking it to the next level (analyzing scrape pages) before I had the chance to mention it.

      One thing you mentioned is that not all sites scrape the whole article. Here’s the solution: if you can, get a link in the first few paragraphs. Of course, you’ll have to make it relevant, but if you can, there’s a much higher likelihood that the link will get scraped no matter how much of the article is scraped.

      Thanks Patrick for the comment! Hope to see you blogging more often on the Search Engine People blog – you honestly write some great stuff :)

  4. Gareth Brown says:

    Interesting metric. Although most of the scrapers I’ve come across try to remove the links, especially those that are doing some kind of low level spin.

    This has reminded me of a post Michael Gray wrote some time ago, on how to take advantage of scrapers on your own site.

    http://www.wolf-howl.com/seo/use-scrapers-to-build-links/

    It looks like it was quite effective back in the day and, as you’ve noticed, the strategy still works. It’s amazing, really, that even though those scraped posts are duplicate content, Google clearly gives them some value. As mentioned, nothing like the original, but at least you’re rewarded something.

    Thanks for sharing.

  5. Neil says:

    Interesting stuff Jon – what would be a good scrape rate vs a bad one?

  6. @Patrick “analyse the page metrics of the scrape-pages” Great thought. Let us know how the test turns out. I’m theorizing that the major difference between these pages will be whether or not the scrape-page links back.

    Great post, Jon. Definitely an interesting thought. I look forward to hearing how everyone’s tests go.

  7. Maybe the benefit seen in the SERPs was not by virtue of the scraped content and the links, but more about the idea that mass republishing is still seen as some kind of low-level social proof. Just a thought!

    I have more than enough metrics to keep me going for life :)

  8. Patrick Hathaway says:

    Hey Jon thanks for the comments. I’m a big fan of your blog so it’s nice to be able to contribute. I’ll post on here any results when I’ve had a chance to test.

    Good solution to the problem with scrape sites truncating the post, although, as you say, it may be difficult to keep it relevant. Some sites allow you to put guest post mini bios at the top; maybe those are more valuable still?

    P.S. Should be on SEP blog again this week :)

  9. How do you ensure that you are the authority page? Also, if this is known to work, what would stop people from scraping their own content?

  10. Ross Hudgens says:

    Really smart, Jon. I would only say that I’m not sure it’s worth computing the scrape rate each time; it is, however, worth looking at the PA of each historical blog post, because the scrape rate can vary depending on site architecture and things like that. I like looking at historical blog post value to put a finger-in-the-air value on a link I might get – doing that analysis is important when determining how much we should invest in said link.

  11. Bob Jones says:

    I think this type of link building died a while ago. As mentioned in the comments above, scrapers do their best to strip out or nofollow any external links. Besides that, the term “bad neighborhood” comes to mind.

    Instead of intitle:”title” – have you considered searching for a “sentence from post with your link in it”? That should come up with fewer results, but at least they’ll be the full posts as opposed to partial feeds.

  12. Tim Grice says:

    Love the concept and agree with the principles behind it.

    I’ve had a few links from SEOmoz and a few of the other big SEO sites recently; they provide hundreds of scraped backlinks, and the ranking/visibility jump you see afterwards is almost always significant.

    This should definitely catch on and be a part of every link builder’s ‘quality check’.

    Thanks

  13. Probiotix says:

    Hi Jon,

    If this is a good example of writing a controversial link bait post then well done as I’m sure the post will get some good attention.

    Scraped content is duplicate content and gets completely devalued. These days even spun content – which is much more unique than identical content – also gets devalued. Unfortunately we aren’t in 2007 and such theories do not work any more. Not sure what makes you think that Google trusts domains consisting purely of duplicate content. Does the word Panda ring a bell?

  14. Shelli says:

    Hi Jon
    Interesting article posing some thought provoking questions.

    Guest posting on a selection of influencers and authorities within your niche is a great way to build a network which drives quality traffic to your site. The exposure and brand building you get from a well-placed guest post is far more valuable than a handful of duplicate-content scraped links. Shouldn’t that be your primary driving factor when selecting sites to target, rather than purely how many scrapes your article can achieve?

    I can see why you have suggested the scrape rate, but I question, if you only select a guest post target based on the number of scrapes it can attract, how that differs from using automated software to distribute a spun article (which was never a sustainable way to achieve good rankings; it always gave flaky, short-term results).

    Does a duplicate content link give value on a long term basis or is it just a short term boost (like article spinning)?

  15. Hi Jon,
    Very nice idea, and it will be most effective when used in conjunction with other metrics. Like you state, nothing works in pure isolation.
    Great post
