Even if the post isn’t 100% directly relevant to you, there are a few tips & tidbits (e.g. my Mechanical Turk project settings, a few outreach tips, etc.) that should be useful to anyone.
I wanted a chance to share an advanced link building process that we’ve been using internally for large eCommerce sites that carry thousands of different brands in their inventory.
For these kinds of sites, there’s one type of low-hanging fruit that’s an absolute no-brainer: retailer listing pages on brand websites. To give you an idea of what I mean, try out this search:
These brands want to tell people where they can buy their products, and in a lot of cases, they’ll list off online stores. When they do, they usually link. And if that link points to a brand category page on your target site, then there’s no more relevant link you could possibly get (e.g. Acer linking to your Acer category page).
But when your target site carries a four-plus-figure number of brands, it can become a bit of a head scratcher as to how to scrape the very bottom of the barrel for these opportunities, seeing as you don’t want to leave any of these easy links on the table.
So, here’s how we scaled things. An overview:
And here’s the detailed step-by-step guide.
Ask your client, or management, for a complete list of the brands the site carries. Sometimes this isn’t easy to get, as the information isn’t readily available anywhere. But there’s usually a SQL database that can be queried to get it.
The most recent client we did this with also gave us a 2nd column in the spreadsheet detailing the “product category”. So for example, if it was Acer, the category might be “computers & electronics”. This is VERY handy for future steps, so if you can, try to get the categories of these different brands when possible.
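If whoever pulls the list for you wants a starting point, here is a minimal sketch of that kind of query using Python’s built-in `sqlite3` module. The `products` table, its column names, and the sample rows are all made up for illustration; a real catalog database will have its own schema and will likely run on a different database engine.

```python
import sqlite3

def fetch_brands(conn):
    """Return (brand, category, product_count) rows, one per brand/category pair."""
    query = """
        SELECT brand, category, COUNT(*) AS product_count
        FROM products
        GROUP BY brand, category
        ORDER BY brand, category
    """
    return conn.execute(query).fetchall()

# Tiny in-memory stand-in for the real catalog database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, brand TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("A-1", "Acer", "computers & electronics"),
        ("A-2", "Acer", "computers & electronics"),
        ("M-1", "3M", "office supplies"),
    ],
)

brand_rows = fetch_brands(conn)
for brand, category, count in brand_rows:
    print(brand, category, count)
```

The product count column is optional, but it gives you a head start on knowing how much of each brand the site carries.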
Unfortunately, the list of brands you get upfront will never include where their websites are. You’ll have to find those yourself. But luckily, there’s one SUPER handy tool for finding them: Mechanical Turk.
Mechanical Turk is essentially a scalable workforce that you can outsource micro-tasks to. It’s the perfect platform for what we’re going to be using it for. In this case, we want someone to use Google to find a brand’s website.
If you don’t have an account, sign up as a Requester. Then navigate to the “Create” tab, and hit “New Project”.
You’ll then set up a project, which, in essence, gives your workforce instructions on exactly what you’ll be asking them to do. In this case, it’s taking an input (the brand name), and then finding the brand’s website, if it has one.
Here’s an example of our instructions for our most recent client:
As you can see, our instructions are tailored to the client: we also ask workers to find out how many of the brand’s products the client’s site carries. This tells us whether the site carries a lot or a little. The more it carries, the stronger our case for getting the brand to list our client. If it’s only one product, it’s much tougher to push hard in outreach, especially if a major brand (e.g. 3M) keeps a short list of retailers. We also check whether the site carries ANY at all, in case the list we got from the client in step #1 is outdated.
Anyways, below that list of instructions, the worker is displayed the following:
Notice the two inputs they’re given: Name & Category. When we “create a batch” for this project, what we’ll be doing is uploading a 2-column CSV for those two values (the list of brand names, and their product categories). You may only have 1 input (brand name) if you can’t get any more info on the brand.
(The reason we liked the product category input is because some brand names are generic, and there are multiple companies with it. Given the product category, the worker can tell whether or not the brand website they found is for the same brand that our client carries.)
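If you’d rather build that 2-column CSV with a script than by hand, a minimal Python sketch might look like this. It assumes the header names `Name` and `Category` match the input variables in your project template (as far as I know, Mechanical Turk maps CSV column headers to the template’s variables); the `write_batch_csv` helper and the sample rows are made up for illustration.

```python
import csv
import io

def write_batch_csv(brands, out):
    """Write a Name,Category CSV, skipping blank names and exact duplicates."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Name", "Category"])
    seen = set()
    for name, category in brands:
        row = (name.strip(), category.strip())
        if not row[0] or row in seen:
            continue  # no brand name, or already written
        seen.add(row)
        writer.writerow(row)

# Illustrative input: one duplicate (with stray whitespace) and one blank name.
sample = [
    ("Acer", "computers & electronics"),
    ("Acer ", "computers & electronics"),
    ("", "misc"),
]
buffer = io.StringIO()
write_batch_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("batch.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer. Removing duplicates before uploading keeps you from paying for the same lookup twice.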
Here are a couple other details about our project that are important:
Once you’ve got that set up, you’re ready to start a new batch on that project. For us, a batch of 2,000 usually takes ~3-4 hours to reach 95% completion. I usually queue batches up the day before I need them finished, and just come back then.
(This is FAR from a perfectly detailed description of using Mechanical Turk. See this for more information. This is simply the bare minimum, plus the campaign-specific details, you’d need.)
Now that we’ve got the list of websites of all of the different brands, it’s time to identify pages on each of their sites that might be listings pages.
First, you’ll need to start with a list of title/URL footprints that are popularly used to describe these kinds of pages. Here are a few of my favorites:
Ultimately, this is one area that I won’t be giving up COMPLETELY to you :). A bit of searching will reveal the rest.
We’ve ultimately nailed it down to 10 fairly broad footprints with words like those in the examples above. The reason is the number of individual Google searches we’ll be conducting (scraping) for each brand website. I originally had 40, but narrowed it down to 10, for the reason you’ll see soon.
Next up, take the list of websites, and go to this tool.
You’ll input the list of websites into the first text area, then you’ll input “site:” into the Prefix input, and finally the footprint into the Suffix input (with a space at the beginning).
Here’s what the output would look like for a single footprint:
You’ll then copy & paste this list of queries into an Excel spreadsheet, and then repeat the steps for each footprint you want to use.
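When you’re finished, get the entire list of queries for all footprints into a single column in Excel, and sort it from A to Z. Because every query starts with “site:” plus the domain, sorting groups each website’s queries together. It’s not strictly necessary, but I’ll explain why I do it when we get to the tool I use for scraping Google.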
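If you’d rather script the prefix/suffix step than use the web tool and Excel, a minimal Python sketch might look like the following. The two footprints here are placeholders for illustration only, not the actual footprint list, and `to_domain` and `build_queries` are hypothetical helper names.

```python
from urllib.parse import urlparse

def to_domain(site):
    """Reduce a URL or bare hostname to the hostname used after `site:`."""
    site = site.strip()
    if "://" not in site:
        site = "http://" + site  # lets urlparse find the hostname
    return urlparse(site).netloc.lower()

def build_queries(sites, footprints):
    """Return every `site:<domain> <footprint>` combination, sorted A to Z."""
    queries = set()
    for site in sites:
        domain = to_domain(site)
        if not domain:
            continue  # skip blank rows
        for footprint in footprints:
            queries.add(f"site:{domain} {footprint}")
    return sorted(queries)

# Placeholder inputs for illustration.
sites = ["http://www.acer.com/", "3m.com", ""]
footprints = ["where to buy", "retailers"]

queries = build_queries(sites, footprints)
for q in queries:
    print(q)
```

Using a set removes duplicate queries when the same website shows up for more than one brand, and the final sort keeps each website’s queries next to each other.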
Now that you’ve got this list of queries, it’s time to scrape Google for them. I personally use Link Prospector, and will be using it for this example.
With this tool, you can only scrape Google for 1000 queries at a time, so I suggest narrowing your list of footprints down to a smaller number (for me, 10), so you aren’t running an egregious number of reports. Our client carried 15k brands, so we had to generate 150 different Link Prospector reports (100 sites per report, at 10 queries per site). It’s still high, but hey, it’s not the 600 it would’ve been with a 40-footprint list 🙂
(Such is the limitation of Link Prospector. If you’ve got other means to scrape Google at scale, use them for this!)
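The report math above is easy to check, and the same few lines can split a sorted query list into report-sized batches. This is just a sketch: the 1000-query limit is the one mentioned above, while `report_count` and `chunk` are hypothetical helper names.

```python
def report_count(n_sites, n_footprints, queries_per_report=1000):
    """Number of reports needed when each report holds a fixed number of queries."""
    total_queries = n_sites * n_footprints
    # Integer ceiling division: a partly filled last report still counts.
    return (total_queries + queries_per_report - 1) // queries_per_report

def chunk(queries, size=1000):
    """Split a list of queries into consecutive batches of at most `size`."""
    return [queries[i:i + size] for i in range(0, len(queries), size)]

print(report_count(15_000, 10))  # 10 footprints per site
print(report_count(15_000, 40))  # 40 footprints per site

batches = chunk([f"query {i}" for i in range(2500)])
print([len(b) for b in batches])
```

If the query list is sorted first, each batch of 1000 holds the queries for roughly 100 whole websites rather than scattering one site across several reports.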
When you open up Link Prospector to create a new Custom report, here are the settings I’d go with:
I only scrape 1 page per query, but you could easily do 3 or 5 at no extra cost. We’re simply using queries that would turn up a relevant page, if there was one, in the first 10 results we got back.
From there, load up your report with 1000 queries, and then hit Submit.
(At the time of this writing, reports can take a bit to finish (~30 mins), but I’ve been told by Garrett French, the brains behind the tool, that this should improve as they load up more proxies.)
When the reports are finished, you’ll have a list of pages on each brand domain that may be listings pages. Next, we’ll find out whether they are, in fact, what we’re looking for.
For that list of pages we’ve scraped from Google, we’re going to gather some extra info that gives us a better idea of whether they’re what we want.
Here are the things we’re looking for:
You may have your own tools to find this info on a set of URLs, but I personally use URL Profiler for this. I’ll run the Server version on a Windows VPS overnight, depending on the number of results from Step #3.
Once we’ve generated the necessary reports for that info, it’s time to combine all of the data we’ve got into a single spreadsheet. That means data from:
Here are the specific columns that we use in our finalized spreadsheet of all of those URLs of potential listings pages:
Here are a few additional tips on the above:
On top of that, here is the custom sorting we did for manual review purposes for the next step:
What this does is first group the URLs together by brand name, then sort them by the number of outbound links on the page (highest first), and finally, if they’re tied there, by the LTS score from Link Prospector (the higher the number, the more likely it’s a listings page).
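The same three-level sort can be reproduced outside of a spreadsheet. Here is a minimal Python sketch; the field names (`brand`, `outbound_links`, `lts`) and the sample rows are made up for illustration, and it assumes higher LTS scores should come first within a tie.

```python
def sort_for_review(rows):
    """Group rows by brand, then most outbound links first, then highest LTS first."""
    return sorted(
        rows,
        key=lambda r: (r["brand"].lower(), -r["outbound_links"], -r["lts"]),
    )

# Illustrative rows only; real data would come from the combined spreadsheet.
rows = [
    {"brand": "Acer", "url": "acer.example/dealers", "outbound_links": 40, "lts": 3},
    {"brand": "3M", "url": "3m.example/where-to-buy", "outbound_links": 12, "lts": 5},
    {"brand": "Acer", "url": "acer.example/retailers", "outbound_links": 40, "lts": 7},
    {"brand": "Acer", "url": "acer.example/about", "outbound_links": 2, "lts": 9},
]

ordered = sort_for_review(rows)
for r in ordered:
    print(r["brand"], r["outbound_links"], r["lts"], r["url"])
```

Negating the two numeric fields gives a descending order for them while the brand name still sorts A to Z.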
Here’s an example of the finished product:
Now that we’ve got all the data as we want it, it’s time to review the opportunities!
It can definitely be a time-suck, but if you do want to extract as many of these opportunities as possible, WITHOUT reaching out to non-qualified prospects, you’ll have to go through the list manually.
We ultimately want to go through each brand grouping in 15-30 seconds. Here’s our quick & dirty process:
If those 3 don’t yield anything worthwhile, then we move on to the next brand grouping.
From here, it’s bombs away! Depending on your resources, if you’re able to get a client email address, I’d start with a simple email asking them if you could be listed. Keep it as concise as possible, give evidence of you carrying their products, and be polite.
If you don’t hear anything back after following up a few times, hit the phones. This is a perfect opportunity for phone-based outreach, as it’s the kind of opportunity where you should be listed (as opposed to could). We use Hushed in order to create a phone number that’s:
It’s also worth noting that some clients will have existing partner relationships that you can tap into internally. Getting them to put the calls through from a person the brand’s company is familiar with can skyrocket conversions, and also means less work for you :).
So how much did this all cost to do for a batch of 15,000 brands, using 10 footprints each for Step #3? Let’s do the math:
I don’t have a number for you in terms of how many qualified prospects a batch like that would generate, but I do know that doing things at this scale & level of detail would mean, at the very least, roughly 50% more links than you’d have gotten if you just went the competitor research route.
With that said, if you enjoyed the above post, then I’ve got great news for you: for the next 2 days, my Advanced Guide to eCommerce Link Building is open for registration. It’s only open to 100 more people before it’s closed for good (in order to protect the longevity of the information).
So what do you think? Are there any gaps in my process? I would love to hear about them in the comments below!