Black Hole SEO: The Real Desert Scraping

Alright fine. I’m going to call uncle on this one. With my last Black Hole SEO post I talked about Desert Scraping. Now understand, I usually change up my techniques and remove a spin or two before I make them public as to not hurt my own use of it. However on this one, in the process, I totally dumbed it down. Upon retrospect it definitely doesn’t qualify as a Black Hole SEO technique, more like a general article, and yet no one called me on it! Com’n guys you’re starting to slip. Enough of this common sense shit, lets do some real black hat. So the deal is I’m going to talk about desert scraping one more time and this time just be perfectly candid and disclose the actual spin I use on the technique.

The Real Way To Desert Scrape1. Buy a domain name and setup Catch-All subdomains on it using Mod-Rewrite and the Apache config.

2. Write a simple script where you can pull content from a database and spit it out on it’s own subdomain. No general template required.

3. Setup a main page on the domain that points links to the newest subdomains along with their titles to help them get indexed.

4. Signup for a service that monitors expiring domains such as DeletedDomains.com (just a suggested one, there’s plenty much better ones out there).

5. On a cronjob everyday have it scan the newest list of domains that were deleted that day. Store the list in a temporary table in the database.

6. On a second cronjob continuously ran throughout the day have it lookup each expired domain using Archive.org. have it do a deep crawl and replace any links to their local equivalents (ie. www.expireddomain.com/page2.html becomes /page2.html). Do the same with the images used in the template.

7. Create a simple algorithm to replace all known ads you can find and think of with your own, such as Adsense. Also it doesn’t hurt to replace any outgoing links with other sites of yours that are in need of some link popularity.

8. Put the scraped site up on a subdomain using the old domain minus the tld. So if the site was mortgageloans.com your subdomain would be mortgageloans.mydomain.com.

9. Have the cronjob add the new subdomain up on the list of completed ones so it can be listed on the main page and indexed.

What Did This Do?Now you got a site that grows in unique content and niche coverage. Everyday new content goes up and new niches are created on that domain. By the time each subdomain gets fully indexed much of the old pages on the expired domains will start falling from the index. Ideally you’ll create a near perfect replacement with very little duplicate content problems. Over time your site will start to get huge and start drawing BIG ad revenue. So all you have to do is start creating more of these sites. Since there are easily in the six figures of domains expiring everyday that is obviously too much content for any single domain, so building these sites in a network is almost required. So be sure to preplan the load possible balancing during your coding. The fewer scraped sites each domain has to put up a day the better chances of it all getting properly indexed and ranking.

And THAT is how you Desert Scrape the Eli way.

*Wink* I may just have hinted at an unique Black Hole SEO way of finding high profit and easy to conquer niches. How about exploiting natural traffic demand generated by article branding?

–>

Black Hole SEO: Desert Scraping

In my introduction post to Black Hole SEO I hinted that I was going to talk about how to get “unique authoritative content.” I realize that sounds like an oxymoron. If content is authoritative than that means it must be proven to work well in the search engines. Yet if the content is unique than it can’t exist in the search engines. Kind of a nasty catch-22. So how is unique authoritative content even possible? Well to put it simply, content can be dropped from the search engines’ index.

That struck a cord didn’t it? So if content can be in the search engines one day and be performing very well and months to years down the road no longer be listed, than all we have to do is find it and snag it up. That makes it both authoritative and as of the current moment, unique as well. This is called Desert Scraping because you find deserted and abandoned content and claim it as your own. Well, there’s quite a few ways of doing it of course. Most of which is not only easy to do but can be done manually by hand so they don’t even require any special scripting. Let’s run through a few of my favorites.

Archive.orgAlexa’s Archive.org is one of the absolute best spots to find abandoned content. You can look up any old authoritative articles site and literally find thousands of articles that once performed in the top class yet no longer exist in the engines now. Let’s take into example one of the great classic authority sites, Looksmart.

1. Go to Archive.org and search for the authority site you’re wanting to scrape.

2. Select an old date, so the articles will have plenty of time to disappear from the engines.

3. Browse through a few subpages till you find an article on your subject that you would like to have on your site.

4. Find an article that fits your subject perfectly.

5. Do a SITE: command in the search engines to see if the article still exists there.

6. If it no longer exists just copy the article and stake your claim.

See how easy it is? This can be done for just about any old authority site. As you can imagine there’s quite a bit of content out there that is open for hunting. Just remember to focus on articles on sites that performed very well in the past, that ensures a much higher possibility of it performing well now. However, let’s say we wanted to do this on a mass scale without Archive.org. We already know that the search engines don’t index each and every page no matter how big the site is. So all we have to do is find a sitemap.

SitemapsIf you can locate a sitemap than you can easily make a list of all the pages on a domain. If you can get all the pages on the domain and compare them to the SITE: command in the search engines than you can return a list of all the pages/articles that aren’t indexed.

1. Locate the sitemap on the domain and parse it into a flat file with just the urls.

2. Make a quick script to go through the list and do a SITE: command for each URL in the search engines.

3. Anytime the search engine returns a result total of greater than 0, just delete the url off the list.

4. Verify the list by making sure that each url actually does exist and consists of articles you would like to use.

There is one inherent problem with the automatic way. Since it’s grabbing the entire site through its sitemap than you are going to get a ton of negative results, like search queries and other stuff they want indexed but you want no part of. So it’s best to target a particular subdirectory or subdomain within the main domain that fits your targeted subject matter. For instance if you were wanting articles on Automotive, than only use the portion of the sitemap that contains domain.com/autos or autos.domain.com.

There are quite a few other methods of finding deserted content. For instance many big sites use custom 404 error pages. A nice exploit is to do site:domain.com “Sorry this page cannot be found” then lookup the cached copy in another search engine that may not of updated the page yet. There is certainly no shortage of them. Can you think of any others?

Cheers

–>

Follow Up To 100’s Of Automated Links/Hour Post

This is a follow post to my 100’s Of Links/Hour Automated – Black Hole SEO post.

I’m going to concede on this one. I admittedly missed a few explanations of some fundamentals that I think left a lot of people out of all the fun. After reading a few comments, emails and inbound links (thanks Cucirca & YellowHouse for good measure) I realize that unless you already have adequate experience building RSS Scraper Sites than its very tough to fully understand my explanation of how to exploit them. So I’m going to do a complete re-explanation and keep it completely nontechnical. This post will become the post to explain how it works, and the other will be the one to explain how to do it. Fair enough? Good, lets get started with an explanation of exactly what a RSS scraper site is. So once again, this time with a MGD in my hand, cheers to Seis De Mayo!

Fundamentals Of Scraping RSSMost blogs automatically publish an RSS feed in either a XML or ATOM format. Here’s mine for an example. These feeds basically consist of a small snipplet of your post(usually the first 500 or so characters) as well as the Title of the post and the Source URL. This is so people can add your blog into their Feed Readers and be updated when new posts arrive. Sometimes people like to be notified on a global scale of posts related to a specific topic. So there are blog search engines that are a compilation of all the RSS feeds they know about through either their own scrapings of the web or people submitting them through their submission forms. They allow you to search through millions of RSS feeds by simply entering a keyword or two. An example of this might be to use Google Blog Search’s direct XML search for the word puppy. Here’s the link. See how it resulted in a bunch of recent posts that included the word Puppy in either the title or the post content snippet (description). These are known as RSS Aggregators. The most popular of which would be, Google Blog Search, Yahoo News Search, & Daypop.

So when a black hatter in an attempt to create a massive site based on a set of keywords needs lots and lots of content one of the easiest ways would be to scrape these RSS Aggregators and use the Post Titles and Descriptions as actual pages of content. This however is a defacto form of copyright infringement since they are taking little bits of random people’s posts. The post Title’s don’t matter because they can’t be copyrighted but the actual written text can be if the person chose to have description within their feed include the entire post rather than just a snippet of it. I know it’s bullshit how Google is allowed to republish the information but no one else is, but like i said its defacto. It only matters to the beholder(which is usually a bunch of idiotic bloggers who don’t know better). So to keep in the up and up the Black Hatters always be sure to include proper credit to the original source of the post by linking to the original post as indicated in the RSS feed they grabbed. This backlink slows down the amount of complaints they have to deal with and makes their operation legitimate enough to continue stress free. At this point they are actually helping the original bloggers by not only driving traffic to their sites but giving them a free backlink, Google becomes the only real victim(boohoo). So when the many many people who use public RSS Scraper scripts such as Blog Solution and RSSGM on a mass scale start producing these sites they mass scrape thousands of posts from typically the three major RSS Aggregators listed above. They just insert their keywords in place of my “puppy” and automatically publish all the posts that result.

After that they need to get those individual pages indexed by the search engines. This is important because they want to start ranking for all these subkeywords that result from the post titles and within the post content. This results in huge traffic. Well not huge, but a small amount per RSS Scraper site they put up. This is usually done in mass scale over thousands of sites (also known as Splogs, spam blogs) which results in lots and lots of search engine traffic. They fill each page with ads (MFA, Made For Adsense Sites) and convert the click through rate on that traffic into money in their pockets. Some Black Hatters make this their entire profession. Some even create in the upwards of 5 figures worth of sites, each targeting different niches and keywords. One of the techniques they do to get these pages indexed quickly is to “ping” Blog Aggregators. Blog aggregators are nothing more than a rolling list of “recently updated blogs.” So they send a quick notification to these places by automatically filing out and submitting a form with the post title, and url to their new scraped page. A good example of the most common places they ping can be found in mass ping programs such as Ping-O-Matic. The biggest of those would probably include Weblogs. They also will do things such as comment spam on blogs and other link bombing techniques to generate lots of deep inbound links to these sites so they can outrank all the other sites going for the niche the original posts included. This is a good explanation of why Weblogs.com is so worthless now. Black Hatters can supply these sites and generate thousands of RSS Scraped posts daily. Where legitimate bloggers can only do about one post every day or so. So these Blog Aggregator sites quickly get overrun and it can easily be assumed that about 90% of the posts that show up on there are actually pointed to and from RSS Scraper Sites. This is known as the Blog N’ Ping method.

I’m going to stop the explanation right there, because I keep saying “they” and it’s starting to bug me. Fuck guys I do this to! Haha. In fact most of the readers here do it as well. We already know tens of thousands, if not more, of these posts go up everyday and give links to whatever original source is specified in the RSS Aggregators. So all we got to do is figure out how to turn those links into OUR links. Now that you know what it is at least, lets learn how to exploit it to gain hundreds of automated links an hour.

What Do We Know So Far? 1) We know where these Splogs (RSS Scraper sites) get their content. They get them from RSS Aggregators such as Google Blog Search.

2) We know they post up the Title, Description (snippet of the original post) and a link to the Source URL on each individual page they make.

3) We know the majority of these new posts will eventually show up on popular Blog Aggregators such as Weblogs.com. We know these Blog Aggregators will post up the Title of the post and a link to the place it’s located on the Splogs.

4) We also know that somewhere within these post titles and/or descriptions are the real keywords they are targeting for their Splog.

5) Also, we know that if we republish these fake posts using these titles to the same RSS Aggregators the Black Hatters use eventually (usually within the same day) these Splogs will grab and republish our post on their sites.

6) Lastly, we know that if we put in our URL as the link to the original post the Splogs, once updated, will give us a backlink and probably post up just about any text we want them to.

We now have the makings of some serious inbound link gathering.

How To Get These Links 1) First we’ll go to the Blog Aggregators and make a note of all the post titles they provide us. This is done through our own little scraper.

2) We take all these post titles and store them in a database for use later.

3) Next we’ll need to create our own custom XML feed. So we’ll take 100 or so random post topics from our database and use a script to generate a .xml RSS or ATOM file. Inside that RSS file we’ll include each individual Title as our Post Title. We’ll put in our own custom description (could be a selling point for our site). Then we’ll put our actual site’s address as the Source URL. So that the RSS Scraper sites will link to us instead of someone else.

4) After that we’ll need to let the three popular RSS Aggregators listed above (Google,Yahoo,Daypop) know that our xml file exists. So, using a third script, we’ll go to their submission forms and automatically fill and submit each form with the URL to our RSS feed file(www.mydomain.com/rss1.xml). Here are the forms:

Google Blog SearchYahoo News SearchDaypop RSS Search

Once the form is submitted than you are done! Your fake posts will now be included in the RSS Aggregators search results. Then all future Splog updates that use the RSS Aggregators to find their content will automatically pickup your fake posts and publish them. They will give you a link and drive traffic to whatever URL you specify. Want it to go to direct affiliate offers? Sure! Want your money making site to get tens of thousands of inbound links? Sure! It’s all possible from there, its just how do you want to twist it to your advantage.

I hope this cleared up the subject. Now that you know what you’re doing you are welcome to read the original post and figure out how to actually accomplish it from the technical view.

100’s Of Automated Links/Hour

–>

100’s Of Links/Hour Automated – Introduction To Black Hole SEO

I really am holding a glass of Guinness right now so in all the authority it holds…Cheers! I’m kind of excited about this post because frankly it’s been a long time coming. For the last 7-9 months or so I’ve been hinting and hinting that there is more to Black Hat than people are willing to talk about. As “swell” as IP delivery and blog spam are there’s an awesome subculture of Black Hats that takes the rabbit hole quite a bit deeper than you can probably imagine. This is called Black Hole SEO. By no means am I an expert on it, but over the last few years I’ve been getting in quite a bit of practice and starting to really kick some ass with it. In the gist, Black Hole SEO is the deeper darker version of black hat. It’s the kind of stuff that makes those pioneering Black Hat Bloggers who dispel secrets like parasite hosting and link injection techniques look like pussies. Without getting into straight up hacking its the stuff black hatters dream about pulling off, and I am strangely comfortable with kicking in some doors on the subject. However lets start small and simple for now. Than if it takes well we’ll work our way up to some shit that’ll just make you laugh its so off the wall. Admit it, at one point you didn’t even think Advanced SEO existed.

In my White & Black Hat Parable post I subtly introduced this technique as well as the whole Black Hole SEO concept. It doesn’t really have a name but basically it follows all the rules of Black Hole SEO. It targets sites on a mass scale, particularly scraper sites. It tricks them into giving you legitimate and targeted links and it grabs its content on an authoritative scale (will be explained in a later related post). So lets begin our Black Hole SEO lesson by learning how to grab hundreds of links an hour in a completely automated and consenting method.

ObjectiveWe will attempt to get black hat or scraper sites to mass grab our generated content and link to us. It’ll target just about every RSS scraper site out there, including Blog Solution and RSSGM installs including many private scrapers and Splogs.

Methodology1) First we’ll look at niche and target sources. Everyone knows the top technique for an RSS scraper is the classic Blog N’ Ping method. It’s basically where you create a scraped blog post from a search made on a popular Blog Aggregator like Google Blog Search or Yahoo Blog Search. Then they ping popular blog update services to get the post indexed by the engines. For a solid list of these checkout PingOMatic.com. Something to chew on, how many of you actually go to Weblogs.com to look for new interesting blog posts? Haha yeah thats what I thought. 90% of the posts there are pinged from spam RSS scraper blogs. On top of that there’s hundreds going in an hour. Kinda funny, but a great place to find targets for our link injections none the less.

2) We’ll take Weblogs.com as an example. We know that at least 90% of those updates will be from RSS scrapers that will eventually update and grab more RSS content based upon their specified keywords. We know that the posts they make already contain the keywords they are looking for, otherwise they wouldn’t of scraped them in the first place. We also have a good idea of where they are getting their RSS content. So all we got to do is find what they want, where they are getting it from, change it up to benefit us, and give it back.

3) Write a simple script to to scrape all the post titles within the td class=”blogname” located between the !– START – WEBLOGS PING ROLLER — comments within the html. Once you got a list of all the titles store it in a database and keep doing it infinitely. Check for duplicates and continuously remove them.

4) Once you got all the titles steadily coming in write a small script on your site that outputs the titles into a rolling XML feed. I know I’m going to get questions about what a “rolling XML feed” is so I’ll just go ahead and answer them. It’s nothing more than an xml feed that basically updates in real time. You just keep adding posts to it as they come in and removing the previous ones. If the delay is too heavy you can always either make the feed larger (up to about 100 posts is usually fine) or you can create multiple XML feeds to accommodate the inevitably tremendous volume. I personally like the multiple feed idea.

5) Give each post within the feed the same title as you scraped from Weblogs. Then change the URL output field to your website address. Not the original! Haha that would do no good obviously. Then create a nice little sales post for your site. Don’t forget to include some html links inside your post content just in case their software forgets to remove it.

6) Ping a bunch of popular RSS blog search sites. The top 3 you should go for are:Google Blog SearchYahoo News SearchDaypop RSS Search

This will republish your changed up content so the RSS scrapers and all the sites you scraped the titles from will grab and republish your content once again. However, this time with your link. This won’t have any affect on legitimate sites or services so there really are no worries. Fair warning: be sure to make the link you want to inject into all these Splogs and scraped sites as a quickly changed and updated variable because this will gain you links VERY quickly. Lets just say I wasn’t exaggerating in the title A good idea would be to put the link in the database, and every time the XML publishing script loops through have it query it from the database. That way you can change it on the fly as it continuously runs.

As you’ve probably started to realize this technique doesn’t just stop at gaining links quickly, it’s also a VERY powerful affiliate marketing tool. I started playing around with this technique before last June and it still works amazingly. The switch to direct affiliate marketing is easy. Instead of putting in your URL, grab related affiliate offers and once you got a big enough list start matching for related keywords before you republish the XML feed. If a match is made, put in the affiliate link instead of your link and instead of the bullshit post content put in a quick prewritten sales post for that particular offer. The Black Hat sites will work hard to drive the traffic to the post and rank for the terms and you’ll be the one to benefit.

Each individual site may not give you much but when you scale it to several thousands of sites a day it starts really adding up quickly. By quickly I mean watch out. By no means is that a joke. It is quick. There are more RSS scraped pages and sites that go up everyday than any of us could possibly monetize no matter how fast you think your servers are.

–>