Hot New List of Places To Scrape
You’re not still scraping RSS feeds, are you? *Shakes his head in shame* I am so disappointed in you. How about we review a few places where only the Trekkies are brave enough to scrape? It’s a huge, content-filled Internet out there. I’m sure we can do a little better than crappy RSS feeds.
Encarta Encyclopedia- WTF?! Yeah, you heard right. The articles and content are held in huge data files on the CDs. Go fuckin’ grab ‘em already! No one else is.
YouTube- This one is too easy. They even have a feed you can use to grab the videos, descriptions, and titles.
IMDb- Same as YouTube. There is even an example of how to grab and parse IMDb content in the LWP module example code.
Newsgroups- A classic and too easy to pass up.
Drudge Report- Nothing is more beautiful than snagging big, popular news that hasn’t already been stolen by CNN and MSN. Also consider who your competitor in the SERPs is. Drudge Report may have a ton of links, but the site itself is SEO’d to shit. My little sister could kick his ass in the SERPs.
Craigslist- Same as Drudge Report, but I’m going to stay out of this one because I have a ton of respect for Craig Newmark. It is also a bit harder to beat him in the SERPs, but the sheer volume of new content being added every day more than makes up for it.
IRC- I’ve beaten this technique to death, so I’m not even going to bother talking about it.
Froogle- I couldn’t help but mention this one. However, please respect me when I say: stay off my turf! Seriously…
Forums- One of the easiest ways to build millions of pages of content quickly. The quality tends to suck, but you cover such a wide range of topics in such a small amount of data that it really helps bring in traffic from those odd phrases.
Looksmart and Article Finder- Their templates make it way too easy to scrape the content. The articles are also long, which is nice.
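Here is a rough sketch of why a consistent template makes a site so easy to pull apart. It is only an illustration in Python using the standard library's `html.parser`: the class names `title` and `article-body`, the `SAMPLE_PAGE` markup, and the `extract_article` helper are all made up for the example and are not the real markup of Looksmart, Article Finder, or anyone else. The idea is simply that once every page wraps its headline and body in the same tags, one small parser covers the whole site.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Hypothetical page; the class names are assumptions, not any real site's markup.
SAMPLE_PAGE = """
<html><body>
  <div class="nav"><a href="/">Home</a></div>
  <h1 class="title">How to Brew Coffee</h1>
  <div class="article-body">
    <p>Start with <b>fresh</b> beans.</p>
    <p>Grind them right before brewing.</p>
  </div>
  <div class="footer">Copyright</div>
</body></html>
"""


class ArticleExtractor(HTMLParser):
    """Collect text inside <h1 class="title"> and <div class="article-body">."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._target = None  # list currently receiving text, or None
        self._depth = 0      # nesting depth inside the matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._target is not None:
            # Already inside a matched element: just track nesting.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "title" in classes:
            self._target, self._depth = self.title_parts, 1
        elif tag == "div" and "article-body" in classes:
            self._target, self._depth = self.body_parts, 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._target is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Left the matched element; stop collecting text.
            self._target = None

    def handle_data(self, data):
        text = data.strip()
        if self._target is not None and text:
            self._target.append(text)


def extract_article(html):
    """Return the title and body text found by the template rules above."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title_parts),
        "body": " ".join(parser.body_parts),
    }


if __name__ == "__main__":
    print(extract_article(SAMPLE_PAGE))
```

The navigation and footer text never make it into the result because they sit outside the two template elements. A real site would need its own tag and class rules, and sloppy markup with unclosed tags would throw off the simple depth counter used here.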
User Contributed (Check Comments)
Google News- Ugh, yes.
Public Libraries- Simply fuckin brilliant!
eBay- That one is news to me. I’m all over that one. Ever thought of scraping eBay and then feeding it into Froogle? eBay does the same damn thing, but why not go through your aff links? It’s worth a shot, and there has got to be a good way to make some cash off it.
University Data- Such an asshole thing. I love it!
This is great. Keep em comin!