Joho the Blog » So you think you can scrape?
EverydayChaos
Everyday Chaos
Too Big to Know
Too Big to Know
Cluetrain 10th Anniversary edition
Cluetrain 10th Anniversary
Everything Is Miscellaneous
Everything Is Miscellaneous
Small Pieces cover
Small Pieces Loosely Joined
Cluetrain cover
Cluetrain Manifesto
My face
Speaker info
Who am I? (Blog Disclosure Form) Copy this link as RSS address Atom Feed

So you think you can scrape?

If you’re thinking about scraping a web page to extract the delicious data bits from it, ScraperWiki looks like a great place to start. It’s got tools, examples, and a community. Right now the tools are in Ruby, Python and PHP, but they’re thinking about adding Javascript.

If I have time this weekend, I’m going to give it a try scraping the weekly Berkman Buzz post. Until a couple of weeks ago, I was fairly routinely posting the Buzz on this blog, because I had written a little scraper and formatter that let me go from the email version to the blog markup I prefer. But then those bahstahds at Berkman went all HTML on the weekly email, which completely broke my scraper. But the Berkman page that lists the Buzz looks like it’s ripe for trying out the ScraperWiki tools. Looking forward to it…

Previous: « || Next: »

Leave a Reply

Comments (RSS).  RSS icon