TED Archives - Joho the Blog

June 25, 2016

TED, scraped

TED used to have an open API, but no longer supports it. I want to do a little exploring of what the world looks like to TED, so I scraped the data from 2,228 TED Talk pages. For each talk, the data includes the title, author, tags, description, link to the transcript, number of times shared, and year. You can get it from here. (I named it tedTalksMetadata.txt, but it’s really a JSON file.)
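As a rough sketch of the kind of exploring the file allows, here is one way to count how often each tag appears per year. The post does not spell out the JSON layout, so the structure here is an assumption: a list of objects with hypothetical keys "title", "tags" (a list of strings), and "year". Check the real file and adjust the key names before relying on this.

```python
import json
from collections import Counter


def load_talks(path):
    """Read the scraped metadata; the file is JSON despite its .txt name."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tag_counts_by_year(talks):
    """Return {year: Counter of tags}. Key names are assumed, not confirmed."""
    counts = {}
    for talk in talks:
        year = talk.get("year")
        # Tolerate talks where the tags are missing or null.
        for tag in talk.get("tags") or []:
            counts.setdefault(year, Counter())[tag] += 1
    return counts


# Made-up records in the assumed shape, standing in for the real data.
sample_talks = [
    {"title": "Talk A", "tags": ["design", "technology"], "year": 2010},
    {"title": "Talk B", "tags": ["technology"], "year": 2010},
    {"title": "Talk C", "tags": None, "year": 2016},
]

by_year = tag_counts_by_year(sample_talks)
```

With the real file, the call would look like `tag_counts_by_year(load_talks("tedTalksMetadata.txt"))`, assuming the assumed keys match.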

“Scraping” means having a computer program look at the HTML underneath a Web page and try to figure out which elements refer to what. Scraping is always a chancy enterprise because the cues indicating which text is, say, the date and which is the title may be inconsistent across pages, and may be changed by the owners at any time. So I did the best I could, which is not very good. (Sometimes page owners aren’t happy about being scraped, but in this case it only meant one visit for each page, which is not a lot of burden for a site that has pages that get hundreds of thousands and sometimes millions of visits. If they really don’t want to be scraped, they could re-open their API, which provides far more reliable info far more efficiently.)
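To make the idea concrete, here is a minimal Python sketch of scraping a single page with the standard library's HTMLParser. It is not the approach in my PHP scripts, and the sample HTML is invented: the assumption that the description and tags live in meta tags named "description" and "keywords" is only for illustration, and real pages may use entirely different cues.

```python
from html.parser import HTMLParser


class TalkPageParser(HTMLParser):
    """Collects the <title> text and two assumed <meta> tags from one page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.fields = {"title": "", "description": None, "tags": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            content = attrs.get("content")
            if name == "description":
                self.fields["description"] = content
            elif name == "keywords" and content:
                # Split the comma-separated list and drop stray whitespace.
                self.fields["tags"] = [
                    t.strip() for t in content.split(",") if t.strip()
                ]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.fields["title"] += data


def scrape(html):
    """Parse one page's HTML and return the fields we managed to find."""
    parser = TalkPageParser()
    parser.feed(html)
    parser.close()
    fields = parser.fields
    fields["title"] = fields["title"].strip()
    return fields


# Invented page, standing in for a real talk page.
SAMPLE_PAGE = """<html><head>
<title> An example talk | Example Speaker </title>
<meta name="description" content="A made-up description.">
<meta name="keywords" content="technology, design , culture">
</head><body></body></html>"""

result = scrape(SAMPLE_PAGE)
```

If a page drops or renames one of these tags, the matching field simply stays empty, which is exactly the fragility described above.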

I’ve also posted the PHP scripts I wrote to do the scraping at GitHub. Please don’t laugh.

If you use the JSON to explore TED metadata, please let me know if you come up with anything interesting that you’re willing to share. Thanks!


July 14, 2010

TEDglobal talks, blogged by Ethanz the Amazing

Ethan Zuckerman, the world’s best live blogger (full awesomeness for his intellect, his writing talent, and his typing skills), is blogging TEDglobal. He is a prodigy of live blogging. I can’t even list all of the talks he’s blogged.

And Ethan’s own TED talk there is brilliant, funny, surprising, and compassionate.
