Joho the Blog » Importing HTML into Google Docs spreadsheets

Importing HTML into Google Docs spreadsheets

Rick Klau points [g+] to a feature of Google Docs spreadsheets I didn’t know about (although I’m far from a spreadsheet maven): It can automatically include a table from any HTML document accessible on the Web. It turns out it can also include the contents of lists.

It’s not the most intuitive feature. Into a cell you type:

=ImportHTML(“[URL]”,”[query]”,”[index]”)

Except you put in the HTML page’s url instead of [URL], “table” or “list” instead of [query], and which the number of the tables or list you have in mind instead of [index]. For example:

=ImportHTML(“http://www.hyperorg.com/blogger/index.html”,”list”,1)

gets the first list (ul or ol) on Joho The Blog (this page you’re reading), which turns out to be the one on the left called “Other Stuff.” If you ask for 2 instead of 1, you’ll get my blogroll.

Or, to use Rick’s more useful example:

=ImportHtml(“http://www.accuweather.com/en/us/anchorage-ak/99501/august-weather/346835?view=table”,”table”,1)

That imports AccuWeather’s table of weather for Anchorage (where Rick is headed for vacation.)

The data updates every time you open the spreadsheet.

ImportCVS does the same for CVS data. And Kingsley Idehen explains how you can update your spreadsheet with Linked Open Data by going through SPARQL. (SPARQL lets you query a database for linked data.) (Yes, it’s over my head.)

Wouldn’t it be useful to be able to import a single element into a Google spreadsheet, even if it’s not in a list or a table? For example, suppose I want to get the headline of the first posting at, say, DailyKos.com. That element has an id of “article-1″. (I know this because I looked at the source.) So, why not let me specify the url and the id, and plop the contents into a cell in the spreadsheet? Or suppose I want the content of one particular cell of a table?

No, we’re never satisfied.

 


Two seconds after I pressed the “Publish” button, Rick Klau responded to my questions on the Google Plus thread where he talks about this feature. He suggests importXML for grabbing an item by its id. And to get a frozen copy of the data, copy and paste it. He also points to a post from 2007 about these features. (Oh, yeah, you can trust Joho to stay on top of the news!) In fact, that post gives an example of how to obtain the latest headline from the NYT:

=GoogleReader (“http://graphics8.nytimes.com/services/xml/rss/nyt/HomePage.xml”, “items title”, “false”, 1)

It still works. Cool!!

8 Responses to “Importing HTML into Google Docs spreadsheets”

  1. Had some fun with importHTML myself, followed by geocoding a bunch of data and running it through Google’s Spreadsheet Mapper tool to get a map of librarians doing the Library Day in the Life project last time around: http://andromedayelton.com/blog/2012/01/17/mapping-libday8/ (oh, and pulled in some Flickr and WordPress via RSS, and a Twitter hashtag, why not)

    Was subsequently told it all would have been much easier with Google Fusion Tables, but never got around to wrapping my head around those.

    (And in completely unrelated news, Kingsley Idehen’s daughter was in my daughter’s class a couple years ago. What a small internet we inhabit.)

  2. Hi, Any idea how to import the top data (Aug 6th, 2012) from the table in http://finance.yahoo.com/q/hp?s=DBC&a=01&b=6&c=2006&d=07&e=7&f=2012&g=d

    I can figure out how to import the whole table, but I just need that one particular data point (the first/most recent date on the table)

  3. […] Importing HTML into Google Docs spreadsheets Rick Klau points [ g+ ] to a feature of Google Docs spreadsheets I didn’t know about (although I’m far from a spreadsheet maven): It can automatically include a table from any HTML document accessible on the Web. It turns out it can also include the contents of lists. It’s not the most intuitive feature. SPARQL. It still works. MORE >> 10 Tweets […]

  4. Please help. I need to import a table from web to GoogleDocs but to get to that page i have to login with user name and password. Any solutions?

  5. So I accidentally uninstalled Google Chrome and it deleted all my saved bookmarks (I had a lot). I’ve managed to track them all back to this file called “Bookmarks.bak” but when I try to import them from an HTML file it doesn’t show up as one… I have all the HTML that was in the backup document, though, I just don’t know how to convert that to a file I can import to Google Chrome…

    I use a PC by the way… Thanks in advance!

  6. Any idea how to have Google docs read HTML and display plain or formatted text? I have an IFTTT recipe pulling content from emails and putting them in Google Docs – but the emails are HTML and they are hard to read in the cell. I would like to get rid of all the html code and just have plain text.

  7. At this time I am going to do my breakfast, afterward having my
    breakfast coming yet again to read further news.

  8. […] http://www.hyperorg.com/blogger/2012/07/27/importing-html-into-google-docs-spreadsheets/ […]

Leave a Reply


Web Joho only

Comments (RSS).  RSS icon