Joho the Blog » html

July 27, 2012

Importing HTML into Google Docs spreadsheets

Rick Klau points [g+] to a feature of Google Docs spreadsheets I didn’t know about (although I’m far from a spreadsheet maven): It can automatically include a table from any HTML document accessible on the Web. It turns out it can also include the contents of lists.

It’s not the most intuitive feature. Into a cell you type:

=ImportHTML("[URL]","[query]",[index])

Except you put the HTML page’s URL in place of [URL], “table” or “list” in place of [query], and the number of the table or list you have in mind in place of [index]. For example:

=ImportHTML("http://www.hyperorg.com/blogger/index.html","list",1)

gets the first list (ul or ol) on Joho The Blog (this page you’re reading), which turns out to be the one on the left called “Other Stuff.” If you ask for 2 instead of 1, you’ll get my blogroll.

Or, to use Rick’s more useful example:

=ImportHtml("http://www.accuweather.com/en/us/anchorage-ak/99501/august-weather/346835?view=table","table",1)

That imports AccuWeather’s table of weather for Anchorage (where Rick is headed on vacation).

The data updates every time you open the spreadsheet.
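For the curious, here is roughly what “get the Nth table or list” amounts to, as a minimal Python sketch. It is not how Google implements ImportHTML. It skips fetching the URL (you would hand it HTML you had already downloaded, e.g. with urllib.request), counts tables and ul/ol lists separately starting from 1, and assumes simple, un-nested markup with explicit closing tags. The names TableListParser and import_html, and the sample page, are made up for illustration.

```python
from html.parser import HTMLParser


class TableListParser(HTMLParser):
    """Collect every <table> (as rows of cell text) and every <ul>/<ol> (as item text).

    Simplifying assumptions: no nested tables or lists, and cells/items are
    explicitly closed (</td>, </th>, </li>).
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row is a list of cell strings
        self.lists = []    # each list is a list of item strings
        self._text = None  # text buffer for the cell or list item being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._text = []
        elif tag in ("ul", "ol"):
            self.lists.append([])
        elif tag == "li" and self.lists:
            self._text = []

    def handle_endtag(self, tag):
        if self._text is None:
            return
        text = "".join(self._text).strip()
        if tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append(text)
            self._text = None
        elif tag == "li" and self.lists:
            self.lists[-1].append(text)
            self._text = None

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)


def import_html(html, query, index):
    """Rough analogue of =ImportHTML(url, query, index), run on HTML you already have."""
    parser = TableListParser()
    parser.feed(html)
    parser.close()
    if query == "table":
        found = parser.tables
    elif query == "list":
        found = parser.lists
    else:
        raise ValueError('query must be "table" or "list"')
    if not 1 <= index <= len(found):
        raise IndexError(f"no {query} number {index} on this page")
    return found[index - 1]  # index counts from 1, as in the spreadsheet formula


# Hypothetical page: a list, a second list, then a small table.
page = """
<ul><li>Other Stuff</li><li>About</li></ul>
<ol><li>Blogroll entry</li></ol>
<table><tr><th>Day</th><th>High</th></tr><tr><td>Aug 1</td><td>65</td></tr></table>
"""
print(import_html(page, "list", 2))   # ['Blogroll entry']
print(import_html(page, "table", 1))  # [['Day', 'High'], ['Aug 1', '65']]
```

Asking for list 2 skips past the first ul, the same way asking ImportHTML for list 2 on this blog skips “Other Stuff” and lands on the blogroll.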

ImportData does the same for CSV data. And Kingsley Idehen explains how you can update your spreadsheet with Linked Open Data by going through SPARQL. (SPARQL lets you query a database for linked data.) (Yes, it’s over my head.)

Wouldn’t it be useful to be able to import a single element into a Google spreadsheet, even if it’s not in a list or a table? For example, suppose I want to get the headline of the first posting at, say, DailyKos.com. That element has an id of “article-1”. (I know this because I looked at the source.) So, why not let me specify the url and the id, and plop the contents into a cell in the spreadsheet? Or suppose I want the content of one particular cell of a table?

No, we’re never satisfied.
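The “give me the element with this id” wish is easy enough to sketch outside the spreadsheet. Here is a minimal Python version that returns the text inside the first element whose id matches, such as “article-1”. The names TextById and text_by_id and the sample markup are hypothetical; fetching the page is left out, and the sketch assumes well-nested tags that are explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.done = False   # True once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, element_id):
    """Return the whitespace-normalized text of that element, or None if missing or empty."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None


# Hypothetical front-page markup.
page = '<h2 id="article-1"><a href="/story">Some <em>headline</em></a></h2><div id="article-2">Other</div>'
print(text_by_id(page, "article-1"))  # Some headline
```

Tracking depth rather than just “the next closing tag” is what lets the headline’s inner link and emphasis come along without stopping early.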



Two seconds after I pressed the “Publish” button, Rick Klau responded to my questions on the Google Plus thread where he talks about this feature. He suggests importXML for grabbing an item by its id. And to get a frozen copy of the data, copy and paste it. He also points to a post from 2007 about these features. (Oh, yeah, you can trust Joho to stay on top of the news!) In fact, that post gives an example of how to obtain the latest headline from the NYT:

=GoogleReader("http://graphics8.nytimes.com/services/xml/rss/nyt/HomePage.xml", "items title", "false", 1)

It still works. Cool!!
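As a rough Python sketch of what “the title of the first item” means for an RSS 2.0 feed, assuming the usual rss/channel/item layout: feed_titles is a made-up name, fetching the feed (for instance the NYT URL above, via urllib.request) is left out, and the "false" argument is ignored because its exact role in the GoogleReader formula isn’t spelled out here.

```python
import xml.etree.ElementTree as ET


def feed_titles(rss_xml, count=1):
    """Return the titles of the first `count` <item> elements in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)           # root element is <rss>
    items = root.findall("./channel/item")  # document order, which is usually newest first
    return [item.findtext("title", default="").strip() for item in items[:count]]


# Hypothetical two-item feed.
sample = """<rss version="2.0"><channel><title>Home Page</title>
<item><title>First headline</title><link>http://example.com/1</link></item>
<item><title>Second headline</title><link>http://example.com/2</link></item>
</channel></rss>"""

print(feed_titles(sample))     # ['First headline']
print(feed_titles(sample, 2))  # ['First headline', 'Second headline']
```

Looking only under channel/item keeps the channel’s own title (“Home Page”) out of the results.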


January 2, 2012

Fixing the quirky noQuirks blog template

Thanks to Mirek Sopek, the folks at Mako Lab diagnosed why my new WordPress blog template was going all wonky in Internet Explorer 9. Even after I’d discovered that the problem was that my page’s declaration was putting it into Quirks mode, I’d put the DOCTYPE declaration in the wrong spot. I put it at the top of header.php, thinking that would put it at the top of the HTML page that WordPress assembles out of various files. Nope. You have to put it at the top of the index.php file. D’oh!

We still don’t know why it worked on my copy of IE 9 but not on another copy at the same version level, both of them 64-bit.

Thank you Mirek and Mako Lab! I would never have figured this out without you.


December 30, 2011

Quirky html

In the recent — and probably unabated — unpleasantness attending the launch of the update of this blog’s look, I have learned a little about Quirks mode. I learned this because Internet Explorer 9 was not displaying rounded corners or laying out divs (blocks of content) the way Firefox and Chrome were. Once I switched off Quirks mode in my blog pages, it worked much better.

There’s a good explanation and some very detailed info here. But as I “understand” the story, quirks mode was introduced to handle the problem that different browsers were expecting different sorts of markup (particularly for CSS style information). Then, once the browsers realized it would be helpful to everyone if they agreed to support truly standardized standards, they had to decide what to do with the old code written in the particularities of each browser. So, they agreed to allow the HTML developer to specify whether the page she’s created should be interpreted according to the modern standardized standards, or if it’s quirky and ought to be interpreted according to the idiosyncratic expectations of the various browsers.

You, the HTML developer, specify your intentions in the DOCTYPE declaration at the very beginning of your HTML pages. This page will help you figure out exactly how that line should read.
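To make this concrete, here is a simplified Python sketch of how a modern, HTML5-style parser picks a rendering mode from the DOCTYPE line. It follows the outline of the HTML specification’s rules but checks only a handful of the legacy public identifiers (the real list is much longer), treats anything other than whitespace before the DOCTYPE, even a comment, as a missing DOCTYPE (stricter than the spec), and says nothing about IE 9’s own document-mode logic. The function and constant names are made up. “limited-quirks” is what is often called “almost standards” mode.

```python
import re

# <!DOCTYPE name [PUBLIC "public-id" ["system-id"] | SYSTEM "system-id"]>
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>]+)'
    r'(?:\s+PUBLIC\s+["\'](?P<public>[^"\']*)["\']'
    r'(?:\s+["\'](?P<system>[^"\']*)["\'])?'
    r'|\s+SYSTEM\s+["\'](?P<system_only>[^"\']*)["\'])?'
    r'\s*>',
    re.IGNORECASE,
)

# A few of the legacy public-identifier prefixes that force quirks mode (the spec lists many more).
QUIRKS_PREFIXES = (
    "-//w3c//dtd html 3.2//",
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 4.0 transitional//",
    "-//w3c//dtd html 4.0 frameset//",
)
# Transitional/frameset DTDs: HTML 4.01 depends on the system id, XHTML 1.0 does not.
HTML401_LOOSE = ("-//w3c//dtd html 4.01 transitional//", "-//w3c//dtd html 4.01 frameset//")
XHTML10_LOOSE = ("-//w3c//dtd xhtml 1.0 transitional//", "-//w3c//dtd xhtml 1.0 frameset//")


def rendering_mode(html):
    """Return 'quirks', 'limited-quirks' or 'no-quirks' for the start of a page."""
    m = DOCTYPE_RE.match(html.lstrip())
    if m is None:
        return "quirks"  # no DOCTYPE, or something other than whitespace precedes it
    if m.group("name").lower() != "html":
        return "quirks"
    public = (m.group("public") or "").lower()
    has_system = m.group("system") is not None or m.group("system_only") is not None
    if public.startswith(QUIRKS_PREFIXES):
        return "quirks"
    if public.startswith(HTML401_LOOSE):
        return "limited-quirks" if has_system else "quirks"
    if public.startswith(XHTML10_LOOSE):
        return "limited-quirks"
    return "no-quirks"


print(rendering_mode("<!DOCTYPE html>\n<html>...</html>"))  # no-quirks
print(rendering_mode('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'))  # quirks
print(rendering_mode("<html><body>no doctype</body></html>"))  # quirks
```

The plain `<!DOCTYPE html>` line is the shortest way to ask for standards mode; the same HTML 4.01 Transitional declaration flips between quirks and almost-standards depending only on whether the DTD URL is present.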

Meanwhile, the shame and humiliation the launch of the new look of this blog has brought upon me only deepens, for I have given up on controlling the placement of divs by getting my floats in order. Screw it. I’ve plunked them into a table. Yeah, I’m CSS-ing like it’s 1995.


July 10, 2008

Wanna play Fix My Code?

LATER THAT DAY: I took Wray Cummings’ advice in the comments below, which worked. So, now all the examples of uncentered HR elements in this post are in fact examples of centered HR elements, which makes the post rather mysterious. Imagine, if you will, then, that all of the little horizontal rules are left-justified. And, thanks, Wray!


I know I’m going to be embarrassed about this, but for months, if not for years, I’ve been unable to bend the simple <hr> element to my will. I can adjust its length, but I can’t get the little !@#$% to center itself.

I’ve tried everything I can think of to make it work:

<hr width='100pt' >:


<hr width='100pt' align=center />:


<hr width='100pt' align='center' />:


<hr width='100pt' style='text-align:center' />:


None of these work in Firefox or Safari. I have not intentionally redefined hr in any of my many CSS style sheets, but wouldn’t the local, inline setting take precedence anyway?

What incredibly obvious, embarrassing thing am I missing? Go ahead, make me look bad. And I’ll thank you for it.
