Joho the Blog » tech

July 27, 2012

Importing HTML into Google Docs spreadsheets

Rick Klau points [g+] to a feature of Google Docs spreadsheets I didn’t know about (although I’m far from a spreadsheet maven): It can automatically include a table from any HTML document accessible on the Web. It turns out it can also include the contents of lists.

It’s not the most intuitive feature. Into a cell you type:

=ImportHTML("[URL]","[query]",[index])

Except you put in the HTML page’s URL instead of [URL], “table” or “list” instead of [query], and the number of the table or list you have in mind instead of [index]. For example:

=ImportHTML("http://www.hyperorg.com/blogger/index.html","list",1)

gets the first list (ul or ol) on Joho The Blog (this page you’re reading), which turns out to be the one on the left called “Other Stuff.” If you ask for 2 instead of 1, you’ll get my blogroll.
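To see how the three arguments fit together, here is a minimal sketch of a JavaScript helper that assembles the formula text. The helper name buildImportHtmlFormula is made up for illustration; only the ImportHTML formula itself comes from Google.

```javascript
// Hypothetical helper (not part of Google Docs): assembles the text of an
// ImportHTML formula from its three arguments.
function buildImportHtmlFormula(url, query, index) {
  // ImportHTML understands only these two kinds of query.
  if (query !== "table" && query !== "list") {
    throw new Error('query must be "table" or "list"');
  }
  // The index counts from 1: 1 is the first table (or list) on the page.
  if (!Number.isInteger(index) || index < 1) {
    throw new Error("index must be a whole number, starting at 1");
  }
  return '=ImportHTML("' + url + '","' + query + '",' + index + ")";
}

// Builds the formula for the second list on the page.
console.log(buildImportHtmlFormula("http://www.hyperorg.com/blogger/index.html", "list", 2));
```

Paste the resulting text into a cell and the spreadsheet does the rest.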

Or, to use Rick’s more useful example:

=ImportHtml("http://www.accuweather.com/en/us/anchorage-ak/99501/august-weather/346835?view=table","table",1)

That imports AccuWeather’s table of weather for Anchorage (where Rick is headed for vacation).

The data updates every time you open the spreadsheet.

ImportData does the same for CSV data. And Kingsley Idehen explains how you can update your spreadsheet with Linked Open Data by going through SPARQL. (SPARQL lets you query a database for linked data.) (Yes, it’s over my head.)

Wouldn’t it be useful to be able to import a single element into a Google spreadsheet, even if it’s not in a list or a table? For example, suppose I want to get the headline of the first posting at, say, DailyKos.com. That element has an id of “article-1”. (I know this because I looked at the source.) So, why not let me specify the URL and the id, and plop the contents into a cell in the spreadsheet? Or suppose I want the content of one particular cell of a table?

No, we’re never satisfied.

 


Two seconds after I pressed the “Publish” button, Rick Klau responded to my questions on the Google Plus thread where he talks about this feature. He suggests importXML for grabbing an item by its id. And to get a frozen copy of the data, copy and paste it. He also points to a post from 2007 about these features. (Oh, yeah, you can trust Joho to stay on top of the news!) In fact, that post gives an example of how to obtain the latest headline from the NYT:

=GoogleReader ("http://graphics8.nytimes.com/services/xml/rss/nyt/HomePage.xml", "items title", "false", 1)

It still works. Cool!!
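Here is a rough sketch of what the importXML suggestion might look like for the DailyKos example. ImportXML takes a URL and an XPath query. The XPath used here (any element whose id is article-1), the exact URL, and the helper name are all assumptions for illustration, unchecked against the live page.

```javascript
// Hypothetical helper: builds an ImportXML formula that selects one element by its id.
function buildImportXmlById(url, id) {
  // XPath for "any element whose id attribute equals the given id".
  var xpath = "//*[@id='" + id + "']";
  return '=ImportXML("' + url + '","' + xpath + '")';
}

// Illustrative URL and id only.
console.log(buildImportXmlById("http://www.dailykos.com", "article-1"));
```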


July 22, 2012

Capturing control keys in Chrome et al.

Hallelujah! For years — literally years — I’ve been limping along with a blender full of spaghetti to do something that should be really simple: capture control key combos (like CTRL-S or CTRL-I) via Javascript in all the major browsers. I finally found some simple code that seems to work beautifully.

The problem is that the browsers don’t agree about what’s going on when a user presses a control key and another key simultaneously, which is, after all, the usual thing people do with the control key. Some of the browsers think that it’s two events, so you have to record the control keypress, remember it, and treat the next keypress differently. Other browsers think of it as a single keypress that you can just process as if CTRL-S were a unique key. Then, depending, you may or may not have to nullify the S press. The way I was doing it (cribbed from multiple sources, of course) involved first checking which browser the JavaScript was running in, and then processing keystrokes, looking for an initial control press. It was a pain in the butt, and it was fragile.

I am certain that this is not a problem for actual developers. For example, jQuery handles keystrokes, although I had trouble getting it to work (because, if it’s not clear, I am a ham-fisted hobbyist who mainly just copies in other people’s code. Thank you, other people!).

Today I had a few minutes, so I went back to Google and found some simple code from Ganesh. Thank you!

Here’s what you do:

First, include jQuery. Then place the following block into your JavaScript. Put it toward the top, and don’t put it inside a function. You want it to run whenever your JavaScript loads. (Well, you could put it into an initialization function if you want.)

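// Registers a handler for CTRL plus a single key.
// key: an uppercase letter such as 'S' (its character code is compared with e.keyCode)
$.ctrl = function(key, callback, args) {
    $(document).keydown(function(e) {
        if(!args) args=[]; // IE barks when args is null
        if(e.keyCode == key.charCodeAt(0) && e.ctrlKey) {
            callback.apply(this, args);
            return false; // keeps the browser from also acting on the combo
        }
    });
};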

After that, you can bind a control combo, such as Control-S, to the function you want (e.g., “SaveMe()”) this way:

$.ctrl('S', function() {
    SaveMe();
});
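The test at the heart of that snippet can be pulled out into a plain function, which makes it easier to see what is being checked. This is only a sketch: the name isCtrlCombo is made up, and the fallback to the event's key property is an assumption about what newer browsers report, not something from Ganesh's code.

```javascript
// Hypothetical helper: true when a keydown event is CTRL plus the given letter.
function isCtrlCombo(e, letter) {
  if (!e.ctrlKey) return false;
  var wanted = letter.toUpperCase();
  // Older style: keyCode for a letter key equals the uppercase character code.
  if (typeof e.keyCode === "number" && e.keyCode === wanted.charCodeAt(0)) return true;
  // Newer style (assumption): e.key holds the character itself, e.g. "s".
  return typeof e.key === "string" && e.key.toUpperCase() === wanted;
}

// Plain objects stand in for real browser events here.
console.log(isCtrlCombo({ ctrlKey: true, keyCode: 83 }, "S"));  // true
console.log(isCtrlCombo({ ctrlKey: false, keyCode: 83 }, "S")); // false
```

Because it takes any object with ctrlKey, keyCode, or key on it, you can try it out without opening a browser.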

By the way, Ganesh thanks an anonymous commenter for improving the code. God bless the iterative Web!


May 20, 2012

Lion fixed my SuperDrive. (Alternative title: Snow Leopard broke my SuperDrive)

Yesterday Disk Utility told me to restart my Mac from a boot disk and run the disk repair function (= Disk Utility). Fine. Except I was unable to boot from any of my three Mac boot disks (including the original) whether they were in my laptop’s SuperDrive (= Apple’s plain old DVD drive) or in a USB-connected DVD drive. The system would notice the DVD when asked to look for boot devices (= hold down the Option key when starting up), but froze after I clicked on the DVD (= no change in the screen after 30 mins).

So, what the hell, I installed Lion, which I had been hoping to avoid (= my pathetic resistance to Apple’s creeping Big Brotherism). Thanks to the generosity of Guillaume Gète, I downloaded Lion DiskMaker, followed the simple instructions (= re-downloaded Lion, all part of Apple’s making things hard by making them easy program), and now have a Lion boot disk. I was able to boot from it and fix my hard drive.

The whole episode was so reminiscent of why I left Windows (= Windows 7 looks pretty good these days).


January 15, 2012

So you think you can scrape?

If you’re thinking about scraping a web page to extract the delicious data bits from it, ScraperWiki looks like a great place to start. It’s got tools, examples, and a community. Right now the tools are in Ruby, Python and PHP, but they’re thinking about adding Javascript.

If I have time this weekend, I’m going to try it out by scraping the weekly Berkman Buzz post. Until a couple of weeks ago, I was fairly routinely posting the Buzz on this blog, because I had written a little scraper and formatter that let me go from the email version to the blog markup I prefer. But then those bahstahds at Berkman went all HTML on the weekly email, which completely broke my scraper. But the Berkman page that lists the Buzz looks like it’s ripe for trying out the ScraperWiki tools. Looking forward to it…


January 2, 2012

Fixing the quirky noQuirks blog template

Thanks to Mirek Sopek, the folks at Mako Lab diagnosed why my new WordPress blog template was going all wonky in Internet Explorer 9. Even after I’d discovered that the problem was that I was declaring the HTML page with Quirks, I’d put the type declaration in the wrong spot. I put it at the top of header.php, thinking that would put it at the top of the HTML page that WordPress assembles out of various files. Nope. You have to put it at the top of the index.php file. D’oh!

We still don’t know why it worked on my copy of IE 9 but not on others at the same version level, all of them 64-bit.

Thank you Mirek and Mako Lab! I would never have figured this out without you.


December 30, 2011

Quirky html

In the recent — and probably unabated — unpleasantness attending the launch of the update of this blog’s look, I have learned a little about Quirks mode. I learned this because Internet Explorer 9 was not displaying rounded corners or laying out divs (blocks of content) the way Firefox and Chrome were. Once I switched off Quirks mode in my blog pages, it worked much better.

There’s a good explanation and some very detailed info here. But as I “understand” the story, quirks mode was introduced to handle the problem that different browsers were expecting different sorts of markup (particularly for CSS style information). Then, once the browsers realized it would be helpful to everyone if they agreed to support truly standardized standards, they had to decide what to do with the old code written in the particularities of each browser. So, they agreed to allow the HTML developer to specify whether the page she’s created should be interpreted according to the modern standardized standards, or if it’s quirky and ought to be interpreted according to the idiosyncratic expectations of the various browsers.

You, the HTML developer, specify your intentions in the DOCTYPE declaration at the very beginning of your HTML pages. This page will help you figure out exactly how that line should read.
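As a sketch of the idea, the function below checks whether a page's markup begins with a doctype declaration. It is deliberately crude: the function name is made up, and real browsers apply more detailed rules (some older doctypes still trigger quirks behavior), so treat a true result as necessary rather than sufficient.

```javascript
// Hypothetical check: does the markup start with a doctype declaration?
// Leading whitespace is tolerated here; anything else before the doctype is not.
function startsWithDoctype(html) {
  return /^\s*<!DOCTYPE\s/i.test(html);
}

console.log(startsWithDoctype("<!DOCTYPE html>\n<html><head></head><body></body></html>")); // true
console.log(startsWithDoctype("<html><!DOCTYPE html><body></body></html>"));                // false
```

To find out how a loaded page was actually treated, a browser's document.compatMode reports "BackCompat" for quirks mode and "CSS1Compat" otherwise.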

Meanwhile, the shame and humiliation the launch of the new look of this blog has brought upon me only deepens, for I have given up on controlling the placement of divs by getting my floats in order. Screw it. I’ve plunked them into a table. Yeah, I’m CSS-ing like it’s 1995.


September 12, 2011

How to embed a WordPress admin page

I’m posting this so I’ll remember, and in case someone else is googling around for it.

I have a little editor I wrote in JavaScript for creating blogposts. When I’m done editing, it loads the transmogrified text into an iframe that contains the WordPress /wp-admin/post.php page (which is the one you create posts with). Except that it stopped working recently, giving me “X-FRAME-OPTIONS” errors.

A little research showed that x-frame-options are set at the server to prevent people from capturing your pages in their own evil iframes (e.g., inserting your blog posts into their spammy site), either by preventing anyone from doing so, or preventing anyone from inserting into a page that isn’t from the same site as the source page.
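To make those two settings concrete, here is a small sketch of the decision a browser makes. DENY and SAMEORIGIN are the header's standard values; the function name and the idea of comparing two origin strings are simplifications for illustration, not how any particular browser is written.

```javascript
// Hypothetical model of the X-Frame-Options check.
// framedOrigin: origin of the page being put in the iframe
// parentOrigin: origin of the page that contains the iframe
function mayBeFramed(headerValue, framedOrigin, parentOrigin) {
  // No header at all: framing is not restricted by this mechanism.
  if (!headerValue) return true;
  var value = headerValue.trim().toUpperCase();
  if (value === "DENY") return false;                               // nobody may frame it
  if (value === "SAMEORIGIN") return framedOrigin === parentOrigin; // only its own site may
  // Unrecognized values count as "no restriction" in this sketch (an assumption).
  return true;
}

console.log(mayBeFramed("SAMEORIGIN", "http://example.com", "http://example.com"));      // true
console.log(mayBeFramed("SAMEORIGIN", "http://example.com", "http://spam.example.net")); // false
console.log(mayBeFramed("DENY", "http://example.com", "http://example.com"));            // false
```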

I couldn’t figure out how to unset those options. But Chason C., at MediaTemple.net — my hosting company — got back to me within 24 hours with the answer. It turns out that MediaTemple isn’t setting that option; WordPress is. The solution is explained in this blog post, which Chason found for me.

The irony is that the blogpost with the answer has actually captured and embedded the original blog post by Igor at KrazyWorks, which you can find here.


August 12, 2011

How to move text in iMovie ’09

If you insert a text overlay into a movie you’re editing with iMovie ’09 and then want to move it so that it matches up better with what’s going on in the movie, go ahead. I dare you. Aarrrgggh.

[Screenshot: trying to move a text marker]

When you try to grab the blue text box floating above the clip, iMovie will think you’re trying to move the cursor (the red line) that marks where you are in the clip. No matter how you try to grab the little bugger, it won’t work. (Well, occasionally it seems to, but I haven’t figured out why.)

The trick (which I keep forgetting, which is one reason I’m blogging this) is to click on a clip so that you get the thick yellow outline around it, and then click on the little gear button that appears at the bottom left of the clip, and choose “Precision Editor.”

[Screenshot: the Precision Editor button]

Now you’ll be able to drag the text box to where you want.

Don’t forget to close the Precision Editor by clicking on the “Done” button that shows up at the top of the bottom window (well, unless you’ve switched the position of the windows) so that you can go back to normal editing of the clip.


July 26, 2011

Microsoft Word does regex!

After literally decades of using Microsoft Word I just found out that it does regex!

I discovered this because I needed to delete comments inserted throughout my book manuscript, in the form <AU: some comment>. Hundreds of them. I was contemplating exporting to HTML so I could use a text editor that can handle this type of search and replace, but came across an article on how to use regular expressions in Word. Regexes let you use magical incantations that no one understands but that cause text to dance in little circles and transform themselves in puffs of smoke.

For example, to get rid of the pesky markup in my manuscript, I just had to tell the Replace dialogue to use wildcards, and then had it search for \<AU:*\>. The backslashes are necessary so that the angle brackets are not read as regex instructions. The asterisk tells Word to match everything between <AU: and >. (A question mark, by contrast, matches just one character.) Simple! And it accepts far more complex regular expressions than that. (Here’s a site that lets you test your regular expressions.)
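For comparison, here is roughly the same deletion written as a JavaScript regular expression, the sort of thing a text editor would run. The syntax is not the same as Word's wildcards. This is a sketch that assumes a comment never contains a > of its own, and the function name is made up.

```javascript
// Removes every <AU: ...> comment from a string.
// Assumes a comment never contains a ">" of its own.
function stripAuComments(text) {
  // [^>]* means "any run of characters that are not >", so each match
  // stops at the first closing bracket after "<AU:".
  return text.replace(/<AU:[^>]*>/g, "");
}

console.log(stripAuComments("Call me Ishmael.<AU: too familiar?> Some years ago"));
// prints: Call me Ishmael. Some years ago
```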

Take a well deserved bow, Microsoft Word! (And then fix auto-numbers.)


June 19, 2011

Ragged right for Kindle

Full justification of a page — so the page margins are flush to both the left and right edges — sounds like what you want in a professional book, but when computers are laying out pages on the fly on small screens, and especially when they are under the constraints of having relatively few words per line to play with, it can result in ugliness.

When I first got my Kindle 1, it let you decide whether you wanted left justification (= “ragged right”) or full justification. Then Amazon upgraded the software and took away that option, which was not my favorite upgrade ever. (Maybe I just failed to find the hack to restore it.) I just got a Kindle 3, on the occasion of my Kindle 1’s screen losing a valiant battle against pressure in an over-stuffed backpack. There is a hack for the Kindle 3 that has restored the option, except where publishers have explicitly created fully justified texts. Go here and follow the advice in reply #1 scrupulously. (If you don’t know about UNIX line endings, you might not want to try this.)
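If UNIX line endings are new to you: a UNIX line ends with a single line-feed character, where Windows editors typically write a carriage return followed by a line feed. Here is a sketch of the conversion, in case it helps. The function name is made up, and this is not part of the hack itself.

```javascript
// Converts Windows (\r\n) and old Mac (\r) line endings to UNIX (\n).
function toUnixLineEndings(text) {
  return text.replace(/\r\n?/g, "\n");
}

// JSON.stringify makes the otherwise invisible characters visible.
console.log(JSON.stringify(toUnixLineEndings("line one\r\nline two\r\n")));
// prints: "line one\nline two\n"
```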

I also altered one of the existing lines to “JUSTIFICATION=left”, which may be having the effect of setting the default to ragged right, but I’m not sure. At least it didn’t obviously break my Kindle. (Which reminds me: You’re responsible for whatever damage following the advice here may cause. What are you doing following advice in a blog, anyway?)

