If you’re thinking about scraping a web page to extract the delicious data bits from it, ScraperWiki looks like a great place to start. It’s got tools, examples, and a community. Right now the tools are in Ruby, Python and PHP, but they’re thinking about adding Javascript.
If I have time this weekend, I’m going to give it a try scraping the weekly Berkman Buzz post. Until a couple of weeks ago, I was fairly routinely posting the Buzz on this blog, because I had written a little scraper and formatter that let me go from the email version to the blog markup I prefer. But then those bahstahds at Berkman went all HTML on the weekly email, which completely broke my scraper. But the Berkman page that lists the Buzz looks like it’s ripe for trying out the ScraperWiki tools. Looking forward to it…
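For anyone who wants to see the shape of a scraper before trying the ScraperWiki tools, here is a minimal sketch using only Python's standard library. It is not ScraperWiki's API, and the sample markup and the /buzz/ paths are made up for illustration; a real scraper would first fetch the page and would need to match whatever markup that page actually uses.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> we are currently inside, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, standing in for a fetched page.
SAMPLE = """
<ul>
  <li><a href="/buzz/one">First item</a></li>
  <li><a href="/buzz/two">Second <em>item</em></a></li>
</ul>
"""

for href, text in extract_links(SAMPLE):
    print(href, "->", text)
```

The parser is event-driven: it never builds a full tree of the page, it just reacts to tags as they stream past, which is usually enough for pulling a list of links out of an index page.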
Categories: tech Tagged with: scraping • tech Date: January 15th, 2012 dw
Thanks to Mirek Sopek, the folks at Mako Lab diagnosed why my new WordPress blog template was going all wonky in Internet Explorer 9. Even after I’d discovered that the problem was that I was declaring the HTML page with Quirks, I’d put the type declaration in the wrong spot. I put it at the top of header.php, thinking that would put it at the top of the HTML page that WordPress assembles out of various files. Nope. You have to put it at the top of the index.php file. D’oh!
We still don’t know why it worked on my copy of IE 9, at the same version level and both 64-bit.
Thank you Mirek and Mako Lab! I would never have figured this out without you.
Categories: tech Tagged with: css • html • quirks • wordpress Date: January 2nd, 2012 dw
In the recent — and probably unabated — unpleasantness attending the launch of the update of this blog’s look, I have learned a little about Quirks mode. I learned this because Internet Explorer 9 was not displaying rounded corners or laying out divs (blocks of content) the way Firefox and Chrome were. Once I switched off Quirks mode in my blog pages, it worked much better.
There’s a good explanation and some very detailed info here. But as I “understand” the story, quirks mode was introduced to handle the problem that different browsers were expecting different sorts of markup (particularly for CSS style information). Then, once the browsers realized it would be helpful to everyone if they agreed to support truly standardized standards, they had to decide what to do with the old code written in the particularities of each browser. So, they agreed to allow the HTML developer to specify whether the page she’s created should be interpreted according to the modern standardized standards, or if it’s quirky and ought to be interpreted according to the idiosyncratic expectations of the various browsers.
You, the HTML developer, specify your intentions in the doctype declaration (the line that names the DTD) at the very beginning of your HTML pages. This page will help you figure out exactly how that line should read.
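Since the position of that line matters as much as its wording, here is a rough Python sketch of a check you could run on the HTML your site actually serves. The function name and the three return values are my own invention, and the rule it applies (only whitespace may precede the doctype) is a simplification of what browsers really do.

```python
import re

# Matches a doctype declaration in any letter case, e.g. <!DOCTYPE html>
DOCTYPE_RE = re.compile(r"<!doctype\b[^>]*>", re.IGNORECASE)


def doctype_position(html):
    """Classify where the doctype sits in an HTML string.

    Returns "first" if only whitespace precedes it, "late" if other
    content comes before it, and "missing" if there is none at all.
    Simplifying assumption: anything but whitespace ahead of the
    doctype counts as "late".
    """
    match = DOCTYPE_RE.search(html)
    if match is None:
        return "missing"
    preceding = html[:match.start()]
    return "first" if preceding.strip() == "" else "late"


print(doctype_position("<!DOCTYPE html>\n<html></html>"))
print(doctype_position("<html><!DOCTYPE html></html>"))
print(doctype_position("<html></html>"))
```

A result of "late" or "missing" on the assembled page is a hint that the browser may be falling back to quirks mode, whatever the individual template files look like.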
Meanwhile, the shame and humiliation the launch of the new look of this blog has brought upon me only deepens, for I have given up on controlling the placement of divs by getting my floats in order. Screw it. I’ve plunked them into a table. Yeah, I’m CSS-ing like it’s 1995.
Categories: tech Tagged with: css • dtd • html • quirk • tech Date: December 30th, 2011 dw
I’m posting this so I’ll remember, and in case someone else is googling around for it.
I have a little editor I wrote in javascript for creating blogposts. When I’m done editing, it loads the transmogrified text into an iframe that contains the WordPress /wp-admin/post.php page (which is the one you create posts with). Except that it stopped working recently, giving me “X-FRAME-OPTIONS” errors.
A little research showed that x-frame-options are set at the server to prevent people from capturing your pages in their own evil iframes (e.g., inserting your blog posts into their spammy site), either by preventing anyone from doing so, or preventing anyone from inserting into a page that isn’t from the same site as the source page.
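To make those two behaviors concrete, here is a small Python sketch of the decision a browser makes when it sees the header. It covers only the two values described above (DENY and SAMEORIGIN), the function names are mine, and the example.com URLs are placeholders. Real browsers have more detailed rules, so treat this as an illustration rather than a spec.

```python
from urllib.parse import urlsplit


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def may_be_framed(x_frame_options, framed_url, framing_url):
    """Decide whether framing_url may display framed_url in an iframe.

    x_frame_options is the header value the server sent for framed_url,
    or None if the header was absent.
    """
    if x_frame_options is None:
        return True                  # no restriction requested
    value = x_frame_options.strip().upper()
    if value == "DENY":
        return False                 # nobody may frame the page
    if value == "SAMEORIGIN":
        # Only pages from the same site as the framed page may frame it.
        return origin(framed_url) == origin(framing_url)
    return True  # assumption in this sketch: unrecognized values are ignored


# Placeholder URLs for illustration only.
post_page = "https://blog.example.com/wp-admin/post.php"
print(may_be_framed("SAMEORIGIN", post_page, "https://blog.example.com/editor.html"))
print(may_be_framed("SAMEORIGIN", post_page, "https://spam.example.net/"))
print(may_be_framed("DENY", post_page, "https://blog.example.com/editor.html"))
```

This is why an editor hosted somewhere other than the blog itself gets refused under SAMEORIGIN, even though the person running it owns both.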
I couldn’t figure out how to unset those options. But Chason C., at MediaTemple.net — my hosting company — got back to me within 24 hours with the answer. It turns out that MediaTemple isn’t setting that option; WordPress is. The solution is explained in this blog post, which Chason found for me.
The irony is that the blogpost with the answer has actually captured and embedded the original blog post by Igor at KrazyWorks, which you can find here.
Categories: tech Tagged with: iframes • tech • wordpress • x-frame-options Date: September 12th, 2011 dw
If you insert a text overlay into a movie you’re editing with iMovie 09 and then want to move it so that it matches up better with what’s going on in the movie, go ahead. I dare you. Aarrrgggh.

When you try to grab the blue text box floating above the clip, iMovie will think you’re trying to move the cursor (the red line) that marks where you are in the clip. No matter how you try to grab the little bugger, it won’t work. (Well, occasionally it seems to, but I haven’t figured out why.)
The trick (which I keep forgetting, which is one reason I’m blogging this) is to click on a clip so that you get the thick yellow outline around it, and then click on the little gear button that appears at the bottom left of the clip, and choose “Precision Editor.”

Now you’ll be able to drag the text box to where you want.
Don’t forget to close the Precision Editor by clicking on the “Done” button that shows up at the top of the bottom window (well, unless you’ve switched the position of the windows) so that you can go back to normal editing of the clip.
Categories: tech Tagged with: imovie Date: August 12th, 2011 dw
After literally decades of using Microsoft Word I just found out that it does regex!
I discovered this because I needed to delete comments inserted throughout my book manuscript, in the form <AU: ...>. Hundreds of them. I was contemplating exporting to HTML so I could use a text editor that can handle this type of search and replace, but came across an article on how to use regular expressions in Word. Regexes let you use magical incantations that no one understands but that cause text to dance in little circles and transform itself in puffs of smoke.
For example, to get rid of the pesky markup in my manuscript, I just had to tell the Replace dialogue to use wildcards, and then had it search for \<AU:*\>. The backslashes are necessary so that the angle brackets are not read as wildcard instructions. The asterisk tells Word to match everything between <AU: and >. (A question mark, by contrast, matches only a single character.) Simple! And it accepts far more complex expressions than that. (Here’s a site that lets you test your regular expressions.)
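For comparison, here is roughly the same cleanup as a conventional regular expression in Python. It is a sketch that assumes the comments never contain a > of their own; the sample sentence is invented.

```python
import re

# Non-greedy match: from "<AU:" up to the nearest ">".
# DOTALL lets a comment span a line break.
AU_COMMENT = re.compile(r"<AU:.*?>", re.DOTALL)


def strip_au_comments(text):
    """Remove every <AU: ...> comment from text."""
    return AU_COMMENT.sub("", text)


sample = "It was a dark night.<AU: cliche?> The rain fell.<AU: check\nthis>"
print(strip_au_comments(sample))
```

The non-greedy `.*?` is the important part: a plain `.*` would swallow everything from the first comment's opening bracket to the last comment's closing one, taking the manuscript text in between along with it.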
Take a well deserved bow, Microsoft Word! (And then fix auto-numbers.)
Categories: tech Tagged with: regex Date: July 26th, 2011 dw
Full justification of a page, where the text is flush with both the left and right margins, sounds like what you want in a professional book, but when computers are laying out pages on the fly on small screens, and especially when they have relatively few words per line to play with, it can result in ugliness.
When I first got my Kindle 1, it let you decide whether you wanted left justification (= “ragged right”) or full justification. Then Amazon upgraded the software and took away that option, which was not my favorite upgrade ever. (Maybe I just failed to find the hack to restore it.) I just got a Kindle 3, on the occasion of my Kindle 1’s screen losing a valiant battle against pressure in an over-stuffed backpack. There is a hack for the Kindle 3 that has restored the option, except where publishers have explicitly created fully justified texts. Go here and follow the advice in reply #1 scrupulously. (If you don’t know about UNIX line endings, you might not want to try this.)
I also altered one of the existing lines to “JUSTIFICATION=left”, which may be having the effect of setting the default to ragged right, but I’m not sure. At least it didn’t obviously break my Kindle. (Which reminds me: You’re responsible for whatever damage following the advice here may cause. What are you doing following advice in a blog, anyway?)
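About those UNIX line endings: Unix-style text files end each line with a single line feed, while Windows editors typically write a carriage return followed by a line feed. Here is a minimal Python sketch of the conversion. It works on bytes in memory and deliberately names no Kindle file, since which file you edit depends on the instructions you are following.

```python
def to_unix_line_endings(data):
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so a pair is not turned into two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_text = b"JUSTIFICATION=left\r\nANOTHER_LINE=1\r\n"
print(to_unix_line_endings(windows_text))
```

If you apply it to a real file, read and write in binary mode (`"rb"` and `"wb"`) so that Python does not translate the line endings for you on the way in or out.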
Categories: misc, tech Tagged with: justification • kindle • typography Date: June 19th, 2011 dw
Because I had to click around a few times, trying out instructions that did not work, here’s a link to instructions that actually do install LibreOffice on Ubuntu. The instructions that don’t work tell you to open a DEBS folder that does not exist. The instructions that do work have you use a PPA (Personal Package Archive), about which you need to know as little as I do (= nothing).
You do have to know how to type commands in a terminal window, however.
Also, this installation leaves your old copy of Open Office untouched. If you want to uninstall Open Office entirely, I am told you should type this into a terminal: sudo apt-get remove openoffice*.*
LibreOffice is the fork from Open Office now that Oracle has taken possession of the latter. Right now it’s almost exactly the same, but it’s where the interesting future developments will occur.
Categories: tech Tagged with: howto • libreoffice • openoffice • ubuntu Date: April 17th, 2011 dw
Firefox is hyping/explaining Firefox 4 with a “web o’ wonder” site that features three demos. I found the HTML5 poster helpful.
On the other hand, the page does a lousy job of explaining what “The London Project” is demonstrating. (Plus, the video playback really needs some indication of how long it is.)
Categories: tech Tagged with: html5 Date: March 4th, 2011 dw
I enjoyed this explanation of how Google updates Chrome faster than ever by cleverly only updating the elements that have changed. The problem is that software in executable form usually uses spots in memory that are hard-coded into it: Instead of saying “Take the number_of_miles_traveled and divide it by number_of_gallons_used…”, it says “Take the number stored at memory address #1876023…” (I’m obviously simplifying it.) If you insert or delete code from the program, the memory addresses will probably change, so that the program is now looking in the wrong spot for the numbers of miles traveled, and for instructions about what to do next. You can only hope that the crash will be fast and while in the presence of those who love you.
So, I enjoyed the Chrome article for a few reasons.
First, it was written clearly enough that even I could follow it, pretty much.
Second, the technique they use is not only clever, it bounces between levels of abstraction. The compiled code that runs on your computer generally is at a low level of abstraction: What the programmer thinks of as a symbol (a variable) such as number_of_miles_traveled gets turned into a memory address. The Chrome update system reintroduces a useful level of abstraction.
Third, I like what this says about the nature of information. I don’t think Courgette (the update system) counts as a compression algorithm, because it does not enable fewer bits to encode more information, but it does enable fewer bits to have more effect. Or maybe it does count as compression if we consider Chrome to be not a piece of software that runs on client computers but to be a system of clients connected to a central server that is spread out across both space and time. In either case, information is weird.
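Here is a toy Python model of why going back up a level of abstraction helps. It is emphatically not Courgette's actual algorithm: the instruction format, the label scheme and the little program are all invented for illustration. A program is a list of (operation, argument) pairs in which a jmp carries the absolute position of its target, so inserting one instruction shifts every target after it.

```python
import difflib


def insert_instruction(program, index, instruction):
    """Insert an instruction and re-point jumps at the code that moved."""
    shifted = [
        (op, arg + 1) if op == "jmp" and arg >= index else (op, arg)
        for op, arg in program
    ]
    return shifted[:index] + [instruction] + shifted[index:]


def raw_listing(program):
    """The 'compiled' view: jumps name absolute positions."""
    return [f"{op} {arg}" for op, arg in program]


def symbolic_listing(program):
    """The abstract view: jump targets become labels placed in the listing."""
    targets = sorted({arg for op, arg in program if op == "jmp"})
    label_of = {addr: f"L{i}" for i, addr in enumerate(targets)}
    listing = []
    for addr, (op, arg) in enumerate(program):
        if addr in label_of:
            listing.append(f"{label_of[addr]}:")
        listing.append(f"jmp {label_of[arg]}" if op == "jmp" else f"{op} {arg}")
    return listing


def changed_lines(old, new):
    """Count the lines a line-based diff has to add or delete."""
    return sum(
        1 for line in difflib.ndiff(old, new) if line.startswith(("+ ", "- "))
    )


old = [
    ("load", 1), ("add", 2), ("jmp", 4), ("sub", 3),
    ("store", 9), ("jmp", 1), ("jmp", 7), ("halt", 0),
]
new = insert_instruction(old, 0, ("nop", 0))  # one new instruction up front

raw_changes = changed_lines(raw_listing(old), raw_listing(new))
symbolic_changes = changed_lines(symbolic_listing(old), symbolic_listing(new))
print("raw diff lines:", raw_changes)
print("symbolic diff lines:", symbolic_changes)
```

In the raw view every jump looks different after the insertion, even though nothing about the program's logic changed. In the symbolic view the only difference is the one new instruction, which is the sense in which fewer bits can have more effect.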
Categories: infohistory, tech Tagged with: chrome • compression • courgette • google • updates Date: February 13th, 2011 dw