Joho the Blog: September 2012 (Page 2 of 3)

September 10, 2012

Obesity is good for your heart

From TheHeart.org, an article by Lisa Nainggolan:

Gothenburg, Sweden – Further support for the concept of the obesity paradox has come from a large study of patients with acute coronary syndrome (ACS) in the Swedish Coronary Angiography and Angioplasty Registry (SCAAR) [1]. Those who were deemed overweight or obese by body-mass index (BMI) had a lower risk of death after PCI [percutaneous coronary intervention, aka angioplasty] than normal-weight or underweight participants up to three years after hospitalization, report Dr Oskar Angerås (University of Gothenburg, Sweden) and colleagues in their paper, published online September 5, 2012 in the European Heart Journal.

Can confirm. My grandmother in the 1930s was instructed to make sure she fed her husband lots and lots of butter to lubricate his heart after a heart attack. This proved to work extraordinarily well, at least until his next heart attack.

I refer once again to the classic 1999 Onion headline: “Eggs Good for You This Week.”

Comments Off on Obesity is good for your heart

September 9, 2012

Beginner2Beginner: Javascript multi-file upload + PHP to process it

Much as I love being a hobbyist programmer, it can sometimes be frustrating as all get-out. Sometimes it’s just a bug because I made a dumb error (getting a variable’s scope wrong) or because I made an assumption about how something works (BBEdit‘s hex dump does not show you the content of the file on the disk, but of the file in memory, including the line endings it has transformed). But then there are the frustrations that come from not having the slightest idea of the basics. The worst are the basics that are so basic that the explanations of them assume you already know more than you do.

Welcome to the world of pain known as uploading multiple files using Javascript. For example, suppose you are writing an app that lets users take notes on an article using a plain old text processor. They can then upload those note files to some code that processes them, perhaps turning them into a database. Rather than having users upload one file at a time, you want to let them upload a bunch.

There are plenty of slick, free examples on the Web that provide beautiful front ends for doing this. Many of them I could get to work, but not exactly the way that I wanted. Besides, I was really really really confused about what happened after the front end.

So, after a lot of skullpounding and forehead slapping, here’s a guide. But please take seriously this warning: I barely know what I’m doing, and I’m undoubtedly doing this in the clunkiest fashion possible. I am very likely getting important things wrong. Some of them may be fatal, at least in the programmatic sense. (If you have a correction, please let me know. Note I may fix the code that follows, rather than doing the usual — and proper — Web thing of striking through the errors, etc.) Here goes….

To enable uploads, you’ll be writing code that will live in two places. The user interface code is in the HTML running in the user’s browser. The files are going to be uploaded — copied — to a web server. The code in the browser is going to be Javascript. The code on the server is going to be PHP, because Javascript is for browsers. (Oversimplification noted.) Those two places can be physically the same machine. If you’re using a Mac, a web server comes standard; if you don’t have a web server handy, there are bunches of free ones; that’s beyond the scope of this post.

After experimenting with many of the beautiful packages that are available, I gave up and went with the HTML5 solution. Modern browsers will have no problem with this. If you need to design for the world of old-fashioned browsers, then you need to find a competent explanation. In short: I love you, but go away! (Apparently, if you use the HTML5 solution, it will still be usable in lesser browsers, although it will allow users to select only one file at a time.)

HTML5 makes it ridiculously easy to do the user interface bit: you just tell it you want to allow multiple selections within a file-chooser dialogue. This comes from a post by Tiffany Brown:

<form action="processThem.php" method="post" enctype="multipart/form-data">
<input type="file" value="" name="upload[]" multiple>
<button type="submit">Upload!</button>
</form>

This will create an input field that, when clicked, will launch a file-browsing dialogue box in which the user can select multiple files. Submitting the form then sends those files to the PHP script called “processThem.php” on your server. The two key points to note are that the multiple attribute on the input element does the work of allowing multiple choices, and the “[]” in the name turns that variable into an array that will pass the entire list of user choices to the PHP script waiting on your server.
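As a small bonus, HTML5 also lets your page’s Javascript see what the user picked before the form is ever submitted, through the input element’s files property. Here is a hedged sketch; the “uploader” and “status” ids are names I made up for illustration, not anything the form above requires:

```javascript
// Summarize a list of {name, size} objects into a short status line.
// Kept as a pure function so the logic works outside a browser too.
function describeFiles(files) {
  var names = [];
  var totalBytes = 0;
  for (var i = 0; i < files.length; i++) {
    names.push(files[i].name);
    totalBytes += files[i].size;
  }
  return files.length + " file(s), " + totalBytes + " bytes: " + names.join(", ");
}

// Browser-only wiring. Assumes <input type="file" id="uploader" multiple>
// and a <div id="status"> to write into; both ids are my invention.
if (typeof document !== "undefined") {
  var input = document.getElementById("uploader");
  if (input) {
    input.addEventListener("change", function () {
      var chosen = Array.prototype.slice.call(input.files);
      document.getElementById("status").textContent = describeFiles(chosen);
    });
  }
}
```

Nothing here changes what gets uploaded; it just gives the user a chance to see, and you a chance to sanity-check, the selection before the PHP side ever hears about it.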


Now you have to create the “processThem.php” script (or whatever name you’ve specified) on your server. The uploaded files get placed in an array called $_FILES. But they don’t get stored on the server for long: they are put in a temporary folder, and PHP deletes them automatically when the script finishes. So, you need to process them, and quite possibly move them for permanent storage to a folder of your choosing. Here’s some sample PHP code from an anonymous commenter (“Me”) on the Tiffany Brown post:


<?php
$error_message[0] = "Unknown problem with upload.";
$error_message[1] = "Uploaded file too large (upload_max_filesize).";
$error_message[2] = "Uploaded file too large (MAX_FILE_SIZE).";
$error_message[3] = "File was only partially uploaded.";
$error_message[4] = "Choose a file to upload.";

$upload_dir  = './tmp/';
$num_files = count($_FILES['upload']['name']);

for ($i=0; $i < $num_files; $i++) {
    $upload_file = $upload_dir . urlencode(basename($_FILES['upload']['name'][$i]));

    if (!preg_match("/(gif|jpg|jpeg|png)$/",$_FILES['upload']['name'][$i])) {
        print "I asked for an image...";
    } else {
        if (@is_uploaded_file($_FILES['upload']['tmp_name'][$i])) {
            if (@move_uploaded_file($_FILES['upload']['tmp_name'][$i], 
                $upload_file)) {
                /* Great success... */
                echo "hooray";
                //$content = file_get_contents($upload_file);
                //print $content;
            } else {
                print $error_message[$_FILES['upload']['error'][$i]];
            }
        } else {
            print $error_message[$_FILES['upload']['error'][$i]];
        }    
    }
}
?>

Let’s walk through this.


$error_message[0] = "Unknown problem with upload.";
$error_message[1] = "Uploaded file too large (upload_max_filesize).";
$error_message[2] = "Uploaded file too large (MAX_FILE_SIZE).";
$error_message[3] = "File was only partially uploaded.";
$error_message[4] = "Choose a file to upload.";

In the first few lines, Me does us the favor of providing non-technical explanations of possible errors, so that if something goes wrong, it will be easier to know exactly what it was.


$upload_dir  = './tmp/';

Then Me designates a particular folder as the container for the uploaded files. Me chooses one called “tmp” in the same directory as the PHP script. (Make sure you have such a folder and the permissions are set, or, of course, create one with whatever name you’d like.)
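Creating that folder is a one-time chore on the server. A sketch, assuming a Unix-ish host and a shell prompt in the directory that holds processThem.php; the exact permission bits your web host wants may differ, so check its documentation before loosening anything:

```shell
# Make the upload folder next to the PHP script (no error if it already exists).
mkdir -p tmp

# Let the owner and group write into it. Some shared hosts insist on other
# modes; 777 works everywhere but should be a last resort.
chmod 775 tmp
```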


$num_files = count($_FILES['upload']['name']);

Then Me gets a count of how many files were uploaded, and stores it in the variable $num_files. The 'upload' key matches the name="upload[]" you gave the input field in your HTML form. (You can use whatever name you want on that form, so long as you use the same one in your PHP.)


for ($i=0; $i < $num_files; $i++) {
    $upload_file = $upload_dir . urlencode(basename($_FILES['upload']['name'][$i]));

Then Me loops through all the files, assigning each one in turn to the variable $upload_file. But notice the weirdness of this part of the line:

$upload_file = $upload_dir . urlencode(basename($_FILES['upload']['name'][$i]));

First the easy parts. The basename function returns just a file’s name without any path information; we want that because the point of this line is to build a path to where the file will be saved in the folder you’ve set up for it. Also, I added the urlencode function in case the name of the file your user uploaded contains spaces or other characters that make your server barf.
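For illustration, here is the same sanitizing idea sketched in Javascript (my analogy, not code from the original post). It is not byte-for-byte identical to PHP’s urlencode, which turns spaces into “+” where encodeURIComponent uses “%20”, but the intent is the same: drop any directory part, then encode anything that could upset the server or the filesystem.

```javascript
// Keep only the final path segment of a file name, then percent-encode it.
// Roughly analogous to PHP's basename() followed by urlencode().
function safeUploadName(name) {
  var base = name.split(/[\\/]/).pop(); // drop Windows or Unix path parts
  return encodeURIComponent(base);      // encode spaces, quotes, etc.
}
```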

Now consider $_FILES['upload']['name'][$i]. It’s got those weird bracketed terms because $_FILES is an associative array. You can think of the usual sort of arrays as pairing a value with a number; give the array the number and it returns the value. In an associative array, values are paired with arbitrary keys (i.e., a word); give it the key and it returns the value. Here are the pre-defined keys for the associative array that gets sent to the PHP script when a user uploads files:

  • name: The file name of the uploaded file
  • type: Is it an image? A music file? etc.
  • size: The size in bytes
  • tmp_name: The crazy-ass name of the copy being stored on the server
  • error: Any error codes resulting from the upload

So, suppose you’re cycling through the array of uploaded files as in our example, and you want to get the name of the current file (i.e., file $i in the sequence):


$fname = $_FILES['upload']['name'][$i];

The ['upload'] designates the array of all uploaded files. The ['name'] selects the list of all the file names from that array, and the [$i] pulls out of that list the name of one particular uploaded file, just as with an ordinary numeric array. As $i is incremented, you get the name of each file. If you wanted the crazy-ass temporary name, you would put in tmp_name instead of name, and likewise for the other parameters.
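If a picture helps, here is roughly the shape of $_FILES after a two-file upload, sketched as a Javascript object for illustration (every value below is made up). The thing to notice is that PHP groups by property rather than by file: one parallel array of names, one of sizes, and so on, all sharing the index $i.

```javascript
// Roughly the shape of PHP's $_FILES after two files arrive through
// <input name="upload[]" multiple>. All the values here are invented.
var FILES = {
  upload: {
    name:     ["notes1.txt", "notes2.txt"],
    type:     ["text/plain", "text/plain"],
    size:     [1024, 2048],
    tmp_name: ["/tmp/phpA1B2C3", "/tmp/phpD4E5F6"], // the crazy-ass names
    error:    [0, 0]                                // 0 means "no error"
  }
};

// The PHP indexing, translated: file i's real name and temporary name.
var i = 1;
var fname = FILES.upload.name[i];     // "notes2.txt"
var tmp   = FILES.upload.tmp_name[i]; // "/tmp/phpD4E5F6"
```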


if (!preg_match("/(gif|jpg|jpeg|png)$/",$_FILES['upload']['name'][$i])) {
        print "I asked for an image...";
    }

Next Me optionally checks the name of the uploaded file for a particular set of extensions in case you want to limit the uploads to images (as in Me’s example) or whatever. Me is using a regular expression (regex) to do the work, a pattern-matching notation for which I am doomed to a copy-and-paste understanding.
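Two caveats about that pattern. As written it is case-sensitive, so a file named photo.JPG would be rejected; adding the i flag after the closing slash fixes that. It also doesn’t require a dot before the extension, so a name like “logogif” would slip through. Here is the same check sketched in Javascript with both adjustments (mine, not the original commenter’s), plus the reminder that matching the name only proves it looks like an image, not that the bytes actually are one:

```javascript
// Does the file name end in a dot plus an image extension?
// Case-insensitive, and requires the dot, unlike the original PHP pattern.
function looksLikeImage(name) {
  return /\.(gif|jpg|jpeg|png)$/i.test(name);
}
```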


  if (@is_uploaded_file($_FILES['upload']['tmp_name'][$i])) {

Next Me does a check to make sure that the file was actually uploaded and is not the result of malicious code. is_uploaded_file is a built-in PHP function. The preceding “@” suppresses error messages; I don’t know why Me did that, but I’m confident it was the right thing to do. While you are debugging, you might want to take the @ out.


move_uploaded_file($_FILES['upload']['tmp_name'][$i], $upload_file)

If the file passes that test, then Me moves it to the folder s/he designated. Note that we’ve already put the pathname to the storage folder into $upload_file.

Obviously, where the script has the comment “Great Success” you would put the code that does what you want to do to the uploaded file. I have added two lines — commented out — that get and print the content of the file; that gets ugly if the file isn’t some type of text file.

So, please let me know what I’ve gotten wrong here. (And maybe sometime I’ll give you some truly embarrassing code for returning the results of the PHP on the page that has the upload form on it.)

8 Comments »

September 7, 2012

Mitt Romney’s distrust of entrepreneurship

Mitt Romney is taking some flak for using some notoriously flaky science as his example of good science. But in the same passage he betrays a Big Corporate view of how innovation works that should cost him the support of every entrepreneurial startup in the country.

Here’s the passage from his Washington Examiner interview (with a hat tip to BoingBoing):

CARNEY: What role should government have in promoting certain industries or economic activities such as homeownership, or manufacturing, renewable energy or fossil fuel energy, exports, or just advanced technology? What sort of subsidies and incentives do you favor? You had some of these in Massachusetts, I know.

ROMNEY: Very limited — my answer to your first question. I’m not an advocate of industrial policy being formed by a government. I do believe in the power of free markets, and when the government removes the extraordinary burdens that it puts on markets, why I think markets are more effective at guiding a prosperous economy than is the government.

So for instance, I would not be investing massive dollars in electric car companies in California. I think Tesla and Fisker are delightful-looking vehicles, but I somehow imagine that Toyota, Nissan, and even General Motors will produce a more cost-effective electric car than either Tesla or Fisker. I think it is bad policy for us to be investing hundreds of millions of dollars in specific companies and specific technologies, and developing those technologies.

I do believe in basic science. I believe in participating in space. I believe in analysis of new sources of energy. I believe in laboratories, looking at ways to conduct electricity with — with cold fusion, if we can come up with it. It was the University of Utah that solved that. We somehow can’t figure out how to duplicate it.

And keep in mind that Romney here is not talking about the auto industry specifically; rather, he is explaining why governments ought not to back entrepreneurial companies. It’s not just that governments are bad at picking winners, it’s that when the winners are startups — even when they’re way out of the prototypical garage — they’re unlikely to get past “delight.”

So, first the problem with his science remark. I understand that he’s boosting Utah. But the 1989 experiment by Stanley Pons and Martin Fleischmann was famous not only because it could not be replicated, but because it was prematurely hyped by Pons and Fleischmann before it had gone through peer review or had been replicated. (As BoingBoing points out, the Wikipedia article is worth reading.) No matter what you think of the experiment, it is a terrible example to use as proof that one appreciates basic science…unless you’re citing the rejection of the Pons-Fleischmann results, which Romney explicitly was not. The issue is not merely that Romney continues to believe in a discredited claim. The real issue is that this suggests that Romney doesn’t understand that science is a methodology, not merely the results of that methodology. That’s scary both for a CEO and for a possible president.

I’m at least as bothered, however, by Romney’s casual dismissal of entrepreneurial startups as a source of innovation: “I think Tesla and Fisker are delightful-looking vehicles, but I somehow imagine that Toyota, Nissan, and even General Motors will produce a more cost-effective electric car than either Tesla or Fisker.” “Delightful” is a dismissive word in this context, as evidenced by the inevitability of the “but” that follows it. Romney, it seems, doesn’t believe that startups can get beyond delight all the way to the manly heavy lifting that makes innovation real. For that you need the established, massive corporations.

Wow. Could there be a more 20th century vision of how a 21st century entrepreneurial economy should work?

2 Comments »

September 6, 2012

One way Obama’s speech might go

He can’t possibly top Bill Clinton’s speech, but here’s what I hope President Obama does tonight.

First, I hope he stays entirely on policy points, although I wouldn’t mind a little uplifting rhetoric. I certainly don’t need to be told again about Bain.

Second, while Clinton did a superb job explaining what’s wrong with the Republican argument, there’s still work to do. So, I hope tonight President Obama reminds us of all that he has done, for his achievements are epic. But I hope he does so in a way that neatly folds and stacks each item on that laundry list.

For example, he might remind us of how bad the circumstances were, and then show us the method by which he addressed those problems. First, you stimulate the economy: what was the money spent on, and what were the results. Second, you take care of the most vulnerable: here’s what we did, and here are the results. Third, you make investments for the future: here’s what we invested in and here’s why it matters. Fourth, you do this while you also deal with the developments and opportunities the world presents: here’s what happened, and here’s how we responded. Fifth, from Ledbetter to ending Don’t Ask Don’t Tell, you try to make life more fair for all our citizens.

That’s my idea for conveying the methodological competency with which the Administration has dealt with the worst economic meltdown since the Great Depression. But, I’m looking forward to hearing the speech tonight and thinking, “Man, that’s waaay better!”

2 Comments »

September 5, 2012

[2b2k] Library as platform

Library Journal just posted my article “Library as Platform.” It’s likely to show up in their print version in October.

It argues that there are reasons why libraries ought to think of themselves not as portals but as open platforms that give access to all the information and metadata they can, through human-readable and computer-readable forms.

1 Comment »

Innovation at the State Dept.

I just read [email protected], a pretty amazing report by the Lowy Institute, an independent policy think tank, about the extent and depth of e-diplomacy initiatives at the State Department.

I came away with several impressions:

  • The Internet and social networking are central to how State does its business

  • The Net and social networking are transforming how State does its business

  • The Net is bringing about cultural changes at State

That third point is for me the most striking. The State Department has been hypersensitive about security. While that of course remains part of State’s DNA, the Department is also becoming realistic about the gains that can be made by not reflexively shutting down every proposal. For example, the Lowy report writes:

At a Twitter training course for State Department employees attended by the author, the 50 or so officers present — some of whom admitted to never having used social media — were exhorted to give it a go, you can’t go wrong. Policy guidance was barely mentioned.

Closer examination reveals why this has not led to disaster. To begin with you are dealing with highly educated employees with a strong desire to keep their jobs…

Likewise, the report cites a new willingness to experiment and fail, which is essential for innovation but anathema to State’s traditional culture. Implicit in many of the initiatives, there is new emphasis on Need to Share rather than Need to Know; the latter policy optimizes for security at the cost of intelligence.

The report goes through the many offices directly involved in e-diplomacy, but singles out the 80-person E-Diplomacy group for special focus and praise, lauding its entrepreneurial spirit. That’s the group I’m proud to have been attached to for two years as a State Department Franklin Fellow, and, as they say at Reddit, I can confirm.

If you’ve had any interaction with the State Department — where in my limited experience I have met true patriots — you know that it is one of the least likely institutions to hop on the Internet train. I’d credit the transformation to three factors:

First, starting with Colin Powell, continued by Condoleezza Rice, and especially with Secretary Clinton (and her choice of Alec Ross (twitter) and until recently Ben Scott), the leadership has embraced these changes.

Second, groups like E-Diplomacy have served State by building tools that serve State’s needs, and have at the same time modeled the webby way of doing business. One great example is Corridor, State’s new professional networking environment, specially tuned to the needs and norms of State Dept. employees.

Third, the State Department’s 80,000 employees are on the ground around the world. This means that the organization is fundamentally reality-based, even when the leadership gets warped by politics. These Net-based initiatives are being embraced because they work. Likewise for the Net-based culture that is infusing State as more of the world and more State Dept. employees go online. Leaders of the e-initiatives such as E-Diplomacy’s Richard Boly combine a drive to achieve pragmatic results with an entrepreneurial appreciation of failure as a key tool for success.

I acknowledge that my personal experience of the State Department is warped by the amount of time I’ve gotten to spend with its webbiest elements. But I’ve also seen tangible evidence that a belief in openness, innovation, and connection is taking root there. The Lowy report confirms that. Worth reading.

2 Comments »

September 4, 2012

Couchsnarking: I’ll be tweeting the DNC

Just a warning: I’m going to be watching the Democratic National Convention this week, and undoubtedly will be unable to keep my finger off the tweet button. When I tweeted Romney’s speech at the RNC, I was unable to control the pace of my tweeting, although I expect not to be as provoked during the DNC. (Spoiler: I am a Democrat.)


If you want to follow me — or mute me — my twitter handle is @dweinberger.

Comments Off on Couchsnarking: I’ll be tweeting the DNC

[2b2k] Crowdsourcing transcription

[This article is also posted at Digital [email protected].]

Marc Parry has an excellent article at the Chronicle of Higher Ed about using crowdsourcing to make archives more digitally useful:

Many people have taken part in crowdsourced science research, volunteering to classify galaxies, fold proteins, or transcribe old weather information from wartime ship logs for use in climate modeling. These days humanists are increasingly throwing open the digital gates, too. Civil War-era diaries, historical menus, the papers of the English philosopher Jeremy Bentham—all have been made available to volunteer transcribers in recent years. In January the National Archives released its own cache of documents to the crowd via its Citizen Archivist Dashboard, a collection that includes letters to a Civil War spy, suffrage petitions, and fugitive-slave case files.

Marc cites an article [full text] in Literary & Linguistic Computing that found that team members could have completed the transcription of works by Jeremy Bentham faster if they had devoted themselves to that task instead of managing the crowd of volunteer transcribers. Here are some more details about the project and its negative finding, based on the article in L&LC.

The project was supported by a grant of £262,673 from the Arts and Humanities Research Council, for 12 months, which included the cost of digitizing the material and creating the transcription tools. The end result was text marked up with TEI-compliant XML that can be easily interpreted and rendered by other apps.

During a six-month period, 1,207 volunteers registered, who together transcribed 1,009 manuscripts. 21% of those registered users actually did some transcribing. 2.7% of the transcribers produced 70% of all the transcribed manuscripts. (These numbers refer to the period before the New York Times publicized the project.)

Of the manuscripts transcribed, 56% were “deemed to be complete.” But the team was quite happy with the progress the volunteers made:

Over the testing period as a whole, volunteers transcribed an average of thirty-five manuscripts each week; if this rate were to be maintained, then 1,820 transcripts would be produced every twelve months. Taking Bentham’s difficult handwriting, the complexity and length of the manuscripts, and the text-encoding into consideration, the volume of work carried out by Transcribe Bentham volunteers is quite remarkable.


Still, as Marc points out, two Research Associates spent considerable time moderating the volunteers and providing the quality control required before certifying a document as done. The L&LC article estimates that RA’s could have transcribed 400 transcripts per month, 2.5x faster than the pace of the volunteers. But, the volunteers got better as they were more experienced, and improvements to the transcription software might make quality control less of an issue.
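For anyone who wants to check the arithmetic behind those figures (using only the numbers quoted above; the “2.5x” is the article’s rounding):

```javascript
// Volunteers: 35 manuscripts per week, projected out.
var perWeek = 35;
var perYear = perWeek * 52;          // 1820, the figure quoted above
var perMonth = perYear / 12;         // about 152 manuscripts per month

// Research Associates: an estimated 400 transcripts per month.
var raPerMonth = 400;
var speedup = raPerMonth / perMonth; // about 2.6, reported as roughly 2.5x
```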

The L&LC article suggests two additional reasons why the project might be considered a success. First, it generated lots of publicity about the Bentham collection. Second, “no funding body would ever provide a grant for mere transcription alone.” But both of these reasons depend upon crowdsourcing being a novelty. At some point, it will not be.

Based on the Bentham project’s experience, it seems to me there are a few plausible possibilities for crowdsourcing transcription to become practical: First, as the article notes, if the project had continued, the volunteers might have gotten substantially more productive and more accurate. Second, better software might drive down the need for extensive moderation, as the article suggests. Third, there may be a better way to structure the crowd’s participation. For example, it might be practical to use Amazon Mechanical Turk to pay the crowd to do two or three independent passes over the content, which can then be compared for accuracy. Fourth, algorithmic transcription might get good enough that there’s less for humans to do. Fifth, someone might invent something incredibly clever that increases the accuracy of the crowdsourced transcriptions. In fact, someone already has: reCAPTCHA transcribes tens of millions of words every day. So you never know what our clever species will come up with.

For now, though, the results of the Bentham project cannot be encouraging for those looking for a pragmatic way to generate high-quality transcriptions rapidly.

1 Comment »

Who forces Google to remove search results because of copyright claims?

According to a post at TechDirt by Riaz K. Tayob, Google has released data on which organizations request certain search results be suppressed because of copyright issues.

From TechDirt:

It may be a bit surprising, but at the top of the list? Microsoft, who has apparently taken down over 2.5 million URLs from Google’s search results. Most of the others in the top 10 aren’t too surprising. There’s NBC Universal at number two. The RIAA at number three (representing all its member companies). BPI at number five. Universal Music at number seven. Sony Music at number eight. Warner Music doesn’t clock in until number 12.

The velocity is increasing:

As it stands now, Google is processing over 250,000 such requests per week — which is more than they got in the entire year of 2009. For all of 2011, Google received 3.3 million copyright takedowns for search… and here we are in just May of 2012, and they’re already processing over 1.2 million per month.

The requests and Google’s responses must both be generated automatically. This raises once again the problem with having robots enforcing the law: They don’t know about leeway, which means they (a) lack common sense, (b) have no way of balancing against greater goods, and (c) can’t tell when Fair Use should provide an exception. (Here’s an op-ed I wrote in 2003 about this.)

We saw this this weekend as robots blocked the use of perfectly legitimate film clips at the Hugo Awards. Ridiculous. And scary.

3 Comments »

September 3, 2012

If I were running the Democratic National Convention…

Speaker: A system that gives every child a free public education so they’re better citizens and workers. Who built that?

Audience: We built that!

Speaker: That’s right, we Americans built that. Together. And a national highway system in the 1950s that opened up markets for farmers, and small businesspeople. Who built that?

Audience: We built that!

Speaker: And the marvels of science and engineering that put people on the moon, and put the Curiosity rover on Mars. Who built that?

Audience: We built that!

Speaker: And a health care system so that every worker, every mother, every father, every child gets the medical care they need, without having to worry about losing their job or their home. Who built that?

Audience: WE BUILT THAT! WE BUILT THAT! WE BUILT THAT!

Well, you get the point. Let’s own the phrase.

(Also, I believe I have hereby demonstrated one good reason why I am not running the Democratic National Convention.)

3 Comments »

« Previous Page | Next Page »