July 30, 2009

Annals of No One Cares But Me: Big pixel drivers

During the late 1980s and early 1990s, I had a peculiar interest in two topics that just about no one should or does care about: drivers for unexpected output devices, and big pixels. These interests were piqued by my place of employment. I worked at Interleaf, an early innovator in electronic publishing. Back in the day, we had to write our own drivers for the rare and expensive high-resolution printers able to show off the high-res, proportionally spaced, typeset-quality, text ‘n’ graphics output our software was able to create. So I naturally came to care about odd output devices (eventually, for example, our software was used to print low-res codes on soda cans) and about output composed of huge pixels.

Therefore, I was delighted to read in the Boston Globe about Artaic, a company that uses a computer to translate images into robotically created mosaics. It’s got it all: an unusual output device that uses macro-scale pixels.


July 25, 2009

AP to digitally monitor copyright

The AP has announced it is going to use an automated system to monitor the use of AP content on the Web, looking for copyright violations. The empire is fighting back. From the press release:

The Associated Press Board of Directors today directed The Associated Press to create a news registry that will tag and track all AP content online to assure compliance with terms of use. The system will register key identifying information about each piece of content that AP distributes as well as the terms of use of that content, and employ a built-in beacon to notify AP about how the content is used.

I think there are three possible broad-stroke outcomes:

1. The AP takes an enlightened and generous view of copyright protection and its terms of use, encouraging people to link to and cite its stories, and saving its angry face for commercial thieves, wholesale infringers, and other scum. The AP remains a major source of news, fulfills the social mission of the newspapers who are its members, and our culture is better off for it.

2. The AP’s automated system is set on a hair trigger. The AP protects its copyright so well that no one ever hears from it again.

3. The AP acts inconsistently. It sends scary letters to teenagers who copy three paragraphs about the Jonas Brothers and sics lawyers on a professor teaching a course on media studies. No one understands what the AP is doing, so we all get scared and hate it.

To start with, it’d be great if the AP’s copyright warnings didn’t just tell people what they can’t do, but also told them what they can do, and encouraged us to re-use the material as much as possible. On the other hand, since one of the aims of the new system (according to the press release) is to facilitate the use of pay walls, I expect we’ll see more of the AP’s content making itself irrelevant.

July 23, 2009

Accountable bloggers and journalists

[Note 1.5 hrs after posting this: Ethan Zuckerman has just put up a superb post on this topic. I suggest you read that instead of this.]

Jillian York of the Berkman Center explains the current confusion about the NY Times’ rather casual suggestion (in a blog post), based on an accusation in a tweet from Omid Habibinia, that Hossein Derakhshan (aka Hoder) has been an agent for the Iranian government, basically ratting out pro-democracy bloggers. The NY Times has now gone meta on the accusation, saying it only reported it because it’s a sign of the discord and distrust, etc., etc. But it’s still a dangerous charge to propagate. Jillian wants to know why we’re blaming the NY Times blogger and not Habibinia.

I’ve got enough blame in my backpack for both. But I do think that since the NY Times trades on its credibility, it has a greater responsibility. When the NY Times reports a rumor, it not only amplifies the rumor, it inevitably adds credibility to it. That’s just the way it is, and it’s also how the NY Times wants it.

(I wish I could track down the article I read today about the difficulty the human brain has in unlearning bad info even after it’s been shown to be bad. The article talked about the increase in the belief that Iraq had WMDs after it was shown that it did not.)

July 22, 2009

My PDF talk on facts ‘n’ transparency

Link. (The video embeds my slides, but (1) they get more and more out of order in this YouTube version, although they were in the right order when I actually presented them; and (2) my font got lost somewhere in the translations, so there’s a fair bit of mis-sizing, text overflow, etc.) (I posted about one of the ideas in the talk, transparency as the new objectivity, here.)

July 18, 2009

When there’s no such thing as the best

I posted my post about the Sotomayor hearings over at Huffington, where I got a grand total of two comments. The second one raised an interesting point. (The first one was funny.)

Or, “Senator, would you simply prefer that the Court be comprised of the best legal minds in the nation, regardless of their race, creed, or color, despite the fact that such a concept is foreign to the race conscious liberals among us?” – Parducci

That’s a reasonable response (leaving out everything after the “despite”), but I think it’s fundamentally wrong, since it assumes there is a way to rank order legal minds. There isn’t, because there is no such order.

Look at the current Justices. You may be able to say that one particular Justice’s “legal mind” is not as good as the rest (“Judge So-and-So just isn’t up to snuff”), but there isn’t any real way to rank them in order (except perhaps by how well their decisions accord with political sides). With heart surgeons, maybe you can look at the survival rates of their patients (and there are problems with that), but for judges, there aren’t criteria that result in a reliable, accurate, and agreed-upon quantitative ranking. Likewise, who would think there’s any sense in trying to numerically rank philosophers, historians, or chefs? You can see that a particular one isn’t in the top rank or is out of her league, but within that top rank, there isn’t a numeric ordering.

So, for nominees to the Supreme Court, the idea that we should take “the best legal minds” actually means that we should choose from among those who are highly qualified for the job. Since that class is far larger than nine, we get to choose our Justices based on many considerations, including the likely effect they’ll have on the political balance of the court and — yes — the likely effect they’ll have by bringing a diversity of experience and outlook. For the wisdom of a group is enhanced by including difference within it.

In fact, it would be interesting to see how the degree of qualification (based on whatever criteria one wants to suggest) going into the Court matches with the performance of the Justice over the course of her or his term.

July 17, 2009

Search matchups

Google vs. Yahoo

Google vs. WolframAlpha

Google vs. Bing

July 14, 2009

[berkman] Mapping the global commons

Giorgos Cheliotis of the National Univ. of Singapore, and a visiting researcher here at the Center, is giving a lunchtime talk at the Berkman Center called “Mapping the Global Commons: A quantitative perspective on free cultural practice.” How large and free are the Commons? (He’s excluding open source software from his discussion of the Commons.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Giorgos has been working with Creative Commons. He points to a number of works, including by Lessig, and David Bollier’s “Viral Spiral,” which is a history of the digital commons. If the movement is old enough that histories are being written, Giorgos says, it may be time to take a fresh look at it.

He says the digital commons consists of shared resources, users, open licenses, and remixes. To measure its size, you can ask how people use it, how many resources are in it, how quickly it’s growing, and how much is contributed back to the pool. How free the pool is will obviously affect how it gets used and remixed. All this is hard to measure, Giorgos says, because there’s no central registry. One approach would try to count everything that’s there. Another uses estimates, community-specific data, and external reports and local knowledge. Giorgos uses the latter technique. There is a trade-off between scale and the accuracy/richness of the data set.

He and his colleagues are building a live-data wiki platform to track the global development of open licensing (CC only for now). It’s in early beta, pre-release, and still under development. Giorgos walks us through it. [You can give it a try yourself. It’s self-explanatory.] At the moment, the wiki says that there are 170,268,161 Creative Commons-licensed works. At the site you can break this down by region. Asia is growing quickly. Brazil has lots. Spain is ranked #1. (You can zoom in on the map by drag-selecting an area.)

The project is aimed at the media, researchers, funding organizations…

The regions each have a “freedom score” that weights the CC licenses by how restrictive or permissive they are. The overall weighted average is 3.29 out of 6. US: 3.1. Spain: 3.47. Brazil: 2.34. Thailand: 2.58 (which is a decrease). Korea: 1.76 (but lots of licenses). Giorgos says that presenting this data sometimes nudges people to work on boosting their country’s score.
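Giorgos didn’t spell out the weighting, so here is only a minimal sketch of how a count-weighted “freedom score” could work, assuming (hypothetically) that each of the six core CC licenses gets a score from 1 (most restrictive) to 6 (most permissive). The weights, the ordering of the middle licenses, and the counts are all made up for illustration.

```python
# HYPOTHETICAL weights: the talk did not give the actual scheme.
# Higher means more permissive; keys are the six core CC licenses.
LICENSE_FREEDOM = {
    "by": 6,
    "by-sa": 5,
    "by-nd": 4,
    "by-nc": 3,
    "by-nc-sa": 2,
    "by-nc-nd": 1,
}


def freedom_score(counts: dict[str, int]) -> float:
    """Average license weight for one region, weighted by the number of works."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no licensed works to score")
    weighted = sum(LICENSE_FREEDOM[lic] * n for lic, n in counts.items())
    return weighted / total


# Made-up counts for one imaginary region.
example = {"by": 100, "by-sa": 200, "by-nc-nd": 700}
print(round(freedom_score(example), 2))  # 2.3
```

Under a scheme like this, a region can raise its score only by shifting works toward more permissive licenses, which fits the nudge Giorgos describes.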

The tables of data and the maps generated from them are automatically generated and cannot be changed by wiki users; the annotation and commentary can be changed. To see an example of a manually-curated page, see Singapore’s. Giorgos points out that this raises synchronization issues: The data is updated but the narrative may not be.

He now asks how much is being remixed. They’ve focused on ccMixter, where everything has a CC license and can be remixed. You can see the chains of influence. He shows a visualization of the data: Each track is a node, with lines connecting it to its remixes. The maximum path length is 6 (a remix of a remix of a remix, etc.), but the count drops off quite steeply after path length 2. 60% of uploaded items don’t get remixed, but remixing accounts for more than half of the total production volume. In a “bow-tie” analysis, there’s a core of about 12% of contributors whose tracks are linked to and who link out; if you take contests out of the picture, the core goes up to 18% (although about the same absolute number) and the “tendrils” go down from 50% to 20%. [Giorgos presents some other visual analyses, but I can’t follow the visual presentation of quantitative information. Sorry. It’s a brain problem of mine.] In the core, there are more reciprocal relationships, which seems to show that the members of the core community see one another as peers.
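To make “path length” and “don’t get remixed” concrete, here is a minimal sketch on a made-up toy graph, not ccMixter’s data or Giorgos’s actual method. It assumes each track lists the tracks it samples, that the graph has no cycles, and that path length counts remix steps back to an original upload.

```python
from functools import lru_cache

# HYPOTHETICAL toy data: each track maps to the tracks it remixes.
# Original uploads remix nothing. Assumes no cycles.
SOURCES = {
    "a": [],
    "b": [],
    "c": ["a"],       # remix of a
    "d": ["c"],       # remix of a remix
    "e": ["d", "b"],  # remix of d and of b
    "f": [],
}


@lru_cache(maxsize=None)
def chain_length(track: str) -> int:
    """Remix steps in the longest chain ending at this track (originals are 0)."""
    parents = SOURCES[track]
    if not parents:
        return 0
    return 1 + max(chain_length(p) for p in parents)


def share_never_remixed(sources: dict[str, list[str]]) -> float:
    """Fraction of tracks that no other track remixes."""
    remixed = {p for parents in sources.values() for p in parents}
    return sum(1 for t in sources if t not in remixed) / len(sources)


print(max(chain_length(t) for t in SOURCES))  # 3
print(round(share_never_remixed(SOURCES), 2))  # 0.33
```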

33% of generation 1 remixes are contest entries: An artist or label sponsors a contest for the best remixes of a track. Contests attract one-time remixers who are “not productive otherwise in the community.” But are contests part of a sharing economy, he asks? Some scholars say that contests help strengthen a sense of community. Giorgos is uncertain about what to make of contests.

Q: [me] Public domain? Media types?
A: Neither of those types of metadata are easily available.

Q: CC has the metadata about the media type. And it would be interesting to see how the licenses vary by media type.
A: It’s possible, but we haven’t done it so far. I have noticed that photographers tend to be more protective of their content than are musicians.

Q: Maybe photographers are worried that their work will be used to create a false image, which isn’t an issue for musicians.
A: I think that’s probably right. Music is usually used for entertainment. Photos are also used for information.

Q: What are your aspirations for this as a data collection project?
A: I was motivated initially to do this


July 8, 2009

Running thoughts

I run. Yes, I know the idea is ridiculous, but not half as ridiculous as the actual sight of me “running.” The only indication that I am running and not, say, just leaning forward slightly is that that posture could under no circumstances produce that amount of sweat. Showering for me is not going from dry to wet; it’s merely the replacement of sea water with fresh.

Well, enough about my sweat. Here are some random thoughts from the hard sidewalks of Brookline and Brighton.

0-10 minutes: Jill Sobule is really good. Why don’t I listen to her more? I especially like the songs where she reveals something unexpected about the person she’s singing about. That’s the essence of the narrative art. Also, why is that new song about kissing a girl bigger than Jill’s was?

10-12 minutes: Jeez, Hegel was right about how history works. Everywhere you look at what the Net is doing to us, old forms are being contradicted, but also raised up, and then overcome by something new that includes them while going beyond them. E.g., experts are better able to do what they do, but put them into a network with other experts (and non-experts) and you get expertise as a whole taken to a new level. Likewise, the massness of the Web is nevertheless raising up a new type of local-ness: in some public, mass-y places there will be nooks with the Norman Rockwell expectation that people will know your name. Or avatar, anyway.

12-20 minutes: Since at the current pace, the number of registered Web domains will hit infinity in the year 2013, what will be the most efficient search algorithm to look up any one of them? Even if they were alphabetized, could you do the old thing of dividing the list in half to see if the target term is in part one or part two, and then dividing it again and again? With an infinite list, wouldn’t that on average take, um, forever? In fact, how would you even know where the middle was to do the first divide? Well, I suppose you could assign them numbers and then divide them into even and odd numbers. But you’re still talking about infinities here. Jeez, I wish I’d taken math after high school.
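For any finite, alphabetized list, the divide-it-in-half procedure is binary search. Here is a minimal sketch that counts the halvings. The domain names are hypothetical placeholders, zero-padded so that alphabetical order matches numeric order.

```python
def binary_search(sorted_names: list[str], target: str) -> tuple[bool, int]:
    """Halve the search range until target is found or ruled out.

    Returns (found, number_of_halvings).
    """
    lo, hi = 0, len(sorted_names)
    halvings = 0
    while lo < hi:
        mid = (lo + hi) // 2
        halvings += 1
        if sorted_names[mid] == target:
            return True, halvings
        if sorted_names[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid       # target can only be in the lower half
    return False, halvings


# HYPOTHETICAL list of a million domains, already in alphabetical order.
names = [f"site{i:09d}.com" for i in range(1_000_000)]
found, steps = binary_search(names, "site000123456.com")
print(found, steps)  # found is True; steps is at most 20
```

The number of halvings grows only with log2 of the list size, about 20 for a million names. With a truly infinite list, of course, there is no middle to start from.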

20-43 minutes: What is the name of the part of the leg between the ankle and the calf — the back part of the leg, not the shins — because whatever it is, it’s on fire. Undercalf? Backshin? The limpmaster? The quadralimpcets? The supra-ankle-scorcholater? Ow ow ow ow ow ow ow ow ow ow ow ow ow ow ow ow ow wow that’s a lot of sweat ow ow ow ow ow ow ow ow.

July 7, 2009

Free book on search interfaces

Berkeley’s Marti Hearst, who was way ahead of everyone else in faceted classification (e.g., flamenco), has written a definitive book on user interfaces to search engines. And it’s up on the Web for free, if that’s the way you roll. Thanks, Marti!

July 1, 2009

PDF: The takeaway

PDF was an unusually rich conference. Great folks there and an especially good year to be talking about the effect of the Net on politics and governance.

My take-away (although having a single take-away from a conference I just said is rich is rather contradictory, don’t you think?): The Web has won in a bigger way than I’d thought. The people President Obama is appointing to make use of the Web for increased citizen participation and greater democracy (well, at least as access to the Web and the skills required are distributed more evenly) are our best, brightest, and webbiest. And they are doing remarkable things.

Douglas Rushkoff interviewed me for his radio show yesterday, or was it the day before? Anyway, here it is. We talked about PDF and about my presentation there, which was about transparency and the changing role of facts.

