I twitter as dweinberger


June 30, 2010

 

SpokenWord looking for curators

SpokenWord.org aggregates podcasts, almost all of which are free, and makes it easy for users to export them to, say, iTunes. It’s a non-profit site and is all about the openness. (Disclosure: I’m on its board.) Now SpokenWord is looking for volunteers to curate podcast feeds and episodes in topics that interest them. Their curated collections will be the main feature at the SpokenWord site, because nothing knows what’s interesting to humans better than other humans do. Details here.

Tags: aggregation, everythingIsMiscellaneous, podcasts

Date: June 30th, 2010


June 20, 2010

 

Twitter metadata and where standards come from

Matthew Ingram at GigaOM blogs about an upcoming Twitter feature called Twitter Annotations. Well, it’s not actually a feature. It’s the ability to attach metadata to a tweet. This is potentially great news, since it will give us a way to add context to tweets and to enable machine-processing of tweets, not to mention that URLs could be sent as metadata rather than as subtractions from the 140-character limit. This is yet another example of information scaling to the point where we have to introduce more information to manage it. How about one of those bogus “laws” people seem to like (well, I know I do): Information sufficiently scaled creates a need for more information.

Twitter is specifying the way in which Annotations will be encoded, but not what the metadata types will be. You can declare a “type” with its own set of “attributes.” What types? Whatever you (or, more exactly, developers and hackers) find useful. Matthew cites a number of folks who are basically positive but who express a variety of worries, including Google open advocate Chris Messina who warns that there could be a mare’s nest of standards, that is, values for types and attributes. Dave Winer takes Google to task for slagging off on Twitter for this. I agree with his sentiment that Goliath Google ought to be careful about their casual criticisms. Nevertheless, I think Chris is right: Specifying the syntax but not the actual types and attributes will inevitably give rise to confusion: What one person tags as “topic,” someone else will tag as “subject,” and some people might have the nerve to actually use words for types in, say, Spanish or Arabic. The nerve! [THE NEXT DAY: Here's Chris' original post on the topic, which is more balanced than the bit Matthew excerpts, and which basically agrees with the next paragraph:]

But, so what? I’d put my money on Ev Williams and Biz Stone any time (important note: If I had money). You couldn’t have seriously proposed an idea as ridiculous as Twitter in the first place if you didn’t deeply understand the Web. So, yes, Chris is right that there’ll be some confusion, but he’s wrong in his fear. After the confusion there will be a natural folksonomic (and capitalist) pull toward whatever terms we need the most. Twitter can always step in and suggest particular terms, or surface the relative popularity of the various types, so that if you want to make money by selling via tweets, you’ll learn to use the type “price” instead of “cost_to_user,” or whatever. Or you’ll figure out that most of the Twitter clients are looking for a type called “rating” rather than “stars” or “popularity.” There’ll be some mess. There’ll be some angry angry hash tags. But better open confusion than expecting anyone — even the Twitter Lads — to do a better job of guessing what its users need and what clever developers will invent than those users and developers themselves.
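To make "surface the relative popularity of the various types" a bit more concrete, here is a minimal sketch. It assumes, purely for illustration, that each annotation arrives as a (type name, attributes) pair; that shape and the sample data are hypothetical and are not Twitter's actual encoding.

```python
from collections import Counter


def type_popularity(annotations):
    """Count how often each annotation type name is used.

    annotations: iterable of (type_name, attributes) pairs, where
    attributes is a dict. This shape is a hypothetical stand-in,
    not Twitter's actual encoding.
    """
    # Fold case so "Price" and "price" count as the same type.
    return Counter(type_name.lower() for type_name, _ in annotations)


# Made-up sample data for illustration only.
sample = [
    ("price", {"amount": "9.99"}),
    ("cost_to_user", {"amount": "12.00"}),
    ("price", {"amount": "4.50"}),
    ("rating", {"value": "4"}),
    ("Price", {"amount": "1.00"}),
]

counts = type_popularity(sample)
print(counts.most_common(1))  # [('price', 3)]
```

A tally like this is all it would take for a developer to see that "price" is winning over "cost_to_user" and to switch accordingly.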

Tags: eim, everythingIsMiscellaneous, metadata, twitter, twitter annotations

Date: June 20th, 2010


June 13, 2010

 

Every color is miscellaneous

I’m embarrassed to say that I just read Randall Munroe’s fabulous color survey from early May. Readers were asked to supply names for colors. It’s a rich experiment: Naming and discrimination, gender differences, hacking, tagging, spamming, hilariousness. The results also seem to support prototype theory’s idea that we agree on what the “real” (prototypical) colors are, at least within a culture: This is blue, but that one is a variant that needs a modifier in front of it (“light blue”) or for which we use a variant name (“teal”).

Randall writes the webcomic XKCD, of course, which is the Doonesbury of his generation, except while you can imagine Garry Trudeau writing a satiric HBO series, you can’t imagine him running and analyzing a color survey.

(I heard about Randall’s color survey via the Mainstream: Christopher Shea at the Boston Globe blog. Christopher also points to Stephen von Worley’s color map. BTW, that post by Christopher also has a great note about iPad censoring a graphic version of the oft-banned James Joyce’s Ulysses. Anyway, I’ve really got to do a better job keeping up with XKCD.)

Tags: color, everything is miscellaneous, everythingIsMiscellaneous, prototype, xkcd

Date: June 13th, 2010


June 6, 2010

 

Democratized curation

JP Rangaswami has an excellent post about the democratizing of curation.

He begins by quoting Eric Schmidt (found at 19:48 in this video):

“…. the statistic that we have been using is between the dawn of civilisation and 2003, five exabytes of information were created. In the last two days, five exabytes of information have been created, and that rate is accelerating. And virtually all of that is what we call user-generated what-have-you. So this is a very, very big new phenomenon.”

He concludes — and I certainly agree — that we need digital curation. He says that digital curation consists of “Authenticity, Veracity, Access, Relevance, Consume-ability, and Produce-ability.” “Consume-ability” means, roughly, that you can play it on any device you want, and “produce-ability” means something like how easy it is to hack it (in the good O’Reilly sense).

JP seems to be thinking primarily of knowledge objects, since authenticity and veracity are high on his list of needs, and for that I think it’s a good list. But suppose we were to think about this not in terms of curation — which implies (against JP’s meaning, I think) a binary acceptance-rejection that builds a persistent collection — and instead view it as digital recommendations? In that case, for non-knowledge-objects, other terms will come to the fore, including amusement value, re-playability, and wiseacre-itude. In fact, people recommend things for every reason we humans may like something, not to mention the way we’re socially defined in part by what we recommend. (You are what you recommend.)

Anyway, JP is always a thought-provoking writer…

Tags: 2b2k, everythingIsMiscellaneous, jp rangaswami

Date: June 6th, 2010


May 12, 2010

 

The rectangular display of information

Search engines have traditionally focused on building lists. Increasingly, they’re turning to the rectangular display of information: Boxes and tables. Boxes require extracting the relevant information and presenting it four-square in front of the user. While lists sort in a single dimension, tables show at least two dimensions. Boxes and rectangles are useful filters.

Google today announced the further boxing and tabling of data, in response (one supposes) to Bing.com. The Google Blog recommends trying searching for dog breeds, broadway shows, catherine zeta-jones date of birth, or zebra. (Look for the “something different” list in the left margin when you do the zebra search.) I especially like the summary of sources Google gives when it flat-out answers a question.

More boxes! More tables!

Tags: everythingIsMiscellaneous, google

Date: May 12th, 2010


April 27, 2010

 

[berkman] Luis von Ahn on free lunches, captcha, and tags

Luis von Ahn of Carnegie Mellon University is giving a Berkman lunchtime talk. [NOTE: I'm liveblogging. I'm making mistakes, leaving stuff out, paraphrasing, getting things wrong. This is an unreliable record.]

Luis invented captchas, the random characters you have to type in to convince a web page that you are a human and not a hostile software program. (He shows randomly generated sequences that happened to spell out “wait” and “restart.”) Captchas are useful, he says, when you’re trying to prevent people from gaming a system by writing a program to enter data robotically. They’re also useful to prevent spammers from signing up for free email accounts. To get around this, spammers have started up sweat shops where humans type captchas all day long; it costs the spammers about $0.33/account. And some porn companies ask users to type in a captcha to see photos; the captchas are drawn from email account applications. Damn clever!

He shows some variants. A Russian one asks you to solve a mathematical limit. In India, one asks you to solve a circuit. Luis says these aren’t all that effective because computers can solve both problems, but they’re still better than the “what is 1 + 1?” captchas he’s found on US sites.

He says that about 200M captchas are typed every day. He was proud of that until he realized it takes about 10 seconds to type them, so his invention is wasting 500,000 hours per day. So, he wondered if there was a way to use captchas to solve some humungous problem ten seconds at a time. The result: ReCAPTCHA. For books written before 1900, the type is weak and about 30% of the text cannot be recognized by OCR. So, now many captchas ask you to type in a word unrecognized when OCR’ing a book. (The system knows which words are unrecognized by running multiple OCR programs; ReCAPTCHA uses those words.) To make sure that it’s not a software program typing in random words, ReCAPTCHA shows the user two words, one of which is known to be right. The user has to type in both, but doesn’t know which is which. If the user types in the known word correctly, the system knows it’s not dealing with a robot, and that the user probably got the unknown word right.
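Here is a minimal sketch of the two ideas in that paragraph: the back-of-the-envelope arithmetic, and the two-word check. The function names, the case-insensitive comparison, and the vote-counting dict are my assumptions for illustration; this is not ReCAPTCHA's actual code.

```python
def hours_spent_per_day(captchas_per_day, seconds_each):
    """Total human time spent typing captchas, in hours per day."""
    return captchas_per_day * seconds_each / 3600


def check_two_word_challenge(known_answer, typed_known, typed_unknown, votes):
    """Return True if the user passes; record a vote for the unknown word.

    known_answer:  the word the system already knows
    typed_known:   what the user typed for that word
    typed_unknown: what the user typed for the word OCR could not read
    votes:         dict mapping candidate transcriptions to vote counts
    """
    if typed_known.strip().lower() != known_answer.strip().lower():
        # Failed the control word: treat as a possible robot, record nothing.
        return False
    guess = typed_unknown.strip().lower()
    votes[guess] = votes.get(guess, 0) + 1
    return True


# 200M captchas at about 10 seconds each.
print(round(hours_spent_per_day(200_000_000, 10)))  # 555556

votes = {}
print(check_two_word_challenge("upon", "Upon", "morning", votes))  # True
print(votes)  # {'morning': 1}
```

The arithmetic comes out to roughly 556,000 hours a day, in line with the "500,000 hours" figure from the talk. A real system would presumably want several users to agree on the unknown word before accepting it, which is why the sketch keeps a tally rather than trusting a single answer.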

ReCAPTCHA is a free service. Sites that use it have to feed back the entries for the unknown word. About 125,000 sites use it. They’re doing about 70M words per day, the equivalent of 2-4M books per year. If the growth continues, they’ll run out of books in 7 years, but Luis doesn’t think the growth will continue, so it might take twenty years. (There are 100M books.)

(In response to a backchannel question, Luis tells the penis captcha story.)

The ReCAPTCHA system filters out nationalities, known insult terms, and the like, to avoid unfortunate juxtapositions. It’s soon going to be released in 40 languages. Google acquired ReCAPTCHA.

Q: When will OCR be good enough to break captchas?
A: I don’t know. We’ll probably run out of books first.

Q: Business model?
A: Google Books gets help digitizing.

ReCAPTCHA “reuses wasted human processing power.” The average American spends 1.9 seconds per day typing captchas. We also spend 1.1 hours a day playing electronic games. We humans spent 9B hours playing solitaire in 2003. It took less than a day of that to build the Panama Canal. So, Luis switches topics a bit to talk about how to solve human problems by playing games.

First is tagging images with words. Image search works by looking at file names and html text, because computers can’t yet recognize objects in images very well.

Does typing two words take twice as long as typing random letters? No, it takes about the same time, he says. Luis says about 10% of the world’s population have typed in a captcha. The ESP game asks two people unknown to each other to label an image until they agree. The game taboos words that other players have already agreed on. The system passes images through until they get no new labels. They’ve gotten over 50M agreements. 5,000 players playing simultaneously could label all Google images in a month. Google has its own version; Google has an exclusive license to the patent.
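The agreement rule of the ESP game can be sketched in a few lines. This is only my reading of the description above, with assumptions: the players' guesses are given as ordered lists, turns are interleaved rather than truly simultaneous, and the function name is made up.

```python
def play_round(guesses_a, guesses_b, taboo=()):
    """Return the first label both players have typed, or None.

    guesses_a, guesses_b: each player's guesses, in the order typed
    taboo: labels earlier pairs already agreed on; they are ignored
    """
    banned = {word.lower() for word in taboo}
    seen_a, seen_b = set(), set()
    # Interleave the two players' turns (a simplifying assumption).
    for i in range(max(len(guesses_a), len(guesses_b))):
        if i < len(guesses_a):
            guess = guesses_a[i].lower()
            if guess not in banned:
                if guess in seen_b:
                    return guess  # player B already typed this label
                seen_a.add(guess)
        if i < len(guesses_b):
            guess = guesses_b[i].lower()
            if guess not in banned:
                if guess in seen_a:
                    return guess  # player A already typed this label
                seen_b.add(guess)
    return None


a = ["dog", "grass", "ball"]
b = ["puppy", "ball", "dog"]
print(play_round(a, b))                  # ball
print(play_round(a, b, taboo=["ball"]))  # dog
```

Once "ball" becomes taboo, the same pair of players is pushed toward a different label, which is how an image keeps accumulating new tags until nothing new turns up.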

Q: Demographics?
A: For my version, average age is 29 (with huge variance), evenly split between women and men.

Q: Compared to Flickr tags?
A: Only a small fraction of Flickr images have useful tags. The tags from Flickr tend to be significantly more exact, but also significantly noisier (e.g., a person tagging an image in a way that means something idiosyncratic).

Q: Bots?
A: Yes, we don’t want you to wait for a partner, so sometimes we’ll give you a bot that replays the moves a human had made with the same image.

Q: Google Images benefits from its version of your game. Who benefits from your version of the game?
A: No one.

For some images, guesses change over time. E.g., a Britney Spears photo five years ago got labels like britney and hot. About two years ago, the labels changed to crazy, rehab, and shaved head. Now they’re back to britney and hot. By watching a player for 15 mins, you can guess whether the player is male or female with 95-98% accuracy.

Why do people like the ESP game? Sometimes they feel an intimacy with their partners. They have to step outside of themselves to make the match. They can have a sense of achievement.

He ends by saying that about the same number of people — 100,000 — have worked on humanity’s big projects, e.g., pyramids, Panama Canal, putting a person on the moon. That’s in part (he says) because it is so hard to coordinate large numbers of people. Now we can get 100M people to work on something. What can we do?

Tags: eim, everthingismisc, everythingIsMiscellaneous, flickr, folksonomy, google, labels, tags, taxonomy

Date: April 27th, 2010


April 5, 2010

 

Shirky’s myth of complexity

Clay Shirky has given us a surprising number of Internet myths. And by this I mean not falsehoods but the opposite: Broad, illuminating ways of making sense of what’s going on. For example, Clay’s post about the power law distribution of links in the blogosphere (based on research by Cameron Marlow) changed how we view authority, fame, and success in the Web ecosystem, and provided the structure within which Chris Anderson could point to the Long Tail. And Clay’s Ontology Is Overrated made clear that a change in how we categorize our world affects very real power relationships; that essay was highly influential, including on my own Everything Is Miscellaneous.

Clay’s new post — The Collapse of Complex Business Models — gives us a broad way of understanding why those who used to provide us with content will not be the ones who give us content in the future…and why they cannot fathom why not.

Tags: everythingIsMiscellaneous, everything_is_miscellaneous, media, shirky

Date: April 5th, 2010


March 31, 2010

 

Order, art, and the miscellaneous

Giulia Ricci’s work investigates:

the shift between order and disorder within different systems, which is the reason why I recurrently use geometrical grids, although on a more abstract level I am also interested in systems of categorisation and lists and how these can be visualised with diagrams and geometrical drawings.

For example, take a look at these. I find them fascinating as they swim close to resolution but never quite make it.

Tags: art, everythingIsMiscellaneous

Date: March 31st, 2010


January 25, 2010

 

How to use the Web to teach: An example

Want to see one way to use the Web to teach? Berkman’s Jonathan Zittrain and Stanford Law’s Elizabeth Stark are teaching a course called Difficult Problems in Cyberlaw. It looks like they have students creating wiki pages for the various topics being discussed. The one on “The Future of Wikipedia” is a terrific resource for exploring the issues Wikipedia is facing.

Among the many things I like about this approach: It implicitly makes the process of learning — which we have traditionally taken as an inward process — a social, outbound process. By learning this way, we are not only enriching ourselves, but enriching our world.

My only criticism: I wish the pages had prominent pointers to a main page that explains that the pages are part of a course.

Tags: 2b2k, education, everythingIsMiscellaneous, pedagogy, wikipedia, wikis

Date: January 25th, 2010


November 21, 2009

 

Will books survive? A scorecard…

New media generally don’t replace old media, as Marshall McLuhan pointed out. After TV we still have radio. After telephones we had telegrams for a good long while. So what about books? After we have networked digital books, we’ll still have and produce physical books. But will physical books be as ubiquitous and culturally important as radio? Or will they be as cherished but infrequently attended as live theater?

In my interview with Cory Doctorow, I wondered, in the midst of an overly-elaborate three-part question, whether ebooks will provide enough of what we value about physical books (pbooks) that pbooks will lose the historic significance Cory had pointed to.

We won’t know the answer until we invent the future. But, I’m going to hypothesize, predict, or stipulate (pick one) that at some point we will have ebooks (which may be distinct hardware or be software running on some other device we carry around), with paper-quality displays that are full-color and multimedia, that are fully on the Net, with software that lets us interact with the book and with other readers, that are a part of the standard outfitting of citizens, and within a physical environment that provides ubiquitous Net connectivity.

Those are a lot of assumptions, of course, and each and every one of them could be disrupted by some 17 year old at work in her parents’ basement. Nevertheless, if the future is something like that, then what of pbooks’ value will be left unreplaced by ebooks?

Readability. I’m assuming paper-quality displays, which may turn out to be unattainable without having to wheel around batteries the size of suitcases. But, even without that, the ability of ebooks to display text in various fonts and sizes should remove this advantage from pbooks.

Convenience. I am assuming that ebooks will be more convenient than pbooks: as good in sunlight as pbooks, at least as easy to hold and use, easier to use for those with certain disabilities, long enough battery life, possibly self-lit, etc. The biggest open question, I believe, is whether it will be as easy to annotate ebooks…

Annotatability. The current crop of ebooks make highlighting passages and making notes so difficult that you have to take a break from reading to do either of those things. But, that’s one big reason why the current crop of ebooks are pathetic. With a touchscreen and a usable keyboard (or handwriting recognition software), ebooks of the future should be as easy to annotate as a pbook is. And those annotations will then become more useful, since they will be searchable and sharable.

Affordability. The marginal cost of producing ebook content is tiny, which doesn’t mean prices will drop as dramatically as we might like. Nevertheless, it’s hard to imagine a world in which ebook content costs more than pbooks.

Social flags. You probably carefully choose which book you’re going to bring with you on a job interview, and which books get moved to the shelves in your living room. We use the books we own as tribal flags, as Cory points out. Ebooks can serve the same role when introduced into social networks, including social networks explicitly built around books, such as LibraryThing.com. They obviously don’t work in physical space that way; if you want to show off your books to people who visit your home, you’re going to have to get physical copies.

Aesthetic objects. Many of us love the feel and smell of books. While ebooks might be able to simulate that in some way — maybe their page displays could yellow over time — it’d still just be a simulation. While ebooks will undoubtedly develop their own aesthetics, so that we’ll call people over to see how beautiful this or that new ebook is, they can’t replace the particular aesthetics of pbooks. So, those who love pbooks will continue to cherish them.

Sentimental objects. For my bar mitzvah, some friend of my parents gave me a leatherbound copy of A.E. Housman’s “A Shropshire Lad” and other poems. It was a beautiful aesthetic object, but I also understood that it had a personal meaning to the giver. I doubt that that particular copy did — I don’t think it came from his own collection — but the physicality of the book was itself a marker for the personal meaning it had for the giver. As Cory says, the books your father read — the very copies that were in his hands — probably have special meaning to you. It’s hard to see how ebooks could have the same sentimental value, except perhaps if you are reading the highlights and notes left by your father, and even then, it’s not the same.

Historic objects. Likewise, knowing that you’re looking at the very copy that was read by Thomas Jefferson gives a book an historic value that ebook content just can’t have. It’s hard to see how an author could autograph an ebook in any meaningful way.

Historical objects. As John Seely Brown and Paul Duguid have pointed out, as has Anthony Grafton, books as physical objects collect metadata that can be useful to historians, e.g., the smell of vinegar that indicates the book came from a town visited by cholera. Ebooks, however, accumulate and generate far more metadata. So, we will lose some types of metadata but gain much more…maybe more than our current norms of privacy are comfortable with.

Specialized objects. It will take somewhere between an improbably long time and forever for all collections of pbooks to be digitized. Thus, books in special collections are likely to be required well after we can take the presence of ebooks for granted.

Possessions. We are headed towards a model that grants us licenses to read books, but not outright ownership. (This is Cory’s main topic in the interview.) If we lose ownership of ebooks, then they won’t have the sentimental value, they will lose some of their economic value to readers (because we won’t be able to resell them or buy them cheaper used), and we won’t be as invested in them culturally. Whether ebooks will be ownable, and whether that will be the default or the exception, is unresolved.

Single-mindedness. Books are the exemplar in our culture of thinking. We write our best thoughts in books. We engage with the best thoughts of others by reading books. Books encourage and enable long-form thinking. Ebooks, because they are (ex hypothesi) on the Net, are distracting. They string together associated chunks and tempt us with links beyond themselves. It is easy to imagine ebooks providing the single-minded pbook experience: “Press here to remove all links.” But, of course, you could always unpress the button. Besides, since your ebook is on the Net (ex hypothesi), all that’s stopping you from jumping out of the book and into your email or Facebook is self-discipline. So, while ebooks can provide the single-minded experience of pbooks, some of us may prefer the paper version to keep the distraction of the Net at bay.

Religious objects. Some books have special meaning within some religions. It’s hard to imagine, for example, that an ebook is going to replace the Torah scrolls in synagogues. In fact, orthodox Jews can’t use electronic devices on the Sabbath, so they are certainly going to continue to buy pbooks. But, this is the very definition of a specialty market.

So, what does all this mean for the future of books? It depends.

First, are there other values of pbooks that I left off the list?

Second, I haven’t listed any unique advantages of ebooks. For example, ebooks will allow social reading: Engaging with others who are reading the book or with the traces left by those who have already read it. That’s pretty important. Also, ebooks are likely to radically reduce the cost of reading, especially of some categories of overpriced pbooks (e.g., textbooks). Also, ebooks will make it much easier to understand the content of books through embedded dictionaries, search capabilities, and links to explanatory discussions. Also, as more of the corpus gets digitized, ebooks will make it far easier for scholars to pursue the footnotes (except they’ll be embedded links, not footnotes). Also, ebooks will incorporate multimedia. Also, reading ebooks will build a searchable personal corpus that is far more useful to us than bookcases filled with our conquered pbooks. Also, we’ll always have our entire library with us, ready to be read or reread, which is good news for readers.

I leave it to you to decide how this mix of values is likely to play out. What will be the social role and meaning of pbooks as we go forward into the ebook era? In twenty years — giving ourselves plenty of time to develop usable ebook readers, to digitize most of what we need, and to build an always-available network — will pbooks be used mainly by collectors, and scholars working with unique texts? Will they be sentimental objects? The poor person’s medium? Will physical books be the equivalent of AM radio, of the road company of “Cats,” of quaint objects in book museums — and/or the continuing pinnacle and embodiment of learning?

Tags: books, ebooks, everythingIsMiscellaneous, kindle, libraries

Date: November 21st, 2009


November 20, 2009

 

Cory Doctorow in support of copyright

In this edition of Radio Berkman, Cory Doctorow argues in favor of copyright … the part of copyright that protects the rights of readers to own (and not just license) books.

It being Cory, the discussion covers topics such as the way in which books are like dogs and his sentimental attachment to his digital collection.

Tags: books, copyleft, copyright, cory doctorow, eula, everythingIsMiscellaneous, google books

Date: November 20th, 2009


November 15, 2009

 

OMG. I disagree with Umberto Eco!

It makes me very nervous to disagree with Umberto Eco because he is so fathomlessly smart. But I think in this case I do. Sort of.

There’s a fabulous interview with Eco in Spiegel (in English) about why he loves lists. He is characteristically pithy, provocative and wise. A crucial paragraph, from the beginning:

The list is the origin of culture. It’s part of the history of art and literature. What does culture want? To make infinity comprehensible. It also wants to create order — not always, but often. And how, as a human being, does one face infinity? How does one attempt to grasp the incomprehensible? Through lists, through catalogs, through collections in museums and through encyclopedias and dictionaries. There is an allure to enumerating how many women Don Giovanni slept with: It was 2,063, at least according to Mozart’s librettist, Lorenzo da Ponte. We also have completely practical lists — the shopping list, the will, the menu — that are also cultural achievements in their own right.

I read the first sentence and was provoked, as Eco intends. Lists are the origin of culture? Please say more! But Eco doesn’t really explain, in this interview, why lists — as opposed to other forms of collections and orderings — are so important. The urge to make order, yes, but not lists themselves.

A list is one particular way of creating order. Lists are sequential and one-dimensional: Wines listed by year, or by place, or by ranking, or by the chronology of when you first encountered them. (Lists can be hierarchical, but they’re only lists if they can be resolved back down to the one-dimensional.) Lists thus are one elemental way of ordering the world. And they have a peculiar fascination, which Eco expresses beautifully. But I think it’s wrong to say that they’re the origin of culture. I think it’d be more accurate and useful to say that culture originates with collecting: Pulling things around us because of their appeal (a word I’m purposefully leaving vague).

I’m sure I’m making too much of Eco essentially drumming up interest in his exhibit at the Louvre, but the issue matters a little bit. I think (based on little to nothing) that lists emerged as a stripping down of multi-dimensional collections. Culture first happened (I imagine) when we pulled together pieces of the world that spoke to us in ways we could not articulate. We assembled them as spaces through which we could wander, or piles through which we could collectively sort (“Oooh, I particularly like that green shiny stone!”). Lists are an abstraction, and culture began (I suppose) with an unarticulated sense that some things go together, and perhaps our first conversations were about why.

Eco goes on to say many wonderful things about why we have liked lists, including proposing that listing properties of an object can liberate us from looking for the definitional essence of things. (For more on this, read his important book, Kant and the Platypus.) In fact, Eco suggests that a mother defines a tiger to her child “Probably by using a list of characteristics: The tiger is big, a cat, yellow, striped and strong.”

I have a bunch of issues with that.

First, that type of definition really just makes explicit what’s implicit in the traditional approach to definitions as essence. In the traditional Aristotelian approach, the essence is the creature’s spot in the hierarchy of beings. So, a tiger is a species of cat, and thus would be specified by its difference from other cats but also by all of the properties of the classes above it (mammal, vertebrate, animal, etc.). The essential definition and the list definition both consist of a list of properties, but the essential definition nests them so that they don’t all have to be spelled out, and so we can see which differences “count.” Eco says, “The essential definition is primitive compared with the list,” but it seems to me that a beautifully nested, hierarchical system of essential definitions is in fact more advanced — it requires abstraction and systems thinking — than a mere list.

But, I don’t want to miss Eco’s essential (so to speak) point here, which is that defining something with a list breaks us out of the notion that there is a single, knowable essence. Absolutely. There’s no eternal essence, “just” a set of properties that are relevant depending upon our circumstances. With that I wholeheartedly agree.

My second problem with this is that — as George Lakoff says in Women, Fire and Dangerous Things, explicating and expanding the work of Eleanor Rosch — the mother (heck, maybe even the father) probably actually teaches the child what a tiger is by pointing at one, or at a picture of one. We learn through prototypes, not through essential definitions, and not by making lists. List-making is an abstraction and a secondary activity.

Third, the listing the parent does seems to me not to have the properties that make lists captivating to Eco. The parent isn’t trying to give a complete listing that brings a sense of mastery over the infinite and over death. She’s just pointing out some of the salient features. If it is a list, it’s not a list of the sort that Eco has charmed us about.

Fourth, while lists of properties are a useful corrective to thinking that things are exhausted by a definition of their essence, lists strip out so much that they don’t seem much more adequate than essential definitions. A tiger isn’t a list.

This is just a fun interview in Spiegel, so I may be taking it too seriously. So, even if lists occur within culture — including the lists in literature he points to — rather than being the origin of culture, the interview does indeed help us to see why our fascination with lists is a fascination with something bigger than lists.

Tags: classification, eco, everythingIsMiscellaneous, hierarchies, lists, taxonomy, umberto eco

Date: November 15th, 2009

38 Comments »

November 12, 2009

 

Lego blocks unmiscellanized

Giles Turnbull at the Morning News reports on his research interrogating (gently) children from different families about what they call various Lego pieces. Quite interesting in its own taxonomic way, and a topic that’s amusing even just to contemplate.

Tags: classification, everythingismisce, everythingIsMiscellaneous, legos, nomenclature, taxonomy

Date: November 12th, 2009

Be the first to comment »

October 24, 2009

 

FCC’s Net Neutrality discussion board

The FCC has put up a site — openinternet.gov — where anyone (after registering with a valid email address) can post an idea, or vote existing ideas up or down. I love the idea of the feds opening discussions up, although, I am not convinced that this particular implementation achieves its presumed aims. But, what the heck! Try-fail-try is the right rhythm for the Net.

The site defaults to listing the ideas reverse chronologically, which adds some serendipity, or you can choose to view them listed in order of popularity, which encourages piling on. You can also browse by category/tag.

Anyone who registers can post a comment. The comments are unthreaded, discouraging much development of ideas but also discouraging flaming. You can report a comment as being “abusive,” but otherwise cannot rate them.

At the moment, the most popular posting is from Tim Karr, who, according to his biography at SaveTheInternet.com, a site sponsored by FreePress.net, “oversees all Free Press campaigns and online outreach efforts, including SavetheInternet.com.” Tim — who I know a bit and like — is an activist. He has the most popular post at the FCC’s site presumably because FreePress.net sent out a mailing urging supporters to vote it up.

There’s absolutely nothing wrong with that. It’s how politics is played in this country. If an anti-NN group sponsored by, say, AT&T wanted to play the same game, it’s perfectly entitled to. It’s not hard to imagine a well-funded group swamping FreePress’s shoestring efforts and getting orders of magnitude more people to thumbs-up an anti-NN comment.

Which is to say that an open discussion board like the one the FCC has posted can serve either of two purposes. It can be a place where people come for rational discussions across political positions, or it can serve as an informal poll of citizens’ sentiments about an issue. But combining the two means that neither works very well. It becomes simply an opportunity for gaming the system.

It seems to me that sites such as these cannot serve as a poll that has any value at all. Besides, we have lots of other ways of gauging public opinion, including scientific polling and elections. If, on the other hand, the FCC wants to sponsor a forum for useful discussion or to generate new ideas, it could modify the current implementation. For example — and these are just ideas that may turn out to be gigantic belly flops — comments could be divided into two tracks, pro and con, with most-popular listings for each. Readers could be allowed to vote up but not down. Comments could be threaded. The comments could be rated. Postings could have buttons for “agree/disagree” and “interesting,” so that the site could highlight articles that people disagree with but find interesting.
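To make the last of those ideas concrete, here is a minimal sketch in Python of how a board might surface postings that readers mostly disagree with but still flag as interesting. Everything in it is an assumption for illustration: the `Posting` fields, the `min_votes` threshold, and the scoring rule are made up here, and none of it describes how the FCC's site actually works.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    # Hypothetical record of the button counts for one posting.
    title: str
    agree: int = 0
    disagree: int = 0
    interesting: int = 0


def contested_but_interesting(postings, min_votes=5):
    """Return postings that most voters disagree with yet some flag as interesting.

    Postings are ordered by (interesting count) x (share of disagree votes),
    highest first. The scoring rule is an assumption, not an established method.
    """
    scored = []
    for p in postings:
        votes = p.agree + p.disagree
        if votes < min_votes:
            continue  # too few agree/disagree votes to say anything
        disagree_share = p.disagree / votes
        if disagree_share > 0.5 and p.interesting > 0:
            scored.append((p.interesting * disagree_share, p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]
```

Requiring a minimum number of votes keeps a posting with two votes from outranking one that many readers have weighed in on.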

All of these techniques could be gamed because everything can be gamed. Some discussion boards do work, though. I don’t know what the magic keys are, but I’m pretty confident that a political discussion board that includes an overall popularity contest will so encourage gaming that its results will necessarily be unreliable. At the very least, the popularity contest should be confined to determining the best arguments for each side.

But I don’t want to close on a negative note, for the FCC is to be congratulated on its efforts to open its processes up not only to lobbyists and geeks who know how to walk and talk like an FCC commenter, but to the general public. And it’s doing so in the proper Webby way of taking small steps and not being afraid to fail in public. That takes guts.

Tags: broadband, conversation, discussion boards, everythingIsMiscellaneous, experts, fcc, net neutrality, social media

Date: October 24th, 2009

1 Comment »

October 20, 2009

 

Radio Berkman on Forgetting, and Remembering the Media

There are two new-ish Radio Berkman interviews up: Me talking with Viktor Mayer-Schönberger about his book that argues that we are in danger of forgetting how to forget, and Russell Neuman on learning from the past of the media.

Tags: everythingIsMiscellaneous, forgetting, media, podcasts, tv

Date: October 20th, 2009

3 Comments »

October 11, 2009

 

Do-it-yourself Google Books — a million dollar idea for Amazon?

Harry Lewis has a terrific post about a $300 do-it-yourself book scanner he saw at the D is for Digitize conference on the Google Book settlement. The plans are available at DIYBookScanner.org, from Daniel Reetz, the inventor.

There are lots of personal uses for home-digitized books, so — I am definitely not a lawyer — I assume it’s legal to scan in your own books. But doesn’t that just seem silly if your friend or classmate has gone to the trouble of scanning in a book that you already own? Shouldn’t there be a site where we can note which books we’ve scanned in? Then, if we can prove that we’ve bought a book, why shouldn’t we be able to scarf up a copy another legitimate book owner has scanned in, instead of wasting all the time and pixels scanning in our own copy?

Isn’t Amazon among the places that: (a) knows for sure that we’ve bought a book, (b) has the facility to let users upload material such as scans, and (c) could let users get an as-is scan from a DIY-er if there is one available for the books they just bought?

Tags: amazon, books, everythingIsMiscellaneous, google, google books, libraries

Date: October 11th, 2009

11 Comments »

Net uncovers new type of cloud

There are reports of a new type of cloud, one that is not currently in the official International Cloud Atlas. Or, possibly, it is a formation that’s been around forever, but the scattered reports are only now coalescing thanks to the Net.

According to Amazon’s review of Richard Hamblyn’s The Invention of Clouds, we only began thinking clouds could be categorized in 1802 when Luke Howard started giving public lectures. The very idea that clouds — the paradigm of uncatchable — could be divided into groups was (apparently) fascinating and thrilling. (Lamarck had also categorized clouds, but it didn’t catch on.)

A quick googly scan makes it seem that the cloud taxonomy is pretty messy. For example, the University of Illinois’ “cloud types” page lists four broad categories, and a list of miscellaneous clouds, each of which is categorized under one of the four basic types, evoking a “Huh?” reaction from at least one of us. The cloud taxonomy page at Univ. Missouri-Columbia lists eight types. Do you categorize by what they look like, how high they are, what they do (rain or not?), which celebrity profiles they resemble …? Categorizing clouds is truly a Borgesian task.

And, dammit, wouldn’t you know? Here’s a poem by Jorge Luis Borges called: “Clouds (II)” (with the line-endings probably removed):

Placid mountains meander through the air, or tragic cordilleras cast a pall, overshadowing the day. They are what we call clouds. And their shapes are often strange and rare. Shakespeare observed one once. It seemed to be a dragon. That one cloud of an afternoon still kindles in his words and blazes down, so that we go on seeing it today. What are the clouds? An architecture of chance? Perhaps they are the necessary things from which God weaves his vast imaginings, threads of a web of infinite expanse. Maybe the cloud is emptiness returning, just like the man who watches it this morning.

(translated by Richard Barnes. B; Robert Mezey; Richard Barnes. “Clouds (II). (poem).” The American Poetry Review. World Poetry, Inc. 1996. HighBeam Research. 11 Oct. 2009)

More Borges poems…

Tags: borges, clouds, crowdsourcing, everything, everythingIsMiscellaneous, expertise, experts, poems, taxonomy

Date: October 11th, 2009

2 Comments »

October 7, 2009

 

[berkman] Viktor Mayer-Schönberger on the virtue of forgetting

Viktor Mayer-Schönberger is giving a talk at the Berkman Center (well, actually at Pound Hall) on his book Delete: The Virtue of Forgetting in the Digital Age. Viktor teaches at Singapore University, and was at the Kennedy School for ten years.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins with a story of a person studying to become a teacher who was kicked out of school because the school noticed a photo of her drinking on Facebook. She tried deleting it, but the Internet remembered it. He gives another example: A person who noted in an article that he had taken LSD in the 1960s. When trying to cross into the US, an immigration officer refused him admittance because he hadn’t offered up that information, and the officer uncovered it by googling him. What’s put on the Web is never forgotten. In another example, the information was not put up by the individual but by someone else: a bar/club in Europe records all the people, all the drinks, etc., and hasn’t ever deleted any information. Likewise, Google knows more about us than we can remember.

For millennia, forgetting was easy, and remembering was hard, says Viktor. So, we’ve come up with ways to pass on our memories. The oral tradition. Painting. Writing. “But these tools have not altered the fundamental fact that for us humans, forgetting is easy, and remembering is time-consuming and expensive.” The book and the photo also haven’t altered this fact. What is long past fades in our mind. We depreciate what is no longer relevant. But because forgetting is biological, we never had to develop explicit strategies to forget. Now we’ve moved from biologically forgetting to permanent remembering. [Hmm. I haven't. We still don't remember much. But we have more records, and thus are able to retrieve more. That seems different to me.]

This has happened because storage is cheap in the digital world. Google has server farms with a capacity of 100,000 terabytes perhaps. And we’ve gotten much better at retrieving information. And we have global access. Remembering has become the default.

There are, of course, benefits to this, Viktor says. But undoing forgetting has deep consequences, far beyond the information efficiencies. He points to power and time.

Power: If others have info about us and can keep that info accessible for a very long time, the informational power increases, and can affect how we transact and interact. It’s Bentham’s Panopticon: behavioral compliance through the permanent threat of constant surveillance.

Time: Imagine Jane is about to catch up with her old friend John, but when reviewing their history of email, discovers msgs from a time when he was nasty to her. She had forgotten that time. Now it comes back. Her current relationship with John now is ruined. [Or, she discovers msgs that remind her she once loved him. Isn't Viktor's example actually an argument for more remembering, so she can see how she got over the bad time?] “In analog times, the dangers were limited” because our biology would have brought us to forget.

Viktor talks about AJ, a non-fictional woman who has difficulty forgetting. It is a weird and unhappy condition.[This is why the conflation of human remembering and the presence of a fairly complete digital record matters. The presence of digital info and the tools for retrieving it does not turn us into AJ.]

Without forgetting, we have trouble changing. We have trouble forgiving. We may turn into an unforgiving society. “This is the real danger of shifting the default from forgetting to remembering.” Worse, suppose we stop relying on our own memories and rely instead on the digital memories. “Does that give those who control digital memory the power over history?”

What to do? Perhaps give privacy rights to individuals. But there are weaknesses: It’s not politically feasible in the US. Europeans have those rights, but people have not used them.

Or perhaps we could create an information ecology, a regulatory construction of what can be remembered. E.g., it might require the deletion of info after a particular time. This does not require individuals to go to court for enforcement, and it protects against an unforeseen future as when the benign Dutch social services registry was repurposed by the Nazis to identify Jews. “It may be better to store less than more.” But, after 9/11, we’re seeing requirements for increasing data retention, Viktor notes.

So, maybe we need to augment these approaches. “Digital abstinence,” for example. Don’t put everything on Facebook. But abstinence isn’t all that reasonable, he says. By the end of 2007, two out of three young Americans had put their info online.

The opposite approach is “full contextualization.” E.g., Jane can’t find the context of her bad treatment by John. Full contextualization would restore that. But will that ever be technically feasible? And if it were, would it really address the challenge of digital remembering? Do we have time to relive our past again and again?

Another approach: Hope for a cognitive adjustment. That is, over time we’ll learn to devalue older info and learn to live with an omnipresent past. “That would solve our problem. But is it likely?” How long would it take us to change how we assess information? “Cognitive psychologists are very critical of our ability to change our decision making in the short run.” [But a change in norms can happen much faster than that, and we govern what we're allowed to notice and remember through norms. Statements like "That's water under the bridge" and "Youthful indiscretions" are expressions of norms that enforce social forgetting without requiring actual brain evolution.]

Or, we could change our technology, rather than changing ourselves. E.g., a global DRM system to protect privacy. Viktor is not recommending this: “Wouldn’t this be a perfect surveillance system?” And we’d have to make sure that privacy is built deep into the infrastructure.

None of these six solutions are sufficient, although all offer something.

“I advocate a revival of forgetting…to establish a mechanism that makes forgetting easy, and makes remembering just a bit more strenuous.” Just enough to shift the incentives back to what we humans are used to. Viktor suggests an expiry date for information. Whenever we save info, we should be prompted to put in a date when we want it deleted. We should be able to change those dates.

The core of this proposal isn’t the automatic deletion, he says. Rather, the prompting for the date will remind us humans that most information is not of permanent value.

E.g., search engines could offer us an easy way to say how long we should remember searches. Or people could carry a device on their keyring to set expiration dates, perhaps tagging the expiration dates for the images of the people in digital photos.

Any expiry date system must have only two characteristics. First, it must aim at changing the default from remembering back to forgetting. Second, it must remind us of information’s temporal nature.
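As a rough illustration of the expiry-date idea, here is a minimal Python sketch of a store that refuses to save anything without a date, lets you change that date later, and forgets whatever has expired. The `ForgetfulStore` class and its method names are hypothetical, invented for this example; Viktor did not describe an implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    content: str
    expires_on: date


class ForgetfulStore:
    """Hypothetical store in which forgetting is the default."""

    def __init__(self):
        self._records = {}

    def save(self, key, content, expires_on):
        # Saving always requires a date, which is the "prompt" that reminds
        # us that most information is not of permanent value.
        if expires_on is None:
            raise ValueError("an expiry date is required when saving")
        self._records[key] = Record(content, expires_on)

    def change_expiry(self, key, new_date):
        # The owner can move the date earlier or later at any time.
        self._records[key].expires_on = new_date

    def forget_expired(self, today):
        # Delete everything whose expiry date has arrived; return the keys.
        expired = [k for k, r in self._records.items() if r.expires_on <= today]
        for k in expired:
            del self._records[k]
        return expired

    def get(self, key, today):
        # Expired records are purged before any lookup.
        self.forget_expired(today)
        record = self._records.get(key)
        return record.content if record else None
```

The exceptions Viktor allows for, such as records that must be kept for a very long time, would simply be saved with a far-off date.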

Expiry dates are also no silver bullet, and don’t solve digital privacy problems, Viktor says. But they could be useful when used with some of the other proposed solutions.

“Forgetting is often forgotten…Let us remember to forget.”

Q: You don’t mention the propensity of all media to fade over time. Digital memory is not perfect. Also, data is growing so quickly that it gets too expensive to digitally remember everything. The amount of data is growing faster than Moore’s Law.
A: You don’t need much space to remember a billion queries a day. A couple of hundred dollars worth of data storage. And Google’s way of saving data is relatively future-proof.

Q: [me] If we take memory to mean only the human capacity, and digital “memory” to be more like what we usually call storage, then what has actually happened to human memory in the digital age?
A: I chose the term “digital memory” carefully. If I can’t access my VCR tapes easily, they’re pretty much useless to me. Digital stuff is so easily accessible. How has digital remembering changed human remembering? I don’t know. But my argument isn’t that it’s changed human remembering, but that it has changed the external stimuli affecting our memory.

Q: One of the ways a culture forgets is that it lets books go out of print, get moved out of libraries, etc. Now we have Google Books, which will make all books ever printed available (pretty much). Do you see negative effects of this project?
A: I haven’t given it enough thought because authors would like to set their books’ expiry dates very far in the future. Some preliminary research we’re doing on court decisions is showing an interesting effect on memory.
Q: The author of the book isn’t the only one concerned with the info in it. There may be people written about who would want a say…
A: Yes, and the author’s rights aren’t always fully owned by them.

Q: Digital memory has value as cultural memory. The things we’d put expiration dates on have value even if against the interests of the people at the time, because it has social and historic meaning…
A: That’s just conjecture…
Q: No it’s not. We’re leaving traces now all the time. How we put that info to use is a different question.
A: Suppose you’re an author. Shouldn’t you be able to put bad early stories into the trash bin? Why should society have the right to take it from you and preserve it and make it public?
Q: Great point, but we still do struggle with this. Nonetheless, I would recommend we give thought to how these things might sensibly be balanced. E.g., the Iran election twitter stream. Enormous amt of fascinating info has been lost.
A: The solution is built in. For certain contexts, we may be required to mandate a very long expiry date. We do that all the time. I’m arguing for keeping that as the exception to the rule.

Q: I’m a cultural historian, trained as a Medievalist. There’s data scarcity in that field. Who decides about inclusion, preservation, etc.? Institutions have performed the filtering role. Google keeps some types of info and not others. Others are interested in your social security number, etc. So, who are the gatekeepers? There’s power to the Internet Archive’s approach of capturing everything. The stuff that the institutions of memory don’t preserve may turn out to be the most interesting for historians. (I basically buy your core argument, although I’m a believer in the cognitive adjustment.)
A: Brewster Kahle (of the Internet Archive) and I are in general agreement. The Archive sets expiry dates. [Not sure I got that right. Sorry.] My core argument is to give back the choice to the individuals.

Q: I too believe in the cognitive adjustment because I see myself and others already doing that. Sure, you find old emails reminding you of something you wanted to forget, but when you accidentally delete some years’ worth, you feel an intense sense of loss.
A: When I lost all my email at the end of 1998, I was completely horrified. But then I discovered it doesn’t really matter. I started out believing the cog adjustment argument, but after I read cog science books, I changed my mind. I want to plug The Seven Sins of Memory, which shows how hard it is to readjust.

Q: Suppose two of us in a shared record have different expiry preferences…
A: I talk about that a lot in the book.

Q: There’s a big diff between what I want to preserve and what others do. The European privacy laws require data deletion. Google and others are now negotiating with the European Commission about this …
A: We need to differentiate between privacy rights and norms.

[missed a couple of questions. sorry.]

Viktor says that he recognizes that expiry dates are a crude instrument. Too binary. “I’d prefer rusting or something like that.” :)

Tags: berkman, cultural history, delete, everythingIsMiscellaneous, forgetting, google, google books, memory

Date: October 7th, 2009

1 Comment »

The Dewey Belushi system

Here’s the Onion on the Dewey Decimal Classification system meeting its nemesis, Jim Belushi. (Thanks to Jay Hurvitz for the pointer.)

Tags: categorization, dewey decimal, everythingIsMiscellaneous, humor, jim belushi, libraries, oclc, taxonomies, the onion

Date: October 7th, 2009

1 Comment »

October 2, 2009

 

Libraries sans Dewey

Barbara Fister has a terrific article in LibraryJournal about libraries that have moved away from the Dewey Decimal Classification (DDC) system, many in favor of some version of the BISAC system that arranges books alphabetically by topic. This is a more bookstore-like approach. The article presents the multiple sides of this discussion, with lots of examples.

The disagreement among librarians is, to my mind, itself evidence that there is no one right way to organize physical objects. Classification is pragmatic. You classify in a way that works, but what works depends upon what you’re trying to do. Libraries serve multiple purposes, so librarians have to make hard decisions. If the DDC isn’t the safe and obvious choice, then libraries have to confront the question of their mission. The classification question quickly becomes existential in the JP Sartre sense.

At the end, she quotes from Everything Is Miscellaneous where I say that the Dewey system “can’t be fixed.” I still think that’s right in its context: No single classification system can work for everyone or for every purpose, although they can be better or worse at what they’re trying to do. In that sense, the DDC can be improved, and the OCLC has continuously improved it. But because it’s premised on assigning a single main category to each book, it is repeating the limitations of the physical world that require physical books each to go on a single shelf. Any single classification is going to be inapt for some purposes, and is going to embody biases constitutive of its culture. It’s the job of a library and of a book store to decide which single way of classifying works best for its patrons, with the obvious recognition that no single way works best for all. Books are miscellaneous. Libraries, bookstores, and the shelves over your desk are not.

Anyway, Barbara’s article is a fascinating look at how libraries are trying to do the best for their patrons, working within the constraints of the physical.

Tags: bisac, classification, ddc, dewey, everythingIsMiscellaneous, folksonomies, libraries, oclc, tags, taxonomies

Date: October 2nd, 2009

5 Comments »

September 29, 2009

 

Herkko Hietanen: Network Recorders and Social Enrichment of Television

Herkko Hietanen, a Berkman Fellow, is giving a talk about TV. “Television is really broken.” It’s not providing what consumers want: programs when we want them, where we want them. It lacks interaction with other viewers and with broadcasters. It has ads. It’s geographically limited. If you had to pitch TV to a venture capitalist, it would have a hard time getting funding.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Herkko gives a brief history of the highlights. VCRs were an early attempt to fix TV. This frightened the broadcasters, who took it to court, where, in Sony v. Universal (the Betamax case), they lost. The court said the manufacturers were not responsible for infringing uses because the devices had non-infringing uses, and personal use was declared a fair use. Satellites extend over-the-air (OTA) broadcast. Community antennas were first set up by stores selling TV sets. Now cable is dominant. But contracts limit core innovation. “If you’re afraid you’ll piss off your content provider, you’re not going to do something that’s good for the consumer.”

There has been some innovation in the core. On-demand video. Time-Warner “LookBack” lets you view any show on the day it’s broadcast at any time during that day. Cable also provides a whole lot of channels. But, “Intelligence in the middle stops innovation at the edge.” The industry has litigated against just about everything innovative. E.g., Cablevision wanted to launch a service that would centralize storage rather than putting it in the set-top boxes. Just about everyone sued Cablevision for copyright infringement. The court saw that every user would have their own copy of a saved show. The court decided it doesn’t matter where the copies are stored. Herkko says it’s too bad it didn’t go to the Supreme Court so we’d have a definitive decision.

The problem with MythTV, Herkko says, is that it’s not user-friendly. [I spent 1.5 yrs trying to get MythTV to work, and failed :( Wendy Seltzer, seated across the table, has been using MythTV for years.] TiVo is easy but not all that easily hackable. You can’t share TiVo’ed shows, and you can’t modify the code in the box. ReplayTV got sued for having a skip commercials feature, and went bankrupt.

Herkko points to living room clutter as another problem with TV today.

Herkko looks forward to PVRs getting connected to the Internet, because connected users create social networks, and they start to innovate. “We want stupid networked recorders and intelligent open client-players.” We want connected and tagged shows. We’ll have interactive TV for real, including gambling. Social groups could recommend what to watch.

This all creates privacy problems. E.g., an MIT study discovered they could identify gays by analyzing their social networks, with a high degree of accuracy.

At some point, users will probably start sharing their resources and clustering their recorders. Why should everyone record the same show over and over? Why get it from a central recorder when your neighbors have a copy? Of course, this is what got ReplayTV into trouble, Herkko notes. He thinks that the social interaction around shows will happen before and after the show, because people won’t sit with a keyboard in their laps. [Since I'm on the backchannel as I listen to him, I guess I disagree.]

What about ads? Adding social networks would mean that people could watch ads they actually want to watch.

Overall: TV can be fixed. Social networks. Socially-oriented recorders.

Q: This is a compelling vision of the opposite of the Net. The Net is smart at the edges and dumb in the middle. TV has been the opposite. You seem to hope that the future will invert so consumers can get what they want. But consumers have never gotten what they wanted. What will change it?
A: We need brave entrepreneurs to test it in the courts. Having network recorders isn’t that different from having a VCR.

Q: When you were talking about the keyboard in your lap, I think that’s wrong generationally.
A: Voice works while watching tv. But typing and sharing the screen doesn’t.

Q: You’re talking about what the cable companies will do. But then there’s the stuff in the IP world: mythTV, Boxee, etc. That’s where the exciting stuff is.
A: Innovation at the core is very slow, while innovation at the edge happens very fast.

Q: If the Internet arises to bypass the core, will the quality decline? Will it be more like YouTube style?
A: That’s a real concern. If everyone skips the ads, then there won’t be profit in producing high quality shows. Although there are also premium channels. And in Finland we pay an annual fee and get 4 channels.

Q: There are a lot of forces driving the centralization of TV. With that comes control against innovation at the edges. Is TV going to change or be changed by people sharing content from the edges?
A: If we force a change on TV, the broadcast flag will be re-introduced. Big audiences still demand the lay-back experience.
Q: The sitting back phenomenon has persisted for 50 yrs. Why will it continue?

Q: What is your main research question?
A: When recorders get connected, what sort of innovation are we going to get?

Q: Don’t we need non-Net neutrality to ensure that the video experience over the Net is good enough to inspire innovation in that space?
A: It can be done in other ways. You don’t need immediate delivery of all packets if you’re downloading for viewing later. E.g., in Finland I have a box that records 2 weeks of all 10 channels.

Q: The picture you’re painting is not very TV-like. It’s not broadcast, not one-directional, the business model doesn’t work, we’ll be using our computers…So, it seems like you’re dissolving what TV is. Rather than talking about the “social enrichment of TV” [the title of Herkko's talk], we should be talking about the visual enrichment of the Internet. E.g., how do you see Hulu, which has some community features?
A: I defined TV at the outset: It’s geographically bounded, it’s broadcast, it’s scheduled, etc. And Hulu takes some of the edge approach, but it’s very much a core app. We’re going to see a big shift of control from the rights owners to consumers.

Tags: broadcast, everythingIsMiscellaneous, mythtv, television, tv

Date: September 29th, 2009

4 Comments »

September 28, 2009

 

Sidewiki: Google at the center

I agree with Jeff Jarvis’ critique of Google’s Sidewiki.

Sidewiki is ThirdVoice yet again. Both let you write and read comments on a site — actually on the site — so long as you have the proprietary client. ThirdVoice failed mainly because it couldn’t get enough people to install its client. (Of course, one could ask why enough people weren’t interested in this.) Sidewiki might succeed because it’s part of the vastly popular Google Toolbar. And, as Jeff says, that means it might succeed because Google is using its near ubiquity as a center of the Net. Which is troubling. For example, again as Jeff reports, insofar as the commentary on his site about his Sidewiki post occurs in Sidewiki, Google now owns the comments on his post. Troubling.

I think there are reasons to doubt Sidewiki’s success. As more people add comments, we need good ways to sort through them, to eliminate spam, to decide which types of comments are useful to us. Google is promising us algorithms. But algorithms won’t know that I don’t particularly want to read comments about my friend Jeff’s character, but I am particularly interested in what technologists are saying, or about Net politics, or what my friends are saying, or about how to hack Sidewiki.

Sidewiki has its uses. I’d rather see it connected to social networks, and I’d rather see it provided as an open source browser add-in. But I don’t know who should own the comments and what the control mechanisms should be. This is one of the edges of the Web that defies easy answers because it’s so hard to tell what is the center and what are the sides.

Tags: everythingIsMiscellaneous, google, jeff jarvis, sidewiki, thirdvoice

Date: September 28th, 2009

2 Comments »

September 25, 2009

 

News is a river is a blog…

WLEX-TV in Lexington, Kentucky, an NBC affiliate, has turned its news site into a blog. It actually contains news produced independently of what goes out on broadcast. Very very interesting. It’s a different way of slicing the news, with much debt to Dave Winer’s river of news idea, and it’ll be fascinating to see how and in what ways it’s useful and how it changes our idea of what news should be.

Tags: everythingIsMiscellaneous, journalism, media, news

Date: September 25th, 2009

6 Comments »

September 18, 2009

 

The temptation of stories

Journalism at its best is a way to uncover and communicate the truth, subject to all the usual human limitations. But journalism’s fundamental form, the story itself, brings a special temptation to manipulate the truth for economic or aesthetic reasons. The temptation is resistible to varying degrees, depending on the type of story (the temptations are greater for feature stories than for hard-core reportage of the day’s events), the nature of the journal, and the standing of the journalist. Nevertheless, the temptation is there, built into the form itself.

The very idea that there’s a story is itself a temptation. Maybe the story is on Facebook addiction or the rise in incivility. A journalist who goes back to her editor and says, “Nope, no story there” has disappointed the editor who now has to find another story to fill the hole in the paper newspaper or to feed the maw of the online publication. Not a big deal; it happens all the time. But if it’s the fifth consecutive time that the reporter says there was no story there, it’s getting to be a problem. If it’s the reporter who has suggested the stories in the first place, as is often the case at many publications, she will be judged a failure because she’s wasted her time and gummed up the editor’s planning.

It’s not like it’s supposed to be in science, where a failed hypothesis is as valuable as a proved one, even though of course every scientist would rather discover that a new compound cures cancer than that it doesn’t. A failed hypothesis in the world of journalism is a story that won’t run, that won’t bring in readers, that won’t give businesses a page on which to place an ad. There are real prices to stories failing to pan out. Reporters are thus tempted to make the story work.

Even when the hypothesis of a story is true, journalists almost always reach a place in the story where they know what they want their interviewees to say. An interview is requested of a particular person to provide the “some experts disagree” statement or the “the implications of this are vast” verbiage. If that person doesn’t provide it, someone else will. Depending on the stage of the story, the interviewee may spark interest in a side issue or an approach the reporter hadn’t considered…resulting in someone else being called to provide the other side or the amplification.

This happens at some stage of the story even when the topic is interesting no matter what storyline it takes. For example, the death of Pat Tillman is interesting because it is instantly symbolic: Football star turns down a life of fame and wealth in order to defend his country, and dies a soldier’s death in Afghanistan. Beyond the basic reportage the day that it happened, it was bound to inspire journalistic stories. A reporter could enter with an open mind. Even so, she’ll enter with an open mind looking for an angle, which is to say, looking for a story. Is it a relatively simple narrative of an inspiring patriot who gave his life to support his ideals? Or was there “more” to it? That search for the “more” isn’t simply a hunt for unknown truths. It’s a search for a narrative that reveals the simple surface to be a veneer from which we will learn something unexpected. The reporter may have no idea what the more is, but once she gets a hint of it, she’ll be on it, and the narrative itself — if not personal ambition — will carry her forward. Maybe Tillman wasn’t as virtuous as we thought. Maybe his death wasn’t as straightforward as we were told. Maybe his story was of a life fulfilled or of a life wasted or of a life more complex than we’d thought. Maybe it’s about the government’s cynical use of him, or of the media’s own eagerness to find a hero. But something will emerge. And as it emerges, it gathers its story around it, and the reporter is off looking for the voices who will play certain roles in the story. Why? Because the story demands it.

At the very least, the temptation of journalistic stories is that of all story-telling, the basic way we humans make sense of our world. Stories, not just in journalism, are about the gradual revealing of truth. The surface wasn’t as it seemed. The ending was contained, hidden, in the beginning. What looked continuous was in fact disruptive. Stories have a shape, and story-tellers fit the pieces into that shape. There’s nothing wrong with that, except in an environment where there’s economic and social pressure to produce a story. Then the temptation is to get the pieces to fit. And that can corrode the truth.

So can the simple fact that stories tend towards closure. They end. They’re done. Some circle of understanding has been drawn and closed, tip to tip. The story says, simply by ending, “This is what you needed to know.” There can often be truth in that, but there is always falsity in it. The world, its events, and its people escape even the best of stories.

Stories are not going away from journalism, just as they’re not going away from history, biography, or how we talk about our day over dinner. They’re fundamental. Stories are how we understand, but they also inevitably are constructions, incomplete, and organized around a point of view. All stories are temptations. Journalistic stories have their own special and strong temptations because of their economics and because of the nature of the medium in which they’ve been embodied. Now those economics and that medium are changing, diminishing the old temptations but creating new ones:

::: Because we are increasingly turning to publications that explicitly take a stand, the temptation to include false views for “balance” is diminished. But, the preference for partisan media creates a new temptation: To over-state, in order to attract attention. [Guilty as charged!]

::: The old medium limited the length of stories, forcing unnecessary trimming except in very special circumstances. The new medium has infinite space so that stories can be right-sized. But it turns out that prolixity discourages on-line readers, so the new temptation is toward brevity. It’s not clear if that’s an expression of an impatience that’s always been with us or if the new medium constitutes a new temptation.

::: The old medium’s inability to embed links encouraged journalists to try to encapsulate the world in a single column of text. The new hyperlinked medium can tempt authors to gloss over points and contradictions because they’ve put in some links, putting the burden on readers who are (usually) lazier than the writers.

::: The economics of the old medium tempted publications to appear valuable by being a reliable source of the single truth. While they of course have encouraged discourse on controversial topics, their bread and butter have been stories that “get it right” and thus serve as a stopping point for belief. Stories are the bulwark of authority, and authority is the currency of the old journalistic economics. The new medium now can include as many stories as we want, from as many different points of view, connected by curators above the stories and by hyperlinks within the stories. The story no longer has to tell the whole truth. It’s just one of the stories. But, while that’s true of the ecosystem as a whole, the old temptation to be a single-source truth shop exists for individual online publications, whether they’re commercial or personal.

Now, the form I’ve adopted for this essay, which is itself a type of story-telling, is one of balance: Old temptations matched by new temptations. It’s a form that aims at inspiring trust: “See, I’m presenting both sides!” And that itself can be corrosive. Indeed, in this case it is. While the old temptations are being replaced by new ones, the locus of truth is moving decisively from individual stories and publications to the network of stories and publications. The balancing of temptations misses this most important change. The hyperlinked context of stories creates not only new temptations to go wrong, but a greater possibility for going right.

Tags: everythingIsMiscellaneous, experts, journalism, media, narrative, narratives, truth

Date: September 18th, 2009

8 Comments »

[berkman] Transforming Scholarly Communication

Lee Dirks [site] Director of Education and Scholarly Communication at Microsoft External Research is giving a Berkman-sponsored talk on “Transforming Scholarly Communications.” His group works with various research groups “to develop functionality that we think would benefit the community overall,” with Microsoft possibly as a facilitator. (Alex Wade from his group is also here.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins by noting the “data deluge.” But, computing is stepping up to the problem: Massive data sets, evolution of multicore, and the power of the cloud. We’ll need all that (Lee says) because the workflow for processing all the new info we’re gathering hasn’t kept up with the amount we’re taking in via sensor networks, global databases, laboratory instruments, desktops, etc. He points to the Life Under Your Feet project at Johns Hopkins as an example. They have 200 wireless computers, each with 10 sensors, monitoring air and soil temperature and moisture, and much more. (Microsoft funds it.) Lee recommends Joe Hellerstein’s blog if you’re interested in “the commoditization of massive data analysis.” We’re at the very early stages of this, Lee says. For e-scientists and e-researchers, there’s just too much: too much data, too much workflow, too much “opportunity.”


We need to move upstream in the research lifecycle: 1. collect data and do research, 2. author it, 3. publish, and then 4. store and archive it. That store then feeds future research and analysis. Lee says this four-step lifecycle needs collaboration and discovery. Libraries and archives spend most of their time in stage 4, but they ought to address the problems much earlier on. The most advanced thinkers are working on these earlier stages.


“The trick there is integration.” Some domains are quite proprietary about their data, which makes it problematic to get data and curation standards so that the data can move from system to system. From Microsoft’s perspective, the question is how can they move from static summaries to much richer information vehicles. Why can’t research reports be containers that facilitate reproducible science? It should help you use your methodology against its data set. Alter data and see the results, and then share it. Collaborate real time with other researchers. Capture reputation and influence. Dynamic documents. [cf. Interleaf Active Documents, circa 1990. The dream still lives!]


On the commercial side, Elsevier has been running an “Article of the Future Competition.” Other examples: PLoS Currents: Influenza. Nature Precedings. Google Wave. Mendeley (“iTunes for academic papers”). These are “chinks in the armor of the peer review system.”


Big changes, Lee says. We’ll see more open access and new economic models, particularly adding services on top of content. We’ll see a world in which data is increasingly easily sharable. E.g., the Sloan Digital Sky Survey is a prototype in data publishing: 350M web hits in 6yrs, 930k distinct users, 10k astronomers, delivered 100B rows of data. Likewise, GalaxyZoo.org, at which the public can classify galaxies and occasionally discover a new object or two.


Lee points to challenges with data sharing: integrating it, annotating, maintaining provenance and quality, exporting in agreed formats, security. These issues have stopped some from sharing data, and have forced some communities to remain proprietary. “The people who can address these problems in creative ways” will be market leaders moving forward.


Lee points to some existing sharing and analysis services. Swivel, IBM’s Many Eyes, Google’s Gapminder, Freebase, CSA’s Illustra…


The business models are shifting. Publishers are now thinking about data sharing services. IBM and RedHat provide an interesting model: Giving the code away but selling services. Repositories will contain not only the full text versions of research papers, but also “gray” literature “such as technical reports and theses,” and real-time streaming data, images and software. We need enhanced interoperability protocols.


E.g., Data.gov provides a searchable data catalog that provides access both to the raw data and through various tools. Lee also likes WorldWideScience.org, “a global science gateway” to international scientific databases. Sixty to seventy countries are pooling their scientific data and providing federated search.


Lee believes that semantic computing will provide fantastic results, although it may take a while. He points to Cameron Neylon’s discussion of the need to generate lab report feeds. (Lee says the Semantic Web is just one of the tools that could be used for semantics-based computing.) So, how do we take advantage of this? Recommender systems, as at Last.fm and Amazon. Connotea and BioMedCentral’s Faculty of 1000 are early examples of this [LATER: Steve Pog's comment below says Faculty of 1000 is not owned by BioMedCentral]. Lee looks forward to the automatic correlation of scientific data and the “smart composition of services and functionality,” in which the computers do the connecting. And we’re going to need the cloud to do this sort of thing, both for the computing power and for the range of services that can be brought to bear on the distributed collection of data.


Lee spends some time talking about the cloud. Among other points, he points to SciVee and Viddler as interesting examples. Also, SmugMug as a photo aggregator that owns none of its own infrastructure. Also Slideshare and Google Docs. But these aren’t quite what researchers need, which is an opportunity. Also interesting: NSF DataNet grants.


When talking about preservation and provenance, Lee cites DuraSpace and its project, DuraCloud. It’s a cross-repository space with services added. Institutions pay for the service.


Lee ends by pointing to John Wilbanks’ concern about the need for a legal and policy infrastructure that enables and encourages sharing. Lee says that at the end of the day, it’s not software, but providing incentives and rewards to get people to participate.


Q: How soon will this happen?
A: We can’t predict which domains will arise and which ones people will take to.


Q: What might bubble up from the consumer sector?
A: It’s an amazing space to watch. There are lots of good examples already.


Q: [me] This is great to have you proselytizing outside. But as an internal advocate inside Microsoft, what does Msft still have to do, and what’s the push back?
A: We’ve built 6-8 add-ins for Word for semantic markup, scholarly writing, consumption of ontologies. A repository platform. An open source foundation separate from Microsoft, contributing to the Linux kernel, etc.

Q: You’d be interested in Dataverse.org.
A: Yes, it sounds like it.


Q: Data is agnostic, but articles aren’t…
A: We’re trying to figure out how to embed and link. But we’re also thinking about how you do it without the old containers, on the Web, in Google Wave, etc.
Q: Are you providing a way to ID relationships?
A: In part. For people using their ordinary tools (e.g., Word), we’re providing ways to import ontologies, share them with the repository or publisher, etc.


Q: How’s auto-tagging coming? The automatic creation of semantically correct output?
A: We’re working on this. A group at Oxford doing cancer research allows researchers to semantically annotate within Excel, so that the spreadsheet points to an ontology that specifies the units, etc. Fluxnet.org is an example of collaborative curation within a single framework.


Q: Things are blurring. Traditionally libraries collect, select and preserve scholarly info. What do you think the role of the library will be?
A: I was an academic librarian. In my opinion, the safe world of collecting library journals has been done. We know how to do it. The problem these days is data curation, providing services, working with publishers.
Q: It still takes a lot of money…
A: Definitely. But the improvements are incremental. The bigger advances come further up the stream.

Q: Some cultures will resist sharing…
A: Yes. It’ll vary from domain to domain, and within domains. In some cases we’ll have to wait a generation.


Q: What skills would you give a young librarian?
A: I don’t have a pat answer for you. But, a service orientation would help, building services on top of the data, for example. Multi-disciplinary partnerships.


Q: You’re putting more info online. Are you seeing the benefit of that?
A: Most researchers already have Microsoft software, so we’re not putting the info up in order to sell more. We’re trying to make sure researchers know what’s there for them.

Tags: everythingIsMiscellaneous, microsoft, open access, publishing, research, science, standards

Date: September 18th, 2009

8 Comments »

September 17, 2009

 

Reuse metadata, don’t reinvent it

Jon Udell has a lovely post talking about an interview with Ian Forrester of the BBC who cites Tom Scott using a phrase from Michael Smethurst: “The simple joy of webscale identifiers.” The point is that if someone has invented an identifier for an object and you want to point to it, use the existing identifier. That enables a namespace conglomerating that keeps information all huddled and cozy, rather than drifting apart on ice floes.

Tags: everythingIsMiscellaneous, metadatas

Date: September 17th, 2009

Be the first to comment »

September 13, 2009

 

From Technorati to WordPress tag namespace

The excessively sharp-eyed of you may have noticed that I have recently switched from listing tags at the end of posts to using WordPress tags at the end of posts. Here’s why. Not that you should care.

When tagging first took off, there weren’t a lot of good places to link your tags to. So, I chose to have them link to Technorati because Technorati was then the leading search engine for blogs. Plus, Technorati had taken the lead in making itself tag-worthy. Plus, Technorati was founded by a friend of mine — David Sifry — who I trusted (and still do trust) to do the Right Thing. Also, I was on the Technorati board of advisers (uncompensated), so I had some basic familiarity with the site and the people. As a result, when you click on one of my old-style tags, it does a search for tags at Technorati and shows you the results. For example, here’s a tag to try: [Tags: taxonomy ].

A couple of years ago, Word Press — the blogging software I use — introduced its own tagging capability. Instead of my having to hand-create links to the tags I want to use (actually, I wrote a little javascript to do it for me), I can enter tags and Word Press will turn them into links that aggregate all of my own postings that I’ve tagged that way. At the bottom of this post, you can try out the taxonomy link.

This is a further step into narcissism, for rather than seeing what the rest of the world has tagged “e-gov” (or whatever), you now see only my posts tagged that way. But I suspect that is probably what most users expect and want when they click on a tag at the bottom of a post. If you want to search all posts by everyone that have a certain tag, Technorati and other sites will do it for you.
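For the curious, the difference between the two tag namespaces comes down to which base URL the tag gets glued onto. Here is a minimal Python sketch of that idea (not the javascript I actually used, and not WordPress's code). The two base URLs and the function names are made up for illustration.

```python
import html
from urllib.parse import quote

# Hypothetical base URLs, used only for illustration.
GLOBAL_TAG_BASE = "https://tags.example.com/tag/"  # stands in for a cross-blog tag search
LOCAL_TAG_BASE = "https://blog.example.com/tag/"   # stands in for this blog's own tag archive


def tag_links(tags, base_url):
    """Return (label, url) pairs, one per tag, within the given tag namespace."""
    return [(tag, base_url + quote(tag.strip().lower())) for tag in tags]


def render_tag_line(tags, base_url):
    """Render a simple HTML 'Tags:' line with one link per tag."""
    links = [
        '<a href="{0}" rel="tag">{1}</a>'.format(html.escape(url, quote=True), html.escape(label))
        for label, url in tag_links(tags, base_url)
    ]
    return "Tags: " + ", ".join(links)


if __name__ == "__main__":
    tags = ["taxonomy", "word press"]
    print(render_tag_line(tags, GLOBAL_TAG_BASE))  # everyone's posts with these tags
    print(render_tag_line(tags, LOCAL_TAG_BASE))   # only this blog's posts
```

The same tag text yields two different links; all that changes is whose collection of posts sits behind the link.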

(By the way, many thanks to Brad Sucks for writing the scripts that extracted my old tags and auto-inserted them as Word Press tags. He says the scripts are too focused to be of general use, so don’t ask. But do buy his music.)

Tags: everythingIsMiscellaneous, tags, taxonomy, technorati, word press

Date: September 13th, 2009

3 Comments »

September 6, 2009

 

Data and metadata: Together again

Terry Jones has an excellent post that lists the problems introduced by maintaining a hard distinction between metadata and data.

Terry cites Everything Is Miscellaneous (thanks, Terry), which argues that the distinction, which is hard-coded in the Age of Databases, becomes a merely functional difference in the Age of Messy Links: Metadata is what you know and data is what you’re looking for. For example, the year of a CD is metadata about the CD if you know the year a Bob Dylan CD came out but you don’t remember the title, and the title can be metadata if you know the title but want to find the year. And in both cases, it could all be metadata in your search for lyrics.
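A rough Python sketch of that functional view, for illustration only: the records below are made-up placeholders (not real discography data), and the find function is hypothetical. Whatever fields you pass in as known play the role of metadata; the field you ask for plays the role of data.

```python
# Made-up records for illustration; artists, titles and years are placeholders.
ALBUMS = [
    {"artist": "Artist A", "title": "First Album", "year": 1965},
    {"artist": "Artist A", "title": "Second Album", "year": 1975},
    {"artist": "Artist B", "title": "Other Album", "year": 1975},
]


def find(records, wanted, **known):
    """Use the known fields (the 'metadata') to look up the wanted field (the 'data')."""
    return [
        record[wanted]
        for record in records
        if all(record.get(key) == value for key, value in known.items())
    ]


if __name__ == "__main__":
    # The year is metadata when the title is what you're after...
    print(find(ALBUMS, "title", artist="Artist A", year=1975))
    # ...and the title is metadata when the year is what you're after.
    print(find(ALBUMS, "year", title="First Album"))
```

Nothing in the records marks a field as data or metadata. The roles are assigned by the question being asked.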

This is all very squishy and messy because the distinction is, as Terry says, artificial. It comes from thinking about experience as content that gets processed, as if we worked the way computers do. More exactly, it comes from thinking about experience as a set of Experience Atoms that then have to be assembled; metadata are the labels that tell you that Atom A goes into Atom Z. But experience is far more like language than like particle physics or Ikea assembly instructions. And that’s for a very good reason: linguistic creatures’ experience cannot be understood apart from language. Language doesn’t neatly separate into content and meta-content. It all comes together and it’s all intertwingled. Language is so very non-atomic that it makes atoms realize how lonely they’ve been.

That doesn’t mean that computer software that separates metadata from data is useless. Lord knows I love a good database. But it also means that computer software that can treat anything as metadata depending on what we’re trying to do opens up some interesting possibilities…

[Tags: everything_is_miscellaneous fluiddb metadata databases language ]

Tags: databases, everythingIsMiscellaneous, everything_is_miscellaneous, fluiddb, knowledge, language, metadata, philosophy

Date: September 6th, 2009

3 Comments »

Evolution of Evolution

Ben Fry posts an amazing visualization of the changes in the six editions of Darwin’s Origin of Species, based on meticulous work done by Dr. John van Wyhe and others. From Ben’s introductory text:

The second edition, for instance, adds a notable “by the Creator” to the closing paragraph, giving greater attribution to a higher power. In another example, the phrase “survival of the fittest” — usually considered central to the theory and often attributed to Darwin — instead came from British philosopher Herbert Spencer, and didn’t appear until the fifth edition of the text.

[Tags: darwin evolution drafts everything_is_miscellaneous transparency ]

Tags: darwin, drafts, everythingIsMiscellaneous, everything_is_miscellaneous, evolution, metadata, science, transparency

Date: September 6th, 2009

1 Comment »

September 4, 2009

 

The price of free law

The latest Radio Berkman episode has me interviewing Steve Schultze about his RECAP project that posts public domain legal records that otherwise you’d have to pay to access. And the federal courts are not all that happy about it.

[Tags: law public_domain pacer recap copyright copyleft everything_is_miscellaneous ]

Tags: copyleft, copyright, digital rights, everythingIsMiscellaneous, everything_is_miscellaneous, law, pacer, public_domain, recap

Date: September 4th, 2009

Be the first to comment »

Google Books metadata meta-wreck

Geoff Nunberg has a fantastic post warning about the poor quality of the metadata attached to the books Google is scanning into its soon to be dominant-to-the-point-of-monopoly digital library. Apparently, the attempt to gather metadata automatically from the scans has resulted in the introduction of legions of errors. But the real problems are, as Geoff points out, that Google seems not to have a plan for dealing with this problem and that it has not opened up the metadata design process.

[Tags: google_books libraries metadata worldcat everything_is_miscellaneous ]

Tags: everythingIsMiscellaneous, everything_is_miscellaneous, google_books, libraries, metadata, worldcat

Date: September 4th, 2009

4 Comments »

September 2, 2009

 

Wikipedia’s bio policy explained

Billy Barnes explains what’s really going on with Wikipedia’s new process for editing the biographies of living people.

What the media reported: In response to vandalism of bios, Wikipedia is not allowing any edits to bios of living people to be posted before they have been reviewed by trusted editors. (Implication: Wikipedia has failed at its mission of completely open, ungoverned editing [which of course isn't Wikipedia's mission].)

What actually is happening: Wikipedia has a two month trial of a “patrolled revisions” system that lets a reviewer (and I’m not sure who is in that class) set a flag on a bio of a living person to indicate that that particular version is vandalism free. According to the Wikipedia page describing this: “Currently, the number of edits to BLPs [biographies of living people] is so large that we don’t have the power to check all of them. This system allows us to monitor changes to BLPs by reducing the number of diffs to check by comparing new edits to previously patrolled revision.”

Does this mean that if you make a change to a living bio, it first has to be marked as approved before it will be posted? Not as far as I can tell: “Patrolling does not affect the revision viewed by unregistered users by default, it’s always the latest one (unless the article is flag protected).” In fact, Jimmy Wales has said (on an email list I’m on) that the aim of this change is to use more efficient patrolling to enable some pages that have been locked to once again be editable by any user. That’s more or less the opposite of what the media coverage said. And, I hasten to add, what slashdot and, um, I said about it. (And I hope I’m getting it right this time…)
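Here is how I picture it, as a small Python sketch. This is a toy model built from the description quoted above, not Wikipedia's actual software; the class, its method names, and the assumption that a flag-protected article shows its last patrolled revision are all mine.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Article:
    revisions: list = field(default_factory=list)  # full text of each revision, oldest first
    patrolled_index: int = -1                      # last revision marked vandalism-free (-1 = none)
    flag_protected: bool = False

    def edit(self, text):
        """Anyone can add a revision; it is stored immediately."""
        self.revisions.append(text)

    def mark_patrolled(self):
        """A reviewer flags the current latest revision as vandalism-free."""
        self.patrolled_index = len(self.revisions) - 1

    def pending_diff(self):
        """One diff from the last patrolled revision to the latest, instead of one per edit."""
        if self.patrolled_index == len(self.revisions) - 1:
            return []
        base = self.revisions[self.patrolled_index] if self.patrolled_index >= 0 else ""
        latest = self.revisions[-1]
        return list(difflib.unified_diff(
            base.splitlines(), latest.splitlines(), "patrolled", "latest", lineterm=""))

    def visible_text(self):
        """Readers get the latest revision unless the article is flag protected (my assumption)."""
        if not self.revisions:
            return ""
        if self.flag_protected and self.patrolled_index >= 0:
            return self.revisions[self.patrolled_index]
        return self.revisions[-1]
```

In this model two edits made after a patrolled revision produce a single diff for the reviewer to check, and readers of an unprotected article still see the newest text right away.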

[Tags: wikipedia ]

Tags: digital culture, everythingIsMiscellaneous, wikipedia

Date: September 2nd, 2009

4 Comments »

August 31, 2009

 

Copyright’s creative disincentive

Tucows is participating in the Canadian copyright consultation process. Rather than submitting a comment written in the usual lawyerly prose, Elliot Noss, Tucows’ CEO, asked me to write up something about copyright in my usual imprecise and incoherent prose. I like Elliot a lot, and I care about copyright, so I wrote about the argument that without strong copyright protection, creators won’t have an incentive to create. The piece is now posted… [The next day: I absolutely should have mentioned that this was a commissioned piece. I.e., Elliot paid me to write something, and posted it unaltered.]

[Tags: copyleft copyright culture canada everything_is_miscellaneous ]

Tags: canada, copyleft, copyright, culture, digital rights, everythingIsMiscellaneous, everything_is_miscellaneous, policy

Date: August 31st, 2009

13 Comments »

August 26, 2009

 

Encyclopedia of Life – Now by Humans!

The Encyclopedia of Life is encouraging citizen contributions to its experts-vetted pages, so far with what seem like excellent results. There’s a good article about this at Science Daily. After two years, they’ve got 150,000 species pages underway, with 1.4 million stubs awaiting drafting.

[Tags: crowdsourcing everything_is_miscellaneous science biology taxonomy ]

Tags: biology, crowdsourcing, everythingIsMiscellaneous, everything_is_miscellaneous, science, taxonomy

Date: August 26th, 2009

4 Comments »

August 25, 2009

 

Wikipedia’s tactical change mistaken for strategic

At the English language version of Wikipedia now, changes to articles about living people won’t be posted until a Wikipedian has reviewed them. Those articles are now moderated. (See Slashdot for details and discussion.)

I am surprised by the media being surprised by this. Wikipedia has a complex set of rules, processes, and roles in place in order to help it achieve its goal of becoming a great encyclopedia. (See Andrew Lih’s The Wikipedia Revolution and How Wikipedia Works by Phoebe Ayers, Charles Matthews, and Ben Yates for book-length explanations.) This new change, which seems to me to be a reasonable approach worth a try, is just one more process, not a signal that Wikipedia has failed in its original intent to be completely open and democratic. In effect, edits to this class of articles are simply being reviewed before being posted rather than after.

The new policy is only surprising if you insist on thinking that Wikipedia has failed if it isn’t completely open and free. No, Wikipedia fails if it doesn’t become a great encyclopedia. In my view, Wikipedia has in many of the most important ways succeeded already.

PS: If you think I’ve gotten this wrong, please please let me know, in the comments or at selfevident.com, since I’ll be on KCBS at 2:20pm EDT to be interviewed about this for four minutes.

[Tags: wikipedia ]

Tags: digital culture, everythingIsMiscellaneous, knowledge, wikipedia

Date: August 25th, 2009

5 Comments »

August 19, 2009

 

Dilbert goes miscellaneous

Amusing Dilbert today, for those who can’t resist a good taxonomy joke. (Thanks for the tip, Helena!)

[Tags: everything_is_miscellaneous comics dilbert humor taxonomy ]

Tags: comics, dilbert, everythingIsMiscellaneous, everything_is_miscellaneous, humor, taxonomy

Date: August 19th, 2009

1 Comment »

August 18, 2009

 

RecapTheLaw.org

RecapTheLaw.org has a Firefox extension that both gives access to public docket records and makes them actually publicly accessible. The courts charge for access to these dockets, including every time you search and for every page of search results. The system is called PACER. RECAP gives you access to PACER (and is PACER spelled backwards). When you use RECAP to view a docket through PACER, RECAP uploads it into the Internet Archive, since the docket info is in the public domain even though the courts charge you for accessing it. The next time someone goes through RECAP to find that docket, she’ll get it for free from the Internet Archive. RECAP also adds helpful headers and other metadata.
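The flow described above (check the free archive first, pay only on a miss, then share the result) can be sketched in a few lines. This is only an illustration of the idea, not RECAP’s actual code: the `DocketCache` class, the in-memory `archive` dictionary standing in for the Internet Archive, and the `fetch_paid` callable standing in for a paid PACER lookup are all hypothetical names made up for this example.

```python
from typing import Callable, Dict, Tuple


class DocketCache:
    """Toy model of the flow: free archive first, paid source on a miss."""

    def __init__(self, fetch_paid: Callable[[str], str]) -> None:
        # Hypothetical stand-in for the Internet Archive.
        self.archive: Dict[str, str] = {}
        # Hypothetical stand-in for a paid PACER lookup.
        self.fetch_paid = fetch_paid
        self.paid_lookups = 0

    def get(self, docket_id: str) -> Tuple[str, str]:
        """Return (document, source), where source is 'archive' or 'paid'."""
        if docket_id in self.archive:
            # Someone already paid for this docket, so it is free now.
            return self.archive[docket_id], "archive"
        # Miss: pay once, then share the copy for everyone who comes later.
        document = self.fetch_paid(docket_id)
        self.paid_lookups += 1
        self.archive[docket_id] = document
        return document, "paid"


cache = DocketCache(lambda docket_id: f"docket text for {docket_id}")
first = cache.get("case-123")   # first reader goes through the paid source
second = cache.get("case-123")  # next reader is served from the archive
```

In this toy version only the first request for a docket counts as a paid lookup; every later request for the same docket is answered from the shared copy.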

RecapTheLaw comes out of the Princeton Center for Information Technology Policy. Well done!

[Tags: law courts dockets ]

Tags: courts, digital rights, dockets, egov, everythingIsMiscellaneous, expertise, law, metadata

Date: August 18th, 2009

2 Comments »

August 14, 2009

 

Search Pidgin

I know I’m not the only one who’s finding WolframAlpha sometimes frustrating because I can’t figure out the magic words to use to invoke the genie. To give just one example, I can’t figure out how to see the frequency of the surnames Kumar and Weinberger compared side-by-side in WolframAlpha’s signature fashion. It’s a small thing because “surname Kumar” and “surname Weinberger” will get you info about each individually. But over and over, I fail to guess the way WolframAlpha wants me to phrase the question.

Search engines are easier because they have already trained us how to talk to them. We know that we generally get the same results whether or not we use stop words (“when,” “the,” etc.) and question marks. We eventually learn that quoting a phrase searches for exactly that phrase. We may even learn that in many engines, putting a dash in front of a word excludes pages containing it from the results, or that we can do marvelous and magical things with prefixes that end in a colon, such as site: and define:. We also learn the semantics of searching: If you want to find out the name of that guy who’s Ishmael’s friend in Moby-Dick, you’ll do best to include some words likely to be on the same page, so “What was the name of that guy in Moby-Dick who was the hero’s friend?” is way worse than “Moby-Dick harpoonist.” I have no idea what the curve of query sophistication looks like, but most of us have been trained to one degree or another by the search engines who are our masters and our betters.
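To make those conventions concrete, here is a minimal sketch of a parser for them. It is an illustration only and does not reflect how any real search engine handles queries: the `parse_query` function, the `ParsedQuery` fields, and the tiny `STOP_WORDS` set are all assumptions invented for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative stop-word list only; real engines have their own rules.
STOP_WORDS = {"a", "an", "the", "when", "what", "was", "of", "in"}

# Either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


@dataclass
class ParsedQuery:
    phrases: List[str] = field(default_factory=list)         # "exact phrase"
    terms: List[str] = field(default_factory=list)           # plain words
    excluded: List[str] = field(default_factory=list)        # -word
    operators: Dict[str, str] = field(default_factory=dict)  # site:, define:


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    for phrase, word in TOKEN.findall(query):
        if phrase:
            parsed.phrases.append(phrase)
            continue
        word = word.strip("?")  # question marks carry no weight here
        if not word:
            continue
        if word.startswith("-") and len(word) > 1:
            parsed.excluded.append(word[1:])
        elif ":" in word and not word.startswith(":") and not word.endswith(":"):
            key, value = word.split(":", 1)
            parsed.operators[key] = value
        elif word.lower() not in STOP_WORDS:
            parsed.terms.append(word)
    return parsed


q = parse_query('"white whale" the harpoonist -movie site:example.org ?')
```

A hyphen inside a word, as in Moby-Dick, is left alone; only a leading dash marks an exclusion.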

In short, we’re being taught a pidgin language, a simplified language for communicating across cultures. In this case, the two cultures are humans and computers. I only wish the pidgin were more uniform and useful. Google has enough dominance in the market that its syntax influences other search engines. Good! But we could use some help taking the next step, formulating more complex natural language queries in a pidgin that crosses application boundaries, and that isn’t designed for standard database queries.

Or does this already exist?

[Tags: search pidgin nlp natural_language_processing google everything_is_miscellaneous ]

Tags: everythingIsMiscellaneous, everything_is_miscellaneous, google, metadata, natural_language_processing, nlp, pidgin, search

Date: August 14th, 2009

3 Comments »

August 11, 2009

 

The universality of names

There’s a terrific article by Carol Kaesuk Yoon in the NY Times about research that shows that humans around the world tend to cluster the natural world in highly similar ways, even using similar-ish names.

[Tags: everything_is_miscellaneous taxonomy ]

Tags: everythingIsMiscellaneous, everything_is_miscellaneous, folksonomy, taxonomy

Date: August 11th, 2009

1 Comment »

August 9, 2009

 

Twitterelevancy

With its new Fresh view, Delicious builds on the TweetNews idea of using links in Tweets (and other measures) as a way to find what’s newest and most interesting. As the blog post about it says:

Underneath the hood, Fresh factors several features into the ranking like related bookmark and tweet counts, “eats our own dogfood”  by leveraging BOSS to filter for high quality results, as well as stitches tweets to related articles even if the tweets do not provide matching URLs (as ~81% of tweets do not contain URLs). Try clicking the ‘x Related Tweets’ link for any given story to see the Twitter conversation appear instantly inline.

It’s a welcome reslicing, not a whole new beast, but it seems useful.

[Tags: delivious everything_is_miscellaneous twitter news ]

Tags: delivious, everythingIsMiscellaneous, everything_is_miscellaneous, metadata, news, social networks, tagging, twitter

Date: August 9th, 2009

1 Comment »


Creative Commons License
Joho the Blog by David Weinberger is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. Share it freely, but attribute it to me, and don't use it commercially without my permission.
