November 18, 2007
Future of books
Aargh. Steven Levy's excellent article on the new Amazon e-reading device came out a day before I was about to send out the new issue of my newsletter, the main article of which is about the future of books. I hate when that happens!
Well, I'll send it out anyway, and will link to it here tomorrow. Damn the pace of human events!
Posted by self at 02:09 PM | Comments (4) | TrackBack
November 14, 2007
Crowd cover
Jay Rosen has another initiative launching today: Enabling a dozen beat reporters to have a social network composed of people who know the topic and have an interest in having the coverage be thorough, accurate, and deep. Very cool experiment.
Posted by self at 11:30 AM | Comments (0) | TrackBack
November 12, 2007
Webbifying Dewey
The estimable Lorcan Dempsey of the OCLC points to a presentation by Michael Panzer (also of the OCLC) about how to "webbify" the Dewey Decimal System.
The question Michael addresses is how to take the Dewey Decimal Classification system to the "networked level," defined as "Infrastructural improvements to make a KOS [Knowledge Organization System] web-scale accessible, to make sharing, syndicating, leveraging of its data feasible." He begins by scoping the problem. He then talks about the issues in webbifying the DDC, which he boils down to three: URI design, caption design, and format considerations.
He proposes a scheme for URIs (which, in the condensed form of a PowerPoint presentation, I don't fully understand, and which are probably beyond me even if spelled out), with examples such as http://dewey.info/concept/338.4/en/edn/22/. Notice the DDC number after the "concept" designation.
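To make that example a little more concrete, here's a rough Python sketch that pulls the pieces out of such a URI. I'm reading the layout off the single example above (concept, then the DDC number, a language code, "edn", and an edition number), so the field names are my own guesses, not Michael's actual scheme.

```python
from urllib.parse import urlparse


def parse_dewey_uri(uri):
    """Split a dewey.info-style concept URI into labeled parts.

    Assumes the layout of the one example in the post:
    /concept/<DDC number>/<language>/edn/<edition>/
    The labels are guesses, not the official scheme.
    """
    # Drop the empty strings produced by leading/trailing slashes.
    parts = [p for p in urlparse(uri).path.split("/") if p]
    if len(parts) != 5 or parts[0] != "concept" or parts[3] != "edn":
        raise ValueError(f"unexpected URI layout: {uri}")
    return {
        "notation": parts[1],      # the DDC number, kept as a string
        "language": parts[2],      # language of the caption
        "edition": int(parts[4]),  # DDC edition
    }


print(parse_dewey_uri("http://dewey.info/concept/338.4/en/edn/22/"))
```

Keeping the DDC number as a string rather than a float preserves it exactly as written in the URI.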
Captions, he acknowledges, depend on context, and with Web services (Michael points out), one cannot always know the context in which one's captions are going to be used. He also discusses the importance of maintaining the hierarchy, but the bullet points are too compressed for me to follow. (Not a criticism. The PowerPoint deck wasn't intended to be self-standing, and I don't know enough to be able to fill in all the missing context.)
To the third point, he looks at adopting either the MARC 21 or (and?) SKOS formats.
As Lorcan says, "This is part of an ongoing investigation of what it means to release more of the value of 'classic large-scale vocabularies' in a web environment." There's lots of info packed into Dewey's system. How can we best liberate that info?
Posted by self at 06:36 PM | Comments (3) | TrackBack
November 10, 2007
Dave Snowden: From fragments to sense
Terrific post by Stu Henshall about what sounds like a fantastic talk by Dave Snowden (whose blog is here) at KMWorld. Dave combines the broad and deep with the incisive and the practical. Yikes! (Don't miss the four posts from Dave that Stu points to as "must reads.")
Posted by self at 07:36 AM | Comments (2) | TrackBack
November 05, 2007
Open access to Journal of Neglected Tropical Diseases
Public Library of Science has started yet another open access journal. This one, appropriately enough, is the PLoS Journal of Neglected Tropical Diseases. PLoS is a peer-reviewed journal that limits what it publishes to what it considers to be the best and most important articles. According to A Blog around the Clock, written by the online community manager at PLoSOne, the inaugural issue is fully international, and the site is now using TOPAZ software that enables comments, annotations, ratings and trackbacks. It will also take an interdisciplinary approach because, as WHO's director general Margaret Chan writes in a guest commentary:
Although these diseases have been overshadowed by better-known conditions, especially the "big three"--HIV/AIDS, malaria, and tuberculosis--evidence collected in the past few years has revealed some astonishing facts about the NTDs. They are among the most common infections of the poor--an estimated 1.1 billion of the world's 2.7 billion people living on less than US$2 per day are infected with one or more NTDs. When we combine the global disease burden of the most prevalent NTDs, the disability they cause rivals that of any of the big three. Moreover, the NTDs exert an equally important adverse impact on child development and education, worker productivity, and ultimately economic development. Chronic hookworm infection in childhood dramatically reduces future wage-earning capacity, and lymphatic filariasis erodes a significant component of India's gross national product. The NTDs may also exacerbate and promote susceptibility to HIV/AIDS and malaria.
PLoS is trying to be a high-quality, recognized journal, and there's value in that. It therefore limits what it publishes to what passes peer review and is deemed important. PLoS One, on the other hand, publishes anything that passes its peer review process even if the topic is relatively minor. I wonder: Do all articles that pass PLoS' peer review but that don't make it into PLoS get sent over to an appropriate PLoS One journal, if there is one, and if the authors agree?
Anyway, neglected tropical diseases is a perfect topic for an open access journal. But, then, I sort of think everything is.
Posted by self at 07:19 AM | Comments (1) | TrackBack
November 04, 2007
What's unspoken between us
I'm giving the opening talk at Defrag tomorrow, and for some reason I insist on talking about the implicit. I keep coming back to this topic, and I still don't get it right. Here are the notes for my talk; they accompany a deck, which might explain their sketchiness. You may notice bits I've talked about before, but much of this is new...and at least this audience isn't going to have to watch my "Everything Is Miscellaneous" talk again.
Here goes:
At Defrag we’re talking about how we can put the pieces back together. The pieces aren’t broken because the original order is there. But now we can ALSO arrange them the way we want.
I want to talk about the role of the implicit, because as we put pieces together, the way we do it is more in service of what isn’t said -- it’s more mysterious than we sometimes think, and we should be humble about our ability to piece ourselves together.
I’ve decided to call it the unspoken because the implicit is about what we don’t see or don’t know, whereas the unspoken says that what isn’t there has to do with language and meaning.
This talk is divided into five moments of the unsaid:
#1
[I'll read the following poem:]
Blue Hydrangea
Like the green that cakes in a pot of paint,
these leaves are dry, dull and rough
behind this billow of blooms whose blue
is not their own but reflected from far away
in a mirror dimmed by tears and vague,
as if it wished them to disappear again
the way, in old blue writing paper,
yellow shows, then violet and gray;
a washed-out color as in children's clothes
which, no longer worn, no more can happen to:
how much it makes you feel a small life's brevity.
But suddenly the blue shines quite renewed
within one cluster, and we can see
a touching blue rejoice before the green.
Rainer Maria Rilke
William H. Gass, trans.
Look at how much isn’t said in that line. We wash clothes, and they become more our own as they lose their color. That’s something we know implicitly. We know that clothes need washing.
The next line makes explicit that Rilke is thinking of clothing folded and put away for a child who has grown. Rilke is giving us increasing degrees of explicitness. The poet has to get this right.
But, computers are explicit. At the hex level, the poem is unambiguous and explicit
Even more explicit at the bit level. Anything left unsaid is simply undone when it comes to bits.
Computers began as engines of the explicit.
In the 1950s, they were the symbol of reducing life to data, and thus were symbols of conformity - we had to conform ourselves to their needs.
There was truth to the old Hollywood view. We all know that computers have reduced us. We look like this, but to the database we look like this.
We have allowed ourselves to be informationalized - thoroughly reconceived in terms of information
Information has even somehow been added to the basic mix of how we understand ourselves, as if we had a flesh and blood organ that processes information.
But, the Web is different from fifties computers. The Web links one page to another, but does so through language...the language of the anchor text as well as the words around it that contextualize it.
Hyperlinks are the opposite of information. They enrich, rather than reduce. Open-ended, decentralized, messy… all the things databases of info are not. Most of all, they are social...
...They are done for someone by someone. Linking is a type of writing. We link for some anticipated set of readers.
So, the Web works against the regime of informationalization.
Rashi said [I can't find the reference] about dogs that contact with humans ensouls them. That’s what we’re doing with computers, in a way.
Which is so different from where we thought computers were going in the Fifties. In fact, we thought that because computers were engines of informationalization, when they became human, as with HAL in “2001,” they’d be demonic precisely because they grew up alone, in a world of mere information.
#2
I can’t tell you everything about my children. If I could, something would be wrong with our relationship.
If everything about a character can be expressed by saying she’s the dumb blond or the wisecracking sidekick, the character has failed. So, I can’t tell you everything about my children. But here’s what our relationship looks like to Facebook, when my son friended me. [The form with the categories of relationships]
This is a poor beginning. But it’s just the beginning.
We quickly ensoul Facebook by what’s said, and by what isn’t said, just as with all human relationships.
Judith Donath talks about this in terms of signaling...
...which we could also think of as gesturing. The value often isn’t in what’s said, but in what isn’t said ... the gesture, unintended or intended (Tommie Smith, 1968). It is hard to exhaust the meaning of such a gesture. It is hard to say what it gestures to.
#3
In an informationalized age, we think we are always giving off information. We used to see a street ...
… as a flow and eddies of publicness and privacy -- unfathomably rich with the implicit. That’s why we can sit at a sidewalk cafe and watch the river.
But now we think it’s all information, and all information is alike. The surveillance cameras can’t tell the interesting bits from the uninteresting. It’s all explicit. That’s why we’re ok with 500,000 surveillance cameras in London. The private has gone from what is kept off the record to, now that everything is on the record, what we’re allowed to pay attention to on the record. We may trust our government to see the right statistical correlations, but we can see beyond the statistics. We know there's more there. But why?
#4
We understand things through their potential. We simply don’t understand what an acorn is if we don’t see that it’s a potential oak tree, even though statistically, most acorns will rot in the ground.
Compare that to "If you can dream it, you can be it," which claims that everything is possible. There’s got to be a better way to give our children hope than to lie to them.
Compare this to Rilke's lines about the child, in which we grieve the loss of potential, even when the potential is actualized, as when children grow up.
That’s not to say we’re good at understanding potential itself. For example, both sides in the abortion debate are prone to get this wrong. The pro-choice people have been known to refer to an embryo as a mere lump of flesh, as a growth. The anti-choice folks confuse the potential of the fetus with its actuality, thinking of abortion as the murder of a person. Both are wrong. The fetus is a potential person, although that doesn’t help you resolve the debate, because we don’t know what rights are owed to lumps of flesh that can grow into personhood.
We can informationalize potential and make statistical guesses, which may be quite accurate.
We can even teach a computer about potential. Doug Lenat’s CYC is trying to teach a computer all that we know without having to speak it: that clothes have to be washed, and that washed clothes sometimes lose their color. It’s quite difficult to utter everything you know; CYC has used teams of philosophy PhDs for well over a decade. Yet even if CYC passes the Turing test about children’s clothing, we know something is missing. What?
Potential is lumpy. The world shows itself to us in those lumps. What turns the statistical homogeneity of possibility into the curds of potential?
#5
Rilke shows us something about old blue writing paper, and leaves most of it unsaid: That there is connection to hydrangea and to childhood. That the decomposition of time can reveal what was there but hidden. That the natural world and the world of art are not separate. But there is a world of possible connections Rilke could make. He chooses to make some of them apparent. He lets the world show in terms of what matters. Mattering makes possibility lumpy. The fact that we care about the world creates the lumps of potential. That’s the difference between us and CYC. It’s not simply that we care and CYC doesn’t. It’s that our caring creates a shared unspoken that is the source of meaning and value. We have divided the world into lumps because it matters, because we care.
It is ultimately language that is the unspoken between us. Language is driven by what matters to us. We have words, sentences, paragraphs, punctuation.... That’s the shared lumpiness of the unsaid. And now we have links. Links that have presence and persistence.
Our brains discriminate edges, but we also are fascinated by the transcendence of edges. The value is in the complex, the loose-edged, the potential, the unspoken, because that is what we share and how the world matters to us.
Defrag -- our generational project, not just this conference -- isn’t about reassembling pieces. It’s not about clarity and simplicity. It’s about how we are finding ways to let the world matter to us together. For that we need to enable, cherish, and protect the unspoken between us.
Posted by self at 07:34 PM | Comments (12) | TrackBack
November 03, 2007
OpenSocial explained
Marc Andreessen has a terrific post explaining the nature and effect of the OpenSocial standard promulgated by Google and a gaggle of social networking sites ... an EBFB (everyone but facebook) coalition. The API allows access to profiles, networks and apps, so that a developer can write an application that will run within any social networking site that supports OpenSocial.
Thus, the walled garden approach will at least allow us to move among the walled gardens. [Tags: opensocial social_networking_sites marc_andreessen google facebook everything_is_miscellaneous]
Posted by self at 01:16 AM | Comments (4) | TrackBack
October 29, 2007
Blogclouds
While chatting with Chris Heuer today (one of those f2f chats like they used to have when our homes were lit with whale oil), we had an idea. We were talking about blogrolls and Grazr. I was complaining (about myself, to myself) that I hardly ever update my blogroll, and when I do, I find it to be a psycho-socially fraught activity. On the other hand, the blog-writing software I use logs every link I make, along with the date and the anchor text. So, imagine if you will a blogroll that consists of the places I link to most often, the places I've linked to most recently, the anchor text phrases I've linked to most often, or (if I started logging more data about each link) the places I've linked to sorted by the tags I've applied to the posts the links are in. And a lot more, too. Imagine aggregating all this socially.
Why not? It's just technology. [Tags: blogroll blogs chris_heuer everything_is_miscellaneous]
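Here's a rough Python sketch of what such a blogcloud might compute. The link-log format (URL, date, anchor text) is my assumption about what blog software could export, and the sites and anchors are made-up placeholders. Each function is one of the blogroll orderings described above, and the last one sums several bloggers' logs to get the social version.

```python
from collections import Counter
from datetime import date
from urllib.parse import urlparse

# Hypothetical link log: (target URL, date linked, anchor text).
# The format is assumed; real blog software would store its own.
LINK_LOG = [
    ("http://one.example/post-1", date(2007, 10, 1), "citizen journalism"),
    ("http://one.example/post-2", date(2007, 10, 20), "citizen journalism"),
    ("http://two.example/", date(2007, 10, 23), "open library"),
    ("http://three.example/", date(2007, 10, 29), "outliner"),
]


def most_linked(log, n=10):
    """Blogroll by frequency: the sites I link to most often."""
    return Counter(urlparse(url).netloc for url, _, _ in log).most_common(n)


def most_recent(log, n=10):
    """Blogroll by recency: sites ordered by my latest link to them."""
    latest = {}
    for url, when, _ in log:
        host = urlparse(url).netloc
        latest[host] = max(when, latest.get(host, when))
    return sorted(latest, key=latest.get, reverse=True)[:n]


def top_anchors(log, n=10):
    """The anchor-text phrases I use most often."""
    return Counter(anchor for _, _, anchor in log).most_common(n)


def social_blogcloud(logs, n=10):
    """Aggregate many bloggers' logs into one shared frequency count."""
    total = Counter()
    for log in logs:
        total.update(urlparse(url).netloc for url, _, _ in log)
    return total.most_common(n)


print(most_linked(LINK_LOG, 2))
print(most_recent(LINK_LOG))
```

Nothing here requires anyone to decide who belongs on the blogroll; the list falls out of what I actually link to.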
Posted by self at 08:16 PM | Comments (3) | TrackBack
October 28, 2007
Is the Web as weak as its weakest link?
Donnacha DeLong argues that "Web 2.0 is rubbish" in an article in The Journalist, the National Union of Journalists' magazine. The article argues against wiping out traditional media and replacing it with citizen journalism, which is not a position a lot of people hold. He concludes:
There are those who claim that Web 2.0 democratises the media. It would make everyone equal, yes, but should they be? It’s like saying anyone can play for Manchester United. In one of the main examples given to explain Web 2.0, Wikipedia replaces Britannica Online. Is that the kind of democracy we want – where anyone can determine the information that the public can access, regardless of their level of knowledge, expertise or agenda?
Oh sigh. This commits two fallacies.
First, it equivocates on "equal." No one argues that all blog posts and all bloggers are of equal value. That's why we have blogrolls. Hell, that's why we have links. But, we all (well, all with economic means, physical access, etc.) have an equal ability to post. Equal access to post != equal value of posts.
Second, Donnacha ignores the social dynamics, as if Wikipedia (for example) were nothing but a series of posts by random individuals. In fact, Wikipedia results from a complex social dynamic and set of processes designed to move articles towards encyclopedic goodness. We can argue about whether those processes work and whether Wikipedia is reliable, and so forth, but Donnacha ignores those processes altogether. In fact, the processes are designed to keep all entries from being treated as equal.
Donnacha acts as if the Web were as weak as its weakest link because we can't tell the difference between weak and strong links. In fact, the Web at its best is stronger than its strongest links, because those links get tempered through the exposure to multiple points of view. Of course the Web isn't always at its best, and Donnacha is right to remind us of that. But perhaps this is Donnacha's third fallacy: Citizen journalism is not "everybody writes what they want and we have to read it all as if it were all of equal value," just as Wikipedia isn't just a big blank scratch pad with publicly available pencils. Citizen journalism is founded on the idea that while many people can contribute, we need ways to surface what is of value. Everyone working in the field of citizen journalism understands Donnacha's objection. Donnacha's complaint isn't a criticism of citizen journalism. It is citizen journalism's starting point.
The fact that Donnacha's credit at the end of the article reports that "He represents new media journalists on the union’s National Executive Council" is a bit scary. Indeed, veteran journalist Roy Greenslade resigned from the National Union of Journalists because of its attitude toward new media. Laura Oliver has an article about Roy's resignation here. (Thanks to Richard Sambrook for the link.)
Posted by self at 12:04 PM | Comments (2) | TrackBack
October 25, 2007
Finding what we can't spell
I went to look up Plumpy'nut - a peanut-butter based nutritional compound important where people are starving - at Wikipedia, but I thought it was spelled "plumpinut." The misspelling leads you to a Wikipedia dead end page. I thought about creating a "plumpinut" entry that does nothing but point to the Plumpy'nut entry. But somehow I don't think Wikipedia wants to have entries that are misspellings.
Anyone know the right way to help people find the entries of terms they don't know how to spell?
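One general technique (not necessarily what Wikipedia does) is fuzzy matching: compare the misspelled query against known titles and suggest the close ones. Here's a minimal Python sketch using the standard library's difflib. The title list and the 0.8 similarity cutoff are assumptions for illustration; punctuation and case are stripped first so that "Plumpy'nut" and "plumpinut" are compared letter to letter.

```python
import difflib
import re

# Hypothetical article titles; a real encyclopedia would have millions.
TITLES = ["Plumpy'nut", "Peanut butter", "Plum", "Pumpkin"]


def normalize(text):
    """Lowercase and strip everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def suggest(query, titles=TITLES, cutoff=0.8):
    """Return titles whose normalized spelling is close to the query's."""
    keyed = {normalize(t): t for t in titles}
    close = difflib.get_close_matches(normalize(query), list(keyed), n=3, cutoff=cutoff)
    return [keyed[k] for k in close]


print(suggest("plumpinut"))
```

The cutoff is a judgment call: set it too low and "Pumpkin" starts showing up as a suggestion; set it too high and real misspellings slip through.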
Plumpy'nut, which essentially is a recipe, is patented. If you want to mix up peanuts 'n' stuff, you may be violating the law. Christine Gorman, at the Harvard Nieman Center, has a blog on the patenting of peanut butter + mixins.
Posted by self at 02:01 PM | Comments (5) | TrackBack
October 23, 2007
Berkman lunch: Aaron Swartz on Open Library
Aaron Swartz is giving a Berkman talk on the Open Library project. [As always, I'm typing quickly, missing stuff, getting things wrong. You can hear the whole thing at MediaBerkman.]
The basic idea is to give each book a Web page that collects all the information about that book. Books have never had "a first class place on the web." They've been distributed across publishers' Web sites, etc.
The book pages are a "structured wiki." Wikipedia lacks the structure required to let computers access it. So, each OL wiki page has separate fields for all of the book's metadata. E.g., click on the author's name and you get a list of all the books the author has written.
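To show why separate fields matter, here's a minimal Python sketch of a "structured" book record and the author index that makes the click-an-author feature trivial. The field names are my own illustration, not Open Library's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BookRecord:
    """A book page whose metadata lives in separate fields, not free text.

    The field names are illustrative; Open Library's real schema may differ.
    """
    title: str
    authors: List[str]
    subjects: List[str] = field(default_factory=list)


def books_by_author(records: List[BookRecord]) -> Dict[str, List[str]]:
    """Build the index behind 'click an author, see all of their books'."""
    index: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        for author in rec.authors:
            index[author].append(rec.title)
    return index


records = [
    BookRecord("Hamlet", ["William Shakespeare"]),
    BookRecord("Macbeth", ["William Shakespeare"]),
    BookRecord("Everything Is Miscellaneous", ["David Weinberger"]),
]
print(books_by_author(records)["William Shakespeare"])
```

Because the author is a field rather than a phrase buried in prose, a program can answer the question without having to parse any sentences.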
It has to be really open, Aaron says. "This is something that has to be a collaboration among a lot of different people." They've brought in publishers, reviews, authors, etc. It's all available for free, for download or reuse. Anyone can use it.
When books are out of copyright, the OL brings in the full text, when available. But that raises issues about how people want to read books online, he says.
OL also wants to be able to point people to libraries that have copies of books. There are "Buy, borrow or download" options for every book (when possible).
Readers can review books on the site.
The first thing librarians argued about when they saw OL was what subject classification system to use. "We don't have to choose on the Internet. We can store all the category systems and let people choose which ones they want." Likewise with all the different identifiers, e.g., ISBN, OCLC numbers, OL identifiers. ("We have to make our own identifier system because we're going to have more books.")
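Here's a minimal Python sketch of the "don't choose, store them all" approach: a record keeps every identifier and classification scheme side by side, and the reader picks which scheme to view through. The layout, scheme names, title, and placeholder values are all hypothetical.

```python
from typing import Dict, List


def make_record(title: str) -> Dict:
    """Hypothetical record layout: every scheme is kept side by side."""
    return {"title": title, "identifiers": {}, "classifications": {}}


def add_identifier(rec: Dict, scheme: str, value: str) -> None:
    """Store an identifier under its scheme name (ISBN, OCLC number, OL ID...)."""
    rec["identifiers"][scheme] = value


def add_classification(rec: Dict, scheme: str, value: str) -> None:
    """Append a category under a named scheme; any number of schemes may coexist."""
    rec["classifications"].setdefault(scheme, []).append(value)


def view(records: List[Dict], scheme: str) -> Dict[str, List[str]]:
    """Show each record through whichever scheme the reader picks."""
    return {r["title"]: r["classifications"].get(scheme, []) for r in records}


book = make_record("Example Book")  # hypothetical title
add_identifier(book, "isbn", "ISBN-PLACEHOLDER")
add_identifier(book, "ol", "OL-PLACEHOLDER")
add_classification(book, "library_subject", "Cataloging")  # placeholder
add_classification(book, "user_tags", "libraries")
add_classification(book, "user_tags", "metadata")

print(view([book], "user_tags"))
print(view([book], "ddc"))  # a scheme nobody has filled in yet
```

Adding a new scheme costs nothing, which is the point: no one has to win the argument about which classification is the real one.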
FRBRization (FRBR is pronounced "ferber") means connecting physical books to all the different abstractions, e.g., print runs, editions, translations, etc. The library world has focused primarily on the physical books on the shelves. "We're going to have to come up with new ways of expressing the relationships," including allowing people to create new relationships, e.g., this book is based on that one, this book refutes that one, this one replaces that one.
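A rough Python sketch of the idea, loosely following FRBR's layering: an abstract work holds its expressions (texts and translations), each with its editions, while book-to-book relationships are open-ended triples whose relation names anyone can coin. I've collapsed FRBR's manifestation and item levels into plain edition labels, and the names here are hypothetical, not OL's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Expression:
    """One realization of a work, e.g. the English text or a translation."""
    language: str
    editions: List[str] = field(default_factory=list)  # placeholder edition labels


@dataclass
class Work:
    """The abstract work ("Hamlet"), above all its texts and editions."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


# Open-ended relationships as (from_title, relation, to_title) triples.
# Relation names are free strings, so users can coin new ones.
RELATIONS: List[Tuple[str, str, str]] = []


def relate(source: str, relation: str, target: str) -> None:
    RELATIONS.append((source, relation, target))


def related_to(target: str, relation: str) -> List[str]:
    """Titles that stand in `relation` to `target`."""
    return [s for s, r, t in RELATIONS if r == relation and t == target]


hamlet = Work("Hamlet")
hamlet.expressions.append(Expression("en", ["edition A", "edition B"]))
hamlet.expressions.append(Expression("de", ["edition C"]))

relate("Rosencrantz and Guildenstern Are Dead", "based_on", "Hamlet")
print(related_to("Hamlet", "based_on"))
```

The structured part (work, expression, edition) stays fixed, while the relationship vocabulary is left open. That is one way to split the structured-versus-unstructured question raised later in the Q&A.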
They'd like to be able to do print on demand, and mail you a physical copy. Also scan on demand: You pay some money and someone goes and scans it.
Amazon is doing something similar to OL. But Amazon is trying to sell you stuff and doesn't have good info about books that are out of print. Google Books has very few community features. And there's WorldCat from OCLC, but their business model depends on selling information. OL wants to be a public group available to everyone.
Q: English language only?
A: Right now we're English only but internationalization is a huge part of this. We want to get summaries in multiple languages as well as
Q: (terry martin - law school librarian) Journals?
A: Serials are the next task after this. Serials are more complex. They're in vast sets over long periods of time.
Q: (wendy) Fuzzy connections? Is West Side Story an adaptation of Romeo and Juliet?
A: Library systems are generally binary. We have lots of ways of connecting books but we haven't really done anything fuzzy.
Q: User-generated categories?
A: Sure. Tagging.
Q: (jpalfrey) We'd love to hear what you say about how a huge library, such as Harvard Law School Library could contribute...
Aaron now talks about the current status of the project. The software is working well, he says. They worried about it because it combines a database and a wiki in a new way. They have about 10 million catalog records, including 6M from the Library of Congress and 5M from U of NC. They have about 400,000 full text copies, mainly from the Internet Archive. Publishers have been good about providing info. They're looking for collections of reviews. Publishing on-demand works well; they have machines that print and assemble books in about 5 mins. They're going to repopulate the New Orleans public library with the 400,000 books the OL has. OL wants more data. Also, they need more programmers. "If you love books, we'd love your help soon curating and annotating them."
Q: (sj klein) Interlibrary loan for books in copyright?
A: We want to do digital interlibrary loans. We scan a copy and send you the pdf. Some publishers seem ok with it. Some are going to go ahead with it, with us as their partner, for books you can't get in a bookstore but not yet out of copyright.
Q: (gene koo) The publishers are ok with it but the non-profit book association has problems with it?
A: For publishers, it's another way of promoting their books. They have ONIX feeds in XML that promote their books. Libraries have been much more difficult, primarily because of the complicated bureaucracy and concerns about legal issues. It's been a long hard slog to persuade them to give us their records. Can any librarians here give us advice?
Q: International?
A: We're working on several countries. We know people in India. We're looking all the time for people who can help us with it.
Q: Are you working with delicious library, etc., to see if they can contribute?
A: We've been working mainly with LibraryThing.com. Delicious Library and the like generally aggregate existing library records.
Q: What are you doing to reach the social tipping point?
A: The plan is to do it in two phases. First, get the data into the right format. Second, we need to bring people in, getting them to contribute. We think that a lot will be pulled in through Google.
Q: (oliver goodenough) Money?
A: Mainly funded by the Internet Archive. We have a grant from California. We hope that long-term it will be funded through affiliate fees and some scanning on demand fees.
Q: What is the glue? I don't see a unique ID...
A: Working on it.
Q: (me) FRBR is pretty structured. But the number of ways we might want to connect things is open ended. How are you going to figure out the right way to have structured vs unstructured?
A: We'll start with something. We'll pick the ones we like. Then we hope the user community will emerge and figure out the right ways to categorize and connect.
Q: (tim spalding - librarything) Tagging allows for multiple categorizations and relationships. E.g., at librarything we got pressure to include more choices under gender. How to resolve?
A: Tough problem.
A: (terry martin) Some data is unambiguous. Author names should be unambiguous.
A: (aaron) It'd be good to have a shared point of view, as at Wikipedia.
Q: (sj) Are you hotlinking to any databases? I.e., not importing but doing calls.
A: When you have 10M records, you have to do the import. For price records, we'll do live queries.
Q: Frequently, wikipedia will put in a note to clarify ambiguous categorizations, e.g., a gender categorization that isn't right. But OL is more constrained
A: From the beginning we've faced the tension between reusable data and flexibility. Our compromise is that things are structured but can be changed on the fly for an individual entry or class of entries. The hope is that people don't change the names of the fields so the database remains reliable.
Q: (Terry Martin) Greg Crane, 25 yrs ago you did something like this for a closed domain. Would you do it this way now?
A: (Greg) People don't care about books. They care about a poem or a chapter. Most of the world's expertise is distributed. How to take advantage of the distributed labor. Tricky question. Not just a means but an end. Wikipedia is the dog and the academy is the tail. How do you integrate the two? And it's not books, it's objects. E.g., we're dealing with the European museum classification system. The general issue is how you add more structure within the book.
A: (aaron) That's the hope. And it certainly comes up with journal articles, and songs where you want to point to a song within an album.
A: (greg) The important thing about what you're doing is that it's open.
Q: (sj) What about unpublished works?
A: You can scan them and upload the metadata. There's a bit of a question about what belongs in the OL library, but we're not in a position to kick things out. Maybe we'll have metadata indicating that it's not a "real" book.
A: (oliver) This could become a self-publishing system.
Q: (me) And then doesn't it get spammed as people link their self-published book to existing books?
A: It's the Internet. Everything is spammed. If it happens, there will be spam fighters.
Q: Why won't OCLC give you the data?
A: We'd take it in any form. We'd be willing to pay. Getting through the library bureaucracy is difficult...
A: (terry) You need to find the right person at OCLC
A: We've talked with them at a high level and they won't give us any information. Too bad since they're a non-profit. Library records are not copyrightable. OCLC contractually binds libraries.
Q: (tim) The greatest thing about OL is that it's an OCLC killer. Libraries shouldn't pay for it. Why not just explicitly say that the enormous value is that libraries won't have to pay for cataloging records.
Q: (librarian) Who's going to create the records?
A: They're created already. We just need to get a couple of libraries to provide their collections.
Q: (sj) OCLC culls and curates. OL will need this.
A: I'd love to talk about this with the OCLC more. Their mission is the same as ours, but they have this enormous revenue stream from the records. They've gotten more open maybe partially in response to us.
Q: Why not just give OL the records?
A: (terry) Because we have them from OCLC and we're contractually bound.
A: There's an exemption for providing them to non-profits.
A: (terry) Hmm. Maybe. It includes lots of journal records. But where does it take us? Do you have out of copyright books? I'm not particularly interested in promoting in-print commercial books.
A: Yes. Publishers are happy to hand over in-print data. The struggle is getting out of print books. Everyone at the project is more interested in out of print books. We want to pull people from the latest, hottest thing to the older and more interesting books. We're happy to link to already scanned collections.
Q: Even if contracts allow you to distribute your records, wouldn't that annoy OCLC?
A: (terry) Nah.
Q: (sjklein) What happened to Wikicat?
A: It seems kind of dead.
Q: How do you plan on promoting it once you open it up?
A: We want to get ranked highly in Google. We're also talking about a partnership with Wikipedia. Right now, citing a book in Wikipedia is complex. We're working on letting you just search at OL and it populates the record.
Q: You will have solved the age old problem of where the ISBN number points to.
Q: (me) What do you need to succeed?
A: More data. More people contributing. More book lovers, like at LibraryThing.com. And a few more programmers.
Posted by self at 03:26 PM | Comments (5) | TrackBack
Debatepedia launches
Debatepedia wants to collect the best arguments pro and con for issues that matter. It's not a place for people to shout at each other. On the contrary, it aims at assembling reasoned arguments.
It's a noble idea. I don't know if it'll catch on, of course, but I do like the way the Web is shortening the MTBNI (mean time between noble ideas).
Posted by self at 07:56 AM | Comments (2) | TrackBack
October 21, 2007
Aaron Swartz on the Open Library project
If you're interested in the future of books and libraries, and if you're in Cambridge MA on Tuesday, you should come to the Berkman Center at 12:30 to hear Aaron Swartz talk about the Open Library project, which is gathering a global, open and free list of every book it can find out about. It's also attempting to help with the problem that books exist at multiple levels of abstraction: There's Hamlet, editions of Hamlet, Hamlet in anthologies, Hamlet in translation, books based on Hamlet, etc. This is an important and fascinating project.
We serve lunch. Please RSVP. See you there...or on the webcast. (Details)
Posted by self at 10:15 AM | Comments (1) | TrackBack
October 20, 2007
Alan Watts lives
Here's Alan Watts talking to IBM (parts 1 and 2), probably in the early 1970s, although I'm just guessing. Very Alan Wattsian, very Sixties yet contemporary, and very enjoyable. Here's a bite:
"But nature itself is clouds, is water, is the outline of continents, is mountains, is biological existences. And all of them wiggle. And wiggly things are to human consciousness a little bit of a nuisance, because we want to figure it out."
(Thanks to Steven Kruyswijk for the link.) [Tags: alan_watts everything_is_miscellaneous ]
Posted by self at 10:16 AM | Comments (1) | TrackBack
October 18, 2007
A river runs through it
Dave Winer has come up with a clever way of reslicing the NY Times. Not only does it group articles by keyword, the layout creates a histogram of the topics.
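I don't know how Dave built his, but the basic idea (group articles by keyword, then let the size of each group show up as a bar) can be sketched in a few lines. This is a minimal sketch assuming each article arrives as a headline plus a list of keywords; the function names and the placeholder articles are hypothetical.

```python
from collections import defaultdict

def group_by_keyword(articles):
    """Map each keyword to the headlines tagged with it.

    `articles` is an iterable of (headline, keywords) pairs; an article with
    several keywords lands in several groups.
    """
    groups = defaultdict(list)
    for headline, keywords in articles:
        for keyword in keywords:
            groups[keyword].append(headline)
    return dict(groups)

def histogram(groups, bar="#"):
    """One text row per keyword, largest group first; bar length is the article count."""
    rows = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    width = max(len(keyword) for keyword in groups)
    return "\n".join(f"{keyword.ljust(width)} {bar * len(heads)}" for keyword, heads in rows)

# Placeholder articles, not real Times data.
articles = [
    ("Story 1", ["politics", "science"]),
    ("Story 2", ["politics"]),
    ("Story 3", ["sports"]),
]
print(histogram(group_by_keyword(articles)))
```

Because an article can carry several keywords, the same story shows up in several groups, which is exactly what a single front page can't do.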
Posted by self at 12:41 PM | Comments (2) | TrackBack
When satisficing is good enough
After years of talking about our move to "good enough" information, I'm just a little late to learning that Herbert Simon coined a term for this phenomenon in 1957. Yes, it's the fiftieth anniversary of "satisficing."
I found this via a very interesting blog post at Just Communicate by a knowledge management grad student who, in the course of discussing the wisdom of Cory Doctorow's Metacrap article, also points to a post by Steven Bell at the Association of College & Research Libraries blog, on using social sites to move good enough research beyond good enough.
Posted by self at 09:19 AM | Comments (2) | TrackBack
October 17, 2007
Everything is miscellaneous explained in a 5 and a half minute YouTube
Michael Wesch, who did the incredible info-visualization YouTube, The Machine Is Us/ing Us, has now done the same to explain the change from paper-based information to digital information. In just a few minutes, he explains the thesis of Everything Is Miscellaneous (which he credits, thank you). It is a brilliant piece of work. And totally delightful.
Posted by self at 10:51 PM | Comments (3) | TrackBack
The miscellaneous is making my eyes bleed
You know what's not helpful? A bill from AT&T that spreads across 56 pages of tiny print the information that explains why my bill is twice as high this month as usual.
You know, if they organized their information in a useful way (which is actually what my sense of the miscellaneous is about), I might even be able to tell that I should up my plan and pay AT&T more money every month. So, how about fewer lists of data — I don't really need to know about each and every text message our children send — and perhaps some notifications of where my usage has swerved off the norm?
Who designs these bills? Squirrels? [Tags: information_architecture, whines]
Posted by self at 10:16 AM | Comments (2) | TrackBack
October 15, 2007
Peer review review
I have a friend who is in charge of managing the peer review process at some serious scientific journals. It's a tough job requiring a set of skills that includes dealing with sometimes ornery people, managing multiple schedules, and expertise in the fields in which she works. She makes a good case for peer review, and for the journals that rely on it. Peer review has value and costs money, she says. So, journals have to charge fees to support the peer review process, and they have to hold onto the rights at least long enough to recover their costs.
I recognize the value of peer review. It not only directs our attention to worthwhile research, it is part of an editorial process that improves articles before they're published. But peer review doesn't scale. There's so much research being done. A lot of it is good work but isn't important enough to merit the investment in a traditional peer review process (including the failed hypotheses that we were taught in school were not failures at all). Peer review is valuable, but it's a choke point required because traditional publishing's neck is so thin. And it may (may!) turn out that the combination of crowds and quirky individuals can replace peer review's value. Of course, we'd want the crowd to consist of people with some standing for evaluating the research. And we'd want to be sure that the quirky individuals who buck the crowd are not delusional psychotics. I of course don't know what the world will look like (or what it does look like, when you come down to it), but I suspect that we're going to have a mixed research ecology, with peer-reviewed journals making recommendations we trust highly, and a wide variety of other ways of finding the research that matters to us. With PLoS, arXiv, Nature's version of arXiv, and all the rest of it, we're already well on the way to filling the important niches in this new knowledge ecology.
In fact, peer review generally establishes two characteristics of a piece of work: It was performed properly and it is important enough to merit throwing some ink at it. Those are important criteria, but hardly the only ones. "This hastily performed work uses a flawed methodology but turns up an interesting fact worth considering" is the type of criterion researchers use when recommending articles to one another. There's value there, and in research that has good data but misanalyzes it, research that is promising but incomplete, research that inadvertently demonstrates a flaw in some lab equipment, etc. etc. etc. And, as always, the value is in the long tail of et ceteras.
Posted by self at 01:36 PM | Comments (6) | TrackBack
October 12, 2007
Auto-tag your blog
Jeremy Wagstaff on Jiglu for auto-tagging your blog and its archive...
Posted by self at 03:38 AM | Comments (0) | TrackBack
October 10, 2007
My maybe-talk at Veerstichting
I've been working hard on a new presentation, to be given tomorrow at the Veerstichting conference in Leiden, in the Netherlands. After tonight's speakers' dinner, I'm thinking maybe the last half (including the Wikipedia portions) of my Everything is Miscellaneous talk would be more suitable. I don't know what I'll decide.
Here's the gist of the new talk. I'm going to be sketchy, because I have to go to sleep very soon, but mainly because there's something missing at the talk's core. The title is something like "The Challenge of the Implicit." It's a 20-minute talk.
The Web is best understood as a social realm. But groups (vs. mere groupings) become real when people know more about one another than they can say. For example, I can't tell you much of what I know about my kids. And when you can express a character in just a phrase, the character's been badly written. What makes a group a group is not the lines among the people, but what is unsaid and can't ever be said fully.
But computers are monsters of the explicit. That's why in the 1950s they symbolized the mechanizing of relationships. From the beginning, information itself was invented to manage, and thus reduce, complex relationships. Now this poorly defined word (few use it in Shannon's sense) has become an assumed part of how we know our world. We think we're constantly emitting info. E.g., a street scene used to be a river with eddies of public and private. Now it's all info. This has enabled a switch in how we think of privacy, from that which we exclude from the record, to what the authorities are not allowed to pay attention to in the record that now includes everything.
The Web is a disruption in this informationalization. It is built of links, which use language to contextualize relationships. Links are the opposite of databased information: They enrich rather than reduce, are decentralized, personal, and fundamentally social in that they are written by one person for others to use.
Yet the Web is (in a sense) lousy at the social. It knows about links but not about people or groups. That's why social networking sites are rising so quickly. They internalize the Web, providing the connective features we're used to on the Net (email, IM, etc.).
While groups depend on the implicit, social networking sites start by asking for explicit info about our network and interests. But that's ok because they so quickly transcend those sticks and twine. Real, messy social relations grow. Good!
But: (1) Making things explicit can be highly disruptive. Computers — and software designers — are not always good at this, especially since we don't have good norms yet, and perhaps never will. (2) Much of what's of value in the implicit was created without intending to. There are thus issues about how much we are entitled to make not just explicit but public. (3) The implicit is by its nature messy and connective. It always drags more into the light than it intended. It's thus hard to keep the above issues separate and containable. (4) We have an obligation and an opportunity to increase and preserve the unspoken. Explicitly.
The end.
I'm thinking that this talk is not ready to be presented. Too bad. I've worked hard on it. I guess I'll decide tomorrow morning. Sigh.
Posted by self at 06:30 PM | Comments (1) | TrackBack
Google buys Jaiku
I like Jaiku both because as the second entrant, it learned from Twitter, the first entrant, and because Jyri Engeström is one of those brilliant, sweet people who make the world better in several dimensions at once. (Disclosure: Jyri is a conference buddy.)
It'll be interesting to see where Google surfaces the UI for entering Jaiku microblog posts and where it surfaces the posts themselves.
And most important, of course, is whether Jaiku will be renamed Jaigoo or Jookle.
Posted by self at 01:37 AM | Comments (2) | TrackBack
October 08, 2007
Tags needed
Why oh why aren't there tags in Google Calendar? Oh my sweet Jeebus, I want tags for events! I get so tired of trying to find every birthday, every speech, every "maybe" event. In fact, I try to use those terms (embedded tags!) in the content itself just so I can find the events again. Please, oh great Google, give us, your unworthy supplicants, calendar tags!
Wordie started out as a joke: a site that was all tags and no content. Now it's added tags. I have to run for a train, so I don't have time to step into its infinite loop of metareference, but John McGrath explains it all here.
Posted by self at 02:40 PM | Comments (3) | TrackBack
October 03, 2007
Harvard moves towards Open Access scholarship
According to the Harvard Crimson, the Harvard Faculty of Arts & Sciences' governing body has proposed an open access policy according to which faculty members would make their research available for free either on a university site or on their own site. This would be in addition to publishing in academic journals, some of which charge $20,000 a year for a subscription. It'd be an opt-out program. The Harvard Crimson has a good editorial supporting it.
Yay! Locking research up in for-pay journals slows the pace of knowledge. The peer review system, one important way ideas are vetted, does not require the existing print publication system. Harvard's move will not only make more information more widely available, it may help nudge the system itself into a form that better serves our species' interests: As more schools adopt open access programs, researchers will have an increasing disincentive to lock their work up.
I'm actually not sure how this will work, especially with regard to its being opt-out. If I've just had an article accepted by The Journal of Hydroponic Pediatrics, do I then also submit it to the Harvard open access server? If so, in what sense is that opt-out?
Obviously, I'm also interested in what sort of metadata and aggregation facilities Harvard will supply to make these articles easily findable.
But what pleasant questions to contemplate!
Posted by self at 08:42 AM | Comments (2) | TrackBack
October 02, 2007
Meta-radio
James Vasile, who just gave a Berkman lunch-time talk, distributed a copy of a brief paper, "Unlock the Rock," which is not yet up on the Web. In it, James suggests that we separate radio into its two functions: DJs who figure out what to play, and the delivery mechanism. Someone should create a plug-in (or sump'in) that lets everyone create playlists using simple HTML, and lets everyone listen to those playlists by scouring multiple sources for the music. So, if you have a copy on your disk, it'll play that. If there's an online distributor that has it available, great. If you have to buy it from iTunes, then it'll let you. Or maybe you have a small p2p network of friends who are sharing music.
Interesting. It'd at least make it difficult to find someone to sue. And the publishers might make some money out of it. And, from my provincial point of view, it'd be a nice case of separating the metadata from the data....
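Here's roughly how I imagine James's plug-in would work, as a minimal sketch: read a playlist written as plain HTML, then, for each track, try a list of sources in order (your own disk first, then an online distributor, and so on) and play from the first one that has it. The playlist format (one track per <li>), the source names, and the locations are all hypothetical stand-ins; the paper doesn't specify any of them.

```python
from html.parser import HTMLParser

class PlaylistParser(HTMLParser):
    """Collects the text of every <li> element as one track name (hypothetical format)."""

    def __init__(self):
        super().__init__()
        self.tracks = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self.tracks.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.tracks[-1] += data

def parse_playlist(html):
    """Return the non-empty track names from an HTML playlist."""
    parser = PlaylistParser()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.tracks if t.strip()]

def resolve(track, sources):
    """Return (source_name, location) from the first source that has the track, else None.

    `sources` is an ordered list of (name, lookup) pairs; each lookup returns a
    location string or None. Order encodes preference: local copy before store.
    """
    for name, lookup in sources:
        location = lookup(track)
        if location is not None:
            return name, location
    return None

# Stand-in sources: plain dicts playing the role of your disk and an online store.
local_disk = {"Track A": "/music/track_a.mp3"}
online_store = {"Track A": "store://a", "Track B": "store://b"}
sources = [("local disk", local_disk.get), ("online store", online_store.get)]

playlist = parse_playlist("<ul><li>Track A</li><li>Track B</li><li>Track C</li></ul>")
for track in playlist:
    print(track, "->", resolve(track, sources))
```

The DJ only publishes the list; where each song actually comes from is decided on the listener's end. That's the metadata/data split in miniature.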
Posted by self at 02:00 PM | Comments (3) | TrackBack
Karen Schneider moves on from ALA blog
Karen Schneider (the Free Range Librarian) is one of those strong-voiced writers who make a real difference in their domain. Now she is leaving the American Library Association's TechSource blog, which she was instrumental in beginning, in order to follow her writerly instincts. Her last post is a message to librarians that usefully points them toward their fears. [Tags: karen_schneider libraries ala everything_is_miscellaneous]
Posted by self at 07:42 AM | Comments (1) | TrackBack
October 01, 2007
The front page is dead, but not yet quite reborn
I like what Michael Wolff says in his Vanity Fair piece about his new news site:
The metaphor, for 150 years — from print to radio to network to cable — has been the front page: important stuff first. "It should have to do now with falling through something, or floating through the totality of information or of intersecting worlds and interests," offers [Patrick] Spain, not a man wild with his metaphors. [VF, October, p. 126]
I've been saying for a while, and I think in Everything Is Miscellaneous, that the new front page is distributed across our day and our network. Much of it comes through our inbox. It consists of people we know and people we don't know recommending items for our interest.
So, I was disappointed by Wolff's new site, Newser.com. It presents a view of the news that's much less hierarchical than a typical front page, and it's well-designed for quickly finding what matters to you, but: (1) It assumes its nine top-level categories reflect how every reader views the world; (2) Where are our voices? Comments? Blogs? (3) I couldn't let it arise from my social network (where that network includes people I don't know but whose views interest me). It competes with Google News, not with the intersection of Digg and Facebook, which is what I'm waiting for.
Posted by self at 07:35 AM | Comments (1) | TrackBack
September 30, 2007
Web 2.0 via Web 2.0
Ed Yourdon has created a mother lode of a Google docs presentation that gathers tons of info about Web 2.0. Plus, he's inviting bunches of people to add to it, edit it, put in a nicer background, etc.
Posted by self at 10:37 PM | Comments (3) | TrackBack
September 29, 2007
Picnic '07 presentation and (sort of) debate
Here's a video of the full session I took part in at Picnic '07. It includes Walt Mossberg's introduction, my 40-minute keynote (very similar to the presentation that I did at Google, although with a short section on the importance and difficulty of the implicit added, and some references in anticipation of the debate to follow), and then the half hour or so of my debate with Andrew Keen, moderated by Walt M.
I haven't watched the video beyond the first few minutes (the production quality is high), but my sense of the debate was that Andrew was on an oddly anti-intellectual track, attacking me as a "professional philosopher," which I'm not (I was an assistant professor of philosophy 22 years ago), and even if I were, why would that be a criticism, especially coming from a guy who is out arguing for the importance of credentialed authorities? Not helpful to discussing the actual topic. Frustrating. My feeling coming out of the discussion overall was indeed frustration. I didn't think we were able to pursue points sufficiently.
BTW, somewhere in my presentation you can see me very carefully get left and right confused. Also, I'm going to plug again my more coherent attempt to explain and evaluate Keen's argument: Andrew Keen's Best Case.
Posted by self at 12:02 PM | Comments (2) | TrackBack
September 25, 2007
The future of content
Martin Weller has an excellent article on the future of content, presenting an economic and a quality argument for why it's bound to be (in my terms) miscellanized.
This is the first in a "distributed blogging" experiment that will have three other bloggers responding.
Posted by self at 06:41 AM | Comments (2) | TrackBack
September 20, 2007
Wines are not miscellaneous
Donna Maurer, an information architect, writes about how she organizes her wine, thereby answering the question: What is the opposite of miscellaneous? But who cares? She is not aiming at organizational purity, although her scheme has the attention to detail that purists often demand. But those details represent the information that matters to her, and her system lets her find and use that information...exactly as you would expect from a leading information architect. A folksonomic, tag-based wine cellar — while a fun concept — is not exactly called for here.
Posted by self at 06:43 PM | Comments (1) | TrackBack
September 15, 2007
NYTimes continues its slow climb to consciousness
My Times is in beta. I'm not sure how much of it I'm getting for free because Times Select comps people at universities. And I haven't played with it extensively. But what I'm seeing I'm liking.
my.nytimes.com lets you choose your feeds. Of course, NY Times material is available, but you could make a page that shows the feeds from the Washington Post, Slate, and BBC and not the NY Times. The site lets you see suggested feeds from various NY Times celebrities. You can add widgets like a Flickr photo browser. You can lay out the page you want. You can add tabs to organize your many feeds. You can even add your own feeds. Plus there's a meta-tab that will take you to Times Topics, taking them from their undeserved obscurity.
It's not perfect, even at first glance. The feeds only show headlines, not any of the text. It doesn't input or output OPML. The feed of the NYTimes columnists only shows the title of their posts, not the names of the authors. There's still no way to comment on the articles, not even a thumbs up or down. The articles don't link to blog posts about them.
Nevertheless, the decision to allow us to aggregate other sources on a page at the nytimes.com domain is a big symbolic deal. [Tags: nytimes media blogs newspapers journalism everything_is_miscellaneous ]
Posted by self at 06:53 PM | Comments (1) | TrackBack
September 13, 2007
Bin Laden word cloud
W. David Stephenson has created a word cloud of the latest bin Laden video. Interesting... [Tags: osama_bin_laden w_david_stephenson everything_is_miscellaneous visualizations ]
Posted by self at 11:53 AM | Comments (0) | TrackBack
September 12, 2007
Non-mashuppable debates, no thanks to Yahoo
From TechPresident's indispensable Daily Digest, compiled by Joshua Levy:
...Wired's ever-diligent Sarah Lai Stirland reports that "Yahoo has decided not to support citizen remixing of the footage — reducing the once-bold experiment to little more than a fancy online version of an on-demand cable television offering." Yahoo had originally planned to upload the raw footage from the debate to its Jumpcut service for citizens to use in their mashups. But now a spokesman told Stirland that Yahoo will only let participants choose what candidates they want to hear from, without the ability to mashup actual footage. "Bloggers will be able to embed the video into their sites, YouTube-style, but will have no easy way to repurpose it," writes Stirland. Those who want to create mashups won't be able to use the simple Jumpcut service, and will instead be forced to download individual videos and use desktop video-editing software. Not nearly as fun. [Tags: politics yahoo mashups everything_is_miscellaneous ]
Posted by self at 11:53 AM | Comments (0) | TrackBack
September 11, 2007
Berkman Lunch: Peter Galison
Peter Galison is a university professor of physics at Harvard. He's giving a Tuesday lunchtime talk. [As always, I'm paraphrasing, getting things wrong, etc.]
Positivists tried to ground knowledge in an accumulation of observations, with a minimum of theory, Prof. Galison says. Science would come in the form of little bricks. The result would be "out of the reach of metaphysical theories." Observation-based science would get better over time.
After WWII, via Thomas Kuhn and others, there was a rebellion against the positivist view. Theory comes first, they said. Science was so framed by theory that what counted as valid observation was dictated by the framework of theory. There is no neutral observation and there's no raw perception outside of the framing provided by our theories. Various theories therefore were not continuous (as for the positivists) but were ships passing in the night...at least according to this point of view.
Example: The positivists saw special relativity as the capstone of a continuum of observation-based theories, while the Kuhnians think Einstein overthrew his predecessors and created a new whole.
Prof. Galison looks at the rhythms of the rise of theories. There are breaks in the strands of experiment, theory and instrument but the breaks don't occur at the same time. And that's to be expected because new instrumentation takes a while to yield new experiments and theories.
Doesn't this just make the Kuhnian predicament worse? Now there are three strands with discontinuities, not just the strand of theory. "How do subcultures of science coordinate? What is shared between experimentalists and theorists, or between instrument makers and theorists...or between a subculture like instrument making and the wider technical world?" When a string theorist wants to talk to a bioc
