Joho the Blog: folksonomy Archives - Page 2 of 3

September 28, 2008

University home page word cloud

Matt Pasiewicz at Educause has created a word cloud out of 1,000 university home pages. Nothing too surprising, but interesting nonetheless.


June 24, 2008

Berkman lunch: Karim Lakhani and Ned Gulley on collaborative innovation

Karim Lakhani of Harvard Business School and Ned Gulley of MathWorks are giving a Berkman talk called “The Dynamics of Collaborative Innovation: Exploring the tension between knowledge novelty and reuse.”

Karim begins by looking at research by Meyer on the airplane’s hidden collaborative history: It didn’t spring whole cloth from the brow of the Wright brothers. E.g., Chanute served as a hub for pre-Wright research and innovation. The Wright brothers actively corresponded with him. Once the Wright brothers patented their inventions, innovation moved to Europe (which is why so many of our aviation terms are French … le fuselage, anyone?).

Ned talks about the contest that MathWorks (where he works) runs every six months (sixteen times so far), designed to encourage the free flow of ideas. It’s a week-long open collaborative competition for MATLAB programmers. Entries are displayed, scored, and ranked immediately. Anyone can modify anyone else’s code and resubmit it as their own. The leader is determined objectively by running each entry through hidden tests that judge its efficiency. (They don’t make the test suite public because they don’t want people to “game” it.) The prize is a t-shirt or baseball cap, although the real prize is reputation.

Ned shows a graph of entries and processing times. It’s quite a dramatic set of cliffs. On the other hand, there are lots of dots representing people who make “improvements” that aren’t improvements. This may be people with bad ideas or people whose ideas happen not to work the way MATLAB prefers.

The winning entries on average have contributions from 30 people. Ned says that when some code leaps ahead, you’ll see “splash” as tweakers try to improve it marginally, often making it marginally worse.

Q: In the commercial realm, what happens when an early innovator patents it?
You don’t get collaborative innovation.

People name their entries, and sometimes send social signals with them: “Tweakfest” or “I wish I knew how this works.”

Ned says that if a chicken is only an egg’s way of making another egg, then a hacker is only code’s way of making more code.

Karim talks about some statistical analysis of entries into the contest. He looks at how many lines an entrant borrows and how many times the entry’s reused. There is a power law distribution: A few lines are used thousands of times, but most are used zero to three times. His analysis shows that when it comes to entries that become leaders, borrowing pays off more than novelty.
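The kind of counting behind that analysis can be sketched roughly as follows. This is only an illustration, not Karim’s actual method: the function names and the sample entries are made up, and it assumes each entry is a plain string of source code submitted in chronological order.

```python
from collections import Counter


def line_reuse_counts(entries):
    """For each distinct line, count how many later entries reuse it.

    `entries` is a list of source-code strings in submission order
    (a hypothetical input format, not the contest's real data).
    """
    reuse = {}
    for source in entries:
        # Compare stripped, non-empty lines; count each line once per entry.
        lines = {ln.strip() for ln in source.splitlines() if ln.strip()}
        for ln in lines:
            if ln in reuse:
                reuse[ln] += 1   # borrowed from an earlier entry
            else:
                reuse[ln] = 0    # first appearance: novel, not yet reused
    return reuse


def reuse_histogram(reuse):
    """Map 'times reused' to 'number of lines reused that many times'."""
    return dict(Counter(reuse.values()))


# Three made-up entries, each borrowing from the ones before it.
sample_entries = [
    "a = 1\nb = 2",
    "a = 1\nc = 3",
    "a = 1\nc = 3\nd = 4",
]
sample_counts = line_reuse_counts(sample_entries)
sample_histogram = reuse_histogram(sample_counts)
```

On real contest data, a power law would show up as a histogram with a huge bucket at zero to three reuses and a long, thin tail of lines reused thousands of times.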


Q: Have coders evolved in these games?
Yes. More collaborative. And more sophisticated in their gaming of the contest.

Topcoder.com uses this model to develop code solving practical problems.


June 10, 2008

Britannica tweaks the wiki

Britannica has announced that it’s going to enable some measure of reader participation in the extending of the online version of their encyclopedia. You can see the beta of the new site here.

The detailed overview of the planned site says:

two things we believe distinguish this effort from other projects of online collaboration are (1) the active involvement of the expert contributors with whom we already have relationships; and (2) the fact that all contributions to Encyclopaedia Britannica’s core content will continue to be checked and vetted by our expert editorial staff before they’re published.

Excellent! We need lots of variations on the theme of collaboration. Editing and expertise add value. They slow things down and reduce the ability to scale, but Wikipedia’s process makes it possible to read an article that’s been altered, if only for a few minutes, by some devilish hand. It all depends on what you’re trying to do, and collectively we’re trying to do everything. So, this is good news from Britannica. It’ll be fascinating to watch.

To pick a nit, I’m not as convinced by Britannica’s insistence on objectivity as a value, however. The blog post says “we believe that the creation and documentation of knowledge is a collaborative process but not a democratic one.” It lists three positive consequences of this. The third is “objectivity, and it requires experts.” In a reference that makes you wish they’d at least once use the word “Wikipedia,” the post continues: “In contrast to our approach, democratic systems settle for something bland and less informative, what is sometimes termed a ‘neutral point of view.’” I think it would be reasonable for Britannica to tell us that an expert-based, edited system is likely to yield articles that are more comprehensive, more uniform in quality, more accurate and more reliable. But haven’t we gotten past thinking that expertise yields objectivity?

Anyway, I think it’s amazing that the Britannica, in its 240th year, is taking this step. Britannica will be better for it, and so will we.


May 31, 2008

Scan and Release: Digitizing the Boston Public Library

I’ve lived in Boston since 1986, but have never made it into the great Boston Public Library. Until today. My streak was totally broken because the little group digitizing the BPL’s holdings invited me in to see what they’re doing. And, oy, the work they have cut out for them!

But they’re an intrepid band. And they recognize that they’re up to something important. Although some in the BPL may have thought that digitized prints and photos are just lesser-quality backups, the group knows that they’re not only bringing hidden images into the public sun, they are engaged in a social project that changes how and what we know. (What’s not to love about librarians?)

The Print Stack, where photos, prints and miscellaneous other objects are stored, only seems to be in the basement. The ceiling is low, there are no windows, and the lighting leaches vitamin D out of your body. It’s long and overflowing, reminiscent of the warehouse that ends Citizen Kane, and that is echoed in two Indiana Jones movies.

Boston Public Library storage area
Boston Public Library Print Stack

If you want to find a particular image in the roughly two million prints and images (no one knows for sure), you ask Aaron. Some bits and portions have catalogs of various sorts, but overall, it’s a disarray of metadata. For example, the Herald Traveler collection of photos has about 1.2 million pieces, arranged in 104 cabinets, each with four drawers. The folders and drawers are labeled, which helps a lot, but they’re not indexed, much less cross-indexed.

Herald Traveler collection in file drawer
Herald Traveler collection

At least those photos have captions. Aaron shows me some beautiful 19th century photographs of Indian architecture. Many years ago, the BPL went to enormous trouble to paste the photos into multiple volumes — turning the photos into a book, as Aaron points out — but didn’t bother to record the notes on the back of the photos. Aaron is now going to have to dissolve the pages to expose the notes.

Eroded negative
Aaron holds up a degraded negative.
A dirigible is barely visible on it.
Tough reclamation project.

The archive doesn’t just have pictures and prints. It’s got, well, everything, including a couple of old typewriters and a collection of matchbook covers from Boston restaurants.

matchbook covers
Boston matchbook cover collection

Of this abundance, the digital group has so far scanned about 24,000 objects. When I point out to Maura Marx, the group’s head, that, given the library’s estimate that it has maybe 23 million objects, she’s looking at a 2,000-year project, she tells me that they’re just getting started. They’re going to bulk up, maybe do some offsite digitizing, and begin to make some serious progress. When I ask Thomas Blake, who does the actual digitizing, how he decides which stuff to do, he laughs a little and says, “What I think is cool.” And, since the public has an appetite for “choochoo trains, maps and postcards,” he’s done a bunch of them. The BPL is, after all, a public institution that both serves the public and relies upon the public’s support.
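For the curious, the arithmetic behind that quip can be worked backwards. The post gives the totals but not how long the 24,000 scans took, so this sketch only derives what the 2,000-year figure implies; the function names are illustrative.

```python
TOTAL_OBJECTS = 23_000_000  # the library's rough estimate, from the post
SCANNED_SO_FAR = 24_000     # objects digitized so far, from the post
PROJECT_YEARS = 2_000       # the quipped project length, from the post


def implied_rate_per_year(total_objects, project_years):
    """Scanning rate (objects per year) that makes the project last that long."""
    return total_objects / project_years


def implied_years_elapsed(scanned, rate_per_year):
    """How long the work so far would have taken at that rate."""
    return scanned / rate_per_year


def years_remaining(total_objects, scanned, rate_per_year):
    """Years left at a constant rate, ignoring any future 'bulking up'."""
    return (total_objects - scanned) / rate_per_year


rate = implied_rate_per_year(TOTAL_OBJECTS, PROJECT_YEARS)      # 11,500 per year
elapsed = implied_years_elapsed(SCANNED_SO_FAR, rate)           # a bit over 2 years
remaining = years_remaining(TOTAL_OBJECTS, SCANNED_SO_FAR, rate)
```

In other words, the quip holds up if the 24,000 objects represent roughly two years of scanning, which is an inference from the numbers, not something the group said.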

stacked volumes

The Library has been posting digitized works at Flickr. Take a look at the 19th century photos of Egypt, or, yes, the postcards. And the book fetishists among you should definitely check out the “Art of the Book” collection. Predictably and hearteningly, the public (you and me, sister) has been commenting and adding to what’s known. Maura hopes to get permission to put the images into the Commons. Digitizing and posting (“scan and release,” in the group’s memorable way of putting its mission) turns patrons into historians.

The scanning is slow because it’s one guy who’s doing a careful job. The camera has a 22 megapixel chip, but they’ve been known to digitize at 88 megapixels, creating files that are half a gig in size. Tom likes saving the RAW files to avoid unnecessary data loss. You never know what’s going to be useful. For example, he had been scanning postcards at 300 dpi, but a curator pointed out that then you couldn’t see the dotscreen pattern, which might be of interest to someone. So now Tom scans them at 600 dpi. Overall, they have about 1.5 terabytes of stored images.
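Those numbers hang together, at least as a back-of-the-envelope check. The sketch below assumes uncompressed files with three color channels at 16 bits each, and a 3.5 by 5.5 inch postcard; neither assumption comes from Tom, and the function names are made up for illustration.

```python
BYTES_PER_CHANNEL = 2    # assumption: 16 bits per channel
CHANNELS_PER_PIXEL = 3   # assumption: full RGB stored for every pixel


def uncompressed_bytes(pixels):
    """Approximate size of an uncompressed image with the assumed layout."""
    return pixels * CHANNELS_PER_PIXEL * BYTES_PER_CHANNEL


def pixels_for_scan(width_in, height_in, dpi):
    """Pixel count for scanning a flat object of the given size in inches."""
    return round(width_in * dpi) * round(height_in * dpi)


# An 88 megapixel capture: 88,000,000 * 3 * 2 = 528,000,000 bytes,
# which is about half a gigabyte.
capture_bytes = uncompressed_bytes(88_000_000)

# Hypothetical 3.5 x 5.5 inch postcard at the two resolutions in the post.
postcard_300 = pixels_for_scan(3.5, 5.5, 300)
postcard_600 = pixels_for_scan(3.5, 5.5, 600)
growth = postcard_600 / postcard_300  # doubling the dpi quadruples the pixels
```

The second half is the cost of the curator’s suggestion: going from 300 to 600 dpi means four times the pixels, and so roughly four times the storage, for every postcard.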

The metadata is a whole ’nother issue. Chrissy Watkins, who has been there for four days (she had been at the JFK Presidential Library), is working on it. For now, Tom gives every item an arbitrary and unique ID number, the key piece of any metadata scheme. But the BPL is facing the inevitable conundrum: Maximize the metadata but slow the process, or gather less metadata but go at a far faster clip. The group seems to be leaning toward the latter, which makes sense to me. They’ve been using what Tom calls the “Curator Core,” a reference to the Dublin Core metadata standard. Trying to capture everything that might be useful is a task beyond daunting. For example, Michael Klein points to “fore-edge paintings,” paintings done on the edges of a book that are revealed when you fan the book slightly. Does the BPL have to come up with a standard that includes whether you fan the book to the left or right? There are so many different types of objects that building a standard or an ontology that captures them all would absorb all of the team’s time. (“The special case is not as special as you’d think,” says Michael.) Instead, they need to scan scan scan, and capture some reasonable set of metadata, to which more metadata can accrete.
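A minimal sketch of that “ID first, metadata accretes later” approach might look like this. It is not the BPL’s system: the class, its methods, and the sample values are hypothetical, and the element names (title, subject) are simply borrowed from Dublin Core for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

# Arbitrary, unique ID numbers: here just a running counter (an assumption).
_id_counter = count(1)


@dataclass
class ItemRecord:
    """A scanned item: only the ID is required; everything else is optional."""
    item_id: int = field(default_factory=lambda: next(_id_counter))
    metadata: dict = field(default_factory=dict)

    def add(self, element, value):
        """Attach one more value to an element; elements may repeat."""
        self.metadata.setdefault(element, []).append(value)


# Scan now with next to no metadata...
postcard = ItemRecord()
postcard.add("title", "Postcard (made-up example)")

# ...and let a curator or a commenter add more later.
postcard.add("subject", "postcards")
postcard.add("subject", "dotscreen printing")
```

Because nothing but the ID is mandatory, the oddball cases (which way to fan a fore-edge painting, say) can be recorded as one more element on the few records that need it, without holding up the scanning of everything else.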

OCA
One of the ten Open Content Alliance book scanners.

“We’re going from collect and hide to scan and release,” says Tom. And in so doing, the until-now unpublished holdings are going not just from no value to some value. The digital group is in fact radically multiplying the value of the Boston Public Library’s holdings. And as we the recipients of this gift incorporate the images, adding information to them, and contextualizing them, we are further enriching the holdings, far beyond what any small group, no matter how intrepid, could manage.

April 23, 2008

Tags vs. identity politics

Ike Piggott posts about the effect of tags ‘n’ such on identity politics. Nicely done. (And, if I may be so self-centered, he seems unknowingly to be channeling Everything Is Miscellaneous.)


April 3, 2008

[topicmaps] Lars Helgeland on Topic Map-driven Web sites

Lars Helgeland says that Ted Nelson called Web sites “decorated directories.” The Web has failed the expectations of the Web’s visionaries, Lars says. Topic Maps can help. [Caution: Live-blogging]

Web sites have become reflections of their technical structure, which is usually hierarchical. Knowledge is not natively hierarchical. Knowledge works through people associating ideas.

Lars shows examples of sites redesigned using topic maps; they use the knowledge representation of topic maps without using the familiar circles-and-lines display. “We need to see portals as layered architecture, where content is independent of both presentation and the underlying technology structures.”


[topicmaps] Alex Wright

Alex Wright is keynoting the Topic Maps conference in Oslo. [I’m live blogging, getting things wrong, etc.]

Europe has been thinking about organizing information for a long, long time, he says. He goes back to Thomas Aquinas, who thought there were two pillars of memory: association and order. He likens “memory palaces” to topic maps. [Hmm. The associations weren’t topical, as I understand them.] He fast-forwards to Charles Cutter who invented a book cataloging system and foresaw in 1883 the day when clicking on a reference would retrieve the object. [Cutter numbers are routinely added to Dewey Decimal numbers in library catalogs.] H.G. Wells in 1938 foresaw an infrastructure for sharing info electronically. Teilhard de Chardin wrote about the “noosphere.” [It’s been a long time since I read him, but I recall the noosphere as a spiritual realm, not a tech realm. I could be entirely wrong.]

Alex points especially to Paul Otlet, a Belgian who thought libraries were too fixed on books. Rather, we should be thinking about the structure of information within and across books. There’d be an underlying classification scheme, represented in index cards, pointing to books. He tried to actually build this, starting in 1921. He invented the “Universal Decimal Classification” scheme. The UDC was designed to classify the info inside of books. Auxiliary Tables marked relationships between topics, i.e., typed links. [The Web only succeeded because it let the typing of links be accomplished by the words around it.] He also had the idea of a social space around information.

Alex visited the Mundaneum (an Otlet museum) a few days ago and shows photos. Very cool. They’ve only managed to catalog a tenth of the collection in the past ten years. [Pretty good argument against Otlet’s idea. It doesn’t scale.] He shows pictographic representations showing how info can be remixed and browsed.

Alex points to facetag, an Italian project that uses faceted classification, with the facets established at the top level. Within that, users assign their own tags. Also, vote-links puts meaning into hyperlinks.

Next Alex turns to Vannevar Bush and “As We May Think,” the essay that proposed the memex. In some ways, it was more sophisticated than the Web, he says. E.g., when you made a link, it was visible in both directions. And the trails should be public so there could be collective intelligence.

Eugene Garfield was inspired by Bush and founded the Science Citation Index, which ranked citations. Doug Engelbart was also inspired by Bush. (He recommends Engelbart’s “mother of all demos” demo, which is indeed truly amazing.) Engelbart was concerned with tools for group collaboration, process hierarchies, and multi-level nesting of organizational knowledge. He points quickly also to Xerox PARC’s NoteCards, Apple’s HyperCard, Ted Nelson, Andries van Dam, and others. When the Web became dominant, Alex says, a lot of promising prior research dried up, which is a shame.

“The Web that wasn’t”: Tying top-down taxonomies with bottom-up social space; two-way linking; visible pathways; typed associations…
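To make two of those ideas concrete, here is a toy sketch of links that carry a type and are visible from both ends. It is not how the memex, Otlet’s Auxiliary Tables, or any topic map engine works; the class, method names, and link types are invented for illustration.

```python
from collections import defaultdict


class LinkStore:
    """Typed links that can be followed from either end."""

    def __init__(self):
        self._outgoing = defaultdict(list)  # source -> [(link_type, target)]
        self._incoming = defaultdict(list)  # target -> [(link_type, source)]

    def link(self, source, link_type, target):
        """Record one typed link and index it in both directions."""
        self._outgoing[source].append((link_type, target))
        self._incoming[target].append((link_type, source))

    def links_from(self, node):
        """Typed links that start at this node."""
        return list(self._outgoing.get(node, []))

    def links_to(self, node):
        """Typed links that point at this node: the part the Web leaves out."""
        return list(self._incoming.get(node, []))


store = LinkStore()
store.link("Garfield", "inspired-by", "Bush")
store.link("Engelbart", "inspired-by", "Bush")

# From Bush's side, both links are visible without crawling anything.
inspired_by_bush = store.links_to("Bush")
```

On the Web, a page knows only its outgoing links, and a link’s “type” is whatever the surrounding words imply; here both are explicit, at the cost of needing a shared store that every link has to be registered with.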

[Terrific talk. Great to hear some history.]


April 1, 2008

Thoughtcloud scrapes neurons

The Media Re:Public group at Berkman has announced a breakthrough technology that promises to take the “conference” out of “un-conference.”


March 17, 2008

What tagging loses

Library of Congress Reference Librarian Thomas Mann has a long, detailed and fierce argument against the LC Working Group on the Future of Bibliographic Control. He is quite specific about what will be lost to scholars with the Working Group’s more folksonomic approach.

Much of what I’ve read so far points to the huge amount of information contained in the existing LC Subject Headings and their cross references, and how well they can convey to a scholar a lay of the land she is researching. (I don’t know why we’d want to throw out the LCSH instead of supplementing them with yet more metadata.) I haven’t read the entire piece yet, but what I’ve seen is fascinating, learned and will, I hope, occasion a productive debate.


TopicMaps in Oslo

April 2-4, I’m going to TopicMaps, a conference that may be particularly interesting (to people who are particularly interested in it, of course):

The basic idea is simple: the organizing principle of information should not be where it lives or how it was created, but what it is about. Organize information by subject and it will be easier to integrate, reuse and share – and (not least) easier for users to find. The increased awareness of the importance of metadata and ontologies, the popularity of tagging, and a growing interest in semantic interoperability are part and parcel of the new trend towards subject-centric computing.

The organizers have let it be known that there’s still room…

