Joho the Blog

March 21, 2014

Reading Emily Dickinson’s metadata

There’s a terrific article by Helen Vendler in the March 24, 2014 New Republic about what we can learn about Emily Dickinson by exploring her handwritten drafts. Helen is a Dickinson scholar of serious repute, and she finds revelatory significance in the words that were crossed out, replaced, or listed as alternatives, in the physical arrangement of the words on the page, etc. For example, Prof. Vendler points to a change of line in “The Spirit”: “What customs hath the Air?” became “What function hath the Air?” She says that this change points to a more “abstract, unrevealing, even algebraic” understanding of “the future habitation of the spirit.”

Prof. Vendler’s source for many of the poems she points to is Emily Dickinson: The Gorgeous Nothings, by Marta Werner and Jen Bervin, the book she is reviewing. But she also points to the new online Dickinson collection from Amherst and Harvard. (The site was developed by the Berkman Center’s Geek Cave.)


Unfortunately, the New Republic article is not available online. I very much hope it will be, since it provides such a useful way of reading the materials in the online Dickinson collection, which are themselves available under a Creative Commons license that permits non-commercial use without asking permission.

Be the first to comment »

March 18, 2014

Dean Krafft on the Linked Data for Libraries project

Dean Krafft, Chief Technology Strategist for Cornell University Library, is at Harvard to talk about the Mellon-funded Linked Data for Libraries (LD4L) project he leads. The grantees include Cornell, Stanford, and the Harvard Library Innovation Lab (which is co-sponsoring the talk with ABCD). (I provide nominal leadership for the Harvard team working on this.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Dean will talk about the LD4L project by talking about its building blocks. [Dean had lots of information and a lot on the slides. I did a particularly bad job of capturing it.]

LD4L

Mellon last December put up $1M for a 2-year project that will end in Dec. 2015. The participants are Cornell, Stanford, and the Harvard Library Innovation Lab.

Cornell: Dean Krafft, Jon Corso-Rickert, Brian Lowe, Simeon Warner

Stanford: Tom Cramer, Lynn McRae, Naomi Dushay, Philip Schreur

Harvard: Paul Deschner, Paolo Ciccarese, me

Aim: Create a Scholarly Resource Semantic Info Store model that works within and across institutions to create a network of Linked Open Data to capture the intellectual value that librarians and other domain experts add to info, patterns of usage, and more.

LD4L wants to have a common language for talking about scholarly materials. Outcomes:

  • Create an SRSIS ontology sufficiently expressive to encompass catalog metadata and other contextual elements

  • Create an SRSIS semantic editing, display, and discovery system based on Vitro to support the incremental ingest of semantic data from multiple info sources

  • Create a Project Hydra-compatible interface to SRSIS, an active triples software component, to facilitate easy use of the data

Why use Linked Data?

LD puts the emphasis on the relationships. Everything is related.

Benefits: The connections have meaning. And it supports “many dimensions of nearness.”

Dean explains RDF triples. They connect subjects with objects via a consistent set of relationships.
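A toy illustration of the idea (mine, not Dean’s; the names and URIs are invented for the sketch):

```python
# RDF-style triples sketched as plain Python tuples of
# (subject, predicate, object). Example data is made up.
triples = [
    ("ex:dickinson", "ex:authorOf", "ex:poem123"),
    ("ex:poem123", "rdf:type", "ex:Poem"),
    ("ex:harvard", "ex:holds", "ex:poem123"),
]

def objects(subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("ex:dickinson", "ex:authorOf"))  # ['ex:poem123']
```

Because everything is expressed as these uniform relationships, triple sets from different sources can be merged just by concatenating them.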

A nice feature of LOD is that the same URL that points to a human-readable page can also be taken as a query to show the machine-readable data.
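Roughly like this (my sketch of the general pattern, not any actual server code; the handler and formats are invented):

```python
# Content negotiation in miniature: one URI, two representations,
# chosen by the HTTP Accept header the client sends.
def respond(uri, accept):
    if accept == "text/turtle":
        # Machine-readable RDF for the same resource
        return f"<{uri}> a <ex:Poem> ."
    # Default: human-readable HTML page
    return f"<html><body>Page for {uri}</body></html>"

print(respond("http://example.org/poem123", "text/turtle"))
print(respond("http://example.org/poem123", "text/html"))
```

A browser and a harvester can thus share one address while each gets the form of the data it can use.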

There’s commonality among references: shared types, shared relationships, shared instances defined as types and linked by relationships.

LOD is great for sharing data. There’s a startup cost, but as you add more data repositories and types, the cost/effort goes up linearly, not at the steeper rate of traditional approaches.

Dean shows the mandatory graphic of a cloud of LOD sources.

Building Blocks

VIVO: Vivo was the inspiration for LD4L. It makes info about researchers discoverable. It’s software, data, a standard, and a community. It connects scientists and scholars through their research and scholarship. It provides self-describing data via shared ontologies. It provides search results enhanced by what it knows. And it does simple reasoning.

Vivo is built on the VIVO/Vitro platform. It has ingest tools, ontology editing tools, instance editing tools, and a display system. It models people, organizations, grants, etc., the relationships among them, and links to URIs elsewhere. It describes people in the process of doing research. It’s discipline-neutral. It uses existing domain terminology to describe the content of research. It’s modular, flexible, and extensible.

VIVO harvests much of its data automatically from verified sources.

It takes a complexity of inputs and makes them discoverable and usable.

All the data in VIVO is public and visible.

Dean shows us a page, and then traverses the network of interrelated authors.

He points out that other institutions are able to mash up their data with VIVO. E.g., the ICTS has info about 1.2M publications that they’ve integrated with VIVO’s data. E.g., you can see research papers created with federal funding but not deposited in PubMed Central.

VIVO is extensible. LASP extended VIVO to include spacecraft. Brown U. is extending it to support the humanities and artistic works, adding “performances,” for example.

The LD4L ontology will use components of the VIVO-ISF ontology. When new ontologies are needed, it will draw upon VIVO design patterns. The basis for SRSIS implementations will be Vitro plus LD4L ontologies. The multi-institution LD4L demo search will adapt VIVOsearch.org.

The 8M items at Cornell have generated billions of triples.

Project Hydra. Hydra is a tech suite and a partnership. You put your data there and can have many different apps. 22 institutions are collaborating.

Fundamental assumption: No single system can provide the full range of repository-based solutions for a given institution’s needs, yet sustainable solutions do require a common repository. Hydra is now building a set of “heads” (UI’s) for media, special collections, archives, etc.

Fundamental assumption: No single institution can build the full range of what it needs, so you need to work with others.

Hydra has an open architecture with many contributors to a common core. There are collaboratively built solution bundles.

Fedora, Ruby on Rails for Blacklight, Solr, etc.

LD4L will create an ActiveTriples Hydra component to mimic ActiveFedora.

Our Lab’s LibraryCloud/ShelfRank is another core element. It provides a model for access to library data and a concrete example for creating an ontology for usage.

LD4L – the project

We’re now developing use cases. We have 32 on the wiki. [See the wiki for them]

We’re identifying data sources: Biblio, person (VIVO), usage (LibCloud, circ data, BorrowDirect circ), collections (EAD, IRs, SharedShelf, Olivia, arbitrary OAI-PMH), annotations (CuLLR, Stanford DMW, Bloglinks, DBpedia LibGuides), subjects and authorities (external sources). Imagine being able to look at usage across 50 research libraries…

Assembling the Ontology:

VIVO, Open Annotation, SKOS

BibFrame, BIBO, FaBIO

PROV-O, PAV

FOAF, PROVE, Schema.org

CreativeCommons, Dublin Core

etc.

Whenever possible, the project will use existing ontologies.

Timeline: By the end of the year we hope to be piloting initial ingests.

Workshop: Jan. 2015. 10-12 institutions. Aim: get feedback, make a “sales pitch” to other organizations to join in.

June 2015: Pilot SRSIS instances at Harvard and Stanford. Pilot gathering info across all three instances.

Dec. 2015: Instances implemented.

wiki: http://wiki.duraspace.org/display/ld4l

Q&A

Q: Who anointed VIVO a standard?

A: It’s a de facto standard.

Q: SKOS is considered a great start, but to do anything real with it you have to modify it, and if it changes you’re screwed.

A: (Paolo) I think VIVO uses SKOS mainly for terms, not hierarchies. But I’m not sure.

Q: What is ActiveTriples?

A: It’s a Ruby Gem that serves as an interface for Hydra into a Fedora repository. ActiveTriples will serve the same function for a backend triple store. So you can swap different triple stores into the Fedora repository. This is Simeon Warner’s project.

Q: Does this mean you wouldn’t have to have a Fedora backend to take advantage of Hydra?

A: Yes, that’s part of it.

Q: Are you bringing in GIS linked data?

A: Yes, to the extent that we can and it makes sense to.

A: David Siegel: We have 6M data points from 1.1M Hollis records. LibraryCloud is ingesting them.

Q: What’s the product at the end?

A: We promised Mellon the ontology and instances of LOD based on the ontology at each of the 3 institutions, and search across the three.

Q: Harvard doesn’t have a Fedora backend…

A: We’d like to pull from non-catalog sources. That might well be an OAI-PMH ingest, or some other non-Fedora source.

Q: What is Simeon interested in with regard to Arxiv.org?

A: There isn’t a direct relationship.

Q: He’s also working on ORCID.

A: We have funding to do some level of integration of ORCID and VIVO.

Q: What is the bibliographic scope? BibFrame isn’t really defining items, etc. They’ve pushed it into annotations.

A: We’re interested in capturing some of that. BibFrame is offering most of what we need, but we have to look at each case. Then we communicate with them and hope that BibFrame does most of the work.

Q: Do any of your use cases posit tagging of contents, including by users, perhaps with a controlled vocabulary?

A: We’ll be doing tagging at the object level. I’m unsure whether we’re willing to do tagging within the object.

A: [paolo] We assume we don’t have access to the full text.

A: You could always point into our data.

Q: How can we help?

A: We’re accumulating use cases and data sources. If you’re aware of any, let us know.

Q: It’s been hard for libraries to put enough effort into authority control, to associate values comparable across different subject schemes…there’s a lot of work to make things work together. What sort of vocabulary or semantic links will you be using? The hard part is getting values to work across domains.

A: One way to deal with that is to bring together the disparate info. By pulling together enough info, you can sometimes use the network to help figure that out. But in general the disambiguation challenge (and text fields are even worse) is not something we’re going to solve.

Q: Are the working groups institutionally based?

A: No. They’re cross-institution.

[I'm very excited about this project, and about the people working on it.]

Be the first to comment »

March 15, 2014

It’s the 25th anniversary of the Web, not the Internet. It’s important to remember the difference.

I just posted at Medium.com about why it’s important to remember the difference between the Net and the Web. Here’s the beginning:

A note to NPR and other media that have been reporting on “the 25th anniversary of the Internet”: NO, IT’S NOT. It’s the 25th anniversary of the Web. The Internet is way older than that. And the difference matters.

The Internet is a set of protocols (agreements) about how information will be sliced up, sent over whatever media the inter-networked networks use, and reassembled when it gets there. The World Wide Web uses the Internet to move information around. The Internet by itself doesn’t know or care about Web pages, browsers, or the hyperlinks we’ve come to love. Rather, the Internet enables things like the World Wide Web, email, Skype, and much much more to be specified and made real. By analogy, the Internet is like an operating system, and the Web, Skype, and email are like applications that run on top of it.

This is not a technical quibble. The difference between the Internet and the Web matters more than ever for at least two reasons.

Continued at Medium.com

Be the first to comment »

March 11, 2014

WWPD: What Would Putin Do?

Putin reacts to a man who kisses his hand

We could attribute this to surprise or even to a democratic instinct except for the adorable “I’m gonna punch you so hard” fist Putin starts to make at the very end.

 


And on a lighter note, here’s Pres. Obama on Between Two Ferns.

3 Comments »

March 8, 2014

I enjoy isometric projection. You all know the isometric cube from video games:

qbert

An isometric cube’s lines are all the same length, and it shows all three sides equally. It is thus unnatural, assuming that seeing things from a particular perspective is natural.

This makes isometric cubes similar to Egyptian paintings, at least as E.H. Gombrich explains them.

ancient egyptian painting

Paintings in the Egyptian style — face in profile, torso turned out towards us, legs apart and in profile — are unrealistic: people don’t stand that way, just as cubes seen from a human perspective don’t show themselves the isometric way.

Gombrich talks about Egyptian paintings to make a point: our idea about what’s realistic is more infected with cultural norms than we usually think. The Egyptian stance seemed to them to be realistic because it shows the parts of the human form in the view that conveys the most information, or at least what the Egyptians considered to be the most distinctive view.

And the same is true of isometric cubes.

1 Comment »

March 6, 2014

Report from Denmark: Designing the new public library at Aarhus, and the People’s Lab

Knud Schulze, manager of the main library in Aarhus, Denmark, and Jane Kunze of the People’s Lab in Denmark are giving talks, hosted by the Harvard Library Innovation Lab. (Here are his slides.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Knud begins by reminding us how small Denmark is: 5.5M people. Aarhus has a population of about 330,000. [My account is very choppy. The talk was not.]

Now that the process of digitizing all information is well underway, the focus is on what can only be experienced in the library. Before, the library was a space for media. Now the space is a medium. Seriousness was prized in libraries. Now a sense of humor. We’ve built libraries with books and other media to serve an industrial society. Some are truly beautiful, but they’re under-used. Now we’re moving to libraries for networked society.

Three and a half years ago, the Danes wrote a report on public libraries in the knowledge society, and went looking for partnerships, which is unusual for the Danes, says Knud. The new model of the library intersects four spaces: inspiration, learning, performative, and meeting spaces. But the question is what people are going to do in those spaces. Recognition/experience, empowerment, learning, innovation. Knud shows pictures of those activities currently going on in the library.

Two hundred of Denmark’s 500 public libraries are “open libraries” — open 24 hours a day, with staffing only about 12 hours a week. If you have a library card, you can open the door. You can check media in and out, use the Internet, use a PC, read newspapers, study, arrange study circles. “The point is to let users take control.”

A law in 2007 said there had to be one-stop shopping for govt services. Most libraries offer these services. You go to the library for a passport, drivers license, health insurance, etc. Every citizen needs to have a personal account for communication with banks, from the state (e.g., about taxes). Libraries have helped educate the citizenry about this.

Often libraries are community centers that involve public and private sectors and a wide range of services. Sometimes the other services overwhelm the library services. “People ask me, ‘Where is the public library in this?’, and I say, ‘Think about the library as the glue.’”

There have to be innovation spaces in the local libraries.

The Danish Digital Library (Danskernes Digitale Bibliotek) is an open source infrastructure for digital objects, including a resource management system for the whole country and a means to purchase digital content. All its digital services are accessible anywhere in the world. 86 of the 98 municipal library systems have contributed to a shared contract for a new library system based on open source. They share operations and development. “There’s a very good business case.”

So, why Dokk1, the new library?

Libraries are symbols of development and innovation in the society. They drive city development. They add new stories about the town. All public libraries are examples of the citizens’ interest in innovation. E.g., the Opera, Munch museum and library in Oslo have transformed the waterfront and brought a new identity to the city. Helsinki, Birmingham (UK), and others as well. “The same will happen in Aarhus, we hope.”

DOKK1 is being built into the harbor, “transforming it into an open sea front.” There’s 200,000 sq. feet of library, parking for 1,000 cars, two new urban harbor squares, and a light rail station. Cost: US$390M. It will open in early 2015.

The front of the current library features new programs every few months, rather than the entrance being a way of controlling the users. They’ve run projects like iFloor (social interaction), a news lab (producing TV), AI robots, displays that capture and freeze images of people interacting with them, and much more. The building needs to interact with its surroundings and adapt to them, says Knud.

DOKK1 is “no building with an advanced roof.”

“It’s all about facilitating relations.” “The library of the future is all about people.” It will be a user-driven process: “From tradition to transcendence so users can deconstruct their old knowledge about libraries.” Knud shows a photo of children doing searches by interacting with blocks on the floor. They paid no attention to the info on the screens.

They have partnerships with the Gates Foundation, Chicago Public Libraries, IDEO, and the Aarhus Public Library.

Another project: “Intelligent Libraries”: how to “work smart” by improving logistics. The project knows where all the books are in all the nation’s libraries, and how often they’re used. They use “media hotels”: “local or remote storage of overflow, slow moving materials.”

The name “DOKK1” came from a competition: 1,250 proposals, seven of which were considered by a jury. “It’s about branding the library.” 90% of all city inhabitants should know about the new project; in August 2013, 75% did. In the existing library, users are invited to engage in the “mental construction” of the new one.

Now Jane Kunze talks about People’s Lab. She begins with a sign: “Shut up and hack.”

They’ve been setting up labs for the past two years to test different ways of interacting with users. Innovation is important to the Danish govt. (Denmark was just rated the most innovative country in Europe.) How can the public library be part of this?

They were inspired by Maker culture. Fab labs and maker spaces have been popping up everywhere. There’s also a trend in Denmark to repair rather than replace. And a focus on hand skills and not just academic knowledge. Also rapid prototyping, with inspiration from design thinking (as per IDEO).

The People’s Lab is a result of a collaboration among the library, community, and partners. Partners include public libraries, Aarhus School of Architecture, Moesgaard Museum, Roskilde festival, Orange Innovation, and more.

When they began, it was about kick-ass technology. But, while tech is fun, it’s really about people and community-building. “Don’t wait to involve people until your grand opening.” People will see your imperfections, “but that’s part of what will make them committed to the place.”

The six labs:

  • TechLab: Having a maker in residence is powerful. See Valdemar’s hovercraft.

  • Guitar Lab. Use local people and their passions.

  • Dreamcity: A maker space at the Roskilde rock festival. “You have to put yourself into play. You have to be there with your whole personality, and not just your professional side.”

  • WasteLab: Trash from dump “spiced up with specially selected trash.” “Creativity comes from chaos — stop tidying!”

  • Magnetic Groove Memories: cut your own vinyl records and fix up old radios.

  • The first maker faire in Aarhus will be in 2014.

They’ve been building a ladder of involvement, so people can come in for something basic and find themselves increasingly engaged — “small steps that make it possible for people to become more and more free in their thinking.”

They’ve learned that when the community already has hacker spaces and maker spaces, maybe the library should just be a gate to this ecosystem, opening them up to a broader public. Maybe the library is a place where people are introduced to making and working more creatively with their hands. “You can work with maker culture without having a makerspace.” You don’t have to have a room dedicated to machinery, especially for the smaller communities.

Q&A [with six of the Danes responding]

Q: Is this like a library plus the SF Exploratorium?

A: Yes.

A: We’re looking at how to create relationships among the patrons, staff, the media…

A: We want to make a place where people get involved in different kinds of competencies.

Q: Many of the other libraries you showed are on the edge of the city. Are you trying to make the library a destination? In Boston I wouldn’t let my 14 yr old grandchild go down to the harbor by himself.

A: In Aarhus, children move through the city at 10-12 yrs old. They can get to the new library by public transportation or bike. But we are trying to transform the city so that it is looking out, not in.

Q: We’re seeing more random innovation in library spaces in this city, as opposed to your carefully planned and articulated change. (1) You’re designers, but it’s about designing the interaction. (2) How can you bring unique, local materials into this interactive environment. (3) At archives, people are now curating their own memories, with a community collective approach. (4) We have generations of professionals, so just building new locations may not change things.

A: In Denmark we have a long tradition of collecting local historical materials. E.g., we have lots of photos of cattle and farms, so we crowd-sourced geolocating them and put them on Google Maps. We have a lot of materials that could be used.

A: We have a new project. When you get your grandparents’ old documents, you digitize them and load them on a national server. You’re in control of how open they should be. That’s in test now.

A: We have a lot of projects that focus on seniors.

A: At the WasteLab, one of the most active participants was a 70 year old woman. She made herself into the welcoming host. One day she came in with a smart phone she had won. People at the WasteLab sat with her and helped her learn how to use it; she’d found a community to ask. Creating a variety of offers — from more traditional to the newer — involves everyone.

A: We see the library as a space for that kind of relationship.

Q: Are you getting any support from the Royal Library?

A: It has no relationship to public libraries.

Q: Design is crucial. It can signal to people that there’s more here than you expect. Modern libraries send a signal that it’s not only a place for research or study. Putting up those popup labs in your lobby is one of the most useful devices; people are in the experience without having to look for it. It’s the best of what Disney is trying to accomplish. The popup libraries are the gateway drug.

Q: How might this fit into an academic library space?

A: We collaborate with a couple of universities, but they’re two different worlds. University libraries generally see users as people to whom they provide services, rather than as people who can contribute to the library. It’s a question of what the academic libraries want students to do in the library. To read? To learn from other students? You might experiment with a common space to bring together these different communities.

A: You have a lifelong relationship with your local library, but only for a few years with your university library.

Q: Ultimately all libraries are shared resources, whatever those resources are. That’s a great argument for sharing access to all the tools we’ve heard about. Not every library needs its own 3D printer, but they could use access to one.

A: In Norway, a particular university library is divided into five areas, but with big shared spaces with tables, chairs, and menus. Then they put in empty shelves. The room was totally over-crowded and totally re-arranged.

Q: At Tisch Library at Tufts they’re renovating and creating group study space for people working alone but in a public space. Also, they’ve installed a media lab. At the Northeastern U Library, it felt like I was at an airport. There were fixed spaces and terminals, but there must have been 500 students in there. It was like a beehive. At the Madison Public Library they have The Bubbler, media lab and performance space. These are blurring the lines.

[Loved these talks. These folks are taking deep principles and embodying them in their spaces.]

Be the first to comment »

Dan Cohen on the DPLA’s cloud proposal to the FCC

I’ve posted a podcast interview with Dan Cohen, the executive director of the Digital Public Library of America about their proposal to the FCC.

The FCC is looking for ways to modernize the E-Rate program that has brought the Internet to libraries and schools. The DPLA is proposing DPLA Local, which will enable libraries to create online digital collections using the DPLA’s platform.

I’m excited about this for two reasons beyond the service it would provide.

First, it could be a first step toward providing cloud-based library services, instead of the proprietary, closed, expensive systems libraries typically use to manage their data. (Evergreen, I’m not talking about you, you open source scamp!)

Second, as libraries build their collections using DPLA Local, their metadata is likely to assume normalized forms, which means that we should get cross-collection discovery and semantic riches.

Here’s the proposal itself. And here’s where you can comment to the FCC about it.

Be the first to comment »

March 5, 2014

[berkman] Karim Lakhani on disclosure policies and innovation

Karim Lakhani of Harvard Business School (and a Berkman associate, and a member of the Harvard Institute for Quantitative Social Science) is giving a talk called “How disclosure policies impact search in open innovation,” a topic he has researched with Kevin Boudreau of the London Business School.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Karim has been thinking about how crowds can contribute to innovation for 17 years, since he was at GE. There are two ways this happens:

1. Competitions and contests at which lots of people work on the same problem. Karim has asked who wins and why, motives, how they behave, etc.

2. Communities/Collaboration. E.g., open source software. Here the questions are: Motives? Costs and benefits? Self-selection and joining scripts? Partner selection?

More fundamentally, he wants to know why both of these approaches work so well.

He works with NASA, using topcoder.com: 600K users worldwide [pdf]. He also works with Harvard Medical School [more] to see how collaboration works there, where (as with open source) people choose their collaborators rather than having them chosen top-down.

Karim shows a video about a contest to solve an issue with the International Space Station, having to do with the bending of bars (longerons) in the solar collectors when they are in the shadows. NASA wanted a sophisticated algorithm. (See www.topcoder.com/iss.) It was a two-week contest with a $30K prize. Two thousand signed up for it; 459 submitted solutions. The winners came from around the globe. Many of the solutions replicated or slightly exceeded what NASA had developed with its contractors, but this was done in just two weeks simply for the price of the contest prize.

Karim says he’ll begin by giving us the nutshell version of the paper he will discuss with us today. Innovation systems create incentives to exert innovative effort and encourage the disclosure of knowledge. The timing and the form of the disclosures differentiate systems. E.g., Open Science tends to publish when near done, while Open Source tends to be more iterative. The paper argues that intermediate disclosures (as in open source) dampen incentives and participation, yet lead to higher performance. There’s more exploration and experimentation when there’s disclosure only at the end.

Karim’s TL;DR: Disclosure isn’t always helpful for innovation, depending on the conditions.

There is a false debate between closed and open innovation. Rather, what differentiates regimes is when the disclosure occurs, and who has the right to use those disclosures. Intermediate disclosure [i.e., disclosure along the way] can involve a range of outputs. E.g., the Human Genome Project enshrined intermediate disclosure as part of an academic science project; you had to disclose discoveries within 24 hours.

Q: What constitutes disclosure? Would talking with another mathematician at a conference count as disclosure?

A: Yes. It would be intermediate disclosure. But there are many nuances.

Karim says that Allen, Meyer and Nuvolari have shown that historically, intermediate disclosure has been an important source of technological progress. E.g., the Wright brothers were able to invent the airplane because of a vibrant community. [I'm using the term "invent" loosely here.]

How do you encourage continued innovation while enabling early re-use of it? “Greater disclosure requirements will degrade incentives for upstream innovators to undertake risky investment.” (Green & Scotchmer; Bessen & Maskin.) We see compensating mechanisms under regimes of greater disclosure: E.g., priority and citations in academia; signing and authorship in Open Source. You may also attract people who have a sharing ethos; e.g., Linus Torvalds.

Research confirms that the more access you provide, the more reuse and sharing there will be. (Cf. Eric von Hippel.) Platforms encourage reuse of core components. (Cf. Boudreau 2010; Rysman and Simcoe 2008.) [I am not getting all of Karim's citations. Not even close.]

Another approach looks at innovation as a problem-solving process. And that entails search. You need to search to find the best solutions in an uncertain space. Sometimes innovators use “novel combinations of existing knowledge” to find the best solutions. So let’s look at the paths by which innovators come up with ideas. There’s a line of research that assumes that the paths are the essential element to understand the innovation process.

Mathematical formulations of this show you want lots of people searching independently. The broader the better for innovation outcomes. But there is a tendency of the researchers to converge on the initially successful paths. These are affected by decisions about when to disclose.

So, Karim and Kevin Boudreau implemented a field experiment. They used TopCoder, offering $6K, to set up a Med School project involving computational biology. The project let them get fine-grained info about what was going on over the two weeks of the contest.

700 people signed up. They matched them on skills and randomized them into three different disclosure treatments. 1. Standard contest format, with a prize at the end of each week. (Submissions were automatically scored, and the first week prizes went to the highest at that time.) 2. Submitted code was instantly posted to a wiki where anyone could use it. 3. In the first week you work without disclosure, but in the second week submissions were posted to the wiki.

For those whose work is disclosed: you can find and see the most successful solutions, and you can get money if your code is reused. In the non-disclosure regime you cannot observe solutions, and all communications are barred. In both cases, you can see market signals and who the top coders are.

Of the 733 signups from 69 different countries, 122 coders made 654 submissions, taking 89 different approaches. 44% were professionals; 56% were students. They skewed very young, and 98% were men. They spent about 10 hours a week, which is typical of Open Source. (There’s evidence that women choose not to participate in contests like this.) The results beat the NIH’s approach to the problem, which was developed at great cost over years. “This tells me that across our economy there are lots of low-performing” processes in many institutions. “This works.”

What motivated the participants? Extrinsic motives matter (cash, job market signals) and intrinsic motives do too (fun, etc.). But so do prosocial motives (community belonging, identity). Other research Karim has done shows that there’s no relation between skills and motives. “Remember that in contests most people are losing, so there have to be things other than money driving them.”

Results from the experiment: More disclosure meant lower participation. Also, more disclosure correlated with fewer hours worked. Incentives and effort are lower when there’s intermediate disclosure. “This is contrary to my expectations,” Karim says.

Q: In the intermediate disclosure regime is there an incentive to hold your stuff back until the end when no one else can benefit from it?

A: One guy admitted to this, and said he felt bad about it. He won top prize in the second week, but was shamed in the forums.

In the intermediate disclosure regime, you get better performance (i.e., better submission score). In the mixed experiment, performance shot up in the second week once the work of others was available.

They analyzed the ten canonical approaches and had three Ph.D.s tag the submissions with those approaches. The solutions were combinations of those ten techniques.

With no intermediate disclosure, the search patterns are chaotic. With intermediate disclosure, there is more convergence and learning: intermediate disclosure resulted in 30% fewer distinct approaches. The no-disclosure folks were searching in the lower-performance end of the pool. There was more exploration and experimentation when there was no intermediate disclosure, and more convergence and collaboration when there was.

Increased reuse comes at the cost of incentives. The overall stock of knowledge created is lower, although the quality is higher. More convergent behavior comes with intermediate disclosure, which relies on the stock of knowledge available. The fear is that with intermediate disclosure, people will get stuck on local optima — path dependence is a real risk of intermediate disclosure.

There are comparative advantages of the two systems. Where there is a broad stock of knowledge, intermediate disclosure works best. Plus the diversity of participants may overcome local optima lock-in. Final disclosure [i.e., disclosure only at the end] is useful where there’s broad-based experimentation. “Firms have figured out how to play both sides.” E.g., Apple is closed but also a heavy participant in Open Source.

Q&A

Q: Where did the best solutions come from?

A: From intermediate disclosure. The winner came from there, and then the next five were derivative.

Q: How about with the mixed?

A: The two weeks tracked the results of the final and intermediate disclosure regimes.

Q: [me] How confident are you that this applies outside of this lab?

A: I think it does, but even this platform is selecting for a very elite set of people who are used to competing. One criticism is that we’re using a platform that attracts competitors who are not used to sharing. But rank-order-based platforms are endemic throughout society: SATs, law school exams. In that sense we can argue that there’s some generalizability here. Even in Wikipedia and Open Source there is status-based ranking.

Q: Can we generalize this to systems where the outputs of innovation aren’t units of code, but, e.g., educational systems or municipal govts?

A: We study coders because we can evaluate their work. But I think there are generalizations about how to organize a system for innovation, even if the outcome isn’t code. What inputs go into your search processes? How broad do you go?

Q: Does it matter that you have groups that are more or less skilled?

A: We used the Topcoder skill ratings as a control.

Q: The guy who held back results from the Intermediate regime would have won in real life without remorse.

A: Von Hippel’s research says that there are informal norms-based rules that prevent copying. E.g., chefs frown on copying recipes.

Q: How would you reform copyright/patent?

A: I don’t have a good answer. My law professor friends say the law has gone too far to protect incentives. There’s room to pull that back in order to encourage reuse. You can ask why the Genome Project’s Bermuda Rules (pro disclosure) weren’t widely adopted among academics. Academics’ incentives are not set up to encourage automatic posting and sharing.

Q: The Human Genome Project resulted in a splintering that set up a for-profit org that does not disclose. How do you prevent that?

A: You need the right contracts.

This was a very stimulating talk. I am a big fan of Karim and his work.


Afterwards Karim and I chatted briefly about whether the fact that 98% of Topcoder competitors are men raises issues about generalizing the results. Karim pointed to the general pervasiveness of rank-ordered systems like the one at TopCoder. That does suggest that the results are generalizable across many systems in our culture. Of course, there’s a risk that optimizing such systems might result in less innovation (using the same measures) than trying to open those systems up to people averse to them. That is, optimizing for TopCoder-style systems for innovation might create a local optima lock-in. For example, if the site were about preparing fish instead of code, and Japanese chefs somehow didn’t feel comfortable there because of its norms and values, how much could you conclude about optimizing conditions for fish innovation? Whereas, if you changed the conditions, you’d likely get sushi-based innovation that the system otherwise inadvertently optimized against.


[Note: 1. Karim's point in our after-discussion was purely about the generalizability of the results, not about their desirability. 2. I'm trying to make a narrow point about the value of diversity of ideas for innovation processes, and not otherwise comparing women and Japanese chefs.]

1 Comment »

February 26, 2014

Facebook provides more this-like-that instead of this-oh-that! (Or relevancy, interestingness, and serendipity)

Facebook has announced that it’s going to start adding to your newsfeed stories that you don’t know about but that are on the same topic as ones you follow. As their post puts it:

Now, when a Page tags another Page, we may show the post to some of the people who like or follow the tagged Page.

So close.

In the late 1990s and early Oughties, the size of the material being indexed by search engines busted the main metrics. Precision measured how many results of a query pertained to that query — how “noisy” the results are. Recall measured how many of the pertinent results were missed by the results list. But when you are indexing hundreds of billions of pages, total recall results in a noisy list, because there are so many results that you can’t find the one that’s most relevant. Thus relevancy became much more important than before.
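The two classic metrics have simple set-based definitions, which a minimal sketch makes concrete (the document IDs here are made up for illustration):

```python
def precision(retrieved, relevant):
    # Fraction of the retrieved results that are actually relevant
    # (low precision = a "noisy" results list).
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    # Fraction of the relevant documents that the query actually found
    # (low recall = pertinent results were missed).
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

relevant = {"d1", "d2", "d3", "d4"}   # everything pertinent to the query
retrieved = {"d1", "d2", "d9"}        # what the search engine returned

print(precision(retrieved, relevant))  # 2 of 3 retrieved are relevant
print(recall(retrieved, relevant))     # 2 of 4 relevant were found
```

At web scale the tension is visible even in this toy: returning every page drives recall to 1.0 while precision collapses, which is why relevancy ranking took over.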

But even relevancy doesn’t cut the mustard when you are browsing the hay more than looking for the needle. Thus, over the past ten years or so we’ve seen interestingness become important in some environments. Sorting Flickr search results by interestingness turns up some of the most striking photos.

[Image: water drops on a needle, an example of interestingness at Flickr]

Search for “needle,” sorted by “Interesting” at Flickr (cc-by-nc-sa dmelchordiaz)

Reddit‘s community upvoting mechanism results in a front page that reflects not precision, recall, or relevancy, but interestingness. Reddit’s front page also illustrates that when we ask for results sorted by interestingness, we apparently tolerate far more noise than with any of the other three metrics.

These four criteria obviously each have circumstances in which they have value. If you know what you’re looking for, precision counts. If you need to do a complete review of the literature, or just need to cover your backside — an “Oh crap I didn’t come across that” moment is not permissible — then recall is your friend. If you are finding your way through a new topic, then relevancy will give you a feel for the terrain. But if you want to find something that will stimulate and amaze you, click on the interestingness button.

Facebook has opted for relevancy. This makes sense for them from an economic standpoint: You will be a happy Facebooker if you are shown stuff you didn’t know about that conforms to your existing interests and values. In their blog post explaining the change, Facebook takes as their paradigmatic example showing you a post of a photo captioned “James Harden and Dwight Howard throw down some sick dunks during practice” because you “follow or like Dwight Howard.” Highly relevant. And if Facebook started showing its users posts as noisy as what you get on the Reddit homepage or from a Flickr stream sorted by interestingness, its users would likely revolt.

So, I understand how this new move makes for happier users and thus makes Facebook richer and safer.

But…

It’s a missed opportunity for helping to break us out of our “filter bubble” — Eli Pariser’s term for always being shown items that too closely reflect our existing interests and worldview, and that therefore confirm that worldview rather than expanding it. (See Eli’s excellent TED Talk.) It would have been far more helpful if Facebook had chosen to expand our worldview through interestingness rather than reinforce it through relevancy.

Interestingness is the key to serendipity, a term that, like precision and recall, doesn’t scale very well. Those who call for greater serendipity are trusting too much in randomness now that the domain of possibilities is so huge. For example, one could create a site (which means that it’s already been created) that uses truly random ways to generate a set of links to Web pages: Randomized Page Roulette. But how long do you think you would spend visiting those pages if they’re truly random? The list would be serendipitous but highly unlikely to be either relevant or interesting.

So, instead of serendipity, think about how Facebook could provide us with interesting links instead of links it knows we’ll like. It could use its awesome Social Graph to guess at enticing content that is outside our normal interests. These links would have the sort of appeal that Reddit does, especially if they were marked as a stab at what you’ll find interesting rather than as stuff FB is confident you’ll like.

These links would be a powerful addition to Facebook’s value, for nothing is more stimulating to us than the discovery of something unexpectedly interesting or, even better, the discovery of a new, unexpected interest.

Most important from my point of view as a non-shareholder in Facebook, it would use what Facebook knows about us to expand our vision rather than adding another brick to the walled garden of our existing interests.

3 Comments »

February 22, 2014

Releasing an Independent Record: The 1994 Version

For $3 at a library book sale I picked up a copy of Releasing an Independent Record, revised 4th edition, by Gary Hustwit, published in 1994 by Rockpress Publishing Co. The short review is: Times have changed.

Gary’s advice is that if you want to get your music out, don’t go to one of the existing labels. Start your own. In 1993, that was pretty radical, even though it required you to emulate the major labels’ processes, albeit starting from scratch and with no budget. So, the bulk of Gary’s manual is a directory of the services you’ll need to hire. He assumes you’ve already got a tape of your music. So now you need to find a tape duplication house. You also need to get the paperwork done to set up your label’s bank account, and don’t forget the rubber stamp: “Depending on what formats you release, you’ll need a ton of different sized envelopes, and stamping the return address is easier than having them printed or writing it by hand.”

There are also handy, multi-page lists of the press to contact and the local radio stations (remember them?) to flog your songs to. And booking agents and promoters. And record labels so you can “See if your label name is already taken.” Oh, and you might want to check “if they’re interested in licensing your record.”

A quick google reveals that Gary is now a director of documentaries. I saw and liked Helvetica, and Objectified is on my Netflix list.

 


On the last page, there’s an ad for Rockpress’ other four books. My favorite is Hell on Wheels, by Greg Jacobs:

A compilation of tour stories from 40 bands, including ALL, aMINIATURE, Babes in Toyland, Big Drill Car, Buck Pets, Buffalo Tom, Butthole Surfers, Cadillac Tramps, Chune, Circle Jerks, Coffin Break, The Cult, Descendents, Doughboys, The Dwarves, Ethyl Meatplow, fIREHOSE, The Germs, God Machine, Kill Sybil, King Missile, L7, Luscious Jackson, Mary’s Danish, Melvins, Minutemen, Naked Raygun, Overwhelming Colorfast, Popdefect, Rockets from the Crypt, Screaming Sirens, Skin Yard, Superchunk, Supersuckers, Surgery, UK Subs, and X.

I recognize a couple — it’s not my demographic, people — but that list’s got a bit of Key and Peele about it, don’t you think?

Be the first to comment »
