Joho the Blog: john wilbanks Archives

March 9, 2010

[berkman] John Wilbanks on making science generative

John Wilbanks of Creative Commons (and head of Science Commons) is giving a Berkman lunchtime talk about the threats to science’s generativity. He takes Jonathan Zittrain’s definition of generativity: “a system’s capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.”

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

[NOTE: Ethan Zuckerman has posted his far superior bloggage]

Science Commons tries to spark the sort of creativity and innovation in science that we find in the broader cultural Net. Scientists often resist the factors that have increased generativity in other realms: Science isn’t very accessible, it’s hard to master, and it’s not very transferable because the sciences exist as guild-disciplines. He says MIT had to build a $400M building to put scientists into the same room so they’d collaborate. There’s a tension, he says, between getting credit for your work and sharing your work. People think that it ought to be easy to build a science commons, but it’s not.

To build a commons and increase generativity, John looks at three key elements: data, tools, and text. First, he looks at these from the standpoint of law. Text is copyrighted, but we can change the law and we can use Creative Commons. Tools include contracts and patents. Contracts govern the moving of ideas around, and they are between institutions, not between scientists. Data is mainly governed by secrecy.

The resistance turns out not to be from the law but from incentives, infrastructure, and institutions. E.g., the National Institutes of Health Public Access Policy requires scientists to make their work available online within 12 months if they have taken any NIH money. Before it was required, only 4% of scientists posted their work. Now it’s up over 70%, and it’s rising. Without this, scientists are incented to withhold info until the moment of maximum impact.

To open up data, you need incentives and infrastructure if you’re going to make it useful to others. People need incentives to label their data, put it into useful formats, to take care of the privacy issues, to carefully differentiate attribution and citation (copy vs. inspiration). So far, data doesn’t have the right set of incentives.

To open up tools, we’re talking about physical stuff, e.g., recombinant DNA. Scientists don’t get funded to make copies. “The resistance is almost fractal,” he says, at each level of opening up these materials.

We need a “domain name system for data” if we’re going to get Net effects. But there’s no accepted data infrastructure on the Web for doing this, unlike Google’s role for text pages.

Science is heading back to the garage, in the Eric von Hippel sense. [He’s sitting next to me at the table!] You can buy a gene sequencer on eBay for under $1,000. You can go to People around the world are doing this. In SF, a group is doing DIY sequencing, creating toxin detectors, etc. The prices of parts and materials are dropping the way memory prices and printer prices did. We need an open system, including a registry, in part because that’s the most responsive way to respond to bad genes made by bad people.

“PC or TiVo for science?” John asks. PCs are ugly, but they give us more control over our tools and will let us innovate faster.

Q: [salil] You focus on experimental sciences. Are these obstacles present in mathematical and computer sciences? Data and tools are not a big part of math. Not making one’s work available right now in my field counts as a disadvantage. Specialization is an issue (what you call a guild)…
A: Math and physics are at the extreme of the gradient of openness, while chemistry probably sits at the other end. The lower the cost of publishing, the more disclosure there is. So, in math there isn’t as much institutional, systemic resistance because you don’t need a lot of support from an institution to be a great mathematician.
A: Guilds serve a purpose. But when you think about the competency of a system overall, it comes from the abstraction of expertise into tools. In the research sciences, microspecialization has come at the expense of abstraction. But it’s easier and easier to put knowledge into the tools because we can put lots into computers; that won’t revolutionize math, but it will have more of an effect on sciences with physical components. Science Commons stays away from math because it’s working.

Q: [Eric von Hippel] State of patents?
A: Most of the time in science, patents are trading cards; they’re more about leverage and negotiations than about keeping people from using them. If we think about data as prior art, if we funnel it correctly, it becomes harder to get stupid patents. Biotech patents should be dealt with through a robust public domain strategy. “We tend to get wound up about IP, but then you go out in the field and people are just doing stuff.” Copyright is more stressful because patents time out after 20 yrs.

Q: [ethanz] Clearly, the legal response is a tiny part of a larger equation. If you were coming into it now, not trying to put forward this novel legal framework, where would you start?
A: Funders. Starting with the law lets us engage everyone in the conversation, because as the legal group we don’t create text, tools, or data. But we’re focusing on the funder-institution relation. We want funders to write clauses that reserve the right to put stuff into the commons. “If the funders mandate, the universities tend to accept.” Also, it gets easier to do high-quality research outside the big universities. Which means the small schools can do deals with the funders to make their faculty more attractive to the funder. The funder can also specify that the scientists will annotate their data. The funder has the biggest interest in making sure that science is generative.

Q: Then why aren’t funders requiring the data be open?
A: Making data legally open is easy. Making it useful to others is difficult. Curating it with enough metadata, publishing it on the Web, making it machine readable, making it persistent — none of those infrastructures exist for that, with some exceptions (e.g., the genome). So, the Web has to become capable of handling data.

Q: [ethanz] One reason that orgs like CC have been successful is that they put into law something that is a norm on the Web. One reason math and physics are so open is that they’re open; it’s the norm. The institutional culture within these disciplines has a lot to do with it. How do you shape norms?
A: Carolina Rossini and I have been working on a paper about the university as a curator of norms. CC lets you waive all your rights. We’ve thought about writing a series of machine readable norms like CC contracts but with no law in the middle. E.g., citation is a norm. E.g., non-endorsement is a norm that says that if you use my data, you can’t imply that I agree with you. But marking my data clearly and giving it a persistent URL are things laws can’t govern but that should be norms. We use Eric’s ideas here. E.g., branding something with an open trademark.
A: [carolina] We need a bottom up approach based on norms and a top down approach based on law and policy. If you don’t work with both, they will clash.
A: Our lawyer Tim says that norms scale far better than the law. You can’t enforce the law all the time.

Q: [me] “Making the Web capable of handling data”? How? Semantic Web? What scale?
A: It’s a religious question. My sect says that ontologies are human. We should be using standard formats, e.g., OWL, RDF. Some ontologies will be used by communities, and if they are expressed in standard ways, they can be stitched together. From my view: 1. Name things in clear and distinct ways. 2. Put them into OWL or other languages in the correct way. 3. Let smart people who need connected data do so, and let them publish. It’ll be a mix of top down standards setting and bottom up hacking. I’m a big SemWeb fan, but I get very scared of people saying that they have THE ontology. It’ll be messy. It won’t be beautiful. The main thing is to make it easy for people to wire data sets together. Standard URIs and standard formats are the only way to do this. We’ve seen this in the life sciences. Communities that need to write big data together treat it the way Linux packages get rolled together into a release. You’ll see data distributions emerge that represent different religions. If it works, people will use it. There’ll be flame wars, license wars, and forking, and chaos, and 99% of the projects will die. You should be able to boot your databases into a single operating system that understands it.
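
[A rough sketch of what "wiring data sets together" with standard URIs might look like. This is my illustration, not anything John showed. The example.org URIs, the vocabulary terms, and the two "labs" are all hypothetical, and plain Python tuples stand in for RDF triples so that nothing beyond the standard library is needed. The point is only that when two sources name a thing with the same URI, stitching them together is a set union.]

```python
# Hypothetical URIs and vocabulary terms, for illustration only.
BRCA1 = "http://example.org/gene/BRCA1"
TP53 = "http://example.org/gene/TP53"
LOCATED_ON = "http://example.org/vocab/locatedOn"
ASSOCIATED_WITH = "http://example.org/vocab/associatedWith"

# Two independently published data sets, each a set of
# (subject, predicate, object) triples.
lab_a = {
    (BRCA1, LOCATED_ON, "chromosome 17"),
}
lab_b = {
    (BRCA1, ASSOCIATED_WITH, "breast cancer"),
    (TP53, LOCATED_ON, "chromosome 17"),
}


def merge(*datasets):
    """Union the triple sets; shared URIs make the statements line up."""
    merged = set()
    for dataset in datasets:
        merged |= dataset
    return merged


def describe(triples, subject):
    """Collect everything any source says about one subject."""
    return sorted((p, o) for s, p, o in triples if s == subject)


merged = merge(lab_a, lab_b)
facts = describe(merged, BRCA1)
```

[If lab B had used its own private name for the gene instead of the shared URI, the union would still succeed, but `describe` would find only half of what is known. That is the case for naming things in clear and distinct ways first.]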

Q: Researchers are incented to make their work available and open. Frequently, institutions get in the way of that. Are you looking at CC-style MTA’s [material transfer agreements]?
A: We published some last year. The first adopter was the Cure Huntington’s Disease Initiative, and then the Personal Genome Project. We’re going to foundations. We want to get the institutions out of the way, but only the funders can change the experience. NIH requires you to provide a breeding pair of genetically altered mice, kept in a storage facility in Maine [I think]. NIH is moving away from MTAs, going with a you-agree-by-opening agreement.

Q: Privacy?
A: Big issue. Sometimes used as an excuse for not sharing data, but privacy makes the issues we’ve been talking about look simple. It’s a long-term problem. Genomes are not considered personally identifying, although your license plate is. “There will be a reckoning.” JW’s advice: If you’re dealing with humans, be careful.

Q: Scientists are already overwhelmed by requests. More open, more tagged, means more requests.
A: Yes, we have to design with the negative impacts in mind. We need social filtering, etc. I worry about the scientist in eastern Tennessee or Botswana who’s a genius and can’t get access. If enough of the data is available, maybe you can get a community that answers many of the questions. People generally get into science because they like to talk with people. They’re more likely than most to share. But you have to make it part of the culture that it’s easy. One of the ideas behind the open source trademark concept is that you have to build up a certain amount of karma before I’ll read your email. People are the answer. Most of the time.

Q: You’ve talked about incentives to motivate institutions, but what about incentives for individuals to move them in this direction?
A: PLOS was created because Mike Eisen was so pissed at closed journals that he created a business to compete with them. In anthropology, the Society is trying to go more closed, but groups of scientists are trying to go more open access. There’s a battle for the discipline’s soul. Individuals in these institutions are driving it. The key is to get the first big adopters going. Everyone wants to be in the top ten, especially when the first three are Harvard, Yale and MIT. The American Chemical Society is not going to go open any time soon because they make lots of money selling abstracts.

Q: [eric von hippel] I hope you realize how wonderful you all are.