Joho the Blog » [berkman] Timo Hannay on Web science

[berkman] Timo Hannay on Web science

Timo Hannay is director of Web publishing at Nature magazine. His job is to help try to “make the most of the Web as a scientific communications medium.” He’s giving a Tuesday lunch talk here at the Berkman Center.

He says that most scientists think about the Web in terms of open access. But he’s not going to talk about that today, only because he wants to talk about some longer-term trends. He does think open access is incredibly important. He thinks it will happen primarily through mandatory archiving into accessible repositories. 70% of scientists didn’t respond to a request to put their articles into an accessible repository, indicating that they don’t know or don’t care about open access. [I am, as always, paraphrasing, and typing quickly.]

But the Web is more important than a cheap way to ship PDFs around. It can redefine scientific publishing and how science is done.

Scientific publishing is dominated by journals and databases. They tend not to talk with one another. But the chemical structures discussed in Nature Chemical Biology are entered into PubChem, an NIH database. Likewise, another site renders molecules into 3D and makes them available. But, as journals have moved online, they are becoming databases themselves. The articles are themselves structured entries, including article metadata, scientific metadata (e.g., which chemical entities, proteins, genes, etc. are described in the paper), structured data sets (e.g., Systems Biology Markup Language) accompanying articles, and more structure within the articles themselves (e.g., identifying genes as they are discussed), including interactive figures that have the data underneath them. (The interactive figures are not yet online, he says.)

Likewise, he says, databases are starting to do peer review, further merging the publishing and database models. E.g., the AfCS-Nature Signaling Gateway. (It uses Digital Object Identifiers, persistent unique IDs, so that the database entries are citable.) The content is structured and always updated. Journals and databases are becoming more like one another.
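To make the citability point concrete, here is a minimal sketch of how a DOI turns into a stable link. Timo did not show any code; the function name `doi_to_url` and the example DOI are made up for illustration, and the sketch assumes the public resolver at doi.org.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this prefix redirects to the current
# location of the cited object.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a resolvable, citable URL (hypothetical helper)."""
    doi = doi.strip()
    # Accept the common "doi:" prefix form as well.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI starts with "10." and has a prefix/suffix split by a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode anything unusual in the suffix, but keep the slash.
    return DOI_RESOLVER + quote(doi, safe="/")


# Made-up DOI, for illustration only.
print(doi_to_url("doi:10.1234/example.5678"))
```

The point is that the citation carries only the identifier, so the database can move or update the entry without breaking the reference.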

Peer review is ready to undergo a revolution. Peer review already means many things to many journals, from a staff editor reading through a paper to full review by experts with multiple rounds of revision. We’ll see an even greater diversity of models. The increase in the rate of experimentation has been stimulated by ideas such as openness, the wisdom of crowds, etc. E.g., PLoS ONE (launching next month) supplements peer review with post-publication commentary. Even traditional pubs such as Cell now enable comments on papers post-publication.

Nature ran an open peer review trial June-Sept. this year. (There was a useful “web debate” about peer review as well.) One third of papers submitted are accepted for review. During the trial, authors could agree to have the paper posted for public review, with the editors still making the decisions. The results aren’t fully in because some of the papers are still in process. But Timo says that 73 papers went through, which is about 5-10% of papers submitted, across a broad range of subject areas. There were only 99 comments. About half didn’t get any comments at all. As far as Timo knows, none of the public comments influenced editors’ decisions.

Timo talks about a system that analyzes papers at arxiv.org, notes the citations, and sends the paper out to the citations for comment and review.
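Timo didn’t go into how that system works, so the following is only a rough sketch of the idea as I understood it: take a paper, look at whom it cites, and treat those cited authors as the people to ask for comments. The `Paper` class, its fields, and `candidate_reviewers` are all hypothetical names, not anything from arxiv.org or the system he described.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paper:
    """Hypothetical record for a preprint and the works it cites."""
    title: str
    authors: List[str]
    # One author list per cited work.
    cited_authors: List[List[str]] = field(default_factory=list)


def candidate_reviewers(paper: Paper) -> List[str]:
    """Collect cited authors, in citation order, as people to ask for review.

    The paper's own authors are skipped (self-citations) and each person
    is listed only once.
    """
    own = set(paper.authors)
    seen = set()
    reviewers = []
    for author_list in paper.cited_authors:
        for name in author_list:
            if name not in own and name not in seen:
                seen.add(name)
                reviewers.append(name)
    return reviewers


paper = Paper(
    title="Hypothetical preprint",
    authors=["A. Author"],
    cited_authors=[["B. Cited", "A. Author"], ["C. Cited", "B. Cited"]],
)
print(candidate_reviewers(paper))  # ['B. Cited', 'C. Cited']
```

A real system would also have to match names to actual people and find a way to contact them, which is presumably the hard part.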

Publish then filter or filter then publish? The Web likes the former. That’s quite controversial, Timo says.

Timo talks about blogging. It’s like publishing papers but “massively quicker,” he says. Nature has only found a few hundred blogs by scientists about science for scientists. Timo thinks this is because scientists don’t get credit for publishing on their blogs. And “a lot of scientists are horrified by the thought of publishing things that haven’t been peer reviewed.” Timo says blogs are peer reviewed, but after they’re published. “I think scientific blogging will take off, just as it has in law and economics, because blogging is a remarkably efficient way of exchanging ideas.” He says the incentives, or the generation, need to change.

Timo points to Postgenomic.com, which aggregates scientific blogs “and does useful and interesting things with it” (as the site says).

He talks about e-science. “Science, especially biology, has been going through a massive transition from a cottage industry…every stage of the process happening within one lab” to an industrial model where big groups specialize. E.g., the genome. This is becoming more open. “The Web enables big groups, multinational groups, to come together” and to put the results up on the Web for everyone to use. But the incentives haven’t caught up, so you tend to see it in big groups that are funded. “You don’t get explicit credit for gathering exquisite data or for coming up with the brilliant algorithm.” It’s always been like that—we know Einstein but not Michelson—but it would be good to change it. That would help collaboration on the small scale. Exceptions: Open WetWare and UsefulChem put info into a wiki. Science isn’t used to this, says Timo, because “it’s like doing science in the nude”: It exposes scientists to embarrassment because what they’re posting may not be finished, perfect or right.

Another characteristic of e-science: Use of open identifiers to identify scientific objects. (Timo references Clay Shirky’s Ontology is Overrated.) Chemical registry numbers (CAS numbers) are owned by the ACS. An alternative is InChI, which is open and public; you can compute the InChI from the chemical structure.

Timo talks about the role of monitors in the environment. Citizens can contribute to this, engaging people in science.

Timo is a SecondLife enthusiast. “It’s as exciting as the early days of the Web in terms of where it’s going to go.” He talks about the Space Flight Museum that has replicas of space vehicles in Second Life. You can more or less fly them. The Schizophrenia House was designed to help people understand how the world looks to schizophrenics. “Second Nature” is the new home for Nature on SecondLife. It’s shaped like a water molecule. It’s still being terraformed, but there’s already a bubblegum machine that makes structures of molecules by going to PubChem and retrieving the model.

The Web is bringing back to science its original sense of purpose.

Q: (me) Why is Nature progressive about this while others are not?
A: It’s our mission. We have good support from the top. I don’t know why others aren’t.

Q: Why aren’t scientists jumping on board?
A: Some are deeply unhappy about it. The majority of scientists, my perception is, they don’t think about it much. They get on with their research.

Q: What are the negatives about what Nature is doing? E.g., the communications channels that scientists use might become diffuse, leading to attention diffusion so scientists don’t know what the most prestigious or urgent things to look at are.
A: Information overload is clearly a problem. We don’t want to just give people more choice. We want to help people find what they need.

Q: Why do you exclude these developments from open access?
A: We need to be talking about issues beyond the narrower open access questions about embargoes, etc.

Q: The barriers include psychological, cultural, infrastructure, and funding issues. Which are the most amenable to change and would make the biggest difference in enabling collaboration?
A: The social barriers—the norms and expectations—are very difficult to change. The funders have the ability to change people’s behavior with one stroke, e.g., mandating self-archiving. Journals also have influence. We require authors to put nucleotide sequences into the GenBank and require the accession number. [Yikes. I understood the prepositions but that’s about it. Sorry for the guesses.]

Q: Nature’s authors are donating their research by putting it into the database. This grows Nature’s wealth. This could be an incentive to publishing in Nature. In the next 5-10 years, how will that develop?
A: I don’t know. I agree with that vision. Nature doesn’t have a grand plan for ten years because there are too many imponderables, particularly the social pressures. It’s impossible to foresee how people are going to use these systems.

Q: Open Source software publishing has a similar model. Have you looked at this? E.g., Linus Torvalds seems to have the same role as Nature journal in terms of accepting additions to the core.
A: I see science as an enormous Open Source system with each contribution a patch to the system making it better and growing it. The hacker ethic has the same roots as science.

Q: For filtering after publication, what are the safeguards against junk science?
A: You have to set expectations. Traditional peer review is extremely labor intensive. It’s not scalable. One alternative model is to put it out there and use the wisdom of crowds—not the average behavior of lots of people, but the Wikipedia idea of a self-selected group doing it.

Q: 1. Why can’t we have a mark in the database indicating that it’s been peer reviewed, and publish every paper anyway? It’s not an either-or. 2. And do you see scientists divided into regular scientists and scientists who specialize in aggregating information?
A: 1. Absolutely. And there will be more and more models. 2. I visited yesterday the organization that runs PubChem, etc., and they show how you can make discoveries by going into the database and without doing bench science.

Q: The gene ontologists do this — interdisciplinary.
A: Yes, a lot of people working across a lot of domains. But they’re creating an ontology as a framework within which those sorts of discoveries can be made, not making the discoveries.

Q: In terms of the barriers to change, can you comment on the funding system? How might it evolve to support collaboration?
A: I don’t know. There will be various solutions. We’re at too early a stage. Our first job is to do useful things. We can do that because we have a successful journal behind us. That’s a fortunate position.

Q: You’ve talked mainly on the flow of information within the scientific community. How about info flowing into and out of that community?
A: Nature has a two-fold mission: To enable communications between scientists and to enable comms between scientists and society at large. I’m mainly focused on communications among scientists, but things like SecondLife can engage non-scientists in scientific subjects.

Q: (me) Doesn’t the growth of processes other than peer review and open access constitute a threat to Nature?
A: In my view, developments of peer review are more of a threat to Nature than open access. Peer review has continuing value, but there is a threat. That’s why Nature has to be out there experimenting and leading. There will still be a role for publishers to help people find what they need. Whether the incumbents are the best ones to do that we’ll have to see.
