Joho the Blog » open access

October 25, 2011

[berkman] [2b2k] Michael Nielsen on the networking of science

Michael Nielsen is giving a Berkman talk on the networking of science. (It’s his first talk after his book Reinventing Discovery was published.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins by telling the story of Tim Gowers, a Fields Medal winner and blogger. (Four of the 42 living Fields winners have started blogs; two of them are still blogging.) In January 2009, Gowers posted a difficult problem on his blog and started working on it in the open. He also invited the public to post ideas in the comments. He called this the Polymath Project. 170,000 words of comments later, ideas had been proposed and rapidly improved or discarded. A few weeks later, the problem had been solved at an even higher level of generalization.

Michael asks: Why isn’t this more common? He gives an example of the failure of an interesting idea. It was proposed by a grad student in 2005. Qwiki was supposed to be a super-textbook about Quantum Mechanics. The site was well built and well marketed. “But science is littered with examples of wikis like this…They are not attracting regular contributors.” Likewise many scientific social networks are ghost towns. “The fundamental problem is one of opportunity costs. If you’re a young scientist, the way you build your career is through the publication of scientific papers…One mediocre crappy paper is going to do more for your career than a series of brilliant contributions to a wiki.”

Why then is the Polymath Project succeeding? It just used an unconventional means to a conventional end: they published two papers out of it. Sites like Qwiki that are an end in themselves are not being exploited. We need a “change in norms in scientific culture” so that when people are making decisions about grants and jobs, people who contribute to unconventional formats are rewarded.

How do you achieve a change in the culture? It’s hard. Take the Human Genome Project. In the 1990s, there wasn’t a lot of advantage to individual scientists in sharing their data. In 1996, the Wellcome Trust held a meeting in Bermuda and agreed on principles that said that if you sequenced more than a thousand base pairs, you needed to release them to a public database, where they would be put into the public domain. The funding agencies baked those principles into policy. In April 2000, Clinton and Blair urged all countries to adopt similar principles.

For this to work, you need enthusiastic acceptance, not just a stick beating scientists into submission. You need scientists to internalize it. Why? Because you need all sorts of correlative data to make lab data useful. E.g., Sloan Digital Sky Survey: a huge part of the project was establishing the calibration lines for the data to have meaning to anyone else.

Many scientists are pessimistic about this change occurring. But there are some hopeful precedents. In 1610 Galileo pointed his telescope at Saturn. He was expecting to see a small disk. But he saw a disk with small knobs on either side (the rings, although he couldn’t resolve the image further). He sent letters to four colleagues, including Kepler, that scrambled his discovery into an anagram. This way, if someone else made the discovery, Galileo could unscramble the letters and prove that he had made the discovery first. Leonardo, Newton, Hooke, Huygens all did this. Scientific journals helped end this practice. The editors of the first journals had trouble convincing scientists to reveal their info because there was no link between publication and career. The editor of the first scientific journal (Philosophical Transactions of the Royal Society) goaded scientists into publishing by writing to them suggesting other scientists were about to disclose what the recipients of the letter were working on. As Paul David [Davis? Couldn't find it via Google] says, the change to the modern system was due to “patron pressure.”

Michael points out that Galileo immediately announced the discovery of four moons of Jupiter in order to get patronage bucks from the Medicis for the right to name them. [Or, as we would do today, The Comcast Moon, the Staples Moon, and the Gosh Honey Your Hair Smells Great Moon.]

Some new ideas: The Journal of Visualized Experiments videotapes lab work, thus revealing tacit knowledge. GigaScience (from Springer) publishes data sets as first-class objects. Open Research Computation makes code into a first-class object. And blog posts are beginning to show up on Google Scholar (possibly because they’re paying attention to tags?). So, if your post is being cited by lots of articles, your post will show up at Scholar.

[in response to a question] A researcher claimed to have solved the P vs. NP problem. One of the serious mathematicians (Cook) said it was a serious solution. Mathematicians and others tore it apart on the Web to see if it was right. About a week later, the consensus was that there was a serious obstruction, although they salvaged a small lemma. The process leveraged expertise in many different areas — statistical physics, logic, etc.

Q: [me] Science has been a type of publishing. How does scientific knowledge change when it becomes a type of networking?
A: You can see this beginning to happen in various fields. E.g., people at Google talk about their sw as an ecology. [Afterwards, Michael explained that Google developers use a complex ecology of libraries and services with huge numbers of dependencies.] What will it mean when someone says that the Higgs Boson has been found at the LHC? There are millions of lines of code, huge data sets. It will be an example of using networked knowledge to draw a conclusion where no single person has more than a tiny understanding of the chain of inferences that led to this result. How do you do peer review of that paper? Peer review can’t mean that it’s been checked because no one person can check it. No one has all the capability. How do you validate this knowledge? The methods used to validate are completely ad hoc. E.g., the Intergovernmental Panel on Climate Change has more data than any one person can evaluate. And they don’t have a method. It’s ad hoc. They do a good job, but it’s ad hoc.

Q: The classification of finite simple groups was the same. A series of papers.
A: Followed by a 1200 word appendix addressing errors.

Q: It varies by science, of course. For practical work, people need access to the data. For theoretical work, the person who makes the single step that solves it should get 98% of the credit. E.g., Newton v. Leibniz on calculus. E.g., Perelman’s approach to the Poincaré conjecture.
A: Yes. Perelman published three papers on a preprint server. Afterward, someone published a paper that filled in the gaps, but Perelman’s was the crucial contribution. This is the normal bickering in science. I would like to see many approaches and gradual consensus. You’ll never have perfect agreement. With transparency, you can go back and see how people came to those ideas.

Q: What is validation? There is a fundamental need for change in the statistical algorithms that many data sets are built on. You have to look at those limitations as well as at the data sets.
A: There are lots of interesting things happening. But I think this is a transient problem. Best practices are still emerging. There are a lot of statisticians on the case. A move toward more reproducible research and more open sharing of code would help. E.g., many random number generators are broken, as is well known. Having the random number generator code in an open repository makes life much easier.

Q: The P vs. NP episode left a sense that it was a sprint in response to a crisis, but how can it be done in a more scalable way?
A: People go for the most interesting claims.

Q: You mentioned the Bermuda Principles, and NIH requires open access pub one year after paper pub. But you don’t see that elsewhere. What are the sociological reasons?
Peter Suber: There’s a more urgent need for medical research. The campaign for open access at NSF is not as large, and the counter-lobby (publishers of scientific journals) is bigger. But Pres. Obama has said he’s willing to do it by executive order if there’s sufficient public support. No sign of action yet.

Q: [peter suber] I want to see researchers enthusiastic about making their research public. How do we construct a link between OA and career?
A: It’s really interesting what’s going on. A lot of discussion about supporting gold OA (publishing in OA journals, as opposed to putting it into an OA repository). Fundamentally, it comes down to a question of values. Can you create a culture in science that views publishing in gold OA journals as better than publishing in prestigious toll journals? The best way perhaps is to make it a public issue. Make it embarrassing for scientists to lock their work away. The Aaron Swartz case has sparked a public discussion of the role of publishers, especially when they’re making 30% profits.
Q: Peter: Whenever you raise the idea of tweaking tenure criteria, you unleash a tsunami of academic conservatism, even if you make clear that this would still support the same rigorous standards. Can we change the reward system without waiting for it to evolve?
A: There was a proposal a few years ago that it be done purely algorithmically: produce a number based on the citation index. If that had been done, simple tweaks to the algorithm would have been possible, for example: “You get a 10% premium for being in a gold OA journal, etc.”
Q: [peter] One idea was that your work wouldn’t be noticed by the tenure committee if it wasn’t in an OA repository.
A: SPIRES lets you measure the impact of your preprints, which has made it easier for people to assess the effect of OA publishing. You see people looking up the SPIRES number of a scientist they just met. You see scientists bragging about the number of times their slides have been downloaded via Mendeley.
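
[A rough sketch of the kind of tweak Michael describes, in Python. Only the 10% premium comes from the talk; the Paper record, the use of raw citation counts as the base score, and the function names are my own hypothetical stand-ins, not the actual proposal.]

```python
from dataclasses import dataclass

# The "10% premium" mentioned in the talk; everything else here is assumed.
GOLD_OA_PREMIUM = 0.10


@dataclass
class Paper:
    citations: int   # hypothetical base measure: raw citation count
    gold_oa: bool    # True if published in a gold OA journal


def paper_score(paper: Paper) -> float:
    """Score one paper, adding the premium if it is gold OA."""
    base = float(paper.citations)
    if paper.gold_oa:
        return base * (1 + GOLD_OA_PREMIUM)
    return base


def researcher_score(papers) -> float:
    """Sum the per-paper scores into a single number for a researcher."""
    return sum(paper_score(p) for p in papers)
```

[The point of the sketch is only that once the reward is a formula, changing what gets rewarded is a one-line change. Whether anyone should be evaluated this way is a separate question.]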

Q: How can we accelerate by an order of magnitude in the short term?
A: Any tool that becomes widely used to measure impact affects how science is done. E.g., the H Index. But I’d like to see a proliferation of measures because when you only have one, it reduces cognitive diversity.
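
[For reference, the H Index is simple to compute: it is the largest h such that the researcher has h papers with at least h citations each. A minimal sketch in Python; the function name and the input format (a plain list of per-paper citation counts) are my assumptions.]

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only get smaller from here on
    return h
```

[Note how much the single number throws away: one paper with thousands of citations and one with a handful count the same once they are past the threshold, which is part of why a proliferation of measures is appealing.]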

Q: Before the Web, Erdos was the moving collaborator. He’d go from place to place and force collaboration. Let’s duplicate that on the Net!
A: He worked 18 hours a day, 365 days/year, high on amphetamines. Not sure that’s the model :) He did lots of small projects. When you have a large project, you bring in the expertise you need. Open collaboration has the unpredictable spread of expertise that participates, and that’s often crucial. E.g., Einstein never thought that understanding gravity required understanding non-standard geometries. He learned that from someone else [missed who]. That’s the sort of thing you get in open collaborations.

Q: You have to have a strong ego to put your out-there paper out there to let everyone pick it apart.
A: Yes. I once asked a friend of mine how he consistently writes edgy blog posts. He replied that it’s because there are some posts he genuinely regrets writing. That takes a particular personality type. But the same is true for publishing papers.
Q: But at least you can blame the editors or peer reviewers.
A: But that’s very modern; it dates from the 1960s. Of Einstein’s 300 papers, only one was peer reviewed … and that one was rejected. Newton was terribly anguished by the criticism of his papers. Networked science may exacerbate it, but it’s always been risky to put your ideas out there.

[Loved this talk.]

October 6, 2011

Open Education Resources bill in Brazil

Carolina Rossini passes along the following:

Today, the Sao Paulo State Legislature Representative, Mr. Simao Pedro, assisted by his team, especially Lucia F. Pinto, and the OER-Brazil Project, has introduced an OER bill to regulate the educational resources developed directly and indirectly (contracts for products or services or public purchases) by that state, and to determine that an open license should be applied (CC-BY-NC-SA). It also deals with repositories for such OERs.

Soon the text of the bill will be available from the ALESP website http://www.al.sp.gov.br/portal/site/Internet/.

You can follow www.rea.net.br (in Portuguese) for more information and analysis, including the recent analysis on the Sao Paulo city OER Decree.

Well done, Sao Paulo!

October 1, 2011

[2b2k] Open access to a life of work

From Kattallus, via metafilter:

Humanities and the Liberal Arts is the personal website of former Middlebury classics professor William Harris who passed away in 2009. In his retirement he crafted a wonderful site full of essays, music, sculpture, poetry and his thoughts on anything from education to technology. But the heart of the website for me is, unsurprisingly, his essays on ancient Latin and Greek literature, some of which are book-length works. Here are a few examples: Purple color in Homer, complete fragments of Heraclitus, how to read Homer and Vergil, a discussion of a recently unearthed poem by Sappho, Plato and mathematics, Propertius’ war poems, and finally, especially close to my heart, his commentaries on the poetry of Catullus, for example on Ipsithilla, Odi et amo, Attis poem as dramatic dance performance and a couple of very dirty poems (even by Catullus’ standard). That’s just a taste of the riches found on Harris’ site, which has been around nearly as long as the world wide web has existed.

There are months of serious browsing in the world of Prof. Harris’s thought. It is a particularly wonderful illustration of the boon of having worldwide access to unlimited worlds of thought.

August 24, 2011

Google Books contract with the British Library

Thanks to the persistence of Javier Ruiz of the British Open Rights Group, you can now read [pdf] the contract between the British Library and Google Books. Google has shrouded its book digitization contracts in non-disclosures wrapped in lead sheathing that is then buried in collapsed portions of the Wieliczka salt mines. It took a Freedom of Information Act request by Javier to get access, and Google restricts further re-distribution.

Javier points out that the contract is non-exclusive, although the cost of re-digitizing is a barrier. Also, while the contract allows non-commercial research into the scanned corpus, Google gets to decide which research to allow. “There is also a welcome clause explicitly allowing for metadata to be included in the Europeana database,” Javier reports.

July 13, 2011

New flavor of open

Randy Schekman, the editor of the Proceedings of the National Academy of Sciences and the newly announced editor of a new journal that will be online and free, explains how the new journal will work: scientists will edit for scientists, there will be rapid turnaround, and the journal’s acceptance rate for submissions will go way up. He positions it as more scientist-friendly than Public Library of Science.

The fact that this interview was (admirably) published in Science magazine has some significance as well.

(By the way, the authors of a report on obstacles to open access have left a hefty and useful comment on my post.)


In my continued pursuit of never getting anything entirely right, here’s a comment from Michael Jensen: “Not quite right — the new editor of what I think is a still-unnamed OA biomedical journal was announced, but Randy Schekman currently edits the PNAS, as I read it.” I have edited the above to get it righter. Thanks, Michael!

July 11, 2011

What’s stopping open access?

A report on a survey of 350 chemists and 350 economists in UK universities leads to the following conclusion about open access publishing:

…our work with researchers on the ground indicates to us that whatever the enthusiasm and optimism within the OA community, it has not spilled into academia to a large extent and has had only a small effect on the publishing habits and perceptions of ordinary researchers, whatever their seniority and whether in Chemistry or Economics.


The report finds that faculty members want to publish in high “impact factor” journals unless they have some specific reason why they should go the Open Access route, e.g., they need to get something out quickly. The subscriptions their libraries buy mask from them the extent to which their work becomes inaccessible to those who are not at a university.

The report ends with some recommendations for trying to move academics towards OA publishing.

June 14, 2011

Linked Open Data take-aways

I just wrote up an informal trip report in the form of “take aways” from the LOD-LAM conference I attended a couple of weeks ago. Here is a lightly edited version.

 


Because it was an unconference, it was too participatory to enable us to take systematic notes. I did, however, interview a number of attendees, and have posted the videos on the Library Innovation Lab blog site. I actually have a few more yet to post. In addition, during the course of one of the sessions (on “Explaining LOD-LAM”), a few of us began constructing a FAQ.

Here’s some of what I took away from the conference.

- There is considerable momentum around linked open data, starting with the sciences where there is particular research value in compiling huge data sets. Many libraries are joining in.

- LOD for libraries will enable a very fluid aggregation of information from multiple types of sources around any particular object. E.g., a page about a Hogarth illustration (or about Hogarth, or about 18th century London, etc.) could quite easily aggregate information from any data set that knows something about that illustration or about topics linked to that illustration. This information could be used to build a page or to do research.

- Making data and metadata available as LOD enables maximal re-use by others.

- Doing so requires expertise, but should be less massively difficult than supporting many other standards.

- For the foreseeable future, this will be something libraries do in addition to supporting more traditional data standards; it will be an additional expense and effort.

- Although there is continuing debate about exactly which license to use when publishing library data sets, it seems that usually putting any form of license on the data other than a public domain waiver of licenses is likely to be (a) futile and (b) so difficult to deal with that it will inhibit re-use of the data, depriving it of value. (See the 4-star license proposal that came out of this conference.)

- The key point of resistance against LOD among libraries, archives and museums is the justified fear that once the data is released into the world, the curating institutions can no longer ensure that the metadata about an object is correct; the users of LOD might pick up a false attribution, inaccurate description, etc. This is a genuine risk, since LOD permits irresponsible use of data. The risk can be mitigated but not removed.
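
To make the Hogarth example above a bit more concrete, here is a toy sketch in Python of what “aggregating around an object” amounts to. Everything in it is made up for illustration: the example.org URIs, the two data sets, and the plain-string predicates. Real LOD would use RDF with shared vocabularies and proper tooling, which this deliberately skips.

```python
from collections import defaultdict

# Hypothetical URI standing in for one Hogarth illustration.
PRINT = "http://example.org/id/hogarth-illustration"

# Two hypothetical data sets, each a list of (subject, predicate, object) triples.
museum_data = [
    (PRINT, "creator", "William Hogarth"),
    (PRINT, "medium", "engraving"),
]
library_data = [
    (PRINT, "creator", "William Hogarth"),
    (PRINT, "depicts", "18th century London"),
    ("http://example.org/id/some-other-work", "creator", "Someone Else"),
]


def aggregate(subject, *datasets):
    """Collect every statement about `subject` from all the given data sets."""
    facts = defaultdict(set)
    for dataset in datasets:
        for s, predicate, obj in dataset:
            if s == subject:
                facts[predicate].add(obj)  # sets merge duplicate statements
    return dict(facts)


# Everything either source knows about the illustration, merged by predicate.
page = aggregate(PRINT, museum_data, library_data)
```

The aggregation works only because both sources use the same identifier for the same thing. It also shows the risk in the last point above: the function has no way to tell a careful attribution from a false one.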

June 8, 2011

MacKenzie Smith on open licenses for metadata

MacKenzie Smith of MIT and Creative Commons talks about the new 4-star rating system for open licenses for metadata from cultural institutions:

The draft is up on the LOD-LAM site.

Here are some comments on the system from open access guru Peter Suber.

June 6, 2011

Peter Suber on the 4-star openness rating

One of the outcomes of the LOD-LAM conference was a draft of an idea for a 4-star classification of openness of metadata from cultural institutions. The classification is nicely counter-intuitive, which is to say that it’s useful.

I asked Peter Suber, the Open Access guru, what he thought of it. He replied in an email:

First, I support the open knowledge definition and I support a star system to make it easy to refer to different degrees of openness.

* I’m not sure where this particular proposal comes from. But I recommend working with the Open Knowledge Foundation, which developed the open knowledge definition. The more key players who accept the resulting star system, the more widely it will be used.

* This draft overlooks some complexity in the 3-star entry and the 2-star entry. Currently it suggests that attribution through linking is always more open than attribution by other means (say, by naming without linking). But this is untrue. Sometimes one is more difficult than the other. In a given case, the easier one is more open by lowering the barrier to distribution.

If you or your software had both names and links for every datasource you wanted to attribute, then attribution by linking and attribution by naming would be about equal in difficulty and openness. But if you had names without links, then obtaining the links would be an extra burden that would delay or impede distribution.

The disparity in openness grows as the number of datasources increases. On this point, see the Protocol for Implementing Open Access Data (by John Wilbanks for Science Commons, December 2007).

Relevant excerpt: “[T]here is a problem of cascading attribution if attribution is required as part of a license approach. In a world of database integration and federation, attribution can easily cascade into a burden for scientists….Would a scientist need to attribute 40,000 data depositors in the event of a query across 40,000 data sets?” In the original context, Wilbanks uses this (cogently) as an argument for the public domain, or for shedding an attribution requirement. But in the present context, it complicates the ranking system. If you *did* have to attribute a result to 40,000 data sources, and if you had names but not links for many of those sources, then attribution by naming would be *much* easier than attribution by linking.

Solution? I wouldn’t use stars to distinguish methods of attribution. Make CC-BY (or the equivalent) the first entry after the public domain, and let it cover any and all methods of attribution. But then include an annotation explaining that some methods of attribution increase the difficulty of distribution, and that increasing the difficulty will decrease openness. Unfortunately, however, we can’t generalize about which methods of attribution raise and lower this barrier, because it depends on what metadata the attributing scholar may already possess or have ready to hand.

* The overall implication is that anything less open than CC-BY-SA deserves zero stars. On the one hand, I don’t mind that, since I’d like to discourage anything less open than CC-BY-SA. On the other, while CC-BY-NC and CC-BY-ND are less open than CC-BY-SA, they’re more open than all-rights-reserved. If we wanted to recognize that in the star system, we’d need at least one more star to recognize more species.

I responded with a question: “WRT your naming vs. linking comments: I assumed the idea was that it’s attribution-by-link vs. attribution-by-some-arbitrary-requirement. So, if I require you to attribute by sticking in a particular phrase or mark, I’m making it harder for you to just scoop up and republish my data: Your aggregating sw has to understand my rule, and you have to follow potentially 40,000 different rules if you’re aggregating from 40,000 different databases.”

Peter responded:

You’re right that “if I require you to attribute by sticking in a particular phrase or mark, I’m making it harder for you to just scoop up and republish my data.” However, if I already have the phrases or marks, but not the URLs, then requiring me to attribute by linking would be the same sort of barrier. My point is that the easier path depends on which kinds of metadata we already have, or which kinds are easier for us to get. It’s not the case that one path is always easier than another.

But it might be the case that one path (attribution by linking) is *usually* easier than another. That raises a nice question: should that shifting, statistical difference be recognized with an extra star? I wouldn’t mind, provided we acknowledged the exceptions in an annotation.

June 3, 2011

Open Access and libraries

I’ve posted the next in my series of library podcasts at the Library Innovation Lab blog. This one is with Peter Suber, the hub of the Open Access movement.
