[2b2k] Scientific transparency vs. trust

Last January, Jean-Claude Bradley, an associate professor of chemistry at Drexel, posted about an assignment he gave his students: He asked them to find five different sources for the properties of a chemical of their choosing. The results were sobering.

For example, in one case a paper that had spent five months undergoing peer review before being accepted by Biotechnology and Bioprocess Engineering got the water solubility of the chemical extract of green tea (EGCG) wrong. The source of the information had it right — caffeine is 21.7 grams per liter and EGCG is 5g/l — but likely through a transcription error, the number for caffeine got appended to the number for EGCG, resulting in EGCG’s solubility being reported in the paper not as 5 but as 521.7. That number is off by two orders of magnitude, and is so high that you’d think one of the peer reviewers or editors would have caught it. The chain of data in this case goes back through several more sources to a published experiment that, unfortunately, does not contain enough information to enable us (well, chemists like Jean-Claude) to fully judge its accuracy.
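To make the arithmetic concrete, here is a minimal sketch of how such a slip could produce that figure. It assumes the error was a plain concatenation of the two quoted values, which is only the likely explanation, not a confirmed one. The variable names are mine, not from Jean-Claude's post or the paper.

```python
import math

# Solubility figures as quoted in the post, in grams per liter.
egcg_source = "5"
caffeine_source = "21.7"

# Assumed mechanism (hypothetical): the caffeine figure gets
# appended to the EGCG figure during transcription.
reported = float(egcg_source + caffeine_source)

# How far off is the reported value from the source value?
ratio = reported / float(egcg_source)
orders_of_magnitude = math.log10(ratio)

print(reported)                       # 521.7
print(round(orders_of_magnitude, 2))  # 2.02
```

A factor of roughly 100 is the kind of discrepancy a simple sanity check against the cited source would flag, if the source were easy to check.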

Jean-Claude’s point is not that all scientific data is wrong. Rather, it is that “trust should have no part in science.” Instead we should be able to check the sources of data, preferably all the way back to the lab notebooks and the raw instrument readings. That’s the impetus behind Jean-Claude’s open notebook science initiative.

Note that in this case, the correction to the published error is likely to come via a blog, but our ecology does not have an obvious or routine way in which good bloggy information can drive out bad published data. But, no nostalgia here, please! As Jean-Claude’s post shows, for all its peer reviewers and expert editors, the old ecology gave errors a stubborn rootedness.

If you accept that humans are more fallible than we’d like, then you build systems that accommodate change. Paper is not very accommodating in this regard. Worse, its fixity has contributed to our false confidence that we can get things right and know when we’ve done so.

  1. I just saw a major article at a professional organization website with a chart abstracted from the article. The chart of criteria had a mistake obvious to anyone in the field who reads as slowly as I do (slow reading is not something I brag about). When I read the full article, the same mistake was in a table in the body of the article. The data in the article supported the fact that the table was incorrect.

    The article had, literally, about 35 authors and, when published online, must have been reviewed by at least two reviewers and likely an editor. I wrote to the editor, who replied that it seemed like a mistake. The abstracted table was corrected in a day or two, but it took much longer for the article itself to be corrected. In the meantime, who knows how many people read and/or downloaded the incorrect article and the abstracted table? The error is not likely to appear in the print version.

  3. Actually, we don’t know if the 5 g/l was right, because they didn’t reference that number. From the analysis of the other peer-reviewed articles in that post, I think it is clear that we really don’t know what the solubility of EGCG in water is. We only know that there are major contradictions in the literature.

