Joho the Blog » [2b2k] Is big data degrading the integrity of science?

[2b2k] Is big data degrading the integrity of science?

Amanda Alvarez has a provocative post at GigaOm:

There’s an epidemic going on in science: experiments that no one can reproduce, studies that have to be retracted, and the emergence of a lurking data reliability iceberg. The hunger for ever more novel and high-impact results that could lead to that coveted paper in a top-tier journal like Nature or Science is not dissimilar to the clickbait headlines and obsession with pageviews we see in modern journalism.

The article’s title points especially to “dodgy data,” and the item in this list that’s by far the most interesting to me is the “data reliability iceberg,” and its tie to the rise of Big Data. Amanda writes:

…unlike in science…, in big data accuracy is not as much of an issue. As my colleague Derrick Harris points out, for big data scientists the ability to churn through huge amounts of data very quickly is actually more important than complete accuracy. One reason for this is that they’re not dealing with, say, life-saving drug treatments, but with things like targeted advertising, where you don’t have to be 100 percent accurate. Big data scientists would rather be pointed in the right general direction faster – and course-correct as they go – than have to wait to be pointed in the exact right direction. This kind of error-tolerance has insidiously crept into science, too.

But the rest of the article offers no evidence that this error-tolerance has crept into science because of the rise of Big Data. In fact, even if we accept that science is facing a crisis of reliability, the article doesn’t pin this on an “iceberg” of bad data. Rather, the cause seems to be a melange of bad data, faulty software, unreliable equipment, poor methodology, undue haste, and o’erweening ambition.

The last part of the article draws some of the heat out of the initial paragraphs. For example: “Some see the phenomenon not as an epidemic but as a rash, a sign that the research ecosystem is getting healthier and more transparent.” That makes the headline and the first part seem a bit overstated, which is not unusual for a blog post (not that I would ever do such a thing!) but is at best ironic given this post’s topic.

I remain interested in Amanda’s hypothesis. Is science getting sloppier with data?
