Joho the Blog » science

May 14, 2012

Goodies from Wolfram

Some wonderfully interesting stuff from Stephen Wolfram today.

Here’s his Reddit IAMA.

A post about what’s become of a New Kind of Science in the past ten years. And a part two, about reactions to NKS.

And here’s a post from a couple of months ago that I missed that is, well, amazing. All I’ll say is that it’s about “personal analytics.”


May 13, 2012

[2b2k] The Net as paradigm

Edward Burman recently sent me a very interesting email in response to my article about the 50th anniversary of Thomas Kuhn’s The Structure of Scientific Revolutions. So I bought his 2003 book Shift!: The Unfolding Internet – Hype, Hope and History (hint: If you buy it from Amazon, check the non-Amazon sellers listed there) which arrived while I was away this week. The book is not very long — 50,000 words or so — but it’s dense with ideas. For example, Edward argues in passing that the Net exploits already-existing trends toward globalization, rather than leading the way to it; he even has a couple of pages on Heidegger’s thinking about the nature of communication. It’s a rich book.

Shift! applies The Structure of Scientific Revolutions to the Internet revolution, wondering what the Internet paradigm will be. The chapters that go through the history of failed attempts to understand the Net — the “pre-paradigms” — are fascinating. Much of Edward’s analysis of business’ inability to grasp the Net mirrors cluetrain‘s themes. (In fact, I had the authorial d-bag reaction of wishing he had referenced Cluetrain…until I realized that Edward probably had the same reaction to my later books which mirror ideas in Shift!) The book is strong in its presentation of Kuhn’s ideas, and has a deep sense of our cultural and philosophical history.

All that would be enough to bring me to recommend the book. But Edward admirably jumps in with a prediction about what the Internet paradigm will be:

This…brings us to the new paradigm, which will condition our private and business lives as the twenty-first century evolves. It is a simple paradigm, and may be expressed in synthetic form in three simple words: ubiquitous invisible connectivity. That is to say, when the technologies, software and devices which enable global connectivity in real time become so ubiquitous that we are completely unaware of their presence…We are simply connected. [p. 170]

It’s unfair to leave it there since the book then elaborates on this idea in very useful ways. For example, he talks about the concept of “e-business” as being a pre-paradigm, and the actual paradigm being “The network itself becomes the company,” which includes an erosion of hierarchy by networks. But because I’ve just written about Kuhn, I found myself particularly interested in the book’s overall argument that Kuhn gives us a way to understand the Internet. Is there an Internet paradigm shift?

There are two ways to take this.

First, is there a paradigm by which we will come to understand the Internet? Edward argues yes, we are rapidly settling into the paradigmatic understanding of the Net. In fact, he guesses that “the present revolution [will] be completed and the new paradigm of being [will] be in force” in “roughly five to eight years” [p. 175]. He sagely points to three main areas where he thinks there will be sufficient development to enable the new paradigm to take root: the rise of the mobile Internet, the development of productivity tools that “facilitate improvements in the supply chain” and marketing, and “the increased deployment of what have been termed social applications, involving education and the political sphere of national and local government.” [pp. 175-176] Not bad for 2003!

But I’d point to two ways, important to his argument, in which things have not turned out as Edward thought. First, the 5-8 years after the book came out were marked by a continuing series of disruptive Internet developments, including general-purpose social networks, Wikipedia, e-books, crowdsourcing, YouTube, open access, open courseware, Khan Academy, etc. etc. I hope it’s obvious that I’m not criticizing Edward for not being prescient enough. The book is pretty much as smart as you can get about these things. My point is that the disruptions just keep coming. The Net is not yet settling down. So we have to ask: Is the Net going to enable continuous disruption and self-transformation? If so, will it be captured by a paradigm? (Or, as M. Night Shyamalan might put it, is disruption the paradigm?)

Second, after listing the three areas of development over the next 5-8 years, the book makes a claim central to the basic formulation of the new paradigm Edward sees emerging: “And, vitally, for thorough implementation [of the paradigm] the three strands must be invisible to the user: ubiquitous and invisible connectivity.” [p. 176] If the invisibility of the paradigm is required for its acceptance, then we are no closer to that event, for the Internet remains perhaps the single most evident aspect of our culture. No other cultural object is mentioned as many times in a single day’s newspaper. The Internet, and the three components the book points to, are more evident to us than ever. (The exception might be innovations in logistics and supply chain management; I’d say Internet marketing remains highly conspicuous.) We’ve never had a technology that so enabled innovation and creativity, but there may well come a time when we stop focusing so much cultural attention on the Internet. We are not close yet.

Even then, we may not end up with a single paradigm of the Internet. It’s really not clear to me that the attendees at ROFLcon have the same Net paradigm as less Internet-besotted youths. Maybe over time we will all settle into a single Internet paradigm, but maybe we won’t. And we might not because the forces that bring about Kuhnian paradigms are not at play when it comes to the Internet. Kuhnian paradigms triumph because disciplines come to us through institutions that accept some practices and ideas as good science; through textbooks that codify those ideas and practices; and through communities of professionals who train and certify the new scientists. The Net lacks all of that. Our understanding of the Net may thus be as diverse as our cultures and sub-cultures, rather than being as uniform and enforced as, say, genetics’ understanding of DNA is.

Second, is the Internet affecting what we might call the general paradigm of our age? Personally, I think the answer is yes, but I wouldn’t use Kuhn to explain this. I think what’s happening — and Edward agrees — is that we are reinterpreting our world through the lens of the Internet. We did this when clocks were invented and the world started to look like a mechanical clockwork. We did this when steam engines made society and then human motivation look like the action of pressures, governors, and ventings. We did this when telegraphs and then telephones made communication look like the encoding of messages passed through a medium. We understand our world through our technologies. I find (for example) Lewis Mumford more helpful here than Kuhn.

Now, it is certainly the case that reinterpreting our world in light of the Net requires us to interpret the Net in the first place. But I’m not convinced we need a Kuhnian paradigm for this. We just need a set of properties we think are central, and I think Edward and I agree that these properties include the abundant and loose connections, the lack of centralized control, the global reach, the ability of everyone (just about) to contribute, the messiness, the scale. That’s why you don’t have to agree about what constitutes a Kuhnian paradigm to find Shift! fascinating, for it helps illuminate the key question: How are the properties of the Internet becoming the properties we see in — or notice as missing from — the world outside the Internet?

Good book.


April 29, 2012

[2b2k] Pyramid-shaped publishing model results in cheating on science?

Carl Zimmer has a fascinating article in the NYTimes, which is worth 1/10th of your NYT allotment. (Thank you for ironically illustrating the problem with trying to maintain knowledge as a scarce resource, NYT!)

Carl reports on what may be a growing phenomenon (or perhaps, as the article suggests, the bugs of the old system may just now be more apparent) of scientists fudging results in order to get published in the top journals. From my perspective, the article provides yet another illustration of how the old paper-based strictures on scientific knowledge, caused by the scarcity of publishing outlets, result not only in a reduction in the flow of knowledge but in a degradation of its quality.

Unfortunately, the availability of online journals (many of which are peer-reviewed) may not reduce the problem much, even though they open up the ol’ knowledge nozzle to 11 on the firehose dial. As we saw when the blogosphere first emerged, there is something like a natural tendency for networked ecosystems to create hubs with a lot of traffic, along with a very long tail. So, even with higher-capacity hubs, there may still be some pressure to fudge results in order to get noticed by those hubs, especially since tenure decisions continue to place such high value on a narrow understanding of “impact.”

But: 1. With a larger aperture, there may be less pressure. 2. When readers are also commentators and raters, bad science may be uncovered faster and more often. Or so we can hope.

(The very beginnings of a Reddit discussion of Carl’s article are here.)


April 23, 2012

[2b2k] Structure of Scientific Revolutions, 50 years later

The Chronicle of Higher Ed asked me to write a perspective on Thomas Kuhn’s The Structure of Scientific Revolutions on the 50th anniversary of its publication. It’s now posted.


March 2, 2012

[2b2k] TIL: Edward Jenner’s smallpox paper was rejected by the Royal Society

Edward Jenner is credited as the discoverer — or perhaps inventor would be the more apt word — of vaccination as a technique to prevent smallpox. That’s pretty much all that I knew, except for the story about milkmaids who got cowpox not getting smallpox. But I just read a really interesting article about the history of smallpox at the National Institutes of Health, by Stefan Riedel.

“TIL” is Reddit-speak for “Today I learned.” And today I also learned that “As early as 430 BC, survivors of smallpox were called upon to nurse the afflicted” in order to protect them. Today I also learned that “Inoculation…was likely practiced in Africa, India, and China long before the 18th century, when it was introduced to Europe.” And today I also learned that “It was the continued advocacy of the English aristocrat Lady Mary Wortley Montague that was responsible for the introduction of variolation [inoculation] in England.”


February 29, 2012

[2b2k] The next Darwin is a we

Sebastian Benthall has a fervent post about the need for open networks in science, inspired by an awesome talk by the awesome Victoria Stodden.

Along the way, he offers a correction (or extension, perhaps) of a point that I make in 2b2k: the next Darwin is likely to develop her work within an open network that adds value to her work. In some real sense the knowledge lives in that network. Sebastian responds:

He’s right, except maybe for one thing, which is that this digital dialectic (or pluralectic) implies that “the next Darwin” isn’t just one dude, Darwin, with his own ‘-ism’ and pernicious Social adherents. Rather, it means that the next great theory of the origin of species is going to be built by a massive collaborative effort in which lots of people will take an active part. The historical record will show their contributions not just with the clumsy granularity of conference publications and citations, but with minute granularity of thousands of traced conversations. The theory itself will probably be too complicated for any one person to understand, but that’s OK, because it will be well architected and there will be plenty of domain experts to go to if anyone has problems with any particular part of it. And it will be growing all the time and maybe competing with a few other theories.

I love the point.

(Nit: I want to clarify, however, that I wasn’t saying that this next Darwin’s web would consist only of “pernicious Social adherents.” Throughout 2b2k I try to make the point that networked knowledge has value mainly because it includes difference and disagreement. When it does not, it fulfills the nightmare of the echo chamber.)


October 25, 2011

[berkman] [2b2k] Michael Nielsen on the networking of science

Michael Nielsen is giving a Berkman talk on the networking of science. (It’s his first talk after his book Reinventing Discovery was published.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins by telling the story of Tim Gowers, a Fields Medal winner and blogger. (Four of the 42 living Fields winners have started blogs; two of them are still blogging.) In January 2009, Gowers started posting difficult problems on his blog and working on them in the open. Plus he invited the public to post ideas in the comments. He called this the Polymath Project. 170,000 words in the comments later, ideas had been proposed and rapidly improved or discarded. A few weeks later, the problem had been solved at an even higher level of generalization.

Michael asks: Why isn’t this more common? He gives an example of the failure of an interesting idea. It was proposed by a grad student in 2005. Qwiki was supposed to be a super-textbook about quantum mechanics. The site was well built and well marketed. “But science is littered with examples of wikis like this…They are not attracting regular contributors.” Likewise, many scientific social networks are ghost towns. “The fundamental problem is one of opportunity costs. If you’re a young scientist, the way you build your career is through the publication of scientific papers…One mediocre crappy paper is going to do more for your career than a series of brilliant contributions to a wiki.”

Why then is the Polymath Project succeeding? It used an unconventional means to a conventional end: they published two papers out of it. Sites like Qwiki that are ends in themselves are not being exploited. We need a “change in norms in scientific culture” so that when people are making decisions about grants and jobs, people who contribute to unconventional formats are rewarded.

How do you achieve a change in the culture? It’s hard. Take the Human Genome Project. In the 1990s, there wasn’t a lot of advantage to individual scientists in sharing their data. In 1996, the Wellcome Trust held a meeting in Bermuda and agreed on principles that said that if you sequenced more than a thousand base pairs, you needed to release them to a public database, where they would be put into the public domain. The funding agencies baked those principles into policy. In April 2000, Clinton and Blair urged all countries to adopt similar principles.

For this to work, you need enthusiastic acceptance, not just a stick beating scientists into submission. You need scientists to internalize it. Why? Because you need all sorts of correlative data to make lab data useful. E.g., the Sloan Digital Sky Survey: a huge part of the project was establishing the calibration lines for the data to have meaning to anyone else.

Many scientists are pessimistic about this change occurring. But there are some hopeful precedents. In 1610 Galileo pointed his telescope at Saturn. He was expecting to see a small disk. But he saw a disk with small knobs on either side — the rings, although he couldn’t resolve the image further. He sent letters to four colleagues, including Kepler, that scrambled his discovery into an anagram. This way, if someone else made the discovery, Galileo could unscramble the letters and prove that he had made it first. Leonardo, Newton, Hooke, and Huygens all did this. Scientific journals helped end the practice. The editors of the first journals had trouble convincing scientists to reveal their findings because there was no link between publication and career. The editor of the first scientific journal (Philosophical Transactions of the Royal Society) goaded scientists into publishing by writing to them suggesting that other scientists were about to disclose what the recipients of the letter were working on. As Paul David [the economist Paul A. David] says, the change to the modern system was due to “patron pressure.”
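[Aside: Galileo’s anagram was, in effect, what cryptographers now call a commitment scheme: publish something that reveals nothing today but can prove priority later. A quick sketch of the same move using a salted hash — my analogy, not anything from the talk:]

```python
import hashlib
import os

def commit(claim: str) -> tuple[bytes, bytes]:
    """Publish the digest now; keep the salt and the claim secret until reveal time."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + claim.encode()).digest()
    return digest, salt

def verify(digest: bytes, salt: bytes, claim: str) -> bool:
    """Anyone can check the revealed claim against the previously published digest."""
    return hashlib.sha256(salt + claim.encode()).digest() == digest

digest, salt = commit("Saturn has rings")
# Later, reveal the salt and the claim to prove what was committed:
print(verify(digest, salt, "Saturn has rings"))  # True
print(verify(digest, salt, "Saturn has moons"))  # False
```

[Like the anagram, the digest can be mailed to colleagues or posted publicly without disclosing the discovery itself.]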

Michael points out that Galileo immediately announced the discovery of four moons of Jupiter in order to get patronage bucks from the Medicis for the right to name them. [Or, as we would do today, The Comcast Moon, the Staples Moon, and the Gosh Honey Your Hair Smells Great Moon.]

Some new ideas: The Journal of Visualized Experiments videotapes lab work, thus revealing tacit knowledge. GigaScience (from Springer) publishes data sets as first-class objects. Open Research Computation makes code into a first-class object. And blog posts are beginning to show up on Google Scholar (possibly because it’s paying attention to tags?). So, if your post is being cited by lots of articles, it will show up at Scholar.

[in response to a question] A researcher claimed to have solved the P ≠ NP problem. One serious mathematician (Cook) said it was a serious attempt. Mathematicians and others tore it apart on the Web to see if it was right. About a week later, the consensus was that there was a serious obstruction, although a small lemma was salvaged. The process leveraged expertise in many different areas — statistical physics, logic, etc.

Q: [me] Science has been a type of publishing. How does scientific knowledge change when it becomes a type of networking?
A: You can see this beginning to happen in various fields. E.g., people at Google talk about their software as an ecology. [Afterwards, Michael explained that Google developers use a complex ecology of libraries and services with huge numbers of dependencies.] What will it mean when someone says that the Higgs boson has been found at the LHC? There are millions of lines of code, huge data sets. It will be an example of using networked knowledge to draw a conclusion where no single person has more than a tiny understanding of the chain of inferences that led to this result. How do you do peer review of that paper? Peer review can’t mean that it’s been checked, because no one person can check it. No one has all the capability. How do you validate this knowledge? The methods used to validate are completely ad hoc. E.g., the Intergovernmental Panel on Climate Change has more data than any one person can evaluate. And they don’t have a method. It’s ad hoc. They do a good job, but it’s ad hoc.

Q: The classification of finite simple groups was the same. A series of papers.
A: Followed by a 1,200-page appendix addressing errors.

Q: It varies by science, of course. For practical work, people need access to the data. For theoretical work, the person who makes the single step that solves it should get 98% of the credit. E.g., Newton v. Leibniz on calculus. E.g., Perelman’s approach to the Poincaré conjecture.
A: Yes. Perelman published three papers on a preprint server. Afterward, someone published a paper that filled in the gaps, but Perelman’s was the crucial contribution. This is the normal bickering in science. I would like to see many approaches and gradual consensus. You’ll never have perfect agreement. With transparency, you can go back and see how people came to those ideas.

Q: What is validation? There is a fundamental need for change in the statistical algorithms that many data sets are built on. You have to look at those limitations as well as at the data sets.
A: There’s lots of interesting things happening. But I think this is a transient problem. Best practices are still emerging. There are a lot of statisticians on the case. A move toward more reproducible research and more open sharing of code would help. E.g., many random generators are broken, as is well known. Having the random generator code in an open repository makes life much easier.

Q: The P ≠ NP episode left a sense that it was a sprint in response to a crisis, but how can it be done in a more scalable way?
A: People go for the most interesting claims.

Q: You mentioned the Bermuda Principles, and NIH requires open-access publication one year after paper publication. But you don’t see that elsewhere. What are the sociological reasons?
Peter Suber: There’s a more urgent need for medical research. The campaign for open access at NSF is not as large, and the counter-lobby (publishers of scientific journals) is bigger. But Pres. Obama has said he’s willing to do it by executive order if there’s sufficient public support. No sign of action yet.

Q: [peter suber] I want to see researchers enthusiastic about making their research public. How do we construct a link between OA and career?
A: It’s really interesting what’s going on. A lot of discussion about supporting gold OA (publishing in OA journals, as opposed to putting work into an OA repository). Fundamentally, it comes down to a question of values. Can you create a culture in science that views publishing in gold OA journals as better than publishing in prestigious toll journals? The best way perhaps is to make it a public issue. Make it embarrassing for scientists to lock their work away. The Aaron Swartz case has sparked a public discussion of the role of publishers, especially when they’re making 30% profits.
Q: Peter: Whenever you raise the idea of tweaking tenure criteria, you unleash a tsunami of academic conservatism, even if you make clear that this would still support the same rigorous standards. Can we change the reward system without waiting for it to evolve?
A: There was a proposal a few years ago that it be done purely algorithmically: produce a number based on the citation index. If that had been adopted, simple tweaks to the algorithm could have changed incentives. For example: “You get a 10% premium for being in a gold OA journal,” etc.
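[Aside: a minimal sketch of what such an algorithmic score with an open-access premium might look like. The weights and the data layout are my own invention, not from any actual proposal:]

```python
def impact_score(papers, oa_premium=0.10):
    """Sum citation counts, boosting papers in gold OA journals by a premium.
    Each paper is a dict with 'citations' and 'gold_oa' keys (hypothetical schema)."""
    score = 0.0
    for paper in papers:
        weight = 1.0 + oa_premium if paper["gold_oa"] else 1.0
        score += paper["citations"] * weight
    return score

papers = [
    {"citations": 100, "gold_oa": False},
    {"citations": 50, "gold_oa": True},  # earns the 10% premium
]
print(impact_score(papers))  # 155.0
```

[The point of the proposal, as I heard it, is that once the reward is a number, incentives become a tunable parameter.]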
Q: [peter] One idea was that your work wouldn’t be noticed by the tenure committee if it wasn’t in an OA repository.
A: SPIRES [the high-energy physics literature database] lets you measure the impact of your preprint articles, which has made it easier for people to assess the effect of OA publishing. You see people looking up the SPIRES numbers of a scientist they just met. You see scientists bragging about the number of times their slides have been downloaded via Mendeley.

Q: How can we accelerate by an order of magnitude in the short term?
A: Any tool that becomes widely used to measure impact affects how science is done. E.g., the H Index. But I’d like to see a proliferation of measures because when you only have one, it reduces cognitive diversity.
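[Aside: for concreteness, the H Index mentioned here is simple to compute: it’s the largest h such that at least h of a scientist’s papers have at least h citations each. A quick sketch, with invented citation counts:]

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # this paper still clears the bar at its rank
        else:
            break
    return h

# A scientist with papers cited 10, 8, 5, 4, and 3 times has h = 4:
# four papers with at least 4 citations each.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

[Which illustrates Michael’s point: a single number like this inevitably compresses away most of what a career contains.]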

Q: Before the Web, Erdős was the roving collaborator. He’d go from place to place and force collaboration. Let’s duplicate that on the Net!
A: He worked 18 hours a day, 365 days/year, high on amphetamines. Not sure that’s the model :) He did lots of small projects. When you have a large project, you bring in the expertise you need. Open collaboration has the unpredictable spread of expertise that participates, and that’s often crucial. E.g., Einstein never thought that understanding gravity required understanding non-standard geometries. He learned that from someone else [missed who]. That’s the sort of thing you get in open collaborations.

Q: You have to have a strong ego to put your out-there paper out there to let everyone pick it apart.
A: Yes. I once asked a friend of mine how he consistently writes edgy blog posts. He replied that it’s because there are some posts he genuinely regrets writing. That takes a particular personality type. But the same is true for publishing papers.
Q: But at least you can blame the editors or peer reviewers.
A: But that’s very modern; peer review only became standard in the 1960s. Of Einstein’s 300 papers, only one was peer reviewed … and that one was rejected. Newton was terribly anguished by the criticism of his papers. Networked science may exacerbate the risk, but it’s always been risky to put your ideas out there.

[Loved this talk.]


October 17, 2011

[2b2k] Why this article?

A possible explanation of the observation of neutrinos traveling faster than light has been posted at Arxiv.org by Ronald van Elburg. I of course don’t have any of the conceptual apparatus to be able to judge that explanation, but I’m curious about why, among all the explanations, this is the one I’ve now heard about.

In a properly working knowledge ecology, the most plausible explanations would garner the most attention, because to come to light an article would have to pass through competent filters. In the new ecology, it may well be that what gets the most attention are articles that appeal to our lizard brains in various ways: they make overly-bold claims, they over-simplify, they confirm prior beliefs, they are more comprehensible to lay people than are ideas that require more training to understand, they have an interesting backstory (“Ashton Kutcher tweets a new neutrino explanation!”)…

By now we are all familiar with the critique of the old idea of a “properly working knowledge ecology”: Its filters were too narrow and were prone to preferring that which was intellectually and culturally familiar. There is a strong case to be made that a more robust ecology is wilder in its differences and disagreements. Nevertheless, it seems to me to be clearly true (i.e., I’m not going to present any evidence to support the following) that to our lizard brains the Internet is a flat rock warmed by a bright sun.

But that is hardly the end of the story. The Internet isn’t one ecology. It’s a messy cascade of intersecting environments. Indeed, the ecology metaphor doesn’t suffice, because each of us pins together our own Net environments by choosing which links to click on, which to bookmark, and which to pass along to our friends. So, I came across the possible neutrino explanation at Metafilter, which I was reading embedded within Netvibes, a feed aggregator that I use as my morning newspaper. A comment at Metafilter pointed to the top comment at Reddit’s AskScience forum on the article, which I turned to because on this sort of question I often find Reddit comment threads helpful. (I also had a meta-interest in how articles circulate.) If you despise Reddit, you would have skipped the Metafilter comment’s referral to that site, but you might well have pursued a different trail of links.

If we take the circulation of Ronald van Elburg’s article as an example, what do we learn? Well, not much because it’s only one example. Nevertheless, I think it at least helps make clear just how complex our “media environment” has become, and some of the effects it has on knowledge and authority.

First, we don’t yet know how ideas achieve status as centers of mainstream contention. Is van Elburg’s article attaining the sort of reliable, referenceable position that provides a common ground for science? It was published at Arxiv, which lets any scientist with an academic affiliation post articles at any stage of readiness. On the other hand, among the thousands of articles posted every day, the Physics Arxiv blog at Technology Review blogged about this one. (Even who’s blogging about what where is complex!) If over time van Elburg’s article is cited in mainstream journals, then, yes, it will count as having vaulted the wall that separates the wannabes from the contenders. But, to what extent are articles not published in the prestigious journals capable of being established as touchpoints within a discipline? More important, to what extent does the ecology still center around controversies about which every competent expert is supposed to be informed? How many tentpoles are there in the Big Tent? Is there a Big Tent any more?

Second, as far as I know, we don’t yet have a reliable understanding of the mechanics of the spread of ideas, much less an understanding of how those mechanics relate to the worth of ideas. So, we know that high-traffic sites boost awareness of the ideas they publish, and we know that the mainstream media remain quite influential in either the creation or the amplification of ideas. We know that some community-driven sites (Reddit, 4chan) are extraordinarily effective at creating and driving memes. We also know that a word from Oprah used to move truckloads of books. But if you look past the ability of big sites to set bonfires, we don’t yet understand how the smoke insinuates its way through the forest. And there’s a good chance we will never understand it very fully because the Net’s ecology is chaotic.

Third, I would like to say that it’s all too complex and imbued with value beliefs to be able to decide if the new knowledge ecology is a good thing. I’d like to be perceived as fair and balanced. But the truth is that every time I try to balance the scales, I realize I’ve put my thumb on the side of traditional knowledge to give it heft it doesn’t deserve. Yes, the new chaotic ecology contains more untruths and lies than ever, and they can form a self-referential web that leaves no room for truth or light. At the same time, I’m sitting at breakfast deciding to explore some discussions of relativity by wiping the butter off my finger and clicking a mouse button. The discussions include some raging morons, but also some incredibly smart and insightful strangers, some with credentials and some who prefer not to say. That’s what happens when a population actually engages with its culture. To me, that engagement itself is more valuable than the aggregate sum of stupidity it allows.


(Yes, I know I’m having some metaphor problems. Take that as an indication of the unsettled nature of our thought. Or of bad writing.)


October 11, 2011

[2b2k] Retraction system creaking under the load

According to a post at Nature by Richard Van Noorden, the rate of retracted scientific articles is growing far faster than the rate of published or posted articles. No one is sure why, but it is exposing inconsistencies in policies for dealing with retracted articles.

Suggested reforms include better systems for linking papers to their retraction notices or revisions, more responsibility on the part of journal editors and, most of all, greater transparency and clarity about mistakes in research.

It’s encouraging that it’s taken as obvious that the proper response is links and transparency. Gotta love science.


August 3, 2011

[2b2k] Open bench science

Carl Zimmer at The Loom points to Rosie Redfield’s blogging of her lab work investigating a claim of arsenic-based life forms. It’s a good example of networked science: science that is based on the network model, rather than on a publishing model.

I find open notebook science overall to be fascinating and promising.

