I’m at re comm 13, an odd conference in Kitzbühel, Austria: 2.5 days of talks to 140 real estate executives, but the talks are about anything except real estate. David Eagleman, a neural scientist at Baylor, and a well-known author, is giving a talk. (Last night we had one of those compressed conversations that I can’t wait to be able to continue.)
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
How do we know your thinking is in your brain? If you damage your finger, you don’t change, but damage to your brain can change basic facets of your life. “The brain is the densest representation of who you are.” We’re the only species trying to figure out our own programming language. We’ve discovered the most complicated device in the universe: our own brains. Ten billion neurons. Every single neuron contains the entire human genome and thousands of proteins doing complicated computations. Each neuron is connected to tens of thousands of its neighbors, meaning there are 100s of trillions of connections. These numbers “bankrupt the language.”
Almost all of the operations of the brain are happening at a level invisible to us. Taking a drink of water requires a “lightning storm” of activity at the neural level. This leads us to a concept of the unconscious. The conscious part of you is the smallest bit of what’s happening in the brain. It’s like a stowaway on a transatlantic journey that’s taking credit for the entire trip. When you think of something, your brain’s been working on it for hours or days. “It wasn’t really you that thought of it.”
About the unconscious: Psychologists gave photos of women to men and asked them to evaluate how attractive they are. Some of the photos were the same women, but with dilated eyes. The men rated them as being more attractive but none of them noticed the dilation. Dilated eyes are a sign of sexual readiness in women. Men made their choices with no idea of why.
More examples: In the US, if your name is Dennis or Denise, you’re more likely to become a dentist. These dentists have a conscious narrative about why they became dentists that misses the trick their brain has played on them. Likewise, people are statistically more likely to marry someone whose first name begins with the same first letter as theirs. And, if you are holding a warm mug of coffee, you’ll describe the relationship with your mother as warmer than if you’re holding an iced cup. There is an enormous gap between what you’re doing and what your conscious mind is doing.
“We should be thankful for that gap.” There’s so much going on under the hood, that we need to be shielded from the details. The conscious mind gets in trouble when it starts paying attention to what it’s doing. E.g., try signing your name with both hands in opposite directions simultaneously: it’s easy until you think about it. Likewise, if you now think about how you steer when making a lane change, you’re likely to enact it wrong. (You actually turn left and then turn right to an equal measure.)
Know thyself, sure. But neuroscience teaches us that you are many things. The brain is not a computer with a single output. It has many networks that are always competing. The brain is like a parliament that debates an action. When deciding between two sodas, one network might care about the price, another about the experience, another about the social aspect (cool or lame), etc. They battle. David looks at three of those networks:
1. How does the brain make decisions about valuation? E.g., people will walk 10 mins to save 10 € on a 20 € pen but not on a 557 € suit. Also, we have trouble making comparisons of worth among disparate items unless they are in a shared context. E.g., Williams Sonoma had a bread baking machine for $275 that did not sell. Once they added a second one for $370, it started selling. In real estate, if a customer is trying to decide between two homes, one modern and one traditional, if you want them to buy the modern one, show them another modern one. That gives them the context by which they can decide to buy it.
Everything is associated with everything else in the brain. (It’s an associative network.) Coffee used to be $0.50. When Starbucks started, they had to unanchor it from the old model so they made the coffee houses arty and renamed the sizes. Having lost the context for comparison, the price of Starbucks coffee began to seem reasonable.
2. Emotional experience is a big part of decision making. If you’re in a bad-smelling room, you’ll make harsher moral decisions. The trolley dilemma: 5 people have been tied to the tracks. A trolley is approaching rapidly. You can switch the trolley to a track with only one person tied to it. Everyone would switch the trolley. But now instead, you can push a fat man onto the tracks to stop the trolley. Few would. In the second scenario, touching someone engages the emotional system. The first scenario is just a math problem. The logic and emotional systems are always fighting it out. The Greeks viewed the self as someone steering a chariot drawn by the white horse of reason and the black horse of passion. [From Plato's Phaedrus]
3. A lot of the machinery of the brain deals with other brains. We use the same circuitry to think about people and/or corporations. When a company betrays us, our brain responds the way it would if a friend betrayed us. Traditional economics says customer interactions are short-term but the brain takes a much longer-range view. Breaches of trust travel fast. (David plays “United Breaks Guitars.”) Smart companies use social media that make you believe that the company is your friend.
The battle among these three networks drives decisions. “Know thyselves.”
This is unsettling. The self is not at the center. It’s like when Galileo repositioned us in the universe. This seemed like a dethroning of man. The upside is that we’ve discovered the Cosmos is much bigger, more subtle, and more magnificent than we thought. As we sail into the inner cosmos of the brain, the brain is much more subtle and magnificent than we ever considered.
“We’ve found the most wondrous thing in the universe, and it’s us.”
Q: Won’t this let us be manipulated?
A: Neural science is just catching up with what advertisers have known for 100 years.
Q: What about free will?
A: My labs and others have done experiments, and there’s no single experiment in neuroscience that proves that we do or do not have free will. But if we have free will, it’s a very small player in the system. We have genetics and experiences, and they make brains very different from one another. I argue for a legal system that recognizes a difference between people who may have committed the same crime. There are many different types of brains.
Popular Science has announced that it’s shutting down comments on its articles. The post by Suzanne LaBarre says this is because “trolls and spambots” have overwhelmed the useful comments. But what I hear instead is: “We don’t know how to run a comment board, so shut up.”
Suzanne cites research that suggests that negative comments on an article reduce the credibility of the article, even if those negative comments are entirely unfounded. Thus, the trolls don’t just ruin the conversation, they hurt the cause of science.
Ok, let’s accept that. Scientific American cited the same research but came to a different decision. Rather than shut down its comments, it decided to moderate them using some sensible rules designed to encourage useful conversation. Their idea of a “useful conversation” is likely quite similar to Popular Science’s: not only no spam, but the discourse must be within the norms of science. So, it doesn’t matter how loudly Jesus told you that there is no climate change going on, your message is going to be removed if it doesn’t argue for your views within the evidentiary rules of science.
You may not like this restriction at Scientific American. Tough. You have lots of other places you can talk about Jesus’ beliefs about climate change. I posted at length about the Scientific American decision at the time, and especially about why this makes clear problems with the “echo chamber” meme, but I fundamentally agree with it.
If comments aren’t working on your site, then it’s your fault. Fix your site.
[Tip o' the hat to Joshua Beckerman for pointing out the PopSci post.]
Tagged with: 2b2k
Date: September 27th, 2013 dw
Science Friday has posted a brief, phenomenal video about how octopuses and other cephalopods manage to camouflage themselves incredibly quickly. It explains the skin’s mechanism (which is mind-blowing in itself), but leaves open how they manage this even though they’re color blind. (Hat tip to Joe Mahoney.)
Tagged with: octopus
Date: September 11th, 2013 dw
Amanda Alvarez has a provocative post at GigaOm:
There’s an epidemic going on in science: experiments that no one can reproduce, studies that have to be retracted, and the emergence of a lurking data reliability iceberg. The hunger for ever more novel and high-impact results that could lead to that coveted paper in a top-tier journal like Nature or Science is not dissimilar to the clickbait headlines and obsession with pageviews we see in modern journalism.
The article’s title points especially to “dodgy data,” and the item in this list that’s by far the most interesting to me is the “data reliability iceberg,” and its tie to the rise of Big Data. Amanda writes:
…unlike in science…, in big data accuracy is not as much of an issue. As my colleague Derrick Harris points out, for big data scientists the ability to churn through huge amounts of data very quickly is actually more important than complete accuracy. One reason for this is that they’re not dealing with, say, life-saving drug treatments, but with things like targeted advertising, where you don’t have to be 100 percent accurate. Big data scientists would rather be pointed in the right general direction faster — and course-correct as they go – than have to wait to be pointed in the exact right direction. This kind of error-tolerance has insidiously crept into science, too.
But, the rest of the article contains no evidence that the last sentence’s claim is true because of the rise of Big Data. In fact, even if we accept that science is facing a crisis of reliability, the article doesn’t pin this on an “iceberg” of bad data. Rather, it seems to be a melange of bad data, faulty software, unreliable equipment, poor methodology, undue haste, and o’erweening ambition.
The last part of the article draws some of the heat out of the initial paragraphs. For example: “Some see the phenomenon not as an epidemic but as a rash, a sign that the research ecosystem is getting healthier and more transparent.” It makes the headline and the first part seem a bit overstated — not unusual for a blog post (not that I would ever do such a thing!) but at best ironic given this post’s topic.
I remain interested in Amanda’s hypothesis. Is science getting sloppier with data?
Tagged with: 2b2k • big data
Date: May 26th, 2013 dw
Bora Zivkovic, the blog editor at Scientific American, has a great post about bad comment threads. This is a topic that has come up every day this week, which may just be a coincidence, or perhaps is a sign that the Zeitgeist is recognizing that when it talks to itself, it sounds like an idiot.
Bora cites a not-yet-published paper that presents evidence that a nasty, polarized comment thread can cause readers who arrive with no opinion about the paper’s topic to come to highly polarized opinions about it. This is in line with off-line research Cass Sunstein cites that suggests echo chambers increase polarization, except this new research indicates that it increases polarization even on first acquaintance. (Bora considers the echo chamber idea to be busted, citing a prior post that is closely aligned with the sort of arguments I’ve been making, although I am more worried about the effects of homophily — our tendency to hang out with people who agree with us — than he is.)
Much of Bora’s post is a thoughtful yet strongly voiced argument that it is the responsibility of the blog owner to facilitate good discussions by moderating comments. He writes:
So, if I write about a wonderful dinner I had last night, and somewhere in there mention that one of the ingredients was a GMO product, but hey, it was tasty, then a comment blasting GMOs is trolling.
Really? Then why did Bora go out of his way to mention that it was a GMO product? He seems to me to be trolling for a response. Now, I think Bora just picked a bad example in this case, but it does show that the concept of “off-topic” contains a boatload of norms and assumptions. And Bora should be fine with this, since his piece begins by encouraging bloggers to claim their conversation space as their own, rather than treating it as a public space governed by the First Amendment. It’s up to the blogger to do what’s necessary to enable the type of conversations that the blogger wants. All of which I agree with.
Nevertheless, Bora’s particular concept of being on-topic highlights a perpetual problem of conversation and knowledge. He makes a very strong case — nicely argued — for why he nukes climate-change denials from his comment thread. Read his post, but the boiled down version is: (a) These comments are without worth because they do not cite real evidence and most of them are astroturf anyway. (b) They create a polarized environment that has the bad effect of raising unjustified doubts in the minds of readers of the post (as per the research he mentions at the beginning of his post). (c) They prevent conversation from advancing thought because they stall the conversation at first principles. Sounds right to me. And I agree with his subsequent denial of the echo chamber effect as well:
The commenting threads are not a place to showcase the whole spectrum of opinions, no matter how outrageous some of them are, but to educate your readers, and to, in turn, get educated by your readers who always know something you don’t.
But this is why the echo chamber idea is so slippery. Conversation consists of the iteration of small differences upon a vast ground of agreement. A discussion of a scientific topic among readers of Scientific American has value insofar as they can assume that, say, evolution is an established theory, that assertions need to be backed by facts of a certain evidentiary sort (e.g., “God told me” doesn’t count), that some assertions are outside of the scope of discussion (“Evolution is good/evil”), etc. These are criteria of a successful conversation, but they are also the marks of an echo chamber. The good Scientific American conversation that Bora curates looks like an echo chamber to the climate change deniers and the creationists. If one looks only at the structure of the conversation, disregarding all the content and norms, the two conversations are indistinguishable.
But now I have to be really clear about what I’m not saying. I am not saying that there’s no difference between creationists and evolutionary biologists, or that they are equally true. I am not saying that both conversations follow the same rules of evidence. I am certainly not saying that their rules of evidence are equally likely to lead to scientific truths. I am not even saying that Bora needs to throw open the doors of his comments. I’m saying something much more modest than that: To each side, the other’s conversation looks like a bunch of people who are reinforcing one another in their wrong beliefs by repeating those beliefs as if they were obviously right. Even the conversation I deeply believe is furthering our understanding — the evolutionary biologists, if you haven’t guessed where I stand on this issue — has the structure of an echo chamber.
This seems to me to have two implications.
First, it should keep us alert to the issue that Bora’s post tries to resolve. He encourages us to exclude views challenging settled science because including ignorant trolls leads casual visitors to think that the issues discussed are still in play. But climate change denial and creationist sites also want to promote good conversations (by their lights), and thus Bora is apparently recommending that those sites also should exclude those who are challenging the settled beliefs that form the enabling ground of conversation — even though in this case it would mean removing comments from all those science-y folks who keep “trolling” them. It seems to me that this leads to a polarized culture in which the echo chamber problem gets worse. Now, I continue to believe that Bora is basically right in his recommendation. I just am not as happy about it as he seems to be. Perhaps Bora is in practice agreeing with Too Big to Know’s recommendation that we recognize that knowledge is fragmented and is not going to bring us all together.
Second, the fact that we cannot structurally distinguish a good conversation from a bad echo chamber I think indicates that we don’t have a good theory of conversation. The echo chamber fear grows in the space that a theory of conversation should inhabit.
I don’t have a theory of conversation in my hip pocket to give you. But I presume that such a theory would include the notion, evident in Bora’s post, that conversations have aims, and that when a conversation is open to the entire world (a radically new phenomenon…thank you WWW!) those aims should be explicitly stated. Likewise for the norms of the conversation. I’m also pretty sure that conversations are never only about what they say they’re about because they are always embedded in complex social environments. And because conversations iterate on differences on a vast ground of similarity, conversations rarely are about changing people’s minds about those grounds. Also, I personally would be suspicious of any theory of conversation that began by viewing conversations as composed fundamentally of messages that are encoded by the sender and decoded by the recipient; that is, I’m not at all convinced that we can get a theory of conversation out of an information-based theory of communication.
But I dunno. I’m confused by this entire topic. Nothing that a good conversation wouldn’t cure.
The letters of Lord Alfred Russel Wallace, co-discoverer of the theory of evolution by natural selection, are now online. As the Alfred Russel Wallace Correspondence Project explains, the collection consists of 4,000 letters gathered from about 100 different institutions, with about half in the British Natural History Museum and British Library.
The Correspondence Project has, admirably, been releasing the scans without waiting for transcription; more faster is better! Predictably annoyingly, the letters, written by a man who died ten years before the Perpetual Copyright date of 1923, seem to be (but are they?) carefully obstructed by copyright: The Natural History Museum, which houses the collection, asserts copyright over “data held in the Wallace Letters Online database (including letter summaries)” [pdf — oddly unreadable in Mac Preview]. Beyond the summaries, exactly what data is this referring to? Not sure. Don’t know.
But that isn’t the full story anyway, for the NHM sends us to the Wallace Fund for more information about the copyright. That page tells us that the unpublished letters are copyrighted until 2039, with this very helpful footnote:
Unless the work was published with the permission of his Literary Estate before 1 August 1989, in which case the work will be in copyright for 70 years after Wallace’s death, unless he died more than 20 years before the work’s publication, in which case copyright would expire 50 years after publication.
Eventually it gets to some good news:
Authors wishing to publish such works would ordinarily need to obtain permission from the copyright holder before doing so. However, on July 31st 2011, in an attempt to facilitate the scholarly study of ARW’s writings, the co-executors of ARW’s Literary Estate agreed to allow third parties to publish ARW’s copyright works non-commercially without first having to ask the Literary Estate for permission, under the terms and conditions of Creative Commons license “Attribution-NonCommercial-ShareAlike 3.0 Unported”
So, are the letters published on the NHM site actually available under a Creative Commons non-commercial license? The Wallace Fund that aggregated them seems to think so. The NHM that published them maybe thinks not.
Because copyright is just so magical.
TWO HOURS LATER: Please see the first comment, from George Beccaloni, Director of the Wallace Correspondence Project. Thanks, George.
He explains that the transcribed text is available under a Creative Commons non-commercial license, but the digitized images are not. Plus some further complications, such as the content of the database being under copyright, although it is not clear from the site what data that is.
Since the aim of CC is to make it easier for people to re-use material, may I suggest (in the friendliest of fashions) that this be prominently clarified on the sites themselves?
Well, I learned a bunch of stuff, but I’ll only mention two.
First, NASA is as totally awesome as you think it is. I went to the Langley center for a one day visit, and got a morning tour, and it is a nerd-heaven work space, with no Star Wars white plastic, but lots and lots of dented workbenches covered with sprays of components. And it adds up to our species looking down on our planet. Ultra ultra cool.
Second, I got a tour of the National Transonic Facility by Bill Bisset, who manages the place. They test models in the world’s most sophisticated wind tunnel — they fill it with liquid nitrogen (which they make themselves) that’s blown in by the world’s most powerful horizontally-mounted electrical motor (that consumes an eighth of the output of a local nuclear generator), and they measure up to 5,000 different parameters. So, naturally, I ran an urban myth past Bill, because that’s an excellent use of his time.
I had been told by someone sometime that those little upturned wing tips you sometimes see on planes were discovered more than invented: Someone tried them out, and they turned out to increase the efficiency of the plane, but no one knew why.
Nope, nope, and nope. They’re called winglets. Here’s the story, from a NASA page:
The concept of winglets originated with a British aerodynamicist in the late 1800s, but the idea remained on the drawing board until rekindled in the early 1970s by Dr. Richard Whitcomb when the price of aviation fuel started spiraling upward.
Bill explained that winglets work by altering the vortex that forms when air rushes over a wing. “Winglets…produce a forward thrust inside the circulation field of the vortices and reduce their strength,” as the NASA page says. They increase efficiency by 6-9%. Bill said they also effectively increase the wingspan of the plane, but without extending the wings horizontally, which matters to airlines because they pay airports based upon the horizontal length of the wings.
So, yes, everything I’d heard was wrong. And, yes, it was in Wikipedia all along.
(And yes, I learned a whole lot more. It was for me a wonderful day.)
Tagged with: nasa
Date: January 9th, 2013 dw
An article published in Science on Thursday, securely locked behind a paywall, paints a mixed picture of science in the age of social media. In “Science, New Media, and the Public,” Dominique Brossard and Dietram A. Scheufele urge action so that science will be judged on its merits as it moves through the Web. That’s a worthy goal, and it’s an excellent article. Still, I read it with a sense that something was askew. I think ultimately it’s something like an old vs. new media disconnect.
The authors begin by noting research that suggests that “online science sources may be helping to narrow knowledge gaps” across educational levels. But all is not rosy. Scientists are going to have “to rethink the interface between the science community and the public.” They point to three reasons.
First, the rise of online media has reduced the amount of time and space given to science coverage by traditional media.
Second, the algorithmic prioritizing of stories takes editorial control out of the hands of humans who might make better decisions. The authors point to research that “shows that there are often clear discrepancies between what people search for online, which specific areas are suggested to them by search engines, and what people ultimately find.” The results provided by search engines “may all be linked in a self-reinforcing informational spiral…” This leads them to ask an important question:
Is the World Wide Web opening up a new world of easily accessible scientific information to lay audiences with just a few clicks? Or are we moving toward an online science communication environment in which knowledge gain and opinion formation are increasingly shaped by how search engines present results, direct traffic, and ultimately narrow our informational choices? Critical discussions about these developments have mostly been restricted to the political arena…
Third, we are debating science differently because the Web is social. As an example they point to the fact that “science stories usually…are embedded in a host of cues about their accuracy, importance, or popularity,” from tweets to Facebook “Likes.” “Such cues may add meaning beyond what the author of the original story intended to convey.” The authors cite a recent conference where the tone of online comments turned out to affect how people took the content. For example, an uncivil tone “polarized the views….”
They conclude by saying that we’re just beginning to understand how these Web-based “audience-media interactions” work, but that the opportunity and risk are great, so more research is greatly needed:
Without applied research on how to best communicate science online, we risk creating a future where the dynamics of online communication systems have a stronger impact on public views about science than the specific research that we as scientists are trying to communicate.
I agree with so much of this article, including its call for action, yet it felt odd to me that scientists will be surprised to learn that the Web does not convey scientific information in a balanced and impartial way. You only are surprised by this if you think that the Web is a medium. A medium is that through which content passes. A good medium doesn’t corrupt the content; it conveys signal with a minimum of noise.
But unlike any medium since speech, the Web isn’t a passive channel for the transmission of messages. Messages only move through the Web because we, the people on the Web, find them interesting. For example, I’m moving (infinitesimally, granted) this article by Brossard and Scheufele through the Web because I think some of my friends and readers will find it interesting. If someone who reads this post then tweets about it or about the original article, it will have moved a bit further, but only because someone cared about it. In short, we are the medium, and we don’t move stuff that we think is uninteresting and unimportant. We may move something because it’s so wrong, because we have a clever comment to make about it, or even because we misunderstand it, but without our insertion of ourselves in the form of our interests, it is inert.
So, the “dynamics of online communication systems” are indeed going to have “a stronger impact on public views about science” than the scientific research itself does because those dynamics are what let the research have any impact beyond the scientific community. If scientific research is going to reach beyond those who have a professional interest in it, it necessarily will be tagged with “meaning beyond what the author of the original story intended to convey.” Those meanings are what we make of the message we’re conveying. And what we make of knowledge is the energy that propels it through the new system.
We therefore cannot hope to peel the peer-to-peer commentary from research as it circulates broadly on the Net, not that the Brossard and Scheufele article suggests that. Perhaps the best we can do is educate our children better, and encourage more scientists to dive into the social froth as the place where their research is having its broadest effect.
Notes, copied straight from the article:
M. A. Cacciatore, D. A. Scheufele, E. A. Corley, Public Underst. Sci.; 10.1177/0963662512447606 (2012).
C. Russell, in Science and the Media, D. Kennedy, G. Overholser, Eds. (American Academy of Arts and Sciences, Cambridge, MA, 2010), pp. 13–43.
P. Ladwig et al., Mater. Today 13, 52 (2010).
P. Ladwig, A. Anderson, abstract, Annual Conference of the Association for Education in Journalism and Mass Communication, St. Louis, MO, August 2011; www.aejmc.com/home/2011/06/ctec-2011-abstracts
Tagged with: 2b2k
Date: January 5th, 2013 dw
Last night I gave a talk at the Festival of Science in Genoa (or, as they say in Italy, Genova). I was brought over by Codice Edizioni, the publisher of the just-released Italian version of Too Big to Know (or, as they say in Italy “La Stanza Intelligente” (or as they say in America, “The Smart Room”)). The event was held in the Palazzo Ducale, which ain’t no Elks Club, if you know what I mean. And if you don’t know what I mean, what I mean is that it’s a beautiful, arched, painted-ceiling room that holds 800 people and one intimidated American.
After my brief talk, Serena Danna of Corriere della Sera interviewed me. She’s really good. For example, her first question was: If the facts no longer have the ability to settle arguments the way we hoped they would, then what happens to truth?
Yeah, way to pitch the ol’ softballs, Serena!
I wasn’t satisfied with my answer, which had three parts. (1) There are facts. The world is one way and not all the other ways that it isn’t. You are not free to make up your own facts. [Yes, I'm talking to you, Mitt!] (2) The basing of knowledge primarily on facts is a relatively new phenomenon. (3) I explicitly invoked Heidegger’s concept of truth, with a soupçon of pragmatism’s view of truth as a tool intended to serve a purpose.
Meanwhile, I’ve been watching The Heidegger Circle mailing list contort itself trying to understand Heidegger’s views about the world that existed before humans entered the scene. Was there Being? Were there beings? It seems to me that any answer has to begin by saying, “Of course the world existed before we did.” But not everyone on the list is comfortable with a statement that simple. Some seem to think that acknowledging that most basic fact somehow diminishes Heidegger’s analysis of the relation of Being and disclosure. Yo, Heideggerians! The world shows itself to us as independent of us. We were born into it, and it keeps going after we’ve died. If that’s a problem for your philosophy, then your philosophy is a problem. And for all of the problems with Heidegger’s philosophy, that just isn’t one. (To be fair, no one on the list suggests that the existence of the universe depends upon our awareness of it, although some are puzzled about how to maintain Heidegger’s conception of “world” (which does seem to depend on us) with that which survives our awareness of it. Heidegger, after all, offers phenomenological ontology, so there is a question about what Being looks like when there is no one to show itself to.)
So, I wasn’t very happy with what I said about truth last night. I said that I liked Heidegger’s notion that truth is the world showing itself to us, and it shows itself to us differently depending on our projects. I’ve always liked this idea for a few reasons. First, it’s phenomenologically true: the onion shows itself differently whether you’re intending to cook it, whether you’re trying to grow it as a cash crop, whether you’re trying to make yourself cry, whether you’re trying to find something to throw at a bad actor, etc. Second, because truth is the way the world shows itself, Heidegger’s sense contains the crucial acknowledgement that the world exists independently of us. Third, because this sense of truth looks at our projects, it contains the crucial acknowledgement that truth is not independent of our involvement in the world (which Heidegger accurately characterizes not with the neutral term “involvement” but as our caring about what happens to us and to our fellow humans). Fourth, this gives us a way of thinking about truth without the correspondence theory’s schizophrenic metaphysics that tells us that we live inside our heads, and our mental images can either match or fail to match external reality.
But Heidegger’s view of truth doesn’t do the job that we want done when we’re trying to settle disagreements. Heidegger observes (correctly in my and everybody’s opinion) that different fields have different methodologies for revealing the truth of the world. He speaks coldly (it seems to me) of science, and warmly of poetry. I’m much hotter on science. Science provides a methodology for letting the world show itself (= truth) that is reproducible precisely so that we can settle disputes. For settling disputes about what the world is like regardless of our view of it, science has priority, just as the legal system has priority for settling disputes over the law.
This matters a lot not just because of the spectacular good that science does, but because the question of truth only arises because we sense that something is hidden from us. Science does not uncover all truths but it uniquely uncovers truths about which we can agree. It allows the world to speak in a way that compels agreement. In that sense, of all the disciplines and methodologies, science is the closest to giving the earth we all share its own authentic voice. That about which science cannot speak in a compelling fashion across all cultures and starting points is simply not subject to scientific analysis. Here the poets and philosophers can speak and should be heard. (And of course the compulsive force science manifests is far from beyond resistance and doubt.)
But, when we are talking about the fragmenting of belief that the Internet facilitates, and the fact that facts no longer settle arguments across those gaps, then it is especially important that we commit to science as the discipline that allows the earth to speak of itself in its most compelling terms.
Finally, I was happy that last night I did manage to say that science provides a model for trying to stay smart on the Internet because it is highly self-aware about what it knows: it does not simply hold on to true statements, but is aware of the methodology that led us to see those statements as true. This type of meta awareness — not just within the realm of science — is crucial for a medium as open as the Internet.
I’m at the “Symposium on Digital Curation in the Era of Big Data” held by the Board on Research Data and Information of the National Research Council. These liveblog notes cover (in some sense — I missed some folks, and have done my usual spotty job on the rest) the morning session. (I’m keynoting in the middle of it.)
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Alan Blatecky [pdf] from the National Science Foundation says science is being transformed by Big Data. [I can't see his slides from the panel at front.] He points to the increase in the volume of data, but says we haven’t paid enough attention to the longevity of the data. And, he says, some data is centralized (LHC) and some is distributed (genomics). And, our networks are unable to transport large amounts of data [see my post], making where the data is located quite significant. NSF is looking at creating data infrastructures. “Not one big cloud in the sky,” he says. Access, storage, services — how do we make that happen and keep it leading edge? We also need a “suite of policies” suitable for this new environment.
He closes by talking about the Data Web Forum, a new initiative to look at a “top-down governance approach.” He points positively to the IETF’s “rough consensus and running code.” “How do we start doing that in the data world?” How do we get a balanced representation of the community? This is not a regulatory group; everything will be open source, and progress will be through rough consensus. They’ve got some funding from gov’t groups around the world. (Check CNI.org for more info.)
Now Josh Greenberg from the Sloan Foundation. He points to the opportunities presented by aggregated Big Data: the effects on social science, on libraries, etc. But the tools aren’t keeping up with the computational power, so researchers are spending too much time mastering tools, plus it can make reproducibility and provenance trails difficult. Sloan is funding some technical approaches to increasing the trustworthiness of data, including in publishing. But Sloan knows that this is not purely a technical problem. Everyone is talking about data science. Data scientist defined: Someone who knows more about stats than most computer scientists, and can write better code than typical statisticians :) But data science needs to better understand stewardship and curation. What should the workforce look like so that the data-based research holds up over time? The same concerns apply to business decisions based on data analytics. The norms that have served librarians and archivists of physical collections now apply to the world of data. We should be looking at these issues across the boundaries of academics, science, and business. E.g., economics work now rests on data from Web businesses, US Census, etc.
[I couldn't liveblog the next two — Michael and Myron — because I had to leave my computer on the podium. The following are poor summaries.]
Michael Stebbins, Assistant Director for Biotechnology in the Office of Science and Technology Policy in the White House, talked about the Administration’s enthusiasm for Big Data and open access. It’s great to see this degree of enthusiasm coming directly from the White House, especially since Michael is a scientist and has worked for mainstream science publishers.
Myron Gutmann, Ass’t Dir of the National Science Foundation, likewise expressed commitment to open access, and said that there would be an announcement in Spring 2013 that in some ways will respond to the recent UK and EC policies requiring the open publishing of publicly funded research.
After the break, there’s a panel.
Anne Kenney, Dir. of Cornell U. Library, talks about the new emphasis on digital curation and preservation. She traces this back at Cornell to 2006 when an E-Science task force was established. She thinks we now need to focus on e-research, not just e-science. She points to Walters and Skinner’s “New Roles for New Times: Digital Curation for Preservation.” When it comes to e-research, Anne points to the need for metadata stabilization, harmonizing applications, and collaboration in virtual communities. Within the humanities, she sees more focus on curation, the effect of the teaching environment, and more of a focus on scholarly products (as opposed to the focus on scholarly process, as in the scientific environment).
She points to Youngseek Kim et al. “Education for eScience Professionals”: digital curators need not just subject domain expertise but also project management and data expertise. [There's lots of info on her slides, which I cannot begin to capture.] The report suggests an increasing focus on people-focused skills: project management, bringing communities together.
She very briefly talks about Mary Auckland’s “Re-Skilling for Research” and Williford and Henry, “One Culture: Computationally Intensive Research in the Humanities and Sciences.”
So, what are research libraries doing with this information? The Association of Research Libraries has a job announcements database. And Tito Sierra did a study last year analyzing 2011 job postings. He looked at 444 job descriptions. 7.4% of the jobs were “newly created or new to the organization.” New mgt level positions were significantly higher, while subject specialist jobs were under-represented.
Anne went through Tito’s data and found 13.5% have “digital” in the title. There were more digital humanities positions than e-science. She posts a list of the new titles jobs are being given, and they’re digilicious. 55% of those positions call for a library science degree.
Anne concludes: It’s a growth area, with responsibilities more clearly defined in the sciences. There’s growing interest in serving the digital humanists. “Digital curation” is not common in the qualifications nomenclature. MLS or MLIS is not the only path. There’s a lot of interest in post-doctoral positions.
Margarita Gregg of the National Oceanic and Atmospheric Administration begins by talking about challenges in the era of Big Data. They produce about 15 petabytes of data per year. It’s not just about Big Data, though. They are very concerned with data quality. They can’t preserve all versions of their datasets, and it’s important to keep track of the provenance of that data.
Margarita directs one of NOAA’s data centers that acquires, preserves, assembles, and provides access to marine data. They cannot preserve everything. They need multi-disciplinary people, and they need to figure out how to translate this data into products that people need. In terms of personnel, they need: Data miners, system architects, developers who can translate proprietary formats into open standards, and IP and Digital Rights Management experts so that credit can be given to the people generating the data. Over the next ten years, she sees computer science and information technology becoming the foundations of curation. There is no currently defined job called “digital curator” and that needs to be addressed.
Vicki Ferrini at the Lamont-Doherty Earth Observatory at Columbia University works on data management, metadata, discovery tools, educational materials, best practice guidelines for optimizing acquisition, and more. She points to the increased communication between data consumers and producers.
As data producers, the goal is scientific discovery: data acquisition, reduction, assembly, visualization, integration, and interpretation. And then you have to document the data (= metadata).
Data consumers: They want data discoverability and access. Increasingly they are concerned with the metadata.
The goal of data providers is to provide access, preservation and reuse. They care about data formats, metadata standards, interoperability, the diverse needs of users. [I've abbreviated all these lists because I can't type fast enough.]
At the intersection of these three domains is the data scientist. She refers to this as the “data stewardship continuum” since it spans all three. A data scientist needs to understand the entire life cycle, have domain experience, and have technical knowledge about data systems. “Metadata is key to all of this.” Skills: communication and organization, understanding the cultural aspects of the user communities, people and project management, and a balance between micro- and macro-perspectives.
Challenges: Hard to find the right balance between technical skills and content knowledge. Also, data producers are slow to join the digital era. Also, it’s hard to keep up with the tech.
Andy Maltz, Dir. of the Science and Technology Council of the Academy of Motion Picture Arts and Sciences. AMPAS is about arts and sciences, he says, not about The Business.
The Science and Technology Council was formed in 2005. They have lots of data they preserve. They’re trying to build the pipeline for next-generation movie technologists, but they’re falling behind, so they have an internship program and a curriculum initiative. He recommends we read their study The Digital Dilemma. It says that there’s no digital solution that meets film’s requirement to be archived for 100 years at a low cost. It costs $400/yr to archive a film master vs $11,000 to archive a digital master (as of 2006) because of labor costs. [Did I get that right?] He says collaboration is key.
In January they released The Digital Dilemma 2. It found that independent filmmakers, documentarians, and nonprofit audiovisual archives are loosely coupled, widely dispersed communities. This makes collaboration more difficult. The efforts are also poorly funded, and people often lack technical skills. The report recommends the next gen of digital archivists be digital natives. But the real issue is technology obsolescence. “Technology providers must take archival lifetimes into account.” Also system engineers should be taught to consider this.
He highly recommends the Library of Congress’ “The State of Recorded Sound Preservation in the United States,” which rings an alarm bell. He hopes there will be more doctoral work on these issues.
Among his controversial proposals: Require higher math scores for MLS/MLIS students since they tend to score lower than average on that. Also, he says that the new generation of content creators has no curatorial awareness. Executives and managers need to know that this is a core business function.
Demand side data points: 400 movies/year at 2PB/movie. CNN has 1.5M archived assets, and generates 2,500 new archive objects/wk. YouTube: 72 hours of video uploaded every minute.
Show business is a business.
Need does not necessarily create demand.
The nonprofit AV archive community is poorly organized.
Next gen needs to be digital natives with strong math and sci skills.
The next gen of executive leaders needs to understand the importance of this.
Digital curation and long-term archiving need a business case.
Q: How about linking the monetary value of the metadata to the metadata? That would encourage the generation of metadata.
Q: Weinberger paints a picture of a flexible world of flowing data, and now we’re back in the academic, scientific world where you want good data that lasts. I’m torn.
A: Margarita: We need to look at how that data are being used. Maybe in some circumstances the quality of the data doesn’t matter. But there are other instances where you’re looking for the highest quality data.
A: [audience] In my industry, one person’s outtakes are another person’s director’s cuts.
A: Anne: In the library world, we say if a little metadata would be great, a lot of it would be great. We need to step away from trying to capture the most to capturing the most useful (since we can’t capture the most). And how do you produce data in a way that’s opened up to future users, as well as being useful for its primary consumers? It’s a very interesting balance that needs to be played. Maybe short-term need is a higher thing and long-term is lower.
A: Vicki: The scientists I work with use discrete data sets, spreadsheets, etc. As we get along we’ll have new ways to check the quality of datasets so we can use the messy data as well.
Q: Citizen curation? E.g., a lot of antiques are curated by being put into people’s attics…Not sure what that might imply as a model. Two parallel models?
A: Margarita: We’re going to need to engage anyone who’s interested. We need to incorporate citizen curation.
A: Anne: That’s already underway where people have particular interests. E.g., Cornell’s Lab of Ornithology where birders contribute heavily.
Q: What one term will bring people info about this topic?
A: Vicki: There isn’t one term, which speaks to the linked data concept.
Q: How will you recruit people from all walks of life to have the skills you want?
A: Andy: We need to convince people way earlier in the educational process that STEM is cool.
A: Anne: We’ll have to rely to some degree on post-hire education.
Q: My shop produces and integrates lots of data. We need people with domain and computer science skills. They’re more likely to come out of the domains.
A: Vicki: As long as you’re willing to take the step across the boundary, it doesn’t matter which side you start from.
Q: 7 yrs ago in library school, I was told that you need to learn a little programming so that you understand it. I didn’t feel like I had to add a whole other profession on to the one I was studying.