Christine Borgman, chair of Info Studies at UCLA, and author of the essential Scholarship in the Digital Age, is giving a talk on The Knowledge Infrastructure of Astronomy. Her new book is Big Data, Little Data, No Data: Scholarship in the Networked World, but you’ll have to wait until January. (And please note that precisely because this is a well-organized talk with clearly marked sections, it comes across as choppy in these notes.)
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Her new book draws on 15 yrs of studying various disciplines and 7-8 years focusing on astronomy as a discipline. It’s framed around the change to more data-intensive research across the sciences and humanities, plus the policy push for open access to content and to data. (The team site.)
They’ve been looking at four groups:
The world thinks that astronomy and genomics have figured out how to do data intensive science, she says. But scientists in these groups know that it’s not that straightforward. Christine’s group is trying to learn from these groups and help them learn from one another.
Knowledge Infrastructures are “string and baling wire.” Pieces pulled together. The new layered on top of the old.
The first English scientific journal began almost 350 yrs ago. (Philosophical Transactions of the Royal Society.) We no longer think of the research object as a journal but as a set of articles, objects, and data. People don’t have a simple answer to what is their data. The raw files? The tables of data? When they’re told to share their data, they’re not sure what data is meant. “Even in astronomy we don’t have a single, crisp idea of what are our data.”
It’s very hard to find and organize all the archives of data. Even establishing a chronology is difficult. E.g., “Yes, that project has that date stamp but it’s really a transfer from a prior project twenty years older than that.” It’s hard to map the pieces.
Seamless Astronomy: ADS All Sky Survey, mapping data onto the sky. Also, they’re trying to integrate various link mappings, e.g., Chandra, NED, SIMBAD, WorldWide Telescope, arXiv.org, VizieR, Aladin. But mapping these collections doesn’t tell you why they’re being linked, what they have in common, or what are their differences. What kind of science is being accomplished by making those relationships? Christine hopes her project will help explain this, although not everyone will agree with the explanations.
Her group wants to draw some maps and models: “A Christmas Tree of Links!” She shows a variety of maps, possible ways of organizing the field. E.g., one from 5 yrs ago clusters services, repositories, archives and publishers. Another scheme: Publications, Objects, Observations; the connection between pubs (citations) and observations is the most loosely coupled. “The trend we’re seeing is that astronomy is making considerable progress in tying together the observations, publications, and data.” “Within astronomy, you’ve built many more pieces of your infrastructure than any other field we’ve looked at.”
She calls out Chris Erdmann [sitting immediately in front of me] as a leader in trying to get data curation and custodianship taken up by libraries. Others are worrying about bit-rot and other issues.
Astronomy is committed to open access, but the resource commitments are uneven.
Strengths of astronomy:
Gaps of astronomy:
Investment in data stewardship: varies by mission and by type of research. E.g., space-based missions get more investment than the ground-based ones. (An audience member says that that’s because the space research was so expensive that there was more insistence on making the data public and usable. A lively discussion ensues…)
The access to data varies.
Curation of tools and technologies
International coordination. Should we curate existing data? But you don’t get funding for using existing data. So, invest in getting new data from new instruments??
Christine ends with some provocative questions about openness. What does it mean exactly? What does it get us?
Q: As soon as you move out of the Solar System to celestial astronomy, all the standards change.
A: When it takes ten years to build an instrument, it forces you to make early decisions about standards. But when you’re deploying sensors in lakes, you don’t always note that this is #127 that Eric put the tinfoil on top of because it wasn’t working well. Or people use Google Docs and don’t even label the rows and columns because all the readers know what they mean. That makes going back to it much harder. “Making it useful for yourself is hard enough.” It’s harder still to make it useful for someone in 5 yrs, and harder still to make it useful for an unknown scientist in another country speaking another language and maybe from another discipline.
Q: You have to put a data management plan into every proposal, but you can’t make it a budget item… [There is a lively discussion of which funders reasonably fund this]
Q: Why does Europe fund ground-based data better than the US does?
A: [audience] Because of Riccardo Giacconi.
A: [Christine] We need to better fund the invisible workforce that makes science work. We’re trying to cast a light on this invisible infrastructure.
A new report on Ithaka S+R’s annual survey of libraries suggests that library directors are committed to libraries being the starting place for their users’ research, but that the users are not in agreement. This calls into question the expenditures libraries make to achieve that goal. (Hat tip to Carl Straumsheim and Peter Suber.)
The question is good. My own opinion is that libraries should let Google do what it’s good at, while they focus on what they’re good at. And libraries are very good indeed at particular ways of discovery. The goal should be to get the mix right, not to make sure that libraries are the starting point for their communities’ research.
The Ithaka S+R survey found that “The vast majority of the academic library directors…continued to agree strongly with the statement: ‘It is strategically important that my library be seen by its users as the first place they go to discover scholarly content.'” But the survey showed that only about half think that that’s happening. This gap can be taken as room for improvement, or as a sign that the aspiration is wrongheaded.
The survey confirms that many libraries have responded to this by moving to a single-search-box strategy, mimicking Google. You just type in a couple of words about what you’re looking for and it searches across every type of item and every type of system for managing those items: images, archival files, books, maps, museum artifacts, faculty biographies, syllabi, databases, biological specimens… Just like Google. That’s the dream, anyway.
I am not sold on it. Roger cites Lorcan Dempsey, who is always worth listening to:
Lorcan Dempsey has been outspoken in emphasizing that much of “discovery happens elsewhere” relative to the academic library, and that libraries should assume a more “inside-out” posture in which they attempt to reveal more effectively their distinctive institutional assets.
Yes. There’s no reason to think that libraries are going to be as good at indexing diverse materials as Google et al. are. So, libraries should make it easier for the search engines to do their job. Library platforms can help. So can Schema.org as a way of enriching HTML pages about library items so that the search engines can easily recognize the library item metadata.
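To make the Schema.org idea a bit more concrete, here is a minimal sketch of how a catalog page might describe a book so that search engines can pick up the metadata. It uses JSON-LD, one of the formats Schema.org data can be written in, with the Schema.org Book type. The function names, the placeholder ISBN, and the example.edu URL are all hypothetical, made up for illustration; a real library platform would fill these fields from its own records and might well choose different properties.

```python
import json


def book_jsonld(title, author, isbn, catalog_url):
    """Build a Schema.org Book description as a JSON-LD dictionary.

    Hypothetical helper for illustration, not part of any library platform.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
        "url": catalog_url,
    }


def jsonld_script_tag(data):
    """Wrap a JSON-LD dictionary in the script tag that goes in the page's HTML."""
    # Escape "</" so that a value containing "</script>" cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values: the ISBN and URL below are not real identifiers.
record = book_jsonld(
    title="Too Big to Know",
    author="David Weinberger",
    isbn="0000000000000",
    catalog_url="https://library.example.edu/items/12345",
)
print(jsonld_script_tag(record))
```

The point of the sketch is only that the library does the describing and leaves the indexing to the search engines, which is the division of labor argued for above.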
But assuming that libraries shouldn’t outsource all of their users’ searches, then what would best serve their communities? This is especially complicated since the survey reveals that preference for the library web site vs. the open Web varies based on just about everything: institution, discipline, role, experience, and whether you’re exploring something new or keeping up with your field. This leads Roger to provocatively ask:
While academic communities are understood as institutionally affiliated, what would it entail to think about the discovery needs of users throughout their lifecycle? And what would it mean to think about all the different search boxes and user login screens across publishes [sic] and platforms as somehow connected, rather than as now almost entirely fragmented? …Libraries might find that a less institutionally-driven approach to their discovery role would counterintuitively make their contributions more relevant.
I’m not sure I agree, in part because I’m not entirely sure what Roger is suggesting. If it’s that libraries should offer an experience that integrates all the sources scholars consult throughout the lifecycle of their projects or themselves, then, I’d be happy to see experiments, but I’m skeptical. Libraries generally have not shown themselves to be particularly adept at creating grand, innovative online user experiences. And why should they be? It’s a skill rarely exhibited anywhere on the Web.
If designing great Web experiences is not a traditional strength of research libraries, the networked expertise of their communities is. So is the library’s uncompromised commitment to serving its community’s interests. A discovery system that learns from its community can do something that Google cannot: it can find connections that the community has discerned, and it can return results that are particularly relevant to that community. (It can make those connections available to the search engines also.)
This is one of the principles behind the Stacklife project that came out of the Harvard Library Innovation Lab that until recently I co-directed. It’s one of the principles of the Harvard LibraryCloud platform that makes Stacklife possible. It’s one of the reasons I’ve been touting a technically dumb cross-library measure of usage. These are all straightforward ways to start to record and use information about the items the community has voted for with its library cards.
This is just the start. Anonymization and opt-in could provide rich sets of connections and patterns of usage. Imagine we could know what works librarians recommend in response to questions. Imagine if we knew which works were being clustered around which topics in lib guides and syllabi. (Support the Open Syllabus Project!) Imagine if we knew which books were being put on lists by faculty and students. Imagine if we knew what books were on participating faculty members’ shelves. Imagine we could learn which works the community thinks are awesome. Imagine if we could do this across institutions so that communities could learn from one another. Imagine we could do this with data structures that support wildly messily linked sources, many of them within the library but many of them outside of it. (Support Linked Data!)
Let the Googles and Bings do what they do better than any sane person could have imagined twenty years ago. Let libraries do what they have been doing better than anyone else for centuries: supporting and learning from networked communities of scholars, librarians, and students who together are a profound source of wisdom and working insight.
Categories: too big to know
Tagged with: 2b2k
Date: October 13th, 2014 dw
I’m at a Shorenstein Center brownbag talk. Robin Sproul is talking about journalism in the changing media landscape. She’s been Washington Bureau Chief of ABC News for 20 years, and now is VP of Public Affairs for that network. (Her last name rhymes with “owl,” by the way.)
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
This is an “incredibly exciting time,” Robin begins. The pace has been fast and is only getting faster. E.g., David Plouffe says that Obama’s digital infrastructure from 2008 didn’t apply in 2012, and the 2012 infrastructure won’t apply in 2016.
A few years ago, news media were worried about how to reach you wherever you are. Now it’s how to reach you in a way that makes you want to pay attention. “How do we get inside your brain, through the firehose, in a way that will break through everything you’re exposed to?” We’re all adapting to getting more and smaller bites. “Digital natives swerve differently than the older generation, from one topic to another.”
In this social media world, “each of us is a news reporter.” Half of people on social networks repost news videos, and one in ten post news videos they’ve recorded themselves.
David Carr: “If the Vietnam War brought war into our living rooms,” now “it’s at our fingertips.” But we see the world through narrow straws. We’re not going back from that, but we need to get better at curating them and making sure they’re accurate and contextualized.
On the positive side: “I was so moved by a Ferguson coverage: how a community of color, in this case, could tell their own story” and connect with people around the country, in real-time. “The people of that community were ahead of the cables.” Sure, some of the info was wrong, but we could watch people bearing witness to history. Also, the Ray Rice video has stimulated conversations on domestic violence around the country. How do you tap into these discussions? Sort them? Curate them? “A lot of it comes down to curation.”
People are not coming into ABCnews.com directly. “They’re coming in through side doors.” “And the big stories we do compete with the animal stories, the recipes,” etc. “We see a place like Buzzfeed” that now has 200 employees. They’ve hired someone from The Guardian, they’ve been reporting from the ground in Liberia. Yahoo’s hired Katie Couric. Vice. Michael Isikoff. Reddit’s AMAs. Fusion has just hired Tim Pool from Vice Media. “All of these things are competing in a rapidly shifting universe.”
ABC is creating partnerships, e.g., with Facebook for identifying what’s trending which is then discussed on their Sunday morning show. [See Ethan Zuckerman’s recent post on why Twitter is a better news source than Facebook. Also, John McDermott’s Why Facebook is for ice buckets, Twitter is for Ferguson. Both suggest that ABC maybe should rethink its source for what’s trending.] ABC uses various software platforms to evaluate video coming in of breaking news. “We need help, so we’re partnering.” ABC now has a social desk. “During a big story, we activate a team…and they are in a deep deep dive of social media,” vetting it for accuracy and providing context. “Six in ten of Americans watch videos on line and half of those watch news videos. This is a big growth area.” But, she adds ruefully, it’s “not a big revenue growth area.”
So, ABC is tapping into social media, but is wary of those who have their own aims. E.g., Whitehouse.gov does reports that look like news reports but are not. The photos the White House hands out never show a yawning, exhausted, or weeping president. “I joke with the press secretary that we’re one step away from North Korea.” We’re heading toward each candidate having their own network, in effect, a closed circle.
Q: You’ve described the fragmentation in the supply of news. But how about the demand? “Are you getting a sense of your audience?” What circulates? What sticks? What sets the agenda? etc.
A: We do a lot of audience research. Our mainstream TV shows attract an aging audience. No matter what we do, they’re not bringing in a new audience. Pretty much the older the audience, the more they like hard news. We’ve changed the pace of the Sunday shows. We think people want a broader lens from us. “We’re not as focused on horse race politics, or what John McCain thinks of every single issue. We’re open to new voices.”
Q: The future of health reporting? I’m disappointed with what I see. E.g., there’s little regard to the optics of how we’re treating Ebola, particularly with regard to the physicians getting treated back in the US.
A: Dr. Richard Besser, who ran the CDC, is at ABC and has reported from Africa. But it’s hit or miss. We did cover the white doctors getting the serum, but it’s hard to find in the firehose.
Q: How do you balance quality news with short attention spans?
A: For the Sunday shows we’ve tried to maintain a balance.
Q: Does ABC try to maintain its own pace, or go with the new pace? If the latter, how do you maintain quality?
A: We used to make a ton of money producing the news and could afford to go anywhere. Now we have the same number of hours of news on TV, but the audiences are shrinking and we’re trying to grow. It’s not as deep. It’s broader. We will want to find you…but you have to be willing to be found.
Q: How do you think about the segmentation of your news audience? And what are the differences in what you provide them?
A: We know which of our shows skew older (Sunday shows), or more female (Good Morning America), etc. We don’t want to leave any segment behind. We want our White House reporter to go into depth, but he also has to tweet all day, do a Yahoo show, do radio, accompany Nancy Pelosi on a fast-walk, etc.
Q: Some of your audiences matter from a business point of view. But historically ABC has tried to supply news to policy makers etc. The 11 year old kids may give you large numbers, but…
A: When we sit in our morning editorial meetings we never say that we will do a story because the 18-24 year olds are interested. The need to know, what we think is important, drives decisions. We used to be programming for “people like us” who want the news. Then we started getting thousands of “nutjob” emails. [I’m doing a bad job paraphrasing here. Sorry] Sam Donaldson was shocked. “This digital age has made us much more aware of all those different audiences.” We’re in more contact with our audience now. E.g., when the media were accused of pulling their punches in the run-up to the Iraq War, we’d get pushback saying we’re anti-American. Before, we didn’t get these waves.
Q: A fast-walk with Nancy Pelosi, really?
A: [laughs] It got a lot of hits.
Q: Can you elaborate on your audience polling? And do people not watch negative stories?
A: A Harvard prof told me last night that s/he doesn’t like watching the news any more because it’s just so depressing. But that’s a fact of life. Anyway, it used to be that the posted comments were very negative, and sometimes from really crazy people. We learned to put that into perspective. Now Twitter provides instant feedback. We’re slammed whatever we do. So we try to come up with a mix. For World News Tonight, people with different backgrounds talk about the stories, how they play off the story before it, etc. Recently we’ve been criticized for doing too much “news you can use”, how to live your life, etc. We want to give people news that isn’t always just terrible. There’s a lot of negative stuff that we’re exposed to now. [Again, sorry for the choppiness. My fault.]
Q: TV has always had the challenge of the limited time for news. With digital, how are you linking the on-screen reporting with the in-depth online stories, given the cutbacks? How do you avoid answering every tweet? [Not sure I got that right.]
A: We have a mix of products.
Q: What is the number one barrier to investigative journalism? How have new media changed that balance?
A: There are investigative reporting non-profits springing up all the time. There’s an appetite from the user for it. All of the major news orgs still have units doing it. But what is the business model? How much money do you apportion to each vertical in your news division? It’s driven by the appetite for it, how much money you have, what you’re taking it away from. Investigative is a growth industry.
Q: I was a spokesperson for Healthcare.gov and was interested in your comments about this Administration being more closed to the media.
A: They are more closed than prior admins. There’s always a message. When the President went out the other day to talk, no other admin members were allowed to talk with the media. I think it’s a response to how many inquiries are coming and how out of control info is, and how hard it is to respond to inaccuracies that pop up. The Obama administration has clamped down a little more because of that.
Q: You can think of Vice in Liberia as an example of boutique reporting: they do that one story. But ABC News has to cover everything. Do you see a viable future for us?
A: As we go further down this path and it becomes more overwhelming, there are some brands that stand for something. Curation is what we do well. Cyclically, people will go back to these brands.
Q: In the last couple of years, there’s a trend away from narrative to Gestalt. They were called news stories because they had a plot. Recent news events like Ferguson or Gaza were more like just random things. Very little story.
A: Twitter is a tool, a platform. It’s not really driving stories. Maybe it’s the nature of the stories. It’ll be interesting to see how social media are used by the candidates in the 2016 campaign.
Q: Why splitting the nightly news anchor from …
A: Traditionally the evening news anchor has been the chief anchor for the network. George Stephanopoulos anchors GMA, which makes most of the money. So no one wanted to move him to the evening news. And the evening news has become a little less relevant to our network. There’s been a diminishment in the stature of the evening news anchor. And it plays to GS’s strengths.
Categories: too big to know
Tagged with: 2b2k
Date: September 9th, 2014 dw
I have an op-ed/column up at CNN about the Facebook experiment. [The next day: The op-ed led to 4 mins on the Jake Tapper show. Oh what the heck. Here’s the video.]
All I’ll say here is how struck I am again (as always) about the need to leave out most of everything when writing goes from web-shaped to rectangular.
Just as a quick example, I’m not convinced that the Facebook experiment was as egregious as the headlines would have us believe. But I made a conscious decision not to address that point in my column because I wanted to make a more general point. The rectangle for an op-ed is only so big.
Before I wrote the column, I’d observed, and lightly participated in, some amazing discussion threads among people who bring many different sorts of expertise to the party. Disagreements that were not just civil but highly constructive. Evidence based on research and experience. Civic concern. Emotional connections. Just amazing.
I learned so much from those discussions. What I produced in my op-ed is so impoverished compared to the richness in that tangle of linked differences. That’s where the real knowledge lives.
Jill Lepore has an excellent take-down in The New Yorker of Clay Christensen’s The Innovator’s Dilemma. Yet I am unconvinced.
I thought I was convinced when I read it. It’s a brilliantly done piece, examining Christensen’s evidence, questioning his methods, and drawing appropriate lessons, including wondering why we accepted the Innovator’s Dilemma for decades without critically examining it. (Christensen became so famous for it that his last name isn’t even flagged as a spelling error on my Mac.)
I got de-convinced by a discussion on a mailing list I’m on that points to some weaknesses in Lepore’s own argument, including her use of “cherry-picked” examples — a criticism she levels at Christensen — and her assumption that the continuity of companies, as opposed to their return on assets, is the right measure. As a person on the mailing list points out, John Hagel, John Seely Brown and Lang Davison take return on assets as a key metric in their book The Big Shift. And then someone else maintained that ROA is a poor measure of networked phenomena. That morphed into a discussion about the pragmatic value of truth: Does disruption provide a helpful framing for the New York Times as it considers its future?
The problem is that brains are truthy. They are designed to pay attention to things that seem to matter to us, bending our world around our concerns and interests. And brains are associative, so they make sense of the world — maybe even at the level of perception — by finding the relationships that seem to matter to us. In Heidegger’s terms, we are not indifferent knowing machines, but are creatures that care about what happens to us and to others. The brain is an unreliable narrator.
We now have access to an unfathomable sea of information that can contradict anything we settle on. That sea has been assembled by caring creatures and their minions, but it is so vast and global that it contains information beyond the caring and linking of any one of us. Every understanding can be subverted with a wink and a hand wave because all understanding simplifies a world that is resolutely and even necessarily complex. The universe outruns us.
Now we have machines that can look at masses of data and escape from our temptation to turn everything into a narrative. But those machines are limited by our decision about which data is worth gathering and connecting. There is hope in this direction, but it’s not clear whether we are capable of accepting the findings of machines that correlate without stories.
TL;DR: Our brains are truthy and the world is too big to make sense of. Not that that will stop us from trying.
[June 20:] Clay Christensen has cried foul in an interview.
Categories: big data, too big to know
Tagged with: business
Date: June 18th, 2014 dw
I’m at the Israeli Wikimedia conference. The chair of the Wikimedia Foundation, Jan-Bart De Vreede, is being interviewed by Sheizaf Rafaeli.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Jan introduces himself. Besides being the chair, in the Netherlands he works on open educational resources at Kennisnet. He says that the Wikimedia Foundation is quite small compared to other organizations like it. Five members are elected by the community (anyone with enough edits can vote), there are four appointed members, and Jimmy Wales.
Q: The Foundation is based on volunteers, and it has a budget. What are the components of the future for Wikipedia?
A: We have to make sure we get the technology to the place where we’re prepared for the future. And how can we enable the volunteers to do whatever they want in order to achieve our mission of being the sum of all knowledge, which is a high bar? Enabling volunteers is the highest impact thing that we can do.
Q: Students just did a presentation here based on the idea that Wikipedia already has too much information.
A: It’s not up to us to decide how the info is consumed. We should make sure that the data is available to be presented any way people want to. We are moving toward WikiData: structured data and the relationship among that data. How can we make it easier for people to add data to WikiData without necessarily requiring people to edit pages? How can we enable people to tag data? Can we use that to learn what people find relevant?
Q: What’s most important?
A: WikiData. Then Wikipedia Zero, making WP access available in developing parts of the globe. We’re asking telecoms to provide free access to Wikipedia on mobile phones.
Q: You’re talking with the Israeli Minister of Education tomorrow. About what?
A: We have a project of Wikipedia for children, written by children. Children can have an educational experience — e.g., interview a Holocaust survivor — and share it so all benefit from it.
Q: Any interesting projects?
A: Wiki Monuments [link ?]. Wiki Air. So many ideas. So much more to do. The visual editor will help people make edits. But we also have to make sure that new editors are welcomed and are treated kindly. Someone once told Jan that she “just helps new editors,” and he replied that that scales much better than creating your own edits.
Q: I’m surprised you didn’t mention reliability…
A: Books feel trustworthy. The Net automatically brings a measure of distrust, and rightly so. Wikipedia over the years has come to feel trustworthy, but that requires lots of people looking at it and fixing it when it’s wrong.
Q: 15,000 Europeans have applied to have their history erased on Google. The Israeli Supreme Court has made a judgment along the same lines. What’s Wikipedia’s stance on this?
A: As we understand it, the right to be forgotten applies to search engines, not to source articles about you. Encyclopedia articles are about what’s public.
Q: How much does the neutral point of view count?
A: It’s the most important thing, along with being written by volunteers. Some Silicon Valley types have refused to contribute money because, they say, we have a business model that we choose not to use: advertising. We decided it’d be more important to get many small contributions than to corrode NPOV by taking money.
Q: How about paid editing so that we get more content?
A: It’s a tricky thing. There are public and governmental institutions that pay employees to provide Open Access content to Wikipedia and Wiki Commons. On the other hand, there are organizations that take money to remove negative information about their clients. We have to make sure that there’s a way to protect the work of genuine volunteers from this. But even when we make a policy about it, the local Wikipedia units can override it.
Q: What did you think of our recent survey?
A: The Arab population was much more interested in editing Wikipedia than the Israeli population. How do you enable that? It didn’t surprise me that women are more interested in editing. We have to work against our systemic bias.
Q: Other diversity dimensions we should pay more attention to?
A: Our concept of encyclopedia itself is very Western. Our idea of citations is very Western and academic. Many cultures have oral citations. Wikipedia doesn’t know how to work with that. How can we accommodate knowledge that’s been passed down through generations?
Q: Wikipedia doesn’t allow original research. Shouldn’t there be an open access magazine for new scientific research?
A: There are a lot of OA efforts. If more are needed, they should start with volunteers.
Q: Academics and Wikipedia have a touchy relationship. Wikipedia has won that battle. Isn’t it time to gear up for the next battle, i.e., creating open access journals?
A: There are others doing this. You can always upload and publish articles, if you want [at Wiki Commons?].
On Friday, I had the tremendous honor of being awarded a Doctor of Letters degree from Simmons College, and giving the Commencement address at the Simmons graduate students’ ceremony.
Simmons is an inspiring place, and not only for its deep commitment to educating women. Being honored this way — especially along with Ruth Ellen Fitch, Madeleine M. Joullié, and Billie Jean King — made me very, very happy.
Thank you so much to Simmons College, President Drinan, and the Board of Trustees for this honor, which means the world to me. I’m just awed by it. Also, Professor Candy Schwartz and Dean Eileen Abels, a special thank you. And this honor is extra-special meaningful because my father-in-law, Marvin Geller, is here today, and his sister, Jeannie Geller Mason, was, as she called herself, a Simmons girl, class of 1940. Afterwards, Marvin will be happy to sing you the old “We are the girls of Simmons C” college song if you ask.
So, first, to the parents: I have been in your seat, and I know how proud – and maybe relieved – you are. So, congratulations to you. And to the students, it’s such an honor to be here with you to celebrate your being graduated from Simmons College, a school that takes seriously the privilege of helping its students not only to become educated experts, but to lead the next cohort in their disciplines and professions.
Now, as I say this, I know that some of you may be shaking your inner heads, because a commencement speaker is telling you about how bright your futures are, but maybe you have a little uncertainty about what will happen in your professions and with your career. That’s not only natural, it’s reasonable. But, some of you – I don’t know how many — may be feeling beyond that an uncertainty about your own abilities. You’re being officially certified with an advanced degree in your field, but you may not quite feel the sense of mastery you expected.
In other words, you feel the way I do now. And the way I did in 1979 when I got my doctorate in philosophy. I knew well enough the work of the guy I wrote my dissertation on, but I looked out at the field and knew just how little I knew about so much of it. And I looked at other graduates, and especially at the scholars and experts who had been teaching us, and I thought to myself, “They know so much more than I do.” I could fake it pretty well, but actually not all that well.
So, I want to reassure those of you who feel the way that I did and do, I want to reassure you that that feeling of not really knowing what you should, that feeling may stay with you forever. In fact, I hope it does — for your sake, for your profession, and for all of us.
But before explaining, I need to let you in on the secret: You do know enough. It’s like Gloria Steinem’s response, when she was forty, to people saying that she didn’t look forty. Steinem replied, “This is what forty looks like.” And this is what being a certified expert in your field feels like. Simmons knows what deserving your degree means, and its standards are quite high. So, congratulations. You truly earned this and deserve it.
But here’s why it’s good to get comfortable with always having a little lack of confidence. First, if you admit no self-doubt, you lose your impulse to learn. Second, you become a smug know-it-all and no one likes you. Third, what’s even worse, is that you become a soldier in the army of ignorance. Your body language tells everyone else that their questions are a sign of weakness, which shuts down what should have been a space for learning.
The one skill I’ve truly mastered is asking stupid questions. And I don’t mean questions that I pretend are stupid but then, like Socrates, they show the folly of all those around me. No, they’re just dumb questions. Things I really should know by now. And quite often it turns out that I’m not the only one in the room with those questions. I’ve learned far more by being in over my head than by knowing what I’m talking about. And, as I’ll get to, we happen to be in the greatest time for being in over our heads in all of human history.
Let me give you just one quick example. In 1986 I became a marketing writer at a local tech startup called Interleaf that made text-and-graphic word processors. In 1986 that was a big deal, and what Interleaf was doing was waaaay over my head. So, I hung out with the engineers, and I asked the dumbest questions. What’s a font family? How can the spellchecker look up words as fast as you type them? When you fill a shape with say, purple, how does the purple know where to stop? Really basic. But because it was clear that I was a marketing guy who was genuinely interested in what the engineers were doing, they gave me a lot of time and an amazing education. Those were eight happy years being in over my head.
I’m still way over my head in the world of libraries, which are incredibly deep institutions. Compared to “normal” information technology, the data libraries deal with is amazingly profound and human. And librarians have been very generous in helping me learn just a small portion of what they know. Again, this is in part because they know my dumb questions are spurred by a genuine desire to understand what they’re doing, down to the details.
In fact, going down to the details is one very good way to make sure that you are continually over your head. We will never run out of details. The world’s just like that: there’s no natural end to how closely you can look at a thing. And one thing I’ve learned is that everything is interesting if looked at at the appropriate level of detail.
Now, it used to be that you’d have to seek out places to plunge in over your head. But now, in the age of the Internets, all we have to do is stand still and the flood waters rise over our heads. We usually call this “information overload,” and we’re told to fear it. But I think that’s based on an old idea we need to get rid of.
Here’s what I mean. So, you know Flickr, the photo sharing site? If you go there and search for photos tagged “vista,” you’ll get two million photos, more vistas than you could look at if you made it your full time job.
If you go to Google and search for apple pie recipes, you’ll get over 1.3 million of them. Want to try them all out to find the best one? Not gonna happen.
If you go to Google Images and search for “cute cats,” you’ll get over seven million photos of the most adorable kittens ever, as well as some ads and porn, of course, because Internet.
So that’s two million vista photos. 1.3 million apple pie recipes. 7.6 million cute cat photos. We’re constantly warned about information overload, yet we never hear one single word about the dangers of Vista Overload, Apple Pie Overload, or Cute Kitten Overload. How have the media missed these overloads! It’s a scandal!
I think there’s actually a pretty clear reason why we pay no attention to these overloads. We only feel overloaded by that which we feel responsible for mastering. There’s no expectation that we’ll master vista photos, apple pie recipes, or photos of cute cats, so we feel no overload. But with information it’s different because we used to have so much less of it that back then mastery seemed possible. For example, in the old days if you watched the daily half hour broadcast news or spent twenty minutes with a newspaper, you had done your civic duty: you had kept up with The News. Now we can see before our eyes what an illusion that sense of mastery was. There’s too much happening on our diverse and too-interesting planet to master it, and we can see it all happening within our browsers.
The concept of Information Overload comes from that prior age, before we accepted what the Internet makes so clear: There is too, too much to know. As we accept that, the idea of mastery will lose its grip. We’ll stop feeling overloaded even though we’re confronted with exactly the same amount of information.
Now, I want to be careful because we’re here to congratulate you on having mastered your discipline. And grad school is a place where mastery still applies: in order to have a discipline — one that can talk with itself — institutions have to agree on a master-able set of ideas, knowledge, and skills that are required for your field. And that makes complete sense.
But, especially as the Internet becomes our dominant medium of ideas, knowledge, culture, and entertainment, we are all learning just how much there is that we don’t know and will never know.
And it’s not just the quantity of information that makes true mastery impossible in the Age of the Internet. It’s also what it’s doing to the domains we want to master — the topics and disciplines. In the Encyclopedia Britannica — remember that? — an article on a topic extends from the first word to the last, maybe with a few suggested “See also’s” at the end. The article’s job is to cover the topic in that one stretch of text. Wikipedia has a different idea. At Wikipedia, the articles are often relatively short, but they typically have dozens or even hundreds of links. So rather than trying to get everything about, say, Shakespeare into a couple of thousand words, Wikipedia lets you click on links to other articles about what it mentions — to Stratford-on-Avon, or iambic pentameter, or about the history of women in the theater. Shakespeare at Wikipedia, in other words, is a web of linked articles. Shakespeare on the Web is a web. And it seems to me that that webby structure actually is a more accurate reflection of the shape of knowledge: it’s an endless series of connected ideas and facts, limited by interest, not an article that starts here and ends there. In fact, I’d say that Shakespeare himself was a web, and so am I, and so are you.
But if topics and disciplines are webs, then they don’t have natural and clear edges. Where does the Shakespeare web end? Who decides if the article about, say, women in the theater is part of the Shakespeare web or not? These webs don’t have clearcut edges. But that means that we also can’t be nearly as clear about what it means to master Shakespeare. There’s always more. The very shape of the Web means we’re always in over our heads.
And just one more thing about these messy webs. They’re full of disagreement, contradiction, argument, differences in perspective. Just a few minutes on the Web reveals a fundamental truth: We don’t agree about anything. And we never will. My proof of that broad statement is all of human history. How do you master a field, even if you could define its edges, when the field doesn’t agree with itself?
So, the concept of mastery is tough in this Internet Age. But that’s just a more accurate reflection of the way it always was even if we couldn’t see it because we just didn’t have enough room to include every voice and every idea and every contradiction, and we didn’t have a way to link them so that you can go from one to another with the smallest possible motion of your hand: the shallow click of a mouse button.
The Internet has therefore revealed the truth of what the less confident among us already suspected: We’re all in over our heads. Forever. This isn’t a temporary swim in the deep end of the pool. Being in over our heads is the human condition.
The other side of this is that the world is far bigger, more complex, and more unfathomably interesting than our little brains can manage. If we can accept that, then we can happily be in over our heads forever…always a little worried that we really are supposed to know more than we do, but also, I hope, always willing to say that out loud. It’s the condition for learning from one another…
…And if the Internet has shown us how overwhelmed we are, it’s also teaching us how much we can learn from one another. In public. Acknowledging that we’re just humans, in a sea of endless possibility, within which we can flourish only in our shared and collaborative ignorance.
So, I know you’re prepared because I know the quality of the Simmons faculty, the vision of its leadership, and the dedication of its staff. I know the excellence of the education you’ve participated in. You’re ready to lead in your field. May that field always be about this high over your head — the depth at which learning occurs, curiosity is never satisfied, and we rely on one another’s knowledge, insight, and love.
Tagged with: 2b2k
Date: May 11th, 2014 dw
The New Republic continues to favor articles debunking claims that the Internet is bringing about profound changes. This time it’s an article on the digital humanities, titled “The Pseudo-Revolution,” by Adam Kirsch, a senior editor there. [This seems to be the article. Tip of the hat to Jose Afonso Furtado.]
I am not an expert in the digital humanities, but it’s clear to the people in the field who I know that the meaning of the term is not yet settled. Indeed, the nature and extent of the discipline is itself a main object of study of those in the discipline. This means the field tends to attract those who think that the rise of the digital is significant enough to warrant differentiating the digital humanities from the pre-digital humanities. The revolutionary tone that bothers Adam so much is a natural if not inevitable consequence of the sociology of how disciplines are established. That of course doesn’t mean he’s wrong to critique it.
But Adam is exercised not just by revolutionary tone but by what he perceives as an attempt to establish claims through the vehemence of one’s assertions. That is indeed something to watch out for. But I think it also betrays a tin-eared reading by Adam. Those assertions are being made in a context the authors I think properly assume readers understand: the digital humanities is not a done deal. The case has to be made for it as a discipline. At this stage, that means making provocative claims, proposing radical reinterpretations, and challenging traditional values. While I agree that this can lead to thoughtless triumphalist assumptions by the digital humanists, it also needs to be understood within its context. Adam calls it “ideological,” and I can see why. But making bold and even over-bold claims is how discourses at this stage proceed. You challenge the incumbents, and then you challenge your cohort to see how far you can go. That’s how the territory is explored. This discourse absolutely needs the incumbents to push back. In fact, the discourse is shaped by the assumption that the environment is adversarial and the beatings will arrive in short order. In this case, though, I think Adam has cherry-picked the most extreme and least plausible provocations in order to argue against the entire field, rather than against its overreaching. We can agree about some of the examples and some of the linguistic extensions, but that doesn’t dismiss the entire effort the way Adam seems to think it does.
It’s good to have Adam’s challenge. Because his is a long and thoughtful article, I’ll discuss the thematic problems with it that I think are the most important.
First, I believe he’s too eager to make his case, which is the same criticism he makes of the digital humanists. For example, when talking about the use of algorithmic tools, he talks at length about Franco Moretti’s work, focusing on the essay “Style, Inc.: Reflections on 7,000 Titles.” Moretti used a computer to look for patterns in the titles of 7,000 novels published between 1740 and 1850, and discovered that they tended to get much shorter over time. “…Moretti shows that what changed was the function of the title itself.” As the market for novels got more crowded, the typical title went from being a summary of the contents to a “catchy, attention-grabbing advertisement for the book.” In addition, says Adam, Moretti discovered that sensationalistic novels tend to begin with “The” while “pioneering feminist novels” tended to begin with “A.” Moretti tenders an explanation, writing “What the article ‘says’ is that we are encountering all these figures for the first time.”
Adam concludes that while Moretti’s research is “as good a case for the usefulness of digital tools in the humanities as one can find” in any of the books under review, “its findings are not very exciting.” And, he says, you have to know which questions to ask the data, which requires being well-grounded in the humanities.
That you need to be well-grounded in the humanities to make meaningful use of digital tools is an important point. But here he seems to me to be arguing against a straw man. I have not encountered any digital humanists who suggest that we engage with our history and culture only algorithmically. I don’t profess expertise in the state of the digital humanities, so perhaps I’m wrong. But the digital humanists I know personally (including my friend Jeffrey Schnapp, a co-author of a book, Digital_Humanities, that Adam reviews) are in fact quite learned lovers of culture and history. If there is indeed an important branch of digital humanities that says we should entirely replace the study of the humanities with algorithms, then Adam’s criticism is trenchant…but I’d still want to hear from less extreme proponents of the field. In fact, in my limited experience, digital humanists are not trying to make the humanities safe for robots. They’re trying to increase our human engagement with and understanding of the humanities.
As to the point that algorithmic research can only “illustrate a truism rather than discovering a truth” — a criticism he levels even more fiercely at the Ngram research described in the book Uncharted — it seems to me that Adam is missing an important point. If computers can now establish quantitatively the truth of what we have assumed to be true, that is no small thing. For example, the Ngram work has established not only that Jewish sources were dropped from German books during the Nazi era, but also the timing and extent of the erasure. This not only helps make the humanities more evidence-based — remember that Adam criticizes the digital humanists for their argument-by-assertion — but also opens the possibility of algorithmically discovering correlations that overturn assumptions or surprise us. One might argue that we therefore need to explore these new techniques more thoroughly, rather than dismissing them as adding nothing. (Indeed, the NY Times review of Uncharted discusses surprising discoveries made via Ngram research.)
Perhaps the biggest problem I have with Adam’s critique I’ve also had with some digital humanists. Adam thinks of the digital humanities as being about the digitizing of sources. He then dismisses that digitizing as useful but hardly revolutionary: “The translation of books into digital files, accessible on the Internet around the world, can be seen as just another practical tool…which facilitates but does not change the actual humanistic work of thinking and writing.”
First, that underplays the potential significance of making the works of culture and scholarship globally available.
Second, if you’re going to minimize the digitizing of books as merely the translation of ink into pixels, you miss what I think is the most important and transformative aspect of the digital humanities: the networking of knowledge and scholarship. Adam in fact acknowledges the networking of scholarship in a twisty couple of paragraphs. He quotes the following from the book Digital_Humanities:
The myth of the humanities as the terrain of the solitary genius…— a philosophical text, a definitive historical study, a paradigm-shifting work of literary criticism — is, of course, a myth. Genius does exist, but knowledge has always been produced and accessed in ways that are fundamentally distributed…
Adam responds by name-checking some paradigm-shifting works, and snidely adds “you can go to the library and check them out…” He then says that there’s no contradiction between paradigm-shifting works existing and the fact that “Scholarship is always a conversation…” I believe he is here completely agreeing with the passage he thinks he’s criticizing: genius is real; paradigm-shifting works exist; these works are not created by geniuses in isolation.
Then he adds what for me is a telling conclusion: “It’s not immediately clear why things should change just because the book is read on a screen rather than on a page.” Yes, that transposition doesn’t suggest changes any more worthy of research than the introduction of mass market paperbacks in the 1940s [source]. But if scholarship is a conversation, might moving those scholarly conversations themselves onto a global network raise some revolutionary possibilities, since that global network allows every connected person to read the scholarship and its objects, lets everyone comment, provides no natural mechanism for promoting any works or comments over any others, inherently assumes a hyperlinked rather than sequential structure of what’s written, makes it easier to share than to sequester works, is equally useful for non-literary media, makes it easier to transclude than to include so that works no longer have to rely on briefly summarizing the other works they talk about, makes differences and disagreements much more visible and easily navigable, enables multiple and simultaneous ordering of assembled works, makes it easier to include everything than to curate collections, preserves and perpetuates errors, is becoming ubiquitously available to those who can afford connection, turns the Digital Divide into a gradient while simultaneously increasing the damage done by being on the wrong side of that gradient, is reducing the ability of a discipline to patrol its edges, and a whole lot more.
It seems to me reasonable to think that it is worth exploring whether these new affordances, limitations, relationships and metaphors might transform the humanities in some fundamental ways. Digital humanities too often is taken simply as, and sometimes takes itself as, the application of computing tools to the humanities. But it should be (and for many, is) broad enough to encompass the implications of the networking of works, ideas and people.
I understand that Adam and others are trying to preserve the humanities from being abandoned and belittled by those who ought to be defending the traditional in the face of the latest. That is a vitally important role, for as a field struggling to establish itself digital humanities is prone to over-stating its case. (I have been known to do so myself.) But in my understanding, that assumes that digital humanists want to replace all traditional methods of study with computer algorithms. Does anyone?
Adam’s article is a brisk challenge, but in my opinion he argues too hard against his foe. The article becomes ideological, just as he claims the explanations, justifications and explorations offered by the digital humanists are.
More significantly, focusing only on the digitizing of works and ignoring the networking of their ideas and the people discussing those ideas, glosses over the locus of the most important changes occurring within the humanities. Insofar as the digital humanities focus on digitization instead of networking, I intend this as a criticism of that nascent discipline even more than as a criticism of Adam’s article.
Two percent of Harvard’s library collection circulates every year. A high percentage of the works that are checked out are the same as the books that were checked out last year. This fact can cause reflexive tsk-tsking among librarians. But — with some heavy qualifications to come — this is as it should be. The existence of a Long Tail is not a sign of failure or waste. To see this, consider what it would be like if there were no Long Tail.
Harvard’s 73 libraries have 16 million items [source]. There are 21,000 students and 2,400 faculty [source]. If we guess that half of the library items are available for check-out, which seems conservative, that would mean that 160,000 different items (two percent of 8 million) are checked out every year. If there were no Long Tail, then no book would be checked out more than any other. In that case, it would take the Harvard community an even fifty years before anyone would have read the same book as anyone else. And a university community in which across two generations no one has read the same book as anyone else is not a university community.
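For anyone who wants to check the arithmetic, here is a minimal sketch of that back-of-the-envelope estimate. It assumes the figures above (16 million items, half of them available for check-out, two percent circulating each year) and the deliberately unrealistic "no Long Tail" premise that every check-out is of a different item. The function and parameter names are mine, invented for illustration.

```python
from fractions import Fraction


def flat_circulation(total_items, available_share, yearly_checkout_share):
    """Estimate a world with no Long Tail, where no item is borrowed twice
    until every available item has been borrowed once.

    Returns (items available, items checked out per year, years to cycle
    through the whole available collection).
    """
    available = total_items * available_share
    per_year = available * yearly_checkout_share
    years = available / per_year  # simplifies to 1 / yearly_checkout_share
    return int(available), int(per_year), int(years)


# Figures from the post: 16 million items, half available (a guess),
# two percent circulating per year. Fractions keep the arithmetic exact.
available, per_year, years = flat_circulation(
    16_000_000, Fraction(1, 2), Fraction(2, 100)
)
print(f"{available:,} available; {per_year:,} checked out per year; {years} years")
```

Note that the fifty years falls out of the two percent figure alone: the number of available items cancels, so the guess about how much of the collection can be checked out changes the 160,000 but not the fifty.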
I know my assumptions are off. For example, I’m not counting books that are read in the library and not checked out. But my point remains: we want our libraries to have nice long tails. Library long tails are where culture is preserved and discovery occurs.
And, having said that, it is perfectly reasonable to work to lower the difference between the Fat Head and the Long Tail, and it is always desirable to help people to find the treasures in the Long Tail. Which means this post is arguing against a straw man: no one actually wants to get rid of the Long Tail. But I prefer to put it that this post argues against a reflex of thought I find within myself and have encountered in others. The Long Tail is a requirement for the development of culture and ideas, and at the same time, we should always help users to bring riches out of the Long Tail.
Simply in terms of nostalgia, this 1985 video called “Knowledge Engineering: Artificial Intelligence Research at the Stanford Heuristic Programming Project” from the Stanford archives is charming right down to its Tron-like digital soundtrack.
But it’s also really interesting if you care about the way we’ve thought about knowledge. The Stanford Heuristic Programming Project under Edward Feigenbaum did groundbreaking work in how computers represent knowledge, emphasizing the content and not just the rules. (Here is a 1980 article about the Project and its projects.)
And then at the 8:50 mark, it expresses optimism that an expert system would be able to represent not only every atom of proteins but how they fold.
Little could it have been predicted that protein folding even 30 years later would be better recognized by the human brain than by computers, and that humans playing a game — Fold.It — would produce useful results.
It’s certainly the case that we have expert systems all over the place now, from Google Maps to the Nest thermostat. But we also see another type of expert system that was essentially unpredictable in 1985. One might think that the domain of computer programming would be susceptible to being represented in an expert system because it is governed by a finite set of perfectly knowable rules, unlike the fields the Stanford project was investigating. And there are of course expert systems for programming. But where do the experts actually go when they have a problem? To StackOverflow where other human beings can make suggestions and iterate on their solutions. One could argue that at this point StackOverflow is the most successful “expert system” for computer programming in that it is the computer-based place most likely to give you an answer to a question. But it does not look much like what the Stanford project had in mind, for how could even Edward Feigenbaum have predicted what human beings can and would do if connected at scale?
(Here’s an excellent interview with Feigenbaum.)
Tagged with: 2b2k
Date: April 12th, 2014 dw