Ars Technica has a post about Wikidata, a proposed new project from the folks that brought you Wikipedia. From the project’s introductory page:
Many Wikipedia articles contain facts and connections to other articles that are not easily understood by a computer, like the population of a country or the place of birth of an actor. In Wikidata you will be able to enter that information in a way that makes it processable by the computer. This means that the machine can provide it in different languages, use it to create overviews of such data, like lists or charts, or answer questions that can hardly be answered automatically today.
Because I had some questions not addressed in the Wikidata pages that I saw, I went onto the Wikidata IRC chat (http://webchat.freenode.net/?channels=#wikimedia-wikidata) where Denny_WMDE answered some questions for me.
[11:29] hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.
[11:29] go ahead!
[11:30] When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?
[11:30] two-fold answer
[11:30] 1. there will be a discussion page, yes
[11:31] 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”
[11:31] wikidata is not about truth
[11:31] but about referenceable facts
When I asked which fact would make it into an article’s info box when the facts are contested, Denny_WMDE replied that they’re working on this, and will post a proposal for discussion.
So, on the one hand, Wikidata is further commoditizing facts: making them easier and thus less expensive to find and “consume.” Historically, this is a good thing. Literacy did this. Tables of logarithms did it. Almanacs did it. Wikipedia has commoditized a level of knowledge one up from facts. Now Wikidata is doing it for facts in a way that not only will make them easy to look up, but will enable them to serve as data in computational quests, such as finding every city with a population of at least 100,000 that has an average temperature below 60F.
On the other hand, because Wikidata is doing this commoditizing in a networked space, its facts are themselves links — “referenceable facts” are both facts that can be referenced, and simultaneously facts that come with links to their own references. This is what Too Big to Know calls “networked facts.” Those references serve at least three purposes: 1. They let us judge the reliability of the fact. 2. They give us a pointer out into the endless web of facts and references. 3. They remind us that facts are not where the human responsibility for truth ends.
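To make the idea of a "referenceable fact" concrete, here is a minimal sketch in Python. It is not Wikidata's actual data model or API: the `Claim` class, the city names, the figures, and the source labels are all hypothetical, made up for illustration. It only shows the two properties discussed above: each fact carries its own references, and facts stored this way can serve as data in a query such as "cities of at least 100,000 people with an average temperature below 60F."

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """One referenceable fact: a value plus the sources that state it."""
    subject: str
    prop: str
    value: float
    sources: List[str] = field(default_factory=list)


# Hypothetical cities, figures, and source names, for illustration only.
claims = [
    Claim("Coldtown", "population", 250_000, ["Census bureau (hypothetical)"]),
    Claim("Coldtown", "avg_temp_f", 48.0, ["Weather service (hypothetical)"]),
    Claim("Warmville", "population", 900_000, ["Census bureau (hypothetical)"]),
    Claim("Warmville", "avg_temp_f", 72.0, ["Weather service (hypothetical)"]),
    Claim("Smallburg", "population", 40_000, ["Town website (hypothetical)"]),
    Claim("Smallburg", "avg_temp_f", 51.0, []),  # no source given
]


def cities_matching(claims, min_population, max_avg_temp_f):
    """Return (city, sources) pairs whose sourced claims meet both conditions."""
    by_city = {}
    for claim in claims:
        if claim.sources:  # skip claims that cite no source at all
            by_city.setdefault(claim.subject, {})[claim.prop] = claim
    result = []
    for city, props in sorted(by_city.items()):
        pop = props.get("population")
        temp = props.get("avg_temp_f")
        if pop is None or temp is None:
            continue
        if pop.value >= min_population and temp.value < max_avg_temp_f:
            # The answer comes back with pointers to its own references.
            result.append((city, pop.sources + temp.sources))
    return result


matches = cities_matching(claims, 100_000, 60.0)
```

The design choice worth noticing is that the query returns the sources along with the answer, so the reader can still ask "does source X say this?" rather than taking the number on faith.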
, too big to know
Tagged with: 2b2k • big data
Date: March 31st, 2012 dw
Scott F. Johnson has posted a dystopic provocation about the present of digital scholarship and possibly about its future.
Here’s the crux of his argument:
… as the deluge of information increases at a very fast pace — including both the digitization of scholarly materials unavailable in digital form previously and the new production of journals and books in digital form — and as the tools that scholars use to sift, sort, and search this material are increasingly unable to keep up — either by being limited in terms of the sheer amount of data they can deal with, or in terms of becoming so complex in terms of usability that the average scholar can’t use it — then the less likely it will be that a scholar can adequately cover the research material and write a convincing scholarly narrative today.
Thus, I would argue that in the future, when the computational tools (whatever they may be) eventually develop to a point of dealing profitably with the new deluge of digital scholarship, the backward-looking view of scholarship in our current transitional period may be generally disparaging. It may be so disparaging, in fact, that the scholarship of our generation will be seen as not trustworthy, or inherently compromised in some way by comparison with what came before (pre-digital) and what will come after (sophisticatedly digital).
Scott tentatively concludes:
For the moment one solution is to read less, but better. This may seem a luddite approach to the problem, but what other choice is there?
First, I should point out that the rest of Scott’s post makes it clear that he’s no Luddite. He understands the advantages of digital scholarship. But I look at this a little differently.
I agree with most of Scott’s description of the current state of digital scholarship and with the inevitability of an ever-increasing deluge of scholarly digital material. But I think the issue is not that the filters won’t be able to keep up with the deluge. Rather, I think we’re just going to have to give up on the idea of “keeping up” — much as newspapers and half-hour news broadcasts have had to give up the pretense that they are covering all the day’s events. The idea of coverage was always an internalization of the limitation of the old media, as if a newspaper, a broadcast, or even the lifetime of a scholar could embrace everything important there is to know about a field. Now the Net has made clear to us what we knew all along: most of what knowledge wanted to do was a mere dream.
So, for me the question is what scholarship and expertise look like when they cannot attain a sense of mastery by artificially limiting the material with which they have to deal. It was much easier when you only had to read at the pace of the publishers. Now you’d have to read at the pace of the writers…and there are so many more writers! So, lacking a canon, how can there be experts? How can you be a scholar?
I’m bad at predicting the future, and I don’t know if Scott is right that we will eventually develop such powerful search and filtering tools that the current generation of scholars will look like betwixt-and-between fools (or like an “asterisk,” as Scott says). There’s an argument that even if the pace of growth slows, the pace of complexification will increase. In any case, I’d guess that deep scholars will continue to exist because that’s more a personality trait than a function of the available materials. For example, I’m currently reading Armies of Heaven, by Jay Rubenstein. The depth of his knowledge about the First Crusade is astounding. Astounding. As more of the works he consulted come online, other scholars of similar temperament will find it easier to pursue their deep scholarship. They will read less and better not as a tactic but because that’s how the world beckons to them. But the Net will also support scholars who want to read faster and do more connecting. Finally (and to me most interestingly) the Net is already helping us to address the scaling problem by facilitating the move of knowledge from books to networks. Books don’t scale. Networks do. Although, yes, that fundamentally changes the nature of knowledge and scholarship.
[Note: My initial post embedded one draft inside another and was a total mess. Ack. I've cleaned it up - Oct. 26, 2011, 4:03pm edt.]
Tagged with: 2b2k • open access
Date: October 26th, 2011 dw
I’ve come to love Reddit. What started as a better Digg (and is yet another happy outcome of the remarkable Y Combinator) has turned into a way of sharing and interrogating news. Reddit as it stands is not the future of news. It is, however, a hope for news.
As at other sites, at Reddit readers post items they find interesting. Some come from the media, but many are home-made ideas, photos, drawings, videos, etc. You can vote them up or down, resulting in a list ordered by collective interests. Each is followed by threaded conversations, and those comments are also voted up or down.
It’s not clear why Reddit works so well, but it does. The comments in particular are often fiercely insightful or funny, turning into collective, laugh-out-loud riffs. Perhaps it helps that the ethos — the norm — is that comments are short. Half-tweets. You can go on for paragraphs if you want, but you’re unlikely to be up-voted if you do. The brevity of the individual comments can give them a pithiness that paragraphs would blunt, and the rapid threading of responses can quickly puncture inflated ideas or add unexpected perspectives.
But more relevant to the future of news are the rhetorical structures that Reddit has given names to. They’re no more new than Frequently Asked Questions are, but so what? FAQs have become a major new rhetorical form, of unquestioned value, because they got a name. Likewise TIL, IAMA, and AMA are hardly startling in their novelty, but they are pretty amazing in practice.
TIL = Today I Learned. People post an answer to a question you didn’t know you had, or a fact that counters your intuition. They range from the trivial (“TIL that Gilbert Gottfried has a REAL voice.”) to the opposite of the trivial (“TIL there is a US owned Hydrogen bomb that has been missing off the coast of Georga for over 50 years.”).
IAMA = I Am A. AMA = Ask Me Anything. People offer to answer questions about whatever it is that they are. Sometimes they are famous people, but more often they are people in circumstances we’re curious about: a waiter at an upscale restaurant, a woman with something like Elephant Man’s disease, a miner, or this morning’s: “IAmA guy who just saw the final Harry Potter movie without reading/watching any Harry Potter material beforehand. Being morbidly confused, I made up an entire previous plot for the movie to make sense in my had. I will answer your HP Series question based on the made up previous plot in my head AMA.” The invitation to Ask Me Anything typically unfetters the frankest of questions. It helps that Reddit discourages trolling and amidst the geeky cynicism permits honest statements of admiration and compassion.
The topics of IAMA’s are themselves instructive. Many are jokes: “IAmA person who has finished a whole tube of chapstick without losing it. AMA” But many enable us to ask questions that would falter in the face of conventional propriety: “IAmA woman married to a man with Asperger’s Syndrome AMA”. Some open up for inquiry a perspective that we take for granted or that was too outside our normal range of consideration: “IAMA: I was a German child during WWII that was in the Hitler Youth and had my city bombed by the U.S.”
Reddit also lets readers request an IAMA. For example, someone is asking if one of Michele Bachmann’s foster kids would care to engage. Might be interesting, don’t you think?
So, my hypothesis is that IAMA and AMA are an important type of citizen journalism. Call it “community journalism.”
Now, if you’ve clicked through to any of these IAMA’s, you may be disappointed at the level of “journalism” you’ve seen. For example, look at yesterday’s “IAMA police officer who was working during the London Riots. AMA.” Many of the comments are frivolous or off-topic. Most are responses to other comments, and many threads spin out into back-and-forth riffing that can be pretty damn funny. But it’s not exactly “60 Minutes.” So what? This is one way citizen journalism looks. At its best, it asks questions we all want asked, unearths questions we didn’t know we wanted asked, asks them more forthrightly than most American journalists dare, and gets better — more honest — answers than we hear from the mainstream media.
You can also see in the London police officer’s IAMA one of the main ways Reddit constitutes itself as a community: it binds itself together by common cultural references. The more obscure, the tighter the bond. For example, during the IAMA with the police officer in the London riots, someone asks if they’ve caught the guy who knocked over the trash can. This is an unlinked reference to a posting from a few days before of a spoof video of a middle class guy looking around an empty street and then casually knocking over a garbage can. The comments devolve into some silliness about arresting a sea gull for looting. The police officer threads right in:
[police officer] I do assure you we take it very seriously, however. Here, please have a Victim of Crime pack and a crime reference number. We will look into this issue as a matter of priority, and will send you a telegram in six-to-eight-weeks.
Telegram? Are you that cop who got transported back to the 1970s?
My friends call me Murphy.
Lawl, I’m watching RoboCop right now.
This community is both Reddit’s strength as a site, and its greatest weakness as a form of citizen journalism. Reddit illustrates why there are few quotes that simultaneously delight and scare me more than “If the news is important, it will find me.” This was uttered, according to Jane Buckingham (and reported in a 2008 Brian Stelter NY Times article) by a college student in a focus group. In my view, the quote would be more accurate if it read, “If the news is interesting to my social group, it will find me.” What’s interesting to a community is not enough to make us well informed because our community’s interests tend to be parochial and self-reinforcing. This is not so much a limitation of community as a way that communities constitute themselves.
And here’s where I think Reddit offers some hope.
First, it’s important to remember that Reddit is not intending to cover the news, even though its tag line is “The front page of the Internet.” It feels no responsibility to post and upvote a story simply because it is important. Rather, Reddit is a supplement to the news. If something is sufficiently covered by the mainstream (today the stock market went up dramatically, today the Supreme Court decided something), that is exactly what will not be covered as news at Reddit. Reddit is for what didn’t make it into the mainstream news. So, Reddit does not answer the question: How will we get news when the mainstream dries up?
But it does make manifest a phenomenon that should take some of the gloom off our outlook. Take Reddit as a type of internet tabloid. Mainstream tabloids are sensationalistic: They indulge and enflame what are properly thought of as lower urges. But Reddit feeds and stimulates a curiosity about the world. It turns out that a miner — or a person who works at Subway — has a lot to tell us. It turns out that a steely British cop has a sense of humor. It turns out that American planes dropping bombs on a German city did not fly with halos over them. True, there’s a flood of trivial curios and tidbits at Reddit. Nevertheless, from mainstream tabloids you learn that humans are a weak and corrupt species that revels in the misfortunes of others. From Reddit you learn that we are creatures with a wild curiosity, indiscriminate in its fascinations. And you learn that we are a social species that takes little seriously and enjoys the multiplicity of refractions.
But is the curiosity exhibited at Reddit enough? I find this question rocks back and forth. The Reddit community constitutes itself through a set of references that belong to a particular group and that exclude those who just don’t get nods to Robocop. Yet it is a community that reaches for what is beyond its borders. Not far enough, sure. But it’s never far enough. Reddit’s interests are generally headed in the right direction: outward. Those interests often embrace more than what the mainstream has found room for. Still, the interests of any group are always going to reflect that group’s standpoint and self-filters. Reddit’s curiosity is unsystematic, opportunistic, and indiscriminate. You will not find all the news you need there. That’s why I say Reddit offers not a solution to the impending News Hole, but a hope. The hope is that while communities are based on shared interests and thus are at least somewhat insular, some communities can generate an outward-bound curiosity that delights in the unabashed exploration of what we have taken for granted and in the discovery of that which is outside its same-old boundaries.
But then there is the inevitable triviality of Reddit. Reddit topics, no matter how serious, engender long arcs of wisecracks and silliness. But this too tells us something, this time about the nature of curiosity. One of the mistakes we’ve made in journalism and education is to insist that curiosity is a serious business. Perhaps not. Perhaps curiosity needs a sense of humor.
I’m at an education conference put on by CET in Tel Aviv. This is the second day of the conference. The opening session is on business models for supporting the webification of the educational system.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Eli Hurvitz (former deputy director of the Rothschild Foundation, the funder of CET) is the moderator. The speakers are Michael Jon Jensen (Dir of Strategic Web Communications, National Academies Press), Eric Frank (co-founder of Flat World Knowledge) and Sheizaf Rafaelli (Dir. of the Sagy Center for Internet Research at Haifa Univ.)
Michael Jensen says he began with computers in 1980, thinking that books would be online within 5 yrs. He spent three years at Project Muse (1995-8), but left because they were spending half their money on keeping people away from their content. He went to the National Academies Press (part of the National Academy of Sciences). The National Academies does about 200 reports a year, the result of studies by about 20 experts focused on some question. While there are many wonderful things about crowd-sourcing, he says, “I’m in favor of expertise. Facts and opinions on the Web are cheap…but expertise, expert perspective and sound analysis are costly.” E.g., that humans are responsible for climate change is not in doubt, should not be presented as if it were in doubt, and should not be crowd-sourced, he says.
The National Academy has 4,800 books online, all available to be read online for free. (This includes an algorithmic skimmer that extracts the most important two-sentence chunk from every page.) [Now that should be crowd-sourced!] Since 2005, 65% are free for download in PDF. They get 1.4M visitors/month, each reading 7 pages on average. But only 0.2% buy anything.
The National Academy Press’ goal is access and sustainability. In 2001, they did an experiment: When people were buying a book, they were offered a download of a PDF for 80% of the price, then 60%, then 40%, then for free. 42% took the free PDF. But it would have been too expensive to make all PDF’s free. The 65% that are now free PDFs are the “long tail” of books. “We are going to be in transition for the next 20 yrs.” Book sales have gone from 450,000/yr in 2002 to 175,000 in 2010. But, as they have given away more, they are disseminating about 850,000 units per year. “That means we’re fulfilling our publishing mission.” 260,000 people have opted in for getting notified of new books.
Michael goes through the available business options. NAP’s offerings are too broad for subscriptions. They will continue selling products. Authors fund some of the dissemination. And booksellers provide some revenue. There are different models for long-form content vs. articles vs. news vs. databases. Further, NAP has to provide multiple and new forms of content.
General lessons: Understand your mission. Make sure your strategy supports your mission. But digital strategies are a series of tactics. Design for the future. And “The highest resolution is never enough…Never dumb down.” “The print-based mindset will work for the next few years, but is a long-term dead end.” “‘Free’ of some kind is required.” Understand your readers, and develop relationships with them. Go where the audiences are. “Continue experimenting.” There is no single best model. “We are living in content hyperabundance, and must compete with everything else in the world.”
Eric Frank of Flat World Knowledge (“the largest commercial publisher of” open source textbooks) says that old business models are holding us back from achieving what’s possible with the Net. He points to a “value gap” in the marketplace. Many college textbooks are $200. The pain is not evenly distributed. Half of college students are in 2 yr colleges, where the cost of textbooks can be close to their tuition costs. The Net is disrupting the text book market already, e.g., through the online sale of used books, or text book rental models, or “piracy.” So, publishers are selling fewer units per year, and are raising prices to protect their revenues. There’s a “vicious downward spiral,” making everyone more and more unhappy.
Flat World Knowledge has two business models. First, it puts textbooks through an editorial process, and publishes them under open licenses. They vet their authors, and peer review the books. They publish their books under a Creative Commons license (attribution, non-commercial, share-alike); they retain the copyright, but allow users to reuse, revise, remix, and redistribute them. They provide a customization platform that looks quite slick: re-order the table of contents, add content, edit the content. It then generates multiple formats, including html, pdf, ePub, .mobi, digital Braille, .mp3. Students can choose the format that works best for them. The Web-based versions and the versions for students with disabilities are free. They sell softcover books ($35 for b&w, $70 for color) and the other formats. They also sell study guides, online quizzes, and flashcards. 44% read for free online. 66% purchase something: 33% print, 3% audiobooks, 17% print it yourself, 3% ebooks.
Second business model: They license all of their intellectual property to an institution that buys a site license at $20/student, who then get access to the material in every format. Paper publishers’ unit sales tend to zero out over just a few semesters as students turn to other ways of getting the book. Flat World Knowledge’s unit sales tend to be steady. They pay authors 20% royalty (as opposed to a standard 13%), which results in higher cumulative revenues for the authors.
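The claim about cumulative author revenue is easier to see with numbers, so here is a rough arithmetic sketch in Python. Only the two royalty rates (20% and 13%) come from the talk. The prices, the unit sales per semester, and the helper name `cumulative_royalties` are assumptions invented for illustration, and the outcome depends entirely on them.

```python
from itertools import accumulate


def cumulative_royalties(units_per_semester, price_cents, royalty_percent):
    """Running total of author royalties, in cents, semester by semester."""
    per_semester = [
        units * price_cents * royalty_percent // 100
        for units in units_per_semester
    ]
    return list(accumulate(per_semester))


# Hypothetical traditional textbook: $150 list price, 13% royalty,
# unit sales that fall to zero within a few semesters.
traditional = cumulative_royalties([1000, 400, 100, 0, 0, 0], 15_000, 13)

# Hypothetical open textbook: $30 average spend per paying student,
# 20% royalty, steady unit sales.
open_model = cumulative_royalties([1000] * 6, 3_000, 20)

# First semester (1-based) in which the open model's running total is ahead,
# or None if that never happens within the period modeled.
crossover = next(
    (i + 1 for i, (t, o) in enumerate(zip(traditional, open_model)) if o > t),
    None,
)
```

Under these made-up figures the traditional book earns more at first, and the steady seller overtakes it only in a later semester. With a lower steady sales rate or a shorter horizon it might never catch up, which is why the assumptions matter more than the royalty percentage alone.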
They currently have 112 authors (they launched in 2007 and published their first book in Spring 2009). 36 titles published; 42 in pipeline. Their costs are about a third of the industry and declining. Their time to market is about half of the traditionals (18 months vs. 40 months). 1,600 faculty have formally adopted their books, in 44 countries. Sales are growing at 320%. Their conversion rate of free to paid is currently at 61% and growing. They’ve raised $30M in venture capital. Bertelsmann has put in $15M. Random House today invested.
He ends by citing Kevin Kelly: The Net is a giant copy machine. When copies are super-abundant, they become worthless. So, you need to sell stuff that can’t be copied. Kevin lists 8 things that can’t be copied: immediacy, personalization, interpretation (study aids), authenticity (what the prof wants you to read), accessibility, embodiment (print copy), patronage (people want to pay creators), findability. Future for FWK: p2p tutoring, user-generated marketplace, self-assessment embedded within the books, data sales. “Knowledge is the black gold of the 21st century.”
[Sheizaf Rafaelli's talk was excellent — primarily about what happens when books lose bindings — but he spoke very quickly, and the talk itself did not lend itself to livebloggery, in part because I was hearing it in translation, which required more listening and less typing. Sorry. His slides are here. ]
I got to attend the Digital Public Library of America’s first workshop yesterday. It was an amazing experience that left me with the best kind of headache: Too much to think about! Too many possibilities for goodness!
Mainly because the Chatham House Rule was in effect, I tweeted instead of live-blogged; it’s hard to do a transcript-style live-blog when you’re not allowed to attribute words to people. (The tweet stream was quite lively.) Fortunately, John Palfrey, the head of the steering committee, did some high-value live-blogging, which you can find here: 1 2 3 4.
The DPLA is more of an intention than a plan. The DPLA is important because the intention is for something fundamentally liberating, the people involved have been thinking about and working on related projects for years, and the institutions carry a great deal of weight. So, if something is going to happen that requires widespread institutional support, this is the group with the best chance. The year of workshops that began yesterday aims at helping to figure out how the intention could become something real.
So, what is the intention? Something like: To bring the benefits of public libraries to every American. And there is, of course, no consensus even about a statement that broad. For example, the session opened with a discussion of public versus research libraries (with the “versus” thrown into immediate question). And, Terry Fisher at the very end of the day suggested that the DPLA ought to stand for a principle: Knowledge should be free and universally accessible. Throughout the course of the day, many other visions and pragmatic possibilities were raised by the sixty attendees. [Note: I've just violated the Chatham Rule by naming Terry, but I'm trusting he won't mind. Also, I very likely got his principle wrong. It's what I do.]
I came out of it invigorated and depressed at the same time. Invigorated: An amazing set of people, very significant national institutions ready to pitch in, an alignment on the value of access to the works of knowledge and culture. Depressed: The !@#$%-ing copyright laws are so draconian and, well, stupid, that it is hard to see how to take advantage of the new ways of connecting to ideas and to one another. As one well-known Internet archivist said, we know how to make works of the 19th and 21st centuries accessible, but the 20th century is pretty much lost: Anything created after 1923 will be in copyright about as long as there’s a Sun to read by, and the gigantic mass of works that are out of print, but the authors are dead or otherwise unreachable, is locked away as firmly as an employee restroom at a Disney theme park.
So, here are some of the issues we discussed yesterday that came home with me. Fortunately, most are not intractable, but all are difficult to resolve and some are difficult to implement:
Should the DPLA aggregate content or be a directory? Much of the discussion yesterday focused on the DPLA as an aggregation of e-works. Maybe. But maybe it should be more of a directory. That’s the approach taken by the European online library, Europeana. But being a directory is not as glamorous or useful. And it doesn’t use the combined heft of the participating institutions to drive more favorable licensing terms or legislative changes since it itself is not doing any licensing.
Who is the user? How generic? Does the DPLA have to provide excellent tools for scholars and researchers, too? (See the next question.)
Site or ecology? At one extreme, the DPLA could be nothing but a site where you find e-content. At the other extreme, it wouldn’t even have a site but would be an API-based development platform so that others can build sites that are tuned to specific uses and users. I think the room agrees that it has to do both, although people care differently about the functions. It will have to provide a convenient way for users to find ebooks, but I hope that it will have an incredibly robust and detailed API so that someone who wants to build a community-based browse-and-talk environment for scholars of the Late 19th Century French Crueller can. And if I personally had to decide between the DPLA being a site or metadata + protocols + APIs, I’d go with the righthand disjunct in a flash.
Should the DPLA aim at legislative changes? My sense of the room is that while everyone would like to see copyright heavily amended, DPLA needs to have a strategy for launching while working within existing law.
Should the DPLA only provide access to materials users can access for free? That meets much of what we expect from public libraries (although many local libraries do charge a little for DVDs), but it fails Terry Fisher’s principle. (I don’t mean to imply that everyone there agreed with Terry, btw.)
What should the DPLA do to launch quickly and well? The sense of the room was that it’s important that DPLA not get stuck in committee for years, but should launch something quickly. Unfortunately, the easiest stuff to launch with are public domain works, many of which are already widely available. There were some suggestions for other sources of public domain works, such as government documents. But, then the DPLA would look like a specialty library, instead of the first place people turn to when they want an e-book or other such content.
How to pay for it? There was little talk of business models yesterday, but it was a short day for a big topic. There were occasional suggestions, such as just outright buying e-books (rather than licensing them), in part to meet the library’s traditional role of preserving works as well as providing access to them.
How important is expert curation? There seemed to be a genuine divide — pretty much undiscussed, possibly because it’s a divisive topic — about the value of curation. A few people suggested quite firmly that expert curation is a core value provided by libraries: you go to the library because you know you can trust what is in it. I personally don’t see that scaling, think there are other ways of meeting the same need, and worry that the promise is itself illusory. This could turn out to be a killer issue. Who determines what gets into the DPLA (if the concept of there being an inside to the DPLA even turns out to make sense)?
Is the environment stable enough to build a DPLA? Much of the conversation during the workshop assumed that book and journal publishers are going to continue as the mediating centers of the knowledge industry. But, as with music publishers, much of the value of publishers has left the building and now lives on the Net. So, the DPLA may be structuring itself around a model that is just waiting to be disrupted. Which brings me to the final question I left wondering about:
How disruptive should the DPLA be? No one’s suggesting that the DPLA be a rootin’ tootin’ bay of pirates, ripping works out of the hands of copyright holders and setting them free, all while singing ribald sea shanties. But how disruptive can it be? On the one hand, the DPLA could be a portal to e-works that are safely out of copyright or licensed. That would be useful. But, if the DPLA were to take Terry’s principle as its mission — knowledge ought to be free and universally accessible — the DPLA would worry less about whether it’s doing online what libraries do offline, and would instead start from scratch asking: Given the astounding set of people and institutions assembled around this opportunity, what can we do together to make knowledge as free and universally accessible as possible? Maybe a library is not the best transformative model.
Of course, given the greed-based, anti-knowledge, culture-killing copyright laws, the fact may be that the DPLA simply cannot be very disruptive. Which brings me right back to my depression. And yet, exhilaration.
The DPLA wiki is here.
The deadline for my book is looming, but I spoke today with Michael Edson, Director of Web and New Media Strategy at the Smithsonian, and I’d love to include his idea for a Smithsonian Commons.
The Smithsonian Commons would make publicly available digital content and information drawn from the magnificent Smithsonian collections, allowing visitors to interact with it, repost it, add to it, and mash it up. It begins with being able to find everything about, say, Theodore Roosevelt that is currently dispersed across multiple collections and museums: photos, books, the original Teddy bear, recordings of the TR campaign song, a commemorative medal, a car named after him, contemporary paintings of his exploits, the chaps he wore on his ranch…But Michael is actually most enthusiastic about the “network effects” that can accrue to knowledge when you let lots of people add what they know, either on the Commons site itself or out across the whole linked Internet.
Smithsonian Commons goes way beyond putting online as much of our national museum as possible, which should be enough to justify its creation. It goes beyond bringing to bear everything curators, experts, and passionate visitors know to increase our understanding of what is there. By allowing us to discover connections, link in and out, and add ideas and knowledge, what used to be a “mere” collection will be an embedded part of countless webs of knowledge that in turn add value to one another. That is to say, we will be able to take up the objects of our heritage in ways that will make them more distinctly and uniquely ours than ever before.
Let’s hope Smithsonian Commons goes from idea to a national, indeed global, center of ideas, creativity, knowledge, and learning.
Paul Gillin blogs about CIThread (while disclosing that he is advising them):
The curator starts by presenting the engine with a basic set of keywords. CIThread scours the Web for relevant content, much like a search engine does. Then the curator combs through the results to make decisions about what to publish, what to promote and what to throw away.
As those decisions are made, the engine analyzes the content to identify patterns. It then applies that learning to delivering a better quality of source content. Connections to popular content management systems make it possible to automatically publish content to a website and even syndicate it to Twitter and Facebook without leaving the CIThread dashboard.
There’s intelligence on the front end, too. CIThread can also tie in to Web analytics engines to fold audience behavior into its decision-making. For example, it can analyze content that generates a lot of views or clicks and deliver more source material just like it to the curator. All of these factors can be weighted and varied via a dashboard.
I like the idea of providing automated assistance to human curators…
Eszter Hargittai and her team have done research that shows that digital youngsters are not as savvy as we would like them to be, over-relying on Google’s rank ordering of results, etc.
It’s important to have actual data to look at — thanks, Eszter! — even though it confirms what we should all probably know by now: When it comes to information, we’re a lazy, sloppy species that vastly over-estimates its own wisdom.
Tagged with: 2b2k
• digital youth
Date: July 28th, 2010 dw
JP Rangaswami has an excellent post about the democratizing of curation.
He begins by quoting Eric Schmidt (found at 19:48 in this video):
“… the statistic that we have been using is between the dawn of civilisation and 2003, five exabytes of information were created. In the last two days, five exabytes of information have been created, and that rate is accelerating. And virtually all of that is what we call user-generated what-have-you. So this is a very, very big new phenomenon.”
He concludes, and I certainly agree, that we need digital curation. He says that digital curation consists of “Authenticity, Veracity, Access, Relevance, Consume-ability, and Produce-ability.” “Consume-ability” means, roughly, that you can play it on any device you want, and “produce-ability” means something like how easy it is to hack it (in the good O’Reilly sense).
JP seems to be thinking primarily of knowledge objects, since authenticity and veracity are high on his list of needs, and for that I think it’s a good list. But suppose we were to think about this not in terms of curation, which implies (against JP’s meaning, I think) a binary acceptance-rejection that builds a persistent collection, and instead view it as digital recommendations? In that case, for non-knowledge-objects, other terms will come to the fore, including amusement value, re-playability, and wiseacre-itude. In fact, people recommend things for every reason we humans may like something, not to mention the way we’re socially defined in part by what we recommend. (You are what you recommend.)
Anyway, JP is always a thought-provoking writer…
Beth Noveck is deputy chief technology officer for open government and leads President Obama’s Open Government Initiative. She is giving a talk at Harvard. She begins by pointing to the citizenry’s lack of faith in government. Without participation, citizens become increasingly alienated, she says. For example: the rise of Tea Parties. A new study says that a civic spirit reduces crime. Another article, in Social Science and Medicine, correlates civic structures and health. She wants to create more opportunities for citizens to engage and for government to engage in civic structures — a “DoSomething.gov,” as she lightly calls it. [NOTE: Liveblogging. Getting things wrong. Missing things. Substituting inelegant partial phrases for Beth's well-formed complete sentences. This is not a reliable report.]
Beth points to the peer to patent project she initiated before she joined the government. It enlists volunteer scientists and engineers to research patent applications, to help a system that is seriously backlogged, and that uses examiners who are not necessarily expert in the areas they’re examining. This crowd-sources patent applications. The Patent Office is studying how to adopt peer to patent. Beth wants to see more of this, to connect scientists and others to the people who make policy decisions. How do we adapt peer to patent more broadly, she asks. How do we do this in a culture that prizes consistency of procedures?
This is not about increasing direct democracy or deliberative democracy, she says. The admin hasn’t used more polls, etc., because the admin is trying to focus on action, not talk. The aim is to figure out ways to increase collaborative work. Next week there’s a White House conf on gov’t innovation, focusing on open grant making and prize-based innovation.
The President’s first executive action was to issue a memorandum on transparency and open gov’t. This was very important, Beth says, because it let the open gov folks in the administration say, “The President says…” President Obama is very committed to this agenda, she says; after all, he is a community organizer in his roots. Simple things like setting up a blog with comments were big steps. It’s about changing the culture. Now, there’s a culture of “leaning forward,” i.e., making commitments to being innovative about how they work. In Dec., every agency was told to come up with its own open govt plan. A directive set a road map: How and when are you going to inventory all the data in your agency and put it online in raw, machine-readable form? How are you going to engage people in meaningful policy work? How are you going to engage in collaboration within govt and with citizens? On Tuesday, the White House collected self-evaluations, which are then evaluated by Beth’s office and by citizen groups.
How to get there. First, through people. Every agency has someone responsible for open govt. The DoT has 200+ on their open govt committee. Second, through platforms (which, as she says, is Tim O’Reilly’s mantra). E.g., data.gov is a platform.
Transparency is going well, she thinks: White House visitor logs, streaming the health care summit, publishing White House employee salaries. More important is data.gov. 64M hits in under a year. Pew says 40% of respondents have been there. 89M hits on the IT dashboard that puts a user-friendlier interface to govt spending. Agencies are required to put up “high value” data that helps them achieve their core mission. E.g., Dept. of Labor has released 15 yrs of data about workplace exposure to toxic chemicals, advancing its goal of saving workers’ lives. Medicare data helps us understand health care. USDA nutrition data + a campaign to create video games to change the eating habits of the young. Agencies are supposed to ask the public which data they want to see first, in part as a way of spurring participation.
To spur participation, the GSA now has been procuring govt-friendly terms of service for social media platforms; they’re available at apps.gov. It’s now trying to acquire innovation prize platforms, etc.
Participation and collaboration are different things, she says. Participation is a known term that has to do with citizens talking with govt. But the exciting new frontier, she says, is about putting problems out to the public for collaborative solving. E.g., Veterans Benefits Admin asked its 19,000 employees how to shorten wait times; within the first week of a brainstorming competition, 7,000 employees signed up and generated 3,000 ideas, the top ten of which are being implemented. E.g., the Army wikified the Army operations manual.
It’s also about connecting the public and private. E.g., the National Archives is making the Federal Register available for free (instead of for $17K/yr), and the Princeton Internet center has made an annotatable version. Carl Malamud also. The private sector has announced National Lab Day, to get scientists out into the schools. Two million people signed up.
She says they know they have a lot to do. E.g., agencies are sitting on exabytes of info, some of which is on paper. Expert networking: We have got to learn how to improve upon the model of federal advisory commissions, the same group of 20 people. It’s not as effective as a peer to patent model, volunteers pooled from millions of people. And we don’t have much experience using collaboration tools in govt. There is a recognition spreading throughout the govt that we are not the only experts, that there are networks of experts across the country and outside of govt. But ultimately, she says, this is about restoring trust in govt.
Q: Any strategies for developing tools for collaborative development of policy?
A: Brainstorming techniques have been taken up quickly. Thirty agencies are involved in thinking about this. It’s not about the tools, but thinking about the practices. On the other hand, we used this tool with the public to develop open govt plans, but it wasn’t promoted enough; it’s not the tools but the processes. Beth’s office acts as an internal consultancy, but people are learning from one another. This started with the President making a statement, modeling it in the White House, making the tools available…It’s a process of creating a culture and then the vehicles for sharing.
Q: Who winnowed the Veterans agency’s 3,000 suggestions?
A: The VA ideas were generated in local offices and got passed up. In more open processes, they require registration. They’ve used public thumbs up and down, with a flag for “off topic” that would shrink the posting just to one link; the White House lawyers decided that that was acceptable so long as the public was doing the rating. So the UFO and “birther” comments got rated down. They used a wiki tool (MixedInk) so the public could write policy drafts; that wiki let users vote on changes. When there are projects with millions of responses, it will be very hard; it makes more sense to proliferate opportunities for smaller levels of participation.
A: We’re crowd-sourcing expertise. In peer to patent, we’re not asking people if they like the patent or think it should be patented; we’re asking if they have info that is relevant. We are looking for factual info, recognizing that even that info is value-laden. We’re not asking about what people feel, at least initially. It’s not about fostering contentious debate, but about informed conversation.
Q: What do you learn from countries that are ahead of the curve on e-democ, e.g., Estonia? Estonia learned 8 yrs ago that you have to ask people to register in online conversations…
A: Great point. We’re now getting up from our desks for the first time. We’re meeting with the Dutch, Norway, Estonia, etc. And a lot of what we do is based on Al Gore’s reinventing govt work. There’s a movement spreading particularly on transparency and data.gov.
Q: Is transparency always a good approach? Are there fields where you want to keep the public out so you can talk without being criticized?
A: Yes. We have to be careful of personal privacy and national security. Data sets are reviewed for both before they go up on data.gov. I’d rather err on the side of transparency and openness to get us over the hump of sharing what they should be sharing. There’s value in closed-door brainstorming so you can float dumb ideas. We’re trying to foster a culture of experimentation and fearlessness.
[I think it's incredible that we have people like Beth in the White House working on open government. Amazing.]