I’m at an education conference put on by CET in Tel Aviv. This is the second day of the conference. The opening session is on business models for supporting the webification of the educational system.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Eli Hurvitz (former deputy director of the Rothschild Foundation, the funder of CET) is the moderator. The speakers are Michael Jon Jensen (Dir of Strategic Web Communications, National Academies Press), Eric Frank (co-founder of Flat World Knowledge) and Sheizaf Rafaelli (Dir. of the Sagy Center for Internet Research at Haifa Univ.)
Michael Jensen says he began with computers in 1980, thinking that books would be online within 5 yrs. He spent three years at Project Muse (1995-8), but left because they were spending half their money on keeping people away from their content. He went to the National Academies Press (part of the National Academy of Science). The National Academies does about 200 reports a year, the result of studies by about 20 experts focused on some question. While there are many wonderful things about crowd-sourcing, he says, “I’m in favor of expertise. Facts and opinions on the Web are cheap…but expertise, expert perspective and sound analysis are costly.” E.g., that humans are responsible for climate change is not in doubt, should not be presented as if it were in doubt, and should not be crowd-sourced, he says.
The National Academy has 4,800 books online, all available to be read online for free. (This includes an algorithmic skimmer that extracts the most important two-sentence chunk from every page.) [Now that should be crowd-sourced!] Since 2005, 65% are free for download in PDF. They get 1.4M visitors/month, each reading 7 pages on average. But only 0.2% buy anything.
The National Academy Press’ goal is access and sustainability. In 2001, they did an experiment: When people were buying a book, they were offered a download of a PDF for 80% of the price, then 60%, then 40%, then for free. 42% took the free PDF. But it would have been too expensive to make all PDFs free. The 65% that are now free PDFs are the “long tail” of books. “We are going to be in transition for the next 20 yrs.” Book sales have gone from 450,000/yr in 2002 to 175,000 in 2010. But, as they have given away more, they are disseminating about 850,000 units per year. “That means we’re fulfilling our publishing mission.” 260,000 people have opted in for getting notified of new books.
Michael goes through the available business options. NAP’s offerings are too broad for subscriptions. They will continue selling products. Authors fund some of the dissemination. And booksellers provide some revenue. There are different models for long-form content vs. articles vs. news vs. databases. Further, NAP has to provide multiple and new forms of content.
General lessons: Understand your mission. Make sure your strategy supports your mission. But digital strategies are a series of tactics. Design for the future. And “The highest resolution is never enough…Never dumb down.” “The print-based mindset will work for the next few years, but is a long-term dead end.” “‘Free’ of some kind is required.” Understand your readers, and develop relationships with them. Go where the audiences are. “Continue experimenting.” There is no single best model. “We are living in content hyperabundance, and must compete with everything else in the world.”
Eric Frank of Flat World Knowledge (“the largest commercial publisher of” open source textbooks) says that old business models are holding us back from achieving what’s possible with the Net. He points to a “value gap” in the marketplace. Many college textbooks are $200. The pain is not evenly distributed. Half of college students are in 2 yr colleges, where the cost of textbooks can be close to their tuition costs. The Net is disrupting the text book market already, e.g., through the online sale of used books, or text book rental models, or “piracy.” So, publishers are selling fewer units per year, and are raising prices to protect their revenues. There’s a “vicious downward spiral,” making everyone more and more unhappy.
Flat World Knowledge has two business models. First, it puts textbooks through an editorial process, and publishes them under open licenses. They vet their authors, and peer review the books. They publish their books under a Creative Commons license (attribution, non-commercial, share-alike); they retain the copyright, but allow users to reuse, revise, remix, and redistribute them. They provide a customization platform that looks quite slick: re-order the table of contents, add content, edit the content. It then generates multiple formats, including html, pdf, ePub, .mobi, digital Braille, .mp3. Students can choose the format that works best for them. The Web-based and versions for students with disabilities are free. They sell softcover books ($35 for b&w, $70 for color) and the other formats. They also sell study guides, online quizzes, and flashcards. 44% read for free online. 66% purchase something: 33% print, 3% audiobooks, 17% print it yourself, 3% ebooks.
Second business model: They license all of their intellectual property to an institution that buys a site license at $20/student, who then get access to the material in every format. Paper publishers’ unit sales tend to zero out over just a few semesters as students turn to other ways of getting the book. Flat World Knowledge’s unit sales tend to be steady. They pay authors 20% royalty (as opposed to a standard 13%), which results in higher cumulative revenues for the authors.
They currently have 112 authors (they launched in 2007 and published their first book in Spring 2009). 36 titles published; 42 in pipeline. Their costs are about a third of the industry and declining. Their time to market is about half of the traditionals (18 months vs. 40 months). 1,600 faculty have formally adopted their books, in 44 countries. Sales are growing at 320%. Their conversion rate of free to paid is currently at 61% and growing. They’ve raised $30M in venture capital. Bertelsmann has put in $15M. Random House today invested.
He ends by citing Kevin Kelly: The Net is a giant copy machine. When copies are super-abundant, they are worthless. So, you need to sell stuff that can’t be copied. Kevin lists 8 things that can’t be copied: immediacy, personalization, interpretation (study aids), authenticity (what the prof wants you to read), accessibility, embodiment (print copy), patronage (people want to pay creators), findability. Future for FWK: p2p tutoring, user-generated marketplace, self-assessment embedded within the books, data sales. “Knowledge is the black gold of the 21st century.”
[Sheizaf Rafaelli's talk was excellent — primarily about what happens when books lose bindings — but he spoke very quickly, and the talk itself did not lend itself to livebloggery, in part because I was hearing it in translation, which required more listening and less typing. Sorry. His slides are here. ]
I got to attend the Digital Public Library of America‘s first workshop yesterday. It was an amazing experience that left me with the best kind of headache: Too much to think about! Too many possibilities for goodness!
Mainly because the Chatham House Rule was in effect, I tweeted instead of live-blogged; it’s hard to do a transcript-style live-blog when you’re not allowed to attribute words to people. (The tweet stream was quite lively.) Fortunately, John Palfrey, the head of the steering committee, did some high-value live-blogging, which you can find here: 1234.
The DPLA is more of an intention than a plan. The DPLA is important because the intention is for something fundamentally liberating, the people involved have been thinking about and working on related projects for years, and the institutions carry a great deal of weight. So, if something is going to happen that requires widespread institutional support, this is the group with the best chance. The year of workshops that began yesterday aims at helping to figure out how the intention could become something real.
So, what is the intention? Something like: To bring the benefits of public libraries to every American. And there is, of course, no consensus even about a statement that broad. For example, the session opened with a discussion of public versus research libraries (with the “versus” thrown into immediate question). And, Terry Fisher at the very end of the day suggested that the DPLA ought to stand for a principle: Knowledge should be free and universally accessible. Throughout the course of the day, many other visions and pragmatic possibilities were raised by the sixty attendees. [Note: I've just violated the Chatham Rule by naming Terry, but I'm trusting he won't mind. Also, I very likely got his principle wrong. It's what I do.]
I came out of it invigorated and depressed at the same time. Invigorated: An amazing set of people, very significant national institutions ready to pitch in, an alignment on the value of access to the works of knowledge and culture. Depressed: The !@#$%-ing copyright laws are so draconian and, well, stupid, that it is hard to see how to take advantage of the new ways of connecting to ideas and to one another. As one well-known Internet archivist said, we know how to make works of the 19th and 21st centuries accessible, but the 20th century is pretty much lost: Anything created after 1923 will be in copyright about as long as there’s a Sun to read by, and the gigantic mass of works that are out of print, but the authors are dead or otherwise unreachable, is locked away as firmly as an employee restroom at a Disney theme park.
So, here are some of the issues we discussed yesterday that I found came home with me. Fortunately, most are not intractable, but all are difficult to resolve and, some, to implement:
Should the DPLA aggregate content or be a directory? Much of the discussion yesterday focused on the DPLA as an aggregation of e-works. Maybe. But maybe it should be more of a directory. That’s the approach taken by the European online library, Europeana. But being a directory is not as glamorous or useful. And it doesn’t use the combined heft of the participating institutions to drive more favorable licensing terms or legislative changes since it itself is not doing any licensing.
Who is the user? How generic? Does the DPLA have to provide excellent tools for scholars and researchers, too? (See the next question.)
Site or ecology? At one extreme, the DPLA could be nothing but a site where you find e-content. At the other extreme, it wouldn’t even have a site but would be an API-based development platform so that others can build sites that are tuned to specific uses and users. I think the room agrees that it has to do both, although people care differently about the functions. It will have to provide a convenient way for users to find ebooks, but I hope that it will have an incredibly robust and detailed API so that someone who wants to build a community-based browse-and-talk environment for scholars of the Late 19th Century French Crueller can. And if I personally had to decide between the DPLA being a site or metadata + protocols + APIs, I’d go with the righthand disjunct in a flash.
Should the DPLA aim at legislative changes? My sense of the room is that while everyone would like to see copyright heavily amended, DPLA needs to have a strategy for launching while working within existing law.
Should the DPLA only provide access to materials users can access for free? That meets much of what we expect from public libraries (although many local libraries do charge a little for DVDs), but it fails Terry Fisher’s principle. (I don’t mean to imply that everyone there agreed with Terry, btw.)
What should the DPLA do to launch quickly and well? The sense of the room was that it’s important that DPLA not get stuck in committee for years, but should launch something quickly. Unfortunately, the easiest stuff to launch with is public domain works, many of which are already widely available. There were some suggestions for other sources of public domain works, such as government documents. But, then the DPLA would look like a specialty library, instead of the first place people turn to when they want an e-book or other such content.
How to pay for it? There was little talk of business models yesterday, but it was a short day for a big topic. There were occasional suggestions, such as just outright buying e-books (rather than licensing them), in part to meet the library’s traditional role of preserving works as well as providing access to them.
How important is expert curation? There seemed to be a genuine divide — pretty much undiscussed, possibly because it’s a divisive topic — about the value of curation. A few people suggested quite firmly that expert curation is a core value provided by libraries: you go to the library because you know you can trust what is in it. I personally don’t see that scaling, think there are other ways of meeting the same need, and worry that the promise is itself illusory. This could turn out to be a killer issue. Who determines what gets into the DPLA (if the concept of there being an inside to the DPLA even turns out to make sense)?
Is the environment stable enough to build a DPLA? Much of the conversation during the workshop assumed that book and journal publishers are going to continue as the mediating centers of the knowledge industry. But, as with music publishers, much of the value of publishers has left the building and now lives on the Net. So, the DPLA may be structuring itself around a model that is just waiting to be disrupted. Which brings me to the final question I left wondering about:
How disruptive should the DPLA be? No one’s suggesting that the DPLA be a rootin’ tootin’ bay of pirates, ripping works out of the hands of copyright holders and setting them free, all while singing ribald sea shanties. But how disruptive can it be? On the one hand, the DPLA could be a portal to e-works that are safely out of copyright or licensed. That would be useful. But, if the DPLA were to take Terry’s principle as its mission — knowledge ought to be free and universally accessible — the DPLA would worry less about whether it’s doing online what libraries do offline, and would instead start from scratch asking: Given the astounding set of people and institutions assembled around this opportunity, what can we do together to make knowledge as free and universally accessible as possible? Maybe a library is not the best transformative model.
Of course, given the greed-based, anti-knowledge, culture-killing copyright laws, the fact may be that the DPLA simply cannot be very disruptive. Which brings me right back to my depression. And yet, exhilaration.
The deadline for my book is looming, but I spoke today with Michael Edson, Director of Web and New Media Strategy at the Smithsonian, and I’d love to include his idea for a Smithsonian Commons.
The Smithsonian Commons would make publicly available digital content and information drawn from the magnificent Smithsonian collections, allowing visitors to interact with it, repost it, add to it, and mash it up. It begins with being able to find everything about, say, Theodore Roosevelt, that is currently dispersed across multiple collections and museums: photos, books, the original Teddy bear, recordings of the TR campaign song, a commemorative medal, a car named after him, contemporary paintings of his exploits, the chaps he wore on his ranch…But Michael is actually most enthusiastic about the “network effects” that can accrue to knowledge when you let lots of people add what they know, either on the Commons site itself or out across the whole linked Internet.
Smithsonian Commons goes way beyond putting online as much of our national museum as possible, which should be enough to justify its creation. It goes beyond bringing to bear everything curators, experts, and passionate visitors know to increase our understanding of what is there. By allowing us to discover connections, link in and out, and add ideas and knowledge, what used to be a “mere” collection will be an embedded part of countless webs of knowledge that in turn add value to one another. That is to say, we will be able to take up the objects of our heritage in ways that will make them more distinctly and uniquely ours than ever before.
Let’s hope Smithsonian Commons goes from idea to a national, indeed global, center of ideas, creativity, knowledge, and learning.
The curator starts by presenting the engine with a basic set of keywords. CIThread scours the Web for relevant content, much like a search engine does. Then the curator combs through the results to make decisions about what to publish, what to promote and what to throw away.
As those decisions are made, the engine analyzes the content to identify patterns. It then applies that learning to delivering a better quality of source content. Connections to popular content management systems make it possible to automatically publish content to a website and even syndicate it to Twitter and Facebook without leaving the CIThread dashboard.
There’s intelligence on the front end, too. CIThread can also tie in to Web analytics engines to fold audience behavior into its decision-making. For example, it can analyze content that generates a lot of views or clicks and deliver more source material just like it to the curator. All of these factors can be weighted and varied via a dashboard.
I like the idea of providing automated assistance to human curators…
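To make the loop described above concrete, here is a minimal sketch in Python. It is emphatically not CIThread's code or API: the class name, the method names, the three weights, and the click-damping constant are all my own assumptions, chosen only to illustrate the idea of combining seed keywords, what the curator has published or discarded, and audience clicks into one adjustable score.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


class CurationSketch:
    """Hypothetical curator-assist scorer (illustrative only, not CIThread)."""

    def __init__(self, keywords, weights=None):
        # Seed keywords supplied by the curator.
        self.keywords = {k.lower() for k in keywords}
        # Net votes per term, learned from publish/discard decisions.
        self.term_votes = Counter()
        # The "dashboard": how much each factor counts (assumed defaults).
        self.weights = weights or {"keywords": 1.0, "curator": 1.0, "audience": 0.5}

    def record_decision(self, text, published):
        """Learn from one curator decision: +1 per term if published, -1 if discarded."""
        delta = 1 if published else -1
        for term in set(tokenize(text)):
            self.term_votes[term] += delta

    def score(self, text, clicks=0):
        """Weighted sum of keyword overlap, learned term votes, and damped clicks."""
        terms = set(tokenize(text))
        if not terms:
            return 0.0
        keyword_part = (
            len(terms & self.keywords) / len(self.keywords) if self.keywords else 0.0
        )
        curator_part = sum(self.term_votes[t] for t in terms) / len(terms)
        # Damping keeps a single viral item from swamping the other factors.
        audience_part = clicks / (clicks + 10)
        w = self.weights
        return (
            w["keywords"] * keyword_part
            + w["curator"] * curator_part
            + w["audience"] * audience_part
        )

    def rank(self, candidates):
        """Order (text, clicks) pairs from most to least promising."""
        return sorted(candidates, key=lambda c: self.score(c[0], c[1]), reverse=True)
```

The human still makes every publish-or-discard call; the sketch only reorders what the curator sees next, which is the part of the idea I find appealing.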
Eszter Hargittai and her team have done research that shows that digital youngsters are not as savvy as we would like them to be, over-relying on Google’s rank ordering of results, etc.
It’s important to have actual data to look at — thanks, Eszter! — even though it confirms what we should all probably know by now: When it comes to information, we’re a lazy, sloppy species that vastly over-estimates its own wisdom.
“… the statistic that we have been using is between the dawn of civilisation and 2003, five exabytes of information were created. In the last two days, five exabytes of information have been created, and that rate is accelerating. And virtually all of that is what we call user-generated what-have-you. So this is a very, very big new phenomenon.”
He concludes, and I certainly agree, that we need digital curation. He says that digital curation consists of “Authenticity, Veracity, Access, Relevance, Consume-ability, and Produce-ability.” “Consume-ability” means, roughly, that you can play it on any device you want, and “produce-ability” means something like how easy it is to hack it (in the good O’Reilly sense).
JP seems to be thinking primarily of knowledge objects, since authenticity and veracity are high on his list of needs, and for that I think it’s a good list. But suppose we were to think about this not in terms of curation, which implies (against JP’s meaning, I think) a binary acceptance-rejection that builds a persistent collection, and instead view it as digital recommendations? In that case, for non-knowledge-objects, other terms will come to the fore, including amusement value, re-playability, and wiseacre-itude. In fact, people recommend things for every reason we humans may like something, not to mention the way we’re socially defined in part by what we recommend. (You are what you recommend.)
Beth Noveck is deputy chief technology officer for open government and leads President Obama’s Open Government Initiative. She is giving a talk at Harvard. She begins by pointing to the citizenry’s lack of faith in government. Without participation, citizens become increasingly alienated, she says. For example: the rise of Tea Parties. A new study says that a civic spirit reduces crime. Another article, in Social Science and Medicine, correlates civic structures and health. She wants to create more opportunities for citizens to engage and for government to engage in civic structures — a “DoSomething.gov,” as she lightly calls it. [NOTE: Liveblogging. Getting things wrong. Missing things. Substituting inelegant partial phrases for Beth's well-formed complete sentences. This is not a reliable report.]
Beth points to the peer to patent project she initiated before she joined the government. It enlists volunteer scientists and engineers to research patent applications, to help a system that is seriously backlogged, and that uses examiners who are not necessarily expert in the areas they’re examining. This crowd-sources patent applications. The Patent Office is studying how to adopt peer to patent. Beth wants to see more of this, to connect scientists and others to the people who make policy decisions. How do we adapt peer to patent more broadly, she asks. How do we do this in a culture that prizes consistency of procedures?
This is not about increasing direct democracy or deliberative democracy, she says. The admin hasn’t used more polls, etc., because the admin is trying to focus on action, not talk. The aim is to figure out ways to increase collaborative work. Next week there’s a White House conf on gov’t innovation, focusing on open grant making and prize-based innovation.
The President’s first executive action was to issue a memorandum on transparency and open gov’t. This was very important, Beth says, because it let the open gov folks in the administration say, “The President says…” President Obama is very committed to this agenda, she says; after all, he is a community organizer in his roots. Simple things like setting up a blog with comments were big steps. It’s about changing the culture. Now, there’s a culture of “leaning forward,” i.e., making commitments to being innovative about how they work. In Dec., every agency was told to come up with its own open govt plan. A directive set a road map: How and when are you going to inventory all the data in your agency and put it online in raw, machine-readable form? How are you going to engage people in meaningful policy work? How are you going to engage in collaboration within govt and with citizens? On Tuesday, the White House collected self-evaluations, which are then evaluated by Beth’s office and by citizen groups.
How to get there. First, through people. Every agency has someone responsible for open govt. The DoT has 200+ on their open govt committee. Second, through platforms (which, as she says, is Tim O’Reilly’s mantra). E.g., data.gov is a platform.
Transparency is going well, she thinks: White House visitor logs, streaming the health care summit, publishing White House employee salaries. More important is data.gov. 64M hits in under a year. Pew says 40% of respondents have been there. 89M hits on the IT dashboard that puts a user-friendlier interface to govt spending. Agencies are required to put up “high value” data that helps them achieve their core mission. E.g., Dept. of Labor has released 15 yrs of data about workplace exposure to toxic chemicals, advancing its goal of saving workers’ lives. Medicare data helps us understand health care. USDA nutrition data + a campaign to create video games to change the eating habits of the young. Agencies are supposed to ask the public which data they want to see first, in part as a way of spurring participation.
To spur participation, the GSA now has been procuring govt-friendly terms of service for social media platforms; they’re available at apps.gov. It’s now trying to acquire innovation prize platforms, etc.
Participation and collaboration are different things, she says. Participation is a known term that has to do with citizens talking with govt. But the exciting new frontier, she says, is about putting problems out to the public for collaborative solving. E.g., Veterans Benefits Admin asked its 19,000 employees how to shorten wait times; within the first week of a brainstorming competition, 7,000 employees signed up and generated 3,000 ideas, the top ten of which are being implemented. E.g., the Army wikified the Army operations manual.
It’s also about connecting the public and private. E.g., the National Archives is making the Federal Register available for free (instead of for $17K/yr), and the Princeton Internet center has made it annotatable. Carl Malamud also. The private sector has announced National Lab Day, to get scientists out into the schools. Two million people signed up.
She says they know they have a lot to do. E.g., agencies are sitting on exabytes of info, some of which is on paper. Expert networking: We have got to learn how to improve upon the model of federal advisory commissions, the same group of 20 people. It’s not as effective as a peer to patent model, volunteers pooled from millions of people. And we don’t have much experience using collaboration tools in govt. There is a recognition spreading throughout the govt that we are not the only experts, that there are networks of experts across the country and outside of govt. But ultimately, she says, this is about restoring trust in govt.
Q: Any strategies for developing tools for collaborative development of policy?
A: Brainstorming techniques have been taken up quickly. Thirty agencies are involved in thinking about this. It’s not about the tools, but thinking about the practices. On the other hand, we used this tool with the public to develop open govt plans, but it wasn’t promoted enough; it’s not the tools but the processes. Beth’s office acts as an internal consultancy, but people are learning from one another. This started with the President making a statement, modeling it in the White House, making the tools available…It’s a process of creating a culture and then the vehicles for sharing.
Q: Who winnowed the Veterans agency’s 3,000 suggestions?
A: The VA ideas were generated in local offices and got passed up. In more open processes, they require registration. They’ve used public thumbs up and down, with a flag for “off topic” that would shrink the posting just to one link; the White House lawyers decided that that was acceptable so long as the public was doing the rating. So the UFO and “birther” comments got rated down. They used a wiki tool (MixedInk) so the public could write policy drafts; that wiki let users vote on changes. When there are projects with millions of responses, it will be very hard; it makes more sense to proliferate opportunities for smaller levels of participation.
A: We’re crowd-sourcing expertise. In peer to patent, we’re not asking people if they like the patent or think it should be patented; we’re asking if they have info that is relevant. We are looking for factual info, recognizing that even that info is value-laden. We’re not asking about what people feel, at least initially. It’s not about fostering contentious debate, but about informed conversation.
Q: What do you learn from countries that are ahead of the curve on e-democ, e.g., Estonia? Estonia learned 8 yrs ago that you have to ask people to register in online conversations…
A: Great point. We’re now getting up from our desks for the first time. We’re meeting with the Dutch, Norway, Estonia, etc. And a lot of what we do is based on Al Gore’s reinventing govt work. There’s a movement spreading particularly on transparency and data.gov.
Q: Is transparency always a good approach? Are there fields where you want to keep the public out so you can talk without being criticized?
A: Yes. We have to be careful of personal privacy and national security. Data sets are reviewed for both before they go up on data.gov. I’d rather err on the side of transparency and openness to get us over the hump of sharing what they should be sharing. There’s value in closed-door brainstorm so you can float dumb ideas. We’re trying to foster a culture of experimentation and fearlessness.
[I think it's incredible that we have people like Beth in the White House working on open government. Amazing.]
Here’s a post from last July (ok, so I’m a little behind in my reading) that describes the Tuttle Club’s first consulting engagement. An open, self-selected group of people converge for an open session with the potential client. They talk, sketch, and do some improv, out of which emerges a set of topics and people for more focused discussion.
This is semi-emergent expertise. I add the “semi” because the initial starting conditions are quite focused, so the potential areas of collaboration and outcomes are thus fairly constrained. But compared to traditional Calf Sock Expertise (i.e., highly paid and trained men in blue suits who believe that focus is the only efficient way to proceed), this is wildly emergent.
As part of my Be A Bigger A-Hole resolution, let me note that the Harvard Business Review blog has just run a post of mine that looks at the history of the DIKW pyramid and why it doesn’t make that much sense.
…The Republicans are better at questioning the President than you are.
I learned more about both sides of the issues than I have by listening to official press conferences. Getting neutrality out of the way seems to help when the issues are by nature contentious. Having the media mediate puts into the middle a force that (a) fears that taking up the opposition’s side too strongly will look like partisanship, and (b) is looking for “news,” i.e., headlines. It turns out that getting to hear the back-and-forth of the groups that have skin in the game can be better than inviting in a skinless third party.
Of course, it helps that not only is our President articulate and informed, he tries to engage substantively and accords his opponents appropriate dignity. And it’s to the Republicans’ credit that they invited him in, gave the session enough time, and treated him civilly.