At SCS13, Clay Shirky says that “Why do comments suck so bad?” is one of the questions that is perpetually asked in public discussions. So, what’s the answer?
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Clay points to YouTube as the “basement” of conversation, even in comments on innocuous videos, but there are sites discussing contentious issues that are quite civil and useful. And Google owns YouTube, and they have lots of money and an Internet sensibility, but still YouTube comments suck.
Explanation #1: The world is filled with trolls. But in fact, some sites with good commenting sections moderate comments, thinking about the commenters as a community, not as individuals asserting “First Amendment” rights.
Explanation #2: “Good. Big. Cheap. Pick two.” YouTube’s scale is “an attractive nuisance.” If you have a publishing frame, then you want to let in as many people as possible. If you have a community view, you are ok with limiting page views. E.g., Gawker uses an algorithm that features comments based on the richness of the thread. (The lower-ranked comments are still there.)
Explanation #3: “What do you want the users to do?” Publishing sites actually want people to forward the article to a million friends and then read another article. They often relegate the comments to the bottom of the page. E.g., the NYT says “Share your thoughts,” which is incredibly generic. No guidance is given. The result is responses that read like letters to the editor, without interaction or conversation. The NYT gives you actionable info for shows, but not for candidates: no links to their sites, no way to donate, etc. “The NYT is much better at helping consumers plug into markets than citizens to plug into politics.”
Explanation #4: “Institutions do not have the full range of either social or technical solutions available to them culturally.” They can’t think of their commenters as a community instead of as a way of generating low-cost page views.
Q&A
Clay would like newspapers to have a dashboard of options they can use when constructing commenting sections, each customized to the article.
Q: [Anil Dash] Why ascribe this to ignorance instead of malice? Many of these institutions are served by making their readers look stupid.
Clay: That’s one of my a priori assumptions. I don’t think the individuals making choices are purposefully trying to keep the comments shallow and to prevent collective action. Rather, “letters to the editor” is a comfortable place for them.
Categories: liveblog, media Tagged with: liveblog • social media Date: January 17th, 2013 dw
Gary Price from Infodocket is moderating a panel on what’s new in search. It’s a panel of vendors.
The first speaker is from Blekko.com, which he says “is thought of as the third search engine” in the US market. It features info from authoritative sources. “You don’t want your health information to come from some blog.” When you search for “kate spade” you get authenticated Kate Spade fashion stuff. Slashtags let you facet within a topic, based on expert curation. Users can create their own slashtags. At /webgrep you can ask questions about the corpus; if a question is upvoted, the techies at Blekko will answer it.
Weblib.com describes itself on its site as “Natural Language Processing Tools and Customizable Knowledge Bases for Semantic Search and Discovery Applications.” Thomas talks about OntoFind and semantic search, which is a search that produces “meaningful results even when the retrieved pages” contain none of the search terms [latent semantic search!]. He points to Google’s Freebase, which has info about 500M entities and their relationships. In a week you’ll be able to try OntoFind at ezu.com, I believe. Searching for big brother and privacy first asks you to disambiguate and then pulls together results.
ScienceScape.com is designed to help scientists follow science. It diagrams publications on a topic, and applies article-level metrics. It’s focused on the undergrad and graduate research markets. It integrates genomic knowledge plus much more. It lets you see the history of science top down, and browse e.g. by date. You can share what you’ve found.
[I couldn't hear the Q&A well enough to blog it.]
Categories: liveblog Tagged with: internetlibrarian • liveblog • search Date: October 22nd, 2012 dw
David Wood of 3RoundStones.com is talking about Callimachus, an open source project that is also available through his company. [NOTE: Liveblogging. All bets on accuracy are off.]
We’re moving from PCs to mobile, he says. This is rapidly changing the Internet. 51% of Internet traffic is non-human, he says (as of Feb 2012). 35hrs of video are uploaded to YouTube every minute. Traditionally we dealt with this type of demand via data warehousing: put it all in one place for easy access. But that’s not true: we never really got it all in one place accessible through one interface. Jeffrey Pollock says we should be talking not about data integration but interoperability because the latter implies a looser coupling.
He gives some use cases:
- BBC wanted to have a Web presence for all of its 1500 broadcasts per day. They couldn’t do it manually. So, they decided to grab data from the linked open data cloud and assemble the pages automatically. They hired fulltime editors to curate Wikipedia. RDF enabled them to assemble the pages.
- O’Reilly Media switched to RDF reluctantly but for purely pragmatic reasons.
- BestBuy, too. They used RDFa to embed metadata into their pages to improve their SEO.
- Elsevier uses Linked Data to manage their assets, from acquisition to delivery.
This is not science fiction, he says. It’s happening now.
Then two negative examples:
- David says that Microsoft adopted RDF in the late 90s. But Netscape came out with a portal tech based on RDF that scared Microsoft out of the standards effort. But they needed the tech, so they’ve reinvented it three times in proprietary ways.
- Borders was too late in changing its tech.
Then he does a product pitch for Callimachus Enterprise: a content management system for enterprises.
It’s the end of the workstream day of the DPLA Midwest meeting. Each of the three workstream meetings is reporting back to the general group.
Emily Gore from the Content stream: What kind of guidance should we develop for interested content providers? The group wants to have a strategic collection development plan draft by the end of December. “What is our role with regard to advocacy” for content currently under copyright? Also, the group talked about the hub pilot project. Various participants in that pilot were in the room.
SJ Klein from the Technical workstream: There was a lively discussion this afternoon, primarily about the design of the front end. How to make the frontend experience help people become contributors? They also talked about the Chattanooga hackathon Nov. 8-9. Tools for making working with metadata easier. Packaging tools that match potential contributors with a hub. Metadata purgatory for metadata that has been contributed but doesn’t meet the standards.
Maureen Sullivan and John Palfrey report on the Governance group: The next steps are to take the barebone by-laws and flesh them out. There were many discussions about whether DPLA the 501(c)(3) should be a membership organization, but the general consensus is no. (Paul Courant made the point that many institutions shy away from becoming members because that makes them liable.) Rather, it would be good to have participation from groups and people with specific areas of expertise. There was a lot of energy about expanding on the statement of principles, including adding an explicit commitment to accessibility. There was strong support for continuing to see the DPLA as a public/private enterprise. John Bracken made the point that DPLA should view itself as a network, not as a heavyweight organization.
Maureen points out that the workstreams have converged. She says that “contributor” seems to be a better word than “member.” We need to be flexible about how people will come together to do the work that’s required. And we should be thinking of ourselves as advocates, a force for change to improve the lives of people in this country and around the world.
Categories: dpla, liveblog Tagged with: dpla • liveblog Date: October 11th, 2012 dw
Today is the first day of the third national plenary of the Digital Public Library of America. We’re in the Chicago Public Library where Brian Bannon has welcomed us. Brian is Chicago’s new Library Commissioner, and I am a huge fan.
John Palfrey, the chair of the DPLA, tells us that what makes him happiest about the DPLA meeting is the wide range of people who continue to work on the project. He tells us that this is the first time the DPLA has live-streamed the workstream day; tomorrow is the big public confab. (Hashtag: #dplamidwest.) JP tells us that the DPLA is working across workstreams; the meetings today are not focused on workstreams but topics.
One session will be on content. JP reminds us that Emily Gore is working fulltime on acquiring content. That session is going to talk about strategic planning, and about the digital hubs pilot project that is under development. (The hubs project apparently will give access to the Hathi Trust and Internet Archive, which means there will be books in the DPLA!) JP tells us that there are two federal funders and one not-yet-announced private funder.
The second simultaneous group is the technical workstream, with Martin Kalfatovic, SJ Klein and Jeffrey Licht.
The third is on the future of the DPLA with JP and Maureen Sullivan.
JP announces that the DPLA non-profit org is on the way. He also congratulates Paul Courant of the Hathi Trust for the judicial decision yesterday. JP asks how we can keep the DPLA’s inclusiveness and openness even as it moves to a more formal structure.
“This is the last of the entire days to roll up your sleeves and figure out what the ‘it’ is before we launch the ‘it’ in April 2013,” JP says.
Categories: dpla, liveblog Tagged with: conference coverage Date: October 11th, 2012 dw
Andrew Keen is speaking. (I liveblogged him this spring when he talked at a Sogeti conference.) His talk’s title: “How today’s online social revolution is dividing, diminishing, and disorienting us.” [Note: Posted without rereading because I'm about to talk. I may go back and do some cleanup.]
Andrew opens with an anecdote. He grew up as a Jew in Britain. His siblings were split between becoming lawyers or doctors. But his mother asked him if he’d like to be the anti-Christ. So, now he’s grown up to become the anti-Christ of Silicon Valley.
“I’m not usually into intimacy,” but look at each other. How much do we know about each other? Not much. One of the great joys is getting to know one another. By 2017 there will be 15x more data flowing over the network. Billions of intelligent devices. “The world we are going into is one in which 20-25 years…you strangers will show up in a big city in London and you’ll know everything about each other.” You’ll know one another’s histories, interests…
“My argument is that we’re all stuck in Digital Vertigo. We’re all participants in a digital noir.” He shows a clip from Vertigo. “In the future these kinds of scenes won’t be possible. There won’t be private detectives…So this movie about the unfolding of understanding between strangers won’t happen.” What happens to policing? “Will we be guilty if we don’t carry our devices?” [SPOILERS] The blonde in this movie doesn’t exist. She’s a brunette shopgirl from Kansas. “The movie is about a deception…A classic Hitchcock narrative of falling in love with something that doesn’t exist. A good Catholic narrative…It’s a warning about falling in love with something that is too good to be true.” That’s what we’re doing with social media and big data. We’re told big data brings us together. They tell us the Net gives us the opportunity for human beings to come together, to realize themselves as social beings. Big data allows us to become human.
This is about more than the Net. The revolution that Carlota is talking about is one in which the Net becomes central in the way we live our lives. Fifteen years ago, Doc Searls, David W., and I would be marginal computer nerds, and now our books can be found in any book store. [Doc is in the audience also.]
He shows a clip from The Social Network: “We lived on farms. Now we’re going to live on the Internet.” It’s the platform of 21st century life. This is not a marginal or media issue. It is about the future of society. Many people think this network will solve the core problems of life. We now have an ecosystem of apps in the business of eliminating loneliness. E.g., Highlight, “the darling of the recent SxSW show.” They say it’s “a fun way to learn more about people nearby.” Then he shows a clip from The Truman Show. His point: We’re all in our own Truman Shows. The destruction of privacy. No difference between public and private. We’re being authentic. We’re knowingly involving ourselves in this.
A quote: “Vertigo is the ultimate critics’ film because it is a dreamlike film about people who are not sure who they are but who are busy reconstructing themselves and each other to a kind of cinema ideal of the ideal soul mate.” Substitute social media for film. We’re losing what it means to be human. We’re destroying the complexity of our inner lives. We’re only able to live externally. [This is what happens when your conceptual two poles are public and private. It changes when we introduce the term "social."]
Narcissism isn’t new. Digital narcissism has reached a climax. As we’re given personal broadcasting platforms, we’re increasingly deluded into thinking we’re interesting and important. Mostly it reveals our banality, our superficiality. [This is what you get when your conceptual poles are taken from broadcast media.]
It’s not just digital narcissism. “Visibility is a trap,” said Foucault. Hypervisibility is a hypertrap. Our data is central to Facebook and others becoming viable businesses. The issue is the business model. Data is oil, and it’s owned by the rich. Zuckerberg, Reid Hoffman, et al., are data barons. Read Susan Cain’s “Quiet”: introverts drive innovation. E.g., Steve Wozniak. Sharing is not good for innovation. Discourage your employees from talking with one another all the time. It makes them less thoughtful. It creates groupthink. If you want them to think for themselves, “take away their devices and put them in dark rooms.”
It’s also a trap when it comes to govt. Many govts are using the new tech to spy on their citizens. Cf. Bentham’s panopticon, which was corrupted into 1984 and industrial totalitarianism. We need to go back to the Industrial Age and JS Mill — Mill’s On Liberty is the best antidote to Bentham’s utilitarianism. [? I see more continuity than antidote.]
To build a civilized golden age: 1. There is a role for govt. The market needs regulation. 2. “I’m happy that the EU is working on this…and came out against FB facial recognition software. … We have a right to forget.” “It’s the most unhuman of things to remember everything.” “We shouldn’t idolize the never-forgetting nature of Big Data.” “To forget and forgive is the core essence of being human.” 3. We need better business models. We don’t want data to be the new oil. I want businesses that charge. “The free economy has been a catastrophe.”
He shows the end of The Truman Show. [SPOILER] As Truman enters reality, it’s a metaphor for our hope. We can only protect our humanness by retreating into dark, quiet places.
He finishes with a Vermeer that shows us a woman about which we know nothing. “In our Age of Facebook, we need to build a world in which the woman in blue can read that letter, not reveal herself, not reveal her mystery…”
Q: You’re surprisingly optimistic today. In the movie Vertigo, there’s an inevitability. How about the inevitability of this social movement? Are you tilting at windmills?
Idealists tilt at windmills. People are coming around to understanding that the world we’re collectively creating is not quite right. It’s making people uneasy. More and more books, articles, etc., say that FB is deeply exploitative. We’re all like Jimmy Stewart in Vertigo. The majority of people in the world don’t want to give away their data. As more of the traditional world comes onto the Net, there will be more resistance to collapsing the private and the public. Our current path is not inevitable. Tech is religion. Tech is not autonomous, not a first mover. We created Big Data and need to reestablish our domination over it. I’m cautiously optimistic. But it could go wrong, especially in authoritarian regimes. In Silicon Valley people say privacy is dead, get over it. But privacy is essential. Once we live this public ideal, then who are we?
I’m at Sogeti’s annual executive conference, which brings together about 80 CEOs. I’m here with Doc Searls, Andrew Keen, and others. I’ve spoken at other Sogeti events, and I am impressed with their commitment to providing contrary points of view, including views at odds with their own corporate interests. (My one complaint: They expect all attendees to have an iPad or iPhone so that they can participate in the realtime survey. Bad symbolism.) (Disclosure: They’re paying me to speak. They are not paying me to say something nice about them.)
Menno van Doorn begins by talking about the quantified self movement, claiming that they sometimes refer to themselves as “datasexuals” :) All part of Big Data, he says. To give us an idea of bigness, he relates the Legend of Sessa: “Give me grain, doubling the amount for each square on a chessboard.” Exponential growth meant that by the time you hit the second half of the chessboard, you’re in impossible numbers. Experts say that’s where we were in 2006 when it comes to data. But “there’s no such thing as too much data.” “Big Data is powering the next industrial revolution. Data is the new oil.”
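The arithmetic behind the chessboard legend is easy to check. Here is a minimal sketch, assuming the common telling in which the first square holds a single grain (the talk didn't specify the starting amount); the function names are just for illustration.

```python
def grains_on_square(n: int) -> int:
    """Grains on square n (1-indexed), assuming 1 grain on square 1, doubling each square."""
    if not 1 <= n <= 64:
        raise ValueError("a chessboard has squares 1 through 64")
    return 2 ** (n - 1)


def total_through_square(n: int) -> int:
    """Total grains on squares 1..n, i.e. 1 + 2 + ... + 2^(n-1) = 2^n - 1."""
    return sum(grains_on_square(k) for k in range(1, n + 1))


first_half = total_through_square(32)    # squares 1-32
whole_board = total_through_square(64)   # squares 1-64
second_half = whole_board - first_half   # squares 33-64

print(f"first half of the board: {first_half:,} grains")
print(f"whole board:             {whole_board:,} grains")
print(f"second half holds {second_half // first_half:,} times as much as the first")
```

Under that assumption the first half of the board comes to about 4.3 billion grains, which is large but imaginable. The second half holds roughly 4.3 billion times that, which is the sense in which the second half of the chessboard gets you into "impossible numbers."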
Big Data is about (1) lots of data, (2) at high velocity, (3) used in a variety of ways. (“volume, velocity, variety.”) Michael Chui says that there are billions in revenues to gain, including from efficiencies. But, Chui says, there are no best practices. The value comes from “human exhaust.” I.e., your digital footprint, what you leave behind in your movement through the Net. Menno thinks of this as “your recorded future.”
Three examples:
1. Menno points to Target, a company that can predict life-changing events among its customers. E.g., based on purchases of 25 products, they can predict which customers are pregnant and roughly when they are due. But, this led to Target sending promotional materials for pregnancy to young girls whose parents learned this way that their daughters were pregnant.
2. In SF, they send out police cars to neighborhoods based on 14-day predictions of where crime will occur, based on data about prior crime patterns.
3. Schufa, a German credit agency, announced they’d use social media to assess your credit worthiness. Immediately a German Minister said, “Schufa cannot become the Big Brother of the business world.”
Two forces are in contention and will determine how much Big Data changes us. Today, the conference will look at the dawn of the age of big data, and then how disruptive it will be for society (the session Keen and I are in). Day 2: Bridging the gap to the new paradigm, Big Data’s fascinating future, and Decision Time: Taming Big Brother.
Carlota Perez, Prof. of Tech and Socio-Economic Development, from Venezuela speaks now. She is a “neo-Schumpeterian.” She says her role in the conference is to “locate the current crisis.” What is the real effect on innovation, and why are we only midway along in feeling the impact?
There have been 5 tech revolutions in the past 240 years: 1. 1771 Industrial rev. 2. 1829 Age of steam, coal and railways. 3. 1875 Steel and heavy engineering (the first globalization). 4. Age of the automobile, oil, petrochem and mass production. 5. 1971 Age of info tech and telecom. We’re only halfway through that last one. The next revolution queued up: age of biotech, bioelectronics, nanotech, and new materials. [I'm surprised she doesn't count telegraph + radio + telephone, etc., as a comms rev. And I'd separate the Net as its own rev. But that's me.]
Lifecycle of a tech rev: gestation, induction, deployment, exhaustion. The “big bang” tends to happen when the prior rev is reaching exhaustion. The structure of revs: new cheap inputs, new products, new processes. A new infrastructure arises. And a constellation of new dynamic industries that grow the world economy.
Why call these “revolutions”, she asks? Because they transform the whole economy. They bring new organizational principles and new best practice models. I.e., a new “techno-economic paradigm.” E.g., we’ve gone from mass production to flexible production. Closed pyramids to open networks. Stable routines to continuous improvement. “Information technology finds change natural.” From human resources to human capital (from raw materials to value). Suppliers and clients to value network partners. Fixed plans to flexible strategies. Three-tier markets (big, medium, small) to hyper-segmented markets. Internationalization to globalization. Information as costly burden to info as asset. Together, these constitute a radical change in managerial common sense.
The diffusion process is broken in two: Bubble, followed by a crash, and then the Golden Age. During the bubble, financial capital forces diffusion. There is income and demand polarization. Then the crash. Then there is an institutional recomposition, leading to a golden age in which everyone benefits. Production capital takes over from financial capital (driven by the govt), and there is better distribution of income and demand.
She looks at the 5 revs, and finds the same historic pattern that she just sketched.
Two major differences between installation and deployment: 1. Bubbles vs. patient (= long-term) capital. 2. Concentrated innovation to modernize industries vs. innovation in all industries that use the new technologies. “Understanding this sequence is essential for strategic thinking.”
The structure of innovation in deployment: a new coherent fabric of the economy emerges, leading to a golden age. Also, oligopolies emerge which means there’s less unhelpful competition. (?)
Example of prior rev: home electrical appliances: In the installation period, we had a bunch of electric utilities going into homes in the 1910s and 1930s. During the revision, we get a few more. But then in the 1950-70s, we get a surge of new appliances, including tape recorder, microwave, even the electric toothbrush. It’s enabled by universal electricity and driven by suburbanization. It’s the same pattern if you look at textile fibers, from rayon and acetate during installation, to a huge number during deployment. E.g., structural and packaging plastics: installation brought bakelite, polystyrene and polyethylene, and then a flood of innovation during deployment. “The various systems of the ICT revolution will follow a similar sequence.” [Unless it follows the Tim Wu pattern of consolidation, e.g., everyone being required to use an iPad at a conference] During the installation period, ICT was in constant supply push mode. Now it must respond to demand pull. “The paradigm and its potential are now understood by all. Demand (in vol and nature) becomes the driving force.”
This shifts the role of the CIO. To modernize a mature company, during installation you brought in an expert in modernization, articulating the hw and sw being pushed by the suppliers. During the deployment phase, in a modern company that is innovating for strategic expansion, the CIO is an expert in strategy, specifying needs and working with suppliers. “The CIO is no longer staff. S/he must be directly involved in strategy.”
There are 3 main forces for innovation in the next 2-3 decades, as is true for all the revs. 1. Deepening and widening of the ICT tech rev, responding to user needs. 2. The users of ICT across all industries and activities. 3. The gestation of the next rev (probably biotech, nanotech, and new materials).
Big Data is likely to have a big role in each of those directions.
Q: Why are we only 50% of the way through?
A: Because the change after the recession is like opening a dam. Once you get to the point where you can have a comfortable innovation prospective, imagine the market possibilities.
Q: What can go wrong?
A: Governments. Unfettered free markets are indispensable for the installation process. Lightly guided markets are needed in the golden age. Free markets work when you need to force everyone to change. But now no longer: The state has to come in. But govts are drunk with free markets. Now finance is incompetent. “They don’t dare invest in real things.” “Ideology is so strong and the understanding of history is so shallow that we’re not doing the right thing.”
Christopher Ahlberg speaks now. He’s the founder of Recorded Future. His topic: “Turning the Web into Predictive Signals.”
We see events like Arab Spring and wonder if we could have predicted them. Three things are going on: 1. Moving from smaller to larger datasets. 2. From structured to unstructured data (from numbers to text). 3. From corporate data to Internet/Web.
There’s a “seismic shift in intelligence.” “Temporal indexing of the Web enables Web intelligence.” “The Web is not organized for finding data; it’s about finding documents.” Can we create structure for the Web we can use for analysis? A lot of work has been done on this. Why is this possible now? Fast math, large, fast storage, web harvesting, and linguistic analysis progress.
His company looks for signals in human language. E.g., temporal signals. That can turn up competitive info. But human language is tough to deal with. But also when something happens (e.g., the Haitian earthquake) there are patterns in when people show up: helpers, doctors, military, do-gooder actors, etc. There tends to be a flood of notifications immediately afterwards. The Recorded Future platform does the linguistic analysis.
He gives an example: What’s going to happen to Merck over the next 90 days? Some is predictable: There will be a quarterly financial conference call. A key drug is up for approval. Can we look into the public conversations about these events, and might this guide our stock purchases? And beyond Merck, we could look at everything from cyber attacks to sales opportunities.
Some examples. 1. Monitoring unrest. Last week there were protests against Foxconn in China. Analysis of Chinese media shows that most of those protests were inland, while corporate expansion is coming in coastal areas. Or look at protests against pharmaceuticals for animal testing.
Example 2: Analyzing cyber threats. Hackers often try out an approach on a small scale and then go larger. This can give us warning.
Example 3: Competitive intelligence. When is there a free space (announcement-free) when you can get some attention? Example 4: Lead generation. E.g., look for changes in management. (New marketing person might need a new PR agency.) Example 5: Trading patterns. E.g., if there’s bad news but insiders are buying.
Conclusion: As we move from small to large datasets, structured to unstructured, and from inside to outside the company, we go from surprise to foresight.
Q: What is the question you cannot answer?
A: The situations that have low frequency. It’s important that there be an opportunity for follow-up questions.
Q: What if you don’t know what the right question is?
A: When it’s unknown unknowns, you can’t ask the right question. But the great thing about visualization is that it helps people ask questions.
Q: How to distinguish fact from opinion on Twitter, etc.?
A: Or NYT vs. Financial Post. There isn’t a simple answer. We’re working toward being able to judge sources based on known outcomes.
Q: Do your predictions get more accurate the more data you have?
A: Generally yes, but it’s not always that simple.
A series of 5-min lightning talks.
Christie Moffatt of the National Library of Medicine talks about a project collecting blogs talking about health. It began in 2011. The aim is to understand Web archiving processes and how this could be expanded. Three examples: Wheelchair Kamikaze. Butter Compartment. Doctor David’s Blog. They were able to capture them pretty well, but with links to outside sites, out-of-scope content, and content protected by passwords, there’s a question about what it means to “capture” a blog. The project has shown the importance of test crawls, and attending to the scope, crawling frequency and duration. The big question is which blogs they capture. Doctors who cook? Surgeons who quilt? Other issues: Permissions. Monitoring when the blogs end, change focus, or move to a new URL. E.g., a doctor retired and his blog changed focus to fishing.
Terry Plum from Simmons GSLIS talks about a digital curriculum lab. It was set up to pull in students and faculty around a few different areas. They maintain a collection of open source applications for archives, museums, and digital libraries. There are a variety of teaching aids. The DCL is built into a Cultural Heritage Informatics track at Simmons.
Daniel Krech of Library of Congress works at the Repository Development Center. The RDC works with people managing collections. The RDC works on human-machine interfaces. One project involves “sets” (collections). “We’ve come up with some new and interesting ways to think about data.” They use knot, set, and hyper theory, but they also sometimes use a physical instantiation of a set — it looks like knotted yarn — to help understand some very abstract ideas.
Kelsey [Keley?]Shepherd of Amherst represents the Five College Digital Task Force. (She begins by denying that the Scooby Gang was based on the five colleges.) They don’t share a digital library but want to collaborate on digital preservation. They are creating shared guidelines for preservation-ready digital objects. They are exploring models for funding and organizational structure. And they are collaborating on implementing a trusted digital perservation repository. But each develops its own digital preservation policy.
Jefferson Bailey talks about Personal Digital Archiving at the Library of Congress. He talks about the source diary for The Midwife’s Tale. That diary sat on a shelf for 200 years before being discovered as an invaluable window on the past. Often these archives are the responsibility of the record creators. The LoC therefore wants to support community archives, enthusiasts, and citizen archivists. They are out and about, promoting this. See digitalpreservation.gov.
Carol Minton Morris with DuraSpace and the NDSA (National Digital Stewardship Alliance) talks about funding archiving through “hip pocket resources.” They’re looking into Kickstarter.com. Technology and publishing projects at Kickstarter have only raised $9M out of the $100M raised there; most of it goes to the arts. She points to some other microfinance sites, including IndieGoGo and DonorsChoose.org. She encourages the audience to look into microfinancing.
Kristopher Nelson from LoC Office of Strategic Initiatives talks about the National Digital Stewardship Residency, which aims at building a community of professionals who will advance digital archiving. It wants to bridge classroom education and real-world professional experience. It will start in June 2013 with 10 residents participating in the 9-month program.
Moryma Aydelott, program specialist at LoC, talks about Tackling Tangible Metadata. The LoC’s digital data is on lots of media: 300T on everything from DVDs to DAT tapes and Zip disks. Her group provides a generic workflow for dealing with this stuff — any division, any medium. They have a wheeling cart for getting at this data. They make the data available “as is.” It can be hard to figure out what type of file it is, and what application is needed to read it. Right now, it’s about getting it on the server. They’ve done about 6.5T of material, 700-800 titles, so far. But the big step forward is in training and in documenting processes.
Categories: libraries, liveblog Tagged with: archives • libraries • liveblog Date: July 24th, 2012 dw
Michael Carroll, from American University Washington College of Law, is talking about “Copyright and Digital Preservation: The Role of Open Licenses.” (Michael is on the board of Creative Commons.)
|
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
|
Michael begins with a comparison to environmentalism: Stewardship of valuable resources, and long-term planning. There are cognitive challenges, and issues in providing institutional incentives. (He recommends sucking in as much data as possible, and worrying about adding the metadata later, perhaps through crowdsourcing.)
Michael notes that copyright used to be an opt-in and opt-out system; you had to register, and deposit a copy. Then you had to publish with a ©; anything published before 1989 that doesn’t have the © is in the public domain. You had to renew after 28 years, and the majority of copyrights (60%) were not renewed. We therefore had a growing public domain.
The court in Golan upheld Congress’ right to restore copyright for works published outside the US. This puts the public domain at risk, he says. He also points to the Hathi case in which they’ve been sued for decisions they made about orphan works. There is a dangerous argument being made there that if archiving occurs within the library space, fair use goes away. The legal environment is thus unstable.
Now that copyright is automatic and lasts for 70 years after the author’s death, managing the rights in order to preserve the content is fraught with difficulty.
Michael reminds us that making a copy to preserve the work is unlikely to cause market harm to the copyright owner, and thus ought to be legal under fair use. “You ought to have a bias toward believing you have a Fair Use right to preserve things.”
He asks: “Can the preservation community organize itself to be the voice of tomorrow’s users on issues of copyright policy and copyright estate planning?” For orphan works, copyright term shortening, exceptions to DRM rules, good practices for open licensing in the long term…
And he asks: How can you get the FBs and Googles et al. to support long-term preservation? Michael suggests marking things that are already in the public domain as being in the public domain. Otherwise, the public domain is invisible. And think about “springing” licenses, e.g. an open license that only goes into effect after a set time or under a particular circumstance.
Anil Dash (one of my heroes, and also hilarious) is talking at a Library of Congress event on Digital Preservation, part of the National Digital Information Infrastructure and Preservation Program. Anil’s talk is called “Make a Copy.” (Anil is now at ThinkUp.)
Live Blogging
Getting things wrong. Making fluid talks sound choppy. Missing important points. Not running a spellpchecker. This is not a reliable report. You have been warned, people!
Anil says he’s a geek interested in the social impacts of tech on culture, govt, and more. He started Expert Labs a few years ago to enable tech to talk with policy makers. Expert Labs built ThinkUp. He wants to talk about the issues that this group of archivists confronts every day that the tech community doesn’t know about. He warns us that this means he’s starting with depressing stuff. So…
…Picture the wholesale destruction of your wedding photos, or other deeply personal mementos. They are being destroyed by an exclusive, private, ivy league club: Facebook. FB treats memories as disposable. “Maybe if I were a 25 year old billionaire, I’d think of these as disposable, too.” “The terms of service of digital social networks trumps the Constitution in terms of what people can share and consume.” Our ordinary conversations are treated as disposable, at Facebook, Twitter, Microsoft, etc. They explicitly say that they can delete all of your content at any time for any reason. “100s of millions of Americans have accepted that. That should be troubling to those of us who care about preservation.”
You can opt out, but not without compromising your career and incurring severe social costs. And you can’t rely upon the rest of the Web, because “there’s a war raging against the open Web.” “The majority of time spent on the Web in the US is spent in an application,” not on pages. Yet we’re still archiving Web pages but not those applications. “They are gaslighting the Web,” Anil says, referring to the old movie. E.g., you can leave FB comments on Anil’s blog, but when you click from FB to his blog, FB gives you a warning that the site you’re going to is untrustworthy. “I don’t do that to them,” he says, even though they’ve consistently “moved the goal posts” on privacy, and he has registered his site with FB.
After blogging this, Anil got a message from a tech at FB saying that it was a bug that’s being fixed. But suppose he hadn’t blogged it, or FB had missed it? “The best case scenario is that we’re left fixing their bugs.” He adds, “That’s pretty awful, because they’re not fixing our bugs. And we’re helping them to extend their prisons over the Web.” And is the only way to get our words preserved to agree to Twitter’s ToS so that we’ll get archived by the Library of Congress, which has been archiving tweets? Anil says that he’s conscientiously tried to archive his own works for his new baby, but it shouldn’t rely on that much effort by an individual.
And, he says, that’s just the Web, not the apps. You can’t crawl his phone and preserve his photos. And when FB buys Instagram which has a billion photos, and only 5% of the content FB has bought has been preserved…? And yet the Instagram acquisition is considered a success by the Valley. If you’re a Pharaoh, your words are preserved. Anil is worried about the rest of the conversations.
“If I were to ask you what is the most watched form of video, what would you say?” Anil guesses that it’s animated gifs. And we don’t archive them. “We’re talking about the wrong things.” We’re arguing that we should be using Ogg Vorbis, but the proprietary forms are the ones that are most used. The standards ecology is getting more complicated. “We need to reflect back to the tech community that they have an obligation to think about preservation.” They’ve got money and resources. Shouldn’t they be contributing?
We’re losing metadata, he says. You can’t find Instagram photos because they have no Web presence and are short on metadata. Flickr, on the contrary, has lots of metadata. The Instagram owners are now multi-millionaires and are undermotivated to fix this problem. Maybe we’ll get something in 5 years, but then we will have lost a full decade of people’s photos. There’s no way to assign Instagrams open licenses at this point.
Indeed, “they are bending the law to make archiving illegal.” You can’t hack your own phone. You can’t copy your own photos from one device to another.
“Content tied to devices dies when those devices become obsolete.” The obsolescence cycle is becoming faster every year.
So, what should we do?
The technologists building these devices don’t know about the work of archivists. They don’t know that what this group is doing is meaningful. Many are young and don’t yet have experiences they want to preserve. They may not have confronted their own mortality yet.
But, the Web at its base level is about making copies. So, if we get things on the Web as opposed to in apps, we win. Apps should be powered by, or connected to, a Web experience. How can we take advantage of the fact that every time you go to a Web page, you’re copying it? How can we take advantage of the CDNs, which are already doing a lot of the work needed for preservation?
“There is also a growing class of apps that want to do the right thing.” E.g., TimeHop, which sends you an email reminding you of what you tweeted, etc., a year ago. This puts a user experience around the work of preservation. They’re marketing the value of the preservation community, but they don’t know it yet. Or Brewster, an iPhone address book that hooks up to all the address books you have on social services, reminding you to connect with people you haven’t touched in a while. This is a preservation app, although Brewster doesn’t know it.
Then, how do we mine our personal archives? (He notes that his company’s tool, ThinkUp, is in this space.) His Nike FuelBand captures data about his physical activity. The Quantified Self movement is looking at all sorts of data. “They too are preservationists, and they don’t know it.”
Then there are institutions. People revere the Library of Congress. Senior people at Twitter speak in a hushed voice when they say, “The tweets go to the LoC.” Take advantage of the institution’s authority. Don’t be shy. Meet them halfway. And say, “By the way, look at my cool email address.”
“PR trumps ToS.” ThinkUp archived the FB activity of the White House. At the time, FB’s ToS forbade archiving it for more than 24 hours. But the WH policy requires it. Anil says his reaction was: “Please, FB, please cut off the White House.” It turns out that FB was already planning on revising the policy. “What a great conversation we would have gotten to have.” You are our advocates, says Anil. You have an obligation to speak on our behalves.
The public is already violating “Intellectual Property” rules. “We don’t look at YouTube as the Million Mixers March, but that’s what it is.” It’s civil disobedience: People violating the law in public under their own names. These are people who recognize the value of preserving cultural works that otherwise would disappear. Sony won’t sell you a copy of Michael Jackson’s Thriller, but there are copies on YouTube. The heart and soul of those posting those videos is preservation. “All they want to do is what you do: make a copy of what matters to them.”