Thanks to the persistence of Javier Ruiz of the British Open Rights Group, you can now read [pdf] the contract between the British Library and Google Books. Google has shrouded its book digitization contracts in non-disclosures wrapped in lead sheathing that is then buried in collapsed portions of the Wieliczka salt mines. It took a Freedom of Information Act request by Javier to get access, and Google restricts further re-distribution.
Javier points out that the contract is non-exclusive, although the cost of re-digitizing is a barrier. Also, while the contract allows non-commercial research into the scanned corpus, Google gets to decide which research to allow. “There is also a welcome clause explicitly allowing for metadata to be included in the Europeana database,” Javier reports.
But it’s disturbing that the cartoon purposefully makes the Fair Use “explanation” unintelligible. Presumably that’s because Fair Use is so complex and so difficult to defend that Google doesn’t even want to raise it as a possibility. Nevertheless, it seems like a missed opportunity to do some education. Worse, it’s a sign that we’ve pretty much given up on Fair Use.
Having written in opposition to the Google Books Settlement (123), I was pleased with Judge Chin’s decision overall. The GBS (which, a couple of generations ago, would have unambiguously referred to George Bernard Shaw) was worked out by Google, the publishers, and the Authors Guild without schools, libraries, or readers at the table. The problems with it were legion, although over time it had gotten somewhat less obnoxious.
Yet, I find myself slightly disappointed. We so desperately need what Google was building, even though it shouldn’t have been Google (or any single private company) that built it. In particular, the GBS offered a way forward on the “orphaned works” problem: works that are still in copyright but whose copyright owners can’t be found and often are probably long dead. So, you come across some obscure 1932 piece of music that hasn’t been recorded since 1933. You can’t find the person who wrote it because, let’s face it, his bone sack has been mouldering since Milton Berle got his own TV show, and the publishers of the score went out of business before FDR started the Lend-Lease program. You want to include 10 seconds of it in your YouTube ode to the silk worm. You can’t because some dead guy and his defunct company can’t be exhumed to nod permission. Multiply this times millions, and you’ve got an orphaned works problem that has locked up millions of books and songs in a way that only a teensy dose of common sense could undo. The GBS applied that common sense: royalties would be escrowed for some period in case the rights owner staggered forth from the grave to claim them. Of course the GBS then divvied up the unclaimed profits in non-common-sensical ways. But at least it broke the log jam.
Now it seems it’ll be up to Congress to address the orphaned works problem. But given Congress’ maniacal death-grip on copyright, it seems unlikely that common sense will have any effect and our culture will continue to be locked up for seventy years beyond the grave in order to protect the 0.0001 percent of publishers’ catalogs that continue to sell after fourteen years. (All numbers entirely made up for your reading pleasure.)
Jon Orwant is an Engineering Manager at Google, with Google Books under him. He used to be CTO at O’Reilly, and was educated at MIT Media Lab. He’s giving a talk to Harvard’s librarians about his perspective on how libraries might change, a topic he says puts him out on a limb. Title of his talk: “Deriving the library from first principles.” If we were to start from scratch, would they look like today’s? He says no.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
Part I: Trends.
He says it’s not controversial that patrons are accessing more info online. Foot traffic to libraries is going down. Library budgets are being squeezed. “Public libraries are definitely feeling the pinch” exactly when people have less discretionary money and thus are spending more time at libraries.
At MIT, Nicholas Negroponte contended in the early 1990s that telephones would switch from wired to wireless, and televisions from wireless to wired. “It seems obvious in retrospect.” At that time, Jon was doing his work using a Connection Machine, which consisted of 64K little computers. The wet-bar-sized device he shows provided a whopping 5 GB of storage. The Media Lab lost its advantage of being able to provide high-end computers once computing power became widespread. So, the Media Lab had to reinvent itself, to provide value as a physical location.
Is there an analogy to the Negroponte switch of telephone and TV, Jon asks? We used to use the library to search for books and talk about them at home. In the future, we’ll use our computer to search for books, and talk about them at our libraries.
What is the mission of libraries, he asks: to select and preserve info, or to disseminate it? Might libraries redefine themselves? But this depends on the type of library.
1. University libraries. U of Michigan moved its academic press into the library system, even though the press is the money-making arm.
2. Research libraries. Harvard’s Countway Medical Library incorporates a lab into it, the Center for Bioinformatics. This puts domain expertise and search experts together. And they put in the Warren Anatomical Museum (AKA Harvard’s Freak Museum). Maybe libraries should replicate this, adopting information-driven departments. The ideal learning environment might be a great professor’s office. That 1:1 instruction isn’t generally tenable, but why is it that the higher the level of education, the fewer books are in the learning environment? I.e., kindergarten classes are filled with books, but grad student classrooms have few.
3. Public libraries. They tend to be big open rooms, which is why you have to be quiet in them. What if the architecture were a series of smaller, specialized rooms? Henry Jenkins said about newspapers, Jon says, that it’s strange that hundreds of reporters cover the Super Bowl, all writing basically the same story; newspapers should differentiate by geography. Might this notion of specialization apply to libraries, reflecting community interests at a more granular level? Too often, public libraries focus on the lowest common denominator, but suppose unusual book collections could rotate like exhibits in museums, with local research experts giving advice and talks. [Turn public libraries into public non-degree based universities?]
Part 2: Software architecture
Google Books wants to scan all books. It has done 12M out of the 120M works (which have 174M manifestations: different versions, editions, etc.). About 4B pages, 40+ libraries, 400 languages (“Three in Klingon”). Google Books is in the first stage: Scanning. Second: Scaling. Third: What do we do with all this? 20% are public domain.
He talks a bit about the scanning tech, which tries to correct for the inner curve of spines, keeps marginalia while removing dirt, does OCR, etc. At O’Reilly, the job was to synthesize the elements; at Google, the job is to analyze them. They’re trying to recognize frontispieces, index pages, etc. He gives, as a sample of the problem of recognizing italics: “Copyright is way too long to strike the balance between benefits to the author and the public. The entire raison d’être of copyright is to strike a balance between benefits to the author and the public. Thus, the optimal copyright term is c(x) = 14(n + 1).” In each of these, italics indicates a different semantic point. Google is trying to algorithmically catch the author’s intent.
Physical proximity is good for low-latency apps, local caching, high-bandwidth communication, and immersive environments. So, maybe we’ll see books as applications (e.g., good for physics texts that let you play with problems, maybe not so useful for Plato), real-time video connections to others reading the same book, snazzy visualizations, presentation of lots of data in parallel (reviews, related books, commentary, and annotations).
“We’ll be paying a lot more attention to annotations” as a culture. He shows a scan of a Chinese book that includes a fold-out piece that contains an annotation; that page is not a single rectangle. “What could we do with persistent annotations?” What could we do with annotations that have not gone through the peer review process? What if undergrads were able to annotate books in ways that their comments persisted for decades? Not everyone would choose to do this, he notes.
We can do new types of research now. If you want to know what the past tense of “sneak” is: 50 yrs ago people would have said “snuck,” but in 50 years it’ll be “sneaked.” There is a trend toward regularization of verbs (i.e., away from irregular verbs) over time, which you can see by examining the corpus of books Google makes available to researchers. Or, you can look at triplets of words and ask what are the distinctive trigrams. E.g., it was: oxide of lead, vexation of spirit, a striking proof. Now: lesbian and gay, the power elite, the poor countries. Steven Pinker is going to use the corpus to test the “Great man” theory. E.g., when Newton and Leibniz both invented the calculus, was the calculus in the air? Do a calculus word cloud in multiple languages and test against the word configurations of the time. The usage of the phrases “World War I” and “The Great War” cross around 1938, but there were some people calling it “WWI” in 1932, which is a good way to discover a new book (wouldn’t you want to read the person who foresaw WWII?). This sort of research is one of the benefits of the Google Books settlement, he says. (He also says that he was both a plaintiff and defendant in the case because, as an author, his book was scanned without authorization.)
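The "distinctive trigrams" idea can be sketched in a few lines. This is a toy illustration, not Google's actual pipeline: it scores each trigram by how much more often it appears in one year's texts than in the rest of a made-up corpus, with add-one smoothing so unseen trigrams don't divide by zero.

```python
from collections import Counter

def trigrams(text):
    """Split a text into word trigrams."""
    words = text.lower().split()
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]

def distinctive_trigrams(corpus_by_year, year, top=3):
    """Rank trigrams by how much more often they occur in `year`
    than in the rest of the corpus (add-one smoothing)."""
    target, rest = Counter(), Counter()
    for y, texts in corpus_by_year.items():
        bucket = target if y == year else rest
        for text in texts:
            bucket.update(trigrams(text))
    scored = {t: (target[t] + 1) / (rest[t] + 1) for t in target}
    return [t for t, _ in sorted(scored.items(), key=lambda kv: -kv[1])[:top]]

# A made-up two-era corpus echoing the examples in the talk.
corpus = {
    1900: ["all is vexation of spirit", "oxide of lead was found"],
    2000: ["the power elite endures", "the poor countries suffer"],
}
print(distinctive_trigrams(corpus, 2000))
```

On a real corpus you would also normalize by the number of words per year; the n-gram data Google later released to researchers comes pre-aggregated, roughly as (n-gram, year, count) rows.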
The images of all the world’s books are about 100 petabytes. Put terminals in libraries so anyone can access out-of-print books, and let patrons print on demand. “Does that have an impact on collections” and budgets? Once that makes economic sense, then every library will “have” every single book.
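The 100-petabyte figure is easy to sanity-check with round numbers. All three inputs below are my own guesses for illustration, not numbers from the talk:

```python
# Back-of-envelope check of the ~100 PB figure for all book images.
books = 120e6         # distinct works, order of magnitude from the talk
pages_per_book = 330  # guessed rough average
mb_per_page = 2.5     # guessed size of one high-resolution page scan

total_pb = books * pages_per_book * mb_per_page / 1e9  # MB -> PB
print(f"~{total_pb:.0f} PB")  # lands in the ~100 PB ballpark
```

The point is only that plausible per-page scan sizes put the whole corpus within reach of a single machine room, which is what makes the "every library has every book" scenario thinkable.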
How can we design a library for serendipity? The fact that books look different is appealing, Jon says. Maybe a library should buy lots and lots of different e-readers, in different form factors. The library could display info-rich electronic spines (graphics of spines) [Jon doesn't know that this is an idea the Harvard Law Library, with whom I'm working, is working on]. We could each have our own virtual rooms and bookshelves, with books that come through various analytics, including books that people I trust are reading. We could also generalize this by having the bookshelves change if more than one person is in the room; maybe the topics get broader to find shared interests. We could have bookshelves for a community in general. Analytics of multifactor classification (subject, tone, bias, scholarliness, etc.) can increase “deep” serendipity.
Q: One of the concerns in the research and univ libraries is the ability to return to the evidence you’ve cited. Having many manifestations (= editions, etc.) lets scholars return. We need permanent ways of getting back to evidence at a particular time. E.g., Census Dept. makes corrections, which means people who ran analyses of the data get different answers afterward.
A: The glib answer: You just need better citation mechanisms. The more sophisticated answer: Anglo-Saxon scholars will hold up a palimpsest. I don’t have an answer, except for a pointer to George Mason conf where they’re trying to come up with a protocol for expressing uncertainty [I think I missed this point -- dw]. What are all the ways to point into a work? You want to think of the work as a container, with all the annotations that come up with it. The ideal container has the text itself, info extracted from it, the programs needed to do the extraction, and the annotations. This raises the issue of the persistence of digital media in general. “We need to get into the mindset of bundling it all together”: PDFs and TIFFs + the programs for reading them. [But don't the programs depend upon operating systems? - dw]
Q: Centralized vs. distributed repository models?
A: It gets into questions of rights. I’d love to see it as distributed to as many places and in as many formats as possible. It shouldn’t just be Google digitizing books. You can get 100 petabytes in a single room, and of course much smaller in the future. There are advantages to keeping things local. But for the in-copyright works, it’ll come down to how comfortable the holders feel that it’s “too annoying” for people to copy what they shouldn’t.
Charlie Leadbeater has a terrific post on the threats posed by the fact that The Cloud (as in “cloud computing”) too often actually is a recentralizing of the Net by profit-seeking companies.
The easiest example cited by Charlie is Google Books, which provides a tremendous service but at the social cost of giving a single company control over America’s digital library. The problem here isn’t capitalism but monopolization; an open market in which other organizations could (the pragmatic “could,” not the legal or science fiction “could”) also offer access to scanned libraries would create a cloud of books not solely controlled by any single company. (The Google Books settlement threatens to rule out competition because without an equivalent agreement with publishers and authors, any other organization that scans and provides access to books runs the strong risk of being sued for copyright infringement, especially when it comes to books whose copyright holders are hard to find. The revision of the Settlement is less egregiously monopolistic.)
Here is a letter Lewis Hyde sent to Judge Denny Chin, who is considering the proposed Google Books settlement. I’ve also appended a supporting letter written by Eric Saltzman. The issue is that the newly-proposed trustee overseeing the handling of “orphaned works” (i.e., works that are still in copyright but whose copyright holders cannot be found) still does not have the power to adequately represent the interests of the rights holders, especially when it comes to allowing companies that are not Google to license the works. Granting Google a monopoly on these works seems like too much of a reward for Google’s scanning of them (which I’ve heard costs about $30/book), and does not seem to serve the interests of the rights holders or — more important, from my point of view — the overall social good of increasing access to these works. (Note: I am not a lawyer.)
So, here are the letters, minus some addresses, etc.:
27 January 2010
Dear Judge Chin:
I write to amend the letter of objection that I wrote last August in regard to The Authors Guild, Inc., et al. v. Google Inc. (Case No. 1:05-cv-08136-DC). My August letter is on file with your office as Document 480.
I shall here limit my remarks to provisions of the amended settlement that are changed from the original settlement, specifically to the role of the newly proposed trustee for orphan works.
I object to the fact that, despite the amended settlement’s creation of an Unclaimed Works Fiduciary (UWF), the monopoly powers that Google and the Books Rights Registry will acquire, should the Court approve the orphan works elements of the settlement, still stand. The settling parties have limited the role of the UWF such that he may discharge some duties of the registry in some circumstances, but little else. He cannot act fully on behalf of the rightsholders of unclaimed books; he cannot, for example, license their work to third parties.
To put this another way, it is still the case that an approved settlement will in essence grant the settling parties unique compulsory licenses for the exploitation of orphan works. But why make such licenses unique? If the Court and the settling parties believe that they can authorize compulsory licenses of any sort, why not go the extra step and grant such licenses broadly so that competing providers can enter this market?
To address the problem of monopoly in the market for digital books the UWF should be empowered to act as a true trustee. As such, he should make every effort to locate lost owners, communicate to them their rights under the approved settlement, and pay them their due. Absent their instructions to the contrary, he should deliver the works of lost owners to the public through the efficiencies of a fully competitive market.
As Chief Justice Rehnquist has written in regard to the larger purposes of our copyright laws: “We have often recognized the monopoly privileges that Congress has authorized … are limited in nature and must ultimately serve the public good…” (Fogerty v. Fantasy, Inc., 510 U.S. 517 (1994)). In regard to both content owners and the public, then, the fiduciary needs to operate in an open economy of knowledge and, for that, he will need the freedom to license work to other actors.
(Note: I have asked my attorney, Eric Saltzman, to separately address the question of the UWF’s authority to license orphaned works to others; please see the attached addendum to this letter.)
Richard L. Thomas Professor of English
Eric F. Saltzman
Re: The Authors Guild, Inc., et al. v. Google Inc. (Case No. 1:05-cv-08136-DC).
Dear Judge Chin:
My client, Lewis Hyde, tells the Court in his letter of January 27th that the new proposed settlement cannot be fair to the owners of the copyrights in the orphan works and to the public unless it allows the Unclaimed Works Fiduciary to make licenses to other providers to allow competition with the monopoly plan that Google and the Plaintiffs now propose to the Court.
I would like to offer the Court additional support for Professor Hyde’s objection and suggestion.
If the named plaintiffs or others who “opt in” to the settlement wish to sign on to it with their own copyrights (and if it survives any antitrust process), then that shall be their prerogative. However, the combination in this class action lawsuit of inadequate representation and significant actual conflicts among the so-called class should make the Court skeptical of granting a monopolistic license of the absent members’ copyrights.
If the Court does decide to approve a settlement of the case, it should not approve one where Plaintiff’s counsel have consented to deliver the licenses for the orphan works to just one licensee.
It would be a complete fiction to say that Plaintiffs’ attorneys have adequately represented the orphan works authors and their successors in interest in this case. The original settlement proposal clearly demonstrated counsel’s willingness and ability to compromise or, at least, to ignore the orphan works owners’ interests in favor of the named plaintiffs who engaged them and whose assent they needed to cut the deal.
The problem of plaintiff counsel shaping a settlement attractive to the clients before them at the expense of absent class members is a well-discussed problem in class action jurisprudence. This Court may take notice of an incentive in that direction, the more than fifty million dollars of fees that Google has agreed to pay to Plaintiffs’ counsel if the settlement goes through.
Allow me to point out two methods whereby the proposed settlements seriously shortchanged the orphan works owners to enrich other class members at their expense.
The proposed settlement provides that “Google will make a Cash Payment of at least $60 per Principal Work, $15 per Entire Insert and $5 per Partial Insert for which at least one Rightsholder has registered a valid claim by the opt-out deadline” (Emphasis supplied). According to the settlement, total payments will amount to $45 million.
By definition, no orphan work Rightsholders could meet this registration condition. Thus was the settlement engineered so that the rightsholders of orphan works and their successors-in-interest would not and could not get any share of the up-front payments total.
Evidently, in dividing up the scores of millions of dollars that defendant Google was ultimately willing to pay up-front (i.e., unrelated to yet unproven forthcoming revenues) to settle the lawsuit, counsel felt no obligation to share any of it with the orphan works owners, even if the rightsholder should later appear and wish to register and claim that payment. This very large slice of the pie would go only to the known rightsholders, their de facto clients.
This economic discrimination against the orphan works rightsholders went beyond just up-front payments. It also took unclaimed (after five years) revenues from exploitation of the orphan works and assigned them to the known rightsholders of other books, thus promising still further enrichment of the client sub-class with actual control over the settlement.
That particular feature drew such unpleasant attention to the bias in representation in favor of the known rightsholders (and disfavoring the orphan works rightsholders) that it was written out of the settlement proposal now before the Court. Nevertheless, the Plaintiffs’ counsel who now urge the court to approve this revised settlement agreement are the same counsel who, in the first settlement go-around, assured the Court then (as they do now) that they had adequately represented the entire class, including the orphan works rightsholders.
Commonality and adequacy of representation are two touchstones for class certification. “The adequacy inquiry under Rule 23 (a) (4) serves to uncover conflicts of interest between named parties and the class they seek to represent.” Amchem Prods. v. Windsor, 521 U.S. 591 at 625 (1997).
In Amchem, the Supreme Court upheld the Third Circuit Court’s decertification of the class because it found that “…the settling parties achieved a global compromise with no structural assurance of fair and adequate representation for the diverse groups and individuals affected. The Third Circuit found no assurance here that the named parties operated under a proper understanding of their representational responsibilities. That assessment is on the mark.” Id at 595.
As demonstrated above, much less than promising the “structural assurance of fair and adequate representation for the diverse groups and individuals affected”, the settlements that were and are proposed to this Court suggest that advantaging the named class members at the expense of the unrepresented orphan works rightsholders was a goal successfully achieved during the settlement negotiation.
Accordingly, if the Court will entertain a settlement, it should itself take on the burden of making sure that the orphan works rightsholders’ interests are well protected. At this point, the best way to do so is to free the orphan works from the monopoly straitjacket that the proposed settlement forces on them.
Let the parties live with the deal they made for the parties who were, in fact, adequately and aggressively represented. For the inadequately represented sub-class, the orphan works rightsholders, the Court should empower the UWF (or similar fiduciary) to license their works into the open market. With this authority going forward, the UWF will, as well, be able to adjust licensing of digital rights in these works to the market conditions in an area that is still very new and sure to develop in ways that are, today, impossible to predict.
Professor Hyde’s objection addresses the two enormous flaws in the proposed settlement: 1. the actual conflicts within the class together with the failure of adequate representation of the orphan works rightsholders, and 2. the anti-competitive effect of the full copyright term license it would grant to Google only. The first undermines both the process by which the settlement was achieved and, correspondingly, the public confidence in the courts. The second hurts both the orphan works rightsholders and the strong public interest in access to the knowledge and creativity these books offer.
Short of initiating a new attempt at settlement — with new counsel for the orphan works rightsholders — the changes Professor Hyde proposes would achieve a result that would be fair for all the parties and for the public.
Here’s a summary of the summary Google provides [pdf], although IANAL and I encourage you to read the summary, which is written in non-legal language and is only 2 pages long:
1. The agreement now has been narrowed to books registered for copyright in the US, or published in the UK, Australia or Canada.
2. There have been changes to the terms of how “orphaned works” (books under copyright whose rightsholders can’t be found) are handled. The revenue generated by selling orphaned works no longer will get divvied up among the authors, publishers and Google, none of whom actually have any right to that money. Instead it will go to fund active searching for the rightsholders. (At the press call covered by Danny Sullivan [see below], the Authors Guild rep said that with money, about 90% of missing rightsholders can be found.) After holding those revenues in escrow (maybe I’m using the wrong legal term) for ten years (up from five in the first settlement), the Book Rights Registry established by the settlement can ask the court to disburse the funds to “nonprofits benefiting rightsholders and the reading public”; I believe in the original, the Registry decided who got the money. So, in ten years there may be a windfall for public libraries, literacy programs, and maybe even competing digital libraries. (The Registry may also (determined by what?) give the money to states under abandoned property laws. (No, I don’t understand that either.))
The new settlement creates a new entity: A “Court-approved fiduciary” who represents the rightsholders who can’t be found. (James Grimmelmann [below] speculates interestingly on what that might mean.)
3. The settlement now explicitly states that any book retailer can sell online access to the out-of-print books Google has scanned, including orphaned works. The revenue split will be the same (63% to the rightsholder, “the majority of” 37% to the retailer).
4. The settlement clarifies that the Registry can decide to let public libraries have more than a pitiful single terminal for public access to the scanned books. The new agreement also explicitly acknowledges that rightsholders can maintain their Creative Commons licenses for books in the collection, so you could buy digital access and be given the right to re-use much or all of the book. Rightsholders also get more control over how much Google can display of their books without requiring a license.
5. The initial version said Google would establish “market prices” for out-of-print books, which seemed vague: what counts as the market for out-of-print books? The new agreement clarifies the algorithm, aiming to price them as if in a competitive market. And, quite importantly, the new agreement removes the egregious “most favored nation” clause that prevented more competitive deals from being made with other potential book digitizers.
From my non-legal point of view, this addresses many of the issues. But not all of them.
I’m particularly happy about the elements that increase competition and access. It’s big that Amazon and others will be able to sell access to the out-of-print books Google has scanned, and sell access on the same terms as Google. As I understand it, there won’t be price competition, because prices will be set by the Registry. Further, I’m not sure if retailers will be allowed to cut their margins and compete on price: If the Registry prices an out-of-print book at $10, which means that $6.30 goes to the escrow account, will Amazon be allowed to sell it to customers for, say $8, reducing its profit margin? If so, then how long before some public-spirited entity decides to sell these books to the public at their cost, eschewing entirely the $3.70 (or the majority of that split, which is what they’re entitled to)? I don’t know.
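To make the arithmetic above concrete, here is a tiny sketch of the list-price split (my own illustration, based only on the percentages reported in the settlement summary, not on the settlement text itself):

```python
def split_revenue(list_price, rightsholder_share=0.63):
    """Split a sale at the Registry's list price per the reported terms:
    63% to the rightsholder's escrow, the rest to the selling side
    (the retailer reportedly keeps "the majority of" that 37%)."""
    to_rightsholder = round(list_price * rightsholder_share, 2)
    to_seller = round(list_price - to_rightsholder, 2)
    return to_rightsholder, to_seller

print(split_revenue(10.00))  # a $10 book: 6.30 escrowed, 3.70 to the seller
```

Note that this models only the split of the list price; whether a retailer may discount out of its own share, as in the $8 scenario above, is exactly the open question.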
I also like the inclusion of Creative Commons licensing. That’s a big deal since it will let authors both sell their books and loosen up the rights of reuse.
As far as getting rid of the most favored nation clause: Once the Dept. of Justice spoke up, it’s hard to imagine it could have survived more than a single meeting at Google HQ.
Reactions from the critics have not been all that positive.
The Open Book Alliance (basically an everyone-but-Google consortium) is not even a little amused, because the new agreement doesn’t do enough to keep Google from establishing a de facto monopoly over digital books. The Electronic Frontier Foundation is not satisfied because no reader privacy protections were added. Says the ACLU: “No Settlement should be approved that allows reading records to be disclosed without a properly-issued warrant from law enforcement and court orders from third parties.”
Danny Sullivan live-blogged the press call where Google and the other parties to the settlement discussed the changes. It includes a response to Open Book Alliance’s charges.
Harry Lewis has a terrific post about a $300 do-it-yourself book scanner he saw at the D is for Digitize conference on the Google Book settlement. The plans are available at DIYBookScanner.org, from Daniel Reetz, the inventor.
There are lots of personal uses for home-digitized books, so (I am definitely not a lawyer) I assume it’s legal to scan in your own books. But doesn’t that just seem silly if your friend or classmate has gone to the trouble of scanning in a book that you already own? Shouldn’t there be a site where we can note which books we’ve scanned in? Then, if we can prove that we’ve bought a book, why shouldn’t we be able to scarf up a copy another legitimate book owner has scanned in, instead of wasting all the time and pixels scanning in our own copy?
Isn’t Amazon among the places that: (a) knows for sure that we’ve bought a book, (b) has the facility to let users upload material such as scans, and (c) could let users get an as-is scan from a DIY-er if there is one available for the books they just bought?
Viktor Mayer-Schönberger is giving a talk at the Berkman Center (well, actually at Pound Hall) on his book Delete: The Virtue of Forgetting in the Digital Age. Viktor teaches at the National University of Singapore, and was at the Kennedy School for ten years.
NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.
He begins with a story of a person studying to become a teacher who was kicked out of school because the school noticed a photo of her drinking on Facebook. She tried deleting it, but the Internet remembered it. He gives another example: a person who noted in an article that he had taken LSD in the 1960s. When he later tried to cross into the US, an immigration officer refused him entry because he hadn’t offered up that information; the officer had uncovered it by googling him. What’s put on the Web is never forgotten. In another example, the information was not put up by the individual but by someone else: a bar/club in Europe records all the people, all the drinks, etc., and hasn’t ever deleted any information. Likewise, Google knows more about us than we can remember.
For millennia, forgetting was easy, and remembering was hard, says Viktor. So, we’ve come up with ways to pass on our memories. The oral tradition. Painting. Writing. “But these tools have not altered the fundamental fact that for us humans, forgetting is easy, and remembering is time-consuming and expensive.” The book and the photo also haven’t altered this fact. What is long past fades in our mind. We depreciate what is no longer relevant. But because forgetting is biological, we never had to develop explicit strategies to forget. Now we’ve moved from biologically forgetting to permanent remembering. [Hmm. I haven't. We still don't remember much. But we have more records, and thus are able to retrieve more. That seems different to me.]
This has happened because storage is cheap in the digital world. Google has server farms with perhaps 100,000 terabytes of capacity. And we’ve gotten much better at retrieving information. And we have global access. Remembering has become the default.
There are, of course, benefits to this, Viktor says. But undoing forgetting has deep consequences, far beyond the information efficiencies. He points to power and time.
Power: If others have info about us and can keep that info accessible for a very long time, the informational power increases, and can affect how we transact and interact. It’s Bentham’s Panopticon: behavioral compliance through the permanent threat of constant surveillance.
Time: Imagine Jane is about to catch up with her old friend John, but when reviewing their history of email, she discovers msgs from a time when he was nasty to her. She had forgotten that time. Now it comes back, and her relationship with John is ruined. [Or, she discovers msgs that remind her she once loved him. Isn't Viktor's example actually an argument for more remembering, so she can see how she got over the bad time?] “In analog times, the dangers were limited” because our biology would have brought us to forget.
Viktor talks about AJ, a non-fictional woman who has difficulty forgetting. It is a weird and unhappy condition. [This is why the conflation of human remembering and the presence of a fairly complete digital record matters. The presence of digital info and the tools for retrieving it does not turn us into AJ.]
Without forgetting, we have trouble changing. We have trouble forgiving. We may turn into an unforgiving society. “This is the real danger of shifting the default from forgetting to remembering.” Worse, suppose we stop relying on our own memories and rely instead on the digital memories. “Does that give those who control digital memory the power over history?”
What to do? Perhaps give privacy rights to individuals. But there are weaknesses: it’s not politically feasible in the US, and while Europeans have those rights, people have not used them.
Or perhaps we could create an information ecology, a regulatory construction of what can be remembered. E.g., it might require the deletion of info after a particular time. This does not require individuals to go to court for enforcement, and it protects against an unforeseen future as when the benign Dutch social services registry was repurposed by the Nazis to identify Jews. “It may be better to store less than more.” But, after 9/11, we’re seeing requirements for increasing data retention, Viktor notes.
So, maybe we need to augment these approaches. “Digital abstinence,” for example. Don’t put everything on Facebook. But abstinence isn’t all that reasonable, he says. By the end of 2007, two out of three young Americans had put their info online.
The opposite approach is “full contextualization.” E.g., Jane can’t find the context of her bad treatment by John. Full contextualization would restore that. But will that ever be technically feasible? And if it were, would it really address the challenge of digital remembering? Do we have time to relive our past again and again?
Another approach: Hope for a cognitive adjustment. That is, over time we’ll learn to devalue older info and learn to live with an omnipresent past. “That would solve our problem. But is it likely?” How long would it take us to change how we assess information? “Cognitive psychologists are very critical of our ability to change our decision making in the short run.” [But a change in norms can happen much faster than that, and we govern what we're allowed to notice and remember through norms. Statements like "That's water under the bridge" and "Youthful indiscretions" are expressions of norms that enforce social forgetting without requiring actual brain evolution.]
Or, we could change our technology, rather than changing ourselves. E.g., a global DRM system to protect privacy. Viktor is not recommending this: “Wouldn’t this be a perfect surveillance system?” And we’d have to make sure that privacy is built deep into the infrastructure.
None of these six solutions are sufficient, although all offer something.
“I advocate a revival of forgetting…to establish a mechanism that makes forgetting easy, and makes remembering just a bit more strenuous.” Just enough to shift the incentives back to what we humans are used to. Viktor suggests an expiry date for information. Whenever we save info, we should be prompted to put in a date when we want it deleted. We should be able to change those dates.
The core of this proposal isn’t the automatic deletion, he says. Rather, the prompting for the date will remind us humans that most information is not of permanent value.
E.g., search engines could offer us an easy way to say how long we should remember searches. Or people could carry a device on their keyring to set expiration dates, perhaps tagging the expiration dates for the images of the people in digital photos.
Any expiry-date system must have only two characteristics. First, it must aim at changing the default from remembering back to forgetting. Second, it must remind us of information’s temporal nature.
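Viktor’s proposal is conceptually simple, and a toy sketch makes the two characteristics concrete. The following Python is purely illustrative (the `ExpiringStore` class and its methods are hypothetical names, not any real system Viktor describes): saving always prompts for a lifetime, dates can be revised, and expired entries behave as if forgotten.

```python
import time

class ExpiringStore:
    """Toy key-value store where every record carries a user-chosen expiry date."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at_epoch_seconds)

    def put(self, key, value, ttl_seconds):
        # Saving always requires choosing a lifetime: forgetting is the default.
        self._data[key] = (value, time.time() + ttl_seconds)

    def extend(self, key, extra_seconds):
        # Expiry dates can be changed if the information stays relevant.
        value, expires = self._data[key]
        self._data[key] = (value, expires + extra_seconds)

    def get(self, key):
        # Reading an expired entry behaves as if the record were never kept.
        record = self._data.get(key)
        if record is None:
            return None
        value, expires = record
        if time.time() >= expires:
            del self._data[key]
            return None
        return value

store = ExpiringStore()
store.put("search:2009-11-10", "query terms", ttl_seconds=0.05)
print(store.get("search:2009-11-10"))  # prints "query terms": still remembered
time.sleep(0.1)
print(store.get("search:2009-11-10"))  # prints None: forgotten
```

The point of the sketch is Viktor’s second characteristic: the `ttl_seconds` argument is mandatory, so the act of saving itself reminds the user that most information is not of permanent value.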
Expiry dates are also no silver bullet, and don’t solve digital privacy problems, Viktor says. But they could be useful when used with some of the other proposed solutions.
“Forgetting is often forgotten…Let us remember to forget.”
Q: You don’t mention the propensity of all media to fade over time. Digital memory is not perfect. Also, data is growing so quickly that it gets too expensive to digitally remember everything. The amount of data is growing faster than Moore’s Law.
A: You don’t need much space to remember a billion queries a day. A couple of hundred dollars worth of data storage. And Google’s way of saving data is relatively future-proof.
Q: [me] If we take memory to mean only the human capacity, and digital “memory” to be more like what we usually call storage, then what has actually happened to human memory in the digital age?
A: I chose the term “digital memory” carefully. If I can’t access my VCR tapes easily, they’re pretty much useless to me. Digital stuff is so easily accessible. How has digital remembering changed human remembering? I don’t know. But my argument isn’t that it’s changed human remembering, but that it has changed the external stimuli affecting our memory.
Q: One of the ways a culture forgets is that it lets books go out of print, get moved out of libraries, etc. Now we have Google Books, which will make all books ever printed available (pretty much). Do you see negative effects of this project?
A: I haven’t given it enough thought. Authors would like to set their books’ expiry dates very far in the future. Some preliminary research we’re doing on court decisions is showing an interesting effect on memory.
Q: The author of the book isn’t the only one concerned with the info in it. There may be people written about who would want a say…
A: Yes, and the author’s rights aren’t always fully owned by them.
Q: Digital memory has value as cultural memory. The things we’d put expiration dates on have value even if against the interests of the people at the time, because it has social and historic meaning…
A: That’s just conjecture…
Q: No it’s not. We’re leaving traces now all the time. How we put that info to use is a different question.
A: Suppose you’re an author. Shouldn’t you be able to put bad early stories into the trash bin? Why should society have the right to take it from you and preserve it and make it public?
Q: Great point, but we still do struggle with this. Nonetheless, I would recommend we give thought to how these things might sensibly be balanced. E.g., the Iran election twitter stream. Enormous amt of fascinating info has been lost.
A: The solution is built in. For certain contexts, we may be required to mandate a very long expiry date. We do that all the time. I’m arguing for keeping that as the exception to the rule.
Q: I’m a cultural historian, trained as a Medievalist. There’s data scarcity in that field. Who decides about inclusion, preservation, etc.? Institutions have performed the filtering role. Google keeps some types of info and not others. Others are interested in your social security number, etc. So, who are the gatekeepers? There’s power to the Internet Archive’s approach of capturing everything. The stuff that the institutions of memory don’t preserve may turn out to be the most interesting for historians. (I basically buy your core argument, although I’m a believer in the cognitive adjustment.)
A: Brewster Kahle (of the Internet Archive) and I are in general agreement. The Archive sets expiry dates. [Not sure I got that right. Sorry.] My core argument is to give the choice back to the individuals.
Q: I too believe in the cognitive adjustment because I see myself and others already doing that. Sure, you find old emails reminding of something you wanted to forget, but when you accidentally delete some years’ worth, you feel an intense sense of loss.
A: When I lost all my email at the end of 1998, I was completely horrified. But then I discovered it doesn’t really matter. I started out believing the cognitive adjustment argument, but after I read cognitive science books, I changed my mind. I want to plug The Seven Sins of Memory, which shows how hard it is to readjust.
Q: Suppose two of us in a shared record have different expiry preferences…
A: I talk about that a lot in the book.
Q: There’s a big difference between what I want to preserve and what others do. The European privacy laws require data deletion. Google and others are now negotiating with the European Commission about this…
A: We need to differentiate between privacy rights and norms.
[missed a couple of questions. sorry.]
Viktor says that he recognizes that expiry dates are a crude instrument. Too binary. “I’d prefer rusting or something like that.” :)
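One way to read Viktor’s “rusting” quip: instead of a record vanishing at a hard expiry date, its weight in retrieval could decay gradually with age. A hypothetical sketch (the half-life model and the `rusted_weight` function are my illustration, not anything Viktor proposed concretely):

```python
def rusted_weight(age_days, half_life_days=365.0):
    """Relevance weight that halves every half_life_days, so old records
    fade gradually ("rust") instead of vanishing at a binary expiry date."""
    return 0.5 ** (age_days / half_life_days)

# A year-old record counts half as much as a fresh one; a decade-old
# record is nearly, but never entirely, forgotten.
print(round(rusted_weight(0), 3))     # prints 1.0
print(round(rusted_weight(365), 3))   # prints 0.5
print(round(rusted_weight(3650), 4))  # prints 0.001
```

Such a weight could multiply a search-ranking score, which would shift the default toward forgetting without the all-or-nothing character Viktor calls "too binary."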