Amanda Filipacchi has a great post at the New York Times about the problem with classifying American female novelists as American female novelists. That’s been going on at Wikipedia, with the result that the category American novelist was becoming filled predominantly with male novelists.
Part of this is undoubtedly due to the dumb sexism that thinks that “normal” novelists are men, and thus women novelists need to be called out. And even if the category male novelist starts being used, it still assumes that gender is a primary way of dividing up novelists, once you’ve segregated them by nation. Amanda makes both points.
From my point of view, the problem is inherent in hierarchical taxonomies. They require making decisions not only about the useful ways of slicing up the world, but also about which slices come first. These cuts reflect cultural and political values and have cultural and political consequences. They also get in the way of people who are searching with a different way of organizing the topic in mind. In a case like this, it’d be far better to attach tags to Wikipedia articles so that people can search using whatever parameters they need. That way we get better searchability, and Wikipedia hasn’t put itself in the impossible position of coming up with a taxonomy that is neutral to all points of view.
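The difference between a hierarchy and tags can be made concrete with a toy sketch. In the snippet below the article titles and tags are invented for illustration; the point is only that with tag sets, no facet (nation, gender, century) has to come "first" — readers intersect whatever facets matter to them:

```python
# Toy catalog: each article carries a flat set of tags.
# Titles and tags here are illustrative, not real Wikipedia data.
articles = {
    "Beloved":     {"novel", "american", "female", "20th-century"},
    "Moby-Dick":   {"novel", "american", "male", "19th-century"},
    "Middlemarch": {"novel", "british", "female", "19th-century"},
}

def search(required_tags):
    """Return titles whose tag sets contain every requested tag."""
    return sorted(title for title, tags in articles.items()
                  if required_tags <= tags)  # subset test

print(search({"american"}))            # ['Beloved', 'Moby-Dick']
print(search({"american", "female"}))  # ['Beloved']
```

No tag is subordinate to any other, so "American novelists" and "female novelists" are just two filters that can be combined in either order, which is exactly what a fixed hierarchy cannot offer.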
Wikipedia’s categories have been broken for a long time. We know this in the Library Innovation Lab because a couple of years ago we tried to find every article in Wikipedia that is about a book. In theory, you can just click on the “Book” category. In practice, the membership is not comprehensive. The categories are inconsistent and incomplete. It’s just a mess.
It may be that a massive crowd cannot develop a coherent taxonomy because of the differences in how people think about things. Maybe the crowd isn’t massive enough. Or maybe the process just needs far more guidance and regulation. But even if the crowd can bring order to the taxonomy, I don’t believe it can bring neutrality, because taxonomies are inherently political.
There are problems with letting people tag Wikipedia articles. Spam, for example. And without constraints, people can lard up an object with tags that are meaningful only to them, offensive, or wrong. But there are also social mechanisms for dealing with that. And we’ve been trained by the Web to lower our expectations about the precision and recall afforded by tags, whereas our expectations are high for taxonomies.
Ars Technica has a post about Wikidata, a proposed new project from the folks that brought you Wikipedia. From the project’s introductory page:
Many Wikipedia articles contain facts and connections to other articles that are not easily understood by a computer, like the population of a country or the place of birth of an actor. In Wikidata you will be able to enter that information in a way that makes it processable by the computer. This means that the machine can provide it in different languages, use it to create overviews of such data, like lists or charts, or answer questions that can hardly be answered automatically today.
Because I had some questions not addressed in the Wikidata pages that I saw, I went onto the Wikidata IRC chat (http://webchat.freenode.net/?channels=#wikimedia-wikidata) where Denny_WMDE answered some questions for me.

[11:29] &lt;me&gt; hi. I’m very interested in wikidata and am trying to write a brief blog post, and have a n00b question.
[11:29] &lt;Denny_WMDE&gt; go ahead!
[11:30] &lt;me&gt; When there’s disagreement about a fact, will there be a discussion page where the differences can be worked through in public?
[11:30] &lt;Denny_WMDE&gt; two-fold answer
[11:30] &lt;Denny_WMDE&gt; 1. there will be a discussion page, yes
[11:31] &lt;Denny_WMDE&gt; 2. every fact can always have references accompanying it. so it is not about “does berlin really have 3.5 mio people” but about “does source X say that berlin has 3.5 mio people”
[11:31] &lt;Denny_WMDE&gt; wikidata is not about truth
[11:31] &lt;Denny_WMDE&gt; but about referenceable facts
When I asked which fact would make it into an article’s info box when the facts are contested, Denny_WMDE replied that they’re working on this, and will post a proposal for discussion.
So, on the one hand, Wikidata is further commoditizing facts: making them easier and thus less expensive to find and “consume.” Historically, this is a good thing. Literacy did this. Tables of logarithms did it. Almanacs did it. Wikipedia has commoditized a level of knowledge one up from facts. Now Wikidata is doing it for facts in a way that not only will make them easy to look up, but will enable them to serve as data in computational quests, such as finding every city with a population of at least 100,000 that has an average temperature below 60F.
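The "computational quest" above becomes trivial once facts are data rather than prose. A minimal sketch, with invented city names and placeholder numbers standing in for machine-readable facts:

```python
# Toy records standing in for machine-readable facts. The names,
# populations, and temperatures are placeholders, not real data.
cities = [
    {"name": "Alphaville", "population": 250_000,   "avg_temp_f": 52.0},
    {"name": "Betatown",   "population": 80_000,    "avg_temp_f": 48.0},
    {"name": "Gammaport",  "population": 1_200_000, "avg_temp_f": 71.0},
]

# "Every city with a population of at least 100,000 that has an average
# temperature below 60F" is just a filter over structured facts.
matches = [c["name"] for c in cities
           if c["population"] >= 100_000 and c["avg_temp_f"] < 60]
print(matches)  # ['Alphaville']
```

Against an encyclopedia article, answering this question means a human reading prose; against a fact base, it is one line of query logic.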
On the other hand, because Wikidata is doing this commoditizing in a networked space, its facts are themselves links — “referenceable facts” are both facts that can be referenced, and simultaneously facts that come with links to their own references. This is what Too Big to Know calls “networked facts.” Those references serve at least three purposes: 1. They let us judge the reliability of the fact. 2. They give us a pointer out into the endless web of facts and references. 3. They remind us that facts are not where the human responsibility for truth ends.
Categories: too big to know
Tagged with: 2b2k • big data
Date: March 31st, 2012 dw
In the straight-up match between paper and Web, the Encyclopedia Britannica lost. This was as close to a sure thing as we get outside of the realm of macro physics and Meryl Streep movies.
The EB couldn’t cover enough: 65,000 topics compared to the almost 4M in the English version of Wikipedia.
Topics had to be consistently shrunk or discarded to make room for new information. E.g., the 1911 entry on Oliver Goldsmith was written by no less than Thomas Macaulay, but with each edition, it got shorter and shorter. EB was thus in the business of throwing out knowledge as much as it was in the business of adding knowledge.
Topics were confined to rectangles of text. This is of course often a helpful way of dividing up the world, but it is also essentially false. The “see also”s and the attempts at synthetic indexes and outlines (the Propædia) helped, but they were still highly limited and cumbersome to use.
All the links were broken.
It was expensive to purchase.
If you or your family did not purchase it, using it required a trip to another part of town where it was available only during library hours.
It was very slow to update — 15 editions since 1768 — even with its “continuous revision” policy.
Purchasers were stuck with an artifact that continuously became wronger.
Purchasers were stuck with an artifact that continuously became less current.
It chose topics based on agendas and discussions that were not made public.
You could not see the process by which articles were written and revised, much less the reasoning behind those edits.
It was opaque about changes and errors.
There were no ways for readers to contribute or participate. For example, underlining in it or even correcting errors via marginalia would get you thrown out of the library. It thus crippled the growth of knowledge through social and networked means.
It was copyrighted, making it difficult for its content to be used maximally.
Every one of the above is directly or indirectly a consequence of the fact that the EB was a paper product.
Paper doesn’t scale.
Paper-based knowledge can’t scale.
The Net scales.
The Net scales knowledge.
I should probably say something nice about the Britannica:
Extremely smart, very dedicated people worked on it.
It provided a socially safe base for certain sorts of knowledge claims.
Owning it signaled that one cared about knowledge, and it’s good for our culture for us to be signaling that sort of thing.
The inestimably smart and wise Matthew Battles has an excellent post on the topic (which I hesitate to recommend only because he refers to “Too Big to Know” overly generously).
Categories: too big to know
Tagged with: 2b2k
Date: March 20th, 2012 dw
Jimmy Wales has proposed that Wikipedia might black out its English-language pages for a short period to register opposition to the proposed SOPA legislation, which would allow the US government to shut down access to sites that provide access to material that infringes copyright. These shutdowns would occur without the need for any judicial procedure, without notice, and without appeal.
I think Jimmy’s idea is great and that all sites that could be affected by SOPA — which is to say any site — ought to join in. Just name the date and time, and many of us would turn out our sites’ lights.
[Minutes later: Through a failure in my command of in-page searching, I missed Cory Doctorow's proposing exactly this on BoingBoing. Go Jimmy! Go Cory!]
(Here’s Rebecca MacKinnon’s op-ed on SOPA and its Senate version, which together would constitute a Great Firewall of America, as she says. [A couple of hours later: Rebecca and Ivan Sigal just posted a terrific op-ed on the topic at CNN.com.])
Tagged with: jimbo wales • jimmy wales
Date: December 14th, 2011 dw
My podcast interview of Yochai Benkler about his excellent new book, The Penguin and the Leviathan, has been posted. Yochai brilliantly (of course) makes a case that shouldn’t need making, but that in fact very much does need to be made: that we are collaborative, social, cooperative creatures. Your unselfish genes will thoroughly enjoy this book.
And, Joseph Reagle has promulgated the following email about his excellent, insightful book that explores the subtleties of the social structures that enable Wikipedia to accomplish its goal of being a great encyclopedia:
I’m pleased to announce that the Web/CC edition of *Good Faith Collaboration* is now available. In addition to all of the book’s complete content, hypertextual goodness, and fixed errata, there is a new preface discussing some of the particulars of this edition.
In what is certainly the coolest proof that philosophy is the Queen of the Sciences, and also the Duke and all of the in-laws, xefer shows the relation of any Wikipedia topic to its “Philosophy” article. (Hat tip to Hal Roberts.)
Tagged with: kevin bacon
Date: June 8th, 2011 dw
I just shared a cab with James Bridle, a UK publisher and digital activist (my designation, not his) who is the brilliance behind the printing out of the changes to the Wikipedia article on the Iraq War. It turns out that those changes — just the changed portions — fill up twelve volumes.
What does the project show? “The argument,” James says. Of course it also shows the power of the cognitive surplus: we just casually created twelve volumes of changes in our spare time. If only all users of Wikipedia understood how it’s put together! (Rather than banning students from using Wikipedia, it’d be far better if teachers required students to click on the “Discussion” tab.)
Categories: too big to know
Tagged with: 2b2k • cognitive surplus
Date: March 27th, 2011 dw
Notabilia has visualized the hundred longest discussion threads at Wikipedia that resulted in the deletion of an article and the hundred that did not. The visualized threads take on shapes depending on whether the discussion was controversial, swinging, or unanimous. For those whose brains can process visualized information (as mine cannot), you will undoubtedly learn much. For the rest of us: Oooooh, pretty!
They’ve posted some other analyses as well. For example, “The analysis [pdf] of a large sample of AfD discussions (200K discussions that took place between November 2002 and July 2010) suggests that the largest part of these discussions ends after only a few recommendations are expressed.” And: “Delete decisions tend to be fairly unanimous. In contrast, we found many Keep decisions resulting from a discussion that leaned towards deletion…”
The latest Radio Berkman podcast is up. This time, it’s with Joseph Reagle, author of Good Faith Collaboration, about the culture of Wikipedia. And if you act now (or later), there’s a bonus interview with Zack Exley, Chief Community Officer for the Wikimedia Foundation.
Categories: too big to know
Tagged with: wikipedia
Date: November 23rd, 2010 dw
Want to see one way to use the Web to teach? Berkman’s Jonathan Zittrain and Stanford Law’s Elizabeth Stark are teaching a course called Difficult Problems in Cyberlaw. It looks like they have students creating wiki pages for the various topics being discussed. The one on “The Future of Wikipedia” is a terrific resource for exploring the issues Wikipedia is facing.
Among the many things I like about this approach: it implicitly makes the process of learning, which we have traditionally taken as an inward process, into a social, outbound one. By learning this way, we are not only enriching ourselves but enriching our world.
My only criticism: I wish the pages had prominent pointers to a main page that explains that the pages are part of a course.