Joho the Blog » wikipedia

January 10, 2011

Visualizing Wikipedia deletions

Notabilia has visualized the hundred longest discussion threads at Wikipedia that resulted in the deletion of an article and the hundred that did not. The visualized threads take on shapes depending on whether the discussion was controversial, swinging, or unanimous. For those whose brains can process visualized information (as mine cannot), you will undoubtedly learn much. For the rest of us: Oooooh, pretty!

They’ve posted some other analyses as well. For example, “The analysis [pdf] of a large sample of AfD discussions (200K discussions that took place between November 2002 and July 2010) suggests that the largest part of these discussions ends after only a few recommendations are expressed.” And: “Delete decisions tend to be fairly unanimous. In contrast, we found many Keep decisions resulting from a discussion that leaned towards deletion…”

Follow me

1 Comment »

November 23, 2010

Radio Berkman: Wikipedia

The latest Radio Berkman podcast is up. This time, it’s with Joseph Reagle, author Good Faith Collaboration, about the culture of Wikipedia. And as a special bonus, if you act now (or later), there’s a bonus interview with Zack Exley, Chief Community Officer for the Wikimedia Foundation.

Follow me

Categories: too big to know Tagged with: wikipedia Date: November 23rd, 2010 dw

5 Comments »

July 19, 2009

Britannica: #1 at Google

Today, for the very first time in my experience, The Encyclopedia Britannica was the #1 result at Google for a query.

It’s good to see the EB making progress with its online offering, but I’m actually puzzled in this case. The query was “horizontal hold” (without quotes), and the EB page that’s #1 is pretty much worthless. It’s a stub that gives a snippet of the article on the topic, but the snippet oddly begins with definition #4. The page then points us into actual articles in the EB, but they’re articles you have to pay for (although the EB offers a “no risk” free trial).

So, how did Google’s special sauce float this especially unhelpful page to the surface? And why isn’t there a Wikipedia page on “horizontal hold”? And does this mean that if there’s no Wikipedia page for a topic, Google gets the vapors and just doesn’t know what to recommend? Nooooo………

[Tags: google wikipedia encyclopedia_britannica britannica search horizontal_hold ]

Follow me

Categories: misc Tagged with: britannica • encyclopedia_britannica • everythingIsMiscellaneous • google • horizontal_hold • search • wikipedia Date: July 19th, 2009 dw

10 Comments »

June 19, 2009

Wikipedia goes video

David Talbot reports that Wikipedia is getting ready to make embedded video and important part of its content.

[Tags: wikipedia video ]

Follow me

Categories: Uncategorized Tagged with: everythingIsMiscellaneous • video • wikipedia Date: June 19th, 2009 dw

Be the first to comment »

June 9, 2009

Meaning-mining Wikipedia

DBpedia extracts information from Wikipedia, building a database that you can query. This isn’t easy because much of the information in Wikipedia is unstructured. On the other hand, there’s an awful lot that’s structured enough so that an algorithm can reliably deduce the semantic content from the language and the layout. For example, the boxed info on bio pages is pretty standardized, so your algorithm can usually assume that the text that follows “Born: ” is a date and not a place name. As the DBpedia site says:

The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories.

Over time, the site will get better and better at extracting info from Wikipedia. And as it does so, it’s building a generalized corpus of query-able knowledge.

As of now, the means of querying the knowledge requires some familiarity with building database queries. But, the world has accumulated lots of facility with putting front-ends onto databases. DBpedia is working on something differentL accumulating an encyclopedic database, open to all and expressed in the open language of the Semantic Web.

(Via Mirek Sopek.) [Tags: wikipedia semantic_web everything_is_miscellaneous ]

Follow me

Categories: Uncategorized Tagged with: everythingIsMiscellaneous • everything_is_miscellaneous • knowledge • metadata • semantic_web • web 2.0 • wikipedia Date: June 9th, 2009 dw

5 Comments »

June 8, 2009

Next, he dehydrates water

Rob Matthews has printed out and bound Wikipedia’s featured articles, creating a 5,000 page volume.

In case you were wondering, featured articles are articles that get a gold star from Wikipedia – about one in every 1,140 at the moment, for the English language version.

(If Rob hadn’t copyrighted the excellent photos, they’d be popping up in every third slide deck from now on.)

[Tags: wikipedia ]

Follow me

Categories: Uncategorized Tagged with: digital culture • everythingIsMiscellaneous • knowledge • media • wikipedia Date: June 8th, 2009 dw

1 Comment »

May 4, 2009

How important is WolframAlpha?

The Independent calls WolframAlpha “An invention that could change the Internet forever.” It concludes: “Wolfram Alpha has the potential to become one of the biggest names on the planet.”

Nova Spivak, a smart Semantic Web guy, says it could be as important as Google.

Ton Zijlstra, on the other hand, who knows a thing or two about knowledge and knowledge management, feels like it’s been overhyped. After seeing the video of Wolfram talking at Harvard, Ton writes:

No crawling? Centralized database, adding data from partners? Manual updating? Adding is tricky? Manually adding metadata (curating)? For all its coolness on the front of WolframAlpha, on the back end this sounds like it’s the mechanical turk of the semantic web.

(“The mechanical turk of the semantic web.” Great phrase. And while I’m in parentheses, ReadWriteWeb has useful screenshots of WolframAlpha, and here’s my unedited 55-minute interview with Wolfram.)

I am somewhere in between, definitely over in the Enthusiastic half of the field. I think WolframAlpha [WA] will become a standard part of the Internet’s tool set, but is not transformative.

WA works because it’s curated. Real human beings decide what topics to include (geography but not 6 Degrees of Courtney Love), which data to ingest, what metadata is worth capturing, how that metadata is interrelated (= an ontology), which correlations to present to the user when she queries it (daily tonnage of fish captured by the French compared to daily production of garbage in NYC), and how that information should be presented. Wolfram insists that an expert be present in each data stream to ensure the quality of the data. Given all that human intervention, WA then performs its algorithmic computations … which are themselves curated. WA is as curated as an almanac.

Curation is a source of its strength. It increases the reliability of the information, it enables the computations, and it lets the results pages present interesting and relevant information far beyond the simple factual answer to the question. The richness of those pages will be big factor in the site’s success.

Curation is also WA’s limitation. If it stays purely curated, without areas in which the Big Anyone can contribute, it won’t be able to grow at Internet speeds. Someone with a good idea — provide info on meds and interactions, or add recipes so ingredients can be mashed up with nutritional and ecological info — will have to suggest it to WolframAlpha, Inc. and hope they take it up. (You could to this sorta kinda through the API, but not get the scaling effects of actually adding data to the system.) And WA will suffer from the perspectival problems inevitable in all curated systems: WA reflects Stephen Wolfram’s interests and perspective. It covers what he thinks is interesting. It covers it from his point of view. It will have to make decisions on topics for which there are no good answers: Is Pluto a planet? Does Scientology go on the list of religions? Does the page on rabbits include nutritional information about rabbit meat? (That, by the way, was Wolfram’s example in my interview of him. If you look at the site from Europe, a “rabbit” query does include the nutritional info, but not if you log in from a US IP address.) But WA doesn’t have to scale up to Internet Supersize to be supersized useful.

So, given those strengths and limitations, how important is WA?

Once people figure out what types of questions it’s good at, I think it will become a standard part of our tools, and for some areas of inquiry, it may be indispensable. I don’t know those areas well enough to give an example that will hold up, but I can imagine WA becoming the first place geneticists go when they have a question about a gene sequence or chemists who want to know about a molecule. I think it is likely to be so useful within particular fields that it becomes the standard place to look first…Like IMDB.com for movies, except for broad, multiple fields, with the ability to cross-compute.

But more broadly, is WA the next Google? Does it transform the Internet?

I don’t think so. Its computational abilities mean it does something not currently done (or not done well enough for a crowd of users), and the aesthetics of its responses make it quite accessible. But how many computational questions do you have a day? If you want to know how many tons of fish France catches, WA will work as an almanac. But that’s not transformational. If you want to know how many tons divided by the average weight of a French person, WA is for you. But the computational uses that are distinctive of WA and for which WA will frequently be an astounding tool are not frequent enough for WA to be transformational on the order of a Google or Wikipedia.

There are at least two other ways it could be transformational, however.

First, its biggest effect may be on metadata. If WA takes off, as I suspect it will, people and organizations will want to get their data into it. But to contribute their data, they will have to put it into WA’s metadata schema. Those schema then become a standard way we organize data. WA could be the killer app of the Semantic Web … the app that gives people both a motive for putting their data into ontologies and a standardized set of ontologies that makes it easy to do so.

Second, a robust computational engine with access to a very wide array of data is a new idea on the Internet. (Ok, nothing is new. But WA is going to bring this idea to mainstream awareness.) That transforms our expectations, just as Wikipedia is important not just because it’s a great encyclopedia but because it proved the power of collaborative crowds. But, WA’s lesson — there’s more that can be computed than we ever imagined — isn’t as counter-intuitive as Wikipedia’s, so it is not as apple-cart-upsetting, so it’s not as transformational. Our cultural reaction to Wikipedia is to be amazed by what we’ve done. With WA, we are likely to be amazed by what Wolfram has done.

That is the final reason why I think WA is not likely to be as big a deal as Google or Wikipedia, and I say this while being enthusiastic — wowed, even — about WA. WA’s big benefit is that it answers questions authoritatively. WA nails facts down. (Please take the discussion about facts in a postmodern age into the comments section. Thank you.) It thus ends conversation. Google and Wikipedia aim at continuing and even provoking conversation. They are rich with links and pointers. Even as Wikipedia provides a narrative that it hopes is reliable, it takes every opportunity to get you to go to a new page. WA does have links — including links to Wikipedia — but most are hidden one click below the surface. So, the distinction I’m drawing is far from absolute. Nevertheless, it seems right to me: WA is designed to get you out of a state of doubt by showing you a simple, accurate, reliable, true answer to your question. That’s an important service, but answers can be dead-ends on the Web: you get your answer and get off. WA as question-answerer bookends WA’s curated creation process: A relatively (not totally) closed process that has a great deal of value, but keeps it from the participatory model that generally has had the biggest effects on the Net.

Providing solid, reliable answers to difficult questions is hugely valuable. WolframAlpha’s approach is ambitious and brilliant. WolframAlpha is a genius. But that’s not enough to fundamentally alter the Net.

Nevertheless, I am wowed.[Tags: wolfram wolframalpha wikipedia google search metadata semantic_web ]

Follow me

Categories: Uncategorized Tagged with: digital culture • education • everythingIsMiscellaneous • expertise • google • infohistory • knowledge • libraries • metadata • science • search • web 2.0 • wikipedia • wolfram • wolframalpha Date: May 4th, 2009 dw

19 Comments »

May 2, 2009

Andrew Lih and me on C-SPAN

C-SPAN’s monumentally, overwhelmingly, epically popular series, BookTV, this week is broadcasting the Berkman session with Andrew Lih on his The Wikipedia Revolution that I moderated. Andrew speaks for a few minutes, then I interview him, then the audience asks questions. I thought Andrew’s presentation was terrific, and the audience asked good questions. It’s on Sunday, May 3, at 12:30 AM, Sunday, May 24, at 4:30 PM and — for those for whom twice isn’t enough — at Monday, May 25, at 4:30 AM.

[Tags: andrew_lih wikipedia books everything_is_miscellaneous ]

Follow me

Categories: Uncategorized Tagged with: andrew_lih • books • digital culture • education • everything_is_miscellaneous • knowledge • libraries • media • wikipedia Date: May 2nd, 2009 dw

Be the first to comment »

April 17, 2009

[ugc3] Understanding evolving online behavior

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Over-emphasizing small points. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are hereby warned.

John Horrigan of Pew Internet and American Life, gives a “non-Koolaid” presentation. He says that about 12% of Internet users have a blog. The percentage of people doing some form of content sharing is not increasing much at all. The demographics says that 18-24 do the most sharing, and then it goes down in pretty much a straight line. The change over time is not distributed evenly across age groups. Younger adults are turning away from the 6 core UGC behaviors, the 24-35s are increasing. The rest: not much change.

But people are increasingly going to social networkingIf UGC is migrating to rules-based environments, is it a good bargain? On the one hand, good governance can build sustainable mechanisms. OTOH, bad governance is a risk, so you want an open Internet.

Q: A decrease in activity among younger folk? Because they were so heavily involved initially?
John: They’re going to social networking sites instead of maintaining their own sites. But UGC is still an important activity to them.

Q: The changing behaviors as people age and how that will effect UGC?
John: Impossible to answer because we don’t know how the tech will change.

Mainak Mazumdar of The Nieslen Company begins by looking at blogging topics. It’s quite diverse he says. Next: size. Wikipedia has many more topics than Britannica. Also, social networking is very big: Member communities are #4 on most visited lists, after search, portals, and software manufacturers. #5 is email. Social media is big everywhere. (Biggest: 80% of Brazil. 67% in US.) The US is showing comparatively slower growth in “active reach of member communities.” Time spent in CGM has been increasing. So is the time spent on social networking. 35-49 years are the fastest growing audience for social networking sites. Teen consumption of SNS is going down, because they’re going more and more mobile. Mobile will be huge. TV will be big. People are watching more TV. Big media companies are doing well. “Becoming a mother is a dramatic inflectin point and drives women to the Web in search of advice and a desire to connect with others in her shoes” (from the slide).

Is the Net a game-changer for research companies? He compares it to scanner data in the 90s and online surveys in 1990s. In 2000s, perhaps [perhaps??] social networking will once again change the game. Reasons to think the Net is a game-changer overall [i.e., exceptionalism] : Pervasive, sticky, generational.

Q: Is TV watching growing on all screens or just on the living room screen?
Mainak: Time spent watching TV content on a TV.

Q: Maybe SNS have surpassed email because email was used to listserves to serve the social function.
Mainak: We’re talking about how long you spend in Outlook + Web mail. We install monitors that report on how long you spend in each application.

Russ Neuman: Be careful of projecting out from the current tech. It can be disrupted easily.

Q: Older people are entering SNSs. I call them “parents.” To what extent will that change what started out as a youth movement? Is the move to mobile a move out of the SNS as they become mom and dad’s spots? [Oprah is on twitter.]
A: Yes. Some younger teens are going straight to mobile and circumventing the Internet.

Eszter Hargittai talks about the role of skill in Internet use. Yes, young people use digital media and spend a lot of time online, but it’s true that they engage in lots of online activities or that they’re particularly savvy about the Net and Web tools. So, the idea of “digital natives” is often misguided.

She’s particularly interested in the skills people have and need. Her methodology: Paper and pencil surveys to avoid biasing towards those comfortable with using Web tools. 1,060 first year students at U of Illinois. Most of the data comes from 2007, although she has some pre-pub data from 2009. The question is: What explains variation in skill? Gender, education and income predict skill. “The Web offers lots of opportunities but those who can take advantage of them are those who are already privileged.”

This has an effect on how we intervene to equalize matters. You can’t change socio-economic status. And it turns out that motivation doesn’t seem to make much of an effect. You can only be motivated to do something that you already know is a possibility. She shows new data, not ready for blogging, that show that very small percentages of users have actually created content, voted on reviews, edited Wikipedia pages, etc. The number of teenagers who have heard of Twitter is quite low. [Sorry for the lack of numbers. I’m not sure I’m supposed to be reporting even these trends.]

Mainstream media remain strong. Eszter points to the media story about Facebook users having lower grades. Eszter looked at the study and finds it to be of poor quality. Yet it got huge mainstream play. Eszter tweeted about it. She blogged about it. The tweet led to a co-authored paper. Even so, the mainstream probably won’t care, and most of the tweets are still simply retweeting the bad data. The Net is a huge opportunity, but it’s not evenly distributed.

Q: A study found that people online are lonely. It was picked up by the media. The researcher revised to say that it’s the other way around. It wasn’t picked up. The media pick up on the dystopic.

Q: Your data reflects my experience with my students. They don’t blog, they don’t tweet. There’s a class component to this.
Eszter: We measure socio-economic status. Why does it correlate? We’re exploring this. We now ask about parental support of technology use, rules at home about tech use, etc. So far we’re finding (tentatively!) that lower-educated parents tend to have more rules for their kids.

Q: What happens when there’s universal wireline connection?
Eszter: As the tech changes, the skill sets change. The privileged stay ahead, according to my 8 years of studies.

Q: What skills should we be teaching?
A: Complicated. Crucial issue: The evaluation of the credibility of sources. There’s an extreme amount of trust in search engines. That’s one place we need to do more work. And librarians are highly relevant here.

Q: How do people use the Net to learn informally, e.g., WebMD?
Eszter: There are lots of ways and types to do this. But, first you need to know what’s on the Web. You need good search skills, good credibility-evaluation skills.

Cliff Lampe talks about how Mich State U students use Facebook. He presents a study just completed yesterday, so the data isn’t yet perfect. 97% of his sample are FB users (although Cliff expresses some discomfort with this number). Mean average of 441 friends; median = 381. Ninety percent of these they consider to be “actual” friends. 73% only accept friend requests from people they know in real life. Most spend just a little time (under 30mins) at FB per day. About half lets their friends (but not everyone in their network) to see everything in their profile. Almost everyone puts a photo of themselves up. Vast majority have a photo album. About a third think their parents are looking at their page. Overall they think they’re posting for their college and high school friends.

He talks about Everything2.com, a user-generated encyclopedia/compendium that is 11 years old. Why have people exited? Research shows they left because other sites came along that do the same thing better. Also, changes in life circumstances. Also, conflict with administration of the site. There’s a corporitization of some of the UGC sites. He also has looked into why new users don’t stick: They don’t glom onto the norms of the site.

Q: Are reasons for exiting a negative network effect? More than 150 and the network deteriorates?
Cliff: We see that in Usenet. But not so much at Facebook where you’re just dealing with your friends.

Q: Any sites that have tried to drive away new users?
Cliff: Metafilter has a bit of that. Slashdot has a “earn your bullshit” tagline.

Q: Are your students alone or with others when they are online? Are they aware of the technology?
Cliff: The rise of the netbook has had an effect. Most of my students experience social media as a group activity. But a lot of them are not that savvy. They generally don’t know how Wikipedia operates. [Tags: sns social_networking facebook exceptionalism tv wikipedia environment ]

Follow me

Categories: blogs Tagged with: blogs • business • conference coverage • digital culture • education • entertainment • environment • everythingIsMiscellaneous • exceptionalism • expertise • facebook • knowledge • libraries • marketing • media • sns • tv • wikipedia Date: April 17th, 2009 dw

2 Comments »

April 15, 2009

The CIA’s Intellipedia

We’ve posted the latest Radio Berkman podcast, this time an interview with Don Burke and Sean Dennehy, two of the folks behind Intellipedia, a wikipedia for U.S. intelligence services.

[Tags: wikipedia intellipedia cia intelligence everything_is_miscellaneous ]

Follow me

Categories: Uncategorized Tagged with: cia • digital culture • egov • everythingIsMiscellaneous • intelligence • intellipedia • social networks • wikipedia Date: April 15th, 2009 dw

1 Comment »

« Previous Page | Next Page »