Joho the Blog
An Entry from the Archives


June 24, 2007

Inside every link is a tag struggling to get out

Samuel Wantman, who works on Wikipedia's category strategy, has suggested that every hyperlinked word in every Wikipedia article be treated as a tag.

What a cool idea! A single link-as-tag would frequently return so many articles that it wouldn't be worth it, but if we were able to do intersections of the hyperlinked words, there are times when it'd be worth its weight in bits.
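To make the intersection idea concrete, here is a minimal sketch in Python. It assumes we already have, for each article, the set of pages it links to; the article names and link targets below are invented for illustration, and nothing here reflects how Wikipedia actually stores its links.

```python
from collections import defaultdict


def build_link_index(articles):
    """Map each link target (the 'tag') to the set of articles that link to it."""
    index = defaultdict(set)
    for title, links in articles.items():
        for target in links:
            index[target].add(title)
    return index


def articles_linking_to_all(index, targets):
    """Return the articles that link to every one of the given targets."""
    sets = [index.get(t, set()) for t in targets]
    if not sets:
        return set()
    # Intersect the smallest set first so intermediate results stay small.
    sets.sort(key=len)
    result = set(sets[0])
    for s in sets[1:]:
        result &= s
    return result


# Toy data, made up for this example.
ARTICLES = {
    "Article A": {"Boston", "Jazz", "Harvard University"},
    "Article B": {"Boston", "Baseball"},
    "Article C": {"Jazz", "Boston"},
}
INDEX = build_link_index(ARTICLES)
```

Asking for `articles_linking_to_all(INDEX, ["Boston", "Jazz"])` narrows three "Boston" articles down to the two that also link to "Jazz", which is the kind of narrowing that makes the tags useful.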

Apparently, however, this would require so much processing power that the lights on the Eastern seaboard would dim every time someone used it. So, perhaps it's a project that a third party could undertake? Or refine? [Tags: wikipedia samuel_wantman tagging folksonomy everything_is_miscellaneous ]

Posted by D. Weinberger at June 24, 2007 03:39 PM


Comments

Have you heard of auto-hyperlinking strategies, e.g. for blog posts?

A = average occurrence count of any N-word phrase
T = occurrence count of this N-word phrase
Kn = interestingness threshold for phrases of length N

When T/A < Kn (that is, when the phrase is sufficiently rarer than average), auto-hyperlink the phrase, using the most appropriate link if one can be found.

Thus in your above post the following phrases would be candidates for auto-hyperlinking:

Wantman
Samuel Wantman
Wikipedia
Eastern seaboard
weight in bits
worth its weight in bits

In the event of conflict, the most 'interesting' auto-hyperlink takes precedence, leaving:

Samuel Wantman
Wikipedia
Eastern seaboard
worth its weight in bits

Not much to do with tags, but might give you ideas in that respect.
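The rule above can be sketched roughly as follows. This is only one reading of it: the phrase counts are assumed to come from some background corpus, the "most interesting" candidate is assumed to be the one with the lowest T/A relative to its threshold Kn (longer phrases win ties), and all function names are hypothetical.

```python
def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidates(post_tokens, counts, thresholds):
    """Return (start, end, score) for each phrase with T/A < Kn.

    counts[n] maps an n-word tuple to its occurrence count in a
    background corpus; thresholds[n] is Kn.
    """
    found = []
    for n, k_n in thresholds.items():
        table = counts.get(n)
        if not table:
            continue
        average = sum(table.values()) / len(table)  # A
        for i, phrase in enumerate(ngrams(post_tokens, n)):
            # T: phrases never seen in the corpus count as 0, so they
            # always qualify; a real system would want smoothing here.
            t = table.get(phrase, 0)
            ratio = t / average
            if ratio < k_n:
                found.append((i, i + n, ratio / k_n))
    return found


def resolve_conflicts(found):
    """Keep non-overlapping spans, most 'interesting' (lowest score) first."""
    chosen = []
    for start, end, score in sorted(found, key=lambda c: (c[2], c[0] - c[1])):
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, score))
    return sorted(chosen)
```

With suitable counts, both "Wantman" and "Samuel Wantman" pass the threshold, and the conflict step keeps only the longer, more interesting phrase, as in the second list above.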

Posted by: Crosbie Fitch | June 24, 2007 04:20 PM


It's my understanding that many of the larger-scale tagging implementations aren't done with straight third-normal-form database tables and join queries, but are instead all pumped into a full-text index and handled that way.

Seeing as how Wikipedia already has a search feature and presumably a full-text index of all their content, maybe it's not that big a stretch to get it implemented.

Posted by: Michael Buckbee | June 24, 2007 07:07 PM


Have you seen dbpedia.org? It's all based on RDF and SPARQL, which means it'll be dismissed as totally irrelevant because it's unfashionable among certain trendy 2.0 types, but it's here and it works great.

You can ask really complex queries of Wikipedia like "Show me all the soccer players in the world who wear the number '11' shirt, play for a club with over 40,000 seats and were born in a country with more than 10 million inhabitants" - http://wikipedia.aksw.org/index.php?qid=1

Posted by: Tom Morris | June 24, 2007 07:55 PM


If you add rel="tag" to such links, they count as valid tags under the rel-tag microformat, and could be indexed by search engines that support it (Technorati, IceRocket, etc.). Wikipedia could do this with a small software change.
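As a rough illustration of what an indexer would do with such links, here is a small Python sketch using only the standard library. Under rel-tag, the tag is taken from the last segment of the link's URL path; the class and function names here are made up for this example, and this is not how any particular search engine implements it.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, e.g. "tag nofollow".
        rel_values = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "tag" not in rel_values or not href:
            return
        # The tag is the last path segment; a trailing slash is ignored.
        path = urlparse(href).path.rstrip("/")
        last_segment = path.rsplit("/", 1)[-1]
        if last_segment:
            self.tags.append(unquote(last_segment))


def extract_tags(html):
    """Return the rel-tag tags found in an HTML fragment, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags
```

Fed a Wikipedia-style link such as `<a href="http://en.wikipedia.org/wiki/Eastern_seaboard" rel="tag">`, this yields `Eastern_seaboard`, so the existing article URLs would already work as tag names.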

Posted by: Kevin Marks | June 25, 2007 04:54 AM

