Inside every link is a tag struggling to get out

Samuel Wantman, who works on Wikipedia’s category strategy, has suggested that every hyperlinked word in every Wikipedia article be treated as a tag.

What a cool idea! A single hyperlinked word would frequently give you so many articles that it wouldn’t be worth it, but if we were able to do intersections of the hyperlinked words, there are times when it’d be worth its weight in bits.
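To make the intersection idea concrete, here is a minimal sketch in Python. It assumes we already have, for each article, the set of pages it links to; the article titles, link targets, and function names below are made up for illustration and are not anything Wikipedia actually exposes.

```python
from collections import defaultdict


def build_link_index(articles):
    """Map each link target (treated as a tag) to the set of articles linking to it."""
    index = defaultdict(set)
    for title, links in articles.items():
        for target in links:
            index[target].add(title)
    return index


def articles_tagged_with_all(index, tags):
    """Return the articles that link to every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical sample data: article title -> set of pages it links to.
articles = {
    "Article A": {"Boston", "Baseball"},
    "Article B": {"Boston", "Tea"},
    "Article C": {"Baseball", "Boston", "Tea"},
}

index = build_link_index(articles)
print(sorted(articles_tagged_with_all(index, ["Boston", "Baseball"])))
# prints ['Article A', 'Article C']
```

A single tag such as "Boston" matches all three sample articles, while intersecting it with "Baseball" narrows the result to two, which is the whole appeal.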

Apparently, however, this would require so much processing power that the lights on the Eastern seaboard would dim every time someone used it. So, perhaps it’s a project that a third party could undertake? Or refine?

  1. Have you heard of auto-hyperlinking strategies, e.g. for blog posts?

    A= Average occurrence of any N word phrase
    T= Occurrence of this N word phrase
    Kn= Interestingness threshold for phrases of length N.

    When T/A<Kn then auto-hyperlink phrase (with most appropriate hyperlink if an appropriate link can be found).

    Thus in your above post the following phrases would be candidates for auto-hyperlinking:

    Samuel Wantman
    Eastern seaboard
    weight in bits
    worth its weight in bits

    In the event of conflict, the most ‘interesting’ auto-hyperlink takes precedence, leaving:

    Samuel Wantman
    Eastern seaboard
    worth its weight in bits

    Not much to do with tags, but might give you ideas in that respect.
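The rule described in this comment could be sketched roughly as follows in Python. This is only one reading of it: the corpus counts, averages, and thresholds are invented numbers, phrases with no known count are simply skipped, and "most interesting" is assumed to mean the lowest T/A ratio, with longer phrases winning ties. None of that is specified in the comment itself.

```python
def interesting_phrases(words, counts, averages, thresholds):
    """Pick non-overlapping phrases whose T/A ratio is below the threshold Kn.

    words      -- the post as a list of tokens
    counts     -- phrase (tuple of words) -> T, its occurrence count in some corpus
    averages   -- N -> A, the average occurrence of any N-word phrase
    thresholds -- N -> Kn, the interestingness threshold for N-word phrases
    """
    candidates = []
    for n in thresholds:
        for start in range(len(words) - n + 1):
            phrase = tuple(words[start:start + n])
            t = counts.get(phrase)
            if t is None:
                continue  # assumption: phrases without a known count are ignored
            ratio = t / averages[n]
            if ratio < thresholds[n]:
                candidates.append((ratio, -n, start, start + n))

    # Assumption: lowest ratio is "most interesting"; longer phrases win ties.
    candidates.sort()
    chosen = []
    for _ratio, _neg_n, start, end in candidates:
        if all(end <= s or start >= e for s, e in chosen):
            chosen.append((start, end))
    return [" ".join(words[s:e]) for s, e in sorted(chosen)]


# Made-up numbers, purely for illustration.
words = "it would be worth its weight in bits on the Eastern seaboard".split()
counts = {
    ("Eastern", "seaboard"): 10,
    ("on", "the"): 900,
    ("weight", "in", "bits"): 4,
    ("worth", "its", "weight", "in", "bits"): 1,
}
averages = {2: 200.0, 3: 50.0, 5: 20.0}
thresholds = {2: 0.1, 3: 0.1, 5: 0.1}

print(interesting_phrases(words, counts, averages, thresholds))
# prints ['worth its weight in bits', 'Eastern seaboard']
```

With these numbers "weight in bits" qualifies on its own but overlaps the more interesting "worth its weight in bits", so it is dropped, matching the conflict rule above.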

  2. It’s my understanding that many of the larger scale tagging implementations aren’t done with straight 3rd normal form database tables and join queries, but are instead all pumped into a full text index and handled that way.

    Seeing as how Wikipedia already has a search feature and presumably a full text index of all their content, maybe it’s not that big a stretch to get it implemented.

  3. Have you seen It’s all based on RDF and SPARQL, which means it’ll be dismissed as totally irrelevant because it’s unfashionable among certain trendy 2.0 types, but it’s here and it works great.

    You can ask really complex queries of Wikipedia like “Show me all the soccer players in the world who wear the number ‘11’ shirt, play for a club with over 40,000 seats and were born in a country with more than 10 million inhabitants”.

  4. If you add rel="tag" to such links, they count as valid tags in the rel-tag standard, and could be indexed by search engines that support this (Technorati, IceRocket, etc.). Wikipedia could do this with a small software change.
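As a rough illustration of what an indexer might do with such links, here is a small Python sketch that collects tags from anchors carrying rel="tag". As I understand the rel-tag microformat, the tag is the last path segment of the link's href rather than the link text; the sample HTML and the `RelTagParser` class name are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a> elements whose rel attribute includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "tag" in rel_values and href:
            # Assumption: the tag is the last path segment of the href.
            path = urlparse(href).path.rstrip("/")
            last_segment = path.rsplit("/", 1)[-1]
            if last_segment:
                self.tags.append(unquote(last_segment))


# Hypothetical article fragment: only the first link is marked as a tag.
sample_html = (
    '<p><a href="https://en.wikipedia.org/wiki/Folksonomy" rel="tag">folksonomy</a> '
    'and <a href="https://en.wikipedia.org/wiki/Hyperlink">links</a></p>'
)

parser = RelTagParser()
parser.feed(sample_html)
print(parser.tags)
# prints ['Folksonomy']
```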

