Joho the Blog » tags

December 14, 2013

Are tags over-rated?

Jeff Atwood [twitter:codinghorror], a founder of Stack Overflow and Discourse.org (two of my favorite sites), is on a tear about tags. Here are his two tweets that started the discussion:

I am deeply ambivalent about tags as a panacea based on my experience with them at Stack Overflow/Exchange. Example: pic.twitter.com/AA3Y1NNCV9

Here’s a detweetified version of the four-part tweet I posted in reply:

Jeff’s right that tags are not a panacea, but who said they were? They’re a tool (frequently most useful when combined with an old-fashioned taxonomy), and if a tool’s not doing the job, then drop it. Or, better, fix it. Because tags are an abstract idea that exists only in particular implementations.

After all, one could with some plausibility claim that online discussions are the most overrated concept in the social media world. But still they have value. That indicates an opportunity to build a better discussion service. … which is exactly what Jeff did by building Discourse.org.

Finally, I do think it’s important — even while trying to put tags into a less over-heated perspective [do perspectives overheat??] — to remember that when first introduced in the early 2000s, tags represented an important break with an old and long tradition that used the authority to classify as a form of power. Even if tagging isn’t always useful and isn’t as widely applicable as some of us thought it would be, tagging has done the important work of telling us that we as individuals and as a loose collective now have a share of that power in our hands. That’s no small thing.

June 13, 2013

[eim][misc] Tagging rises

Both Facebook and Apple have announced the use of tags. Yay!

Tags have continued to percolate through the ecosystem after their most auspicious introduction in Delicious.com. (Note the phrase “most auspicious”; tags have always been with us.) It’s great to see them increase both because they are a great way to get use out of the craziness while preserving it in its original form for others, and because there is great value in scaling tags, as Flickr has shown.

So, yay for tags. And yay for the crazy.

June 24, 2011

Tagging the National Archives

The National Archives is going all tag-arrific on us:

The Online Public Access prototype (OPA) just got an exciting new feature — tagging! As you search the catalog, we now invite you to tag any archival description, as well as person and organization name records, with the keywords or labels that are meaningful to you. Our hope is that crowdsourcing tags will enhance the content of our online catalog and help you find the information you seek more quickly.

Nice! (Hat tip to Infodocket.)

September 16, 2010

Tibetan taggers

This is a couple of years old, but it’s interesting. (Thanks to Norm Jacknis for the tip.)

Tibetans living in Switzerland and non-Tibetan Swiss were asked to provide tags for an exhibit of traditional Tibetan work. The tags were then analyzed to see what cultural differences might show up. Some were fairly obvious:

Taggers disagreed in their perceptions of the esoteric deity Chakrasamvara. Tibetans tagged it frequently with “buddha”, accurately identifying its wisdom aspect; however, Swiss Germans found it böse or “angry-looking” and associated it with death. This exemplifies how tags can help uncover cultural misunderstandings: rather than anger, Chakrasamvara actually embodies the union of bliss and emptiness.

It also revealed (or suggests) some differences in how people approach tagging itself:

When Tibetans were asked which images were easiest to tag and why, their responses were contradictory. One person said artworks she knew were easy to tag because she already has something to say about them. Another found unfamiliar works easier to tag because they seemed “freer.” The rating indicates that symbolic and familiar works do elicit less diverse responses from Tibetan taggers. And although some people may find them easier to tag because their meanings are culturally pre-defined, the way in which viewers react to them is likely to be less personal and even “less free.”

April 27, 2010

[berkman] Luis von Ahn on free lunches, captcha, and tags

Luis von Ahn of Carnegie Mellon University is giving a Berkman lunchtime talk. [NOTE: I'm liveblogging. I'm making mistakes, leaving stuff out, paraphrasing, getting things wrong. This is an unreliable record.]

Luis invented captchas, the random characters you have to type in to convince a web page that you are a human and not a hostile software program. (He shows randomly generated sequences that happened to spell out “wait” and “restart.”) Captchas are useful, he says, when you’re trying to prevent people from gaming a system by writing a program to enter data robotically. They’re also useful to prevent spammers from signing up for free email accounts. To get around this, spammers have started up sweat shops where humans type captchas all day long; it costs the spammers about $0.33/account. And some porn companies ask users to type in a captcha to see photos; the captchas are drawn from email account applications. Damn clever!

He shows some variants. A Russian one asks you to solve a mathematical limit. In India, one asks you to solve a circuit. Luis says these aren’t all that effective because computers can solve both problems, but they’re still better than the “what is 1 + 1?” captchas he’s found on US sites.

He says that about 200M captchas are typed every day. He was proud of that until he realized it takes about 10 seconds to type them, so his invention is wasting 500,000 hours per day. So, he wondered if there was a way to use captchas to solve some humungous problem ten seconds at a time. The result: ReCAPTCHA. For books written before 1900, the type is weak and about 30% of the text cannot be recognized by OCR. So, now many captchas ask you to type in a word that went unrecognized when OCR’ing a book. (The system knows which words are unrecognized by running multiple OCR programs; ReCAPTCHA uses those words.) To make sure that it’s not a software program typing in random words, ReCAPTCHA shows the user two words, one of which is known to be right. The user has to type in both, but doesn’t know which is which. If the user types in the known word correctly, the system knows it’s not dealing with a robot, and that the user probably got the unknown word right.
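The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea as Luis described it, not ReCAPTCHA's actual code: the names `check_challenge` and `best_guess`, the case-insensitive comparison, and the simple vote tally are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical store of guesses for words the OCR programs could not read.
# Maps an unknown-word id to a tally of what people typed for it.
votes = defaultdict(Counter)


def check_challenge(known_answer, typed_known, unknown_id, typed_unknown):
    """Return True if the user passes; record their guess for the unknown word.

    The user sees two words and does not know which one is the control word.
    Only the control word is checked. If it matches, the guess for the
    unknown word is treated as probably right and counted as one vote.
    """
    passed = typed_known.strip().lower() == known_answer.strip().lower()
    if passed:
        votes[unknown_id][typed_unknown.strip().lower()] += 1
    return passed


def best_guess(unknown_id):
    """Return the most frequently typed reading of an unknown word, or None."""
    tally = votes[unknown_id]
    if not tally:
        return None
    return tally.most_common(1)[0][0]
```

Counting votes across several users is one plausible way to turn "probably right" into an accepted reading; how many matching answers the real system requires is not something the talk notes say.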

ReCAPTCHA is a free service. Sites that use it have to feed back the entries for the unknown word. About 125,000 sites use it. They’re doing about 70M words per day, the equivalent of 2-4M books per year. If the growth continues, they’ll run out of books in 7 years, but Luis doesn’t think the growth will continue, so it might take twenty years. (There are 100M books.)

(In response to a backchannel question, Luis tells the penis captcha story.)

The ReCAPTCHA system filters out nationalities, known insult terms, and the like, to avoid unfortunate juxtapositions. It’s soon going to be released in 40 languages. Google acquired ReCAPTCHA.

Q: When will OCR be good enough to break captchas?
A: I don’t know. We’ll probably run out of books first.

Q: Business model?
A: Google Books gets help digitizing.

ReCAPTCHA “reuses wasted human processing power.” The average American spends 1.9 seconds per day typing captchas. We also spend 1.1 hours a day playing electronic games. We humans spent 9B hours playing solitaire in 2003. It took less than a day of that to build the Panama Canal. So, Luis switches topics a bit to talk about how to solve human problems by playing games.

First is tagging images with words. Image search works by looking at file names and html text, because computers can’t yet recognize objects in images very well.

Does typing two words take twice as long as typing random letters? No, it takes about the same time, he says. Luis says about 10% of the world’s population have typed in a captcha. The ESP game asks two people unknown to each other to label an image until they agree. The game taboos words that other players have already agreed on. The system passes images through until they get no new labels. They’ve gotten over 50M agreements. 5,000 players playing simultaneously could label all Google images in a month. Google has its own version; Google has an exclusive license to the patent.
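The agree-on-a-label rule with taboo words can be sketched as follows. This is a simplified illustration, not the ESP game's implementation: the function names are hypothetical, and the players alternate turns here, whereas the real game is played live.

```python
def play_round(guesses_a, guesses_b, taboo=()):
    """Return the first label both players have typed, or None.

    guesses_a and guesses_b hold each player's guesses in the order typed.
    Taking turns is an assumption made to keep the sketch simple.
    Taboo words are ignored entirely.
    """
    banned = {w.lower() for w in taboo}
    seen_a, seen_b = set(), set()
    for i in range(max(len(guesses_a), len(guesses_b))):
        for guesses, mine, theirs in ((guesses_a, seen_a, seen_b),
                                      (guesses_b, seen_b, seen_a)):
            if i >= len(guesses):
                continue
            word = guesses[i].strip().lower()
            if word in banned:
                continue
            if word in theirs:
                return word  # both players have now typed this word
            mine.add(word)
    return None


def label_image(rounds):
    """Collect agreed labels across rounds; each agreement becomes taboo."""
    labels = []
    for guesses_a, guesses_b in rounds:
        agreed = play_round(guesses_a, guesses_b, taboo=labels)
        if agreed is not None:
            labels.append(agreed)
    return labels
```

Because each agreed label is taboo in later rounds, repeated play pushes pairs toward less obvious words, and an image that stops producing new labels can be retired.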

Q: Demographics?
A: For my version, average age is 29 (with huge variance), evenly split between women and men.

Q: Compared to Flickr tags?
A: Only a small fraction of Flickr images have useful tags. The tags from Flickr tend to be significantly more exact, but also significantly noisier (e.g., a person tagging an image in a way that means something idiosyncratic).

Q: Bots?
A: Yes, we don’t want you to wait for a partner, so sometimes we’ll give you a bot that replays the moves a human had made with the same image.

Q: Google Images benefits from its version of your game. Who benefits from your version of the game?
A: No one.

For some images, guesses change over time. E.g., a Britney Spears photo five years ago got labels like britney and hot. About two years ago, the labels changed to crazy, rehab, and shaved head. Now they’re back to britney and hot. By watching a player for 15 mins, you can guess whether the player is male or female with 95-98% accuracy.

Why do people like the ESP game? Sometimes they feel an intimacy with their partners. They have to step outside of themselves to make the match. They can have a sense of achievement.

He ends by saying that about the same number of people (100,000) have worked on each of humanity’s big projects, e.g., pyramids, Panama Canal, putting a person on the moon. That’s in part (he says) because it is so hard to coordinate large numbers of people. Now we can get 100M people to work on something. What can we do?

October 2, 2009

Libraries sans Dewey

Barbara Fister has a terrific article in LibraryJournal about libraries that have moved away from the Dewey Decimal Classification (DDC) system, many in favor of some version of the BISAC system that arranges books alphabetically by topic. This is a more bookstore-like approach. The article presents the multiple sides of this discussion, with lots of examples.

The disagreement among librarians is, to my mind, itself evidence that there is no one right way to organize physical objects. Classification is pragmatic. You classify in a way that works, but what works depends upon what you’re trying to do. Libraries serve multiple purposes, so librarians have to make hard decisions. If the DDC isn’t the safe and obvious choice, then libraries have to confront the question of their mission. The classification question quickly becomes existential in the JP Sartre sense.

At the end, she quotes from Everything Is Miscellaneous where I say that the Dewey system “can’t be fixed.” I still think that’s right in its context: No single classification system can work for everyone or for every purpose, although they can be better or worse at what they’re trying to do. In that sense, the DDC can be improved, and the OCLC has continuously improved it. But because it’s premised on assigning a single main category to each book, it is repeating the limitations of the physical world that require physical books each to go on a single shelf. Any single classification is going to be inapt for some purposes, and is going to embody biases constitutive of its culture. It’s the job of a library and of a book store to decide which single way of classifying works best for its patrons, with the obvious recognition that no single way works best for all. Books are miscellaneous. Libraries, bookstores, and the shelves over your desk are not.

Anyway, Barbara’s article is a fascinating look at how libraries are trying to do the best for their patrons, working within the constraints of the physical.

September 13, 2009

From Technorati to WordPress tag namespace

The excessively sharp-eyed among you may have noticed that I have recently switched from listing Technorati tags at the end of posts to using WordPress tags at the end of posts. Here’s why. Not that you should care.

When tagging first took off, there weren’t a lot of good places to link your tags to. So, I chose to have them link to Technorati because Technorati was then the leading search engine for blogs. Plus, Technorati had taken the lead in making itself tag-worthy. Plus, Technorati was founded by a friend of mine, David Sifry, who I trusted (and still do trust) to do the Right Thing. Also, I was on the Technorati board of advisers (uncompensated), so I had some basic familiarity with the site and the people. As a result, when you click on one of my old-style tags, it does a search for tags at Technorati and shows you the results. For example, here’s a tag to try: [Tags: ].

A couple of years ago, WordPress, the blogging software I use, introduced its own tagging capability. Instead of my having to hand-create links to the tags I want to use (actually, I wrote a little javascript to do it for me), I can enter tags and WordPress will turn them into links that aggregate all of my own postings that I’ve tagged that way. At the bottom of this post, you can try out the taxonomy link.
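For the curious, generating old-style tag links by script might look something like the sketch below. It is written in Python rather than the original javascript, which isn't shown here, and the URL pattern, the `rel="tag"` attribute, and the function name `tag_links` are assumptions for illustration only.

```python
from html import escape
from urllib.parse import quote

# Assumed URL pattern for Technorati tag pages; treat it as illustrative.
TECHNORATI_TAG_URL = "http://technorati.com/tag/"


def tag_links(tags):
    """Build a '[Tags: ...]' line of HTML links, one link per tag."""
    links = []
    for tag in tags:
        url = TECHNORATI_TAG_URL + quote(tag)  # percent-encode spaces etc.
        links.append(f'<a href="{escape(url)}" rel="tag">{escape(tag)}</a>')
    return "[Tags: " + " ".join(links) + "]"
```

Calling `tag_links(["e-gov", "tagging"])` returns a single line that can be pasted at the end of a post.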

This is a further step into narcissism, for rather than seeing what the rest of the world has tagged “e-gov” (or whatever), you now see only my posts tagged that way. But I suspect that is probably what most users expect and want when they click on a tag at the bottom of a post. If you want to search all posts by everyone that have a certain tag, Technorati and other sites will do it for you.

(By the way, many thanks to Brad Sucks for writing the scripts that extracted my old tags and auto-inserted them as WordPress tags. He says the scripts are too focused to be of general use, so don’t ask. But do buy his music.)

August 7, 2009

Tags again

Jeez, it would save me a lot of time if Keynote (or Powerpoint, if you insist) let me tag slides and objects in slides (especially images). I spend way too much time looking for that slide of a “smart room” or the one that shows business vs. end-user use of Web 2.0, or that photo of an old broadcast tower. (Later that day: Maybe I should add, having just rewritten the Wikipedia entry on Interleaf, that back in the early 1990s, Interleaf gave us exactly that capability.)

Instead, I have two hacks, both a pain in the butt. First, I keep a humungous file of slides I think I’ll want to use again. Second, I’ve started putting tags into the speaker notes by putting the tags in brackets. But I use the speaker notes to speak from, so larding them up with tags is sub-optimal.

And especially if you save Keynote files in the pre-2009 multi-file formats, then it’d be a snap for third parties to build tools that extract the tags and manage them. (I have a fussy home-made utility that extracts the text from the speaker notes and builds an editable file of them. If you want it, let me know.)
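Once the speaker notes are out of the file as plain text, managing bracketed tags is the easy part. Here is a minimal sketch, assuming the notes have already been extracted into a dict of slide number to text; it does not read Keynote files. The function names and the comma-separated convention inside a pair of brackets are assumptions for the example.

```python
import re
from collections import defaultdict

# Matches one bracketed group such as "[smart room, photo]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def index_tags(notes_by_slide):
    """Map each bracketed tag to the sorted slide numbers whose notes contain it.

    notes_by_slide is a dict of slide number -> speaker-notes text. A single
    pair of brackets may hold several comma-separated tags (an assumption).
    """
    index = defaultdict(set)
    for slide, notes in notes_by_slide.items():
        for group in TAG_PATTERN.findall(notes):
            for tag in group.split(","):
                tag = tag.strip().lower()
                if tag:
                    index[tag].add(slide)
    return {tag: sorted(slides) for tag, slides in index.items()}


def strip_tags(notes):
    """Return the notes with bracketed tags removed, for speaking from."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub("", notes)).strip()
```

The second function addresses the complaint above about larding up the notes: the tags can live in the file while a cleaned copy is used at the podium.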

Tags are easy! Tags are useful! Let tags be tags!

January 5, 2009

Tags made smarter, easier

Sarah Perez at Read Write Web has a good post about a service that “understands” the meaning of your tags (Zigtag) and another that suggests tags based on its analysis of Wikipedia (Faviki). These services (I haven’t tried them) promise to make tagging yet more important by making it easier to apply tags and by letting us get more value from them.

September 28, 2008

University home page word cloud

Matt Pasiewicz at Educause has created a word cloud out of 1,000 university home pages. Nothing too surprising, but interesting nonetheless.
