Why Tagging Is Expensive
On the surface tagging seems to offer a new paradigm of organising information, one that reduces the cost of entry and so enables a long tail of participation to emerge. I’ve come to realise that the cost isn’t removed; instead it’s displaced and possibly increased. Tagging bulldozes the cost of classification and piles it onto the price of discovery.
The main proponents of tagging eschew formal classifications that require the user to invest time in understanding the scheme before it can be used. For example, Clay Shirky wrote a few days ago:
I [...] am of the unreasonable view that classification schemes are going to be largely displaced by tagging for the same reasons that search has largely displaced directories for finding things, namely that distributed intelligence, for all its faults, tends to beat the work of a professional class when dealing with large, dynamic systems.
Here Clay identifies the professional class, i.e. those who have invested the time up front in learning and understanding the classification scheme. Let’s call them librarians. There’s a reason why these people invest this time: it’s to help the thousands of people who visit libraries every day find things quickly.
Tagging makes things harder to find because it vastly increases the places you have to look. Here’s Keith Tipton’s view:
I can appreciate the growth of a collective bookmarking system which allows users to tag these bookmarks any way they see fit, but how do you best take advantage of such a tagging system for FINDING stuff? I either have to see what is found under a tag that I think is useful or appropriate, or I take a look at what some other people have bookmarked after seeing that they’ve tagged something I’ve tagged
In my view the total cost of an information retrieval system is the cost of classification plus the cost of discovery. In the formal classification world you have a very small number of people incurring a high cost in order to reduce the costs incurred by a very large number of people. In contrast the tagging world has the unit costs reversed: it’s cheap to classify, expensive to find. But the numbers of people involved are large in both cases so you end up with a lot of people paying a tiny cost to classify added to a lot of people paying a high price to discover. I think it’s pretty likely that the total cost is going to end up much higher than in the classification scenario.
What’s the cost I’m talking about? It’s people’s time. Time spent searching for things that should be easy to find.
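The arithmetic behind this argument can be put in a few lines of code. What follows is only a rough sketch: the function name and every figure in it are hypothetical, chosen purely for illustration and not measured from any real system. It assumes the total cost is simply (people classifying × time each spends classifying) + (people searching × time each spends searching), with time counted in seconds.

```python
def total_cost(classifiers, seconds_per_classifier, searchers, seconds_per_search):
    """Total time spent on the system: classification cost plus discovery cost."""
    classification = classifiers * seconds_per_classifier
    discovery = searchers * seconds_per_search
    return classification + discovery


# Hypothetical figures, for illustration only.
# Formal scheme: a handful of librarians each invest ten hours,
# so that a large number of searchers need about a minute each.
formal = total_cost(classifiers=10, seconds_per_classifier=36_000,
                    searchers=100_000, seconds_per_search=60)

# Tagging: a large number of taggers spend a few seconds each,
# but every searcher spends several minutes hunting through tags.
tagging = total_cost(classifiers=100_000, seconds_per_classifier=6,
                     searchers=100_000, seconds_per_search=300)

print(f"formal classification: {formal:,} seconds")
print(f"tagging:               {tagging:,} seconds")
```

With these made-up numbers the tagging total comes out several times higher, because the discovery term is multiplied by the largest population. Different assumptions about search time would change the result, which is exactly the point in dispute.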
I think at some level Clay realises these costs are shifted rather than removed and proposes that computer algorithms will be used to achieve the real cost savings:
Full text indexing, link analysis, trust networks, and related techniques now accomplish about 80% of what classification used to do for us.
[...]
Formal classification has centuries of practice supporting it, while tagging merely has a handful of early, incomplete examples. As a result, tagging does not have anything like the sophistication of classification systems — for tagging to work broadly and well we still need, inter alia, group tags; private tags; better user-defined thesauri; better tools for discovering latent communities; better tools for making time series; better routing labels like for: and via:; better traversal of the resulting for/via graphs; and ways of turning a collection of tags into site navigation, a sort of permanent card-sorting game that continually optimizes site navigation. For tagging to take over from classification, we need all those things and more, and we don’t have them.
If I didn’t know better I’d think this had the hint of AI and automated classification about it, but in any case, as William Blaze points out, these systems rely on structured data rather than the unstructured information Clay advocates:
Google is a great example, in some ways it is the triumph of the algorithm, yet it’s very existence depends upon high structured data. Without the DNS system Google would be worthless. Without html standards Google would be worthless. Imagine if each web page had its own definition of the anchor tag, Google would be worthless. Or if there was no standard way to declare a language for each page. And lets not even get into the fact that the best results in Google are often pages that are directories or in other ways feature highly structured data.
Without structured data, and the tagging advocates are proposing just that, how will these systems function? What would DNS look like if it were based on tagging principles I wonder? What would microsoft.com resolve to if the resolution was defined by the commons? Clearly tagging isn’t appropriate for every situation and here Clay agrees:
The question is not whether tagging systems can do everything formal classification schemes used to do — they can’t, but they don’t need to. The question is: which is a better fit for the requirements of labeling in a post-search world — tagging, or formal classification? And my answer is tagging.
The question remains: why, when the total costs are higher in a tag-based system, are its advocates so convinced that it’s the next big thing? Part of the answer lies at the end of Clay’s essay with an appeal to simplicity:
You could make a lot of money or win a lot of bar bets when thinking about the digital realm if you compare technologies between hard or easy, rigorous or sloppy, sophisticated or naïve, expensive or cheap, professional or amateur, and then bet on the things that have the most checkmarks in the right hand column. Tagging has a checkmark in all those boxes.
The argument here is that if something is simple it will be successful and backing successful technologies is always a good thing. I agree that tagging fits all the attributes that Clay lists and qualifies as a simple technology. I certainly don’t agree that simplicity automatically implies that the technology will be successful. It’s illuminating to use this classification scheme to categorise a couple of technologies from Tim Bray’s list of technology winners, “technologies that… have had a substantial and long-lasting impact of the practice of information technology”. I’ve chosen SQL and Java since they should be familiar to most people. Making the reasonable assumption that Clay is talking about how these attributes apply to the primary users of the technologies, not the implementors, I think it’s probably uncontroversial to categorise both these popular technologies as hard, rigorous, sophisticated, cheap and professional. Only one out of five of Clay’s factors for success holds for these important technologies. I think I just saved you a ton of money in the bar tonight. If you’re looking for a more realistic way to predict technology success then I strongly recommend you read the rest of Tim Bray’s technology predictor series.
I’ve come to the conclusion that this simple technology is hobbled by being expensive. This worries me. I’m not convinced that tagging will persist in the long term and I have a feeling that five years from now we’ll be looking back at the fad that was tagging and shaking our heads over the vast impenetrable databases of tagged content. Then we’ll get back to Google and to our librarians and get on with finding the things that matter to us.

September 7th, 2005 at 11:38 am
An interesting piece, and you’re right to say that the total cost of the system is classification + retrieval. But you don’t offer any explanation as to why you think retrieval is so expensive with tagging compared to classification. With tagging, you can use your own intuitions about how the thing you’re looking for might be tagged, and rely on the fact that some other people might think like you. With classification, you have to second-guess the classifier.
September 7th, 2005 at 12:03 pm
The advantage classification holds is that you can look up the classification you need and be confident that you have found all that is available. With tagging there is no way to exhaustively search all the possible tags that people might have used, in all possible languages and spellings.
September 7th, 2005 at 4:24 pm
You make some interesting points, but I disagree on many of them. Many of the problems you describe will be resolved over time - tagging is still a nascent technology. As it becomes more popular there will be substantial enhancements, although you seem to suggest the addition of augmentation might make it not pure tagging anymore. I disagree. I think many of the things you suggest are missing will come with time.
For example, at Shadows we already provide an enhancement called “Narrow Results” which shows you overlapping tags when you perform a search. As more people tag these intersections will tend to automatically provide broader coverage. But it takes time.
Things like thesaurus functionality, clustering, auto-generated communities, are all things that can be layered into any serious tagging system. As these things develop I believe you will see tagging develop as a tool for personal memory combined with a powerful mechanism for search and discovery - probably utilizing something that looks like classification but is really a clustering mechanism on top of tags. Again, I would point you to our narrow results option as a very early mechanism for just such a thing.
September 8th, 2005 at 7:14 am
But, it’s not about exhaustivity.
It’s about the ease of finding enough to satisfy the searcher’s need for information. If the need for information is very deep, then it’s up to the searcher to continue her research. If the need for information is fairly shallow (or rather, anything other than deep), the searcher can feel fairly confident that others have been interested in the topic area before and done enough prior heavy lifting on the classification and categorization of resources to satisfy her search.
The rest is here: http://cloudalicio.us/2005/09/07/its-not-about-exhaustivity/
September 13th, 2005 at 9:43 pm
One Librarian’s Take on Folksonomies
The blog entry below highlights one librarian’s take on folksonomies … an interesting read. For whatever it is worth, it looks like he may work for a library system vendor (talis).
http://wonderfulworldofmrc.blogspot.com/2005/08/folksonomi
September 14th, 2005 at 7:58 pm
Wow, I got cited — What am I, an expert?
The Silkworm blog actually cited me about something I really wasn’t trying to make a point about. My point was more about the “growth” of del.ic…
September 17th, 2005 at 3:18 am
http://www.abstractdynamics.org/linkage/archives/006392.html
Silkworm Blog: Why Tagging Is Expensive…
September 17th, 2005 at 8:01 pm
Mess of links for 17.09.2005
At Access Matters, I found a nice writeup of results of testing the interaction between JavaScript and screen readers. As a followup I’m curious to find out how tweaking the DOM using JS affects screenreaders and other accessibility technologies. The A…
September 19th, 2005 at 1:22 pm
Nice piece. I particularly like the dissection of Clay’s ‘bar bet’, & the pointer to Tim Bray’s rather more interesting thoughts on the subject. Applying Clay’s Easy/Sloppy/Naive/Cheap/Amateur metric to Tim’s (fairly uncontroversial) technology winners, Unix and OSS stand out as C+A+, PCs and the Web as E+S+N+ (and the S is generous). It’s hard to think of a successful technology that would tick all five boxes.
(And in old news, Shirky Makes Overblown Claim Shock; Rhetoric ‘Persuasive’, Say Witnesses…)
September 20th, 2005 at 6:44 pm
I think it’s worthwhile to consider tagging not as a replacement for classification, but as a complement.
Clearly tagging results in discovery of information that might otherwise remain hidden. Anyone who’s familiar with del.icio.us knows this.
September 22nd, 2005 at 9:30 pm
Why Tagging Is Expensive
Why Tagging Is Expensive discusses something about tagging you’d think more people would have mentioned: that tagging doesn’t eliminate the cost of classifying knowledge, it just moves it from the archivist to the discoverer. But there’s more. The disc…
September 23rd, 2005 at 9:43 am
That is exactly the point I was going to make Rouleur!
Tagging is about discovery.
September 27th, 2005 at 9:09 pm
RSS Feeds for Tagged Posts
As part of the ongoing experiment with adding tagging to this site, I’ve enabled RSS feeds for individual tags, e.g. web20 or tagging. I still think that tagging has value but it’s not a panacea….
September 27th, 2005 at 11:12 pm
Discovery = Time though.
People search for a reason: to find what they are interested in. They don’t really have time to discover other things they never knew about while finding what they were looking for in the first place.
There is actually the same discussion going on here:
http://www.rashmisinha.com/archives/05_09/tagging-cognitive.html
October 17th, 2005 at 3:42 am
articles on tagging (help?)
I’m working on a literature review of tagging for a class. I am particularly interested in the collective action and cultural convergence aspects work. I’ve been traipsing through various articles and blog entries on the topic and i’m wondering if folk…
October 17th, 2005 at 5:42 am
tagging - blog entries
Notes on blog entries about tagging: Ian Davis: Why Tagging is Expensive (September 7, 2005) “Tagging bulldozes the cost of classification and piles it onto the price of discovery.” Formal classification takes a lot of time - tagging overrides that,…