Random Ramblings about stuff I see going on in biotech, internet and the stuff I read.

Friday, July 21, 2006

Semantic Web, User Stupidity, and why I think we can beat the machines after all

Every now and then I see posts like this one where someone says they are going to "Tag" the post with a tag and that this will be useful in some way. I never understand it. I don't think it works, and here is why.

Users make up the Tags

and

Users are stupid.

I include myself in the above defined users. In the example at that page, Fred Wilson is saying that he will tag all posts about stocks with the tag "Stocks". This seems simple, but here is the problem. Someone else might be more granular than Fred and tag things as "biotech stocks" or "industrial stock", or they might use the singular and say "stock". And so, when I go to a site or run a search, I have absolutely no idea what tag people used. I fail, I get negative reinforcement, and I never go back.
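To make the "Stocks" vs. "stock" problem concrete, here is a rough sketch in Python. The post ids and the `exact_tag_search` / `loose_tag_search` helpers are made up for illustration (no real tagging service works exactly like this, as far as I know), and the normalization is deliberately crude.

```python
# Hypothetical posts and tags, invented purely for illustration.
posts = {
    "post-a": {"Stocks"},
    "post-b": {"stock"},
    "post-c": {"biotech stocks"},
    "post-d": {"industrial stock"},
    "post-e": {"mice"},
}


def exact_tag_search(posts, query):
    """Return ids of posts that carry exactly the queried tag string."""
    return sorted(pid for pid, tags in posts.items() if query in tags)


def loose_tag_search(posts, query):
    """Match on any shared word after lowercasing and dropping trailing 's'.

    This is a crude assumption about what 'the same tag' means; it would
    mangle words like 'class', so treat it as a sketch only.
    """
    def words(tag):
        return {w.lower().rstrip("s") for w in tag.split()}

    wanted = words(query)
    return sorted(
        pid for pid, tags in posts.items()
        if any(wanted & words(t) for t in tags)
    )


exact_hits = exact_tag_search(posts, "Stocks")
loose_hits = loose_tag_search(posts, "Stocks")
print(exact_hits)  # ['post-a']
print(loose_hits)  # ['post-a', 'post-b', 'post-c', 'post-d']
```

The exact search finds one post out of four that are about stocks. The loose version does better, but only because somebody sat down and guessed at the ways people vary their tags, which is the whole problem.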

You can see this on Flickr - where you can't just look at one stream and get everything that is pictures of something. You have to look at a couple of tags. If you're lucky you will see a post that carries some other tag as well, which allows you to wander on (a rosetta stone if you will).

Why does this matter to me so much? Transgenic Mice and my thesis. AND the notion of this Semantic Web thing that keeps getting batted about. You can see Scoble talk about it too sometimes.

MeSH headings (the National Library of Medicine's Medical Subject Headings) are a bit related to tags, in that you can search for things that fall in a MeSH heading on Medline. They seem to be more hidden these days than they used to be, but they are still there lurking uselessly in the background. They are assigned by editors to put papers in categories so that one could browse down a hierarchy and end up with only papers talking about, say, "Transgenic mice".

Now the problem. Papers are assigned to the hierarchy by people. They read and make judgements about where things should go, and the papers are assigned.

When Transgenics first came out, people used all kinds of words for them (and the free text search engines were slow anyway) AND then, to cap it off, the editors at the National Library of Medicine decided to put them in at least 5 different MeSH headings. SO - if you were doing transgenic work close to the time that transgenics were a new thing (that would date me a bit...) then you had a problem finding all of the literature on the subject. Eventually, after a couple of years, they got it together and solved this problem... only to have it again when knockout mice came on the scene. By this time, I was only doing full text searches anyway, so I didn't care as much. BUT - the point is - Tagging, when done by humans, is useless.

Humans have opinions about things (for the most part) and not all of them are enlightened enough to just fully agree with me. This means that when you label something, I may not agree with the label you put on it.

A further problem crept into this when, a few years ago, I was working with some people at GSK who were putting in a system to manually categorize every bit of paper they had at their site, put it in a computer, and then have this mass repository of stuff that would somehow magically produce drugs faster. They were just starting to run into the problem that if you gave the same bit of paper to a couple of people, they wouldn't all categorize it the same way. There would be subtle differences that would cause them to read it just a bit differently, and thus they would file it differently.
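If you wanted to put a number on that disagreement, a back-of-the-envelope sketch might look like the following. Everything in it is hypothetical: the filers, the category names, and the four documents are invented, and I have no idea what (if anything) the GSK group actually measured. It just counts how often people picked the same category.

```python
from itertools import combinations

# Hypothetical categories that three people assigned to the same four documents.
labels = {
    "filer_1": ["assay", "toxicology", "assay", "legal"],
    "filer_2": ["assay", "safety", "screening", "legal"],
    "filer_3": ["screening", "toxicology", "assay", "legal"],
}


def pairwise_agreement(a, b):
    """Fraction of documents that two filers put in the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def unanimous_count(labels):
    """Number of documents that every filer categorized identically."""
    return sum(len(set(column)) == 1 for column in zip(*labels.values()))


# Agreement for every pair of filers, keyed by the two names.
agreement = {
    (p, q): pairwise_agreement(labels[p], labels[q])
    for p, q in combinations(labels, 2)
}

for pair, score in agreement.items():
    print(pair, score)
print("unanimous documents:", unanimous_count(labels))
```

In this toy data, no two filers agree on everything, and only one document out of four gets the same category from all three people. Simple percent agreement like this ignores agreement by chance, so it is a first look rather than a proper statistic.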

Another similar problem creeps in when you try to file paper in files by company name (say, all of your legal files). How do you deal with University of Southern California vs. University of California at Los Angeles? + all the other University of.... 's --- You can use the full name of every university and spend a lot of time typing and have very long file names. Or you can always shorten to U.S.C. or USC or U SC (but then what do you do with South Carolina?). Given this problem, each person will pick a system that, to them, makes all the sense in the world. Others will look at the system and be like "what were they thinking" and then they will make statements like the one I made at the beginning: "users are stupid".

Outsiders would look at my filing system and (I think) would understand it. BUT they would be unlikely to have duplicated it without peeking, as the shortcuts I use are based on my background and my perception of the world. I have just had to teach this to the woman I hired, as we have to share the filing system back and forth, and she admits it makes sense. I have clear conventions etc... but she also said it was totally different from what she had done at her previous job. They had different conventions. Neither of us is "right" but it just goes to show that from the computer's point of view - users are stupid.
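The USC problem is easy to reproduce. Here is a small sketch, assuming one made-up convention (take the first letter of every capitalized word); the `initials` helper is hypothetical and is just one of many shortcuts a person might pick.

```python
from collections import defaultdict


def initials(name):
    """Hypothetical convention: first letter of each capitalized word."""
    return "".join(word[0] for word in name.split() if word[0].isupper())


names = [
    "University of Southern California",
    "University of South Carolina",
    "University of California at Los Angeles",
]

# Group the full names under the short label the convention produces.
folders = defaultdict(list)
for name in names:
    folders[initials(name)].append(name)

# Any short label that ends up holding more than one name is a collision.
collisions = {label: held for label, held in folders.items() if len(held) > 1}
print(dict(folders))
print(collisions)
```

Two different universities land in the same "USC" folder, while "UCLA" stays unique. A different person with a different shortcut rule would get a different set of collisions, and neither of us would be "right".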

I think they are going to build massive engines and interpreters and other confabulators to deal with this, and they may even do it - but I will be real surprised if they pull it off, because even if you do - if it is dependent on the user doing anything then it won't get done. I could tag these posts, but I don't. Why? Because I am lazy. I think most people are lazy and they won't do this tagging stuff because they don't really care that much about it and they have better things to do with their lives.

So - In summary - Users are both stupid and lazy.

1 comment:

Anonymous said...

priceless! hahahaa! totally classic...

tags are retarded. users are stupid. but even if they're not, tagging and classification of any knowledge base is a stupid concept to pursue no matter how good you get at it.

i just feel like the whole human kind is obsessed with being able to categorize something. why? so that something new we encounter can be classified into one of those familiar categories, and 'tagged' as something bigger and familiar, so we feel more secure when facing it. it's like general human nature or something. each new thing has to BELONG TO SOME CLASS before we can process it...?
it even applies to ethnic or racial classification. what has it created? only confusion and labeling/mislabeling... while some hispanics can be absolutely white with blonde hair, they are "latinos," but some italians with dark skin, hair and eyes are "caucasian". meanwhile, we evolve to new mixes and new terms that cross all the preconceived labels.

but anything truly novel, especially in scientific literature, even if it is a product or mix of something old, has to be immediately categorized as a new label of its own in order to receive the appropriate initial regard until with time even that label will inevitably become outdated.. SO HOW ABOUT NO LABEL AT ALL... but then the comfort blanket of familiar identification is gone, which makes it very hard for people to accept and relate to something new, especially when it comes to knowledge and scientific information. even the process of accepting something totally new gets a label/tag that says "::blink-blink:: NEW!!! ::blink-blink::" and is a brave process full of risk, which leads to this "NEW" category anyway... we are dooomed!
but seriously, in scientific research areas, semantic labeling is indeed plain stooopid, because phrasing is such an individual thing... and so subjective... nobody can predict how the masses will think. we don't have collective consciousness ... like the borg or taelons [sorry for the geeky sci-fi ref's] to formulate things in the same way. but that is also our strength, which drives our creative progress.
we just need to rid ourselves of the stupid need to classify everything into some familiar bins, especially when the bins are pure semantics! lol

so you think some of it may be an influence of clinical term categorization and tagging? because there it seems needed sometimes to assess patients' well-being, levels of pain, different stages of disease progression, etc... and we all know how the clinical world is intertwined with basic science research... it's just that clinical terms are taught in a more standardized way to ensure something of a collective consciousness approach. so tag systems for some clinical databases could probably be of use in clinical trials, etc... and therefore, be less stooopid.
so then, perhaps, as long as consistent tag usage provides benefits and shows some merit somewhere, it can never be abandoned as a whole... (?)
hmmmm..

sorry for my rambling inspired by 'somethingsyouwrite'

good night :)

PhD Dropout.