Accidental linking and hard-wired category schemes

Thursday, June 14, 10:18 AM -- Jimbo Wales wrote recently on Wikipedia-L:

"Accidental linking" is important. Having page names like "Humanities/Philosophy/Existentialism/Sartre" might have some appeal, but they would cut down on the ease of use for new authors, which is very important.

This was written in response to Krzysztof P. Jasiutowicz, who had asked, "Are we making the future 100,000 pages totally flat?" i.e., are we not going to impose some sort of hierarchical structure onto the body of Wikipedia articles?

I think this exchange is very interesting. First, I think Jimbo has a good point about why having multiple layers of subpages is a bad idea: you won't know how to link to something without either exploring the structure or doing a search, both of which are time-consuming (and off-putting to new authors, whose ease of use is, indeed, very important). So it's better if we can reasonably expect Sartre or Jean-Paul Sartre to be an article (with one redirecting to the other); we just wikify whatever form of the name we think is appropriate (have a look at naming conventions if you're interested), and go on. This is called "accidental linking."

Perhaps more interesting is to ask why multiple layers of subpages make accidental linking difficult. It's obvious enough, but also enlightening to contemplate: multiple layers of subpages require that people learn our categorization scheme in order to know exactly what a page's address is. But, for better or worse, there isn't any one obvious category scheme in existence, much less one correct category scheme. Why not? Good question; a subject for another column.
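To make the point concrete, here is a minimal sketch in Python. It is only an illustration, not how the wiki software actually resolves links: the page sets and the helper names `resolves` and `candidates` are hypothetical, and it assumes a link works only when its text exactly matches a page address.

```python
# Hypothetical page addresses, for illustration only.
flat_pages = {"Sartre", "Existentialism", "Philosophy"}
nested_pages = {
    "Humanities/Philosophy",
    "Humanities/Philosophy/Existentialism",
    "Humanities/Philosophy/Existentialism/Sartre",
}


def resolves(link_text, page_addresses):
    """Assumption: a link works only if its text is exactly a page address."""
    return link_text in page_addresses


def candidates(link_text, page_addresses):
    """Addresses whose last path component matches the link text.

    This stands in for the search or browsing an author would need
    in order to discover the full address.
    """
    return sorted(
        address
        for address in page_addresses
        if address.split("/")[-1] == link_text
    )


# An author who simply wikifies the name "Sartre":
flat_hit = resolves("Sartre", flat_pages)      # link works by accident
nested_hit = resolves("Sartre", nested_pages)  # link fails without the full path
needed = candidates("Sartre", nested_pages)    # what the author had to look up
```

In the flat case the author's guess is already the address; in the nested case the guess is useless until the author has learned where the scheme happens to file Sartre.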

It's also worth asking what exactly the difference is between a "hard-wired" category scheme, on the one hand, and what we have now, on the other. Wikipedia at present does, after all, have something like a category scheme, in that we have links in our articles from topics to what might be considered subtopics of those topics. Doesn't that constitute its own sort of structure? Yes, or so it seems, but there are two differences:

  1. A hard-wired scheme is inherently more difficult to change (if it's not, it's not hard-wired). This is contrary to the present Wikipedia (and Nupedia) system, in which anyone can go in and rearrange categories, add them, delete them, create entire new "high-level" pages, create "interdisciplinary" pages, etc.
  2. A hard-wired scheme--at least in its traditional format, as can be seen on Yahoo! and dmoz.org--has a definite topic-subtopic hierarchy; the main topic bears a somewhat similar relation to the subtopic that a parent does to a child. Wikipedia pages, on the other hand, except where subpages have been created, have merely links to other pages, and once you get a few links past the HomePage, there is no implied or logically inferable hierarchy. Psychology and philosophy will both link to philosophy of psychology (which will in turn link back to both), and from this link structure alone it will be impossible to determine which page is supposed to be the "parent" page and which page is the "child" page.
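The second difference can be sketched the same way. The Python below is a rough illustration under stated assumptions: the helper names `parent_from_path` and `one_way_links` are made up for the example, the slash-separated address is the hypothetical one quoted from Jimbo above, and the link table contains only the three pages just mentioned.

```python
def parent_from_path(address):
    """In a path-based scheme the parent is encoded in the address itself."""
    head, separator, _ = address.rpartition("/")
    return head if separator else None


# Hypothetical link table: each page maps to the pages it links to.
links = {
    "Psychology": {"Philosophy of psychology"},
    "Philosophy": {"Philosophy of psychology"},
    "Philosophy of psychology": {"Psychology", "Philosophy"},
}


def one_way_links(link_table):
    """Pairs (a, b) where a links to b but b does not link back.

    A one-way link is the only directional hint a bare link table
    could offer about which page is "parent" and which is "child".
    """
    return sorted(
        (source, target)
        for source, targets in link_table.items()
        for target in targets
        if source not in link_table.get(target, set())
    )


hard_wired_parent = parent_from_path("Humanities/Philosophy/Existentialism/Sartre")
flat_parent = parent_from_path("Sartre")
directional_hints = one_way_links(links)
```

With a hard-wired address the parent falls straight out of the name. In the link table every link is reciprocated, so there is nothing left from which to infer a direction.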

Both of these differences are good things, for a variety of reasons. I would explain in more detail, but I am out of time. Perhaps the next column will address the question: why is it preferable to have a category structure such that, simply by viewing the linking relations between pages, it is impossible to determine which page is "parent" and which is "child"? I fully believe it is preferable. Why?

By the way, this is one more reason to eschew subpages on Wikipedia (cf. my last column)--as I might also explain.

--Larry_Sanger