What is an encyclopedia?

Saturday, September 1, 2001, 12:00 PM -- Some recent events on Wikipedia have raised a question that has been idly bothering me for well over a year now: what is an encyclopedia, anyway? I'm not so much interested in historical definitions; I'm interested in a sort of prescriptive, revisionist definition that Nupedia and Wikipedia can actually use (after some debate, perhaps) in conceiving of their projects.

First, what sort of knowledge is included in an encyclopedia? The following is going to be more or less a ramble, not a careful academic discussion.

I think I have said sometimes that an encyclopedia is a repository of empirical knowledge, but this is not quite right. On nearly all accounts, math and logic are nonempirical knowledge, and yet it is entirely appropriate that we include that sort of knowledge in an encyclopedia. I think it might be better to say that it is synthetic knowledge, as opposed to analytic. (Philosophers, please forgive me for glossing over many subtleties here. I'm just getting one view of the lay of the land in this essay.) Analytic knowledge (as Kant would say, approximately) is the sort of knowledge you can get simply by an analysis of a concept (as he put it, "the predicate is contained in the subject," as in the sentence, "that bachelor is unmarried"). By extension, we can view analytic knowledge as the result of analysis of the meanings of words. Synthetic knowledge, then, is everything else--non-analytic knowledge.

So, putting concerns about the analytic-synthetic distinction aside, we could say that an encyclopedia is a repository of synthetic knowledge. This, however, will not do, because there is plenty of synthetic knowledge that has no place in an encyclopedia; for instance, I took the dog for a walk this morning and had cereal for breakfast. In theory, we could include such knowledge in an encyclopedia (there would be no problem categorizing it, for one thing: put it on a page titled "What Larry Sanger Did on September 1, 2001").

Perhaps we're interested in general facts, not particular ones. But this isn't right, because there are very many particular facts that are of the utmost importance, such as the particular fact that John Wilkes Booth shot Abraham Lincoln at a play performance in 1865.

So then we might say that we are interested in important synthetic knowledge, admitting that some of both general and particular knowledge is utterly unimportant (we needn't report in an encyclopedia that 236+362=598, but it's perfectly general). But this raises the question of what "important knowledge" means--what might seem to be important to me will seem utterly trivial to you. For example, I'm very interested in the life of an Irish fiddler named Mickey Doherty, who died in 1970. I would love to see an article about Mickey's life. But you might think that that's so utterly trivial and unimportant that we shouldn't have such articles in the encyclopedia.

The latter would be a serious, vexed question if we were working with a paper encyclopedia. But since we're not, we can generously welcome into the fold of the adequately important nearly anything that anyone deems important. Still, this description ("nearly anything that anyone deems important") lacks any sort of prescriptive purchase. It's important to me that I finish this essay within a reasonable amount of time, but that's not a fact that belongs in an encyclopedia. And if we say that "adequately important" means "anything anyone deems important enough to put in an encyclopedia," we are committing a sort of circularity. It's not strictly circular, of course: in order to determine whether fact F should be in the encyclopedia, we check around to see if anyone deems F important enough to put it in an encyclopedia. But there is a practical sort of circularity: suppose I don't know whether F is important enough, and no one has any opinions on the subject. How do I decide?

Well, I'd propose the following as one way in part to make the determination. If, when Nupedia and/or Wikipedia are powering along at their most active rate, some years in the future, it is practical for the members of the project to keep track of all facts similar to fact F, then we should keep track of F. For instance, if it is practical, at that point, for us to keep track of all living composers who have had their work performed by some community's orchestra, then we should keep track of each of them. If, on the other hand, this proves to be impossible, then we shouldn't.

But this still leaves it undecided whether any of the "facts similar to F" are important at all. So I'm forced to come up with some sort of rule. But this doesn't seem too hard, if I'm making a rough first guess: we can always say that one fact, F1, is more important than another fact, F2, if F1 has had a greater impact on a greater number of people than F2. Exactly how that is determined is unclear--this formula doesn't help at all with the borderline cases. But it does seem fairly clear that, according to this formula, it's more important that the U.S. exploded an atomic bomb over Hiroshima than that I ate cereal for breakfast this morning.

Now, suppose that, after a great deal more philosophical wrangling, we had arrived at some reasonable and useful account of "adequately important synthetic knowledge." I would then want to point out that, as epistemologists are fond of pointing out, there are two types of knowledge, namely declarative and procedural, or knowledge of the truth of propositions and knowledge of how to do things. With some exceptions, most traditional multivolume encyclopedias have focused on declarative knowledge at the expense of codifying procedural knowledge. But I see no good reason for this, especially for Nupedia and Wikipedia, which do not have the space constraints that older encyclopedias had. (The only thing that constrains size, actually, is the average number of people, working an average amount of time on the thing any given day. This is the main reason why Wikipedia is so much bigger than Nupedia.)

There's an entirely different consideration to bear in mind in characterizing the sort of knowledge that is in encyclopedias. We do not mean knowledge in a strict sense, in the sense that philosophers were after when they produced the old definition, "S knows that p if, and only if, p is true, S believes that p, and S has a justified belief that p." Rather, we mean alleged knowledge, or information taught as knowledge, or, we might say, "educated belief." When postmodernists use the word "know" as in "what this community knows," they are using the term in this sense; it doesn't mean belief, precisely, but rather something more like belief that is generally accepted, by some people, as knowledge or as very probably true. But I'm not going to try to give an account of what sort of knowledge this is (it's a sense of the word that analytic philosophers actually haven't given a lot of attention to, except to say that the sense exists). I'm just going to call it "human knowledge" and move on.

There is one important result, however, of the fact that general encyclopedias codify "human knowledge": general encyclopedias ought to be written from a neutral point of view. Where one controversial view is presented as fact, or is asserted as being probably true when a substantial number of experts or concerned parties would disagree with that, the reader of the encyclopedia is given a skewed view about what "human knowledge" of the topic consists of. To be given a really accurate view of "how the experts think" about a topic, it is important to represent, fairly, all the views of the experts, whoever they might be.

Getting back to the main task at hand, I would say that encyclopedias codify adequately important synthetic "human knowledge," both declarative and procedural. Let us call this encyclopedic knowledge for short. Encyclopedic knowledge is the sort of knowledge that we ought to find in general encyclopedias.

This might (or might not) articulate what sort of knowledge an encyclopedia contains, but to say what an encyclopedia is, it is not enough to say that it is a text that contains encyclopedic knowledge. I could write one article that codifies some chunk of encyclopedic knowledge, but that would not constitute an encyclopedia. In order to be an encyclopedia, there have to be a lot of articles; and if we're talking about a general encyclopedia, they are articles about everything (well, everything within the constraints of the aforementioned account of encyclopedic knowledge). So we might say: an encyclopedia is a text that contains articles expressing all of encyclopedic knowledge.

There's much more to be said, but this looks like a good first stab. --Larry_Sanger

Encyclopedias have always been related to a whole series of pedagogical functions. In particular they've been used as tools for introducing someone to the group of subjects deemed important by those constructing the encyclopedias. This process is part of the enculturation process, and that means that those who produce encyclopedias have an obligation to present warranted information. The combination of the introductory nature of the pedagogical function of the encyclopedia and the requirement that all information in the encyclopedia have the elusive quality called warrant means that encyclopedias have historically been inherently conservative in terms of what "knowledge" they include.

As far as I can tell the wiki process has exactly the opposite effect. Ward's Wiki developed first as a PatternsRepository, and then became a resource for XP programmers. Both of these fields were new at the time, and they grew up through and around the wiki process, with information on Ward's Wiki receiving warrant through the collaborative wiki process, which helped a culture come to agreement about where to draw the distinction between knowledge and opinion in these fields. Though the technology is new, every academic discipline is defined by a community that follows a roughly similar process.

As I see it, there are contradictory pressures at work on the wikipedia: we want to function both in the traditional pedagogic role of the encyclopedia, but we also want to be a resource for those who want to take it further, and participate with the academic community in understanding the cutting edge knowledge on various subjects. But the very requirement to "say only things all reasonable people can agree to" which has informed the traditional encyclopedia and the pedagogical roles which surround that process can be coercively inimical to the desire to create an academic community, which is why people like Peter Wozniak have left the Wiki process, or decided to only commit to creating lightweight, introductory knowledge with no research behind them.

I don't know how we want to deal with this tension, but I think that this is the essential thing we need to do if we want to define what kind of thing the wikipedia will be. That is not to say that I think the distinctions between the kinds of knowledge you make in the above article are unimportant, but I think the key question is what kind and how much warrant must a statement have to be included in the Wikipedia, and what kind of structures are we going to implement in order to accommodate both introductory material, and material which may be of interest to more serious students. --Mark Christensen

Mark, it sounds like you are after a more careful explanation of what I was calling "human knowledge." We agree that encyclopedias are not repositories of what just anybody thinks--they're repositories of expert knowledge, more or less, or what passes for expert knowledge. I guess I agree that, to qualify as such, this knowledge has to have some sort of warrant, in the sense that indeed there are certain criteria a bit of information would have to meet to constitute "human knowledge" or "expert knowledge." I'd admit there must be that sort of warrant, but I'm not sure if this is what you mean. Your question seems to be what sort of criteria for warrant we should recognize. In practical terms, the thing to do is to look at what recognized experts in a field believe. However that might be, you seem to identify "warrant" (in one sense anyway) with a conservative tendency (i.e., to say what the experts believe). Then you say that the wiki format militates against this sort of conservatism and therefore against (that type of) warrant:

As I see it, there are contradictory pressures at work on the wikipedia: we want to function both in the traditional pedagogic role of the encyclopedia, but we also want to be a resource for those who want to take it further, and participate with the academic community in understanding the cutting edge knowledge on various subjects. But the very requirement to "say only things all reasonable people can agree to" which has informed the traditional encyclopedia and the pedagogical roles which surround that process can be coercively inimical to the desire to create an academic community, which is why people like Peter Wozniak have left the Wiki process, or decided to only commit to creating lightweight, introductory knowledge with no research behind them.

I think the above contains three (what I think are) confusions, and I hope it will help for me to explain them:

1. I would say that we aren't essentially engaged in pedagogy here--I won't dispute the historical point (I have no idea whether it's true). Sure, we want to make it as easy as possible for people to learn from our articles, but that does not mean that the best description of the function of this encyclopedia--or many other modern encyclopedias--is a pedagogical one. When I say that it's a repository of human knowledge, I mean it; one can learn from the contents of a repository of knowledge, of course. One can also learn a lot from the contents of a library, but that doesn't make the function of a library primarily pedagogical. One can even strongly encourage that people make it as easy as possible to learn from the contents of the repository, but even that doesn't entail that the purpose or function of the repository is primarily one of teaching students.
2. I would also say that we are not encouraging "cutting edge" research, though we certainly do encourage reporting on the latest research about nearly anything. This is another subtle but important distinction. It means that if Piotr wants to come in and advance the latest theories about the function of sleep (just to take an example), he is not free to do so. What he's free to do is to report about the latest theories about the function of sleep, hopefully without advancing any one of them as the correct theory (unless, of course, scientists have recently achieved a general agreement on the subject, which many of us find highly unlikely, Piotr notwithstanding).
3. When you speak of "the very requirement to 'say only things all reasonable people can agree to' which has informed the traditional encyclopedia and the pedagogical roles which surround that process," I'm not sure exactly what you mean. In (3a) and (3b) I will address two different interpretations of it.

3a. We certainly don't have a requirement to try to come up with a single viewpoint on each issue that somehow represents "the reasonable point of view," such that that becomes the official Wikipedia view of the subject. I doubt this is what you mean, but it might be. Whether or not there ever was such a widespread requirement for traditional encyclopedias, we have been tolerably clear that that is not a requirement we have for Wikipedia (or Nupedia). I, at least, have repeated this point, I imagine, a half-dozen times. This alleged requirement would represent a misunderstanding of the NeutralPointOfView. I and others have said many times that what we want is fair statements of the different possible views on different controversial subjects--and we leave it up to the reader to decide which is correct. I doubt this is the traditional approach, actually.

3b. On the other hand, in interpreting "the very requirement to 'say only things all reasonable people can agree to' which has informed the traditional encyclopedia and the pedagogical roles which surround that process," you might want me to include fair statements of competing views among the items about which all reasonable people can agree--and in that case, I would say that Wikipedia and Nupedia do indeed have such a requirement (although it's not necessarily connected with any central pedagogical role).

Now that I've explained what I think were the confused assumptions behind your point, let's return to the point itself: "people like Peter Wozniak have left the Wiki process, or decided to only commit to creating lightweight, introductory knowledge with no research behind them." This you regard as a problem, with the items I said were confusions as the cause of the problem. Well, that's interesting. First--without naming names--if indeed there are highly-qualified people who don't feel inclined to write about their specializations for Wikipedia, and the reason for this is that they think Wikipedia is insufficiently accepting of reports about cutting-edge research, they're simply mistaken. Perhaps, indeed, they have misconstrued the nonbias policy (as I explained); or perhaps, as you seem to be implying, they think Wikipedia aims too much at conservative pedagogy. But I think there's a more plausible explanation of any such problem (see below). I also deny that it's much of a problem (also see below).

With all this analysis finished, it should be reasonably easy to understand my reaction to your last paragraph:

I don't know how we want to deal with this tension, but I think that this is the essential thing we need to do if we want to define what kind of thing the wikipedia will be. That is not to say that I think the distinctions between the kinds of knowledge you make in the above article are unimportant, but I think the key question is what kind and how much warrant must a statement have to be included in the Wikipedia, and what kind of structures are we going to implement in order to accommodate both introductory material, and material which may be of interest to more serious students.

First, I deny that there is any such tension. I will explain that more soon. Second, I don't think there's any important question of encyclopedia policy that rests on the question "how much warrant a statement must have" in order to be included in Wikipedia. As I said, it seems the only sort of "warrant" a candidate bit of information has to have is the warrant for thinking that it's information regarded as knowledge by some expert on the subject. (More or less.) What the experts think is important information, and we are not better placed to judge on their subjects than they are. Third, I don't know why you think there is a need for any kind of "structures" that will somehow "accommodate" both introductory and advanced material (don't we already accommodate it?); maybe I am simply not understanding, though.

I think your argument can be summed up as follows. Experts arrive at Wikipedia and, unfortunately, they don't want to write about their areas of expertise. This is (you seem to think) because Wikipedia has a tendency toward conservatism and has a pedagogical mission, which waters down the material and drives the experts away. To solve this problem, we should perhaps establish some sort of structures that will accommodate the experts, so they'll feel more welcome.

My reply can be summed up as follows. I am skeptical that there is a problem, and skeptical that any such problem, if there is one, is actually caused by what you say it's caused by.

I'm an expert about a few different topics in epistemology and (was, anyway) about aspects of the philosophies of David Hume, Thomas Reid, Descartes, and a number of other philosophers. I have written rather little about these topics for Wikipedia. I now ask myself why. It certainly isn't because I feel somehow put off--that my efforts would not be welcome. I know they would be perfectly welcome. I imagine a lot of it has to do with the fact that there is so much else to do first. How can I write an article about epistemic circularity, for example, when the epistemology area in general is still in very, very rough shape?

I think maybe a lot of the others, who are experts on various stuff but don't write much about that stuff, feel the same way. Why write about the specific polymer you're doing experiments about, or about obscure programming methods you've studied, when the basics of your field still need to be filled in and tightened up?

So there's no serious problem here, I think: it's a good thing that we're filling in the basics first. This gives structure and context to more advanced stuff.

But in a few years, I imagine the basics will be filled in and tightened up in most fields. Then it's going to become a lot more interesting for the experts to participate in their capacity as experts--and I predict that they will participate, too, simply because Wikipedia is fun. --Larry Sanger

Why do you want to exclude analytical knowledge from an encyclopedia? Isn't math analytical knowledge by your definition? --AxelBoldt

It's actually a controversial question whether much of mathematical knowledge is synthetic or analytic (though some of it is certainly analytic if you think anything is--many philosophers deny there's any analytic knowledge). Kant, for example, thought that "7+5=12" was synthetic a priori. Anyway, I overgeneralized--obviously, there's quite a bit of analytical knowledge that should be included--but only as an aid to understanding the synthetic knowledge. It might be better in the end not to try to characterize the sort of knowledge that goes in an encyclopedia as "synthetic," but the main point of doing that would be to exclude mere dictionary definitions. This is an arbitrary distinction, perhaps, but I think it's very useful to use dictionaries to find out the meanings of common words, and to use encyclopedias to discover knowledge above and beyond that, including the meanings of jargon. --LMS

I think that one of the aspects of wikipedia work has to be pleasure taken in writing. Writing in my area of expertise can occasionally be pleasant because I can tell someone things they don't know which I do. However, I know lots of things about lots of other areas that are still worth saying, and in those areas I don't worry nearly so much about what I'm leaving out as I do when writing about the early middle ages. Those of us who are teaching faculty all have the same feeling when we are teaching a survey course and need to MOVE ON from the period we are most interested in - we sweat at the thought that the students will never know about X if we don't tell them right now! I get more immediate gratification writing for wikipedia than I'm getting out of the article due in October which I'm avoiding working on at this very moment. The long-term pleasure from the article may be great, but fine-tuning the argument is driving me nuts. Hence, I'm thinking about ancient and medieval slavery instead. --MichaelTinkler

Actually, as I understand it, Peter Wozniak feels it is too easy to change detailed information. When he, an expert in his field, adds such information, someone less expert changes it, and the work he put in to make sure the article was technically accurate is lost. He then spends a lot of time re-creating that work: because previous versions are retained for only a few days, he has to either rewrite the article from scratch or keep his own copy of the article in order to fix the problems introduced by non-experts.

Well. I watched most of the sleep/learning horror unfold, and I was not impressed with Wozniak's diplomacy. No one's expertise is above question in Wikipedia OR in real life, and he did not accept questions graciously, provide fuller explanations, or show a willingness to discuss what he in his expertise considers settled questions. --MichaelTinkler

Would be nice to know who I'm talking to here. I suspect you (like Piotr) are relatively new here, and therefore fail to appreciate what actually goes on. In my daily experience (from the beginning) on Wikipedia, it has been occasionally true that someone who is inexpert in a subject will edit a part of an article written by an expert, and the result will be a degradation of quality. But this is fairly rare; and often, the foul is not serious, or is simply debatable. In point of fact, most people here are reasonably good judges of what they can and cannot credibly write about. They have the politeness and humility not to pretend to be able to write authoritatively on aspects of subjects that are currently beyond their grasp. There are exceptions and everyone occasionally overreaches, but these are very ably handled by the overall process--it's very robust.

My understanding of Piotr's case, which to my knowledge hasn't been disputed seriously, is that he insisted on making a page that he wrote reflect only one (of many) views about the purpose of sleep, which is contrary to the NeutralPointOfView. He was rightly called on this, and in self-defense said that he was the expert and the others, who demanded a more balanced treatment of the subject, shouldn't be able to edit his work. I think those of us who objected were very right to object, and that Piotr simply failed to understand what's going on here on Wikipedia. It requires a sort of give-and-take that Piotr and, understandably, many other traditional scholars might not be willing to engage in. If you or they don't like this, you are encouraged to go to http://www.nupedia.com/ and http://chalkboard.nupedia.com/ . But don't complain that Wikipedia isn't more closed--it is very open on purpose. It's what it is because it's open. So don't try to make it more closed!

Notice, the reason Wikipedia is so active and successful is precisely that it is so open. It's perfectly understandable that there are many experts who cannot work in such an environment--and not just experts, but anyone who simply hates the idea that their work can be edited by any passer-by. I think most of us have come to the understanding that Wikipedia does not have authors, per se, but contributors--I and many other people, as it were, take responsibility for the whole thing. Of course, there are bits of text I care more about (because I know more about them, or because I worked on them). But I am very comfortable with the fact that the article can take on a life of its own. This is a good thing. I also think there's ample evidence that the outcome, in the end, will be a lot of really good articles. Over and over again we see the process resulting in balanced, well-informed discussions of this and that. Snooty naysayers have fewer reasons to think Wikipedia cannot produce really excellent content all on its own.

I have been thinking about what Wikipedia will be like in about two or three years. In that time, nearly all the basics of nearly all subjects will be filled in and explored. Dilettantes will find nearly nothing to do--only in increasingly specialized areas will there be room to explore. By then, the encyclopedia will be overrun by open-minded scholars, who look at results rather than degrees, and who love the idea of working together to report on even the most detailed results in their fields.

--Larry Sanger

In science one strives for "operational" definitions: that is, it's meaningless to define words in terms of theories or abstractions--they should be defined in terms of actual physical experiments that one can perform and observe. For example, the meter--the metric unit of distance--was once defined in terms of a single physical artifact, as the kilogram still is. When scientific instruments reached the point where it was possible to count wavelengths of a laser beam emitted from a certain apparatus (and it became necessary to have a definition that precise), it was redefined in those terms. Now it is defined in terms of the speed of light in a vacuum, because we can now measure that with great accuracy.

An operational definition of "encyclopedia" to me is roughly "Where I go to look up basic information about some subject that isn't my pet subject, but that I assume somebody more interested has already compiled." I go to an encyclopedia when I have a need to know something like "Who was the King of Sweden in 1875?", "What won the Best Picture Oscar in 1945?", "What's the difference between a donkey and a mule?", "What is the density of seawater?", "What other films was that actress I just saw in?", "How do I convert pounds to kilograms?", "The news just told me that Justin Wilson died--who was he, anyway?", "Just where is Bosnia, what kind of people are there, and what language do they speak?", "What other books were written by this author I just discovered?"

So the primary value of an encyclopedia article for me is completeness in covering basic facts about things. Not necessarily in-depth analysis (though there's certainly no downside to having that as well), but just who did what when. What is the name, or number, or date I'm looking for? If I run across interesting details while I'm there, that's cool too, but it's got to have the basics.

Coverage of the natural world should be easy. A page for every kind of animal, plant, land formation, weather pattern, planet, chemical element, mineral, important compound, form of energy, etc. History of nations and governments should be easy, as well as important aspects of culture like movies and music--sure, there will be subjective statements in a few of these as well, but as long as the facts are there we have something useful.

Biographies are important, and should be easy. There have been thousands of people important to someone in varying degrees, and many of them are probably quite controversial in some way. I don't expect an encyclopedia to make up my mind, but I do expect it to tell me when and where the person lived, what he wrote or spoke about, what he accomplished, who were his friends and enemies, and what the controversies were about, if any. A picture is always nice.

Abstract concepts are trickier: I don't know that I would trust any encyclopedia to give me an unbiased account of what people think of some subject like "capitalism" or "abortion", but I would expect it to tell me that Adam Smith wrote Wealth of Nations, and that Roe v. Wade was decided in 1973.

A lot of my expectations may be constrained by the history of encyclopedias being paper; for example, I would expect an encyclopedia to have an article on The Simpsons that told me who produced the show, who starred, when it aired, etc. But I wouldn't normally expect a separate article about each character. Maybe here I should, and I certainly don't see any reason not to have that info as well. I also don't look to an encyclopedia as a source of English usage, but if there are pages about the English language, why not have a complete dictionary, thesaurus, style guide, etc. as part of this thing "where I look up stuff"? Maybe that shouldn't be where links go by default, but then I'm not used to "links" at all--they form no part of my expectation, so I suppose they should go wherever the author thinks is useful. I don't expect things like movie and book reviews, just synopses. But if a dozen people want to add a review to an article about a movie, why not?

I'm not sure if there's any great insight in any of the above, but at least it's where I'm coming from when I write stuff here, and maybe it's a few good ideas for others to think about when they write here. --Lee Daniel Crocker