The art of Wikipedia weeding

Wednesday, September 26, 2001, 11:38 AM -- The most meaningful and important thing anyone can do to contribute to Wikipedia is to write a long, accurate, well-referenced, meaty article.

But it's also very important to do "Wikipedia weeding"--particularly when we have disproportionate influxes of new people who are eagerly, willy-nilly, contributing scads of new entries. We love these people, but they need teaching. Wikipedia weeding consists essentially of checking over work and making small but important edits and, occasionally, adding comments and questions. Here, for your edification, is my advice on how to perform Wikipedia weeding yourself.

Here are things I look for (cf. w:Wikipedia:most common Wikipedia faux pas):

  • The Recent Changes page. It's most effective to weed based on what comes up on Recent Changes. People are looking to see what other people have done to their articles; this is a prime opportunity to teach by example (and teaching by example is that than which nothing is more wiki). One can also weed by repeatedly following the Random page link and working on whatever comes up. This latter can be fun!
  • Look for new names and ISP numbers. It sounds like anti-newbieism, and maybe it is, but very often the people who add the most dross to the project are the newest people. One of the finest services you can perform is to follow these people around and clean up after them. But when explaining your changes (if necessary), take care not to make them feel unwelcome; please be as gentle as you can with them.
  • Bad titles. Titles can be improperly capitalized (should be lower case unless the word in the title is always capitalized), and they can be ambiguous. It can help to tell people to study naming conventions. If I don't know where a page should be located, I simply append a small italicized note at the end of the page. If I do know where it should be located, I make a redirection page.
  • The start of articles. A pet peeve of mine is the tendency to repeat the subject of an article on its own line (it's already at the top of the page) and to make the first sentence of the article a partial sentence. So I (compulsively, I admit) convert such entries to begin with full sentences, with the subject of the article in bold. See G. E. Moore [ en:George Edward Moore ] for an example of how this is done.
  • Articles made to look like dictionary definitions. Another pet peeve of mine is entries that have different senses of the title word numbered--as if we were writing a dictionary (which we are not!!) and we needed to number the senses. Goddammit, Jed wishes we had parentheses (Magnus, we need to get your software fixed!). Anyway, I remove the numbers, properly format the separate articles on the page, and put a line between them. I also, mercilessly, delete any brief appendages to the effect, "In English, this word is also used to mean X," where the word-in-the-sense-of-X ain't ever going to be an encyclopedia article. See /Why a list of the senses of a word is not an encyclopedia article.
  • Copyediting. It might be unfair and silly, but Wikipedia is going to be judged based on how well we spell, punctuate, etc. So I clean articles up that way. Blatant copyediting mistakes are, even if trivial, nevertheless indefensible; mistakes in content can often be defended on grounds of ambiguity. So if somebody who cares about how language is used spots a bunch of copyediting mistakes, he'll easily be able to conclude the product is shoddy; on the other hand, if the same person sees few such mistakes, and content that is largely correct, with a few overgeneralizations and half-truths, his suspicions of shoddiness will be less certain. (Just some idle hypothesizing there.)
Content before grammar!
  • Fix bad links. Some newbies, caught up in the excitement (and who can blame them), wikify everything in sight, including plurals, capitalized words that shouldn't be capitalized, ambiguous words and surnames only, etc. Hence I often make it my mission to fix or remove bad links.
  • Remove patent nonsense, etc. I sometimes find myself simply deleting entire sentences and even paragraphs. This has to be done carefully, though, of course. Sometimes it's just vandalism, and no excuses need to be made to remove that. Sometimes it's something that seems to have been written by a 14-year-old whose main concern is to express excitement about a hobby, but conveys literally zero information. The possibilities of useless text, indeed, are endless. Another possibility is completely, blatantly biased stuff. If I think I don't have time to correct it, and if the bias is extreme, but the content is useful, I'll move it to a talk page and say "this needs to be de-biased" or something like that.
  • Check for and if necessary remove copyrighted stuff. Basically, if some new person (or a person who hasn't signed in) writes some fantastic prose, I instantly copy a string of four or five words from it and, in quotes, see if Google recognizes it. If so (which happens more often than I'm comfortable with), I check on the source page for a copyright notice. If there is no indication that the text is public domain or released under the GNU FDL, I either remove the text on the spot, giving the URL where I found the text on a talk page, or (if there's some question) I append a note asking where it came from, or whether it's copyrighted, etc. If I receive no reply, I delete the text.
  • Light content editing. If, while doing the above, I come across some statement I can make factually correct, or I can add some essential piece of information or remove some clear error, etc., I'll do that.
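
The phrase check described in the copyright item above can be sketched in a few lines of Python. This is only an illustration, not part of any Wikipedia tooling: the function name `search_phrases` and its `size` and `step` parameters are hypothetical, and it assumes the suspect text is plain prose separated by whitespace. It only builds the quoted strings; pasting them into a search engine and reading the source page for a copyright notice is still done by hand.

```python
def search_phrases(text, size=5, step=5):
    """Return quoted runs of `size` consecutive words taken from `text`.

    Hypothetical helper: each returned string is wrapped in double
    quotes so a search engine treats it as an exact phrase.
    """
    words = text.split()
    phrases = []
    # Walk through the text in jumps of `step` words, taking `size` words each time.
    for start in range(0, len(words) - size + 1, step):
        phrases.append('"' + " ".join(words[start:start + size]) + '"')
    return phrases


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for phrase in search_phrases(sample):
        print(phrase)
```

Trying two or three phrases from different parts of a passage, rather than only the first, makes it less likely that a single reworded sentence hides a copied source.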

I probably do this sort of editing more than any other simply because it seems to me that others aren't doing it enough (yet). It is, again, probably not the most important way one can use one's time on Wikipedia (there are many ways to help, of course)--but it is definitely essential work. If we don't do it, Wikipedia is going to start looking more and more like Everything2, and I swear I'll kill myself if that happens.  :-)

--Larry_Sanger


I am one of those newbies, but I found myself weeding (carefully) almost at once. I'd like to make a suggestion: If the /Talk page hasn't had any traffic for over a month AND the issue people were talking about has clearly been resolved and fixed, can we delete it? Or does everyone feel that /Talk pages have significant historical value and should be held onto at all costs? -- clasqm

It depends on the circumstances, in my opinion. If the /Talk discussion resolved some trivial issue, say a typo, then it can be deleted. But if the /Talk discussion resolved some fundamental questions, then don't delete it. In the latter case the 'historical' /Talk may help keep the same discussion from arising again. --css

I remove things from /Talk pages when they aren't relevant anymore. If a change or addition is discussed for a while and then done on the main page, then I think the discussion should be removed as well. --Pinkunicorn


I think I'm counted as a newbie (a week or so, now), and one of the things I've been doing, from the beginning, is this sort of tidying. Not on the "blatant nonsense" level, so far, but fixing spelling, grammar, and such. Then again, how many Wikipedia newbies are copyeditors? This is something I find both easy and rewarding, so I do it.


Sorry, I didn't mean to imply that "newbies" couldn't do a lot of very useful weeding. Of course they can! --LMS

