Simple ideology of Wikitax

You may read I have no idea which is more representative of current developer team opinion.


A Simple ideology of Wikitax (for Wikipedia3 and Wikipedia4)

Paper Wikipedia is the highest priority, since paper is most universally used and sharable. The Wikipedia DTD should put paper needs foremost—including conserving paper, by making maximum use of compressed text conventions. Those conventions are better understood in electronic media:

Most users encounter email first, then chat (including IRC, MSN, ICQ, and IM), and if they are good at both, they discover weblogs, then wikitext. So, the conventions used in these ought to be respected in that order unless it is truly contradictory to do so—OR unless it builds a Wikipedia database that cannot be validated, maintained, edited and vetted by uniform mechanical means. This requires the spacetime DTD, person DTD and TIPAESA conventions.

Metalingo is a good place to start, if it encourages the use of absolutely unambiguous conjunctions and of E Prime verbs such as "becomes", "remains" and "equals", which can be translated for reading purposes into "is", "is then", etc. In effect, any use of typed links, or of language constructs that can be said to imply such links, is a rather serious move in this direction.


That's the simple ideology in a nutshell. The implications of the last clause are severe, implying rigorous standards for names, places, time intervals and (most difficult of all) attribution of sources. Wikipedia is by far the largest wiki application, and most uses of wiki also involve source checking, trust in certain contributors, and the politics of sysops and bias. It is therefore important to consider the desirable future where the Wikipedia data are widely trusted, and where the Wikipedia conventions of expressing WHO/WHEN/WHERE become universal. I urge you to push hard for strict standards of this in Talk:Wikitax.

The rest of this amounts to rationale and detail:

Email uses many conventions, and almost all users know them. The most obvious are the use of ">" (and ">>", ">>>") marks to denote a dialogue with quotation of prior statements, and the use of "*" to emphasize, as discussed for "*bold*". This necessarily means losing the use of "*" as it presently stands in wikitext but that is justified in the SIMPLE IDEOLOGY because email and chat and some blogs already use * that way. So here's an example where we're discussing it already.
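As a rough illustration of the two email conventions just described, here is a minimal Python sketch. The function names and the choice of `<b>` as output are assumptions made for the example; nothing here is an existing Wikipedia parser.

```python
import re
from typing import Tuple

# Hypothetical helpers for illustration only.
QUOTE_RE = re.compile(r"^((?:>\s?)+)")             # leading ">", ">>", "> >", ...
EMPHASIS_RE = re.compile(r"\*(\S(?:[^*]*\S)?)\*")  # *word* or *several words*


def quote_depth(line: str) -> Tuple[int, str]:
    """Return (depth, remaining text) for an email-style quoted line."""
    match = QUOTE_RE.match(line)
    if not match:
        return 0, line
    return match.group(1).count(">"), line[match.end():]


def render_emphasis(text: str) -> str:
    """Render *text* as bold, the email and chat reading of '*'."""
    return EMPHASIS_RE.sub(r"<b>\1</b>", text)


print(quote_depth(">> I agree"))               # (2, 'I agree')
print(render_emphasis("this is *bold* text"))  # this is <b>bold</b> text
```

The emphasis pattern requires a non-space character directly inside each asterisk, so an expression like "2 * 3 * 4" is left alone.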

Many obvious conventions like "mailto:" and "http:" and "ftp:" being URLs and presented as links, are also respected on most email and chat clients, and weblogs. These are usually easy to handle.

Where it gets difficult is the presentation of quotations, attributions and the like. In an encyclopedia it is particularly important to associate statements with authorities, not just in the text but in the diffs between the texts. This is done clumsily right now by good old revision control and informal social trust between contributors (who don't revert text from those they consider trustworthy, and do so regularly for those they don't, without much checking either way). It's easy enough to imagine an elaborated diff or history file that lays out, at the level of 'so and so added this statement at such a time', who contributed what. The "Older Versions" link does this now, and it could all be presented on a page with a lot of parentheses, although we don't do that, both for the sake of clarity for newbies and to continue to exploit conventions of the user interface (like 'side by side presentation of versions'). But in principle, the facts that "Newton said F=MA", "Physicists agree that F=MA (source=Newton)", "(Nontroll added the text that) Physicists agree that F=MA (and assumed source=Newton without justifying it)" are all of the same sort. They are, however, validated in three different ways.
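One way to picture "facts of the same sort" is as a single nested record type, where the content of a claim may itself be a claim. The sketch below is only an assumption about how that could look; the `Claim` class and its field names are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Claim:
    """A statement plus who asserted it; the content may itself be a Claim."""
    content: Union[str, "Claim"]
    asserted_by: str
    source: Optional[str] = None  # cited authority, if any

    def render(self) -> str:
        """Flatten the nesting into the parenthesised style used in the text."""
        inner = self.content.render() if isinstance(self.content, Claim) else self.content
        cited = f" (source={self.source})" if self.source else ""
        return f"[{self.asserted_by}] {inner}{cited}"

    def chain(self) -> List[str]:
        """List the asserters from the outermost claim inward."""
        names = [self.asserted_by]
        if isinstance(self.content, Claim):
            names.extend(self.content.chain())
        return names


newton = Claim("F=MA", asserted_by="Newton")
consensus = Claim("F=MA", asserted_by="Physicists", source="Newton")
edit = Claim(consensus, asserted_by="Nontroll")
print(edit.render())  # [Nontroll] [Physicists] F=MA (source=Newton)
```

The three claims share one shape, which is the point of the paragraph above; how each is validated (historical record, scientific consensus, edit history) would still differ and is not modelled here.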

Note, if you want to be a purist then "F=MA" itself, and "(sysop decided that Nontroll was in fact a non-troll)" are also facts of this nature. But that goes a bit beyond the SIMPLE... into the reasons we trust science and sysops, both of which are necessarily part of trusting computer collaboration. So don't go that far, or we have to consider the five different ways we validate things here. Not the point of wikitax.

The difference between Wikitax and, say, Weblogtax is probably the ease with which we express these source and validation relationships, and how standard (global) we expect expressions of names, times and places to be. We already standardize these, and a name, a time, and a place should be marked as such in the Wikitax (and ultimately in the XML DTD), if only to improve tools for the validation I mention above.

A great proportion of the content of the Wikipedia is names, places, dates/times, and titles (e.g. King, physicist, Elvis impersonator). It would be truly ideal if the XML DTD let us deal with all these in a standard way. For instance, 1999 refers to the calendar year 1999 in the Gregorian Calendar, but there's no reason not to allow exact dates to be converted to other calendars. Adopting a default representation of UTC Time Intervals (from absolute time A to absolute time B, or absolute time A plus time interval C=1 day, 1 month, etc.), and a shorthand for referring to those, isn't that hard to do; e.g. "January 2003" refers to UTC January 1, 2003 (absolute) plus 1 month. More complex issues like unfolding events (the history of the en:United Nations for instance, which is still going, or human lifetimes that continue), calendar conversion, etc., can be dealt with as long as this is exact. Without accurate time semantics (a spacetime DTD), everything else falls apart in a hurry.
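The shorthand described above can be sketched in a few lines of Python. This is a minimal illustration, not a proposed implementation: the function name `parse_interval`, the English-only month names and the half-open interval convention are all assumptions of the example.

```python
from datetime import datetime, timezone
from typing import Tuple

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def parse_interval(text: str) -> Tuple[datetime, datetime]:
    """Return the half-open UTC interval [start, end) named by the shorthand.

    Handles only "YYYY" and "Month YYYY"; anything else raises ValueError.
    """
    parts = text.split()
    if len(parts) == 1:
        year = int(parts[0])
        return (datetime(year, 1, 1, tzinfo=timezone.utc),
                datetime(year + 1, 1, 1, tzinfo=timezone.utc))
    if len(parts) == 2:
        month = MONTHS.index(parts[0]) + 1  # ValueError for an unknown month
        year = int(parts[1])
        start = datetime(year, month, 1, tzinfo=timezone.utc)
        # "plus 1 month": roll over into the next year after December
        if month == 12:
            end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
        else:
            end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
        return start, end
    raise ValueError(f"unsupported shorthand: {text!r}")


start, end = parse_interval("January 2003")
print(start.isoformat(), end.isoformat())
# 2003-01-01T00:00:00+00:00 2003-02-01T00:00:00+00:00
```

Python's `datetime` works in the proleptic Gregorian calendar, so conversion to other calendars and open-ended intervals (events still unfolding) would need additional machinery beyond this sketch.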

The second most important thing to get right is locations and place-names. As borders shift over time, what is in 'Czechoslovakia' or the 'Austro-Hungarian Empire' in one article is in 'Slovakia' now... this is another major headache. Eventually we'll have nice latitude-longitude maps of the actual borders at key points in time, and towns and villages will be located by some automatic means.

Then, third, we have to rigorize conventions regarding names so that we are not using multiple names for one person, and can compile biographies and timelines of associations without extreme effort, in some cases straight from evidence. A person DTD, together with strict TIPAESA conventions for reporting a source or authority in dispute, is the best approach.

So, given the constraint that conventions for names, times, places and sources must be rigorous enough to stand use in the global peer-reviewed Wikipedia, the user interface can happily follow paper, email, IM/IRC, and weblog usage for other things, or by default the existing wikitext conventions, which are pretty reasonable for links etc. It would be good to add simple 'source attributes' to links, so that the idea that A is linked to B can be viewed the same as the idea that B should be referenced in a file about A. So some convention that lets you state the source of something right in the link itself, even if that source is just another article, is good enough. That lets me tie F=MA to an elaborate article on Newton's Laws without forcing me to make mention of that in every article, and it gives the user the option of looking at something, say, historically, philosophically or scientifically, and going to different places based on why they're reading the article.
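To make the idea of a source attribute on a link concrete, here is a small Python sketch. The `[[target|label|source=...]]` notation is purely hypothetical, invented for this example by extending the existing double-bracket link form; it is not existing wikitext syntax, and `SourcedLink` and `parse_links` are likewise made-up names.

```python
import re
from typing import List, NamedTuple, Optional


class SourcedLink(NamedTuple):
    target: str            # article being linked to
    label: str             # text shown to the reader
    source: Optional[str]  # article or authority cited for the link, if any


LINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def parse_links(text: str) -> List[SourcedLink]:
    """Extract links, reading an optional 'source=' field (hypothetical syntax)."""
    links = []
    for match in LINK_RE.finditer(text):
        fields = [field.strip() for field in match.group(1).split("|")]
        target = fields[0]
        label, source = target, None
        for field in fields[1:]:
            if field.startswith("source="):
                source = field[len("source="):]
            else:
                label = field
        links.append(SourcedLink(target, label, source))
    return links


print(parse_links("Force obeys [[F=MA|source=Newton's Laws]]."))
# [SourcedLink(target='F=MA', label='F=MA', source="Newton's Laws")]
```

A link without a `source=` field parses with `source` set to `None`, so existing links would keep working unchanged under this assumed notation.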

This also integrates the Older Versions, as one can actually present the whole article in ALL VERSIONS THROUGH TIME as a single complex article using these conventions; i.e. each section added by ATroll (and later deleted) appears as a link to a version with that text added back, with the source listed as "ATroll". One set of conventions therefore covers naming of contributors and those we write about in Wikipedia, presentation of old versions of articles and old versions of theories, etc. Real processing on the whole database is possible.

This starts towards a w:Semantic Web ideal for Wikipedia, but should go no further than the above: names, places, times, and sources with special status so that, in the long run, older versions and original source material, users and celebrities, all have identical status in the system syntactically. Beyond that, we're best off pandering to conventions that people already know very well.

BUT, to stay SIMPLE, an IDEOLOGY must not try to cover things it can't cover, nor can it pander to views that would compromise the overall results to the point where the SIMPLE IDEOLOGY is rejected. As Einstein said, "Things should be as simple as possible. But no simpler." As Alan Kay said, "Simple things should be simple. Complex things should be possible." They're both right.

So, blind importation of weblog conventions is a BAD THING, if it cripples the key functions of attribution and source mapping, and makes us not an encyclopedia. It would be a shame to have to fork the Wikitax just because these key considerations of an encyclopedia weren't taken into account early.

The simple ideology probably cannot be implemented fully in Wikipedia3 but should be a very high priority in the development of Wikipedia4. That opinion is reflected in the Wikipedia4 timeline. Edit it there, not here.