User:Duesentrieb/XML
An XML representation of wiki markup seems to be a frequently requested feature and has been much discussed. I decided to give it a shot. I'm aware that several people have started working on it, but I have decided to do a few things differently. Most importantly: it's already working.
Here are the most important differences:
- Do not use XML as an intermediate format to create HTML output
- Do not rewrite the parser
- Do not build a full parse tree in memory
- Separate "renderer" (back end) logic from "parser" logic (front end), i.e. factor out knowledge about HTML from the parser into a separate "renderer" class.
- Provide a "symbolic" representation (no template substitution, etc.) as well as an "expanded" view.
- Make the back end versatile enough to be able to produce different types of XML as well as TeX, etc.
- Provide an interface to the different back ends. This is done via ...?action=convert&converter=TeXRenderer, etc.
This design allows us to move more logic into the renderer(s) step by step. Right now the renderer consists mainly of getter and maker functions; the output state and text are still maintained in the parser. This could change over time.
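The front end / back end split described above can be sketched roughly as follows. This is only an illustration in Python, not the actual PHP code: the class names `HTMLRenderer` and `XMLRenderer`, the `make_*` methods, the `RENDERERS` table and the `convert` function are all made up for the example, and the toy parser understands nothing but headings and template calls. As in the current state of the hack, the parser keeps the output state and the renderer only supplies maker functions.

```python
from xml.sax.saxutils import escape, quoteattr


class HTMLRenderer:
    """Hypothetical back end: knows about HTML, nothing about wiki syntax."""

    def make_heading(self, level, text):
        return f"<h{level}>{escape(text)}</h{level}>"

    def make_include(self, name):
        # A real HTML back end would expand the template; this stub only marks it.
        return f'<span class="template">{escape(name)}</span>'

    def make_text(self, text):
        return escape(text)


class XMLRenderer:
    """Hypothetical back end producing "symbolic" XML (no template expansion)."""

    def make_heading(self, level, text):
        return f"<xhtml:h{level}>{escape(text)}</xhtml:h{level}>"

    def make_include(self, name):
        return f"<mw:include ref={quoteattr(name)}/>"

    def make_text(self, text):
        return escape(text)


def parse(wikitext, renderer):
    """Toy front end: handles '== heading ==' and '{{template}}' lines only.

    The parser owns the output buffer; the renderer just makes fragments.
    """
    out = []
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("==") and line.endswith("==") and len(line) > 4:
            level = len(line) - len(line.lstrip("="))
            out.append(renderer.make_heading(level, line.strip("=").strip()))
        elif line.startswith("{{") and line.endswith("}}"):
            out.append(renderer.make_include(line[2:-2].strip()))
        elif line:
            out.append(renderer.make_text(line))
    return "".join(out)


# Assumed lookup table, loosely mirroring the converter=... request parameter.
RENDERERS = {"HTMLRenderer": HTMLRenderer, "XMLRenderer": XMLRenderer}


def convert(wikitext, converter):
    return parse(wikitext, RENDERERS[converter]())
```

Calling `convert("== News ==\n{{In the news}}", "XMLRenderer")` runs the same parser code as the HTML case and only swaps the back end, which is the point of the separation.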
I have tried to benchmark my hacked parser in normal HTML mode against the old (current) parser, and could not find much of a difference - it seems to be nice and fast.
Some other things that need some more thought:
ToDo
- Have a formal DTD (or better: Schema)
- Integrate with XHTML (is the schema mixing OK?)
- Deal with named entities.
- Change markup around (this is easily done)
- Templates directly after headings behave strangely
- ...
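For the named-entity item, one possible approach (just an idea, not something the renderer does today) is to rewrite HTML named entities such as `&nbsp;` as numeric character references, since plain XML only predefines five entity names. The sketch below uses Python's standard `html.entities` table; the function name `numeric_entities` is invented for the example.

```python
import re
from html.entities import name2codepoint

# The five entity names that XML itself predefines; these can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(text):
    """Replace HTML named entities with numeric character references."""

    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Unknown names are left untouched here; a real renderer would
            # have to decide whether to escape or reject them.
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return _ENTITY.sub(repl, text)
```

The alternative would be to declare the entities in the DTD, which ties this item to the DTD/Schema item above.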
Related Stuff
Example Output
Here's an example of what the XML renderer produces so far (in "symbolic" mode, i.e. without substitutions):
XML rendering of en:Main Page (pretty-printed here for convenience; the renderer does not do this automatically):
<?xml version="1.0"?>
<mw:wikitext xmlns:mw="http://wikimedia.org/schemas/wikitext/1.0" xmlns:xhtml="http://www.w3.org/1999/xhtml" version="1.5.0">
  <xhtml:p>
    <mw:include ref="MainPageIntro"/>
    <xhtml:div style="padding-bottom: .3em; margin: 0 .5em .5em">
      <mw:include ref="Main Page banner"/>
    </xhtml:div>
  </xhtml:p>
  <xhtml:table cellspacing="3">
    <xhtml:tr valign="top">
      <xhtml:td width="55%" class="MainPageBG" style="border: 1px solid #ffc9c9; color: #000; background-color: #fff3f3">
        <xhtml:div style="padding: .4em .9em .9em">
          <xhtml:h3>Today's featured article</xhtml:h3>
          <mw:dynamic-include>
            <mw:link-target>Wikipedia:Today's featured article/<mw:var-ref ref="CURRENTMONTHNAME"/> <mw:var-ref ref="CURRENTDAY"/>, <mw:var-ref ref="CURRENTYEAR"/></mw:link-target>
          </mw:dynamic-include>
          <xhtml:h3>Selected anniversaries</xhtml:h3>
          <mw:dynamic-include>
            <mw:link-target>Wikipedia:Selected anniversaries/<mw:var-ref ref="CURRENTMONTHNAME"/>_<mw:var-ref ref="CURRENTDAY"/></mw:link-target>
          </mw:dynamic-include>
        </xhtml:div>
      </xhtml:td>
      <xhtml:td width="45%" class="MainPageBG" style="border: 1px solid #c6c9ff; color: #000; background-color: #f0f0ff">
        <xhtml:div style="clear: right; text-align: left; float: right; padding: .4em .9em .9em">
          <xhtml:h3>In the news</xhtml:h3>
          <mw:include ref="In the news"/>
          <xhtml:h3>Did you know...</xhtml:h3>
          <mw:include ref="Did you know"/>
        </xhtml:div>
      </xhtml:td>
    </xhtml:tr>
  </xhtml:table>
  <xhtml:div class="MainPageBG" style="padding: .5em 1em 0; margin: 0 3px 3px; border-bottom: 2px solid #ccc"><xhtml:h3 id="lang">Wikipedia in other languages</xhtml:h3> You may read and edit articles in many different languages:<mw:include ref="Wikipedialang"/></xhtml:div>
  <xhtml:div class="MainPageBG" style="padding: .5em 1em 1em; margin: 3px;">
    <xhtml:h3 id="sister">Wikipedia's sister projects</xhtml:h3>
    <mw:include ref="WikipediaSister"/>
    <xhtml:div style="clear:left"/>
  </xhtml:div>
  <xhtml:div class="MainPageBG" style="border: 1px solid #ffad80; padding: .5em 1em; color: #000; background-color: #fff7cb; margin: 3px 3px 0; text-align: center">
    <xhtml:div style="font-size:90%">
      <mw:include ref="donate"/>
    </xhtml:div>
  </xhtml:div>
  <mw:magic-word name="__NOTOC__"/>
  <mw:magic-word name="__NOEDITSECTION__"/>
  <xhtml:div class="MainPageBG" style="padding: .5em 1em 0; margin: 3px 3px 0; text-align: center;">
    <mw:include ref="newpagelinksmain"/>
  </xhtml:div>
</mw:wikitext>
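To give an idea of what a consumer of this output could do with it, here is a small Python sketch that lists the templates a page includes in "symbolic" mode. It uses the namespace URI and element names from the example above; the `SAMPLE` string is a shortened stand-in for real output, and the function name `included_templates` is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the example output above.
MW = "http://wikimedia.org/schemas/wikitext/1.0"

# Shortened stand-in for real renderer output.
SAMPLE = """<?xml version="1.0"?>
<mw:wikitext xmlns:mw="http://wikimedia.org/schemas/wikitext/1.0"
             xmlns:xhtml="http://www.w3.org/1999/xhtml" version="1.5.0">
  <xhtml:div><mw:include ref="In the news"/></xhtml:div>
  <xhtml:div><mw:include ref="Did you know"/></xhtml:div>
  <mw:magic-word name="__NOTOC__"/>
</mw:wikitext>"""


def included_templates(xml_text):
    """Return the ref attribute of every mw:include element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("ref") for el in root.iter(f"{{{MW}}}include")]


def magic_words(xml_text):
    """Return the name attribute of every mw:magic-word element."""
    root = ET.fromstring(xml_text)
    return [el.get("name") for el in root.iter(f"{{{MW}}}magic-word")]
```

Because templates stay unexpanded in symbolic mode, this kind of query needs no access to the wiki database, only the XML itself.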