- Abstract: PDF / OpenOffice
- Paper (GFDL): PDF / OpenOffice
- Presentation: PDF / OpenOffice
- Wikimania talk: OGG audio / OGG video
I would like to present a project that aims to apply data-mining and knowledge-management techniques to the Wikipedia corpus. The idea is to extract semantic relations directly from the link structure, rather than by analyzing natural language. Wikipedia is an excellent basis for such an analysis because every node in the web of links represents exactly one topic. The results may be used to benefit the Wikipedias and other Wikimedia projects. Key points are support for multilingual features and computer-aided structuring.
From the analysis I hope to create a network of topics and their relations, which could be seen as a semantically rich dictionary or a basic ontology. This would include relations on the lexical level (synonyms, homonyms, inflections, translations) as well as on the semantic level (is-a, element-of, component-of, opposite-of).
The first step is a broad classification of pages (disambiguation, redirect, navigation/list, real topic page, etc.). After that, links to other pages are analyzed using collocation and cluster analysis. Interlanguage links will provide a useful basis for building a translation dictionary. Categorization will be looked at separately: some categories provide information about a specific facet of a topic, such as is-a, geographic location or timespan. Also, categorization has to be handled as a transitive relation.
Additional information can be derived from simple pattern matching on the pages. Boilerplate elements like the townbox are especially helpful for that.
The data gained from this analysis may be used to enrich existing semantic dictionaries like WordNet and Wortschatz. In combination with ontologies like OpenCyc it may be used for automated text analysis, reasoning and translation.
This data may also be used to automatically generate or propose entries for the Ultimate Wiktionary and help to improve structural features like categorization, especially with respect to multilingual projects like the Wikimedia Commons.
- Using Ultimate Wiktionary for Commons - Wikimania Presentation (GerardM)
- WikiData (Erik Möller)
- MetaData (Jakob Voss)
- SemanticWeb (Markus Krötzsch et al.)
- Original notes for this paper (German)
- some thoughts about semantic relations in a wiki
- suggestions for an RDF plugin for MediaWiki
- Wikimedia Research: meta:Research
- Wikimedia Wikipedistik: de:Wikipedia:Wikipedistik and de:Wikipedia:Wikipedistik/Bibliographie
- Francesco Bellomi http://www.fran.it/blog/
- Rudi Cilibrasi http://www.arxiv.org/abs/cs.CL/0412098 and http://www.newscientist.com/article.ns?id=dn6924
Paper (Draft)
I would like to present a project to automatically extract topics and their semantic relations from the structure of Wikipedia and other projects. The idea is to apply techniques of knowledge management and data mining to the data in the Wikipedias, focusing on the link structure as opposed to the analysis of natural language. Wikipedia is a very good dataset for this kind of analysis, because (nearly) every page describes exactly one topic. From this analysis I hope to create a network (graph) of topics and terms and their relations. This could be used to create semantically enriched dictionaries or ontologies for knowledge representation, machine reasoning and automated translation. The results may also be used to benefit the Wikipedias and other Wikimedia projects. Key points are support for multilingual features and computer-aided structuring.
The goal is to create software that is able to analyse a MediaWiki database (or live site) and build a database that contains topics, terms (words) and the semantic relations between them. Every relation has a type and a confidence level.
The most important relations are:
- Synonyms (redirects) and inflections (grammatical forms)
- Homonyms (disambiguation)
- Translations (interwiki)
- Hypernyms (generalization)
- Classification (is-a / instance-of)
Some more advanced relations are:
- Elements (member of)
- Components (part of)
- Antonyms (opposite of)
- Association of a time or timespan
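To make "a type and a confidence level" concrete, here is a minimal Python sketch of what a relation record might look like. The enum names, the 0 to 1 confidence scale and the example values are assumptions for illustration, not a fixed schema from this project.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    """Relation types from the lists above (names chosen for illustration)."""
    SYNONYM = "synonym"          # from redirects
    INFLECTION = "inflection"    # grammatical forms
    HOMONYM = "homonym"          # from disambiguation pages
    TRANSLATION = "translation"  # from interlanguage links
    HYPERNYM = "hypernym"        # generalization
    IS_A = "is-a"                # classification / instance-of
    MEMBER_OF = "member-of"      # elements
    PART_OF = "part-of"          # components
    OPPOSITE_OF = "opposite-of"  # antonyms
    TIME = "time"                # association of a time or timespan


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    kind: RelationType
    confidence: float  # assumed scale: 0.0 (pure guess) to 1.0 (certain)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie between 0 and 1")


# A redirect from "Munich" to "München" suggests a synonym with high confidence.
redirect = Relation("Munich", "München", RelationType.SYNONYM, 0.95)
```

Keeping the record immutable (`frozen=True`) makes relations hashable, so they can be collected in sets and deduplicated when several extraction steps propose the same fact.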
Additionally, it may be possible to extract properties of things from the articles, using pattern matching. Some of the properties easily extracted from the existing data include:
- For people: Date and place of birth and death
- For places: geo-coordinates, geographic and political affiliation
Classification of pages
Pages can be classified relatively easily by looking at templates, categorization and some standard text. The main classes of pages are:
- Topic pages (real encyclopedia articles)
- Disambiguations (homonym sets)
- Redirects (synonym definitions)
- Lists and other navigation pages (this needs some heuristics)
- Portals (topic overviews, often also used for maintenance)
Also, bad pages could be excluded from the analysis, for example:
- Dead end pages
- Deletion candidates
- Very short pages (maybe)
- Disputed pages (maybe)
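A minimal Python sketch of such a page classifier and exclusion filter, working on raw wikitext. The template names (`disambig`, `begriffsklärung`, `delete`, `löschantrag`), the list heuristic (share of bullet lines) and the length threshold are illustrative assumptions; the real marker templates differ between language editions.

```python
import re

# Hypothetical marker templates; real names differ per language edition.
DISAMBIGUATION_TEMPLATES = {"disambig", "begriffsklärung"}
DELETION_TEMPLATES = {"delete", "löschantrag"}

REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[", re.IGNORECASE)
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*[|}]")
LINK_RE = re.compile(r"\[\[[^\]]+\]\]")


def templates_in(wikitext: str) -> set[str]:
    """Lower-cased names of all templates used on the page."""
    return {name.lower() for name in TEMPLATE_RE.findall(wikitext)}


def classify_page(title: str, wikitext: str) -> str:
    """Assign one of the main page classes using simple heuristics."""
    if REDIRECT_RE.match(wikitext):
        return "redirect"
    if templates_in(wikitext) & DISAMBIGUATION_TEMPLATES:
        return "disambiguation"
    if title.startswith("Portal:"):
        return "portal"
    # List/navigation heuristic (assumed): list-like title, or mostly bullet lines.
    lines = [ln for ln in wikitext.splitlines() if ln.strip()]
    bullets = sum(1 for ln in lines if ln.lstrip().startswith(("*", "#")))
    if title.lower().startswith("list of") or (lines and bullets / len(lines) > 0.7):
        return "list"
    return "topic"


def is_excluded(wikitext: str, min_length: int = 200) -> bool:
    """Flag pages that should probably be left out of the analysis."""
    if templates_in(wikitext) & DELETION_TEMPLATES:
        return True  # deletion candidate
    if not LINK_RE.search(wikitext):
        return True  # dead end (simplification: any [[...]] counts as a link)
    return len(wikitext) < min_length  # very short page
```

The checks run from most to least reliable: a redirect is recognized by its fixed syntax, while the list heuristic is only a guess and is therefore tried last.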
Classification and Evaluation of Links
Most links can be classified by syntax or namespace. For this analysis, external links and links to pages in other namespaces can be ignored. The remaining links are:
Normal Links to other pages
Links to other pages are the main way of building the semantic map: they are considered to represent a connection to related topics, although the exact nature of the relation is unknown. They are not interpreted as semantic relations, but as syntactic associations (collocations) from which semantic relations may be inferred.
Note that while semantic relations have a type and a confidence level, syntactic associations have no type but may have a weight. Links can be weighted by their position in the text: links in the first sentence or paragraph are more important, as are links in the „see also“ section and links that are bold.
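The position-based weighting could be sketched in Python as follows. The bonus values, the detection of the „see also“ heading and the bold check (`'''[[...]]'''`) are hypothetical choices, not measured ones; links whose target contains a colon are skipped as a crude stand-in for the namespace filtering described above.

```python
import re

HEADING_RE = re.compile(r"^\s*(=+)\s*(.*?)\s*\1\s*$")
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def weighted_links(wikitext: str) -> dict[str, float]:
    """Map each linked page to a weight based on where the link occurs.

    The bonus values are arbitrary placeholders, not tuned numbers.
    """
    weights: dict[str, float] = {}
    in_first_paragraph = True
    seen_text = False
    section = ""
    for line in wikitext.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            section = heading.group(2).lower()
            in_first_paragraph = False
            continue
        if not line.strip():
            if seen_text:
                in_first_paragraph = False  # a blank line after text ends the first paragraph
            continue
        seen_text = True
        for match in LINK_RE.finditer(line):
            target = match.group(1).strip()
            if ":" in target:
                continue  # crude namespace filter: also drops titles containing a colon
            weight = 1.0
            if in_first_paragraph:
                weight += 1.0
            if section == "see also":
                weight += 0.5
            start, end = match.span()
            if line[max(0, start - 3):start] == "'''" and line[end:end + 3] == "'''":
                weight += 0.5  # bold link
            # A page linked several times keeps its most prominent occurrence.
            weights[target] = max(weights.get(target, 0.0), weight)
    return weights
```

Taking the maximum rather than the sum means that a page mentioned many times in passing does not outweigh a page that is linked once in the first sentence; summing would be an equally plausible choice.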
In addition to the pure structure, the link text also gives us information we can use: in most cases, the link text is a synonym, hyponym (sub-topic) or inflection of the name of the topic it points to.
Interlanguage Links
Interlanguage links are very important because they allow us to extract translations of a term into different languages. But we have to keep in mind that the Wikipedias have different levels of granularity, so the link may point to a more general topic; that is, the link may give us the translation of a hypernym (generalization) of the local topic.
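One way to turn interlanguage links into translation candidates, sketched in Python: a link that is confirmed by a link back from the other language gets a high confidence, a one-way link only a low one, since it may point to a broader topic. The language set and both confidence values are assumptions for illustration.

```python
import re

# Hypothetical set of language prefixes; a real run would take them from the wiki's configuration.
LANGUAGES = {"de", "en", "fr", "nds", "nl"}
INTERLANGUAGE_RE = re.compile(r"\[\[([a-z-]+):([^\]|]+)\]\]")

Page = tuple[str, str]  # (language, title)


def interlanguage_links(wikitext: str) -> dict[str, str]:
    """Language code -> linked title for all interlanguage links on a page."""
    return {
        lang: title.strip()
        for lang, title in INTERLANGUAGE_RE.findall(wikitext)
        if lang in LANGUAGES
    }


def translation_candidates(pages: dict[Page, str]) -> list[tuple[Page, Page, float]]:
    """Propose translations; reciprocal links get a higher (assumed) confidence."""
    links = {page: interlanguage_links(text) for page, text in pages.items()}
    candidates = []
    for (lang, title), targets in links.items():
        for other_lang, other_title in targets.items():
            back = links.get((other_lang, other_title), {})
            reciprocal = back.get(lang) == title
            # A one-way link may point to a broader topic in the other language.
            confidence = 0.9 if reciprocal else 0.4
            candidates.append(((lang, title), (other_lang, other_title), confidence))
    return candidates
```

Reciprocity is a cheap signal that both pages cover the same scope; mismatched granularity typically shows up as a link that is not returned.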
Categorization of pages (topics) can be used to classify a page, and usually it already gives us a valuable hint about semantic relations. Basically, it represents the relation „is-sub-topic“, which may then be narrowed down by more specific information.
Also, categories are topics themselves, and usually more central ones. The information given by the category/subcategory structure can help to build semantic relations efficiently. (I have also proposed to drop the distinction between pages and categories.) This, however, requires us to classify categories by the way their members relate to them. This is similar to the „facet classification“ scheme discussed when categories were first introduced. Most importantly, the following facets should be identified:
- Fields of research (like maths, politics, etc.)
- Space attribution (geographic categories)
- Time attribution (time categories)
- Classification categories (is-a relations)
The current practice, however, is to use categories with mixed semantics, which makes them difficult to use for semantic analysis. For example, geographic categories like [[Category:Germany]] include not only places in Germany, but also German people, German food, German history, etc. (accordingly, this category should really be called „German“). This can be expressed by assigning multiple semantic relations to a category, each with a low confidence level.
Other categories are quite clear with regard to their semantics: [[Category:german people]] contains only people, so all topics in that category can get an is-a relation to the person topic with high confidence.
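A sketch of how category memberships might be mapped to relations, assuming a hand-maintained table that records the facets of each category. The table entries, relation names and confidence values are hypothetical; categories missing from the table fall back to a plain „sub-topic-of“ relation.

```python
# Hypothetical facet table: category -> (relation, target, confidence) triples.
# A category with clear semantics gets one confident relation; a category with
# mixed semantics gets several relations, each with a low confidence.
CATEGORY_FACETS: dict[str, list[tuple[str, str, float]]] = {
    "german people": [("is-a", "person", 0.9)],
    "Germany": [("located-in", "Germany", 0.3), ("sub-topic-of", "Germany", 0.3)],
}
DEFAULT_CONFIDENCE = 0.5  # assumed confidence for a plain categorization


def relations_from_categories(title: str, categories: list[str]) -> list[tuple[str, str, str, float]]:
    """Turn the categories of a page into typed relations with confidence levels."""
    relations = []
    for category in categories:
        # Unknown categories only tell us that the page is a sub-topic.
        facets = CATEGORY_FACETS.get(category, [("sub-topic-of", category, DEFAULT_CONFIDENCE)])
        for relation, target, confidence in facets:
            relations.append((title, relation, target, confidence))
    return relations
```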
Pattern matching on page content
Pattern matching is another important way of extracting semantic structure from an article's text. It is not as sophisticated as natural language analysis, but uses simple matches (regular expressions) to filter out some information. A good example is the date and place of birth and death of people.
Another example is structures like the townbox and taxobox: from such boilerplate elements, it is easy to extract properties and relations to other topics with high confidence. Also, the mere presence of a townbox implies that the page in fact describes a town, resulting in an is-a relation to town with a high confidence level.
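As an illustration of this kind of pattern matching, the following Python sketch extracts birth and death data from the parenthesis that, by convention, follows the name at the start of a German Wikipedia biography, e.g. `(* 5. Januar 1876 in [[Köln]]; † 19. April 1967 in [[Rhöndorf]])`. The regular expression assumes exactly this layout and will miss variants.

```python
import re

# Assumed layout (German Wikipedia convention for biographies):
#   '''Name''' (* 5. Januar 1876 in [[Köln]]; † 19. April 1967 in [[Rhöndorf]])
LIFE_RE = re.compile(
    r"\(\*\s*(?P<birth_date>[^;)]*?\d{3,4})(?:\s+in\s+(?P<birth_place>[^;)]+?))?\s*"
    r"(?:;\s*†\s*(?P<death_date>[^;)]*?\d{3,4})(?:\s+in\s+(?P<death_place>[^;)]+?))?\s*)?\)"
)
# [[Target]] -> Target, [[Target|Label]] -> Label
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")


def life_data(wikitext: str) -> dict[str, str]:
    """Extract birth/death date and place from the first matching parenthesis."""
    match = LIFE_RE.search(wikitext)
    if match is None:
        return {}
    return {
        key: LINK_RE.sub(r"\1", value).strip()
        for key, value in match.groupdict().items()
        if value
    }
```

Dates are returned as the raw text found on the page; normalizing them (month names, calendar) would be a separate step.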
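A sketch of extracting both kinds of information from boilerplate templates: an is-a relation implied by the template name, and properties taken from its named parameters. The name-to-class table, the confidence value and the `template.parameter` key format are assumptions; the parser also ignores nested templates and piped links inside values. The example input uses the German Wikipedia „Personendaten“ template.

```python
import re

# Template with named parameters: {{Name|key=value|...}}; nested templates are not handled.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|}]+?)\s*\|(.*?)\}\}", re.DOTALL)

# Hypothetical mapping from template name to the class its presence implies.
TEMPLATE_CLASSES = {"townbox": "town", "taxobox": "taxon", "personendaten": "person"}
TEMPLATE_CONFIDENCE = 0.95  # assumed: boilerplate templates are reliable evidence


def template_facts(title: str, wikitext: str) -> tuple[list[tuple[str, str, str, float]], dict[str, str]]:
    """Return (is-a relations, properties) derived from the templates on a page."""
    relations = []
    properties = {}
    for name, body in TEMPLATE_RE.findall(wikitext):
        name = name.lower()
        if name in TEMPLATE_CLASSES:
            relations.append((title, "is-a", TEMPLATE_CLASSES[name], TEMPLATE_CONFIDENCE))
        # Simplification: a "|" inside a value (e.g. a piped link) breaks this split.
        for parameter in body.split("|"):
            key, sep, value = parameter.partition("=")
            if sep and value.strip():
                properties[f"{name}.{key.strip().lower()}"] = value.strip()
    return relations, properties


example = "{{Personendaten\n|NAME=Adenauer, Konrad\n|GEBURTSDATUM=5. Januar 1876\n|GEBURTSORT=Köln\n}}"
relations, properties = template_facts("Konrad_Adenauer", example)
```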
We can also apply a simple collocation analysis to the link structure: for example, two pages that link to the same page can be considered neighbours. If the two pages share several link targets, the neighbour relation gets more weight. Based on this, we can calculate the similarity of two pages by analysing how similar their weighted neighbour sets are.
This may be used to build clusters of pages, representing topic areas or categories. This could, for instance, aid categorization: if most pages in a cluster are in a specific category, it is likely that the others should be in that category (or a subcategory) too.
From the information gathered, it is possible to infer new relations. Specifically, some relations are transitive, and some relations imply others. This way we can gain knowledge about the relations of things in the Wikipedias that was never entered there explicitly. Because some relations are uncertain, however, we must be careful to propagate the confidence level when inferring new relations from old ones.
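The neighbour relation and the similarity measure could be sketched in Python like this. Counting shared link targets as the neighbour weight follows the description above; using cosine similarity of the weighted neighbour sets, and counting each page as its own neighbour (weighted by its number of links), are assumptions of this sketch.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def neighbour_weights(links: dict[str, set[str]]) -> dict[str, dict[str, int]]:
    """Pages linking to the same page are neighbours; weight = number of shared targets."""
    linked_from: dict[str, set[str]] = defaultdict(set)
    for page, targets in links.items():
        for target in targets:
            linked_from[target].add(page)
    weights: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for page, targets in links.items():
        weights[page][page] = len(targets)  # assumption: each page is its own neighbour
    for sources in linked_from.values():
        for a, b in combinations(sorted(sources), 2):
            weights[a][b] += 1
            weights[b][a] += 1
    return {page: dict(neighbours) for page, neighbours in weights.items()}


def cosine(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine similarity of two sparse weight vectors."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def similarity(a: str, b: str, weights: dict[str, dict[str, int]]) -> float:
    """Similarity of two pages = similarity of their weighted neighbour sets."""
    return cosine(weights.get(a, {}), weights.get(b, {}))
```

Without the self-neighbour entry, two pages that are only each other's neighbours would get a similarity of zero, because neither appears in its own vector.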
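A possible Python sketch of both steps, assuming some pairwise similarity function (such as the neighbour-set similarity described above) is given. Pages are grouped by single-linkage clustering (connected components of the graph of page pairs whose similarity reaches a threshold), and a category is suggested to the remaining members of a cluster when at least a given share of its pages already carry it. The threshold, the share and the choice of single linkage are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Callable


def clusters(pages: list[str], similar: Callable[[str, str], float], threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clustering via union-find over pairs above the threshold."""
    parent = {page: page for page in pages}

    def find(page: str) -> str:
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path halving
            page = parent[page]
        return page

    for a, b in combinations(pages, 2):
        if similar(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, list[str]] = defaultdict(list)
    for page in pages:
        groups[find(page)].append(page)
    return [sorted(group) for group in groups.values()]


def suggest_categories(cluster: list[str], categories: dict[str, set[str]],
                       min_share: float = 0.6) -> dict[str, set[str]]:
    """Suggest a category to cluster members lacking it if enough members have it."""
    counts = Counter(cat for page in cluster for cat in categories.get(page, set()))
    suggestions: dict[str, set[str]] = {}
    for cat, count in counts.items():
        if count / len(cluster) >= min_share:
            for page in cluster:
                if cat not in categories.get(page, set()):
                    suggestions.setdefault(page, set()).add(cat)
    return suggestions
```

Single-linkage clustering is known to chain loosely related pages into one large group, so with this particular choice the threshold needs careful tuning.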
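The following Python sketch infers new facts from transitive relations and from relations that imply others. Which relations count as transitive, the assumption that is-a implies sub-topic-of, multiplying confidences along a chain and the cut-off value are all choices made for illustration; multiplying is simply one way of letting uncertainty accumulate along a chain.

```python
# Assumed properties of the relation types; a real system would make these configurable.
TRANSITIVE = {"sub-topic-of", "part-of", "is-in"}
IMPLIES = {"is-a": "sub-topic-of"}  # hypothetical: every is-a fact is also a sub-topic fact

Fact = tuple[str, str, str]  # (source, relation, target)


def infer(facts: dict[Fact, float], min_confidence: float = 0.3) -> dict[Fact, float]:
    """Add implied and transitively inferred facts.

    The confidence of a chain is the product of the confidences of its links
    (an assumption); facts below min_confidence are not added.
    """
    known = dict(facts)
    for (a, rel, b), conf in facts.items():
        implied = IMPLIES.get(rel)
        if implied and conf > known.get((a, implied, b), 0.0):
            known[(a, implied, b)] = conf
    changed = True
    while changed:  # repeat until no confidence improves
        changed = False
        for (a, rel, b), c1 in list(known.items()):
            if rel not in TRANSITIVE:
                continue
            for (b2, rel2, d), c2 in list(known.items()):
                if b2 != b or rel2 != rel or d == a:
                    continue
                conf = c1 * c2
                if conf >= min_confidence and conf > known.get((a, rel, d), 0.0):
                    known[(a, rel, d)] = conf
                    changed = True
    return known
```

Because every factor is at most 1, longer chains can only lower the confidence, and the cut-off stops weak chains from flooding the result.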
The analysis proposed here would result in a large set of „concepts“ and relations between them. Concepts can also have attributes (like date of birth/death for people). The relations extracted can be categorized into lexical relations (translation, synonym, homonym, etc.), basic semantic relations (is-a, part-of, etc.) and maybe some high-level semantic relations (is-place-of-birth-of, is-author-of, etc.).
The lexical relations may be useful for dictionaries like Wiktionary. It may even be possible to automatically generate (or suggest) entries. The semantic relations could be added to existing ontology systems and semantic dictionaries (like WordNet, OpenCyc and Wortschatz) that specialize in handling such relations. Ontology systems are often used for AI-related „soft“ applications, like automatic text analysis and topic recognition, machine translation or expert systems.
For the Wikimedia projects, this could help in building tools for structuring content (semi-)automatically, for instance by suggesting categories based on text content. The translations discovered could be used to implement a multilingual search or automated translation of category names. The latter has already been suggested to be managed via the Ultimate Wiktionary, which would be a good place to store those relations.
Templates are good
From the perspective of analysis by a program, templates are a very good thing: they can easily be recognized in a text, and they often represent an important attribute of the article's topic. For example, if an article contains a townbox template, it can be assumed to be about a town. Even better, template parameters can often be interpreted directly as properties of the concept in question; a good example of this is the „Personendaten“ template („person data“) in the German Wikipedia.
Categories are weak
Categories are currently used to express different relations at once, which often leads to confusion, both for people and for automated analysis. A typical question is whether the category „school book“ should contain only books, or also the authors of school books. Relations often expressed by categories are is-a, part-of, component-of and subclass-of. However, other relations are common too, like the categorization of musicians by genre.
Categories also pose the problem that they force us to have two separate entities for a single concept: an „article“ and a „category page“. This leads to the notion of a „main“ article of a category, which is really unnecessary: it would be much clearer if the „member“ articles were simply „assigned“ to the „main“ article. That way, the „category“ would be a purely logical concept.
Ideally, articles could be „assigned“ in several ways, i.e. there would be several different relations possible between articles, like the ones mentioned above: is-a, part-of, and so on.
Support for semantic relations
Support for semantic relations as suggested above would solve many problems the category system currently poses to users. It would also add another valuable source of information for automated analysis. Aided by automated suggestions, this would help to structure the Wikipedias. Relations could be defined using a syntax similar to that for categories: [[is-a:city]], [[is-in:Germany]], etc. The set of possible relations should be configurable, like namespaces.
As a long-term perspective, it would be possible to define new relations just like pages. It would then become possible to also define relations between relations, for example that it is impossible to have [[is-a:book]] and [[is-a:person]] in the same article, or that [[is-in:happy]] makes no sense. This would mean to build an ontology, wiki-style.
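A Python sketch of how the proposed `[[relation:target]]` syntax could be parsed, together with a simple consistency check for classes that exclude each other. The relation set, the disjointness table and the function names are hypothetical; in the long-term perspective above, such tables could themselves be editable wiki pages.

```python
import re

# Configurable set of relation types, analogous to namespaces (hypothetical).
RELATION_TYPES = {"is-a", "is-in", "part-of"}
# Hypothetical constraint: no topic can be an instance of both classes in a pair.
DISJOINT_CLASSES = [frozenset({"book", "person"})]

RELATION_LINK_RE = re.compile(r"\[\[([a-z-]+):([^\]|]+)\]\]")


def parse_relations(wikitext: str) -> list[tuple[str, str]]:
    """Collect (relation, target) pairs written as [[relation:target]]."""
    return [
        (relation, target.strip())
        for relation, target in RELATION_LINK_RE.findall(wikitext)
        if relation in RELATION_TYPES
    ]


def constraint_violations(relations: list[tuple[str, str]]) -> list[list[str]]:
    """Return the disjoint class pairs that an article claims to belong to both of."""
    classes = {target.lower() for relation, target in relations if relation == "is-a"}
    return [sorted(pair) for pair in DISJOINT_CLASSES if pair <= classes]
```

Interlanguage links such as `[[de:Bonn]]` share the same bracket syntax, which is why only prefixes from the configured relation set are accepted.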
Machine-Readable Wiki: RDF & co.
RDF is a powerful standard for expressing relations between objects. On top of a very simple data model, it defines properties, collections, class hierarchies, etc. It is often used for metadata like license, authors, source and date, which would be a good idea for Wikipedia as well. Furthermore, it can also be used to express category listings, backlinks and the like.
RDF would be useful for the automated analysis in two ways: as a method of representing the results and, more importantly, as an easy-to-parse data source. This is especially important when one wants to analyse only individual articles or categories, without the bloat of the full database. RDF as an output format for all kinds of generated lists would also benefit bot development; for instance, it would be a good way to list all contributors to an article, which is currently quite difficult.
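As a sketch of RDF output, the following Python code serializes relations and contributor lists as N-Triples. The `example.org` namespaces are placeholders; `dc:contributor` is the Dublin Core contributor property. Plain triples cannot carry a confidence level, which would need reification or a similar mechanism (not shown).

```python
from urllib.parse import quote

# Placeholder namespaces; a real deployment would use the wiki's own URLs.
TOPIC_NS = "http://example.org/wiki/"
RELATION_NS = "http://example.org/relation/"
DC_CONTRIBUTOR = "<http://purl.org/dc/elements/1.1/contributor>"  # Dublin Core "contributor"


def iri(namespace: str, name: str) -> str:
    """Build an N-Triples IRI, percent-encoding the (wiki-style) name."""
    return f"<{namespace}{quote(name.replace(' ', '_'))}>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def relations_to_ntriples(relations: list[tuple[str, str, str]]) -> str:
    """One triple per (source, relation, target); confidence levels are not encoded."""
    return "".join(
        f"{iri(TOPIC_NS, s)} {iri(RELATION_NS, r)} {iri(TOPIC_NS, t)} .\n"
        for s, r, t in relations
    )


def contributors_to_ntriples(article: str, users: list[str]) -> str:
    """List the contributors of an article as dc:contributor statements."""
    return "".join(f"{iri(TOPIC_NS, article)} {DC_CONTRIBUTOR} {literal(u)} .\n" for u in users)
```

N-Triples is chosen here because it is line-based: a bot can stream or grep it without an RDF library.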
It can be concluded that it is relatively simple to extract information about the semantic relations of topics from the data present in the Wikipedias, even without actually analysing natural language. Also, a lot of lexical (dictionary) information like synonyms, translations and inflections can be extracted.
However, the semantic structure is limited to relatively few types of relations. To gain a more fine-grained view, existing ontologies could be used. Generally, the data set produced by the analysis is already quite interesting; its best use, however, would be in combination with existing semantic databases like Cyc, WordNet or Wortschatz.
The data gained from the analysis can also be used to aid the structuring of existing Wikipedias, and to support a better multilingual interface.
Clusters (nds:wikipedia)
Some results of a simple cluster analysis of the Low German (nds) Wikipedia, which has ca. 2000 articles and 17000 (blue) links. The results work especially well for geographic articles, apparently because those are well developed in the nds Wikipedia. Also, there is one very big cluster, which indicates that the algorithm and the threshold values it uses should be tweaked some more.
- Anguilla | Aruba
- Grootbritannien_un_Noordirland | Powys | Grootbritannien
- Kööm | Beer
- 1894 | 1945
- Mars | Sünn | Maand
- Protozoa | Protisten
- Zabrze | Chorzów
- Westerwolds | Twentsch | Noord-Veluws
- Argentinien | Brasilien | Bolivien | Venezuela | Ecuador | Paraguay | Lima
- Chile | Peru | Surinam | Kolumbien | Uruguay | Guyana
- Blinker | Kieker_(Reekner)
- Tallinn | Viljandi
- Ravenna | Theoderich_de_Grote | Theoderich
- Kuba | Haiti | Jamaika | Antigua_un_Barbuda | Grenada | Dominica | Dominikaansche_Republiek | St._Lucia | St._Kitts_un_Nevis | Karibik
- Bahamas | Barbados | Trinidad_un_Tobago | St._Vincent_un_de_Grenadinen
- Afghanistan | Irak | New_York | Bagdad | Terrorismus | Jordanien | Kuwait | Katar | Jemen | Kirgisien | Thailand | Saudi-Arabien
- Libanon | Bhutan | Syrien | Süüdkorea
- Malaysia | Indonesien | Myanmar | Tadschikistan
- Kambodscha | Laos | Vietnam
- Afrika | Antarktis | Algerien | Ägypten | Libyen | Tunesien | Marokko | Angola | Botswana | Sambia | Somalia | Tansania | Mosambik | Sudan | Malawi | Ghana | Gambia_(Land) | Äthiopien | Elfenbeenküst | Sierra_Leone | Süüdafrika_(Land) | Saint_Helena
- 773 | 772 | 776
- Namibia | Liberia | Mauritius | Simbabwe | Kap_Verde | Guinea-Bissau | Swasiland
- Atom | Anion | Anorganisch_Chemie | Blood | Base | Chemie | Chemische_Reaktschoon | Chemisch_Formel | Chemisch_Verbinnung | Chemische_Grundbegrepen | Elektrolyse | Energie | Elektron | Gaumookerphysik | Ion | Ioniseern | Iesen | Kation | Kohlensüür | Kohlenstoff | Molekül | Metall | Mineralogie | Noorddüütsche_Affinerie | NaCl | Natronlaug | Suerstoff | Solt | Swevel | Sülver | Süür | Theoter | Water | Waterchemie | Waterstoff | Tinn | Swefelsüür | Elektronik | Kiel_(Schipp) | Kopper | Kristallwater | Koppersulfaat | Proton
- Maandag | Dingsdag | Dunnersdag
- Freekark | Liste_vun_de_Freekarken_in_Düütschland
- Lyrik | Literatur
- Mali | Niger | Nigeria | Madagaskar | Guinea | Burkina_Faso | Tschad | Dschibuti | Zentraalafrikaansche_Republiek | Demokraatsche_Republiek_Kongo | Nairobi
- Chemisch_Element | Chemisch_Stoff | Nichtmetalle | Periodensysteem | Sott
- Kanada | Terror | USA | Belize | Mexiko | Honduras | Panama | Nicaragua | Guatemala | Costa_Rica | El_Salvador | Jazz | Blues | Charlie_Parker
- Botanik | Eukaryota | Zoologie | Beest | Archaeen | Bakterien | Anatomie
- Australien | Nauru | Kiribati | Niegseeland | Palau | Vanuatu | Tuvalu | Tonga | Samoa | Fidschi | Marshallinseln | Oosttimor | Papua-Niegguinea | Salomonen | Mikronesien | Ozeanien
- Rumäänsch | Istrorumäänsch | Dakorumäänsch
- Albanien | Andorra | Adam_vun_Bremen | Amerika | Asien | Anatolien | Ankara | Athen | Araabsche_Spraak | Atlantik | Belgien | Bosnien-Herzegowina | Bulgarien | Berlin | Barg | Baden-Württemberg | Bayern | Bewick | Billers | Baltikum | Preßburg | Christoph_Kolumbus | Düütschland | Däänmark | Düütsche_Spraak | Danzig | Däänsche_Spraak | Europa | Etymologie | Estland | Eerdeel | Eider | Eurasien | Finnland | Frankriek | Flüsse | Franzosentiet | Geographie | Grekenland | Gröönland | Grunneng | Groningen | Grunnengs | Gallien | Gaius_Julius_Caesar | Griepswohld | Hööftsiet | Hööftstadt | Hansetiet | Hessen | Hannober | Hanse | Hinnerk_De_Leuw | Ilv | Ingelsch | Italien | Irland | Island | Iesenbahn | Insel | Kark | Königriek | Klenner | Kroatien | Kiel_(Stadt) | Königsbarg | Klaipeda | Kirow_(Stadt) | Latiensche_Spraak | Lettland | Litauen | Luxemburg | Liechtensteen | London | Ljouwert | Lübeck | Labskaus | Monarkie | Middelöller | Mekelnborg-Vörpommern | Makedonien_(Land) | Malta | Moldawien | Monaco | Montenegro | Makedonien | Moskau | Middelamerika | Memel | Nedderlannen | Nokieksel | Neddersassen | Noordamerika | Norwegen | Noordsee | Nedderlandsche_Spraak | Oostfreesland | Ole_Tiet | Oostsee | Plattdüütsch | Polen | Plattdüütsch_Vokabular | Plattdüütsche_Orthographie | Portugal | Paris | Plautdietsch | Pommern | Religion | Religionen_vun_de_Welt | Russland | Römertiet | Rumänien | Rom | Röömsch_Riek | Rhienland-Palz | Rostock | Rügen | Riga | Swiez | Sassen | Sweden | See | Sleswig-Holsteen | San_Marino | Serbien_un_Montenegro | Slowakei | Slowenien | Spanien | Süüdamerika | Sibirien | Sankt_Petersborg | Sassen_(Bundsland) | Sassen-Anhalt | Städer_up_de_Eer | Stettin | Swerin | School | Soziologie | Skandinavien | Steentiet | Tschechien | Törkie | Transsibirisch_Isenbahn | Thüringen | Ukraine | Ungarn | Vatikaan | Vilnius | Wikipedia | Werser | Wittrussland | Warschau | Weströömsch_Riek | Zypern | Österriek | 1998 | 1492 | 395 | 596 | Spraken_vun_de_Welt | Bronzetiet | Upnohm_vun_niege_EU-Länner | EU | Republiek | Wetenschop | Stadt | Hamborger_Platt | Westfäälsch_Platt | Ollnborg | Ostnederdüütsch | Neddersassisch | Westfalen | Noordrhien-Westfalen | Niege_Hanse | Brannenborg_an_de_Havel | Kaliningrader_Oblast | Maschinenbu | Fritz_Reuter | Schriever | Football_EM_2004 | Bundsland | Kiel | Ingväonsche_Spraken | Fluss | England | Horst_Köhler | 2004 | Bunnspräsident_(Düütschland) | Johannes_Rau | Roman_Herzog | Theodor_Heuss | Heinrich_Lübke | Gustav_Heinemann | Walter_Scheel | Richard_von_Weizsäcker | Karl_Carstens | Bunnskanzler_(Düütschland) | Konrad_Adenauer | Ludwig_Erhard | Willy_Brandt | Helmut_Schmidt | Helmut_Kohl | Joschka_Fischer | Bunnsministerium_för't_Verdeffenderen | Gerhard_Schröder_(CDU) | Hans-Dietrich_Genscher | Klaus_Kinkel | Vizekanzler_(Düütschland) | Franz_Blücher | Jürgen_Möllemann | Erich_Mende | Hans-Christoph_Seebohm | Peter_Struck | Rudolf_Scharping | Volker_Rühe | Gerhard_Stoltenberg | Rupert_Scholz | Manfred_Wörner | Hans_Apel | Franz_Josef_Strauß | Kai-Uwe_von_Hassel | Theodor_Blank | 15_April | CDU | Bonn | Westplatt | Franzöösche_Spraak | Rhien | Oste | Donau | Haven | Wikinger | Japan | Balje_(Neddersassen) | Nordkehdingen | Südkehdingen | Amtsspraak | Stralsund | Middelmeer | Kosovo | Soest | Armenien | Aserbaidschan | Georgien | München | Bangladesch | Indien | Malediven | Mongolei | Brunei | Kasachstan | Nepal | Sri_Lanka | Singapur | Republiek_China | Noordkorea | Usbekistan | Philippinen | Turkmenistan | Volksrepubliek_China | Palästinensische_sülvstregeerte_Rebeden | Pazifische_Ozeaan | 1969 | 1963 | SPD | Provinz_Grunneng | Bozen | Düsseldörp | 27._April | Göttingen | Stavanger | Noordneddersassisch | Okzitansch | Breslau | Baukem | Meideborch | Kraków | Elbing | Frauenburg | Bergkamen | Sławno | Darłowo | Kamen | Breckerfeld | Lünen | Werne | Kaunas | Unna | Dööp | Germaansche_Spraken | 2005 | Essen | Nikosia | Hebrääsche_Spraak | Fröndenberg | Schweierte | Mennoniten | Hattingen | Swelm | Gevelsberg | Jesus_Christus | Ventspils | Smolensk | Nedderdüütsch | Rheine | Naugard | Provence | Frohnhausen | Gereformeerde_Kerken | Kornelius_Wiebe | Neuapostolische_Kirche | Pskow | Belosersk | Ruihen | Havelberg | Werl | Tangermünde | Demmin | Stendal | Werben | Osterburg | Lippstadt | Dollar | Euro | Masowier | Balve | Neuenrade | Holsterhausen | August_Hinrichs | Theodorianum | Poitevin-saintongeais | Hoochdüütsch | Warburg | Płock | Serbokroatsch | Spraken_in_Frankriek | Dülmen | Arnsberg | Altena | Penthouse | Coesfeld | Werner_Heisenberg | Nikolaus_Kopernikus | Peckelsheim | Städer_in_Polen | Bydgoszcz | Kielce | Gdynia | Oppeln | Blankenstein_(Hattingen) | Kattowitz | Astuursch | Bielsko-Biała | Sosnowiec | Dąbrowa_Górnicza | Langues_d'oïl | Picardsch | Lorrain | Franc-comtois | Walloonsch | Bourguignon | Champenois | Gaiseke | Nedderfranksch | Mandarin | Spaansche_Spraak | Limburgisch-Bergisch | Lüdenscheid | Buddhismus | Hindi-Urdu | Bredeney | Menden | Neustadt | Litausche_Spraak | Sloweensche_Spraak | Patterbuorn | Marie_Curie | Aristoteles | Platon | Kaschubsch | Malaische_un_indonessche_Spraak | Kommunismus | Iserlaun | Kollenhaordt | Beälke | Peter_de_Grote | Südgeldersch | Tony_Blackplait | Kujawier | Pund | Hollandsche_Dialekt | Vincent_van_Gogh | Paul_Cézanne | Tweet_Weltorlog | Hinduismus | Usâmah_bin_Lâdin | Frédéric_Chopin | Mönster | Akbar | Kolonisation_vun_Süüdamerika | Kolonisation_vun_Noordamerika | Koreaorlog | Quedlinburg | Johannes_Paul_II. | Tamerlan | Olwestfälsch | Pund_Sterling | Russ'sche_Börgerorlog | Kalter_Krieg | Sowjetische_Besetzung_Afghanistans | Grieth | Konfuzius | Konfuzianismus | Taoismus | Ingelsche_Börgerorlog | Siddhartha_Gautama | Opdecken_vun_Amerika | Hirschberg | Langscheid | Harriet_Tubman | Regionaalspraak | Königin_Elisabeth_I. | Emma_Goldman | Rosa_Luxemburg | Enger | Noordzypern | Halberstadt | Plettenberg | Arumuunsch | Meglenorumäänsch | List_vun_dat_Weltarv | List_vun_de_Grootstäder_in_Düütschland | Blohm_&_Voss | Pisa | Johann_Sebastian_Bach | Thulla | Nanak | Spraakwetenschop | Ido
- Week | Weekdag | Middeweken | Freedag | Sünndag
- Benin | Burundi | Kamerun | Ruanda | Mauretanien | Kenia | Togo | Uganda | Lesotho | Gambia | Senegal | Seychellen | Äquatoriaal-Guinea | Komoren | Eritrea | Gabun | São_Tomé_un_Príncipe | Republiek_Kongo | Mayotte | Gangnihessou
- Aant | Duun | Deerriek | Fedder | Gans | Vagel | Söögdeer | Puter | Systematik_(Biologie) | Oort_(Biologie) | Amphibia | Cnidaria | Mesomycetozoa | Ciliophora
- Othmarschen | Bahrenfeld | Mottenburg
- Irakorlog | Situatschoon_in'n_Irak
- Bülgenläng | Farv
- Balje | Land_Kehdingen
- 20_Juni | 14_November
- Sünnavend | Saterdag
- Euglenozoa | Apicomplexa
- Astronomie | Mathematik
- Reekner | Linux_op_Platt | K_Desktop_Environment | Böverflach | Muus_(Reekner) | Elektrotechnik | Nettkieker | Kieker | Software | Opera | Bedriefssysteem | Linux | Firmware | Unix | Microsoft_Windows | Hardware | Reeknernettwark | Nettwark
- Israel | Nahoost | Bahrain | Iran | Oman | Pakistan | Vereenigte_Araabsche_Emiraten | Semitsche_Spraken | Zarathustra
- Bremen | Hamborg | Holsteen | Rechtenfleth | Langeoog | Flottbek | Altona | Iserbrook | Lurup | Ottensen | Blankenese | Sülldorf | Rissen | Bezirk | Nienstedten | Wachholtz_Verlag
- Düörpm | Dorsten | Herne
- Niederpreußisch | Mark-Brannenborger_Platt
- Experiment | Natuurwetenschop | Afk
- Hohn | Bueree
- Nischni_Nowgorod | Wladiwostok
- Tarnów | Legnica
- Billerbeck | Borghorst
- Oer-Erkenswick | Marienmünster
- 875 | 1076
- Reformatschoon | Nieg_Testament
- Fürstenau | Quakenbrück
- Merseburg | Naumburg_(Saale)
- Łódź | Wałbrzych
- John_Major | Premierminister_vun_Grootbritannien
- 12_Dezember | 1903 | 1949
- Book | Juristeree | Kultur
- Chäsekerken | Castrop-Rauxel | Riäkelkusen
- Rindveeh | Schaap | Zeeg
- Brackwater | Meer | Soltwater | Fluss_(Water) | Stroom_(Water)
- Eresborg | Karl_de_Grote
- Sövenden-Dags-Adventisten | Baptisten
- Westgermaansche_Spraken | Südfränkisch
- Melk | Snacks | Veehtüch | Landwertschaplich_Bedrief
- Natschonaalversammeln_vun_Wales | Wales | Cardiff
- Minsch | Reptilia | Muus
- Greunkohl | Plant | Kruut | Boom | Photosynthese | Chloroplast
- Augustin_Wibbelt | Eli_Marcus
- Christelijke_Gereformeerde_Kerken | Gereformeerde_Kerke
- Nieheim | Bödefeld
- Planet | Dag | Eer | Präzession
- Springtid’ | Tiden
- Seehausen | Soltwedel
- Biologie | Medizin | Physik | Tardigrada | Poggenstöhl
- Gravitatschoon | Johr
- Tietrebeet | Internatschonale_Telefoonvörwahl
- Visby | Kokenhusen
- Dinslaken | Voerde
- Ammerland | Bad_Twüschenahn | Ammerlänner_Buurnhuus
- Galina_Starowojtowa | Wladimir_Putin
- Stedinger | Karl_Rudolf_Brommy
- Wizebsk | Polasier
- Eeten | Foot | Eten
- Auerk | Freesland
- Köslin | Belgard
- Drinken | Buddel
- 1_Juli | 22_Februar | 7_Juli
- Holt | Köök
- Vreden | Emmerek
- Jadebusen | Asegabook
- Stargard | Attendorn
- Kuldiga | Cesis
- Katt | Hund
- Sokrates | Ole_Grekenland
- Heinrich_von_Brentano | 5_Januar
- Kruutsand | Stood