Abstract Wikipedia/Sample sentences

This page collects sample sentences for testing the various NLG systems for Abstract Wikipedia. Feel free to add more!

Basic sentences:

  1. The age of a person; e.g., "Malala Yousafzai is 25 years old".
    • Example from the template language specification.
    • Minimum data that would be fetched from Wikidata or computed from it: 1) person name and 2) age.
  2. Basic geographic information, such as a city being located in a province or country; e.g., "Cape Town is located in the Western Cape.".
    • Minimum data that would be fetched from Wikidata or computed from it: 1) city name, 2) province/country name.
  3. Succession of presidents/heads of state/kings/etc. (as the case applies); e.g., "In India, Ram Nath Kovind was followed by president Droupadi Murmu.".
    • Minimum data that would be fetched from Wikidata or computed from it: 1) presidents' respective names, 2) periods when they were president (to double-check that the 'followed by' is semantically correct), 3) country of which they were president/head of state/etc.
  4. Founders (or owners) of museums; e.g., "Hasso Plattner founded the Museum Barberini".
    • Minimum data that would be fetched from Wikidata or computed from it: 1) the museum, 2) the founded by property (P112), 3) the founder.
  5. Animals living in regions; e.g., "Capybaras live in Brazil".
    • Minimum data that would be fetched from Wikidata or computed from it: 1) the animal, 2) the region, 3) property relating the two, such as the taxon range (P9714) or endemic to (P183).
  6. Things that are used in an occupation; e.g., "Carpenters use axes, saws, and woodworking tools".
    • Minimum data that would be fetched from Wikidata or computed from it: 1) the occupation, 2) the uses property (P2283) or similar, 3) all the entities specified as the range of that property.
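
The age in the first sentence is a value that would be computed rather than fetched. The following is a minimal sketch of that computation, assuming the date of birth has already been retrieved (on Wikidata this is the date of birth property, P569). The function names and the reference date of 15 December 2022 are illustrative choices, not part of any Abstract Wikipedia specification.

```python
from datetime import date


def age_on(birth: date, today: date) -> int:
    """Return the number of full years elapsed between birth and today."""
    years = today.year - birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_sentence(name: str, birth: date, today: date) -> str:
    """Render the basic English age sentence (hypothetical helper)."""
    return f"{name} is {age_on(birth, today)} years old."


# Hard-coded stand-in for a Wikidata lookup: Malala Yousafzai, born 12 July 1997.
# The reference date is an assumption chosen for illustration.
print(age_sentence("Malala Yousafzai", date(1997, 7, 12), date(2022, 12, 15)))
```

Because the result depends on the reference date, a renderer would need to recompute the age whenever the text is generated rather than store it.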

Extended basic sentences, or variations thereof:

  1. The age of a person and geographic information; e.g., "Ms Malala Yousafzai was born in Mingora (located in the Swat district in Pakistan) and is 25 years old", or, for any person: PersonX was born in placeZ (located in a geographicEntityZZ) and is xxx years old.
    • Minimum data that would be fetched from Wikidata or computed from it: 1) person's name, 2) birth place, 3) birth district/province/country, 4) age.
  2. Extending basic geographic information with societal information about cities; e.g. "San Francisco is the cultural, commercial, and financial center of Northern California. It is the fourth-most populous city in California, after Los Angeles, San Diego and San Jose."
    • Example from the technical report Architecture for a Multilingual Wikipedia by Denny Vrandečić.
    • Minimum data that would be fetched from Wikidata or computed from it: 1) the city, 2) its attributes (cultural, commercial, and financial center is currently not in Wikidata for SF), 3) the province the city is located in, 4) computed ranking on population.
  3. Succession of presidents/heads of state/kings/etc. (as the case applies); e.g., "In India, Mr Ram Nath Kovind was followed by president Ms Droupadi Murmu on 25 July 2022.".
    • Minimum data that would be fetched from Wikidata or computed from it: 1) presidents' respective names, 2) their respective gender for the title, 3) periods when they were president (to double-check that the 'followed by' is semantically correct), 4) country of which they were president/head of state/etc, and 5) date they took office.
  4. Founders (or owners) of museums; e.g., "Hasso Plattner founded the Museum Barberini, which was officially opened on 20 January 2017".
    • Minimum data that would be fetched from Wikidata or computed from it: 1) the museum, 2) the founded by property (P112), 3) the founder, 4) the officially opened on property (P1619), and 5) date of opening.
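
Two of the items above rely on computed checks: the double-check that 'followed by' is semantically correct, and the population ranking. The sketch below shows one possible way to do both, assuming the terms of office and the population figures have already been fetched (on Wikidata, terms are typically recorded with the start time, P580, and end time, P582, qualifiers). The `Term` type, the function names, and the one-day tolerance are hypothetical. The values in `placeholder_order` are not real population counts; they only encode the ordering stated in the San Francisco example.

```python
from datetime import date
from typing import NamedTuple, Optional


class Term(NamedTuple):
    holder: str
    start: date
    end: Optional[date]  # None while the holder is still in office


def directly_follows(prev: Term, nxt: Term, max_gap_days: int = 1) -> bool:
    """True if nxt starts when prev ends, allowing a small hand-over gap."""
    if prev.end is None:
        return False
    gap = (nxt.start - prev.end).days
    return 0 <= gap <= max_gap_days


def population_rank(city: str, populations: dict[str, int]) -> int:
    """Return the 1-based rank of city by population (1 = most populous)."""
    ordered = sorted(populations, key=populations.get, reverse=True)
    return ordered.index(city) + 1


# Hard-coded stand-ins for Wikidata lookups.
kovind = Term("Ram Nath Kovind", date(2017, 7, 25), date(2022, 7, 25))
murmu = Term("Droupadi Murmu", date(2022, 7, 25), None)
print(directly_follows(kovind, murmu))

# Placeholder values that preserve only the order given in the example text.
placeholder_order = {"Los Angeles": 4, "San Diego": 3, "San Jose": 2, "San Francisco": 1}
print(population_rank("San Francisco", placeholder_order))
```

The tolerance parameter is a design choice: some offices change hands on the same day, others with a short interval, so a strict equality test on dates could reject valid successions.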

Longer sentences:

  1. "Edith Eger is the youngest daughter of Lajos and Ilona Elefánt, Hungarian Jews in an area which was, at the time of her birth, in Czechoslovakia. Her father was a tailor."
    • Example from the Wikidata abstract representation discussion document.
    • Minimum data that would be fetched from Wikidata or computed from it: 1) person name, 2) parents, 3) siblings of the person, 4) gender of the person, 5) country of birth of the person, 6) occupation of the father, 7) social or ethnic or racial etc group of the person or their parents.
  2. "Cooking is done both by people in their own dwellings and by professional cooks and chefs in restaurants and other food establishments.".
    • Example from the Wikidata abstract representation discussion document.
    • Minimum data that would be fetched from Wikidata or computed from it, presumably: 1) activity (the 'cooking') in this case, 2) locations of the activity, 3) actors in the activity.
  3. "Don't Starve is a survival video game developed by the Canadian indie video game developer Klei Entertainment.".
    • Example from the Wikidata abstract representation discussion document.
    • Minimum data that would be fetched from Wikidata or computed from it: 1) name of the video game, 2) category of the video game, 3) developer of the game, 4) company the developer works for.
  4. "Severe acute respiratory syndrome coronavirus 2 (SARS‑CoV‑2) is a strain of coronavirus that causes COVID-19 (coronavirus disease 2019), the respiratory illness responsible for the ongoing COVID-19 pandemic."
    • Example is the first sentence of the SARS-CoV-2 Wikipedia entry (as of 15 Dec 2022).
    • Minimum data that would be fetched from Wikidata or computed from it: 1) name of the virus, 2) category/type of the virus, 3) disease it is the causative agent of, 4) category/type of the disease, 5) 'status' of the disease spread (such as outbreak, endemic, epidemic etc), 6) timeframe when it had that status of disease spread in the population.
  5. Information about world heritage sites; e.g., "UNESCO World Heritage site the Taj Mahal is made from marble and blends Islamic, Iranian, and Mughal architectural styles." or, for another one: "UNESCO World Heritage site the Sagrada Familia is made from stone and concrete and blends Catalan modernism and Gothic Revival architectural styles."
    • Minimum data that would be fetched from Wikidata or computed from it: 1) the site, 2) heritage designation (P1435), 3) usage of the made from material (P186) property, 4) the range of that property, 5) the range(s) of the architectural style property (P149).
  6. "The Roman Empire lost the strengths that had allowed it to exercise effective control over its Western provinces."
  7. "The Japanese destroyers Makigumo and Akigumo finally finished off Hornet with 4 24-inch (610 mm) Long Lance torpedoes."
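
The world heritage site sentences in item 5 follow a fixed pattern in which two of the slots are filled with lists of values. The sketch below shows how such a pattern could be rendered in English from property values, assuming these have already been fetched and labeled. The dictionary layout (keyed by property ID), the `join_and` helper, and `heritage_sentence` are hypothetical and hand-written for illustration; they are not the template language of Abstract Wikipedia.

```python
def join_and(items: list[str]) -> str:
    """Join items as an English enumeration with a serial comma."""
    if not items:
        raise ValueError("need at least one item")
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def heritage_sentence(site: dict) -> str:
    """Fill the heritage-site pattern from labeled property values."""
    return (
        f"{site['P1435']} the {site['label']} is made from "
        f"{join_and(site['P186'])} and blends {join_and(site['P149'])} "
        "architectural styles."
    )


# Hand-written stand-in for fetched data; values are taken from the example sentences.
taj_mahal = {
    "label": "Taj Mahal",
    "P1435": "UNESCO World Heritage site",      # heritage designation
    "P186": ["marble"],                         # made from material
    "P149": ["Islamic", "Iranian", "Mughal"],   # architectural style
}
print(heritage_sentence(taj_mahal))
```

The same `join_and` helper would cover other enumerations on this page, such as the list of tools in the carpenters sentence. Languages other than English would need their own conjunction and agreement rules.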

Note: when converting a sentence into the representation of one's system or into another language, a different sentence construction that stays true to the facts is perfectly fine. For instance, "In India, Ram Nath Kovind was followed by president Droupadi Murmu." and "President Droupadi Murmu followed Ram Nath Kovind in India." are different renderings but communicate the same information, as would "Indian President Droupadi Murmu followed predecessor Ram Nath Kovind.". Likewise, one could change the emphasis of a sentence by rearranging its content, notably to put the focus on the actors rather than the activity, or vice versa; e.g., instead of "Cooking is done by... " above, the sentence may be reworded as "People in their own dwellings, and professional cooks and chefs, do cooking.".
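
One way to picture this is to keep the facts in a single language-independent record and treat each rendering as a separate template over it. The sketch below assumes such a record; the field names, the `TEMPLATES` list, and the `renderings` function are hypothetical and only reproduce the three succession sentences quoted in the note.

```python
# A single fact record; every rendering below draws on the same fields.
fact = {
    "country": "India",
    "demonym": "Indian",
    "predecessor": "Ram Nath Kovind",
    "successor": "Droupadi Murmu",
}

# Three English constructions that communicate the same information.
TEMPLATES = [
    "In {country}, {predecessor} was followed by president {successor}.",
    "President {successor} followed {predecessor} in {country}.",
    "{demonym} President {successor} followed predecessor {predecessor}.",
]


def renderings(fact: dict[str, str]) -> list[str]:
    """Render every template from the same fact record."""
    # str.format ignores keyword arguments that a template does not use.
    return [template.format(**fact) for template in TEMPLATES]


for sentence in renderings(fact):
    print(sentence)
```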