Talk:Abstract Wikipedia/Archive 3


I don't know if a potential logo has been discussed, but I propose the Wikipedia Logo with Pictograms replacing the various characters from different languages. I might mock one up if I feel like it. If you have any ideas, or want to inform me there already is a logo, please ping me below. Cheers, WikiMacaroons (talk) 18:00, 26 July 2020 (UTC)[]

The logo discussion has not started yet. We're going to have the naming discussion first, and once the name is decided, the logo discussion will follow. Feel free to start a subpage for the logo, maybe here if you want, to collect ideas. --DVrandecic (WMF) (talk) 01:46, 5 August 2020 (UTC)[]
Thanks, DVrandecic (WMF), perhaps I shall :) WikiMacaroons (talk) 21:57, 7 August 2020 (UTC)[]
Note that the existing logo of Wikimedia Commons already comes close to conveying the underlying concept: composition/fusion to generate a result that can be borrowed and imported into other projects (not just Wikipedia, much like the wiki of functions).
We could derive the logo of the future wiki of functions from the logo for Wikidata (barcodes) placed at the top, with arrows pointing to a central point of processing, and then arrows going out from it to the set of Wikimedia projects (placed at the bottom, represented by the tri-sector ring reduced to a half circle). If you compare this with the logo for Meta, the top red segment of the ring would be the Wikidata barcode, the two blue segments are kept, and the green Earth is replaced by a central green point; the arrows are drawn inside the ring, with the same colors as the segments they are connected to. I would avoid any logo that represents a mathematical concept (such as the lambda letter, unless it is part of the chosen name, or parentheses representing classic maths functions): the functions in the new wiki will not really be mathematical functions, as they are polymorphic, and don't necessarily return a result but could return errors, or a "variant type" to represent actual values or errors returned. verdy_p (talk) 09:36, 29 October 2020 (UTC)[]
Note that the formal request is still not done, but a draft page allows you to prepare suggestions. This draft Logo subpage needs to be updated before it is actually translated: I added a first proposal there. verdy_p (talk) 09:20, 17 November 2020 (UTC)[]

Some questions and concerns

Depending on the eventual implementation details, Abstract Wikipedia may need support from the WMF Search Platform team. So a couple of us from our team had a conversation with Denny about Abstract Wikipedia, and we had some questions and concerns. Denny suggested sharing them in a more public forum, so here we are! I've added some sub-headings to give it a little more structure for this forum. In no particular order...

Grammar/renderer development vs writing articles, and who will do the work

Each language renderer, in order to be reasonably complete, could be a significant fraction of the work required to just translate or write the "core" articles in that language, and the work can only be done by a much smaller group of people (linguists or grammarians of the language, not just any speaker). With a "medium-sized" language like Guarani (~5M speakers, but only ~30 active users on the Guarani Wikipedia in the last month), it could be hard to find interested people with the technical and linguistic skills necessary. Though, as Denny pointed out, perhaps graduate students in relevant fields would be interested in developing a grammar—though I'm still worried about how large the pool of people with the requisite skills is.

I share this concern, but I think we need to think of it as a two-edged sword. Maybe people will be more interested in contributing to both grammar and content than to either one alone. I certainly hope that this distinction will become very blurred; our goal is content and our interest in grammar is primarily to support the instantiation of content in a particular natural language (which, you know, is what people actually do all the time). We need to downplay the "technical and linguistic skills" by focusing on what works. People love fixing bad grammar or poor word choices (it's a parent–child thing), so perhaps there are two separate challenges here: how to get a rendering process that produces intelligible content versus one that produces correctly phrased content. Native speakers will certainly have a key role in identifying incorrectly phrased content; they may even be tempted to fix a problem they understand. They won't necessarily be existing editors, however; they may even have been intrigued by the slightly quirky Wikipedia idiolect they've been hearing about! Ultimately, though, their community will need the maturity to allow the more difficult linguistic problems they identify to percolate upwards or outwards to more specialist contributors, and these may be non-native grammarians, programmers, contributors of any stripe.
What about the initial "intelligible" renderers? We need to explore this as we develop language-neutral content. We will see certain very common content frameworks, which should already be apparent from Wikidata properties. So we will be asking whether there is a general (default) way to express instance of (P31), for example, and how it varies according to the predicate and (somewhat more problematically) the subject. We will also observe how certain Wikidata statements are linguistically subordinate (being implied or assumed). So, <person> is (notable for) <role in activity>, rather than instance of (P31) human (Q5)... To the extent that such observations are somewhat universal, they serve as a useful foundation for each successive renderer: how does the new language follow the rules and exceptions derived for previous languages (specifically for the language-neutral content so far developed; we never need a complete grammar of any language except, perhaps, the language-neutral synthetic language that inevitably emerges as we proceed).
Who will do the work? Anyone and everyone! Involvement from native speakers would be a prerequisite for developing any new renderer, but the native speakers will be supported by an enthusiastic band of experienced linguistic problem-solvers, who will (by then) have already contributed to the limited success of several renderers for an increasing quantity of high-quality language-neutral content. --GrounderUK (talk) 12:59, 25 August 2020 (UTC)[]
@GrounderUK: "People love fixing bad grammar or poor word choices (it's a parent–child thing), so perhaps there are two separate challenges here: how to get a renderer that produces intelligible content versus one that produces correctly phrased content. Native speakers will certainly have a key role in identifying incorrectly phrased content; they may even be tempted to fix a problem they understand. They won't necessarily be existing editors, however; they may even have been intrigued by the slightly quirky Wikipedia idiolect they've been hearing about!" This is a great notion, and hopefully it will be one the project eventually benefits from. --Chris.Cooley (talk) 00:04, 29 August 2020 (UTC)[]
@Chris.Cooley: Thanks, Chris, one can but hope! Denny's crowdsourcing ideas (below) are a promising start. Just for the record, your quoting me prompted me to tweak the original; I've changed "a renderer" to "a rendering process" (in my original but not in your quoting of it).
@TJones (WMF): One workflow we have been brainstorming was to:
  1. crowdsource the necessary lexical knowledge, which can be done without particular expertise beyond language knowledge
  2. crowdsource what sentences for specific simple constructors should look like (from, e.g., bilingual speakers who are shown simple sentences in another language, e.g. Spanish, and then asked to write them down in Guarani)
  3. then even non-Guarani speakers could try to actually build renderers, using the language input as test sentences
  4. now verify the renderers again with Guarani speakers for more examples, and gather feedback
It would be crucial that the end result would allow Guarani speakers to easily mark up issues (as GrounderUK points out), so that these can be addressed. But this is one possible workflow that would remove the need to have deep linguistic and coding expertise available in each language community, and could spread the workload in a way that helps fill that gap.
This workflow would be built on top of Abstract Wikipedia and would not require many changes to the core work. Does this sound reasonable or entirely crazy?
One even crazier idea, well beyond what I hope for, is that we will find out that the renderers across languages are in many areas rather uniform, that we can actually share a lot of the renderer code across languages, and that languages are basically defined through a small number of parameters. There are some linguists who believe such things possible. But I don't dare bet on it. --DVrandecic (WMF) (talk) 21:50, 28 August 2020 (UTC)[]

The "Grammatical Framework" grammatical framework

Denny mentioned Grammatical Framework and I took a look at it. I think it is complex enough to represent most grammatical phenomena, but I don’t see very much that is actually developed. The examples and downloadable samples I see are all small toy systems. It isn’t 100% clear to me that any grammatical framework can actually capture all grammatical phenomena—and with certain kinds of edge cases and variation in dialects, it may be a lost cause—and linguists still argue over the right way to represent phenomena in major languages in major grammatical frameworks. Anyway, it looks like the Grammatical Framework still leaves a lot of grammar development to be done; assuming it’s possible (which I’m willing to assume), it doesn’t seem easy or quick, especially as we get away from major world languages.

Yes. I keep meaning to take another look but, basically, GF itself is not something I would expect many people to enjoy using for very long (I may become an exception) ...and "very long" certainly seems to be required. I'm usually pretty open-minded when it comes to possible solutions but I've not added GF to my list of things that might work. That's not to say that there can be no place for it in our solution landscape; it's just that we do need to keep focusing on the user experience, especially for the user who knows the under-served language but has no programming experience and no formal training in grammar. I can hardly remember what that feels like (ignoring the fact that my first language is hardly under-served)! Apart from committing to engaging with such stakeholders, it's not clear what more we can usefully do, at this stage, when it comes to evaluating alternative solutions. That said, I am 99.9% sure that we can find a somewhat satisfactory solution for a lot of encyclopedic content in a large number of languages; the principal constraint will always be the availability of willing and linguistically competent contributor communities.
One thing GF has going for it is that it is intentionally multilingual. As I understand it, our current plan is to begin natural-language generation with English renderers. I'm hoping we'll change our minds about that. Sooner rather than later, in any event, I would hope that renderer development would be trilingual (as a minimum). Some proportion of renderer functions may be monolingual, but I would like to see those limited to the more idiosyncratic aspects of the language. Or, perhaps, if there are good enough reasons to develop a function with only a single language in mind, we should also consider developing "equivalent provision" in our other current languages. What that means in practice is that the test inputs into our monolingual functions must also produce satisfactory results for the other languages, whether by using pre-existing functions or functions developed concurrently (more or less). --GrounderUK (talk) 13:02, 26 August 2020 (UTC)[]
I agree, we shouldn't start development based on a single language, and particularly not English. Starting in three languages, ideally across at least two main families, sounds like a good idea. --DVrandecic (WMF) (talk) 21:55, 28 August 2020 (UTC)[]
@DVrandecic (WMF): That's very good to hear! Is that an actual change of our collective minds, or just your own second thoughts? P2.7 doesn't seem to have a flexible interpretation. In my reading of P2.8 to P2.10, there also appears to be a dependency on P2.7 being fairly advanced, but maybe that's just my over-interpretation.--GrounderUK (talk) 07:52, 1 September 2020 (UTC)[]
It is true that a lot of development would be required for GF, but even more development would be required for any other framework. GF being open source, any development would flow back into the general public, too, so working with GF and its developers is likely the correct solution.
The GF community is interested in helping out. See https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/eEyLYmfmCQAJ , where Aarne Ranta suggests a case study for some area with adequate data, like mathematics. —The preceding unsigned comment was added by Inariksit (talk) 18:44, 28 August 2020
I am a big fan of reusing as much of GF as possible, but my reference to GF was not to mean that we necessarily have to use it as is, but rather that it shows that such a project is at all possible. We should feel free to either follow GF, or to divert from it, or to use it as an inspiration - but what it does show is that the task we are aiming for has been achieved by others in some form.
Having said that, I am very happy to see Aarne react positively to the idea! That might be the beginning of a beautiful collaboration. --DVrandecic (WMF) (talk) 21:54, 28 August 2020 (UTC)[]

Incompatible grammatical frameworks across language documentation

I worry that certain grammatical traditions associated with given languages may make it difficult to work compatibly across languages. A lot of Western European languages have traditionally followed the grammatical model of Latin, even when it doesn’t make sense—though there are of course many grammatical frameworks for the major languages. But it’s easy to imagine that the best grammar for a given medium-sized language was written by an old-fashioned European grammarian, based on a popular grammatical model from the 1970s. Reconciling that with whatever framework has been developed up to that point may create a mess.

Speaking as an old-fashioned European grammarian... "a mess" is inevitable! It should be as messy as necessary but no messier (well, not much messier). I'm not sure that there's much "reconciling" involved, however. Given our framing of the problem, I don't see how "our mess" can be anything other than "interlingual" (as Grammatical Framework is). This is why I would prefer not to start with English; the first few languages will (inevitably?) colour and constrain our interlingua. So we need to be very careful, here. To set against that is our existing language-neutral content in Wikidata. Others must judge whether Wikidata is already "too European", but we must take care that we do not begin by constructing a "more English" representation of Wikidata content, or coerce it into a "more interlingual" representation, where the interlingua is linguistically more Eurocentric than global. So, first we must act to counter first-mover advantage and pre-existing bias, which means making things harder for ourselves, initially. At the same time, all language communities can be involved in refining our evolving language-neutral content, which will be multi-lingually labelized (like Wikidata). If some labelized content seems alien in some language, this can be flagged at an early stage (beginning now, for Wikidata). What this means is that all supported languages can already be reconciled, to an extent, with our foundational interlingua (Wikidata), and any extensions we come up with can also be viewed through our multi-lingual labelization. I suppose this is a primitive version of the "intelligible" content I mentioned earlier. When it comes to adding a further language (one that we currently support with labelization in Wikidata), we may hope to find that we are already somewhat reconciled, because linguistic choices have already been made in the labelization and our new target consumers can say what they dislike and what they might prefer; they do not need to consult some dusty grammar tome. In practice, they will already have given their opinions because that is how we will know that they are ready to get started (that is, theirs is a "willing and linguistically competent" community). In the end, though, we have to accept that the previous interlingual consensus will not always work and cannot always be fixed. This is when we fall back on the "interlingual fork" (sounds painful!). That just means adding an alternative language-neutral representation of the same (type of) encyclopedic content. I say "just" even though it might require rather more effort than I imagine (trust me, it will!) because it does seem temptingly straightforward. I say we must resist the temptation, but not stubbornly; a fallback is a tactical withdrawal, not a defeat; it is messy, but not too messy.--GrounderUK (talk) 12:54, 27 August 2020 (UTC)[]
Agree with GrounderUK here - I guess that we will have some successes and some dead ends during the implementation of certain languages, and that this is unavoidable. And that might partially stem from the state of the art in linguistic research in some languages.
On the other hand, my understanding is that we won't be aiming for 7000 languages but for 400, and that these 400 languages will in general be better described and have more current research work about them than the other 6600. So I have a certain hope that for most languages that we are interested in we do have more current and modern approaches that are more respectful of the language itself.
And finally, the grammarian's lens of the 1970s Europeans will probably not matter that much in the end anyway - if we have access to a large enough number of contributors native in the given language. Their voice should be much more important in shaping the renderers of a language than dated grammar books. --DVrandecic (WMF) (talk) 22:01, 28 August 2020 (UTC)[]

Introducing new features into the framework needed by a newly-added language

In one of Denny's papers (PDF) on Abstract Wikipedia, he discusses (Section “5 Particular challenges”, p.8) how different languages make different distinctions that complicate translation—in particular needing to know whether an uncle is maternal or paternal (in Uzbek), and whether or not a river flows to the ocean (in French). I am concerned about some of the implications of this kind of thing.

Recoding previously known information with new features

One problem I see is that when you add a new language with an unaccounted-for feature, you may have to go back and recode large swathes of the information in the Abstract Wikipedia, both at the fact level, and possibly at some level of the common renderer infrastructure.

Suppose we didn’t know about the Uzbek uncle situation and we add Uzbek. Suddenly, we have to code maternal/paternal lineage on every instance of uncle everywhere. Finding volunteers to do the new uncle encoding seems like it could be difficult. In some cases the info will be unknown and either require research, or it is simply unknowable. In other cases, if it hasn’t been encoded yet, you could get syntactically correct but semantically ill-formed constructions.

@TJones (WMF): Yes, that is correct, but note that going back and recoding won't actually have an effect on the existing text.
So assume we have an abstract corpus that uses the "uncle" construct, and that renders fine in all languages supported at that time. Now we add Uzbek, and we need to refine the "uncle" construct into either "maternal-uncle" or "paternal-uncle" in order to render appropriately in Uzbek - but both these constructs would be (basically) implemented as "use the previous uncle construct unless Uzbek". So all existing text in all supported languages would continue to be fine.
Now when we render Uzbek, though, the corpus needs to be retrofitted. But that merely blocks the parts of Uzbek renderings that deal with the construct "uncle". It has no impact on other languages. And since Uzbek didn't have any text so far (that's why we are discovering this issue now), it also won't reduce the amount of generated Uzbek text.
So, yes, you are completely right, but we still have a monotonically growing generated text corpus. And as we backfill the corpus, more and more of the text becomes available in Uzbek as well - but there would be no losses on the way. --DVrandecic (WMF) (talk) 22:09, 28 August 2020 (UTC)[]
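To make the fallback mechanism concrete, here is a minimal Python sketch (purely illustrative: the function names, lexicon and Uzbek forms are assumptions for this example, not actual Wikifunctions code). The refined construct delegates to the older "uncle" construct for every language that does not need the finer distinction, so existing renderings stay untouched:
  # Illustrative sketch only: constructor refinement with a per-language fallback.
  LEXICON = {
      "en": {"uncle": "uncle"},
      "de": {"uncle": "Onkel"},
      "uz": {"paternal-uncle": "amaki", "maternal-uncle": "tog'a"},  # assumed forms
  }

  def render_uncle(lang):
      """The original, coarse-grained construct."""
      word = LEXICON.get(lang, {}).get("uncle")
      if word is None:
          raise LookupError("no rendering for 'uncle' in " + lang)
      return word

  def render_paternal_uncle(lang):
      """The refined construct: only Uzbek needs the finer distinction."""
      if lang == "uz":
          return LEXICON["uz"]["paternal-uncle"]
      # Every previously supported language keeps using the old construct,
      # so refining the abstract content cannot break existing renderings.
      return render_uncle(lang)

  print(render_paternal_uncle("de"))  # -> Onkel (unchanged behaviour)
  print(render_paternal_uncle("uz"))  # -> amaki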

Semantically ill-formed defaults

For example, suppose John is famous enough to be in Abstract Wikipedia. It is known that John’s mother is Mary, and that Mary’s brother is Peter, hence Peter is John’s uncle. However, the connection from John to Peter isn’t specifically encoded yet, and we assume patrilineal links by default. We could then generate a sentence like “John’s paternal uncle Peter (Mary’s brother) gave John $100.” Alternatively, if you try to compute some of these values, you are building an inference engine and a) you don’t want to do that, b) you really don’t want to accidentally do it by building ad hoc rules or bots or whatever, c) it’s a huge undertaking, and d) did I mention that you don’t want to do that?

Agreed. --DVrandecic (WMF) (talk) 22:10, 28 August 2020 (UTC)[]

Incompatible cultural "facts"

In the river example, I can imagine further complications because of cultural facts, which may require language-specific facts to be associated with entities. Romanian makes the same distinction as French rivière/fleuve with râu/fluviu. Suppose, for example, that River A has two tributaries, River B and River C. For historical reasons, the French think of all three as separate rivers, while the Romanians consider A and B to be the same river, with C as tributary. It’s more than just overlapping labels—which is complex enough. In French, B is a rivière because it flows into A, which is a fleuve. In Romanian, A and B are the same thing, and hence a fluviu. So, a town on the banks of River B is situated on a rivière in French, and a fluviu in Romanian, even though the languages make the same distinctions, because they carve up the entity space differently.

Yes, that is an interesting case. Similar situations happen with the usage of articles in names of countries and cities, and with whether you refer to a region as a country, a state, a nation, etc., which may differ from language to language. For these fun examples I could imagine that we have information in Wikidata connecting the Q IDs for A and B to the respective L items. But that is a bit more complex indeed.
Fortunately, these cultural differences happen particularly on topics that are important for a given language community. Which increases the chance that the given language community already has an article about this topic in its own language, and thus will not depend on Abstract Wikipedia to provide baseline content. That's the nice thing: we are not planning to replace the existing content, but only to fill in the currently existing gaps. And these gaps will, more likely than not, cover topics that will not have these strong linguistic-cultural interplays. --DVrandecic (WMF) (talk) 22:15, 28 August 2020 (UTC)[]
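To picture the Q-to-L link mentioned above, here is a hedged Python sketch (all identifiers are invented; this is not a real Wikidata or Wikifunctions API). Each language maps the item for river B to its own lexeme, so French picks a separate rivière-type lexeme for B while Romanian maps B to the same fluviu lexeme as A:
  # Illustrative sketch: language-specific mapping from items (Q IDs) to lexemes (L IDs).
  ITEM_TO_LEXEME = {
      # French treats A, B and C as three rivers; only A flows to the sea.
      "fr": {"Q_riverA": "L_fleuve_A", "Q_riverB": "L_riviere_B", "Q_riverC": "L_riviere_C"},
      # Romanian treats A and B as one river, so both items point to the same lexeme.
      "ro": {"Q_riverA": "L_fluviu_AB", "Q_riverB": "L_fluviu_AB", "Q_riverC": "L_rau_C"},
  }

  def lexeme_for(item_id, lang):
      """Pick the lexeme a renderer should use for this item in this language."""
      return ITEM_TO_LEXEME[lang].get(item_id)

  print(lexeme_for("Q_riverB", "fr"))  # -> L_riviere_B (a riviere)
  print(lexeme_for("Q_riverB", "ro"))  # -> L_fluviu_AB (the same fluviu as river A)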

Having to code information you don't really understand

In both of the uncle and river cases, an English speaker trying to encode information is either going to be creating holes in the information store, or they are going to be asked to specify information they may not understand or have ready access to.

A far-out case that we probably won’t actually have to deal with is still illustrative. The Australian Aboriginal language Guugu Yimithirr doesn’t have relative directions like left and right. Instead, everything is in absolute geographic terms; i.e., “there is an ant on your northwest knee”, or “she is standing to the east-northeast of you.” In addition to requiring all sorts of additional coding of information (“In this image, Fred is standing to the left/south-southwest of Jane”) depending on the implementation details of the encoding and the renderers, it may require changing how information is passed along in various low-level rendering functions. Obviously, it makes sense to make data/annotation pipelines as flexible as possible. (And again, the necessary information may not be known because it isn’t culturally relevant—e.g., in an old photo from the 1928 Olympics, does anyone know which way North is?)

@TJones (WMF): Yes, indeed, we have to be able to deal with holes. My assumption is that when a contributor creates a first draft of an abstract article, their main concern will be the languages they speak. The UI may nudge them to fill in further holes, but it shouldn't require it. And the contributor can save the content now and reap the benefit for all languages where the abstract content has all the necessary information.
Now if there is a language that requires some information that is not available in the abstract content, where there is such a hole, then the rendering of this sentence for this language will fail.
We should have workflows that find all the holes for a given language; contributors can then go through those and try to fill them, thereby increasing the amount of content that gets rendered in their language - something that might be amenable to micro-contributions (or not, depending on the case). But all of this will be a gradual, iterative process. --DVrandecic (WMF) (talk) 22:21, 28 August 2020 (UTC)[]
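A minimal Python sketch of such a hole-finding workflow (the constructor and field names, and the required-field lists, are invented for illustration): missing information blocks rendering only for the language that needs it, and the same check yields a to-do list of holes for contributors:
  # Illustrative sketch: per-language "holes" that block only that language's rendering.
  UNCLE_CONTENT = {"constructor": "uncle", "person": "Peter", "lineage": None}

  REQUIRED_FIELDS = {
      "uz": {"uncle": ["person", "lineage"]},  # Uzbek needs to know the lineage
      "en": {"uncle": ["person"]},             # English does not
  }

  def find_holes(content, lang):
      """Return the fields this language still needs before it can render."""
      needed = REQUIRED_FIELDS.get(lang, {}).get(content["constructor"], [])
      return [field for field in needed if content.get(field) is None]

  def render_or_skip(content, lang):
      holes = find_holes(content, lang)
      if holes:
          return None, holes  # rendering fails only for this language
      return "rendered in " + lang, []

  print(render_or_skip(UNCLE_CONTENT, "en"))  # -> ('rendered in en', [])
  print(render_or_skip(UNCLE_CONTENT, "uz"))  # -> (None, ['lineage'])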

Impact on grammar and renderers

Other examples that are more likely but less dramatic are clusivity, evidentiality, and ergativity, which are more common. If these features also require agreement in verbs, adjectives, pronouns, etc., the relevant features will have to be passed along to all the right places in the sentence being generated. Some language features seem pretty crazy if you don't know those languages—like Salishan nounlessness and Guarani noun tenses—and may make it necessary to radically rethink the organization of information in statements and how information is transmitted through renderers.

Yes, that is something where I look to Grammatical Framework for inspiration, as it solves these cases pretty neatly by having an abstract grammar and several concrete grammars; the passing from one to the other still allows for the different flows of agreement. --DVrandecic (WMF) (talk) 22:22, 28 August 2020 (UTC)[]

Grammatical concepts that are similar, but not the same

I’m also concerned about grammatical concepts that are similar across languages but differ in the details. English and French grammatical concepts often map to each other more-or-less, but the details where they disagree cause consistent errors in non-native speakers. For example, French says “Je travaille ici depuis trois ans” in the present tense, while English uses (arguably illogically, but that’s language!) the perfect progressive “I have been working here for three years”. Learners in both directions tend to do direct translations because those particular tenses and aspects usually line up.

Depending on implementation, I can see it being either very complex to represent this kind of thing or the representation needing to be language-specific (which could be a disaster—or at least a big mess). Neither a monolingual English speaker nor a monolingual French speaker will be able to tag the tense and aspect of this information in a way that allows it to be rendered correctly in both languages. Similarly, the subjunctive in Spanish and French do not cover the same use cases, and as an English speaker who has studied both I still have approximately zero chance of reliably encoding either correctly, much less both at the same time—though it’s unclear, unlike uncle, where such information should be encoded. If it’s all in the renderers, maybe it’ll be okay—but it seems that some information will have to be encoded at the statement level.

The hope would be that the exact tense and mood would indeed not be encoded in the abstract representation at all, but added to it by the individual concrete renderers. So the abstract representation of the example might be something like works_at(1stPSg, here, years(3)), and it would be up to the renderers to render that using either the present or the perfect progressive. The abstract representation would need to abstract from the language-specificities. These high-level abstract representations would probably break down first into lower level concrete representations, such as French clause(1stPSg, work, here, depuis(years(3)), present tense, positive) or English clause(1stPSg, work, here, for(years(3)), perfect progressive, positive), so that we have several layers from the abstract representation slowly working its way down to a string with text in a given language. --DVrandecic (WMF) (talk) 22:30, 28 August 2020 (UTC)[]
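A toy Python sketch of that layering, under the assumption that the abstract content carries no tense at all (the function names are invented; the real constructors and renderers would look different). Each concrete renderer first lowers the abstract content into a language-specific clause, where the tense is chosen, and only then into a string:
  # Illustrative sketch: abstract content -> language-specific clause -> string.
  abstract = ("works_at", {"subject": "1sg", "place": "here", "duration_years": 3})

  def lower_fr(content):
      # French chooses the present tense at this layer.
      _, args = content
      return {"tense": "present", "adverbial": "depuis " + str(args["duration_years"]) + " ans"}

  def lower_en(content):
      # English chooses the present perfect progressive at this layer.
      _, args = content
      return {"tense": "perfect progressive", "adverbial": "for " + str(args["duration_years"]) + " years"}

  def realize_fr(clause):
      return "Je travaille ici " + clause["adverbial"]  # tense already fixed when lowering

  def realize_en(clause):
      return "I have been working here " + clause["adverbial"]

  print(realize_fr(lower_fr(abstract)))  # Je travaille ici depuis 3 ans
  print(realize_en(lower_en(abstract)))  # I have been working here for 3 years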

“Untranslatable” concepts and fluent rendering

Another random thought I had concerns “untranslatable” concepts, one of the most famous being Portuguese saudade. I don’t think anything is actually untranslatable, but saudade carries a lot more inherent cultural associations than, say, a fairly concrete concept like Swedish mångata. Which sense/translation of saudade is best to use in English—nostalgia, melancholy, longing, etc.—is not something a random Portuguese speaker is going to be able to encode when they try to represent saudade. On the other hand, if saudade is not in the Abstract Wikipedia lexicon, Portuguese speakers may wonder why; it’s not super common, but it’s not super rare, either—they use it on Portuguese Wikipedia in the article about Facebook, for example.

Another cultural consideration is when to translate/paraphrase a concept and when to link it (or both)—saudade, rickroll, lolcats. Dealing with that seems far off, but still complicated, since the answer is language-specific, and may also depend on whether a link target exists in the language (either in Abstract Wikipedia or in the language’s “regular” Wikipedia).

Yes! And that's exactly how we deal with it! So the saudade construct might be represented as a single word in Portuguese, but as a paraphrase in other languages. There is no requirement that each language has to have each clause built out of components of the same kind. The same thing would allow us to handle some sentence qualifiers as adjectives or adverbs if a language can do that (say, if there is an adjective stating that something happened yesterday evening), and use a temporal phrase otherwise. The abstract content can break apart into very different concrete renderings. --DVrandecic (WMF) (talk) 22:35, 28 August 2020 (UTC)[]
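A tiny Python sketch of that flexibility (illustrative only; the English paraphrase is just one possible choice): the same abstract construct comes out as a single lexeme in one language and as a paraphrase in another:
  # Illustrative sketch: one construct, different rendering strategies per language.
  def render_saudade(lang):
      if lang == "pt":
          return "saudade"                    # a single lexeme
      if lang == "en":
          return "a deep, nostalgic longing"  # a paraphrase instead of one word
      raise LookupError("no rendering strategy for " + lang)

  print(render_saudade("pt"))
  print(render_saudade("en"))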

Discourse-level encoding?

I’m also unclear how things are going to be represented at a discourse level. Biographies seem to be more schematic than random concrete nouns or historical events, but even then there are very different types of details to represent about different biographical subjects. Or is every page going to basically be a schema of information that gets instantiated into a language? Will there be templates like BiographyIntro(John Smith, 1/1/1900, Paris, 12/31/1980, Berlin, ...) ? I don’t know whether that is a brilliant idea or a disaster in the making.

It's probably neither. As for bio-intros specifically: that's what, IIRC, Italian Wikipedia is already doing, in order to increase the uniformity of their biographies. But let's look at other examples: I expect that for tail entities that will be the default representation, a more or less large template that takes some data from Wikidata and generates a short text. That is similar to the way LSJbot works for a large number of articles - but we're removing the bus factor from LSJbot and putting it into a collaboratively controllable space.
Now, I would be rather disappointed if things stopped there: when we go beyond tail entities, I hope that we will have much more individually crafted abstract content, where the simple template is replaced by diverse, more specific constructors, and only where these are not available for rendering in a given language do we fall back to the simple template. And these would indeed need to also represent discourse-level conjunctions such as "however", "in the meantime", "in contrast to", etc. --DVrandecic (WMF) (talk) 22:40, 28 August 2020 (UTC)[]
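As a rough illustration of the "simple template" default for tail entities, here is a hedged Python sketch (the field names and the English wording are invented for the example; a real renderer would go through proper constructors and lexemes rather than one format string):
  # Illustrative sketch: a BiographyIntro-like template filled from Wikidata-style fields.
  bio = {
      "name": "John Smith",
      "birth_year": 1900, "birth_place": "Paris",
      "death_year": 1980, "death_place": "Berlin",
      "occupation": "painter",
  }

  def biography_intro_en(b):
      return ("{name} ({birth_year}-{death_year}) was a {occupation} born in "
              "{birth_place} who died in {death_place}.".format(**b))

  print(biography_intro_en(bio))
  # John Smith (1900-1980) was a painter born in Paris who died in Berlin.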

Volume, Ontologies, and Learning Curves, Oh My!

Finally, I worry about the volume of information to encode (though Wikipedia has conquered that), the difficulty of getting the ontology right (which Wikidata has done well, though on a less ambitious scale than Wikilambda requires, I think), and the learning curve editors will face with the more complex grammatical representations.

I (and others) like to say that Wikipedia shouldn't work in theory, but in practice it does—so maybe these worries are overblown, or maybe worrying about them is how they get taken care of.

Trey Jones (WMF) (talk) 21:59, 24 August 2020 (UTC)[]

I don't think the worries you stated are overblown - these are all well-grounded worries; for some of them I had specific answers, and for some I am fishing for hope that we will be able to overcome them when we get to them. One advantage is that I don't have to have all the answers yet, but that I can rely on the ingenuity of the community to get some of these blockers out of the way, as long as we create an environment and a project that attract enough good people.
And this also means that the learning curve you are describing here is one of my main worries. The goal is to design a system that allows contributors who want to do so to dive deep into the necessary linguistic theories and into the necessary computer science background to really dig through the depths of Wikilambda and Abstract Wikipedia - and I expect to rely on these contributors sooner rather than later. But at the same time it must remain possible to contribute effectively if you don't do that, or else the project will fail. We must provide contribution channels where confirming or correcting lexicographic forms works without the contributor having to fully understand all the other parts, and where abstract content can be created and maintained without having a degree in computational linguistics. I still aim for a system where, in case of an event (say, a celebrity marries, an author publishes a new book, or a new mayor gets elected), an entirely fresh contributor can figure out in less than five minutes how to actually make the change on Abstract Wikipedia. This will be a major challenge, not so much because I think the individual user experiences for those things will be incredibly hard, but rather because it is unclear how to steer the contributor to the appropriate UX.
But yes, the potential learning curve is a major obstacle, and we need to address that effectively, or we will fall short of the potential that this project has. --DVrandecic (WMF) (talk) 22:50, 28 August 2020 (UTC)[]

Comments

@All: @TJones (WMF):

These are all serious issues that need to be addressed, which is why I proposed a more rigorous development part for "producing a broad spectrum of ideas and alternatives from which a program for natural language generation from abstract descriptions can be selected" at https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Tasks#Development_Part_PP2_Suggestion.

With respect to your first three sections, I assumed Denny only referred to Grammatical Framework as a system with similar goals to Abstract Wikipedia. I also assumed that the idea was that only those grammatical phenomena needed to somewhat sensibly express the abstract content in a target language need to be captured, and that hopefully generating those phenomena will require much less depth and sophistication than constructing a grammar.

With respect to the section "Recoding previously known information with new features," the system needs to be typologically responsible from the outset. I think we should be aware of the features of all of the languages we are going to want to eventually support.

In your example, I think coding "maternal/paternal lineage on every instance of uncle everywhere" would be great, but not required. A natural language generation system will not (soon) be perfect.

With respect to the section "Semantically ill-formed defaults": in other words, what should we do when we want to make use of contextual neutralization (e.g., using paternal uncle in a neutral context where the paternal/maternal status of the uncle is unknown) but we cannot ensure the neutral context?[1] I would argue that there are certain forms that we should prefer because they appear less odd when the neutral context is supposedly violated than others (e.g., they in The man was outside, but they were not wearing a coat). There is also an alternative in circumlocution: for example, we could reduce specificity and use relative instead of paternal or maternal uncle and hope that any awkward assumptions in the text of a specifically uncle-nephew/niece relationship are limited.

Unfortunately, it does seem that to get a really great system, you need a useful inference engine ...

With respect to the section "Incompatible cultural 'facts'," this is the central issue to me, but I would rely less on the notion of "cultural facts." We are going to make a set of entity distinctions in the chosen set of Abstract Wikipedia articles, but the generated natural language for these articles needs to respect how the supported languages construe conceptual space (speaking in cognitive linguistics terms for a moment).[2] I am wondering if there is a typologically responsible way of ordering these construals (perhaps as a "a lattice-like structure of hierarchically organized, typologically motivated categories"[3]) that could be helpful to this project.

With respect to the section "Having to code information you don't really understand," as above, I think there are sensible ways in which we can handle "holes in the information store." For example, in describing an old photo from the 1928 Olympics, can we set North to an arbitrary direction, and what are the consequences if there is context that implies a different North?

With respect to the section "Discourse-level encoding?," I do hope we can simulate the "art" of writing a Wikipedia article to at least some degree.

  1. Haspelmath, Martin. "Against markedness (and what to replace it with)." Journal of linguistics (2006): 39.; Croft, William. Typology and universals, 100-101. Cambridge University Press, 2002.
  2. Croft, William, and Esther J. Wood. "Construal operations in linguistics and artificial intelligence." Meaning and cognition: A multidisciplinary approach 2 (2000): 51.
  3. Van Gysel, Jens EL, Meagan Vigus, Pavlina Kalm, Sook-kyung Lee, Michael Regan, and William Croft. "Cross-lingual semantic annotation: Reconciling the language-specific and the universal." ACL 2019 (2019): 1.

--Chris.Cooley (talk) 11:45, 25 August 2020 (UTC)[]

Thanks Chris, and thanks for starting that page! I think this will indeed be an invaluable resource for the community as we get to these questions. And yes, as you said, I referred to GF as a similar project and as an inspiration and as a demonstration of what is possible, not necessarily as the actual implementation to use. Also, you will find that my answers above don't always fully align with what you said, but I think that they are in general compatible. --DVrandecic (WMF) (talk) 22:56, 28 August 2020 (UTC)[]
@DVrandecic (WMF): Thanks, and I apologize for sounding like a broken record with respect to a part PP2! With respect to the above, it does sound like we disagree on some things, but I am sure there will be time to get into them later! --Chris.Cooley (talk) 00:29, 29 August 2020 (UTC)[]
Some of these issues are I think avoided by the restricted language domain we are aiming for - encyclopedic content. There should be relatively limited need for constructions in the present or future tenses. No need to worry about the current physical location or orientation of people, etc. If something is unknown then a fall-back construction ("brother of a parent" rather than a specific sort of "uncle") should be fine, if possibly inelegant. We don't need to capture every possible aspect of language, just sufficient to convey the meanings intended. ArthurPSmith (talk) 18:56, 25 August 2020 (UTC)[]
Thanks Arthur! Yes, I very much appreciate the pragmatic approach here - and I think a combination of the pragmatic getting things done with the aspirational wanting everything to be perfect will lead to the best tension to get this project catapulted forward! --DVrandecic (WMF) (talk) 22:56, 28 August 2020 (UTC)[]
@ArthurPSmith: "the restricted language domain we are aiming for - encyclopedic content" I wish this was so, but I am having trouble thinking of an encyclopedic description of a historical event — for example — as significantly restricted. I could easily see the physical location or orientation of people becoming relevant in such a description. "If something is unknown then a fall-back construction ('brother of a parent' rather than a specific sort of 'uncle') should be fine, if possibly inelegant. We don't need to capture every possible aspect of language, just sufficient to convey the meanings intended." I totally agree, and I think this could be another great fallback alternative. --Chris.Cooley (talk) 00:29, 29 August 2020 (UTC)[]

Translatable modules

Hi,

This is not exactly about Abstract Wikipedia, but it's quite closely related, so it may interest the people who care about Abstract Wikipedia.

The Wikimedia Foundation's Language team started the Translatable modules initiative. Its goal is to find a good way to localize Lua modules as conveniently as it is done for MediaWiki extensions.

This project is related to task #2 in Abstract Wikipedia/Tasks, "A cross-wiki repository to share templates and modules between the WMF projects". The relationship to Abstract Wikipedia is described in much more detail on the page mw:Translatable modules/Engineering considerations.

Everybody's feedback about this project is welcome. In particular, if you are a developer of templates or Lua (Scribunto) modules, your user experience will definitely be affected sooner or later, so make your voice heard! The current consultation stage will go on until the end of September 2020.

Thank you! --Amir E. Aharoni (WMF) (talk) 12:45, 6 September 2020 (UTC)[]

@Aaharoni-WMF: Thanks, that is interesting. I do wonder whether there is more overlap in Phase 1 than your link suggests. Although it is probably correct to say that the internationalization of ZObjects will be centralized initially, there was some uncertainty about which multi-lingual labelization and documentation solution should be pursued. Phab:T258953 is now resolved but I don't understand it well enough to see whether it aligns with any of your solution options. As for phase 2, I simply encourage anyone with an interest in language-neutral encyclopedic content to take a look at the link you provided and the associated discussion. Thanks again.--GrounderUK (talk) 19:43, 7 September 2020 (UTC)[]

Response from the Grammatical Framework community

The Grammatical Framework (GF) community has been following the development of Abstract Wikipedia with great interest. This summary is based on a thread at GF mailing list and my (inariksit) personal additions.

Resources

GF has a Resource Grammar Library (RGL) for 40 or so languages, and 14 of them have large-scale lexical resources and extensions for wide-coverage parsing. The company Digital Grammars (my employer) has been using GF in commercial applications since 2014.

To quote GF's inventor Aarne Ranta on the previously linked thread:

My suggestion would have a few items:

  • that we develop a high-level API for the purpose, as done in many other NLG projects
  • that we make a case study on an area or some areas where there is adequate data. For instance from OpenMath
  • that we propagate this as a community challenge
  • Digital Grammars can sponsor this with some tools, since we have gained experience from some larger-scale NLG projects

Morphological resources from Wiktionary inflection tables

With the work of Forsberg, Hulden, and Kankainen, it's possible to extract GF resources from Wiktionary inflection tables.

Quoting Kristian Kankainen's message:

Since the Wiktionary is a popular place for inflection tables, these could be used for boot-strapping GF resources for those languages. Moreover, but not related to GF nor Abstract Wikipedia, the master's thesis generates also FST code and integrates the language into the Giella platform which provides an automatically derived simple spell-checker for the language contained in the inflection tables. Coupling or "boot-strapping" the GF development using available data on Wiktionary could be seen as a nice touch and would maybe be seen as a positive inter-coupling of different Wikimedia projects.

Division of labour

Personally, I would love to spend the next couple of years reading grammar books and encoding basic morphological and syntactic structures of languages like Guarani or Greenlandic into the GF RGL. With those in place, a much wider audience can write application grammars, using the RGL via a high-level API.

Of course, for this to be a viable solution, more people than just me need to join in. I believe that if the GF people know that their grammars will be used, the motivation to write them is much higher. To kickstart the resource and API creation, we could make Abstract Wikipedia a special theme of the next GF summer school, whenever that is organised (live or virtually).

Addressing some concerns from this talk page

Some of the concerns on the talk page are definitely valid.

  • It is going to take a lot of time. Developing the GF Resource Grammar Library has taken 20 calendar years and (at least) 20 person years. I think everyone who has a say in the choice of renderer implementation should get familiar with the field---check out other grammar formalisms, like HPSG, and you'll see similar coverage and timelines to GF[1].
  • The "Uzbek uncle" situation happens often with GF grammars when adding a new language or new concepts. Since this happens often, we are prepared for it. There are constructions in the GF language and module system that make dealing with this manageable.
  • "Incompatible cultural facts" is a minefield of its own, far beyond the scope of NLG. I personally think we should start with a case study for a limited domain.

On the other hand, worrying about things like ergativity or when to use subjunctive tells me that the commenters haven't understood just how abstract an abstract syntax can be. To illustrate this, let me quote the GF Best Practices document on page 9:

Linguistic knowledge. Even the most trivial natural language grammars involve expert linguistic knowledge. In the current example, we have, for instance, word inflection and gender agreement shown in French: le bar est ouvert (“the bar is open”, masculine) vs. la gare est ouverte (“the station is open”, feminine). As Step 3 in Figure 3 shows, the change of the noun (bar to gare) causes an automatic change of the definite article (le to la) and the adjective (ouvert to ouverte). Yet there is no place in the grammar code (Figure 2) that says anything about gender or agreement, and no occurrence of the words la, le, ouverte! The reason is that such linguistic details are inherited from a library, the GF Resource Grammar Library (RGL). The RGL guarantees that application programmers can write their grammars on a high level of abstraction, and with a confidence of getting the linguistic details automatically right.

Language differences. The RGL takes care of the rendering of linguistic structures in different languages. [--] The renderings are different in different languages, so that e.g. the French definition of the constant the_Det produces a word whose form depends on the noun, whereas Finnish produces no article word at all. These variations, which are determined by the grammar of each language, are automatically created by the RGL. However, the example also shows another kind of variation: English and French use adjectives to express “open” and “closed”, whereas Finnish uses adverbs. This variation is chosen by the grammarian, by picking different RGL types and categories for the same abstract syntax concepts.

Obviously the GF RGL is far from covering all possible things people might want to say in a wikipedia article. But an incomplete tool that covers the most common use cases, or covers a single domain well, is still very useful.
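This is not GF code, but a toy Python analogy of the division of labour the quoted passage describes (all data and names are invented): the "application" level below never mentions gender, articles or agreement; a small lexicon layer, standing in for the RGL, takes care of them:
  # Illustrative analogy only: agreement handled by a shared lexicon layer.
  LEXICON_FR = {
      "bar":     {"form": "bar",  "gender": "m"},
      "station": {"form": "gare", "gender": "f"},
      "open":    {"m": "ouvert", "f": "ouverte"},
  }

  def np_fr(noun_key):
      entry = LEXICON_FR[noun_key]
      article = "le" if entry["gender"] == "m" else "la"
      return article + " " + entry["form"], entry["gender"]

  def is_state_fr(noun_key, adjective_key):
      noun_phrase, gender = np_fr(noun_key)
      return noun_phrase + " est " + LEXICON_FR[adjective_key][gender]

  # "Application" level: no mention of gender, articles or agreement.
  print(is_state_fr("bar", "open"))      # le bar est ouvert
  print(is_state_fr("station", "open"))  # la gare est ouverte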

Non-European and underrepresented languages

Regarding discussions such as https://lists.wikimedia.org/pipermail/wikimedia-l/2020-August/095399.html, I'm happy to see that you are interested in underrepresented languages. The GF community has members in South Africa, Uganda and Kenya, doing or having previously done work on Bantu languages. At the moment (September 2020), there is ongoing development in Zulu, Xhosa and Northern Sotho.

This grammar work has been used in a healthcare application, and you can find a link to a paper describing the application in this message.

If any of this sounds interesting to you, we can start a direct dialogue with the people involved.

Concluding words

Whatever system Abstract Wikipedia chooses, it will follow the evolutionary path of GF (and no doubt of other similar systems), so it's better to learn from that path regardless of whether GF is chosen or not. We are willing to help, whether it's actual GF programming or sharing experiences---what we have tried, what has worked, what hasn't.

On behalf of the GF community,

inariksit (talk) 08:19, 7 September 2020 (UTC)[]

Responses

I second what inariksit says about the interest from the GF community - if GF were to be used by AW, it would give a great extra motivation for writing resource grammars, and it would also benefit the GF community by giving the opportunity to test and find remaining bugs in the grammars for smaller languages.

Skarpsill (talk) 08:34, 7 September 2020 (UTC)[]

Thanks for this summary! Nemo 09:10, 13 September 2020 (UTC)[]

@Inariksit:, @Skarpsill: - thank you for your message, and thank you for reaching out. I am very happy to see the interest and the willingness to cooperate from the Grammatical Framework community. In developing the project, I have read the GF book at least three times (I wish I was exaggerating), and have taken inspiration in how GF has solved a problem many times when I got stuck. In fact, the whole idea that Abstract Wikipedia can be built on top of a functional library being collected in the wiki of functions can be traced back to GF being built as a functional language itself.

I would love for us to find ways to cooperate. I think it would be a missed opportunity not to learn from, or even directly use, the RGLs.

I keep this answer short, and just want to check a few concrete points:

  • when Aarne suggested that we "develop a high-level API for the purpose, as done in many other NLG projects", what kind of API is he thinking of? An API to GF, or an abstract grammar for encyclopaedic knowledge?
  • whereas math would be a great early domain, given that you already have experience in the medical domain, and several people have raised the importance of medical knowledge for closing gaps in Wikipedia, could that be an interesting early focus domain?
  • regarding our timeline, we plan to work on the wiki of functions in 2021, and start working on the NLG functionalities in late 2021 and throughout 2022. Given that timeline, what engagement would make sense from your side?

I plan to come back to this and answer a few more points, but I have been sitting on this for too long already. Thank you for reaching out, I am very excited! --DVrandecic (WMF) (talk) 00:44, 26 September 2020 (UTC)[]

@DVrandecic (WMF): Thanks for your reply! I'm not quite sure what Aarne means by the high-level API: an application grammar in GF, or some kind of API outside GF that builds GF trees from some structured data. If it's the former, it would just look like any other GF application grammar; if the latter, it could look something like this toy example in my blog post (ignore that the black box says "nobody understands this code"): the non-GF users would interact with the GF trees on some higher level, like in that picture, first choosing the dish (pizza) and then choosing pizza toppings, which generates the GF tree for "Your pizza has (the chosen toppings)".
The timeline starting in late 2021 is ideal for me personally. I think that the medical domain is a good domain as well, but I don't know enough to discuss details. I hope that our South African community gets interested (initial interest is there, as confirmed by the thread on the GF mailing list).
I would love to have an AW-themed GF summer school in 2022. It would be a great occasion to introduce GF people to AW, and AW people to GF, in a 2-week intensive course. If you (as in the whole AW team) think this is a good idea, then it would be appropriate to start organising it already in early 2021. If we want to target African languages, we could e.g. try to organise it in South Africa. I know it may seem premature to bring this up now, but if we want to do something like this on a larger scale, it's good to start early.
Btw, I joined the IRC channel #wikipedia-abstract, so we can talk more there if you want. --inariksit (talk) 17:42, 14 October 2020 (UTC)[]

Might deep learning-based NLP be more practical?

First of all, I'd like to state that Abstract Wikipedia is a very good idea. I applaud Denny and everyone else who has worked on it.

This is kind of a vague open-ended question, but I didn't see it discussed so I'll write it anyway. The current proposal for Wikilambda is heavily based on a generative-grammar type view of linguistics; you formulate a thought in some formal tree-based syntax, and then explicitly programmed transformational rules are applied to convert the output to a given natural language. I was wondering whether it would make any sense to make use of connectionist models instead of / in addition to explicitly programmed grammatical rules. Deep learning based approaches (most impressively, Transformer models like GPT-3) have been steadily improving over the course of the last few years, vindicating connectionism at least in the context of NLP. It seems like having a machine learning model generate output text would be less precise than the framework proposed here, but it would also drastically reduce the amount of human labor needed to program lots and lots of translational rules.

A good analogy here would be Apertium vs. Google Translate/DeepL. Apertium, as far as I understand it, consists of a large number of rules programmed manually by humans for translating between a given pair of languages. Google Translate and DeepL are just neural networks trained on a huge corpus of input text. Apertium requires much more human labor to maintain, and its output is not as good as its ML-based competitors. On the other hand, Apertium is much more "explainable". If you want to figure out why a translation turned out the way it did (for example, to fix it), you can find the rules that caused it and correct them. Neural networks are famously messy and Transformer models are basically impossible to explain.

Perhaps it would be possible to combine the two approaches in some way. I'm sure there's a lot more that could be said here, but I'm not an expert on NLP so I'll leave it at that. PiRSquared17 (talk) 23:46, 25 August 2020 (UTC)[]

@PiRSquared17: thank you for this comment, and it is a question that I get asked a lot.
First, yes, the success of ML systems in the last decade has been astonishing, and I am amazed by how much the field has developed. I had one prototype that was built around an ML system, but even more than the Abstract Wikipedia proposal it exhibited a Matthew effect - languages that were already best represented benefitted from that architecture the most, whereas the languages that needed the most help would get the least of it.
Another issue is, as you point out, that within a Wikimedia project I would expect the ability for contributors to go in and fix errors. This is considerably easier with the symbolic approach chosen for Abstract Wikipedia than with an ML-based approach.
Having said that, there are certain areas where I will rely on ML-based solutions in order to get them working. This includes an improved UX to create content, and analysis of the existing corpora as well as of the generated corpora. There is even the possibility of using an ML-based system to do the surface cleanup of the text to make it more fluent - basically, to have an ML-based system do a copy-editing pass on top of the symbolically generated text, which could have the potential to reduce the complexity of the renderers considerably and yet achieve good fluency - but all of these are ideas.
In fact, I am planning to write a page here where I outline possible ML tasks in more detail.
@DVrandecic (WMF): Professor Reiter ventured an idea or two on this topic a couple of weeks ago: "...it may be possible to use GPT3 as an authoring assistant (eg, for developers who write NLG rules and templates), for example suggesting alternative wordings for NLG narratives. This seems a lot more plausible to me than using GPT3 for end-to-end NLG."--GrounderUK (talk) 18:22, 31 August 2020 (UTC)[]
@GrounderUK: With respect to GPT-3, I am personally more interested in things like https://arxiv.org/abs/1909.01066 for the (cross-linguistic) purposes of Abstract Wikipedia. You might be able to imagine strengthening Wikidata and/or assisting abstract content editing. --Chris.Cooley (talk) 22:33, 31 August 2020 (UTC)[]
@Chris.Cooley: I can certainly imagine such a thing. However it happens, the feedback into WikidataPlusPlus is kinda crucial. I think I mentioned the "language-neutral synthetic language that inevitably emerges" somewhere (referring to Wikidata++) as being the only one we might have a complete grammar for. Translations (or other NL-type renderings) into that interlingua from many Wikipedias could certainly generate an interesting pipeline of putative data. Which brings us back to #Distillation of existing content (2nd paragraph)...--GrounderUK (talk) 23:54, 31 August 2020 (UTC)[]
@DVrandecic (WMF): May we go ahead and create such an ML page, if you haven't already? James Salsman (talk) 19:00, 16 September 2020 (UTC)[]
Now it could be that ML and AI will develop at such a speed as to make Abstract Wikipedia superfluous. But to be honest (and that's just my point of view), given the development of the field in the last ten years, I don't see that moment being considerably closer than it was five years ago (but also, I know a number of teams working on a more ML-based solution to this problem, and I honestly wish them success). So personally I think there is a window of opportunity for Abstract Wikipedia to help billions of people for quite a few years, and to allow many more to contribute to the world's knowledge sooner. I think that's worth it.
Amusingly, if Abstract Wikipedia succeeds, I think we'll actually accelerate the moment where we make its existence unnecessary. --DVrandecic (WMF) (talk) 03:55, 29 August 2020 (UTC)[]
I will oppose introducing any deep learning technique as it is 1. difficult to develop 2. difficult to train 3. difficult to generalize 4. difficult to maintain.--GZWDer (talk) 08:06, 29 August 2020 (UTC)[]
It could be helpful to use ML-based techniques for auxiliary features, such as parsing natural language into potential abstract contents for users to choose/modify, but using such techniques for rendered text might not be a good idea, even if it's just used on top of symbolically generated text. For encyclopedic content, accuracy/preciseness is much more important than naturalness/fluency. As suggested above, "brother of a parent" could be an acceptable fallback solution for the "uncle" problem, even if it doesn't sound natural in a sentence. While an ML-based system will make sentences more fluent, it could potentially turn a true statement into a false one, which would be unacceptable. Actually, many concerns raised above, including the "uncle" problem, could turn out to be advantages of the current rule-based approach over an ML-based approach. Although those are challenging issues we need to address, it would be even more challenging for ML/AI to resolve them. --Stevenliuyi (talk) 23:41, 12 September 2020 (UTC)[]

Hello everyone. Somewhat late to the party, but I recently proposed a machine-learning data curation that is, I think, germane to this discussion. I propose using machine learning to re-index Wikipedias to paraphrases. FrameNet and WordNet have been working on the semantics problem that Abstract is trying to address. The problem they face is the daunting task of manually building out their semantic nuggets, not to mention the specter over every decision's shoulder: opinion. With research telling us that sentences follow Zipf's law, paraphrases become the sentence equivalent of words' synonyms. It also means our graph is completely human-readable, as we retain context for each sentence member of the paraphrase node. You could actually read a Wikipedia article directly from the graph. This way, we can verify the machine learning algorithms. Just like a thesaurus lists all of the words with the same meaning, our graph would illustrate all of the concepts with the same meaning, and the historic pathways used to get to related concepts. Further, by mapping Wikidata metadata to the paraphrase nodes, we would have a basic knowledge representation. The graph is directed, which will also assist function logic for translating content. Extending this curation to the larger Internet brings new capabilities into play. I came at this from a decision-support-for-work-management perspective. My goal was to help identify risks and opportunities in our work. I realized that I needed to use communication information to get to predictive models like Polya's Urn with innovation triggering. I realized along the way that this data curation is foundational, and must be under the purview of a collective rather than some for-profit enterprise. So here I am. I look forward to discussing this concept while learning more about Abstract. Please excuse my ignorance as I get up to speed on this wonderfully unique culture.--DougClark55 (talk) 21:33, 11 January 2021 (UTC)[]

Parsing Word2Vec models and generally

Moved to Talk:Abstract Wikipedia/Architecture#Constructors. James Salsman (talk) 20:36, 16 September 2020 (UTC)[]

Naming the wiki of functions

We've started the next steps of the process for selecting the name for the "wiki of functions" (currently known as Wikilambda), at Abstract Wikipedia/Wiki of functions naming contest. We have more than 130 proposals already in, which is far more than we expected. Thank you for that!

On the talk page some of you have raised the issue that this doesn’t allow for effective voting, because most voters will not go through 130+ proposals. We've adjusted the process based on the discussion. We're trying an early voting stage, to hopefully help future voters by emphasizing the best candidates.

If you'd like to help highlight the best options, please start (manually) adding your Support votes to the specific proposals, within the "Voting" sub-sections, using: * {{support}} ~~~~

Next week we'll split the list into 2, emphasizing the top ~20+ or so, and continue wider announcements for participation, along with hopefully enabling the voting-button gadget for better accessibility. Cheers, Quiddity (WMF) (talk) 00:18, 23 September 2020 (UTC)[]

Merging the Wikipedia Kids project with this

I also proposed a Wikipedia Kids project, and somebody said that I should merge it with Abstract Wikipedia. Is this possible? Eshaan011 (talk) 16:50, 1 October 2020 (UTC)[]

Hi, @Eshaan011: Briefly: Not at this time. In more detail: That's an even bigger goal than our current epic goal, and whilst it would theoretically become feasible to have multiple levels of reading difficulty (albeit still very complicated, both technically and socially) once the primary goal is fully implemented and working successfully, it's not something we can commit to, so we cannot merge your proposal here. However, I have updated the list of related proposals at Childrens' Wikipedia to include your proposal and some other older proposals that were missing, so you may wish to read those, and potentially merge your proposal into one of the others. I hope that helps. Quiddity (WMF) (talk) 20:06, 1 October 2020 (UTC)[]
@Eshaan011 and Quiddity (WMF): Ultimately, I agree with Quiddity but, in the real world, not so much. We have already discussed elsewhere the possibility of different levels of content and respecting the editorial policies of different communities. Goals such as these imply that language-neutral content will be filtered or concealed by default in some contexts. Furthermore, the expressive power of different languages will not always be equally implemented, so we will need to be able to filter out (or gracefully fail to render) certain content in certain languages. Add to this the fact that language-neutral content will be evolving over a very long time, so we might expect to begin with (more) basic facts (more) simply expressed in a limited number of languages. To what extent the community will be focusing on extending the provision of less basic facts in currently supported languages rather than basic facts in more languages is an open question. It does not seem to me to be unreasonable, however, to expect that we might begin to deliver some of the suggested levelled content as we go along, rather than after we have substantially completed any part of our primary goal. (Quite why we would take the trouble to find a less simple way of expressing existing language-neutral content is unclear; perhaps it would be a result of adopting vocabulary specific to the subject area (jargon). In any event, I would expect some editorial guidelines to emerge here.)--GrounderUK (talk) 15:31, 4 November 2020 (UTC)[]

Why define input or output parameters for pure "functions"?

Some functions may also take as input *references* to other functions that will be called internally (as callbacks that may be used to return data, or for debugging, tracing... or for hinting the process that will compute the final result: imagine a "sort()" function taking a custom "comparator" function as one of its inputs).

Note as well that, formally, functions are not prescriptive about the roles assigned to input and output parameters: imagine the case of invertible inferences, like "sum(2,3,:x)" returning "{x=5}" and "sum(:x,3,5)" returning "{x=2}", as in Smalltalk, Prolog and other AI languages: there is no need to define multiple functions if we develop the concept of "pure" functions *without* side effects.

So what is important is the type of all input and output parameters together, without the need to restrict one of them to being an output: binding parameters to values is what assigns them the role of input. The function will return one or more solutions, or could return another function representing the set of solutions.

And to handle errors/exceptions, we need an additional parameter (usually defined as an input, but we can make inferences on errors as well). Errors/exceptions are just another datatype.

What will be significant is just the type signature of all parameters (input or output, including error types) and a way to perform type inference to select a suitable implementation that can reduce the set of solutions.
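As a purely illustrative sketch (Python; the function name and calling convention are made up, not Wikifunctions syntax), such a "relational" sum over three slots could look like this, with the unbound slot solved for and returned as a binding:

 def sum3(a=None, b=None, c=None):
     # Relational reading of a + b = c: exactly one slot may be left unbound (None).
     unbound = [name for name, value in (("a", a), ("b", b), ("c", c)) if value is None]
     if len(unbound) != 1:
         return {"error": "exactly one slot must be left unbound"}  # errors are just another datatype
     if c is None:
         return {"c": a + b}
     if b is None:
         return {"b": c - a}
     return {"a": c - b}
 
 sum3(a=2, b=3)   # -> {"c": 5}, like sum(2,3,:x) giving {x=5}
 sum3(b=3, c=5)   # -> {"a": 2}, like sum(:x,3,5) giving {x=2}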

-- verdy_p (talk) 09:30, 29 October 2020 (UTC)[]

A question about the logo voting

  Question: I have a question about the logo voting. Is the page Abstract Wikipedia/Logo about Abstract Wikipedia or about the wiki of functions? (Or about both?) --Atmark-chan <T/C> 06:56, 3 December 2020 (UTC)[]

The page is a draft, for early proposals; for now there will be no specific logo for Abstract Wikipedia itself, which will not start for at least a year (and it will still not be a new wiki, but an overall project across Wikimedia projects, to integrate Wikifunctions, Wikidata and other tools into existing Wikipedia projects).
The content of that page will be updated in the coming discussions about the final organization of the vote. The draft is just there to allow people to prepare and reference their proposals (I just made the first proposal; others are welcome, and the structure for submitting and organizing multiple proposals is not really decided). For this reason, that page is a draft to be completed. Note that there's no urgency for the logo: even the early wiki for Wikifunctions will be in draft for many months and is likely to change a lot in the coming year, so we have plenty of time to decide on a logo. Note also that the current alpha test wikis use an early custom logo, seen on the Facebook page of the project, but it is very poor; it was based on some older presentations of the project before its approval by the WMF.
Wikifunctions, however, is not just for Wikipedia and will certainly have uses on all other non-Wikipedia projects, or even projects outside Wikimedia, including non-wiki sites.
The proposals to be submitted soon will be for the new wiki being built for Wikifunctions only (note: the name was voted on, but is still not official; it will be announced in a couple of weeks, as we are waiting for a formal decision by the WMF after legal review; this is independent of the logo). I don't think we need to include the project name in the logo. During the evaluation of names, it was decided that the name should be easily translatable, so it will likely be translated; translating the project name inside the logo is easier to do separately if the logo does not include the name. The name can be composed onto the logo later if we want it, or could be displayed dynamically on the wiki, in plain HTML, without modifying the logo; a tool could also automatically precompose several variants of the logo with different rendered names, and then display the composed logo image according to the user's language, if MediaWiki supports that.
How all the other projects will be coordinated is still not decided; only Wikifunctions has been approved as a new wiki, plus the long-term project for Abstract Wikipedia (which will require coordination with each Wikipedia edition and further development for the integration of Wikifunctions and Wikidata).
In one year or so, it will be time to discuss Abstract Wikipedia, and its logo (if one is needed) should then focus on Wikipedia.
Note also that Wikifunctions will work with a separate extension for MediaWiki, which will have its own name too (WikiLambda), but it will be generic and not necessarily tied to Wikifunctions: this extension may later be integrated into other wikis, including outside Wikimedia; it will be a generic plugin for MediaWiki. But we are far from this goal, and it will require some interest from other external wikis (they already use several other extensions, notably Semantic MediaWiki, which may or may not also be used later along with Wikifunctions, possibly for building Abstract Wikipedia as well). verdy_p (talk) 10:40, 3 December 2020 (UTC)[]
@Verdy p: Oh, I see. Thank you! Atmark-chan <T/C> 08:05, 4 December 2020 (UTC)[]
I'm not convinced that there will be no new wiki for our language-neutral Wikipedia but I agree that none is yet planned! --GrounderUK (talk) 12:25, 4 December 2020 (UTC)[]

Confusion on Wikipedia

Apparently, there's a big confusion throughout multiple languages of Wikipedia about the real scope of Wikifunctions and Abstract Wikipedia. I tried to fix the Wikifunctions Wikipedia article and separate the items into Abstract Wikipedia (Q96807071) and Wikifunctions (Q104587954), but the articles in other languages need to be updated to better reflect how both the projects work. Luk3 (talk) 19:19, 30 December 2020 (UTC)[]

@Luk3: well done, I've updated the page in Italian --Sinucep (talk) 19:49, 2 January 2021 (UTC)[]
Having this project running under the name Abstract Wikipedia is a needless way to engage in conflict with Wikipedians. Why not rename the project here into Wikifunctions now that we have a name? ChristianKl❫ 01:25, 16 January 2021 (UTC)[]
@ChristianKl: This topic explains exactly that the two projects are different in nature. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 12:22, 16 January 2021 (UTC)[]

Inter-language links

Greetings. The existence of two separate wikidata items, Wikifunctions (Q104587954) and Abstract Wikipedia (Q96807071), has broken the inter-language links. E.g., en:Wikifunctions lists four languages (English, Català, Italiano, Vèneto) while pt:Abstract Wikipedia lists nine languages (excluding English). Any ideas how to fix this? Maybe split the redirect en:Abstract Wikipedia into a new article? Fgnievinski (talk) 21:44, 25 February 2021 (UTC)[]

Logo for Wikifunctions wiki

Wikifunctions needs a logo. Please help us to discuss the overall goals of the logo, to propose logo design ideas, and to give feedback on other designs. More info on the sub-page, plus details and ideas on the talkpage. Thank you! Quiddity (WMF) (talk) 23:01, 14 January 2021 (UTC)[]

  • The logo should be inviting to a broad public of users. It shouldn't produce a "this project is not for me" reaction in users who don't see themselves as technical.
Maybe a cute mascot could provide for a good logo. ChristianKl❫ 00:46, 16 January 2021 (UTC)[]

There's no consensus in Wikidata that Abstract Wikipedia is an extension of it

@DVrandecic (WMF) and Verdy p: The project description currently says: Abstract Wikipedia is an extension of Wikidata. As far as I'm concerned, what is an extension of Wikidata and what isn't is up to the Wikidata community. I thus removed the sentence. It was reverted. Do we need an explicit RfC on Wikidata to condemn Abstract Wikipedia for overstepping boundaries to get that removed? ChristianKl❫ 13:00, 16 January 2021 (UTC)[]

  • @ChristianKl: Your edit was reverted because you seem to have used a weeks-old version of the page for your edit, and thus have yourself reverted all the recent edits for no reason at all. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 13:07, 16 January 2021 (UTC)[]
    Actually, that edit was based on a much older version than just the previous one: it used the version from 16 October, three months ago, when the Wikifunctions name had not yet been voted on! Other details have changed since then: there were new announcements, changes in the working team, and changes to some links (including translated pages). What you did also dropped all edits made by the WMF team over the last three months, up to yesterday. verdy_p (talk) 17:45, 16 January 2021 (UTC)[]

@ChristianKl: I agree. I have qualified the original statement, quoting the relevant section of the Abstract Wikipedia plan. Hope it's okay now.--GrounderUK (talk) 15:02, 16 January 2021 (UTC)[]

I revisited your addition: note that it would break an existing translated paragraph, so to ease the work I separated the reference, which also included a link not working properly with translations. Beware notably of anchors (avoid using section headings directly; they vary across languages!): I used the stable (untranslated) anchor and located the target of the link more precisely. Your statement was a bit too elusive, as there's no intent for now to modify Wikidata before 2022 to add content there. It's clear we'll have new special pages, but not necessarily new content pages in Wikidata.
The "abstract content" needed for Abstract Wikipedia is also different from what will be stored initially in Wikifunctions (which will be independent of any reference to Wikidata elements, but will contain the implementations of functions needed for transforming the "abstract content" into translated content that can be integrated into any Wikipedia, or other wikis, multilingual or not). The first thing will be to integrate Wikifunctions (still useful for reusable modules and templates), but this is independent of Abstract Wikipedia, which will not be developed this year: Wikifunctions will just be one of the tools usable to create the "Abstract Wikipedia" LATER and integrate it. Don't expect an integration into Wikidata and the Wikipedias before the end of 2022; I think it will be in 2023 or even later, after a lot of experiments in just a very few Wikipedias. The integration into large Wikipedias is very unlikely before about 5 years, probably not before 2030 for the English Wikipedia: the first goal will be to support existing small Wikipedias, including those in Philippine languages, which seem to be growing large and fast, but with a lot of bot-generated articles. verdy_p (talk) 18:49, 16 January 2021 (UTC)[]
Thanks. I'm not sure what you found "elusive"; my statement was a direct quotation from the linked source, which asserts that Wikidata community agreement is necessary for any eventual changes to Wikidata and that some alternative will be adopted if the community does not agree. Perhaps the plan itself is elusive, but additional speculation about timescales and the nature of the alternatives seems to obscure the point; it is also absent from the referenced source (and makes additional work for translators). You make a couple of points above that I don't agree with. It is not clear to me that new special pages will be required in Wikidata but, if they are, they will require agreement from the Wikidata community. I also believe that early examples of functions in Wikifunctions will (and should) be dependent on "reference to Wikidata elements", in the same way that some Wikipedia infoboxes are dependent on such elements.--GrounderUK (talk) 20:39, 16 January 2021 (UTC)[]
There's no "speculation" about the timescale in what is to be translated in the article. All of this is already in the development plan. I only speculate a bit at the end of my reply just before your reaction here (it seems clear that about 5-10 years will pass before we see a change in the English Wikipedia to integrate the Abstract Wikipedia, simply because it does not really need it: the Abstract Wikipedia is clearly not the goal of this project there; on the contrary, Wikifunctions will make its way into the English Wikipedia quite soon, just because it will also facilitate exchanges with Commons, which needs a lot of the same functions). I clearly make a difference between Wikifunctions (short term, with a limited goal) and Abstract Wikipedia (no clear term, but in fact a long-term umbrella project needing much more than just Wikifunctions). verdy_p (talk) 03:25, 17 January 2021 (UTC)[]
If the timescale is in the development plan, please indicate where this is. The timeline on the main page only states that Abstract Wikipedia development "proper" will start in 2022 (elsewhere, "in roughly 2022"), not that "integration" with Wikidata will be achieved (if agreed) by July 2022 (or whenever "the second year of the project" is deemed to end).--GrounderUK (talk) 11:25, 17 January 2021 (UTC)[]
This objection can't be serious? The Wikidata community has no mandate to decide whether another WMF project, the WMF, or even a non-WMF site, makes a statement that a project is an extension of Wikidata. The WMF announcement, Abstract Wikipedia/July 2020 announcement, is quite clear on the 'sourcing' for this sentence: in particular, the end of paragraph 2 and all of paragraph 3. ProcrastinatingReader (talk) 11:59, 19 January 2021 (UTC)[]

I am trying to understand "Abstract Wikipedia" (please help)

Dear @Quiddity (WMF): and others, after not having had much time for this project, I am now trying to catch up with the developments and understand for myself what "Abstract Wikipedia" is all about.

In my current understanding:

  • Wikipedia is an encyclopedia with articles that consist mainly of text.
  • "Abstract Wikipedia" is the name of an initiative. This initiative will have two results: first, a new wiki called "wiki functions". This new wiki will, second, help to create new "information pages" in the Wikipedia language versions. These information pages (as I call them) will constitute the so called "Abstract Wikipedia".
  • The information pages will be a better version of the "ArticlePlaceholder tool". The "ArticlePlaceholders" are not supposed to be articles, but exactly "placeholders" for articles where they do not exist. For example, Wikipedia in language X has no article about a certain German village. If you search for the village, you will get a "placeholder", which is basically a factsheet with titles/descriptions and data coming from Wikidata. (These placeholders exist currently in a small number of Wikipedias, experimentally.)
  • The information pages will be "better" than placeholders because of Wiki functions (WF). The WF are code that can transform data/information from Wikidata into something that looks not like a factsheet but more like sentences.
  • This means that in future, the reader of Wikipedia language version X will search for the German village. If the Wikipedians of that language version have not created such an article, then the reader will see the "information page" (again, as I call these pages) about that village. Or: if article and information page exist, the reader will be offered both. (A consequence: we might tend to have less data in an article that could be better presented via the information page.)

To me (and maybe others) it is confusing that "Abstract Wikipedia" is the name of the whole initiative and for one of the two results of the initiative. Also, the proposal talks about "articles" in "Abstract Wikipedia", and that sounds as if the information pages are supposed to replace the encyclopedic (text) articles. Finally, "Abstract Wikipedia" will not be a Wikipedia as we know it (encyclopedia with text articles).

So, what do you think about this summary? I am open for any correction necessary. :-)

Kind regards Ziko (talk) 10:44, 21 January 2021 (UTC)[]

@Ziko: Hallo. That is partially accurate, partially incomplete, and partially not-yet-decided. I would try to clarify it like this:
  • The Wikifunctions wiki has a broader goal beyond purely linguistic transformations. It will host functions that do any kind of specific calculation on any kind of data.
    • As someone who thinks in lists, I found these 2 links particularly helpful for getting a better sense of the scope of the project: The (minimal) mockup screenshot of what a potential future Main Page might look like; and the very short list of Abstract Wikipedia/Early function examples (~50 examples out of what will hopefully become 100,000+++).
    • Part of its scope will include some of what we currently use templates/modules for, on the other wikis. I.e. this will partially solve the problem we currently have, of needing 150+ separate copies of Template:Convert (and 130+ copies of Module:Convert) which aren't easy to maintain or translate. It won't solve it completely (in the short-term at least!) because the largest wikis will inevitably continue to want to use their own versions instead of a global version; but it should make things vastly easier for the smaller and mid-size wikis. It's also not a solution for the entire problem of "global templates" because Wikifunctions is not intended to host simple page-lists or box-layout-design code or many other template/module uses - but it will be able to partially solve the "calculation" part.
    • There are some obvious existing semi-comparable projects in the math realm, such as fxSolver and WolframAlpha, but nothing (that we know of) in the more linguistic and other-data realms.
    • This Wikifunctions project will be useful as a place to collaboratively (and safely) edit shared code, which can then be utilized on the other wikis, and even beyond.
    • For example, at d:Q42, we know the birth-date of the subject in the Gregorian calendar. But Wikifunctions will provide a way for users to take that date and run calculations on it, such as: What day of the week was that? What was that date in the Hebrew calendar? How old were his parents at the time? What was the world/country/city population at the time? etc. -- Users will be able to run the calculations (functions) directly within the Wikifunctions wiki, as well as output the result of the calculation to some other wiki. (There is a tiny illustrative sketch of this kind of calculation just below this list.)
  • The Abstract Wikipedia component is separate, but with overlap. Many of the aspects of how it might be implemented still need a lot of discussion.
    • For a content example, you make a good comparison with ArticlePlaceholder as a simple precursor. There's also Reasonator (e.g.1 en, e.g.2 fr); however, those don't exist for many languages, nor for most statements (e.g.3 vi). In the backend, those short descriptions are actually coming from the AutoDesc tool. (e.g.4). That AutoDesc system currently has 27 translatable word-elements, and can only handle very simple sentences for cases where each word can be separately replaced with a translated word (and the word order tweaked per-language). The most important difference to both ArticlePlaceholder and Reasonator - but also to projects such as LsjBot or Rambot - is that we want to allow the community to take more ownership of the way and the scope of information being rendered into the individual languages.
    • The Abstract Wikipedia project (and the linguistic function parts of it within Wikifunctions), aims to be able to generate more complex content. It will partially rely upon Lexemes within Wikidata, and partially upon "functions" that convert sentence-structure-fragments into multiple languages.
    • Regarding how that content might then appear within any particular Wikipedia project, there are 3 broad options outlined at Abstract Wikipedia/Components#Extensions to local Wikipedias (which I won't attempt to summarize; please read!). But there are also options beyond those 3, such as partial-integration (and partial local-override) at a per-section level. E.g. The Wikipedia article on "Marie Curie" (which only exists in any form in 169 languages today), could have an "Abstract version" shown at a small wiki, but the local editors at that small wiki could be able to override the intro-section and/or the "Life" section, or add a custom "Further reading" section, whilst still retaining the rest of the "abstract version". This should enable readers to get the most possible information (in their language) at any given time, whilst also enabling editors to improve any part of the content at any given time.
    • Those sorts of details are yet to be discussed in great depth, partially because we need to determine what is technically possible/feasible, and partially because we need to slowly build up a broader community of participants for this sort of discussion/decision.
    • It will be unusually intricate in the sense that components on different wiki projects will be working together to create the outcome of the project. But it builds on the same kinds of things we're all already very experienced with.
    • We currently expect that the term "Abstract Wikipedia" will disappear in time, and it is really the name of the initiative. We don't expect that there will be, in the end, a component that will be called "Abstract Wikipedia". Or, put differently, the outcome of Abstract Wikipedia will be in the interplay of the different components that already exist and are being developed. But that's also up for discussion, as you can see in the discussion above.
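As a purely illustrative aside, here is the day-of-week example from the list above as a tiny Python sketch; the function name is made up, and this is not Wikifunctions syntax:

 import datetime
 
 WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
 
 def day_of_week(year, month, day):
     # Return the English weekday name for a Gregorian date.
     return WEEKDAYS[datetime.date(year, month, day).weekday()]
 
 day_of_week(1952, 3, 11)   # Douglas Adams' birth date per d:Q42 -> "Tuesday"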
Lastly, yes, naming things is hard. It's both hard to initially come up with perfectly descriptive and universally non-ambiguous short phrases for things, and even harder to change them once people have started using them!
I hope that helps you, and I'll emphasize that we would also welcome any help making clear (and concise!) improvements to the main documentation pages to clarify any of this for others! :-) Quiddity (WMF) (talk) 23:00, 21 January 2021 (UTC)[]

Dear @Quiddity (WMF):, thanks so much that you took the time for your comments! This helps a lot. I am still digesting your explanations, and might have a few follow up questions later. But this helps me to study further. For the moment, Ziko (talk) 14:49, 25 January 2021 (UTC)[]

Spreadsheets functions

Hello,

I often use spreadsheets and I like Wikifunctions. From my point of view, spreadsheet function syntax is language-independent: most of the function names, like IF or MID, are translated into many languages, so users can edit the functions in their own language. I see this as a chance for Wikifunctions to get more contributions, since editing spreadsheet functions is something that many more people can do than more complex programming tasks. If I understand the function model and am able to write functions in Wikifunctions, I can help to create something that brings spreadsheet functions into Wikifunctions. I say this after trying to convert a spreadsheet function into a program in R; since that worked, I think it can also work in other programming languages. I am not good at programming, so with my current skills I think I can only create a tool for converting a spreadsheet function, not a gadget or something that is more integrated into MediaWiki. Please tell me what you think about that. --Hogü-456 (talk) 21:09, 1 February 2021 (UTC)[]

@Hogü-456: Thanks for your comment. I am also rather excited about the idea of bringing spreadsheets and Wikifunctions close together. But I am not sure what you are suggesting:
  1. it should be possible for someone who has the skills to write a formula into a spreadsheet to contribute functions to Wikifunctions
  2. it should be possible for a spreadsheet user to use functions from Wikifunctions
I agree with both, but I wanted to make sure I read your comment right. --DVrandecic (WMF) (talk) 23:20, 1 February 2021 (UTC)[]
I suggest the first thing, and the second thing is also interesting. I don't know if there is an equivalent in spreadsheets for all the arguments in Wikifunctions, so I think it is possible for most things but not for all. --Hogü-456 (talk) 18:53, 2 February 2021 (UTC)[]
I agree both are interesting. There is also a third possibility, which is having a spreadsheet implementation of a Wikifunctions function. In spreadsheets, arguments are ranges, arrays, formulas or constants, and different spreadsheet functions have different expectations of the types of argument they work with. For example, in Google Sheets, you can SUM a 2-dimensional range but you can't CONCATENATE it. In Wikifunctions, any such limitations would be a result of the argument's type. In principle, a Wikifunctions function could concatenate a 2D or 3D array, or one with as many dimensions as we care to define support for. GrounderUK (talk) 20:01, 2 February 2021 (UTC)[]
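(A minimal sketch in Python, purely illustrative and not Wikifunctions syntax, of a concatenation that accepts a scalar or an arbitrarily nested "range":)

 def concat_all(value, sep=""):
     # Concatenate a scalar or an arbitrarily nested list of scalars.
     if isinstance(value, list):
         return sep.join(concat_all(item, sep) for item in value)
     return str(value)
 
 concat_all([["a", "b"], ["c", "d"]])   # -> "abcd" (a "2-D range")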
Interesting points. We will definitely take a very close look at spreadsheets once the UX person has joined us, in order to learn from how spreadsheets allow many people to program. I think that the mix of guidance by the data and autocomplete is extremely helpful, and we can probably do something akin to that: if we know what the function contributor is trying to get to, and know what they have available, we should have pretty good constraints on what could be done. --DVrandecic (WMF) (talk) 00:38, 4 February 2021 (UTC)[]
Any idea when the UX person will be joining?--GrounderUK (talk) 08:08, 3 March 2021 (UTC)[]
The process is ongoing, we'll announce it through the weeklies when we know it. --DVrandecic (WMF) (talk) 01:30, 4 March 2021 (UTC)[]
I like the concept of using Wikifunctions with an extension for spreadsheets (but not only: a lot of programming languages could have such "pure functions" integrated while being executed offsite, in the cloud, and not just on Wikifunctions, which would just be the repository where functions are found, described and then located, with the evaluator running on Wikimedia servers... or elsewhere, such as Amazon Lambda, Azure Functions, IBM Cloud or other public clouds, or private clouds like Nextcloud). For this to work, I think that Wikifunctions should use a common web API that allows choosing the location where evaluators will run (the application using these functions, the OS running that application, or the user profile using that application could set preferences; a user could have in their profile a set of preferred locations for the evaluators). In summary, what we need is: (1) a repository with a lookup facility; that repository should describe the functions in the same computer language via its interface, and that API should be able to collect sets of locations where these functions can be evaluated; (2) locating a function description and locating an evaluator for it requires resolvers (based on URNs); (3) searching for a function is more or less like a classic search engine, except that it searches in structured metadata, i.e. special documents containing sets of properties, or a database, just like what Wikidata already is; the same occurs with other common repositories, such as software updates on Linux with APT, DNF, YUM or EMERGE, programming environments like CPAN, Node.js or LuaRocks, and also GitHub with part of its services or various connectors (Wikifunctions could provide a connector for GitHub as well).
So we should separate the repository of function metadata (i.e. Wikifunctions, containing a lot of information and other services like talk pages and presentations, logos, and Wikimedia user groups via subscribed Wikiprojects) from the repository of locations/sites where a function can be executed (either because it is installed locally, or because the site offers a way to accept a task to be uploaded there and registered in a cache for later reuse). A client (like Wikifunctions itself, or any application using Wikifunctions) would then submit jobs to be executed on such a site to produce a result. Of course this requires security: connecting a new evaluation site has constraints, including legal ones for privacy, because calling a function with parameters can easily allow the evaluator to collect private data. We must also avoid denial of service, so the repository of evaluators should detect sites that do not respond to evaluation requests, or that return corrupted or fake answers: this requires evaluation of trust and reliability, just like in any distributed computing environment (this is less of a problem if evaluation sites cannot be freely added but are added by admins of the locator server for all possible evaluators).
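A very rough sketch (Python; everything here is hypothetical: the endpoint path, the payload shape and the function id are made up, and this is not any actual Wikifunctions API) of what such client-side dispatch to a preferred evaluation site could look like:

 import requests
 
 def evaluate(function_id, arguments, preferred_sites):
     # Try the user's preferred evaluation sites in order, falling back to the next
     # one if a site is unreachable or refuses the job.
     for site in preferred_sites:
         try:
             response = requests.post(
                 f"{site}/evaluate",   # hypothetical evaluator endpoint
                 json={"function": function_id, "arguments": arguments},
                 timeout=10,
             )
             if response.ok:
                 return response.json()
         except requests.RequestException:
             continue
     raise RuntimeError("no evaluation site accepted the job")
 
 # evaluate("Z12345", {"a": 2, "b": 3}, ["https://evaluator.example.org"])  # made-up id and URL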
In summary, separate the description of functions (in their metadata language) from their evaluation (except for a few builtin functions that are resolved locally, possibly running on the same host or only on a specific small set of hosts with secure connections, and accept delegating execution to the final client of the function, if that client wants to run the function with their own resources, for example for debugging inside their web browser, with JavaScript or WebAssembly). verdy_p (talk) 01:13, 5 March 2021 (UTC)[]

There are a few ideas like this Wiki Spreadsheet design, which depend on functions being defined in some shared namespace on a project, or globally. As a result, spreadsheet functions may be a good class to expand to, in addition to those useful on Wikipedia pages. –SJ talk  21:09, 20 April 2021 (UTC)[]

@Sj: I've merged your comment up into this existing thread, I hope that's ok. :)
I'm also reminded of Wikipedia and Wikidata Tools (see TLDR in github), which seems closely related. Quiddity (WMF) (talk) 03:52, 21 April 2021 (UTC)[]
@Quiddity: more than ok! Thanks for catching that. Spreadsheets and wikicalc remain the future... and a necessary building block for even more interesting transparent dynamic refactoring. –SJ talk  18:10, 24 April 2021 (UTC)[]

License

Hello, I don't see much communication about the essential point of license. Since it's a direct continuation of the Wikidata agenda, is it correct that it will impose CC-0 everywhere it will be integrated within the Wikimedia environment? A clear yes/no would be appreciated. --Psychoslave (talk) 22:57, 5 March 2021 (UTC)[]

My apologies, I didn't see that this talk page had archives and that this point had thus already been discussed in Talk:Abstract_Wikipedia/Archive_1#License_for_the_code_and_for_the_output and Talk:Abstract_Wikipedia/Archive_2#Google's_involvement. So it looks like I have some homework to do before possibly commenting further. At first glance the answer seems to be "it's not decided yet". --Psychoslave (talk) 23:06, 5 March 2021 (UTC)[]


Follow up on the earlier topic « License for the code and for the output »

Given my understanding of this, which I'm not sure is correct at all, the generated text can't be copyrighted, but the data and code used to generate the text can be protected. — @Jeblad:

I think this is analogous to the compilation problem: the binary usually does not inherit the compiler's licence; the result depends on the source code licence. See the FSF FAQ on the GPL licence: https://www.gnu.org/licenses/gpl-faq.en.html#GPLOutput More generally, when a program translates its input into some other form, the copyright status of the output inherits that of the input it was generated from. … so it seems the licence of the result depends on the licence of the source data, not the licence of the source code of the functions.

Which could be the lexicographical data and the abstract form of articles, I guess, but at some point the distinction between the code and the data can be somewhat blurry …

Also @Denny, Deryck Chan, Psychoslave, DVrandecic (WMF), and Amire80: TomT0m (talk) 10:53, 7 March 2021 (UTC)[]

I agree with the principles laid out above and don't have any specific preference to give. Deryck C. 16:56, 7 March 2021 (UTC)[]
Thank you for pinging me TomT0m. Do you have any question on the topic, or specific point on which you would like some feedback?
For what it's worth, "I am not a lawyer but…", your exposition of the subject seems correct to me. With a bit more granularity, I would simply add that:
  • it depends on the terms of use of your toolchain. If you use some SaaS that basically says "all your data are belong to us" and, between two cabalistic attorney incantations, says that the end output will be placed "under exclusive ownership of Evil-Corp.™ and we may ask you to pay big money at any time for privileges like accessing it in some undefined future", the terms might well be valid. But to the best of my knowledge, there's no such thing in the Wikimedia compiler toolchains, hurrah,
  • you can't legally take some work on which you don't have personal ownership, throw it through some large automaton chain, and claim ownership of the result: you have to make a deal with the owner of each piece of work you took as input. All of them. Or none, if none of them pay attention or care about what you do. Until they do.
  • the world is always more complicated (even when you take into account that the world is always more complicated): there is no such thing as "the copyright law". Each legislation has its myriad of laws and exceptions, which may apply to a more or less wide area, with a lot of other local rules which may or may not have preemption over the former, etc.
But once again, that's just more "focused details", the general exposure you made seems correct to me. --Psychoslave (talk) 16:24, 8 March 2021 (UTC)[]

Thanks for asking the question! Yes, that's still something we need to figure out. We are working together with Wikimedia's legal department on the options. A decision must be made before the launch, but it will still take us a while to get to something that's ready to be shown to the communities. Thank you for your patience! --DVrandecic (WMF) (talk) 17:54, 8 March 2021 (UTC)[]

Example Article?

There's a lot of talk about mathematical functions here; there's less talk about functions that produce sentences. Would there be interest in having an "example" article to show what Abstract Wikipedia might look like? Specifically:

  • Write a pseudo-code article on a high-profile topic. For example, for simple:Chicago, (Q1297 IS Q515 LOCATED IN Q1204. Q1297 IS Q131079 LIST POSITION [three])
  • Evaluate the functions manually to translate the article into several languages (ideally including languages with poor coverage, such as ht:Chicago, if we have sufficient language materials to do so).
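(Purely for illustration, and not any agreed Abstract Wikipedia format: the first statement above might be encoded as something like the following Python-style structure, where the constructor name is made up.)

 statement = {
     "constructor": "instance_of_located_in",  # hypothetical constructor name
     "subject": "Q1297",    # Chicago
     "class": "Q515",       # city
     "location": "Q1204",   # the item it is located in, per the example above
 }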

Does this seem like a reasonable thing to do? power~enwiki (talk) 20:08, 11 March 2021 (UTC)[]

@Power~enwiki: Hi. Yes that sounds like a good idea. We can create a detailed example of at least one complex paragraph sometime in the next few weeks, which should be enough to extrapolate from. And then anyone could build out further examples for discussion and potential problem-solving.
For now, we do have a couple of example sentences from an early research paper in Abstract Wikipedia/Examples, and 2 very early mockups at Abstract Wikipedia/Early mockups#Succession (2 sub-sections there) which might help a little. Thanks for asking. Quiddity (WMF) (talk) 23:29, 11 March 2021 (UTC)[]

A few more questions for the crowd.

  1. Are we going to try to have each sentence be 1 line, or will these be multiple lines?
  2. Wikidata is great for nouns. For verbs, it doesn't seem as good. Many concepts ("to destroy", "to fix") don't have encyclopedia articles at all. Are these available in Wikidata somehow? power~enwiki (talk) 22:49, 14 March 2021 (UTC)[]
  3. When writing an article, should pronouns be used at all? It seems better to refer to a Q-item (or an ARTICLETOPIC macro), and to have the natural-language generation determine when to replace an explicit subject with a pronoun.

Thoughts? power~enwiki (talk) 22:49, 14 March 2021 (UTC)[]

@Power~enwiki I think d:Wikidata:Lexicographical data might help. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 10:34, 15 March 2021 (UTC)[]

@Power~enwiki: I’m not sure I understand your first question, but...

  1. I don’t think we should have limits on the length or complexity of sentences. For more complex sentences (like the previous one), a version using simpler sentences should be considered. (Long sentences are fine. Complex sentences are fine.)
  2. Further to what @1234qwer1234qwer4: said, Wikidata lexemes (may) have a specific property (P5137) linking a sense to a particular Wikidata item. Currently, destroy doesn’t, but be links to both existence ( Q468777) and being ( Q203872), for example.
  3. I agree that it would generally be better to use unambiguous references. In the translations (natural language renderings), I suggest we might use piped links for this. For example, “Some landmarks in the city are...”. We might also use, for example, <ref>[[:d:Q1297| of Chicago]]</ref> where there is ellipsis, as in “As of 2018, the population[1] is 2,705,994.”

References

--GrounderUK (talk) 14:55, 15 March 2021 (UTC)[]

Regarding "1-line" v "multi-line": My question is whether the syntax will look more like SQL or more like JSON (or possibly like LISP). I suppose any can be one-line or multi-line. Regarding verbs: Lexemes are language-specific; it's currently annoying to search Wikidata for them but I don't think they should be in the language-agnostic Abstract Wikipedia text anyhow. power~enwiki (talk) 03:26, 16 March 2021 (UTC)[]

@Power~enwiki: Thanks for clarifying. There’s a chicken-and-egg problem here. I don’t think @Quiddity (WMF): is intending to include me in his “we” and I have an open mind on the question of “pseudo-code”. I suggest we [inclusive and impersonal] should avoid something that looks like “labelized Wikifunctions”. This is because “we” should try to “translate” the pseudo-code into that format, as well as into natural languages (and/or Wikitext that looks very like the Wikitext one might find in the Wikipedia in that language). So the pseudo-code should be more like “our” language of thought (labelled “English”, for example) but using an unambiguous identifier for each semantic concept. This means we can have many “pseudo-codes”, one for each natural language we select. Such a pseudo-code should not look too much like a translation into that target language or its Wikitext, of course. Without endorsing its content, I refer interested parties to Metalingo (2003)--GrounderUK (talk) 12:22, 16 March 2021 (UTC)[]
  • I'd like to second the suggestion of a relatively fleshed out example. One paragraph would be fine, but I think it's important to also try rendering the same abstraction into a couple languages other than English (including at least one non-Indo-European language). As I said at Talk:Abstract Wikipedia/Examples, I think it's easy to fall into a trap of coming up with functions that accidentally have baked into them features that are peculiar to the English language, or which at least fail to generalize to some languages.
Another worthwhile exercise would be taking a couple of the functions used in the example, and coming up with several test cases for those functions, and then seeing whether, in principle, a single renderer for language X could produce cogent results for all of those test cases. For example, maybe there's a function possible(X). And so we come up with test cases like:
  1. possible(infinitive_action_at_location(see(Russia), Alaska))
  2. possible(exists(thing_at_location(Alien life, Mars)))
  3. possible(action(US President, declare(state_of_war_against(China))))
And it's straightforward (modulo some fine details) to imagine how a renderer for this function could be implemented in English. But there are languages that distinguish grammatically between the possibility that something is true (epistemic modality: either there's life on Mars or there isn't - we don't know which is true, but we have some reason to think there might be) and someone having the ability to do something (the President having the ability/power to declare war against another country, an arbitrary person being able to see Russia from Alaska). So in fact this is a bad abstraction, since a renderer in, say, Turkish needs to distinguish between these senses, and can't do so with the information given. Colin M (talk) 17:09, 18 April 2021 (UTC)[]

I started fleshing out the start of an article here: Jupiter. Happy if we continue to work on this and try to figure out if we can find common ground. It would be great to do some Wizard of Oz style test and try to render that by hand for some languages. The only non-PIE language I have a tiny grip on is Uzbek - I am sure we can find more competent speakers than me to try this out. --DVrandecic (WMF) (talk) 00:53, 20 April 2021 (UTC)[]

This is great. I would suggest linking to it from the main examples page for more visibility (though I understand if you want to do more work on it first). One suggestion I would have for a human-as-renderer test might be to do some replacement of lexemes with dummies in such a way that the renderer won't be influenced by their domain knowledge. e.g. instead of Superlative(subject=Jupiter, quality=large, class=planet, location constraint=Solar System), something like... Superlative(subject=Barack Obama, quality=large, class=tree, location constraint=France).
In this case, for the renderers to have a hope of getting the right answers, we would probably need to also give them some documentation of the intended semantics of some of these functions and their parameters. But I see that as a feature rather than a bug. Colin M (talk) 17:35, 20 April 2021 (UTC)[]
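(As a naive, English-only sketch in Python, where the constructor name, parameters and toy lexicon are all made up for illustration rather than being the actual notation, a renderer for the Superlative example above might start out like this:)

 SUPERLATIVES = {"large": "largest", "small": "smallest"}  # toy lexicon
 
 def render_superlative_en(subject, quality, cls, location):
     # Naive English rendering with a hard-coded article before the location.
     return f"{subject} is the {SUPERLATIVES[quality]} {cls} in the {location}."
 
 render_superlative_en("Jupiter", "large", "planet", "Solar System")
 # -> "Jupiter is the largest planet in the Solar System."
 render_superlative_en("Barack Obama", "large", "tree", "France")
 # -> "Barack Obama is the largest tree in the France."

The dummy substitution immediately exposes the hard-coded "the" before the location, which is exactly the kind of baked-in English peculiarity such a test is meant to catch.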
Yeah, I probably should link to it. I didn't because it is not complete yet, but then again, who knows when I will get to completing it? So, yeah, I will just link to it. Thanks for the suggestion. --DVrandecic (WMF) (talk) 00:10, 21 April 2021 (UTC)[]

Schemas

I am interested in Wikifunctions and I like the idea of Abstract Wikipedia. I have been creating simple structured sentences with variable parts in spreadsheets for some time. I think one opportunity of Abstract Wikipedia is that it could improve the data quality in Wikidata: if it is known what is important about a topic, and someone who understands how the text is generated notices that something is missing, then maybe they will enter the missing information into Wikidata. For that it is helpful to have clear structures, and forms can help people enter information. At the moment this is not so easy. Have you thought about how to define what content is important about a topic? As far as I know there are shape expressions in Wikidata. --Hogü-456 (talk) 20:35, 22 March 2021 (UTC)[]

@Hogü-456: Yes, shape expressions are the way to go, as far as I understand the Wikidata plans. Here's the Wikidata project page on Schemas. My understanding is that ShEx can be used both for creating forms and for checking data - I would really like to see more work on that. This could then also be used to make sure that certain conditions on an item are fulfilled, and thus we know that we can create a certain text. Fully agreed! --DVrandecic (WMF) (talk) 23:15, 19 April 2021 (UTC)[]

Info-Box (German)

Who takes care of the entries in the Info-Box? At least in the German Info-Box the items differ from the real translations. —The preceding unsigned comment was added by Wolfdietmann (talk) 09:32, 15 April 2021 (UTC)[]

@Wolfdietmann: Hi. In the navigation-box, there's an icon in the bottom corner that leads to the translation-interface (or here's a direct link). However, that page doesn't list any entries as being "untranslated" or "outdated". Perhaps you mean some of the entries are mis-translated, in which case please do help to correct them! (Past contributions are most easily seen via the history page for Template:Abstract_Wikipedia_navbox/de). I hope that helps. Quiddity (WMF) (talk) 18:33, 15 April 2021 (UTC)[]
@Quiddity (WMF): Thanks a lot. Your hint was exactly what I needed.--Wolfdietmann (talk) 09:29, 16 April 2021 (UTC)[]

Effects of Abstract Text on the Job Market

Hello,

I have thought about what could happen if abstract text works well and is used in many contexts, also outside the Wikimedia projects. What does that mean for jobs? I asked myself whether Abstract Text is an innovation that reduces the need for personnel, for example for writing technical instructions. Changes are something that happens, and they are not bad. I think it is important that there is support for employees who are affected by such a change, to make sure that they have another job afterwards. From my point of view, Wikifunctions is an important part here, because it can help enable people to learn the skills that are good to know in order to do other jobs. From my point of view, programming is something that is interesting and can help in many areas. I suggest creating a page with some recommendations for potential users of abstract text, to make sure they are aware of the changes this can bring for their employees, and that the employees get the knowledge they need. What do you think about that? And what is my responsibility as a volunteer who is interested in this project and plans to participate once it is live? Until now I have read through most of the pages here and tried to understand how it works, to make sure that I do not support increasing unemployment through optimization. --Hogü-456 (talk) 19:55, 17 April 2021 (UTC)[]

@Hogü-456: Thank you for this thoughtful comment. In a scenario where Abstract Wikipedia and abstract content are so successful as to make a noticeable dent in the market for technical and other translators, it would also create enormous amounts of value for many people by expanding potential markets and by making more knowledge available to more people. In such a world, creating and maintaining abstract content (or translating natural language texts into it) will become a new, very valuable skill that will create new job opportunities. Imagine people who create a chemistry or economics textbook in abstract content! How awesome would that be, and the potential that this could unlock!
I actually had an interview with the Tool Box Journal: A Computer Journal For Translation Professionals earlier this year (it is in issue 322, but I cannot find a link to that issue). The interesting part was that the translator who interviewed me really found this approach interesting because, unlike with machine-learned translation systems, they, as the translator, could really control the content and the presentation. He sounded rather eager and interested in what we are building, and not so much worried about it. I think he also thought that the opportunities - as outlined above - are so big, if all of this works at all. --DVrandecic (WMF) (talk) 21:06, 21 April 2021 (UTC)[]

Recent spam on https://annotation.wmcloud.org/

A few suspicious accounts have been created today, and started creating spam pages on the wiki (see the recent changes log). TomT0m (talk) 18:20, 18 April 2021 (UTC)[]

Thanks for the heads-up. Quiddity (WMF) (talk) 03:53, 21 April 2021 (UTC)[]

Regular inflection

Hello,

in the last weeks I have tried to add the inflections (Beugungen) of German nouns that consist of more than one noun as lexemes in Wikidata. In the overview of the phases I have seen that it is part of the second phase to make it possible to create regular inflections automatically. I created a template as a CSV file that helps me extract the possible component words from a longer word. In the template I extract all possible combinations with a length between 3 and 10 characters. After that I check which combinations match a download of the German noun lexemes (with their forms) that exist so far, and I check whether the match reaches the end of the word. For those words I then extract the part before the last component, together with the first character of the last component, and additionally match the existing forms from the lexemes to it. For that I have a script in R and a spreadsheet. I want to work on a script for creating the forms of a noun, starting with German, at the Wikimedia Hackathon. Has someone created something similar, or have you thought about this for other languages as well, and what the rules are in those languages?

For the Wikimedia Remote Hackathon 2021 I suggested a session that is a conversation about how to enable more people to learn coding, also with a focus on the Wikimedia projects. In the Phabricator ticket for it I added a link to the mission statement of Wikifunctions, since I think this is a project that has making functions accessible to more people as a goal. I have several ideas for functions that I think would be helpful, and I plan to publish them in Wikifunctions if I am able to write them. If you are interested, you can attend the conversation.--Hogü-456 (talk) 21:26, 12 May 2021 (UTC)[]

@Hogü-456: Thanks! This is exactly the kind of functions I hope to see in Wikifunctions. Alas, it is still a bit too early for this year's Hackathon for us to participate properly, i.e. for Wikifunctions to be a target where such functions land. But any such work will be great preparation for what we want to be able to represent in Wikifunctions. We hope to get there later this year, definitely by next year's Hackathon.
If you were to write your code in Python or JavaScript, then Wikifunctions will be soon ready to accept and execute such functions. I also hope that we will cover R at some point, but currently there is no timeline for that.
If you are looking for another resource that has already implemented inflections for German, there seem to be some code packages that do so. The one that I usually look to for inspiration is Grammatical Framework. The German dictionary is here: http://www.grammaticalframework.org/~john/rgl-browser/#!german/DictGer - you can find more through their Resource Grammar Library: http://www.grammaticalframework.org/~john/rgl-browser/
One way or the other, that's exciting, and I hope that we will be able to incorporate your work once Wikifunctions has reached the point where we can host these functions. Thank you! --DVrandecic (WMF) (talk) 00:43, 14 May 2021 (UTC)[]
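To make the compound-noun idea in this thread a little more concrete, here is a minimal sketch in Python (one of the languages mentioned above for future Wikifunctions implementations). It is illustrative only: the function names and the hard-coded sample forms for "Tür" are invented for this example, and real form data would come from the lexemes in Wikidata. The sketch simply re-attaches the non-head part of a compound to every known form of its head noun.

 # Minimal, illustrative sketch: a German compound noun inflects like its final
 # component, so its forms can be derived from the known forms of that head noun.
 
 def attach(prefix, head_form):
     # Inside a compound, the head noun loses its own initial capital letter.
     return (prefix + head_form[0].lower() + head_form[1:]) if prefix else head_form
 
 def compound_forms(compound, head, head_forms):
     """Derive the inflected forms of a compound from the forms of its head noun."""
     if not compound.lower().endswith(head.lower()):
         raise ValueError("head must be the final component of the compound")
     prefix = compound[: len(compound) - len(head)]
     return {features: attach(prefix, form) for features, form in head_forms.items()}
 
 # Hypothetical sample data for the head noun "Tür" (door); real data would come
 # from Wikidata lexeme forms.
 tuer_forms = {
     ("nominative", "singular"): "Tür",
     ("genitive", "singular"): "Tür",
     ("nominative", "plural"): "Türen",
     ("dative", "plural"): "Türen",
 }
 
 print(compound_forms("Haustür", "Tür", tuer_forms))
 # e.g. {('nominative', 'plural'): 'Haustüren', ...}

Real German morphology of course needs more than this (linking elements such as -s- or -n-, umlaut plurals, and so on), so this only covers the regular, head-driven case described above.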

Idea: Fact-id for each fact?

Can each fact statement get its own id? That is, could we make a Wiki of Facts along with Abstract Wikipedia?

  • Each fact should get its own fact-id, so that people can share the id to support claims made in discussions elsewhere. Similar to the Z-ids or Q-ids of Wikidata, this proposal requests a fact-id for each statement or fact.
  • It will create a structured facts wiki, in which each page about a topic will present a bulleted list of facts, each with its own id. References are added to support the claims.
  • It presents facts directly, without hiding them in verbal prose. Cut to the chase!
  • Example 1: Each fact statement of a list like Abstract Wikipedia/Examples/Jupiter will get its own id. Example 2: A page will list facts like w:List of common misconceptions with each fact in it getting its own id.
  • This wiki will become the go-to site to learn, find, link, support and verify facts/statements/claims. And over time, Wiki of Facts will have more reliability and credibility than any other format of knowledge. (elaborated at WikiFacts) -Vis M (talk) 04:54, 27 May 2021 (UTC)[]
@Vis M:I see you have withdrawn the WikiFacts proposal. The Abstract Wikipedia Wiki of Facts would be Wikidata, with each Wikidata statement or claim corresponding to a "fact". There are occasions when an explicit statement ID would be helpful in Wikidata, notably when a different Property is used to represent the same information. However, the Item/Property combination used in Wikidata is aligned to the classical subject—predicate form, which is probably a better starting point for natural-language representations. In particular, it is likely that similar predicates (represented using identical Property IDs) would be rendered in similar language for identifiable classes of subject. Allocating individual subjects (Wikidata Items) to appropriate classes for this purpose is also likely to be achieved within Wikidata, either with an additional Property or within lexical data (according to the factors driving the different linguistic representations).--GrounderUK (talk) 09:17, 19 June 2021 (UTC)[]
Ok, thanks! Vis M (talk) 14:46, 22 June 2021 (UTC)[]
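As a small aside to the discussion above: Wikidata's API already exposes a per-statement identifier (a GUID on every claim), which comes fairly close to the proposed fact-id. The sketch below is illustrative only; it assumes Q319 is the item for Jupiter (matching Example 1 above) and uses the standard wbgetclaims module to print each statement's identifier.

 # Illustrative sketch: list the per-statement identifiers that Wikidata
 # already assigns to every claim on an item.
 import requests
 
 response = requests.get(
     "https://www.wikidata.org/w/api.php",
     params={"action": "wbgetclaims", "entity": "Q319", "format": "json"},
 )
 for prop, statements in response.json()["claims"].items():
     for statement in statements:
         print(prop, statement["id"])  # e.g. P31 Q319$...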

Wikifunctions

What will be the URL for the upcoming Wikifunctions website? When is it expected to be completed? 54nd60x (talk) 12:55, 17 August 2021 (UTC)[]

@54nd60x: The URL will be wikifunctions.org (similar to wikidata.org).
Timeline: Overall, "we'll launch when we're ready". We're hoping to have a version on the mw:Beta Cluster sometime during the next few months, and hoping the "production" version will be ready for initial launch sometime early in the next calendar year. It won't be "feature-complete" at launch, but it will be: stable for adding the initial functions content; locally usable for writing and running functions; and ready for further development. The design and feature-set will steadily grow and adapt based on planned work and on feedback, over the coming years. Broader details on the development are at Abstract Wikipedia/Phases, with finer details at the links within. I hope that helps! Quiddity (WMF) (talk) 19:51, 18 August 2021 (UTC)[]

Decompilation

I am still working on spreadsheet functions, trying to understand them and to write functions that bring them into another programming language. In the last days I have thought about how far it is possible to get at the specific program that is generated for the input I give by entering functions in a spreadsheet, so that the output is generated and printed in a cell. I would like to have that program as byte code or assembler, or in another notation from which it can be brought into other programming languages automatically. Do you think it is possible to turn spreadsheet functions into a program in another programming language by getting the binary code and then trying to decompile it? I do not know much about programming, so I do not know whether this is possible. Does someone of you have experience with decompiling, and how is it possible to get the byte code that is executed when I enter my own composition of functions in my spreadsheet program?--Hogü-456 (talk) 18:01, 8 July 2021 (UTC)[]

@Hogü-456: There is a lot of "it depends" in this answer :) If you search the Web for "turn spreadsheet into code" you can find plenty of tools and descriptions helping with that. I have my doubts about these, to be honest. I think, turning them into binary code or assembler would probably be an unnecessarily difficult path.
My suggestion would be to turn each individual formula into a function, and then wire the inputs and outputs of the functions together according to the cells. That's probably what these tools do. But I would try to stay at least at the abstraction level of the spreadsheet, and not dive down into the binary.
It will be interesting to see how this will play together with Wikifunctions. I had been thinking about using functions from Wikifunctions in a spreadsheet - but not the other way around, using a spreadsheet to implement functions in Wikifunctions. That's an interesting idea that could potentially open up a path for some people to contribute implementations.
I filed a task to keep your idea in the tracker! --DVrandecic (WMF) (talk) 21:13, 3 September 2021 (UTC)[]
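As a rough illustration of the formula-to-function approach suggested above, here is a minimal sketch in Python. The cell names and formulas are invented for this example; the point is only that each formula becomes a small function and the cells are wired together by evaluating the cells they refer to first, staying at the abstraction level of the spreadsheet rather than going down to byte code.

 # Illustrative sketch: each spreadsheet formula is represented as a small
 # function, and cells are wired together by evaluating their dependencies.
 
 sheet = {
     "A1": lambda get: 3,                      # constant cell
     "A2": lambda get: 4,                      # constant cell
     "B1": lambda get: get("A1") + get("A2"),  # corresponds to =A1+A2
     "C1": lambda get: get("B1") * 2,          # corresponds to =B1*2
 }
 
 def evaluate(cell, sheet, cache=None):
     """Evaluate a cell by recursively evaluating the cells it depends on."""
     if cache is None:
         cache = {}
     if cell not in cache:
         cache[cell] = sheet[cell](lambda name: evaluate(name, sheet, cache))
     return cache[cell]
 
 print(evaluate("C1", sheet))  # prints 14

A real converter would also need to parse the spreadsheet's formula syntax and handle ranges, but the wiring idea stays the same.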

Logo of Wikifunctions and License

Hello, when will the logo for Wikifunctions be finalized and published? There was a vote on the favourite logo, and since then I have not heard anything about it. Something I am also interested in is what the license of the published functions will be. Do you know which license will be used for the content in Wikifunctions? Have you talked with the legal team of the Wikimedia Foundation about what they think of it? Please try to give an answer to these questions, since it has now been a while since the discussions about these topics.--Hogü-456 (talk) 19:49, 23 August 2021 (UTC)[]

@Hogü-456: We are working on both, sorry for the delay! We had some meetings with the legal team, and we are still working on the possible options regarding the license. That is definitely a conversation or decision (depending on how the discussion with legal progresses) we still owe to the community. This will likely take two or three months before we get to the next public step.
Regarding the logo, we ran into some complications. It is currently moving forward, but it will still take a few weeks (not months; this should be ready sooner).
Thanks for inquiring! We'll keep you posted on both as soon as we have updates! --DVrandecic (WMF) (talk) 20:45, 3 September 2021 (UTC)[]
Return to "Abstract Wikipedia/Archive 3" page.