Talk:Abstract Wikipedia/Archive 3
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Logo
I don't know if a potential logo has been discussed, but I propose the Wikipedia Logo with Pictograms replacing the various characters from different languages. I might mock one up if I feel like it. If you have any ideas, or want to inform me there already is a logo, please ping me below. Cheers, WikiMacaroons (talk) 18:00, 26 July 2020 (UTC)
- The logo discussion has not started yet. We're going to have the naming discussion first, and once the name is decided, the logo will follow up. Feel free to start a subpage for the logo, maybe here if you want, to collect ideas. --DVrandecic (WMF) (talk) 01:46, 5 August 2020 (UTC)
- Thanks, DVrandecic (WMF), perhaps I shall :) WikiMacaroons (talk) 21:57, 7 August 2020 (UTC)
- Note that the existing logo of Wikimedia Commons already comes close to the underlying concept of composition/fusion generating a result that can be borrowed and imported into other projects (not just Wikipedia, much like the wiki of functions).
- We could derive the logo of the future wiki of functions from the logo for Wikidata (barcodes) placed at the top, with arrows pointing to a central point of processing, and then arrows going out from it to the set of Wikimedia projects (placed at the bottom, represented by the tri-sector ring, reduced to a half circle). If you look at the logo for Meta alongside, the top red segment of the ring would be the Wikidata barcode, the two blue segments would be kept, and the green Earth would be replaced by a central green point; the arrows would be drawn inside the ring, with the same colors as the segments they are connected to. I would avoid any logo that represents a mathematical concept (such as the lambda letter, unless it is part of the chosen name, or parentheses representing classic maths functions): the functions in the new wiki will not really be mathematical functions, as they are polymorphic, and don't necessarily return a result but could return errors, or a "variant type" representing either actual values or errors. verdy_p (talk) 09:36, 29 October 2020 (UTC)
- Note that the formal request is still not done, but a draft page allows you to prepare suggestions. This draft.Logo subpage needs to be updated before it is actually translated: I added a first proposal there. verdy_p (talk) 09:20, 17 November 2020 (UTC)
Some questions and concerns
Depending on the eventual implementation details, Abstract Wikipedia may need support from the WMF Search Platform team. So a couple of us from our team had a conversation with Denny about Abstract Wikipedia, and we had some questions and concerns. Denny suggested sharing them in a more public forum, so here we are! I've added some sub-headings to give it a little more structure for this forum. In no particular order...
Grammar/renderer development vs writing articles, and who will do the work
Each language renderer, in order to be reasonably complete, could be a significant fraction of the work required to just translate or write the "core" articles in that language, and the work can only be done by a much smaller group of people (linguists or grammarians of the language, not just any speaker). With a "medium-sized" language like Guarani (~5M speakers, but only ~30 active users on the Guarani Wikipedia in the last month), it could be hard to find interested people with the technical and linguistic skills necessary. Though, as Denny pointed out, perhaps graduate students in relevant fields would be interested in developing a grammar—though I'm still worried about how large the pool of people with the requisite skills is.
- I share this concern, but I think we need to think of it as a two-edged sword. Maybe people will be more interested in contributing to both grammar and content than to either one alone. I certainly hope that this distinction will become very blurred; our goal is content and our interest in grammar is primarily to support the instantiation of content in a particular natural language (which, you know, is what people actually do all the time). We need to downplay the "technical and linguistic skills" by focusing on what works. People love fixing bad grammar or poor word choices (it's a parent–child thing), so perhaps there are two separate challenges here: how to get a rendering process that produces intelligible content versus one that produces correctly phrased content. Native speakers will certainly have a key role in identifying incorrectly phrased content; they may even be tempted to fix a problem they understand. They won't necessarily be existing editors, however; they may even have been intrigued by the slightly quirky Wikipedia idiolect they've been hearing about! Ultimately, though, their community will need the maturity to allow the more difficult linguistic problems they identify to percolate upwards or outwards to more specialist contributors, and these may be non-native grammarians, programmers, contributors of any stripe.
- What about the initial "intelligible" renderers? We need to explore this as we develop language-neutral content. We will see certain very common content frameworks, which should already be apparent from Wikidata properties. So we will be asking whether there is a general (default) way to express an instance of (P31), for example, and how it varies according to the predicate and (somewhat more problematically) the subject. We will also observe how certain Wikidata statements are linguistically subordinate (being implied or assumed). So, <person> is (notable for) <role in activity>, rather than instance of (P31) human (Q5)... To the extent that such observations are somewhat universal, they serve as a useful foundation for each successive renderer: how does the new language follow the rules and exceptions derived for previous languages (specifically for the language-neutral content so far developed; we never need a complete grammar of any language except, perhaps, the language-neutral synthetic language that inevitably emerges as we proceed).
- Who will do the work? Anyone and everyone! Involvement from native speakers would be a pre-requisite for developing any new renderer, but the native speakers will be supported by an enthusiastic band of experienced linguistic problem-solvers, who will (by then) have already contributed to the limited success of several renderers for an increasing quantity of high-quality language-neutral content. --GrounderUK (talk) 12:59, 25 August 2020 (UTC)
- @GrounderUK: "People love fixing bad grammar or poor word choices (it's a parent–child thing), so perhaps there are two separate challenges here: how to get a renderer that produces intelligible content versus one that produces correctly phrased content. Native speakers will certainly have a key role in identifying incorrectly phrased content; they may even be tempted to fix a problem they understand. They won't necessarily be existing editors, however; they may even have been intrigued by the slightly quirky Wikipedia idiolect they've been hearing about!" This is a great notion, and hopefully it will be one the project eventually benefits from. --Chris.Cooley (talk) 00:04, 29 August 2020 (UTC)
- @Chris.Cooley: Thanks, Chris, one can but hope! Denny's crowdsourcing ideas (below) are a promising start. Just for the record, your quoting me prompted me to tweak the original; I've changed "a renderer" to "a rendering process" (in my original but not in your quoting of it).
- @GrounderUK: "People love fixing bad grammar or poor word choices (it's a parent–child thing), so perhaps there are two separate challenges here: how to get a renderer that produces intelligible content versus one that produces correctly phrased content. Native speakers will certainly have a key role in identifying incorrectly phrased content; they may even be tempted to fix a problem they understand. They won't necessarily be existing editors, however; they may even have been intrigued by the slightly quirky Wikipedia idiolect they've been hearing about!" This is a great notion, and hopefully it will be one the project eventually benefits from. --Chris.Cooley (talk) 00:04, 29 August 2020 (UTC)
- @TJones (WMF): One workflow we have been brainstorming was to:
- crowdsource the necessary lexical knowledge, which can be done without particular expertise beyond language knowledge
- crowdsource what sentences for specific simple constructors would look like (from, e.g., bilingual speakers, who are shown simple sentences in another language, e.g. Spanish, and then asked to write them down in Guarani)
- then even non-Guarani speakers could try to actually build renderers, using the language input as test sentences
- now verify the renderers again with Guarani speakers for more examples, and gather feedback
- It would be crucial that the end result would allow Guarani speakers to easily mark up issues (as GrounderUK points out), so that these can be addressed. But this is one possible workflow that would remove the need to have deep linguistic and coding expertise available in each language community, and it can spread the workload in a way that could help with filling that gap.
- This workflow would be built on top of Abstract Wikipedia and would not require many changes to the core work. Does this sound reasonable or entirely crazy?
- One even crazier idea, but that's well beyond what I hope for, is that we will find out that the Renderers across languages are in many areas rather uniform, that there is only a small number of them, and that we can actually share a lot of the renderer code across languages, with languages basically defined through a small number of parameters. There are some linguists who believe such things possible. But I don't dare bet on it. --DVrandecic (WMF) (talk) 21:50, 28 August 2020 (UTC)
The "Grammatical Framework" grammatical framework
Denny mentioned Grammatical Framework and I took a look at it. I think it is complex enough to represent most grammatical phenomena, but I don’t see very much that is actually developed. The examples and downloadable samples I see are all small toy systems. It isn’t 100% clear to me that any grammatical framework can actually capture all grammatical phenomena—and with certain kinds of edge cases and variation in dialects, it may be a lost cause—and linguists still argue over the right way to represent phenomena in major languages in major grammatical frameworks. Anyway, it looks like the Grammatical Framework still leaves a lot of grammar development to be done; assuming it’s possible (which I’m willing to assume), it doesn’t seem easy or quick, especially as we get away from major world languages.
- Yes. I keep meaning to take another look but, basically, GF itself is not something I would expect many people to enjoy using for very long (I may become an exception) ...and "very long" certainly seems to be required. I'm usually pretty open-minded when it comes to possible solutions but I've not added GF to my list of things that might work. That's not to say that there can be no place for it in our solution landscape; it's just that we do need to keep focusing on the user experience, especially for the user who knows the under-served language but has no programming experience and no formal training in grammar. I can hardly remember what that feels like (ignoring the fact that my first language is hardly under-served)! Apart from committing to engaging with such stakeholders, it's not clear what more we can usefully do, at this stage, when it comes to evaluating alternative solutions. That said, I am 99.9% sure that we can find a somewhat satisfactory solution for a lot of encyclopedic content in a large number of languages; the principal constraint will always be the availability of willing and linguistically competent contributor communities.
- One thing GF has going for it is that it is intentionally multilingual. As I understand it, our current plan is to begin natural-language generation with English renderers. I'm hoping we'll change our minds about that. Sooner rather than later, in any event, I would hope that renderer development would be trilingual (as a minimum). Some proportion of renderer functions may be monolingual, but I would like to see those limited to the more idiosyncratic aspects of the language. Or, perhaps, if there are good enough reasons to develop a function with only a single language in mind, we should also consider developing "equivalent provision" in our other current languages. What that means in practice is that the test inputs into our monolingual functions must also produce satisfactory results for the other languages, whether by using pre-existing functions or functions developed concurrently (more or less). --GrounderUK (talk) 13:02, 26 August 2020 (UTC)
- I agree, we shouldn't start development based on a single language, and particularly not English. Starting in three languages, ideally across at least two main families, sounds like a good idea. --DVrandecic (WMF) (talk) 21:55, 28 August 2020 (UTC)
- @DVrandecic (WMF): That's very good to hear! Is that an actual change of our collective minds, or just your own second thoughts? P2.7 doesn't seem to have a flexible interpretation. In my reading of P2.8 to P2.10, there also appears to be a dependency on P2.7 being fairly advanced, but maybe that's just my over-interpretation.--GrounderUK (talk) 07:52, 1 September 2020 (UTC)
- It is true that a lot of development would be required for GF, but even more development would be required for any other framework. GF being open source, any development would flow back into the general public, too, so working with GF and its developers is likely the correct solution.
- The GF community is interested in helping out. See https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/eEyLYmfmCQAJ , where Aarne Ranta suggests a case study for some area with adequate data, like mathematics. —The preceding unsigned comment was added by Inariksit (talk) 18:44, 28 August 2020
- I am a big fan of reusing as much of GF as possible, but my reference to GF was not to mean that we necessarily have to use it as is, but rather that it shows that such a project is at all possible. We should feel free to either follow GF, or to divert from it, or to use it as an inspiration - but what it does show is that the task we are aiming for has been achieved by others in some form.
- Having said that, I am very happy to see Aarne react positively by the idea! That might be the beginning of a beautiful collaboration. --DVrandecic (WMF) (talk) 21:54, 28 August 2020 (UTC)
Incompatible grammatical frameworks across language documentation
I worry that certain grammatical traditions associated with given languages may make it difficult to work compatibly across languages. A lot of Western European languages have traditionally followed the grammatical model of Latin, even when it doesn’t make sense—though there are of course many grammatical frameworks for the major languages. But it’s easy to imagine that the best grammar for a given medium-sized language was written by an old-fashioned European grammarian, based on a popular grammatical model from the 1970s. Reconciling that with whatever framework has been developed up until that point may create a mess.
- Speaking as an old-fashioned European grammarian... "a mess" is inevitable! It should be as messy as necessary but no messier (well, not much messier). I'm not sure that there's much "reconciling" involved, however. Given our framing of the problem, I don't see how "our mess" can be anything other than "interlingual" (as Grammatical Framework is). This is why I would prefer not to start with English; the first few languages will (inevitably?) colour and constrain our interlingua. So we need to be very careful, here. To set against that is our existing language-neutral content in Wikidata. Others must judge whether Wikidata is already "too European", but we must take care that we do not begin by constructing a "more English" representation of Wikidata content, or coerce it into a "more interlingual" representation, where the interlingua is linguistically more Eurocentric than global. So, first we must act to counter first-mover advantage and pre-existing bias, which means making things harder for ourselves, initially. At the same time, all language communities can be involved in refining our evolving language-neutral content, which will be multi-lingually labelized (like Wikidata). If some labelized content seems alien in some language, this can be flagged at an early stage (beginning now, for Wikidata). What this means is that all supported languages can already be reconciled, to an extent, with our foundational interlingua (Wikidata), and any extensions we come up with can also be viewed through our multi-lingual labelization. I suppose this is a primitive version of the "intelligible" content I mentioned earlier. When it comes to adding a further language (one that we currently support with labelization in Wikidata), we may hope to find that we are already somewhat reconciled, because linguistic choices have already been made in the labelization and our new target consumers can say what they dislike and what they might prefer; they do not need to consult some dusty grammar tome. In practice, they will already have given their opinions because that is how we will know that they are ready to get started (that is, theirs is a "willing and linguistically competent" community). In the end, though, we have to accept that the previous interlingual consensus will not always work and cannot always be fixed. This is when we fall back on the "interlingual fork" (sounds painful!). That just means adding an alternative language-neutral representation of the same (type of) encyclopedic content. I say "just" even though it might require rather more effort than I imagine (trust me, it will!) because it does seem temptingly straightforward. I say we must resist the temptation, but not stubbornly; a fallback is a tactical withdrawal, not a defeat; it is messy, but not too messy.--GrounderUK (talk) 12:54, 27 August 2020 (UTC)
- Agree with GrounderUK here - I guess that we will have some successes and some dead ends during the implementation of certain languages, and that this is unavoidable. And that might partially stem from the state of the art in linguistic research in some languages.
- On the other hand, my understanding is that we won't be aiming for 7000 languages but for 400, and that these 400 languages will in general be better described and have more current research work about them than the other 6600. So I have some hope that for most of the languages we are interested in, we do have more current and modern approaches that are more respectful of the language itself.
- And finally, the lens of the 1970s European grammarians will probably not matter that much in the end anyway - if we have access to a large enough number of contributors native in the given language. That should be a much more important voice in shaping the renderers of a language than dated grammar books. --DVrandecic (WMF) (talk) 22:01, 28 August 2020 (UTC)
Introducing new features into the framework needed by a newly-added language
In one of Denny's papers (PDF) on Abstract Wikipedia, he discusses (Section “5 Particular challenges”, p.8) how different languages make different distinctions that complicate translation—in particular needing to know whether an uncle is maternal or paternal (in Uzbek), and whether or not a river flows to the ocean (in French). I am concerned about some of the implications of this kind of thing.
Recoding previously known information with new features
One problem I see is that when you add a new language with an unaccounted-for feature, you may have to go back and recode large swathes of the information in the Abstract Wikipedia, both at the fact level, and possibly at some level of the common renderer infrastructure.
Suppose we didn’t know about the Uzbek uncle situation and we add Uzbek. Suddenly, we have to code maternal/paternal lineage on every instance of uncle everywhere. Finding volunteers to do the new uncle encoding seems like it could be difficult. In some cases the info will be unknown and either require research, or it is simply unknowable. In other cases, if it hasn’t been encoded yet, you could get syntactically correct but semantically ill-formed constructions.
- @TJones (WMF): Yes, that is correct, but note that going back and recoding won't actually have an effect on the existing text.
- So assume we have an abstract corpus that uses the "uncle" construct, and that renders fine in all languages supported at that time. Now we add Uzbek, and we need to refine the "uncle" construct into either "maternal-uncle" or "paternal-uncle" in order to render appropriately in Uzbek - but both these constructs would be (basically) implemented as "use the previous uncle construct unless Uzbek". So all existing text in all supported languages would continue to be fine.
- Now when we render Uzbek, though, the corpus needs to be retrofitted. But that merely blocks the parts of the Uzbek renderings that deal with the construct "uncle". It has no impact on other languages. And since Uzbek didn't have any text so far (that's why we are discovering this issue now), it also won't reduce the amount of Uzbek generated text.
- So, yes, you are completely right, but we still have a monotonically growing generated text corpus. And as we backfill the corpus, more and more of the text also becomes available in Uzbek - but there would be no losses on the way. --DVrandecic (WMF) (talk) 22:09, 28 August 2020 (UTC)
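For illustration, here is a minimal Python sketch of that refinement strategy (the constructor names, renderer registry, and Uzbek wording are all hypothetical): the more specific constructors fall back to the plain "uncle" rendering in every language that does not need the distinction, so existing renderings stay unchanged while Uzbek can signal what it still needs.

```python
# Hypothetical sketch: refining the "uncle" constructor without breaking
# existing renderings. Renderers are per-language functions in a registry.

RENDERERS = {}

class MissingInformation(Exception):
    """Raised when a language needs a distinction the content does not make."""

def renders(construct, language):
    def register(fn):
        RENDERERS[(construct, language)] = fn
        return fn
    return register

def render(content, language):
    """Render abstract content, falling back to the constructor it refines."""
    key = (content["construct"], language)
    if key in RENDERERS:
        return RENDERERS[key](content)
    parent = content.get("refines")          # e.g. "maternal-uncle" refines "uncle"
    if parent:
        return render({**content, "construct": parent, "refines": None}, language)
    raise MissingInformation(f"{content['construct']} cannot be rendered in {language}")

@renders("uncle", "en")
def uncle_en(content):
    return f"{content['person']}'s uncle {content['uncle']}"

@renders("maternal-uncle", "uz")
def maternal_uncle_uz(content):
    return f"{content['person']}ning tog'asi {content['uncle']}"   # wording illustrative only

# English has no "maternal-uncle" renderer, so it falls back to "uncle";
# unrefined "uncle" content rendered in Uzbek raises MissingInformation instead.
print(render({"construct": "maternal-uncle", "refines": "uncle",
              "person": "John", "uncle": "Peter"}, "en"))
```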
Semantically ill-formed defaults
For example, suppose John is famous enough to be in Abstract Wikipedia. It is known that John’s mother is Mary, and that Mary’s brother is Peter, hence Peter is John’s uncle. However, the connection from John to Peter isn’t specifically encoded yet, and we assume patrilineal links by default. We could then generate a sentence like “John’s paternal uncle Peter (Mary’s brother) gave John $100.” Alternatively, if you try to compute some of these values, you are building an inference engine and a) you don’t want to do that, b) you really don’t want to accidentally do it by building ad hoc rules or bots or whatever, c) it’s a huge undertaking, and d) did I mention that you don’t want to do that?
- Agreed. --DVrandecic (WMF) (talk) 22:10, 28 August 2020 (UTC)
Incompatible cultural "facts"
In the river example, I can imagine further complications because of cultural facts, which may require language-specific facts to be associated with entities. Romanian makes the same distinction as French rivière/fleuve with râu/fluviu. Suppose, for example, that River A has two tributaries, River B and River C. For historical reasons, the French think of all three as separate rivers, while the Romanians consider A and B to be the same river, with C as tributary. It’s more than just overlapping labels—which is complex enough. In French, B is a rivière because it flows into A, which is a fleuve. In Romanian, A and B are the same thing, and hence a fluviu. So, a town on the banks of River B is situated on a rivière in French, and a fluviu in Romanian, even though the languages make the same distinctions, because they carve up the entity space differently.
- Yes, that is an interesting case. Similar situations happen with the usage of articles in names of countries and cities and whether you refer to a region as a country, a state, a nation, etc., which may differ from language to language. For these fun examples I could imagine that we have information in Wikidata connecting the Q ID for A and B to the respective L items. But that is a bit more complex indeed.
- Fortunately, these cultural differences happen particularly on topics that are important for a given language community. Which increases the chance that the given language community has already an article about this topic in its own language, and thus will not depend on Abstract Wikipedia to provide baseline content. That's the nice thing: we are not planning to replace the existing content, but only to fill in the currently existing gaps. And these gaps will, more likely than not, cover topics that will not have these strong linguistic-cultural interplays. --DVrandecic (WMF) (talk) 22:15, 28 August 2020 (UTC)
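A rough Python sketch of how such per-language links from items to lexemes might be used (the entity names, lexeme choices, and glossed output are invented for illustration); the point is only that the word choice, and even the entity boundaries, are looked up per language rather than derived from a single universal property.

```python
# Hypothetical sketch: per-language lexicalisation of entities, so the same
# rivers can be carved up differently in French and Romanian.

LEXICALISATION = {
    # (entity, language) -> (lexeme, entity the lexeme actually refers to)
    ("RiverA", "fr"): ("fleuve",  "RiverA"),
    ("RiverB", "fr"): ("rivière", "RiverB"),
    ("RiverA", "ro"): ("fluviu",  "RiverAB"),  # Romanian treats A and B as one river
    ("RiverB", "ro"): ("fluviu",  "RiverAB"),
}

def describe_town(town, river, language):
    lexeme, entity = LEXICALISATION[(river, language)]
    # glossed in English for readability; a real renderer would of course
    # build the whole sentence in the target language
    return f"{town} lies on the {lexeme} {entity}"

print(describe_town("TownX", "RiverB", "fr"))  # TownX lies on the rivière RiverB
print(describe_town("TownX", "RiverB", "ro"))  # TownX lies on the fluviu RiverAB
```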
Having to code information you don't really understand
In both of the uncle and river cases, an English speaker trying to encode information is either going to be creating holes in the information store, or they are going to be asked to specify information they may not understand or have ready access to.
A far-out case that we probably won’t actually have to deal with is still illustrative. The Australian Aboriginal language Guugu Yimithirr doesn’t have relative directions like left and right. Instead, everything is in absolute geographic terms; i.e., “there is an ant on your northwest knee”, or “she is standing to the east-northeast of you.” In addition to requiring all sorts of additional coding of information (“In this image, Fred is standing to the left/south-southwest of Jane”) depending on the implementation details of the encoding and the renderers, it may require changing how information is passed along in various low-level rendering functions. Obviously, it makes sense to make data/annotation pipelines as flexible as possible. (And again, the necessary information may not be known because it isn’t culturally relevant—e.g., in an old photo from the 1928 Olympics, does anyone know which way North is?)
- @TJones (WMF): Yes, indeed, we have to be able to deal with holes. My assumption is that when a contributor creates a first draft of an abstract article, their main concern will be the languages they speak. The UI may nudge them to fill in further holes, but it shouldn't require it. And the contributor can save the content now and reap benefit for all languages where the abstract content has all necessary information.
- Now if there is a language that requires some information that is not available in the abstract content, where there is such a hole, then the rendering of this sentence for this language will fail.
- We should have workflows that find all holes for a given language and then contributors can go through those and try to fill them, thereby increasing the amount of content that gets rendered in their languages - something that might be amenable to micro-contributions (or not, depending on the case). But all of this will be a gradual, iterative process. --DVrandecic (WMF) (talk) 22:21, 28 August 2020 (UTC)
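A rough Python sketch of such a hole-finding workflow (the content model, slot names, and required_slots function are all invented here): try every piece of abstract content against the requirements of one language and collect the missing slots, so they can be surfaced as small, concrete tasks.

```python
# Hypothetical sketch of a "find the holes for language X" workflow.
# required_slots() stands in for whatever the real content model provides.

from collections import Counter

def find_holes(abstract_articles, language, required_slots):
    """Return a count of missing slots per constructor, plus a task list."""
    holes = Counter()
    tasks = []
    for article in abstract_articles:
        for content in article["contents"]:
            needed = required_slots(content["construct"], language)
            missing = [slot for slot in needed if slot not in content]
            if missing:
                holes.update((content["construct"], slot) for slot in missing)
                tasks.append({"article": article["title"],
                              "construct": content["construct"],
                              "missing": missing})
    return holes, tasks

# Example: Uzbek needs to know whether an uncle is maternal or paternal.
def required_slots(construct, language):
    if construct == "uncle" and language == "uz":
        return ["person", "uncle", "lineage"]
    return ["person", "uncle"] if construct == "uncle" else []

articles = [{"title": "John Smith",
             "contents": [{"construct": "uncle", "person": "John", "uncle": "Peter"}]}]
holes, tasks = find_holes(articles, "uz", required_slots)
print(tasks)   # -> the "lineage" slot is missing and can be offered as a task
```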
Impact on grammar and renderers
Other examples that are more likely but less dramatic are clusivity, evidentiality, and ergativity. If these features also require agreement in verbs, adjectives, pronouns, etc., the relevant features will have to be passed along to all the right places in the sentence being generated. Some language features seem pretty crazy if you don't know those languages—like Salishan nounlessness and Guarani noun tenses—and may make it necessary to radically rethink the organization of information in statements and how information is transmitted through renderers.
- Yes, that is something where I look to Grammatical Framework for inspiration, as it solves these cases pretty neatly by having an abstract grammar and several concrete grammars, passing information from one to the other while still allowing the different flows of agreement. --DVrandecic (WMF) (talk) 22:22, 28 August 2020 (UTC)
Grammatical concepts that are similar, but not the same
I’m also concerned about grammatical concepts that are similar across languages but differ in the details. English and French grammatical concepts often map to each other more-or-less, but the details where they disagree cause consistent errors in non-native speakers. For example, French says “Je travaille ici depuis trois ans” in the present tense, while English uses (arguably illogically, but that’s language!) the perfect progressive “I have been working here for three years”. Learners in both directions tend to do direct translations because those particular tenses and aspects usually line up.
Depending on implementation, I can see it being either very complex to represent this kind of thing or the representation needing to be language-specific (which could be a disaster—or at least a big mess). Neither a monolingual English speaker nor a monolingual French speaker will be able to tag the tense and aspect of this information in a way that allows it to be rendered correctly in both languages. Similarly, the subjunctive in Spanish and French do not cover the same use cases, and as an English speaker who has studied both I still have approximately zero chance of reliably encoding either correctly, much less both at the same time—though it’s unclear, unlike uncle, where such information should be encoded. If it’s all in the renderers, maybe it’ll be okay—but it seems that some information will have to be encoded at the statement level.
- The hope would be that the exact tense and mood would indeed not be encoded in the abstract representation at all, but added to it by the individual concrete renderers. So the abstract representation of the example might be something like works_at(1stPSg, here, years(3)), and it would be up to the renderers to either render that using the present or the perfect progressive. The abstract representation would need to abstract from the language-specificities. These high-level abstract representations would probably break down first into lower level concrete representations, such as French clause(1stPSg, work, here, depuis(years(3)), present tense, positive) or English clause(1stPSg, work, here, for(years(3)), perfect progressive, positive), so that we have several layers from the abstract representation slowly working down its way to a string with text in a given language. --DVrandecic (WMF) (talk) 22:30, 28 August 2020 (UTC)
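To make that layering concrete, here is a toy Python sketch (the function names and intermediate clause format are invented): the abstract representation carries no tense, and each language's pipeline adds tense, preposition, and wording on its way down to a string.

```python
# Hypothetical sketch of the layering described above: the abstract content
# carries no tense or mood; each language's renderer chooses them on the way
# down from abstract representation to concrete clause to string.

def works_at(subject, place, duration_years):
    # language-neutral, e.g. works_at(1stPSg, here, years(3))
    return {"construct": "works_at", "subject": subject,
            "place": place, "years": duration_years}

def to_english_clause(abstract):
    # English chooses the perfect progressive and the preposition "for"
    return {"subject": abstract["subject"], "verb": "work",
            "place": abstract["place"], "tense": "perfect progressive",
            "duration": f"for {abstract['years']} years"}

def to_french_clause(abstract):
    # French chooses the present tense, the preposition "depuis",
    # and lexicalises "here" as "ici"
    place = {"here": "ici"}.get(abstract["place"], abstract["place"])
    return {"subject": abstract["subject"], "verb": "travailler",
            "place": place, "tense": "present",
            "duration": f"depuis {abstract['years']} ans"}

def english_string(clause):
    # toy final step: only handles a first-person singular subject
    assert clause["tense"] == "perfect progressive"
    return f"I have been working {clause['place']} {clause['duration']}."

def french_string(clause):
    assert clause["tense"] == "present"
    return f"Je travaille {clause['place']} {clause['duration']}."

abstract = works_at("1stPSg", "here", 3)
print(english_string(to_english_clause(abstract)))  # I have been working here for 3 years.
print(french_string(to_french_clause(abstract)))    # Je travaille ici depuis 3 ans.
```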
“Untranslatable” concepts and fluent rendering
Another random thought I had concerns “untranslatable” concepts, one of the most famous being Portuguese saudade. I don’t think anything is actually untranslatable, but saudade carries a lot more inherent cultural associations than, say, a fairly concrete concept like Swedish mångata. Which sense/translation of saudade is best to use in English—nostalgia, melancholy, longing, etc.—is not something a random Portuguese speaker is going to be able to encode when they try to represent saudade. On the other hand, if saudade is not in the Abstract Wikipedia lexicon, Portuguese speakers may wonder why; it’s not super common, but it’s not super rare, either—they use it on Portuguese Wikipedia in the article about Facebook, for example.
Another cultural consideration is when to translate/paraphrase a concept and when to link it (or both)—saudade, rickroll, lolcats. Dealing with that seems far off, but still complicated, since the answer is language-specific, and may also depend on whether a link target exists in the language (either in Abstract Wikipedia or in the language’s “regular” Wikipedia).
- Yes! And that's exactly how we deal with it! So the saudade construct might be represented as a single word in Portuguese, but as a paraphrase in other languages. There is no requirement that each language has to have each clause built of components of the same kind. The same thing would allow us to handle some sentence qualifiers as adjectives or adverbs if a language can do that (say, if there is an adjective stating that something happened yesterday evening), and use a temporal phrase otherwise. The abstract content can break apart in very different concrete renderers. --DVrandecic (WMF) (talk) 22:35, 28 August 2020 (UTC)
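A small illustrative Python sketch of that flexibility (the lexicalisation and paraphrase tables are invented): the same abstract concept renders as a single word where a language has one, and as a paraphrase where it does not.

```python
# Hypothetical sketch: the "saudade" concept renders as one word in
# Portuguese and as a paraphrase in languages that lack a direct equivalent.

LEXICALISATIONS = {
    ("saudade", "pt"): "saudade",
    # no single English word, so English falls back to a paraphrase
}

PARAPHRASES = {
    ("saudade", "en"): "a melancholic longing for something absent",
}

def render_concept(concept, language):
    if (concept, language) in LEXICALISATIONS:
        return LEXICALISATIONS[(concept, language)]
    if (concept, language) in PARAPHRASES:
        return PARAPHRASES[(concept, language)]
    raise KeyError(f"no lexicalisation or paraphrase for {concept} in {language}")

print(render_concept("saudade", "pt"))   # saudade
print(render_concept("saudade", "en"))   # a melancholic longing for something absent
```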
Discourse-level encoding?
I’m also unclear how things are going to be represented at a discourse level. Biographies seem to be more schematic than random concrete nouns or historical events, but even then there are very different types of details to represent about different biographical subjects. Or is every page going to basically be a schema of information that gets instantiated into a language? Will there be templates like BiographyIntro(John Smith, 1/1/1900, Paris, 12/31/1980, Berlin, ...) ? I don’t know whether that is a brilliant idea or a disaster in the making.
- It's probably neither. So, particularly for bio-intros? That's what, IIRC, Italian Wikipedia is already doing, in order to increase the uniformity of their biographies. But let's look at other examples: I expect that for tail entities that will be the default representation, a more or less large template that takes some data from Wikidata and generates a short text. That is similar to the way LSJbot is working for a large number of articles - but we're removing the bus factor from LSJbot and we are putting it into a collaboratively controllable space.
- Now, I would be rather disappointed if things stopped there: so, when we go beyond tail entities, I hope that we will have much more individually crafted abstract content, where the simple template is replaced by diverse, more specific constructors, and only if these are not available for rendering in a given language do we fall back to the simple template. And these would indeed also need to represent discourse-level conjunctions such as "however", "in the meantime", "in contrast to", etc. --DVrandecic (WMF) (talk) 22:40, 28 August 2020 (UTC)
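A rough Python sketch of that fallback (the field names, renderer registry, and English wording are invented): prefer the hand-crafted abstract content when the language can render all of its constructors, and otherwise fall back to a simple template over Wikidata-style fields.

```python
# Hypothetical sketch: a simple default biography template, with hand-crafted
# abstract content preferred whenever the language can actually render it.

def biography_intro_template(item, language):
    """Default rendering from Wikidata-style fields (illustrative strings only)."""
    if language == "en":
        return (f"{item['label']} ({item['birth_date']} – {item['death_date']}) "
                f"was born in {item['birth_place']} and died in {item['death_place']}.")
    raise NotImplementedError(language)

def render_biography(item, crafted_content, renderers, language):
    # Use the individually crafted abstract content only if this language has
    # renderers for all of its constructors; otherwise use the template.
    if crafted_content and all((c["construct"], language) in renderers
                               for c in crafted_content):
        return " ".join(renderers[(c["construct"], language)](c) for c in crafted_content)
    return biography_intro_template(item, language)

john = {"label": "John Smith", "birth_date": "1 January 1900",
        "birth_place": "Paris", "death_date": "31 December 1980",
        "death_place": "Berlin"}
print(render_biography(john, crafted_content=[], renderers={}, language="en"))
```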
Volume, Ontologies, and Learning Curves, Oh My!
Finally, I worry about the volume of information to encode (though Wikipedia has conquered that), the difficulty of getting the ontology right (which Wikidata has done well, though on a less ambitious scale than Wikilambda requires, I think), and the learning curve for editors facing the more complex grammatical representations.
I (and others) like to say that Wikipedia shouldn't work in theory, but in practice it does—so maybe these worries are overblown, or maybe worrying about them is how they get taken care of.
Trey Jones (WMF) (talk) 21:59, 24 August 2020 (UTC)
- I don't think the worries you stated are overblown - these are all well-grounded worries, and for some of them I had specific answers, and for some I am fishing for hope that we will be able to overcome them when we get to them. One advantage is that I don't have to have all the answers yet, but that I can rely on the ingenuity of the community to get some of these blockers out of the way, as long as we create an environment and a project that attracts enough good people.
- And this also means that this learning curve you are describing here is one of my main worries. The goal is to design a system that allows contributors who want to do so to dive deep into the necessary linguistic theories and in the necessary computer science background to really dig through the depths of Wikilambda and Abstract Wikipedia - and I expect to rely on these contributors sooner than later. But at the same time it must remain possible to effectively contribute if you don't do that, or else the project will fail. We must provide contribution channels where confirming or correcting lexicographic forms works without the contributor having to fully understand all the other parts, where abstract content can be created and maintained without having to have a degree in computational linguistics. I still aim for a system where, in case of an event (say, a celebrity marries, an author publishes a new book, or a new mayor gets elected) an entirely fresh contributor can figure out in less than five minutes how to actually make the change on Abstract Wikipedia. This will be a major challenge, and not even because I think the individual user experiences for those things will be incredibly hard, but rather because it is unclear how to steer the contributor to the appropriate UX.
- But yes, the potential learning curve is a major obstacle, and we need to address that effectively, or we will fall short of the potential that this project has. --DVrandecic (WMF) (talk) 22:50, 28 August 2020 (UTC)
Comments
@All: @TJones (WMF):
These are all serious issues that need to be addressed, which is why I proposed a more rigorous development part for "producing a broad spectrum of ideas and alternatives from which a program for natural language generation from abstract descriptions can be selected" at https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Tasks#Development_Part_PP2_Suggestion.
With respect to your first three sections, I assumed Denny only referred to Grammatical Framework as a system with similar goals to Abstract Wikipedia. I also assumed that the idea was that only those grammatical phenomena that are needed to somewhat sensibly express the abstract content in a target language need to be captured, and that hopefully generating those phenomena will require much less depth and sophistication than constructing a grammar.
With respect to the section "Recoding previously known information with new features," the system needs to be typologically responsible from the outset. I think we should be aware of the features of all of the languages we are going to want to eventually support.
In your example, I think coding "maternal/paternal lineage on every instance of uncle everywhere" would be great, but not required. A natural language generation system will not (soon) be perfect.
With respect to the section "Semantically ill-formed defaults": in other words, what should we do when we want to make use of contextual neutralization (e.g., using paternal uncle in a neutral context where the paternal/maternal status of the uncle is unknown) but we cannot ensure the neutral context?[1] I would argue that there are certain forms that we should prefer because they appear less odd when the neutral context is supposedly violated than others (e.g., they in The man was outside, but they were not wearing a coat). There is also an alternative in circumlocution: for example, we could reduce specificity and use relative instead of parental or maternal uncle and hope that any awkward assumptions in the text of a specifically uncle-nephew/niece relationship are limited.
Unfortunately, it does seem that to get a really great system, you need a useful inference engine ...
With respect to the section "Incompatible cultural 'facts'," this is the central issue to me, but I would rely less on the notion of "cultural facts." We are going to make a set of entity distinctions in the chosen set of Abstract Wikipedia articles, but the generated natural language for these articles needs to respect how the supported languages construe conceptual space (speaking in cognitive linguistics terms for a moment).[2] I am wondering if there is a typologically responsible way of ordering these construals (perhaps as a "a lattice-like structure of hierarchically organized, typologically motivated categories"[3]) that could be helpful to this project.
With respect to the section "Having to code information you don't really understand," as above, I think there are sensible ways in which we can handle "holes in the information store." For example, in describing an old photo from the 1928 Olympics, can we set North to an arbitrary direction, and what are the consequences if there is context that implies a different North?
With respect to the section "Discourse-level encoding?," I do hope we can simulate the "art" of writing a Wikipedia article to at least some degree.
- ↑ Haspelmath, Martin. "Against markedness (and what to replace it with)." Journal of linguistics (2006): 39.; Croft, William. Typology and universals, 100-101. Cambridge University Press, 2002.
- ↑ Croft, William, and Esther J. Wood. "Construal operations in linguistics and artificial intelligence." Meaning and cognition: A multidisciplinary approach 2 (2000): 51.
- ↑ Van Gysel, Jens EL, Meagan Vigus, Pavlina Kalm, Sook-kyung Lee, Michael Regan, and William Croft. "Cross-lingual semantic annotation: Reconciling the language-specific and the universal." ACL 2019 (2019): 1.
--Chris.Cooley (talk) 11:45, 25 August 2020 (UTC)
- Thanks Chris, and thanks for starting that page! I think this will indeed be an invaluable resource for the community as we get to these questions. And yes, as you said, I referred to GF as a similar project and as an inspiration and as a demonstration of what is possible, not necessarily as the actual implementation to use. Also, you will find that my answers above don't always fully align with what you said, but I think that they are in general compatible. --DVrandecic (WMF) (talk) 22:56, 28 August 2020 (UTC)
- @DVrandecic (WMF): Thanks, and I apologize for sounding like a broken record with respect to a part PP2! With respect to the above, it does sound like we disagree on some things, but I am sure there will be time to get into them later! --Chris.Cooley (talk) 00:29, 29 August 2020 (UTC)
- Some of these issues are I think avoided by the restricted language domain we are aiming for - encyclopedic content. There should be relatively limited need for constructions in the present or future tenses. No need to worry about the current physical location or orientation of people, etc. If something is unknown then a fall-back construction ("brother of a parent" rather than a specific sort of "uncle") should be fine, if possibly inelegant. We don't need to capture every possible aspect of language, just sufficient to convey the meanings intended. ArthurPSmith (talk) 18:56, 25 August 2020 (UTC)
- Thanks Arthur! Yes, I very much appreciate the pragmatic approach here - and I think a combination of the pragmatic getting things done with the aspirational wanting everything to be perfect will lead to the best tension to get this project catapulted forward! --DVrandecic (WMF) (talk) 22:56, 28 August 2020 (UTC)
- @ArthurPSmith: "the restricted language domain we are aiming for - encyclopedic content" I wish this was so, but I am having trouble thinking of an encyclopedic description of a historical event — for example — as significantly restricted. I could easily see the physical location or orientation of people becoming relevant in such a description. "If something is unknown then a fall-back construction ('brother of a parent' rather than a specific sort of 'uncle') should be fine, if possibly inelegant. We don't need to capture every possible aspect of language, just sufficient to convey the meanings intended." I totally agree, and I think this could be another great fallback alternative. --Chris.Cooley (talk) 00:29, 29 August 2020 (UTC)
Translatable modules
Hi,
This is not exactly about Abstract Wikipedia, but it's quite closely related, so it may interest the people who care about Abstract Wikipedia.
The Wikimedia Foundation's Language team started the Translatable modules initiative. Its goal is to find a good way to localize Lua modules as conveniently as it is done for MediaWiki extensions.
This project is related to task #2 in Abstract Wikipedia/Tasks, "A cross-wiki repository to share templates and modules between the WMF projects". The relationship to Abstract Wikipedia is described in much more detail on the page mw:Translatable modules/Engineering considerations.
Everybody's feedback about this project is welcome. In particular, if you are a developer of templates or Lua (Scribunto) modules, your user experience will definitely be affected sooner or later, so make your voice heard! The current consultation stage will go on until the end of September 2020.
Thank you! --Amir E. Aharoni (WMF) (talk) 12:45, 6 September 2020 (UTC)
- @Aaharoni-WMF: Thanks, that is interesting. I do wonder whether there is more overlap in Phase 1 than your link suggests. Although it is probably correct to say that the internationalization of ZObjects will be centralized initially, there was some uncertainty about which multi-lingual labelization and documentation solution should be pursued. Phab:T258953 is now resolved but I don't understand it well enough to see whether it aligns with any of your solution options. As for phase 2, I simply encourage anyone with an interest in language-neutral encyclopedic content to take a look at the link you provided and the associated discussion. Thanks again.--GrounderUK (talk) 19:43, 7 September 2020 (UTC)
Response from the Grammatical Framework community
The Grammatical Framework (GF) community has been following the development of Abstract Wikipedia with great interest. This summary is based on a thread at GF mailing list and my (inariksit) personal additions.
Resources
GF has a Resource Grammar Library (RGL) for 40 or so languages, and 14 of them have large-scale lexical resources and extensions for wide-coverage parsing. The company Digital Grammars (my employer) has been using GF in commercial applications since 2014.
To quote GF's inventor Aarne Ranta on the previously linked thread:
My suggestion would have a few items:
- that we develop a high-level API for the purpose, as done in many other NLG projects
- that we make a case study on an area or some areas where there is adequate data. For instance from OpenMath
- that we propagate this as a community challenge
- Digital Grammars can sponsor this with some tools, since we have gained experience from some larger-scale NLG projects
Morphological resources from Wiktionary inflection tables
With the work of Forsberg, Hulden, and Kankainen, it's possible to extract GF resources from Wiktionary inflection tables.
Quoting Kristian Kankainen's message:
Since the Wiktionary is a popular place for inflection tables, these could be used for boot-strapping GF resources for those languages. Moreover, but not related to GF nor Abstract Wikipedia, the master's thesis generates also FST code and integrates the language into the Giella platform which provides an automatically derived simple spell-checker for the language contained in the inflection tables. Coupling or "boot-strapping" the GF development using available data on Wiktionary could be seen as a nice touch and would maybe be seen as a positive inter-coupling of different Wikimedia projects.
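As a very rough Python sketch of that bootstrapping idea, assuming the inflection table has already been extracted from the wikitext into a plain mapping (the feature names and output format are invented, and the Estonian forms are given only for illustration):

```python
# Hypothetical sketch: turn an already-extracted inflection table into a
# minimal lexicon entry that a renderer (or a GF paradigm generator) could use.

def lexicon_entry(lemma, pos, forms):
    """forms: mapping of grammatical-feature tuples to surface forms."""
    return {"lemma": lemma, "pos": pos, "forms": dict(forms)}

# Imagine this came from a Wiktionary inflection table for the Estonian noun
# "kala" (fish); a real extractor would follow the actual template structure.
extracted = {
    ("sg", "nominative"): "kala",
    ("sg", "genitive"):   "kala",
    ("sg", "partitive"):  "kala",
    ("pl", "nominative"): "kalad",
}

entry = lexicon_entry("kala", "noun", extracted)

def inflect(entry, number, case):
    try:
        return entry["forms"][(number, case)]
    except KeyError:
        raise KeyError(f"{entry['lemma']} has no extracted {number} {case} form")

print(inflect(entry, "pl", "nominative"))   # kalad
```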
Division of labour
Personally, I would love to spend the next couple of years reading grammar books and encoding basic morphological and syntactic structures of languages like Guarani or Greenlandic into the GF RGL. With those in place, a much wider audience can write application grammars, using the RGL via a high-level API.
Of course, for this to be a viable solution, more people than just me need to join in. I believe that if the GF people know that their grammars will be used, the motivation to write them is much higher. To kickstart the resource and API creation, we could make Abstract Wikipedia as a special theme of the next GF summer school, whenever that is organised (live or virtually).
Addressing some concerns from this talk page
Some of the concerns on the talk page are definitely valid.
- It is going to take a lot of time. Developing the GF Resource Grammar Library has taken 20 calendar years and (at least) 20 person-years. I think everyone who has a say in the choice of renderer implementation should get familiar with the field---check out other grammar formalisms, like HPSG, and you'll see similar coverage and timelines to GF[1].
- The "Uzbek uncle" situation happens often with GF grammars when adding a new language or new concepts. Since this happens often, we are prepared for it. There are constructions in the GF language and module system that make dealing with this manageable.
- "Incompatible cultural facts" is a minefield of its own, far beyond the scope of NLG. I personally think we should start with a case study for a limited domain.
On the other hand, worrying about things like ergativity or when to use subjunctive tells me that the commenters haven't understood just how abstract an abstract syntax can be. To illustrate this, let me quote the GF Best Practices document on page 9:
Linguistic knowledge. Even the most trivial natural language grammars involve expert linguistic knowledge. In the current example, we have, for instance, word inflection and gender agreement shown in French: le bar est ouvert (“the bar is open”, masculine) vs. la gare est ouverte (“the station is open”, feminine). As Step 3 in Figure 3 shows, the change of the noun (bar to gare) causes an automatic change of the definite article (le to la) and the adjective (ouvert to ouverte). Yet there is no place in the grammar code (Figure 2) that says anything about gender or agreement, and no occurrence of the words la, le, ouverte! The reason is that such linguistic details are inherited from a library, the GF Resource Grammar Library (RGL). The RGL guarantees that application programmers can write their grammars on a high level of abstraction, and with a confidence of getting the linguistic details automatically right.
Language differences. The RGL takes care of the rendering of linguistic structures in different languages. [--] The renderings are different in different languages, so that e.g. the French definition of the constant the_Det produces a word whose form depends on the noun, whereas Finnish produces no article word at all. These variations, which are determined by the grammar of each language, are automatically created by the RGL. However, the example also shows another kind of variation: English and French use adjectives to express “open” and “closed”, whereas Finnish uses adverbs. This variation is chosen by the grammarian, by picking different RGL types and categories for the same abstract syntax concepts.
Obviously the GF RGL is far from covering all possible things people might want to say in a wikipedia article. But an incomplete tool that covers the most common use cases, or covers a single domain well, is still very useful.
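To restate the quoted point outside GF, here is a tiny Python sketch (with an invented two-entry French mini-lexicon): the content only states which place is in which state, and the gender agreement of article and adjective comes entirely from the lexical data, never from the content.

```python
# Hypothetical sketch of the division of labour the quote describes: the
# abstract content knows nothing about gender; the French "resource" does.

FRENCH_NOUNS = {           # invented mini-lexicon
    "bar":     {"form": "bar",  "gender": "m"},
    "station": {"form": "gare", "gender": "f"},
}
FRENCH_ADJECTIVES = {
    "open": {"m": "ouvert", "f": "ouverte"},
}

def place_is(place, state):
    # language-neutral content: no articles, no gender, no agreement
    return {"construct": "place_is", "place": place, "state": state}

def render_french(content):
    noun = FRENCH_NOUNS[content["place"]]
    article = "le" if noun["gender"] == "m" else "la"
    adjective = FRENCH_ADJECTIVES[content["state"]][noun["gender"]]
    return f"{article} {noun['form']} est {adjective}"

print(render_french(place_is("bar", "open")))      # le bar est ouvert
print(render_french(place_is("station", "open")))  # la gare est ouverte
```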
Non-European and underrepresented languages
Regarding discussions such as https://lists.wikimedia.org/pipermail/wikimedia-l/2020-August/095399.html, I'm happy to see that you are interested in underrepresented languages. The GF community has members in South Africa, Uganda and Kenya, doing or having previously done work on Bantu languages. At the moment (September 2020), there is ongoing development in Zulu, Xhosa and Northern Sotho.
This grammar work has been used in a healthcare application, and you can find a link to a paper describing the application in this message.
If any of this sounds interesting to you, we can start a direct dialogue with the people involved.
Concluding words
Whatever system Abstract Wikipedia chooses will follow the evolutionary path of GF (and no doubt of other similar systems), so it's better to learn from that path regardless of whether GF is chosen or not. We are willing to help, whether it's actual GF programming or sharing experiences---what we have tried, what has worked, what hasn't.
On behalf of the GF community,
inariksit (talk) 08:19, 7 September 2020 (UTC)
Responses
I second what inariksit says about the interest from the GF community - if GF were to be used by AW, it would give a great extra motivation for writing resource grammars, and it would also benefit the GF community by giving the opportunity to test and find remaining bugs in the grammars for smaller languages.
Skarpsill (talk) 08:34, 7 September 2020 (UTC)
- Thanks for this summary! Nemo 09:10, 13 September 2020 (UTC)
@Inariksit:, @Skarpsill: - thank you for your message, and thank you for reaching out. I am very happy to see the interest and the willingness to cooperate from the Grammatical Framework community. In developing the project, I have read the GF book at least three times (I wish I was exaggerating), and have taken inspiration in how GF has solved a problem many times when I got stuck. In fact, the whole idea that Abstract Wikipedia can be built on top of a functional library being collected in the wiki of functions can be traced back to GF being built as a functional language itself.
I would love for us to find ways to cooperate. I think it would be a missed opportunity not to learn from, or even directly use, the RGLs.
I keep this answer short, and just want to check a few concrete points:
- when Aarne mentioned to "develop a high-level API for the purpose, as done in many other NLG projects", what kind of API is he thinking of? An API to GF, or an abstract grammar for encyclopaedic knowledge?
- whereas math would be a great early domain, given that you already have experience in the medical domain, and several people have raised the importance of medical knowledge for closing gaps in Wikipedia, could that be an interesting early focus domain?
- regarding our timeline, we plan to work on the wiki of functions in 2021, and start working on the NLG functionalities in late 2021 and throughout 2022. Given that timeline, what engagement would make sense from your side?
I plan to come back to this and answer a few more points, but I have been sitting on this for too long already. Thank you for reaching out, I am very excited! --DVrandecic (WMF) (talk) 00:44, 26 September 2020 (UTC)
- @DVrandecic (WMF): Thanks for your reply! I'm not quite sure what Aarne means with the high-level API: an application grammar in GF, or some kind of API outside GF that builds GF trees from some structured data. If it's the former, it would just look like any other GF application grammar; if the latter, it could look something like this toy example in my blog post (ignore that the black box says "nobody understands this code"): the non-GF users would interact with the GF trees on some higher level, like in that picture, where first choosing the dish (pizza) and then choosing pizza toppings generates the GF tree for "Your pizza has (the chosen toppings)".
- The timeline starting in late 2021 is ideal for me personally. I think that the medical domain is a good domain as well, but I don't know enough to discuss details. I hope that our South African community gets interested (initial interest is there, as confirmed by the thread on the GF mailing list).
- I would love to have an AW-themed GF summer school in 2022. It would be a great occasion to introduce GF people to AW, and AW people to GF, in a 2-week intensive course. If you (as in the whole AW team) think this is a good idea, then it would be appropriate to start organising it already in early 2021. If we want to target African languages, we could e.g. try to organise it in South Africa. I know this may seem premature to bring it up now, but if we want to do something like this on a larger scale, it's good to start early.
- Btw, I joined the IRC channel #wikipedia-abstract, so we can talk more there if you want. --inariksit (talk) 17:42, 14 October 2020 (UTC)
Might deep learning-based NLP be more practical?
First of all, I'd like to state that Abstract Wikipedia is a very good idea. I applaud Denny and everyone else who has worked on it.
This is kind of a vague open-ended question, but I didn't see it discussed so I'll write it anyway. The current proposal for Wikilambda is heavily based on a generative-grammar type view of linguistics; you formulate a thought in some formal tree-based syntax, and then explicitly programmed transformational rules are applied to convert the output to a given natural language. I was wondering whether it would make any sense to make use of connectionist models instead of / in addition to explicitly programmed grammatical rules. Deep learning based approaches (most impressively, Transformer models like GPT-3) have been steadily improving over the course of the last few years, vindicating connectionism at least in the context of NLP. It seems like having a machine learning model generate output text would be less precise than the framework proposed here, but it would also drastically reduce the amount of human labor needed to program lots and lots of translational rules.
A good analogy here would be Apertium vs. Google Translate/DeepL. Apertium, as far as I understand it, consists of a large number of rules programmed manually by humans for translating between a given pair of languages. Google Translate and DeepL are just neural networks trained on a huge corpus of input text. Apertium requires much more human labor to maintain, and its output is not as good as its ML-based competitors. On the other hand, Apertium is much more "explainable". If you want to figure out why a translation turned out the way it did (for example, to fix it), you can find the rules that caused it and correct them. Neural networks are famously messy and Transformer models are basically impossible to explain.
Perhaps it would be possible to combine the two approaches in some way. I'm sure there's a lot more that could be said here, but I'm not an expert on NLP so I'll leave it at that. PiRSquared17 (talk) 23:46, 25 August 2020 (UTC)
- @PiRSquared17: thank you for this comment, and it is a question that I get asked a lot.
- First, yes, the success of ML systems in the last decade has been astonishing, and I am amazed by how much the field has developed. I had one prototype that was built around an ML system, but even more than the Abstract Wikipedia proposal it exhibited a Matthew effect - the languages that were already best represented benefited from that architecture the most, whereas the languages that needed the most help would get the least of it.
- Another issue is, as you point out, that within a Wikimedia project I would expect the ability for contributors to go in and fix errors. This is considerably easier with the symbolic approach chosen for Abstract Wikipedia than with an ML-based approach.
- Having said that, there are certain areas where I will rely on ML-based solutions in order to get them working. This includes an improved UX to create content, and analysis of the existing corpora as well as of the generated corpora. There is even the possibility of using an ML-based system to do the surface cleanup of the text to make it more fluent - basically, to have an ML-based system do copy-editing on top of the symbolically generated text, which could have the potential to reduce the complexity of the renderers considerably and yet get good fluency - but all of these are ideas.
- In fact, I am planning to write a page here where I outline possible ML tasks in more detail.
- @DVrandecic (WMF): Professor Reiter ventured an idea or two on this topic a couple of weeks ago: "...it may be possible to use GPT3 as an authoring assistant (eg, for developers who write NLG rules and templates), for example suggesting alternative wordings for NLG narratives. This seems a lot more plausible to me than using GPT3 for end-to-end NLG."--GrounderUK (talk) 18:22, 31 August 2020 (UTC)
- @GrounderUK: With respect to GPT-3, I am personally more interested in things like https://arxiv.org/abs/1909.01066 for the (cross-linguistic) purposes of Abstract Wikipedia. You might be able to imagine strengthening Wikidata and/or assisting abstract content editing. --Chris.Cooley (talk) 22:33, 31 August 2020 (UTC)
- @Chris.Cooley: I can certainly imagine such a thing. However it happens, the feedback into WikidataPlusPlus is kinda crucial. I think I mentioned the "language-neutral synthetic language that inevitably emerges" somewhere (referring to Wikidata++) as being the only one we might have a complete grammar for. Translations (or other NL-type renderings) into that interlingua from many Wikipedias could certainly generate an interesting pipeline of putative data. Which brings us back to #Distillation of existing content (2nd paragraph)...--GrounderUK (talk) 23:54, 31 August 2020 (UTC)
- @DVrandecic (WMF): May we go ahead and create such an ML page, if you haven't already? James Salsman (talk) 19:00, 16 September 2020 (UTC)
- Now it could be that ML and AI will develop at such a speed as to make Abstract Wikipedia superfluous. But to be honest (and that's just my point of view), given the development of the field in the last ten years, I don't see that moment being considerably closer than it was five years ago (but also, I know a number of teams working on a more ML-based solution to this problem, and I honestly wish them success). So personally I think there is a window of opportunity for Abstract Wikipedia to help billions of people for quite a few years, and to allow many more to contribute to the world's knowledge sooner. I think that's worth it.
- Amusingly, if Abstract Wikipedia succeeds, I think we'll actually accelerate the moment where we make its existence unnecessary. --DVrandecic (WMF) (talk) 03:55, 29 August 2020 (UTC)
- I will oppose introducing any deep learning technique as it is 1. difficult to develop 2. difficult to train 3. difficult to generalize 4. difficult to maintain.--GZWDer (talk) 08:06, 29 August 2020 (UTC)
- It could be helpful to use ML-based techniques for auxiliary features, such as parsing natural language into potential abstract content for users to choose or modify, but using such techniques for the rendered text might not be a good idea, even if it's just used on top of symbolically generated text. For encyclopedic content, accuracy/preciseness is much more important than naturalness/fluency. As suggested above, "brother of a parent" could be an acceptable fallback solution for the "uncle" problem, even if it doesn't sound natural in a sentence. While an ML-based system would make sentences more fluent, it could potentially turn a true statement into a false one, which would be unacceptable. Actually, many concerns raised above, including the "uncle" problem, could turn out to be advantages of the current rule-based approach over an ML-based approach. Although those are challenging issues we need to address, it would be even more challenging for ML/AI to resolve them. --Stevenliuyi (talk) 23:41, 12 September 2020 (UTC)
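As a toy illustration of the fallback idea above, here is a minimal Python sketch with made-up lexicon data and an invented language code "xx"; it is not a real renderer, and in practice even the connective "of a" would itself need to be rendered per language.
# Toy illustration of the fallback discussed above: prefer a single lexeme for
# "uncle" when the target language has one, otherwise compose the precise but
# less fluent paraphrase "brother of a parent" from simpler lexemes.
LEXICON = {
    "en": {"uncle": "uncle", "brother": "brother", "parent": "parent"},
    "xx": {"brother": "brat", "parent": "roditelj"},  # hypothetical language with no "uncle" lexeme
}

def render_uncle(lang):
    words = LEXICON[lang]
    if "uncle" in words:
        return words["uncle"]
    # Fallback: accurate even if it does not sound natural in a sentence.
    # (In a real renderer the connective would also be language-specific.)
    return f"{words['brother']} of a {words['parent']}"

print(render_uncle("en"))  # uncle
print(render_uncle("xx"))  # brat of a roditelj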
Hello everyone. Somewhat late to the party, but I recently proposed a machine-learning data curation that I think is germane to this discussion. I propose using machine learning to re-index Wikipedias to paraphrases. FrameNet and WordNet have been working on the semantics problem that Abstract is trying to address. The problem they face is the daunting task of manually building out their semantic nuggets, not to mention the specter over every decision's shoulder: opinion. With research telling us that sentences follow Zipf's Law, paraphrases become the sentence-level equivalent of words' synonyms. It also means our graph is completely human-readable, as we retain context for each sentence member of the paraphrase node. You could actually read a Wikipedia article directly from the graph. This way, we can verify the machine learning algorithms. Just like a thesaurus lists all of the words with the same meaning, our graph would illustrate all of the concepts with the same meaning, and the historic pathways used to get to related concepts. Further, by mapping Wikidata metadata to the paraphrase nodes, we would have a basic knowledge representation. The graph is directed, which will also assist function logic for translating content. Extending this curation to the larger Internet brings new capabilities into play. I came at this from a decision-support-for-work-management perspective. My goal was to help identify risks and opportunities in our work. I realized that I needed to use communication information to get to predictive models like Polya's Urn with Innovation Triggering. I realized along the way that this data curation is foundational, and must be under the purview of a collective rather than some for-profit enterprise. So here I am. I look forward to discussing this concept while learning more about Abstract. Please excuse my ignorance as I get up to speed on this wonderfully unique culture.--DougClark55 (talk) 21:33, 11 January 2021 (UTC)
Parsing Word2Vec models and generally
Naming the wiki of functions
We've started the next steps of the process for selecting the name for the "wiki of functions" (currently known as Wikilambda), at Abstract Wikipedia/Wiki of functions naming contest. We have more than 130 proposals already in, which is far more than we expected. Thank you for that!
On the talk page some of you have raised the issue that this doesn't allow for effective voting, because most voters will not go through 130+ proposals. We've adjusted the process based on the discussion. We're trying an early voting stage, to hopefully help future voters by emphasizing the best candidates.
If you'd like to help highlight the best options, please start (manually) adding your Support votes to the specific proposals, within the "Voting" sub-sections, using:
* {{support}} ~~~~
Next week we'll split the list into two, emphasizing the top ~20 or so, and continue wider announcements for participation, along with hopefully enabling the voting-button gadget for better accessibility. Cheers, Quiddity (WMF) (talk) 00:18, 23 September 2020 (UTC)
Merging the Wikipedia Kids project with this
I also proposed a Wikipedia Kids project, and somebody said that I should merge it with Abstract Wikipedia. Is this possible? Eshaan011 (talk) 16:50, 1 October 2020 (UTC)
- Hi, @Eshaan011: Briefly: Not at this time. In more detail: That's an even bigger goal than our current epic goal, and whilst it would theoretically become feasible to have multiple-levels of reading-difficulty (albeit still very complicated, both technically and socially) once the primary goal is fully implemented and working successfully, it's not something we can commit to, so we cannot merge your proposal here. However, I have updated the list of related proposals at Childrens' Wikipedia to include your proposal and some other older proposals that were missing, so you may wish to read those, and potentially merge your proposal into one of the others. I hope that helps. Quiddity (WMF) (talk) 20:06, 1 October 2020 (UTC)
- @Eshaan011 and Quiddity (WMF): Ultimately, I agree with Quiddity but, in the real world, not so much. We have already discussed elsewhere the possibility of different levels of content and respecting the editorial policies of different communities. Goals such as these imply that language-neutral content will be filtered or concealed by default in some contexts. Furthermore, the expressive power of different languages will not always be equally implemented, so we will need to be able to filter out (or gracefully fail to render) certain content in certain languages. Add to this the fact that language-neutral content will be evolving over a very long time, so we might expect to begin with (more) basic facts (more) simply expressed in a limited number of languages. To what extent the community will be focusing on extending the provision of less basic facts in currently supported languages rather than basic facts in more languages is an open question. It does not seem to me to be unreasonable, however, to expect that we might begin to deliver some of the suggested levelled content as we go along, rather than after we have substantially completed any part of our primary goal. (Quite why we would take the trouble to find a less simple way of expressing existing language-neutral content is unclear; perhaps it would be a result of adopting vocabulary specific to the subject area (jargon). In any event, I would expect some editorial guidelines to emerge here.)--GrounderUK (talk) 15:31, 4 November 2020 (UTC)
Why define input or output parameters for pure "functions"?
Some functions may also take as input *references* to other functions that will be called internally (as callbacks that may be used to return data, or for debugging, tracing... or for hinting the process that will compute the final result: imagine a "sort()" function taking a custom "comparator" function as one of its inputs).
Note as well that, formally, functions are not imperative about the role assigned to input and output parameters: imagine the case of invertible inferences, like "sum(2,3,:x)" returning "{x=5}" and "sum(:x,3,5)" returning "{x=2}", as in Smalltalk, Prolog and other AI languages: there is no need to define multiple functions if we develop the concept of "pure" functions *without* side effects.
So what is important is the type of all input and output parameters together, without the need to restrict one of them as an output: binding parameters to values is what assigns them the role of input. The function will return one or more solutions, or could return another function representing the set of solutions.
And to handle errors/exceptions, we need an additional parameter (usually defined as an input, but we can make inferences on errors as well). Errors/exceptions are just another datatype.
What will be significant is just the type signature of all parameters (input or output, including error types) and a way to use type inference to select a suitable implementation that can reduce the set of solutions.
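As a minimal sketch of this relational view of "pure" functions, assuming Python purely for illustration (this is not the Wikifunctions function model): a sum(a, b, c) relation where any one parameter may be left unbound, and where errors are returned as just another value.
# Minimal sketch: any one parameter of a + b = c may be left unbound; the result
# is either a binding for the missing parameter or an error value.
def relational_sum(a=None, b=None, c=None):
    unbound = [name for name, value in (("a", a), ("b", b), ("c", c)) if value is None]
    if len(unbound) != 1:
        return {"error": "exactly one parameter must be left unbound"}
    if a is None:
        return {"a": c - b}
    if b is None:
        return {"b": c - a}
    return {"c": a + b}

print(relational_sum(a=2, b=3))   # {'c': 5}
print(relational_sum(b=3, c=5))   # {'a': 2}
print(relational_sum(a=2))        # {'error': ...}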
A question about the logo voting
Question: I have a question about the logo voting. Is the page Abstract Wikipedia/Logo about Abstract Wikipedia or wiki of functions? (Or about both?) --Atmark-chan <T/C> 06:56, 3 December 2020 (UTC)
- The page is a draft, for early proposals; for now there will be no specific logo for Abstract Wikipedia itself, which will not start for at least another year (and it will still not be a new wiki, but an overall project across Wikimedia projects, to integrate Wikifunctions, Wikidata and other tools into existing Wikipedia projects).
- The content of that page will be updated in the coming discussions about the final organization of the vote. The draft is just there to allow people to prepare and reference their proposals (I just made the first proposal; others are welcome, and the structure for putting up multiple proposals and organizing them is not really decided). For this reason, that page is a draft to be completed. Note that there's no urgency for the logo right now: even the early wiki for Wikifunctions will be in draft for many months and is likely to change a lot in the coming year, so we have plenty of time to decide on a logo. Note also that the current alpha test wikis use an early custom logo, seen on the Facebook page of the project, but a very poor one; it was based on some older presentations of the project before its approval by the WMF.
- Wikifunctions, however, is not just for Wikipedia and will certainly have a use on all other non-Wikipedia projects, or even projects outside Wikimedia, including non-wiki sites.
- The proposals to be submitted soon will be for the new wiki built very soon for Wikifunctions only (note: the name was voted on, but is still not official; it will be announced in a couple of weeks, as we are waiting for a formal decision by the WMF after legal review; this is independent of the logo). I don't think we need to include the project name in the logo. During the evaluation of names, it was decided that the name should be easily translatable, so it will likely be translated: translating the project name inside the logo can be done separately and more easily if the logo does not include the name, which can be composed later if we want it, or could be displayed dynamically on the wiki, in plain HTML, without modifying the logo; a tool could also automatically precompose several variants of the logo with different rendered names, and then display the composed logo image according to the user's language, if MediaWiki supports that.
- How all the other projects will be coordinated is still undecided; only Wikifunctions has been approved as a new wiki, plus the long-term project for Abstract Wikipedia (which will require coordination with each Wikipedia edition and further development for the integration of Wikifunctions and Wikidata).
- In one year or so, it will be time to discuss Abstract Wikipedia, and its logo (if one is needed) should then focus on Wikipedia.
- Note also that Wikifunctions will work with a separate extension for MediaWiki, which will have its own name too (WikiLambda), but it will be generic and not necessarily tied to Wikifunctions: this extension may later be integrated into other wikis, including outside Wikimedia; it will be a generic plugin for MediaWiki. But we are still far from this goal, and it will require some interest from other external wikis (they already use several other extensions, notably SemanticMediawiki, which may or may not also be used later along with Wikifunctions, possibly for building Abstract Wikipedia as well). verdy_p (talk) 10:40, 3 December 2020 (UTC)
- @Verdy p: Oh, I see. Thank you! Atmark-chan <T/C> 08:05, 4 December 2020 (UTC)
- I'm not convinced that there will be no new wiki for our language-neutral Wikipedia but I agree that none is yet planned! --GrounderUK (talk) 12:25, 4 December 2020 (UTC)
Confusion on Wikipedia
Apparently, there's a big confusion throughout multiple languages of Wikipedia about the real scope of Wikifunctions and Abstract Wikipedia. I tried to fix the Wikifunctions Wikipedia article and separate the items into Abstract Wikipedia (Q96807071) and Wikifunctions (Q104587954), but the articles in other languages need to be updated to better reflect how both the projects work. Luk3 (talk) 19:19, 30 December 2020 (UTC)
- @Luk3: Well done, I've updated the page in Italian --Sinucep (talk) 19:49, 2 January 2021 (UTC)
- Having this project running under the name Abstract Wikipedia is a needless way to engage in conflict with Wikipedians. Why not rename the project here into Wikifunctions now that we have a name? ChristianKl ❪✉❫ 01:25, 16 January 2021 (UTC)
- @ChristianKl: This topic explains exactly that the two projects are different in nature. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 12:22, 16 January 2021 (UTC)
Inter-language links
Greetings. The existence of two separate wikidata items, Wikifunctions (Q104587954) and Abstract Wikipedia (Q96807071), has broken the inter-language links. E.g., en:Wikifunctions lists four languages (English, Català, Italiano, Vèneto) while pt:Abstract Wikipedia lists nine languages (excluding English). Any ideas how to fix this? Maybe split the redirect en:Abstract Wikipedia into a new article? Fgnievinski (talk) 21:44, 25 February 2021 (UTC)
- That would make sense. These are different projects and should be kept apart, but ideally, there should always be an article on both. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 21:56, 25 February 2021 (UTC)
- The question is if the different Wikipediae should agree on one article title while both concepts are explained in the same article. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 21:57, 25 February 2021 (UTC)
- @Fgnievinski: Thanks for the note. This is an example of (what is often referred to as) "The Bonnie and Clyde problem", as detailed at d:Help:Handling sitelinks overlapping multiple items. I believe one solution here, would be to link the Wikidata item for Abstract Wikipedia (Q96807071) to the redirect at en:Abstract Wikipedia, which will then still work as desired if/when someone creates an Enwiki article about that distinct topic in the future. Quiddity (WMF) (talk) 01:05, 27 February 2021 (UTC)
- @Quiddity (WMF): I've implemented as suggested, thank you: [1] but the inter-language links still do not seem to appear; see, e.g.: [2] Fgnievinski (talk) 05:37, 1 March 2021 (UTC)
- @Fgnievinski: I believe that is the way it is meant to be done. I.e. It's correct, but not perfect (due to technical limitations), and this occurs whenever there isn't a 1-to-1 relation between articles on the Wikipedias. Quiddity (WMF) (talk) 23:00, 1 March 2021 (UTC)
- @Quiddity (WMF): I think it's good now: I've created corresponding redirects in non-English Wikipedias; now all inter-language links seem to be recognized. Fgnievinski (talk) 04:37, 2 March 2021 (UTC)
Logo for Wikifunctions wiki
Wikifunctions needs a logo. Please help us to discuss the overall goals of the logo, to propose logo design ideas, and to give feedback on other designs. More info on the sub-page, plus details and ideas on the talkpage. Thank you! Quiddity (WMF) (talk) 23:01, 14 January 2021 (UTC)
- The logo should be inviting to a broad public of users. It shouldn't produce a "this project is not for me" reaction in users who don't see themselves as technical.
- Maybe a cute mascot could provide for a good logo. ChristianKl ❪✉❫ 00:46, 16 January 2021 (UTC)
There's no consensus in Wikidata that Abstract Wikipedia is an extension of it
@DVrandecic (WMF) and Verdy p: The project description currently says: Abstract Wikipedia is an extension of Wikidata. As far as I'm concerned, what is an extension of Wikidata and what isn't is up to the Wikidata community. I thus removed the sentence. It was reverted. Do we need an explicit RfC on Wikidata to condemn Abstract Wikipedia for overstepping boundaries to get that removed? ChristianKl ❪✉❫ 13:00, 16 January 2021 (UTC)
- @ChristianKl: Your edit was reverted because you seem to have used a weeks-old version of the page for your edit, and thus have yourself reverted all the recent edits for no reason at all. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 13:07, 16 January 2021 (UTC)
- Actually, this edit was based on a much older version than just the previous one: it used the old version from 16 October, three months ago, when the Wikifunctions name had still not been voted on. Other details have changed since then: there were new announcements, changes in the working team, and changes to some links (including translated pages). What you did also dropped all edits made by the WMF team over the last three months, up to yesterday. verdy_p (talk) 17:45, 16 January 2021 (UTC)
@ChristianKl: I agree. I have qualified the original statement, quoting the relevant section of the Abstract Wikipedia plan. Hope it's okay now.--GrounderUK (talk) 15:02, 16 January 2021 (UTC)
- I revisited your addition: note that it would break an existing translated paragraph, so to ease the work, I separated the reference, which also included a link not working properly with translations: beware notably of anchors (avoid using section headings directly; they vary across languages!). I used the stable (untranslated) anchor and located the target of the link more precisely. Your statement was a bit too elusive, as there's no intent for now to modify Wikidata before 2022 to add content there. It's clear we'll have new special pages, but not necessarily new content pages in Wikidata.
- The "abstract content" needed for Abstract Wikipedia is also different from what will be stored initially in Wikifunctions (which will be independant of any reference to Wikidata elements but will contain the implementations of functions, needed for transforming the "abtract content" into translated content integrable to any Wikipedia (or other wikis, multilingual or not...). The first thing will be to integrate Wikifunctions (still useful for reusable modules and templates), but this is still independant of Abstract Wikipedia still not developed this year: Wikifunctions will just be one of the tools usable to create LATER the "Abstract Wikipedia" and integrate it (don't expect a integration into Wikidata and Wikipedias before the end of 2022; I think it will be in 2023 or even later, after lot of experiments in just very few wikipedias; the integration in large wikipedias is very unlikely before about 5 years, probably not before 2030 for the English Wikipedia: the first goal will be to support existing small Wikipedias, including those in Philippines languages which seem to grow large and fast, but with lot of bot-generated articles.). verdy_p (talk) 18:49, 16 January 2021 (UTC)
- Thanks. I'm not sure what you found "elusive"; my statement was a direct quotation from the linked source, which asserts that Wikidata community agreement is necessary for any eventual changes to Wikidata and that some alternative will be adopted if the community does not agree. Perhaps the plan itself is elusive, but additional speculation about timescales and the nature of the alternatives seems to obscure the point; it is also absent from the referenced source (and makes additional work for translators). You make a couple of points above that I don't agree with. It is not clear to me that new special pages will be required in Wikidata but, if they are, they will require agreement from the Wikidata community. I also believe that early examples of functions in Wikifunctions will (and should) be dependent on "reference to Wikidata elements", in the same way that some Wikipedia infoboxes are dependent on such elements.--GrounderUK (talk) 20:39, 16 January 2021 (UTC)
- There's no "speculation " about timescale in what is to translate in the article. All this is already in the development plan. I only speculate a bit at end of my reply just before your reaction here (it seems clear that about 5-10 years will pass before we see a change in English Wikipedia to integrate the Abstract Wikipedia, simply because it does not really need it: the Abstract Wikipedia is clearly not the goal of this project; on the opposite, Wikifunctions will have its way in English Wikipedia quite soon, just because it will also facilitate the exchanges with Commons, also needing a lot of the same functions). I clearly make a difference between Wikifunctions (short term with a limited goal), and Abstract Wikipedia (no clear term, but in fact long term umbrella project needing much more than just Wikifunctions). verdy_p (talk) 03:25, 17 January 2021 (UTC)
- If the timescale is in the development plan, please indicate where this is. The timeline on the main page only states that Abstract Wikipedia development "proper" will start in 2022 (elsewhere, "in roughly 2022"), not that "integration" with Wikidata will be achieved (if agreed) by July 2022 (or whenever "the second year of the project" is deemed to end).--GrounderUK (talk) 11:25, 17 January 2021 (UTC)
- There's no "speculation " about timescale in what is to translate in the article. All this is already in the development plan. I only speculate a bit at end of my reply just before your reaction here (it seems clear that about 5-10 years will pass before we see a change in English Wikipedia to integrate the Abstract Wikipedia, simply because it does not really need it: the Abstract Wikipedia is clearly not the goal of this project; on the opposite, Wikifunctions will have its way in English Wikipedia quite soon, just because it will also facilitate the exchanges with Commons, also needing a lot of the same functions). I clearly make a difference between Wikifunctions (short term with a limited goal), and Abstract Wikipedia (no clear term, but in fact long term umbrella project needing much more than just Wikifunctions). verdy_p (talk) 03:25, 17 January 2021 (UTC)
- Thanks. I'm not sure what you found "elusive"; my statement was a direct quotation from the linked source, which asserts that Wikidata community agreement is necessary for any eventual changes to Wikidata and that some alternative will be adopted if the community does not agree. Perhaps the plan itself is elusive, but additional speculation about timescales and the nature of the alternatives seems to obscure the point; it is also absent from the referenced source (and makes additional work for translators). You make a couple of points above that I don't agree with. It is not clear to me that new special pages will be required in Wikidata but, if they are, they will require agreement from the Wikidata comunity. I also believe that early examples of functions in Wikifunctions will (and should) be dependent on "reference to Wikidata elements", in the same way that some Wikipedia infoboxes are dependent on such elements.--GrounderUK (talk) 20:39, 16 January 2021 (UTC)
- This objection can't be serious? The Wikidata community has no mandate to decide whether another WMF project, the WMF, or even a non-WMF site, makes some commentary stating a project is an extension of Wikidata. The WMF announcement, Abstract Wikipedia/July 2020 announcement, is quite clear on the 'sourcing' for this sentence. In particular, the end of paragraph 2 and all of paragraph 3. ProcrastinatingReader (talk) 11:59, 19 January 2021 (UTC)
I am trying to understand "Abstract Wikipedia" (please help)
Dear @Quiddity (WMF): and others, after not having had much time for this project, I am now trying to catch up with the developments and understand for myself what "Abstract Wikipedia" is all about.
In my current understanding:
- Wikipedia is an encyclopedia with articles that consist mainly of text.
- "Abstract Wikipedia" is the name of an initiative. This initiative will have two results: first, a new wiki called "wiki functions". This new wiki will, second, help to create new "information pages" in the Wikipedia language versions. These information pages (as I call them) will constitute the so called "Abstract Wikipedia".
- The information pages will be a better version of the "ArticlePlaceholder tool". The "ArticlePlaceholders" are not supposed to be articles, but exactly "placeholders" for articles where they do not exist. For example, Wikipedia in language X has no article about a certain German village. If you search for the village, you will get a "placeholder", which is basically a factsheet with titles/descriptions and data coming from Wikidata. (These placeholders exist currently in a small number of Wikipedias, experimentally.)
- The information pages will be "better" than placeholders because of Wiki functions (WF). The WF are code that can transform data/information from Wikidata into something that looks not like a factsheet but more like sentences.
- This means that in future, the reader of Wikipedia language version X will search for the German village. If the Wikipedians of that language version have not created such an article, then the reader will see the "information page" (again, as I call these pages) about that village. Or: if article and information page exist, the reader will be offered both. (A consequence: we might tend to have less data in an article that could be better presented via the information page.)
To me (and maybe others) it is confusing that "Abstract Wikipedia" is the name of both the whole initiative and one of the two results of the initiative. Also, the proposal talks about "articles" in "Abstract Wikipedia", and that sounds as if the information pages are supposed to replace the encyclopedic (text) articles. Finally, "Abstract Wikipedia" will not be a Wikipedia as we know it (an encyclopedia with text articles).
So, what do you think about this summary? I am open for any correction necessary. :-)
Kind regards Ziko (talk) 10:44, 21 January 2021 (UTC)
- @Ziko: Hallo. That is partially accurate, partially incomplete, and partially not-yet-decided. I would try to clarify it like this:
- The Wikifunctions wiki has a broader goal beyond purely linguistic transformations. It will host functions that do any kind of specific calculation on any kind of data.
- As someone who thinks in lists, I found these 2 links particularly helpful for getting a better sense of the scope of the project: The (minimal) mockup screenshot of what a potential future Main Page might look like; and the very short list of Abstract Wikipedia/Early function examples (~50 examples out of what will hopefully become 100,000+++).
- Part of its scope will include some of what we currently use templates/modules for, on the other wikis. I.e. this will partially solve the problem we currently have, of needing 150+ separate copies of Template:Convert (and 130+ copies of Module:Convert) which aren't easy to maintain or translate. It won't solve it completely (in the short-term at least!) because the largest wikis will inevitably continue to want to use their own versions instead of a global version; but it should make things vastly easier for the smaller and mid-size wikis. It's also not a solution for the entire problem of "global templates" because Wikifunctions is not intended to host simple page-lists or box-layout-design code or many other template/module uses - but it will be able to partially solve the "calculation" part.
- There are some obvious existing semi-comparable projects in the math realm, such as fxSolver and WolframAlpha, but nothing (that we know of) in the more linguistic and other-data realms.
- This Wikifunctions project will be useful as a place to collaboratively (and safely) edit shared code, which can then be utilized on the other wikis, and even beyond.
- For example, at d:Q42, we know the birth-date of the subject in the Gregorian calendar. But Wikifunctions will provide a way for users to take that date and run calculations on it, such as: What day of the week was that? What was that date in the Hebrew calendar? How old were his parents at the time? What was the world/country/city population at the time? etc. -- Users will be able to run the calculations (functions) directly within the Wikifunctions wiki, as well as be able to output the result of the calculation to some other wiki.
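A very small sketch of such calendar calculations, using only Python's standard library; the birth date is the one recorded for d:Q42 (11 March 1952), nothing here is actual Wikifunctions code, and conversions such as the Hebrew calendar date would need further functions and data.
# Sketch of the kinds of calculations mentioned above, standard library only.
from datetime import date

birth_date = date(1952, 3, 11)

def day_of_week(d):
    return d.strftime("%A")

def age_on(d, born):
    years = d.year - born.year
    if (d.month, d.day) < (born.month, born.day):
        years -= 1
    return years

print(day_of_week(birth_date))                 # day of the week of the birth date
print(age_on(date(2001, 5, 11), birth_date))   # age on a later date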
- The Abstract Wikipedia component is separate, but with overlap. Many of the aspects of how it might be implemented still need a lot of discussion.
- For a content example, you make a good comparison with ArticlePlaceholder as a simple precursor. There's also Reasonator (e.g.1 en, e.g.2 fr); However those don't exist for many languages, nor most statements (e.g.3 vi). In the backend, those short descriptions are actually coming from the AutoDesc tool. (e.g.4). That AutoDesc system currently has 27 translatable word-elements, and can only handle very simple sentences for cases where each word can be separately replaced with a translated word (and the word order tweaked per-language). The most important difference to both ArticlePlaceholder and Reasonator - but also to projects such as LsjBot or Rambot - is that we want to allow the community to take more ownership of the way and the scope of information being rendered into the individual languages.
- The Abstract Wikipedia project (and the linguistic function parts of it within Wikifunctions), aims to be able to generate more complex content. It will partially rely upon Lexemes within Wikidata, and partially upon "functions" that convert sentence-structure-fragments into multiple languages.
- Regarding how that content might then appear within any particular Wikipedia project, there are 3 broad options outlined at Abstract Wikipedia/Components#Extensions to local Wikipedias (which I won't attempt to summarize; please read!). But there are also options beyond those 3, such as partial-integration (and partial local-override) at a per-section level. E.g. The Wikipedia article on "Marie Curie" (which only exists in any form in 169 languages today), could have an "Abstract version" shown at a small wiki, but the local editors at that small wiki could be able to override the intro-section and/or the "Life" section, or add a custom "Further reading" section, whilst still retaining the rest of the "abstract version". This should enable readers to get the most possible information (in their language) at any given time, whilst also enabling editors to improve any part of the content at any given time.
- Those sorts of details are yet to be discussed in great depth, partially because we need to determine what is technically possible/feasible, and partially because we need to slowly build up a broader community of participants for this sort of discussion/decision.
- It will be unusually intricate in the sense that components on different wiki projects will be working together to create the outcome of the project. But in the same domains of things we're all already very experienced with.
- We currently expect that the term "Abstract Wikipedia" will disappear in time, and it is really the name of the initiative. We don't expect that there will be, in the end, a component that will be called "Abstract Wikipedia". Or, put differently, the outcome of Abstract Wikipedia will be in the interplay of the different components that already exist and are being developed. But that's also up for discussion, as you can see in the discussion above.
- Lastly, yes, naming things is hard. It's both hard to initially come up with perfectly descriptive and universally non-ambiguous short phrases for things, and even harder to change them once people have started using them!
- I hope that helps you, and I'll emphasize that we would also welcome any help making clear (and concise!) improvements to the main documentation pages to clarify any of this for others! :-) Quiddity (WMF) (talk) 23:00, 21 January 2021 (UTC)
Dear @Quiddity (WMF):, thanks so much that you took the time for your comments! This helps a lot. I am still digesting your explanations, and might have a few follow up questions later. But this helps me to study further. For the moment, Ziko (talk) 14:49, 25 January 2021 (UTC)
Spreadsheets functions
Hello,
I often use spreadsheets and I like Wikifunctions. From my point of view, spreadsheet function syntax is language-independent: most of the function names there, like IF or MID, are translated into many languages, so users can edit the functions in their own language. I see this as a chance for Wikifunctions to get more contributions, because I think editing spreadsheet functions is something that can be done by many more people than more complex programming tasks. If I understand the function model and am able to write functions in Wikifunctions, I can help to create something that brings spreadsheet functions into Wikifunctions. I say this after I tried to convert a spreadsheet function into a program in R; since that worked, I think it can also work in other programming languages. I am not good at programming, so with the skills I have at the moment I think I can only create a tool for converting a spreadsheet function, not a gadget or something that is more integrated into MediaWiki. Please tell me what you think about that. --Hogü-456 (talk) 21:09, 1 February 2021 (UTC)
- @Hogü-456: Thanks for your comment. I am also rather excited about the idea of bringing spreadsheets and Wikifunctions close together. But I am not sure what you are suggesting:
- it should be possible for someone who has the skills to write a formula into a spreadsheet to contribute functions to Wikifunctions
- it should be possible for a spreadsheet user to use functions from Wikifunctions
- I agree with both, but I wanted to make sure I read your comment right. --DVrandecic (WMF) (talk) 23:20, 1 February 2021 (UTC)
- I am suggesting the first thing, and the second thing is also interesting. I don't know if there is an equivalent in spreadsheets for all the arguments in Wikifunctions, so I think it is possible for most things but not for all. --Hogü-456 (talk) 18:53, 2 February 2021 (UTC)
- I agree both are interesting. There is also a third possibility, which is having a spreadsheet implementation of a Wikifunctions function. In spreadsheets, arguments are ranges, arrays, formulas or constants, and different spreadsheet functions have different expectations of the types of argument they work with. For example, in Google Sheets, you can SUM a 2-dimensional range but you can't CONCATENATE it. In Wikifunctions, any such limitations would be a result of the argument's type. In principle, a Wikifunctions function could concatenate a 2D or 3D array, or one with as many dimensions as we care to define support for. GrounderUK (talk) 20:01, 2 February 2021 (UTC)
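A rough sketch of that point, assuming Python for illustration only: a concatenate that accepts an arbitrarily nested list (standing in for a 2D, 3D or deeper range) rather than being limited to a flat one.
# Illustrative only, not Wikifunctions code: concatenate over arbitrarily nested lists.
def concatenate(value):
    if isinstance(value, list):
        return "".join(concatenate(item) for item in value)
    return str(value)

print(concatenate([["a", "b"], ["c", ["d", "e"]]]))  # abcde
print(concatenate([[1, 2], [3, 4]]))                 # 1234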
- Interesting points. We will definitely take a very close look at spreadsheets once the UX person has joined us, in order to learn from how spreadsheets allow many people to program. I think that the mix of guidance by the data and autocomplete is extremely helpful, and we can probably do something akin to that: if we know what the function contributor is trying to get to, and know what they have available, we should have pretty good constraints on what could be done. --DVrandecic (WMF) (talk) 00:38, 4 February 2021 (UTC)
- Any idea when the UX person will be joining?--GrounderUK (talk) 08:08, 3 March 2021 (UTC)
- The process is ongoing, we'll announce it through the weeklies when we know it. --DVrandecic (WMF) (talk) 01:30, 4 March 2021 (UTC)
- I like the concept of using Wikifunctions with an extension for spreadsheets (but not only: a lot of programming languages could have such "pure functions" integrated while being executed off-site, over the cloud, and not just on Wikifunctions, which would just be the repository where functions are found, described and then located, with the evaluator running on Wikimedia servers... or elsewhere, such as Amazon Lambda, Azure Functions, IBM Cloud or other public clouds, or private clouds like Nextcloud). For this to work, I think that Wikifunctions should use a common web API that allows choosing the location where evaluators will run (the application using these functions, the OS running that application, or the user profile using that application should be able to set preferences; a user could have in their profile a set of preferred locations for the evaluators). In summary, what we need is: (1) a repository with a lookup facility; that repository should describe the functions in the same computer language via its interface, and that API will be able to collect sets of locations where these functions can be evaluated; (2) locating a function description and locating an evaluator for it requires resolvers (based on URNs); (3) searching for a function is more or less like a classic search engine, except that it searches in structured metadata, i.e. special documents containing sets of properties, or a database, just like what Wikidata already is; the same occurs with other common repositories, for software updates on Linux with APT, DNF, YUM or EMERGE, for programming environments like CPAN, Node.js or LuaRocks, and also in GitHub with part of its services or with various connectors (Wikifunctions could as well provide a connector for GitHub...).
- So we should separate the repository of function metadata (i.e. Wikifunctions, containing lots of info and other services like talk pages, presentations, logos, and Wikimedia user groups via subscribed WikiProjects...) from the repository of locations/sites where a function can be executed (either because it is installed locally, or because the site offers a way to accept a task to be uploaded there and registered in a cache for later reuse), so that a client (like Wikifunctions itself or any application using Wikifunctions) can submit jobs to be executed on that site to produce a result. Of course this requires security: connecting a new evaluation site has constraints, including legal ones for privacy (calling a function with parameters can easily allow the evaluator to collect private data). As well, we must avoid denial of service: the repository of evaluators should detect sites that do not respond to evaluation requests, or that return corrupted or fake answers. This requires an evaluation of trust and reliability, just like in any distributed computing environment (this is not such a big deal if evaluation sites cannot be freely added but are only added by admins of the locator server for all possible evaluators).
- In summary, separate the description of functions (in its metadata language) from their evaluation (except for a few built-in functions that are resolved locally, possibly running on the same host or only on a specific small set of hosts with secure connections, and that accept delegating execution only to the final client of the function, if that client wants to run the function themselves with their own resources, for example for debugging inside their web browser, with JavaScript or WebAssembly). verdy_p (talk) 01:13, 5 March 2021 (UTC)
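A very rough sketch of the separation described above, with all names, URNs and URLs being hypothetical placeholders (nothing like this exists in Wikifunctions today): a metadata repository lists candidate evaluator locations, and a client picks one according to the caller's preferences.
# Hypothetical metadata repository: function descriptions plus evaluator locations.
FUNCTION_REPOSITORY = {
    "urn:example:function:sum": {
        "signature": "(number, number) -> number",
        "evaluators": ["https://evaluator.example.org", "https://cloud.example.net"],
    }
}

def choose_evaluator(function_urn, preferred_locations):
    candidates = FUNCTION_REPOSITORY[function_urn]["evaluators"]
    for location in preferred_locations:
        if location in candidates:
            return location
    return candidates[0]  # fall back to the repository's first suggestion

print(choose_evaluator("urn:example:function:sum", ["https://cloud.example.net"]))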
There are a few ideas like this Wiki Spreadsheet design, which depend on functions being defined in some shared namespace on a project, or globally. As a result, spreadsheet functions may be a good class to expand to, in addition to those useful on Wikipedia pages. –SJ talk 21:09, 20 April 2021 (UTC)
- @Sj: I've merged your comment up into this existing thread, I hope that's ok. :)
- I'm also reminded of Wikipedia and Wikidata Tools (see TLDR in github), which seems closely related. Quiddity (WMF) (talk) 03:52, 21 April 2021 (UTC)
- @Quiddity: more than ok! Thanks for catching that. Spreadsheets and wikicalc remain the future... and a necessary building block for even more interesting transparent dynamic refactoring. –SJ talk 18:10, 24 April 2021 (UTC)
License
Hello, I don't see much communication about the essential point of license. Since it's a direct continuation of the Wikidata agenda, is it correct that it will impose CC-0 everywhere it will be integrated within the Wikimedia environment? A clear yes/no would be appreciated. --Psychoslave (talk) 22:57, 5 March 2021 (UTC)
- All apologies, I didn't see that this talk page had archives and that this point had thus already been discussed in Talk:Abstract_Wikipedia/Archive_1#License_for_the_code_and_for_the_output and Talk:Abstract_Wikipedia/Archive_2#Google's_involvement. So it looks like I have some homework to do before possibly commenting further. At first glance the answer seems to be "it's not decided yet". --Psychoslave (talk) 23:06, 5 March 2021 (UTC)
Follow up on the earlier topic « License for the code and for the output »
Given my understanding of this, which I'm not sure is correct at all, then the generated text can't be copyrighted, but the data and code to generate the text can be protected. — @Jeblad:
I think this is analogous to the compilation problem: the binary usually does not inherit the compiler's licence; the result depends on the source code licence. See the FSF FAQ on the GPL licence: https://www.gnu.org/licenses/gpl-faq.en.html#GPLOutput "More generally, when a program translates its input into some other form, the copyright status of the output inherits that of the input it was generated from." … it seems the licence of the result depends on the licence of the source data, not the licence of the source code of the functions.
Which could be the lexicographical data and the abstract form of articles, I guess, but at some point the distinction between the code and the data can be somewhat blurry…
Also @Denny, Deryck Chan, Psychoslave, DVrandecic (WMF), and Amire80: TomT0m (talk) 10:53, 7 March 2021 (UTC)
- I agree with the principles laid out above and don't have any specific preference to give. Deryck C. 16:56, 7 March 2021 (UTC)
- Thank you for pinging me TomT0m. Do you have any question on the topic, or specific point on which you would like some feedback?
- For what it's worth, "I am not a lawyer but…", your exposure of the subject seems correct to me. I would simply add with a bit more granularity you might add that :
- it depends on the terms of use of your toolchain. If you use some SaaS that basically says "all your data are belong to us" and, between two cabalistic attorney incantations, says that the end output will be placed "under exclusive ownership of Evil-Corp.™ and we may ask you to pay big money at any time for privileges like accessing it in some undefined future", the terms might well be valid. But to the best of my knowledge, there's no such thing in the Wikimedia compiler toolchains, hurrah,
- you can't legally take some work on which you don't have personal ownership, throw it through some large automaton chain, and claim ownership of the result: you have to make a deal with the owner of each piece of work you took as input. All of them. Or none, if none of them pay attention or care about what you do. Until they do.
- the world is always more complicated (even when you take into account that the world is always more complicated): there is no such thing as "the copyright law". Each jurisdiction has its myriad of laws and exceptions, which may apply over a more or less wide area, with a lot of other local rules which may or may not take precedence over the former, etc.
- But once again, that's just more "focused details", the general exposure you made seems correct to me. --Psychoslave (talk) 16:24, 8 March 2021 (UTC)
Thanks for asking the question! Yes, that's still something we need to figure out. We are working together with Wikimedia's legal department on the options. A decision must be made before the launch, but it will still take us a while to get to something that's ready to be shown to the communities. Thank you for your patience! --DVrandecic (WMF) (talk) 17:54, 8 March 2021 (UTC)
Example Article?
There's a lot of talk about mathematical functions here; there's less talk about functions that produce sentences. Would there be interest in having an "example" article to show what Abstract Wikipedia might look like? Specifically:
- Write a pseudo-code article on a high-profile topic. For example, for simple:Chicago, (Q1297 IS Q515 LOCATED IN Q1204. Q1297 IS Q131079 LIST POSITION [three])
- Evaluate the functions manually to translate it into several languages (ideally including languages with poor coverage, such as ht:Chicago, if we have sufficient language materials to do so). A rough sketch of what that manual evaluation could look like follows this list.
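A hand-waved Python sketch of what such manual evaluation could look like for the first statement; the labels and sentence patterns are simplified assumptions, whereas a real renderer would draw labels from Wikidata and grammar from Wikifunctions.
# Abstract statement from the example above, as plain data.
statement = {"subject": "Q1297", "instance_of": "Q515", "located_in": "Q1204"}

# Simplified, hand-entered labels and sentence patterns (assumptions for illustration).
LABELS = {
    "en": {"Q1297": "Chicago", "Q515": "city", "Q1204": "Illinois"},
    "de": {"Q1297": "Chicago", "Q515": "Stadt", "Q1204": "Illinois"},
}
PATTERNS = {
    "en": "{subject} is a {instance_of} located in {located_in}.",
    "de": "{subject} ist eine {instance_of} in {located_in}.",
}

def render(statement, lang):
    labels = LABELS[lang]
    return PATTERNS[lang].format(**{key: labels[value] for key, value in statement.items()})

print(render(statement, "en"))  # Chicago is a city located in Illinois.
print(render(statement, "de"))  # Chicago ist eine Stadt in Illinois.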
Does this seem like a reasonable thing to do? power~enwiki (talk) 20:08, 11 March 2021 (UTC)
- @Power~enwiki: Hi. Yes that sounds like a good idea. We can create a detailed example of at least one complex paragraph sometime in the next few weeks, which should be enough to extrapolate from. And then anyone could build out further examples for discussion and potential problem-solving.
- For now, we do have a couple of example sentences from an early research paper in Abstract Wikipedia/Examples, and 2 very early mockups at Abstract Wikipedia/Early mockups#Succession (2 sub-sections there) which might help a little. Thanks for asking. Quiddity (WMF) (talk) 23:29, 11 March 2021 (UTC)
A few more questions for the crowd.
- Are we going to try to have each sentence be 1 line, or will these be multiple lines?
- Wikidata is great for nouns. For verbs, it doesn't seem as good. Many concepts ("to destroy", "to fix") don't have encyclopedia articles at all. Are these available in Wikidata somehow? power~enwiki (talk) 22:49, 14 March 2021 (UTC)
- When writing an article, should pronouns be used at all? It seems better to refer to a Q-item (or an ARTICLETOPIC macro), and to have the natural-language generation determine when to replace an explicit subject with a pronoun.
Thoughts? power~enwiki (talk) 22:49, 14 March 2021 (UTC)
- @Power~enwiki I think d:Wikidata:Lexicographical data might help. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 10:34, 15 March 2021 (UTC)
@Power~enwiki: I’m not sure I understand your first question, but...
- I don’t think we should have limits on the length or complexity of sentences. For more complex sentences (like the previous one), a version using simpler sentences should be considered. (Long sentences are fine. Complex sentences are fine.)
- Further to what @1234qwer1234qwer4: said, Wikidata lexemes (may) have a specific property (P5137) linking a sense to a particular Wikidata item. Currently, destroy doesn't, but be links to both existence (Q468777) and being (Q203872), for example.
- I agree that it would generally be better to use unambiguous references. In the translations (natural language renderings), I suggest we might use piped links for this. For example, “Some landmarks in the city are...”. We might also use, for example, <ref>[[:d:Q1297| of Chicago]]</ref> where there is ellipsis, as in “As of 2018, the population[1] is 2,705,994.”
Regarding "1-line" v "multi-line": My question is whether the syntax will look more like SQL or more like JSON (or possibly like LISP). I suppose any can be one-line or multi-line. Regarding verbs: Lexemes are language-specific; it's currently annoying to search Wikidata for them but I don't think they should be in the language-agnostic Abstract Wikipedia text anyhow. power~enwiki (talk) 03:26, 16 March 2021 (UTC)
- @Power~enwiki: Thanks for clarifying. There’s a chicken-and-egg problem here. I don’t think @Quiddity (WMF): is intending to include me in his “we” and I have an open mind on the question of “pseudo-code”. I suggest we [inclusive and impersonal] should avoid something that looks like “labelized Wikifunctions”. This is because “we” should try to “translate” the pseudo-code into that format, as well as into natural languages (and/or Wikitext that looks very like the Wikitext one might find in the Wikipedia in that language). So the pseudo-code should be more like “our” language of thought (labelled “English”, for example) but using an unambiguous identifier for each semantic concept. This means we can have many “pseudo-codes”, one for each natural language we select. Such a pseudo-code should not look too much like a translation into that target language or its Wikitext, of course. Without endorsing its content, I refer interested parties to Metalingo (2003)--GrounderUK (talk) 12:22, 16 March 2021 (UTC)
- I'd like to second the suggestion of a relatively fleshed out example. One paragraph would be fine, but I think it's important to also try rendering the same abstraction into a couple languages other than English (including at least one non-Indo-European language). As I said at Talk:Abstract Wikipedia/Examples, I think it's easy to fall into a trap of coming up with functions that accidentally have baked into them features that are peculiar to the English language, or which at least fail to generalize to some languages.
- Another worthwhile exercise would be taking a couple of the functions used in the example, and coming up with several test cases for those functions, and then seeing whether, in principle, a single renderer for language X could produce cogent results for all of those test cases. For example, maybe there's a function possible(X), and so we come up with test cases like:
possible(infinitive_action_at_location(see(Russia), Alaska))
possible(exists(thing_at_location(Alien life, Mars)))
possible(action(US President, declare(state_of_war_against(China))))
- And it's straightforward (modulo some fine details) to imagine how a renderer for this function could be implemented in English. But there are languages that distinguish grammatically between the possibility that something is true (epistemic modality: either there's life on Mars or there isn't - we don't know which is true, but we have some reason to think there might be) and someone having the ability to do something (the President having the ability/power to declare war against another country, an arbitrary person being able to see Russia from Alaska). So in fact this is a bad abstraction, since a renderer in, say, Turkish needs to distinguish between these senses, and can't do so with the information given. Colin M (talk) 17:09, 18 April 2021 (UTC)
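One possible refinement suggested by that example, sketched here in Python with invented function names and wordings: make the kind of modality explicit in the abstraction, so renderers for languages that grammatically distinguish "might be true" from "is able to" have the information they need.
# Invented names and English wordings, purely to illustrate the distinction above.
def possible(proposition, modality):
    if modality == "epistemic":   # it may be the case that ...
        return f"it may be that {proposition}"
    if modality == "ability":     # someone is able to ...
        return f"it is possible to {proposition}"
    raise ValueError("unknown modality")

print(possible("there is life on Mars", "epistemic"))
print(possible("see Russia from Alaska", "ability"))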
I started fleshing out the start of an article here: Jupiter. Happy if we continue to work on this and try to figure out if we can find common ground. It would be great to do some Wizard of Oz style test and try to render that by hand for some languages. The only non-PIE language I have a tiny grip on is Uzbek - I am sure we can find more competent speakers than me to try this out. --DVrandecic (WMF) (talk) 00:53, 20 April 2021 (UTC)
- This is great. I would suggest linking to it from the main examples page for more visibility (though I understand if you want to do more work on it first). One suggestion I would have for a human-as-renderer test might be to do some replacement of lexemes with dummies, in such a way that the renderer won't be influenced by their domain knowledge: e.g. instead of Superlative(subject=Jupiter, quality=large, class=planet, location constraint=Solar System), something like Superlative(subject=Barack Obama, quality=large, class=tree, location constraint=France).
- In this case, for the renderers to have a hope of getting the right answers, we would probably need to also give them some documentation of the intended semantics of some of these functions and their parameters. But I see that as a feature rather than a bug. Colin M (talk) 17:35, 20 April 2021 (UTC)
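A small sketch of that substitution idea in Python; the constructor is represented as a plain dictionary, which is purely illustrative and not actual Abstract Wikipedia notation:

# A made-up Superlative "constructor" as a plain dictionary; purely illustrative.
original = {"constructor": "Superlative", "subject": "Jupiter", "quality": "large",
            "class": "planet", "location constraint": "Solar System"}

# Dummy lexemes chosen so that domain knowledge cannot help the human renderer.
dummies = {"subject": "Barack Obama", "class": "tree", "location constraint": "France"}

def with_dummies(constructor_call, replacements):
    # Return a copy of the constructor call with the selected arguments replaced.
    return {key: replacements.get(key, value) for key, value in constructor_call.items()}

print(with_dummies(original, dummies))
# {'constructor': 'Superlative', 'subject': 'Barack Obama', 'quality': 'large',
#  'class': 'tree', 'location constraint': 'France'}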
- Yeah, I probably should link to it. I didn't because it is not complete yet, but then again, who knows when I will get to completing it? So, yeah, I will just link to it. Thanks for the suggestion. --DVrandecic (WMF) (talk) 00:10, 21 April 2021 (UTC)
Schemas
I am interested in Wikifunctions and I like the idea of Abstract Wikipedia. For some time I have been creating simple structured sentences with variable parts in spreadsheets. I think one opportunity of Abstract Wikipedia is that it could improve the data quality in Wikidata: if it is known what is important about a topic, and someone notices that something is missing and understands how the text is generated, then they may enter the missing information into Wikidata. For that it is helpful to have clear structures, and forms can help with entering information; at the moment this is not so easy. Have you thought about how to define which content is important about a topic? As far as I know there are shape expressions in Wikidata. --Hogü-456 (talk) 20:35, 22 March 2021 (UTC)
- @Hogü-456: Yes, shape expressions are the way to go, as far as I understand the Wikidata plans. Here's the Wikidata project page on Schemas. My understanding is that Shex can be used both for creating forms as well as for checking data - I would really like to see more work on that. This could then also be used to make sure that certain conditions on an item are fulfilled, and thus we know that we can create a certain text. Fully agreed! --DVrandecic (WMF) (talk) 23:15, 19 April 2021 (UTC)
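As a rough illustration of that last point in plain Python (not ShEx; the property IDs are real Wikidata properties, but the check itself is only a sketch): before rendering a sentence, one could test whether an item carries the statements a constructor needs, and ask editors for whatever is missing.

# Statements a hypothetical "biography start" sentence would need; the P-IDs are real Wikidata properties.
REQUIRED = {"P569": "date of birth", "P19": "place of birth", "P39": "position held"}

def missing_statements(item_claims, required=REQUIRED):
    # item_claims maps property IDs to lists of values, as a simplified stand-in
    # for the statements on a Wikidata item.
    return {pid: label for pid, label in required.items() if not item_claims.get(pid)}

example_item = {"P569": ["1984-01-14"], "P19": []}   # no place of birth, no position held
print(missing_statements(example_item))
# -> {'P19': 'place of birth', 'P39': 'position held'}: prompt the editor for these before rendering.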
Info-Box (German)
Who takes care of the entries in the Info-Box? At least in the German Info-Box the items differ from the real translations. —The preceding unsigned comment was added by Wolfdietmann (talk) 09:32, 15 April 2021 (UTC)
- @Wolfdietmann: Hi. In the navigation-box, there's an icon in the bottom corner that leads to the translation-interface (or here's a direct link). However, that page doesn't list any entries as being "untranslated" or "outdated". Perhaps you mean some of the entries are mis-translated, in which case please do help to correct them! (Past contributions are most easily seen via the history page for Template:Abstract_Wikipedia_navbox/de). I hope that helps. Quiddity (WMF) (talk) 18:33, 15 April 2021 (UTC)
- @Quiddity (WMF): Thanks a lot. Your hint was exactly what I needed.--Wolfdietmann (talk) 09:29, 16 April 2021 (UTC)
Effects of Abstract Text on the Job Market
Hello,
I have thought about what could happen if abstract text works well and is used in many contexts outside the Wikimedia projects. What does that mean for jobs? I asked myself whether abstract text is an innovation that reduces the need for personnel, for example for writing technical instructions. Changes happen and they are not bad in themselves, but I think it is important that there is support for employees who are affected by such a change, to make sure that they have another job afterwards. From my point of view Wikifunctions is an important part here, because it can help people learn the skills they need to do other jobs. From my point of view programming is interesting and can help in many areas. I suggest creating a page with recommendations for potential users of abstract text, to make sure they are aware of the changes this can bring for their employees and that those employees get the knowledge they need. What do you think about that? And what is my responsibility as a volunteer who is interested in this project and plans to participate once it is live? Until now I have read through most of the pages here and tried to understand how it works, to make sure that I do not contribute to increasing unemployment through optimization. --Hogü-456 (talk) 19:55, 17 April 2021 (UTC)
- @Hogü-456: Thank you for this thoughtful comment. In a scenario where Abstract Wikipedia and abstract content are so successful as to make a noticeable dent in the market for technical and other translators, they would also create enormous amounts of value for many people, by expanding potential markets and by making more knowledge available to more people. In such a world, creating and maintaining (or translating natural language texts into) abstract content will become a new, very valuable skill that will create new job opportunities. Imagine people who create a chemistry or economics textbook in abstract content! How awesome would that be, and the potential that this could unlock!
- I actually had an interview with the Tool Box Journal: A Computer Journal For Translation Professionals earlier this year (it is in issue 322, but I cannot find a link to that issue). The interesting part was that, unlike with machine learning tools for translation, the translator that interviewed me really found it interesting, because they, as the translator, could really control the content and the presentation, unlike with machine learned systems. He sounded rather eager and interested in what we are building, and not so much worried about it. I think he also thought that the opportunities - as outlined above - are so big, if all of this works at all. --DVrandecic (WMF) (talk) 21:06, 21 April 2021 (UTC)
Recent spam on https://annotation.wmcloud.org/
A few suspicious accounts have been created today, and started creating spam pages on the wiki (see the recent changes log). TomT0m (talk) 18:20, 18 April 2021 (UTC)
- Thanks for the heads-up. Quiddity (WMF) (talk) 03:53, 21 April 2021 (UTC)
Regular inflection
Hello,
in the last weeks I tried to add the inflections (Beugungen) of German nouns that consist of more than one noun as lexemes in Wikidata. On the overview of the phases I have seen that it is part of the second phase to make it possible to automatically create regular inflections. I created a template as a CSV file that helps me to extract the possible component words out of a longer word. In the template I extract all possible combinations with a length between 3 and 10 characters. After that I check which combinations match against a download of the German noun lexemes and their forms that exist so far, and whether the match reaches the end of the word; for those words I then extract the part before the last component (together with the first character of the last component) and attach to it the existing forms from the lexemes. For that I have a script in R and a spreadsheet. I want to work on a script for creating the forms of a noun, starting with German, at the Wikimedia Hackathon. Has someone created something similar, or have you thought about this for other languages and what the rules are in those languages? (A rough sketch of the idea is shown after this comment.)
At the Wikimedia Remote Hackathon 2021 I proposed a session that is a conversation about how to enable more people to learn coding, also with a focus on the Wikimedia projects, and in the Phabricator ticket for it I added a link to the mission statement of Wikifunctions, since I think this is a project that has the goal of making functions accessible to more people. I have several ideas for functions that I think would be helpful, and I plan to publish them in Wikifunctions if I am able to write them. If you are interested, you can attend the conversation.--Hogü-456 (talk) 21:26, 12 May 2021 (UTC)
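A rough sketch of the idea above in Python rather than R (the splitting rule is simplified, and the forms listed are just ordinary dictionary forms): German compounds normally inflect like their final noun, so once the forms of the last component are known, the forms of the compound can be generated by prefixing the first part.

# Known forms of a simple noun (here "Haus"), as they might come from Wikidata lexemes.
known_forms = {"Haus": ["Haus", "Hauses", "Häuser", "Häusern"]}

def compound_forms(compound, forms=known_forms):
    # Find a known noun that the compound ends with (case-insensitively),
    # and build the compound's forms by reusing the final noun's forms.
    for last, last_forms in forms.items():
        if compound.lower().endswith(last.lower()) and len(compound) > len(last):
            prefix = compound[: -len(last)]
            return [prefix + form.lower() for form in last_forms]
    return None  # no known final component

print(compound_forms("Baumhaus"))
# -> ['Baumhaus', 'Baumhauses', 'Baumhäuser', 'Baumhäusern']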
- @Hogü-456: Thanks! This is exactly the kind of function I hope to see in Wikifunctions. Alas, it is still a bit too early for this year's Hackathon for us to participate properly, i.e. for Wikifunctions to be a target where such functions land. But any such work will be great preparation for what we want to be able to represent in Wikifunctions. We hope to get there later this year, definitely by next year's Hackathon.
- If you were to write your code in Python or JavaScript, then Wikifunctions will soon be ready to accept and execute such functions. I also hope that we will cover R at some point, but currently there is no timeline for that.
- If you are looking for another resource that has already implemented inflections for German, there seem to be some code packages that do so. The one that I look to for inspiration is usually Grammatical Framework. The German dictionary is here: http://www.grammaticalframework.org/~john/rgl-browser/#!german/DictGer - you can find more in through their Resource Grammar Library: http://www.grammaticalframework.org/~john/rgl-browser/
- One way or the other, that's exciting, and I hope that we will be able to incorporate your work once Wikifunctions has reached the point where we can host these functions. Thank you! --DVrandecic (WMF) (talk) 00:43, 14 May 2021 (UTC)
Idea: Fact-id for each fact?
Can each fact statement get its own id? i.e., make a Wiki of Facts along with abstract Wikipedia.
- Each fact should get its own fact-id, so that people can share the id to support claims made in discussions elsewhere, similar to the Z-ids or Q-ids of Wikidata. This proposal requests a fact-id for each statement or fact.
- It will create a structured facts wiki, in which each page about a topic will list a bulleted list of facts, with each fact having its own id. References are added to support the claims.
- It presents facts directly, without hiding them in verbal prose. Cut to the chase!
- Example 1: Each fact statement of a list like Abstract Wikipedia/Examples/Jupiter will get its own id. Example 2: A page will list facts like w:List of common misconceptions with each fact in it getting its own id.
- This wiki will become the go-to site to learn, find, link, support and verify facts/statements/claims. And over time, Wiki of Facts will have more reliability and credibility than any other format of knowledge. (elaborated at WikiFacts) -Vis M (talk) 04:54, 27 May 2021 (UTC)
- @Vis M:I see you have withdrawn the WikiFacts proposal. The Abstract Wikipedia Wiki of Facts would be Wikidata, with each Wikidata statement or claim corresponding to a "fact". There are occasions when an explicit statement ID would be helpful in Wikidata, notably when a different Property is used to represent the same information. However, the Item/Property combination used in Wikidata is aligned to the classical subject—predicate form, which is probably a better starting point for natural-language representations. In particular, it is likely that similar predicates (represented using identical Property IDs) would be rendered in similar language for identifiable classes of subject. Allocating individual subjects (Wikidata Items) to appropriate classes for this purpose is also likely to be achieved within Wikidata, either with an additional Property or within lexical data (according to the factors driving the different linguistic representations).--GrounderUK (talk) 09:17, 19 June 2021 (UTC)
- Ok, thanks! Vis M (talk) 14:46, 22 June 2021 (UTC)
Wikifunctions
What will be the URL for the upcoming Wikifunctions website? When is it expected to be completed? 54nd60x (talk) 12:55, 17 August 2021 (UTC)
- @54nd60x: The URL will be wikifunctions.org (similar to wikidata.org).
- Timeline: Overall, "we'll launch when we're ready". We're hoping to have a version on the mw:Beta Cluster sometime during the next few months, and hoping the "production" version will be ready for initial launch sometime early in the next calendar year. It won't be "feature-complete" at launch, but it will be: stable for adding the initial functions content; locally-usable for writing and running functions; and ready for further development. The design and feature-set will steadily grow and adapt based on planned work and on feedback, over the coming years. More broad details on the development are at Abstract Wikipedia/Phases, and finer details at the links within. I hope that helps! Quiddity (WMF) (talk) 19:51, 18 August 2021 (UTC)
Decompilation
I am still working with spreadsheet functions, trying to understand them and writing functions to bring them into another programming language. In the last days I have thought about how far it is possible to get at the specific program that is generated for the input I give by entering functions in a spreadsheet, so that the output is generated and printed out in a cell. I want the program as byte code or assembler or in another notation from which it is possible to bring it to other programming languages in an automatic way. Do you think it is possible to turn spreadsheet functions into a program in another programming language by trying to get the binary code and then decompiling it? I do not know much about programming, so I do not know whether this is possible. Does someone of you have experience with decompiling, and with how to get the byte code that is executed when I enter my own composition of functions in my spreadsheet program?--Hogü-456 (talk) 18:01, 8 July 2021 (UTC)
- @Hogü-456: There is a lot of "it depends" in this answer :) If you search the Web for "turn spreadsheet into code" you can find plenty of tools and descriptions helping with that. I have my doubts about these, to be honest. I think, turning them into binary code or assembler would probably be an unnecessarily difficult path.
- My suggestion would be to turn each individual formula into a function, and then wire the inputs and outputs of the functions together according to the cell references. That's probably what these tools do. But I would try to stay at least at the abstraction level of the spreadsheet, and not dive into the binary. (A small sketch of this idea follows below.)
- It will be interesting to see how this will play together with Wikifunctions. I was thinking about using functions from Wikifunctions in a spreadsheet - but not the other way around, using a spreadsheet to implement functions in Wikifunctions. That's an interesting idea, that could potentially open the path for some people to contribute implementations?
- I filed a task to keep your idea in the tracker! --DVrandecic (WMF) (talk) 21:13, 3 September 2021 (UTC)
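A very small sketch of that suggestion in Python (the cell names, formulas and values are invented): each formula becomes a function, and the dependencies between cells determine the order of evaluation.

# Each spreadsheet cell becomes a function of the already-computed cells; this mirrors a tiny
# sheet where C1 = A1 + B1 and D1 = C1 * 2 (all values invented for illustration).
cells = {"A1": lambda v: 3,
         "B1": lambda v: 4,
         "C1": lambda v: v["A1"] + v["B1"],
         "D1": lambda v: v["C1"] * 2}

def evaluate(cells, order):
    # Evaluate the cells in dependency order, passing the computed values along.
    values = {}
    for name in order:
        values[name] = cells[name](values)
    return values

print(evaluate(cells, ["A1", "B1", "C1", "D1"]))
# -> {'A1': 3, 'B1': 4, 'C1': 7, 'D1': 14}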
Logo of Wikifunctions and License
Hello, when will the logo for Wikifunctions be finalized and then published? There was a vote on the favourite logo and since then I have not heard anything about it. Something I am also interested in is what the license of the published functions will be. Do you know what license will be used for the content in Wikifunctions? Have you talked with the legal team of the Wikimedia Foundation about what they think about it? Please try to give an answer to these questions, since it is now a while since the discussions about these topics.--Hogü-456 (talk) 19:49, 23 August 2021 (UTC)
- @Hogü-456: We are working on both, sorry for the delay! We had some meetings with the legal team, and we are still working on the possible options regarding the license. That is definitely a conversation or decision (depending on how the discussion with legal progresses) we still owe to the community. This will likely take two or three months before we get to the next public step.
- Regarding the logo, we ran into some complications. It is currently moving forward, but will still take a few weeks (but not months, this should be there sooner).
- Thanks for inquiring! We'll keep you posted on both as soon as we have updates! --DVrandecic (WMF) (talk) 20:45, 3 September 2021 (UTC)
Why was this approved so quickly?
Wikispore, Wikijournal, and Wikigenealogy all have potential. Wikifunctions is just another project that will sit near-empty for years like Wikispecies has, and should be part of Wikidata anyway. 2001:569:BD7D:6E00:F9E8:8F6F:25D1:B825 01:38, 19 June 2021 (UTC)
- I kind of agree. If lexemes were just added as a separate namespace to Wikidata, why weren't functions? User:1234qwer1234qwer4 (talk) 11:41, 19 June 2021 (UTC)
- From my point of view, one difference between Wikifunctions and, for example, lexemes is that Wikifunctions will offer computation resources, so that some of the calculations can be made on the platform directly and it is not necessary to run a function locally. As far as I understand, it can also be used to centralize templates that are currently defined locally in the different language versions of Wikipedia and the other Wikimedia projects. So there is perhaps a technical reason why Wikifunctions is its own sister project.
- Wikidata has a lot of content, so I think it can happen that it is not so easy for a user to find something. I sometimes have problems finding lexemes, because I need to change the search so that it also covers the Lexeme namespace. So I hope that it will be easier, also for external users, to find the content if Wikifunctions is its own project. Making sure that it will not sit near-empty for years is, from my point of view, a big challenge. As far as I understand, the project was approved because there is the hope that it can help make knowledge accessible in more languages. I do not know whether this works also for small languages, but I hope that it will, and I think it is important to work on it over the next years so that it can become reality. How to make Wikifunctions accessible to many people is an important question, and I hope that there are more discussions about it in the next weeks. Wikimania this year is a chance to talk about Wikifunctions also with people who speak small languages.--Hogü-456 (talk) 19:54, 20 June 2021 (UTC)
- Yes, as Hogü-456 describes, those are among the reasons why the project is better as a separate wiki, and some of the goals of the project. They require very different types of software back-ends and front-ends to Wikidata, and a significantly different skillset among some of the primary contributors. I might describe it as: Wikidata is focused on storing and serving certain types of structured data, whereas Wikifunctions is focused on running calculations on structured data. There are more overview details in Abstract Wikipedia/Overview that you might find helpful. Quiddity (WMF) (talk) 22:19, 21 June 2021 (UTC)
- @Hogü-456: Re: problems searching for Lexemes on Wikidata - If you prefix your searches there with L: then that will search only the Lexeme namespace. E.g. L:apple. :) Quiddity (WMF) (talk) 22:23, 21 June 2021 (UTC)
- this was asked and answered in Talk:Abstract_Wikipedia/Archive_2#How_was_it_approved and Talk:Abstract_Wikipedia/Archive_2#A_few_questions_and_concerns. --QDinar (talk) 13:21, 17 September 2021 (UTC)
- see also Talk:Abstract_Wikipedia/Archive_2#Google's_involvement and Talk:Abstract_Wikipedia/Archive_2#Confusion, these are suspicions about google's involvement, and answers. --QDinar (talk) 21:10, 18 September 2021 (UTC)
you are just creating a new language, like any of the existing natural languages. this is wrong.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia :
In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way. A particular language Wikipedia can translate this language-independent article into its language.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples :
Article(
  content: [
    Instantiation(
      instance: San Francisco (Q62),
      class: Object_with_modifier_and_of(
        object: center,
        modifier: And_modifier( conjuncts: [cultural, commercial, financial] ),
        of: Northern California (Q1066807)
      )
    ),
    Ranking(
      subject: San Francisco (Q62),
      rank: 4,
      object: city (Q515),
      by: population (Q1613416),
      local_constraint: California (Q99),
      after: [Los Angeles (Q65), San Diego (Q16552), San Jose (Q16553)]
    )
  ]
)
English: San Francisco is the cultural, commercial, and financial center of Northern California. It is the fourth-most populous city in California, after Los Angeles, San Diego and San Jose.
you are creating a new language just like the existing natural languages. and it is worse than the existing natural languages, because it is going to have many functions like "Object_with_modifier_and_of", "Instantiation", "Ranking".
an advantageous feature, compared to natural languages, shown there, is that you link concepts to wikidata, but that can also be done with natural languages. another good thing here is the structure shown with parentheses, but that also can be done with natural languages. so, there is nothing better in this proposed (artificial) language, compared to natural languages.
i think that, probably, any sentence of any natural language is semantically a binary tree like this:
( ( (San Francisco) ( be ( the ( ( ( ( (culture al) ( , (commerce ial) ) ) ( , ( and (finance ial) ) ) ) center ) ( of ( (North ern) California ) ) ) ) ) ) s ) . ( ( it ( be ( ( the ( (four th) ( ( (much est) (populous city) ) (in California) ) ) ) ( , ( after ( ( (Los Angeles) ( , (San Diego) ) ) ( and (San Jose) ) ) ) ) ) ) ) s ) .
some parts of text can be shown in several ways as binary trees. for example:
((San Francisco) ((be X) s))
(((San Francisco) (be X)) s)
fourth ( ( most (populous city) ) (in California) )
( fourth most ) ( (populous city) (in California) )
( fourth ( ( most (populous city) ) ) ) (in California)
fourth ( ( ( most populous ) city ) (in California) )
--QDinar (talk) 12:31, 17 August 2021 (UTC) (last edited 06:44, 6 September 2021 (UTC))
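As a small illustration of what such a binary structure could look like as data (Python, using nested pairs; the trivial flattening shown is not a rendering proposal, just a way to read the tree back out):

# The fragment "(the ((large est) planet))" as nested pairs.
tree = ("the", (("large", "est"), "planet"))

def linearise(node):
    # Flatten a binary tree into a space-separated string; a real renderer for a given
    # language would instead apply that language's word order and morphology here.
    if isinstance(node, tuple):
        left, right = node
        return f"{linearise(left)} {linearise(right)}"
    return node

print(linearise(tree))   # -> "the large est planet"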
creating a new language is a huge effort, and only few people are going to know it. you have to discuss the different limits of every word in it to come to some consensus... and all that work is just to create yet another language, no better by its structure than the existing thousands of natural languages (its lexicon can be bigger than that of some languages). what you should do instead is just use a format like this for every language and use functions to transform it to the usual form of that language. also, speech synthesis can be done better using the parentheses. also you can transform these formats from language to language. --QDinar (talk) 12:55, 17 August 2021 (UTC)
i think any paragraph, probably, can also be structured into a binary tree, like this, and i make a tree of a mediawiki discussion signature, for the purpose of demonstrating the binary tree concept:
( ( ( creating a new language is a huge effort, and only few people are going to know it. you have to discuss different limits of every word in it to come to some consensus... ) ( and all that work is just to create just another language no better, by its structure, than existing thousands natural languages. (lexicon can be bigger than of some languages). ) ) ( ( ( what you should do instead is just use a format like this for every language and use functions to transform it to usual form of that language. also, speach synthesis can be done better using the parentheses. ) also you can transform this formats from language to language. ) ) ) ( -- ( ( QDinar ("()" talk) ) ( ( 12 (: 55) ) ( , ( ( (17 August) 2021 ) ( "()" ( (U T) C ) ) ) ) ) ) )
(the regular sentences are intentionally not structured into binary trees in this example). this structure can be useful to better connect sentences via pronouns. and different languages may have different limits and preferences in using one sentence vs. several sentences with pronouns. these parentheses may help to (properly) translate those places into other languages. --QDinar (talk) 13:22, 17 August 2021 (UTC)
since this seems to be out of the scope of the Abstract Wikipedia project, i have submitted it as a project: Structured text. --QDinar (talk) 19:20, 18 August 2021 (UTC)
@Qdinar: Yes, you're right, in a way we are creating a new language. With the difference, that we are creating it together and that we are creating tools to work with that language. But no, it is not a natural language, it is a formal language. Natural languages are very hard to parse (the structures that you put into your examples, all these parentheses, were done with a lot of intelligence on your part). The other thing is that a lot of words in natural languages are ambiguous, which makes them hard to translate. The "sprengen" in "Ich sprenge den Rasen" is a very different "sprengen" than the one in "Ich sprenge die Party". That's why we think we need to work with a formal language, in order to avoid these issues. I don't think that you could use a natural language as the starting point for this (although I have a grammar of Ithkuil on my table right now, and that might be an interesting candidate. One could argue whether that's natural, though). --DVrandecic (WMF) (talk) 20:59, 3 September 2021 (UTC)
- "Natural languages are very hard to parse (the structures that you put into your examples, all these parentheses, were done with a lot of intelligence on your part)." - i do not agree with this. to write this i just need to know this language and write, that is english and i already know it. binary tree editor could help to make this tree faster. i have also added request for binary tree tool in the Structured text, and additional explanations. in comparison, to write in your language, i need to learn that your language. and this binary tree structure is easier to parse than your complicated language. if you say about parsing from traditional text, then, it is possible to do it, and there is almost zero texts in your language yet. --QDinar (talk) 08:49, 4 September 2021 (UTC)
- "and this binary tree structure is easier to parse than your complicated language" - probably, easiest way to parse this your new language is to parse it also into a binary tree form first, just like with natural languages, and then parse that binary tree. --QDinar (talk) 09:19, 4 September 2021 (UTC)
- probably you are going to use a tree, not binary, while parsing your code. i think, there is one advantageous thing: list shown as list, instead of binary tree, is more easier to read by human, and seems a little easier to compute for computer. but that, lists for (ordered and unordered) lists, can be added also to this binary tree idea. --QDinar (talk) 16:45, 4 September 2021 (UTC)
- "and this binary tree structure is easier to parse than your complicated language" - probably, easiest way to parse this your new language is to parse it also into a binary tree form first, just like with natural languages, and then parse that binary tree. --QDinar (talk) 09:19, 4 September 2021 (UTC)
- "The other thing is that a lot of words in natural languages are ambiguous, which makes them hard to translate." - probably, your artificial language is also going to have some ambiguities. because get every meaning and you can divide that meaning into several cases. "The "sprengen" in "Ich sprenge den Rasen" is a very different "sprengen" than the one in "Ich sprenge die Party"." - this example has not been useful example to prove to me. in both cases it is about scattering something, is not it? if somebody causes a party to cancel before even 10% of its people have known about it is going to be held, is this word used? i suspect, it is not used in that case. --QDinar (talk) 09:11, 4 September 2021 (UTC)
- and, it is possible to refer to submeanings of words with natural languages, like sprenge1, sprenge2, (maybe using meanings which shown in wiktionary). --QDinar (talk) 12:35, 4 September 2021 (UTC)
- " although I have a grammar of Ithkuil on my table right now, and that might be an interesting candidate. One could argue whether that's natural, though " - according to wikipedia, it is a constructed language with no users. (ie artificial language). that artificial languages have few users. probably ithkuil have some problems, like limits of meaning are not established. since it has 0 users, when new people start to use it, they are going to change that limits. --QDinar (talk) 09:11, 4 September 2021 (UTC)
i saw that the aim of your proposal is to write something once and to get it in multiple languages. with structured text used for all languages that is also possible, because one wikipedia can get a structure from another wikipedia. and, it seems, it can also solve the "uzbek uncle" problem (they say there are different words for a mother's brothers and for a father's): if there are several languages with such "uncle"s, they can reuse structures, one wiki from another wiki. --QDinar (talk) 20:20, 19 September 2021 (UTC)
- they already solved that "uzbek uncle" problem in the forum: user:ArthurPSmith said "If something is unknown then a fall-back construction ("brother of a parent" rather than a specific sort of "uncle") should be fine" in Talk:Abstract_Wikipedia/Archive_3#Comments, and user:DVrandecic (WMF) and user:Chris.Cooley agreed. and, i want to say, another way to encode it is "paternal uncle or maternal uncle". --QDinar (talk) 15:30, 27 September 2021 (UTC)
the examples in Abstract Wikipedia/Examples/Jupiter do not have complex things like "Object_with_modifier_and_of"; for example, "and" is shown as a single element. that complex thing is also discussed in Talk:Abstract_Wikipedia/Examples#Object_with_modifier_and_of and a version with more separated functions ("constructors") is proposed. so, if you go in that direction, you are going to almost copy the english language, that is, move in the direction of my proposition. --QDinar (talk) 20:20, 19 September 2021 (UTC)
- also, i have seen somewhere, it seems, an example of how rendering works (or maybe i saw it here: "<person> is (notable for) <role in activity>, rather than instance of (P31) human (Q5)...", by user:GrounderUK; i am not sure that he is talking about structure, maybe just about a template, meaning a string with places for arguments; it is also clearly explained as templates in an early proposal). in the process of rendering a text in a natural language, you go through a phase with the structure of that language, and i think that is unavoidable. so, structures like the ones i proposed are going to be used anyway, but you might use a more traditional approach with grammatical parts of speech, cases, gender, etc., and i think i propose a very simple and universal grammar... and you are (or were) going to have it inside wikifunctions, on the fly, or maybe cached; by my proposition, those structures should stay with the languages' wikis. having those structures in the respective wikis is (must be) very useful. --QDinar (talk) 21:13, 19 September 2021 (UTC), last edited 19 sep 2021, 21:49 utc.
additional arguments against the "Abstract Wikipedia" proposal:
1. languages have equal rights, but you are, or were, going to put one language in a governing position. putting all languages in an equal-rights position should be a principle that is actually carried out.
2. developing a new language does not seem to be within the scope of the aims of wikimedia.
3. it seems, as i saw from the talk archives, that you hope you can just divide meanings as finely as is needed to make them usable for all languages. (by the way, i have just seen that the 2nd meaning of "abstract" in wiktionary is "Something that concentrates in itself the qualities of a larger item, or multiple items.") but i think just division is not enough; the borders of meanings are shifted a little in every natural language compared to other languages. so it is not going to be possible to easily and perfectly translate from the principal language to other languages with only that amount of precision in the principal language. and i want to say: lexemes are like stones or bricks of different shapes, and when you tell an idea, the idea is like a castle. you can build castles of almost the same form with different sets of bricks and stones, and every language is like a different set of bricks and stones. you need very small bricks to be able to imitate, precisely enough, the castles built with different stones, down to rebuilding their every stone. (for example, the additional data like "subject", "quality", "class", "location constraint" in the coding of "Jupiter is the largest planet in the Solar System." in the "jupiter" example probably still does not give enough precision for that goal.)
4. esperanto is the most widely used constructed language; according to wikipedia, it was created in 1887 and now has nearly 100,000 speakers. that is still few compared to some natural languages. your language is still a different language, not english, by its structure, even as shown in the "jupiter" example (in the "san francisco" example it is more different). the number of its speakers will probably also grow slowly, and even more slowly if it is going to be as enormously precise, compared to natural languages, as i explained in the previous paragraph.
--QDinar (talk) 00:53, 20 September 2021 (UTC) last edited 05:30, 20 September 2021 (UTC)
- 3. even a language with so many tiny lexemes would not be enough to be easily translatable to natural languages, because when an idea is built, it is like one castle from one set of stones; to show other language versions, other castles have to be built. --QDinar (talk) 04:50, 20 September 2021 (UTC)
- such an abstract language could be used to describe an idea or reality more exactly/precisely, minimizing the ambiguities introduced by language, but anyway, when that is translated into normal languages, they have to use their own lexemes.--QDinar (talk) 09:42, 21 September 2021 (UTC)
- 3. even human translators do not make perfect translations, but the best possible translations. if they tried to deliver the exact meaning of the original text, they would have to make a long translation, like explaining every lexeme. --QDinar (talk) 09:42, 21 September 2021 (UTC)
- 3. "borders of meanings are a little shifted in every natural language compared to other languages" - i wanted to find examples for this. i clicked "random article" in english wikipedia and tried to translate paragraphs with google translate.
- the first result is w:Bronchogenic cyst, and when i translated the second paragraph i saw this: "young adults" is translated as "молодых людей" (nominative "молодые люди"). this is a correct translation, by google. a rule-based machine translation could not translate it correctly: if you look at wikt:adult#Translations it is translated as "vzrosliy", and the translation of "young" is "molodoy", so it would translate it as "molodoy vzrosliy". that is not the usual way russian people speak, though it is not logically incorrect. as i saw in the talk archives, such translations are ok for you.
- the next problem is found in the same paragraph. "difficulty breathing or swallowing" is correctly translated by google as "zatrudnyonnoye dykhaniye ili glotaniye", which is not a direct translation; it is "difficulted breathing or swallowing" by its structure, and this is the way russian people speak. rule-based mt would translate it as "trudnost [vo vremya] dykhaniya ili glotaniya" (the part in square brackets is optional) or "trudnost pri dykhaniyi ili glotaniyi", and those would not be incorrect.
- these 2 examples are very small mistakes. maybe these are the cases i felt to be like stones of slightly different shape. i also tried to find similar mistakes that lead to serious mistranslation, but have not found any yet. and it seems that, if the way of translating i suggested is used, these things could also be translated with similar quality, while not creating a new language, and with many ready texts, much more easily usable. your project does not only need to be worthy in itself; it has to compete with alternative potential solutions. while searching for the examples, i got the idea of challenging you with some sentences, to see how you can encode them into the abstract language, and how much easier and better translatable that code will look, compared to the method i suggested. if the method proposed by me can also give translations of the same quality, why bother with that encoding process?
- --QDinar (talk) 21:18, 27 September 2021 (UTC)
- 3. also, "young" can be translated with 3 words into russian, according to wiktionary. "molodoy", "yuniy", "mladoy". i said about problems beyond division of meanings. but that division also mean, automatically, that in some cases the 2nd, or third word maybe more appropriate, and the first word, like have slightly shifted, soft border in that place. --QDinar (talk) 18:04, 4 October 2021 (UTC)
- 3. this argument is also against my proposal of "structured text". that method also will give slight mistranslations. but, the structured text method can be more easily used with ML technique, because such structures are easier to build from natural language, and then that texts can be used to train ML, then further building of that stuctures is going to be even more easier, just editing ML results. --QDinar (talk) 20:45, 4 October 2021 (UTC), edited 15:33, 6 October 2021 (UTC)
- 2. replied in #maybeworthy and #why_not_to_use_a_ready_"abstract_language"?.--QDinar (talk) 20:45, 4 October 2021 (UTC)
You have raised a lot of different points, and many arguments that I fully agree with. But given the structure of your replies, it is hard to answer them. Thanks for breaking a few of the questions out in answerable chunks below.
In the end, it is basically just templates that are being filled, and that work across languages. Let's stick with the Superlative constructor we discuss below. With these four arguments one can create a lot of sentences - and all of them seem to be quite easily expressed in many different languages. May I propose we focus on this single example, in order to crystallize your criticism? --DVrandecic (WMF) (talk) 23:32, 24 September 2021 (UTC)
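To make "templates that are being filled" concrete, a minimal Python sketch of an English template for the Superlative constructor (only the happy path: forming the superlative by appending "st" only works for adjectives ending in "e", and real renderers would need proper morphology and agreement):

def render_superlative_english(subject, quality, cls, location):
    # Naive English template for Superlative(subject, quality, class, location constraint).
    return f"{subject} is the {quality}st {cls} in the {location}."

print(render_superlative_english("Jupiter", "large", "planet", "Solar System"))
# -> "Jupiter is the largest planet in the Solar System."

Languages with grammatical gender, case or agreement would need more than string substitution here, which is part of what the replies below discuss.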
- how are you going to encode sentences like "Jupiter and Saturn are the two largest planets in the Solar System." and "Hydrogen is the smallest atom"? i.e. with 1 more argument or 1 less argument? (according to example 4.3, "two" is going to be an additional argument...) are you going to encode them with a different zid, or use the same function with the possibility of empty arguments? --QDinar (talk) 21:08, 26 September 2021 (UTC), edited at 21:20
- how can you say it is just templates, if in languages with genders it has to be rendered differently depending on the gender of the arguments? --QDinar (talk) 21:12, 26 September 2021 (UTC)
- and how can you say it is just a template, when, if you put constructions as arguments, like "planetary systems in Local Bubble" instead of just "Solar System", renderers of some languages may need to detect the head of the construction, "system" or the plural "s", to add suffixes to it? --QDinar (talk) 15:56, 27 September 2021 (UTC), edited 18:47, 27 September 2021 (UTC)
- i think the head there is "s", its meaning being "multiple objects of the type given in the subordinate". --QDinar (talk) 15:41, 6 October 2021 (UTC)
- main meaning of the "Superlative" example code is simple: it is superlative. arguments maybe anything. the superlative function corresponds to "est" lexeme of english, (or, alternatively, they can be divided into 2 lexemes: "est" and "most"). problem of the "superlative" code itself is as much (or little) as of the "est" lexeme. seems meaning of "est" is simple and it must be very similar across languages. --QDinar (talk) 17:46, 4 October 2021 (UTC)
- seems, you do not know exactly yet how to use that "superlative" function. in the Abstract_Wikipedia/Examples/Jupiter#Sentence_4.3 example you have put subject out of the function's argument list. --QDinar (talk) 17:50, 4 October 2021 (UTC)
- you are going to make users to select from variants like "location constraint", "time constraint", "category constraint". there maybe also just "constraint". and seems it can be left unused. while in the alternative i proposed that is not planned. the location constraint is just added outside of "((large est) planet)". the info of that it is constraint of "est" is inside the "est" and in "in", and how they are put together. and that info is even not needed in most cases, just translate "est" and "in" to corresponding lexemes of target language, and then after generating linear text it can look ok, acceptable. --QDinar (talk) 18:39, 4 October 2021 (UTC)
an additional argument against the "Abstract Wikipedia" proposal: you are hoping to create something better than the natural languages that have evolved for thousands of years. (that looks/sounds impossible to achieve; though you may potentially outcompete them in some aspects / by some criteria). --QDinar (talk) 20:56, 4 October 2021 (UTC)
are sentences "z-objects"?
are sentences planned to be "z-objects"? --QDinar (talk) 13:02, 17 September 2021 (UTC)
- Yes, a sentence would be represented as a Z-Object. E.g. "Jupiter is the largest planet in the Solar System." would be represented as Superlative(subject: Jupiter, quality: large, class: planet, location constraint: Solar System) (and all of that would be ZIDs, so in reality maybe something like Z19349(Q319, Z29393, Q634, Q544)). -- DVrandecic (WMF) (talk) 22:42, 24 September 2021 (UTC)
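As an illustration of that reply, the labeled form and the ID form side by side as plain Python dictionaries (the Z-IDs are the placeholder ones used above, and this nesting is not the actual ZObject JSON syntax):

# Labeled form and ID form of the same sentence; the Z-IDs are placeholders from the reply above.
labeled = {"constructor": "Superlative", "subject": "Jupiter", "quality": "large",
           "class": "planet", "location constraint": "Solar System"}
with_ids = {"constructor": "Z19349", "subject": "Q319", "quality": "Z29393",
            "class": "Q634", "location constraint": "Q544"}

# A per-language label table could turn the ID form back into a readable form for display.
labels_en = {"Z19349": "Superlative", "Q319": "Jupiter", "Z29393": "large",
             "Q634": "planet", "Q544": "Solar System"}
print({key: labels_en.get(value, value) for key, value in with_ids.items()})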
will programming languages be converted to own language first?
is wikifunctions planned to have its own language? is it something like lambda calculus? are all other supported languages planned to be converted to that own language first, before being interpreted? --QDinar (talk) 13:02, 17 September 2021 (UTC)
if there is an own language, i would like to see some small example code in it, like for fibonacci sequence. --QDinar (talk) 18:02, 19 September 2021 (UTC)
- No. Code written in Python will be evaluated by a Python interpreter, code in JavaScript by a JavaScript interpreter, Code in C by a C compiler and then executed, etc. What little system we have to compose such function calls together will be on top of code in such programming languages, not a common runtime we convert everything to first. -- DVrandecic (WMF) (talk) 22:46, 24 September 2021 (UTC)
- i have not seen that this is clearly written in the description pages. please, write, and/or link to that place, from here. --QDinar (talk) 19:48, 4 October 2021 (UTC)
- I hoped that would be visible either from the description of the tasks, or the description of the phases. We were always planning to support native code in an existing programming language. --DVrandecic (WMF) (talk) 00:42, 24 November 2021 (UTC)
what api will be used?
is an http web api planned for wikifunctions? are the functions planned to be called through a web api or in another way? --QDinar (talk) 13:02, 17 September 2021 (UTC)
- We already offer a Web API to call functions from Wikifunctions, see https://notwikilambda.toolforge.org/w/api.php?action=help&modules=wikilambda_function_call. We might not always use the API for our internal use cases (e.g. a Wikipedia calling a function might not go through an HTTP request); it depends on what is efficient. -- DVrandecic (WMF) (talk) 22:49, 24 September 2021 (UTC)
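A sketch of calling that API over HTTP with Python's requests library; the action name comes from the help page linked above, but the payload parameter used here is an assumption for illustration, so check the linked module help for the real parameter list.

import requests

# Endpoint of the test wiki mentioned above; the action name is taken from the linked help page.
API = "https://notwikilambda.toolforge.org/w/api.php"

# The name and encoding of the payload parameter below are assumptions, not confirmed;
# a JSON-encoded ZObject describing the function call would go in its value.
params = {"action": "wikilambda_function_call", "format": "json",
          "function_call": "{...}"}

response = requests.get(API, params=params)
print(response.json())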
why not to use ready programming language implementations?
what do you think about the idea of just using the usual programming language interpreters? i.e. the code can be in a wiki page, and it can be run. some dangerous functions can be removed or turned off in order to protect against hacking/vandalism. --QDinar (talk) 13:02, 17 September 2021 (UTC)
- Great idea - and we do that! Python code is run by the standard Python implementation, JavaScript by Node, etc. The way we hope to avoid dangerous functions is by running them in their own containers with limited resources and no access to the outside world. The architecture is described here. -- DVrandecic (WMF) (talk) 22:51, 24 September 2021 (UTC)
what are z-ids?
what is origin of "z" letter in z-ids? are there already z-ids in wikidata? as i understood, z-ids just replace multiple natural language strings, is it so? if it is so, why function names like "Object_with_modifier_and_of" also not replaced with them? the code in right block in https://notwikilambda.toolforge.org/wiki/Z10104 is hard to understand. are the z-codes in it planned to be replaced with natural language strings? --QDinar (talk) 13:02, 17 September 2021 (UTC)
- You are totally right! "Object_with_modifier_and_of" is just the English name of the function, in reality it is identified by a ZID, and it will have a different name in Arabic, and in Russian, and in German, and in Tatar, etc. That is true for all of our functions. The "and" function is called "i" in Croatian, but the ZID in Notwikilambda is Z10026. In the User Interface though, we will try to hide all ZIDs, and instead display the names in your language. -- DVrandecic (WMF) (talk) 22:55, 24 September 2021 (UTC)
- there are 5 questions in the main text (paragraph, body) part of my question. you answered the third and the 4th. what can you say about the others? --QDinar (talk) 19:23, 4 October 2021 (UTC)
- There are no Z-IDs in Wikidata yet. --DVrandecic (WMF) (talk) 00:43, 24 November 2021 (UTC)
response to an old reply (in the archives) about ML
in Talk:Abstract_Wikipedia/Archive_2#Wikidata_VS_Wikipedia_-_General_structure user:DVrandecic (WMF) said:
1. "we don't have automatic translation for many languages"
2. "The hope is that Abstract Wikipedia will generate content of a consistently high quality that it can be incorporated by the local Wikipedias without the necessity to check each individual content. .... translation doesn't help with updates. If the world changes, and the English Wikipedia article gets updated, there is nothing that keeps the local translation current."
i want to say that:
1. i saw news saying that yandex has developed machine translation for the bashkir language, using the tatar language, because these languages are very similar and there is more content in tatar. (so more languages may appear, in ML, using such techniques.)
2.
i doubt that the way you are going to use will provide more stable results than ML. users will constantly edit the functions, renderers, constructors and the abstract code, and probably something is going to break too. so an idea has just come to my mind: when editing a renderer, if all the cases using that renderer are linked to it, the editor may check all the use cases before applying changes... if those use cases are not too many...
it is also possible to make automatic ML updates easier to check: after one or several updates, a user could be notified about them and given an easy-to-read page showing the differences made in the original page and the differences that are going to be made in the translation.
though, there is a stronger argument against ML in this forum in Talk:Abstract_Wikipedia/Archive_3#Might_deep_learning-based_NLP_be_more_practical? by user:Stevenliuyi: "a ML-based system will make sentences more fluent, it could potentially turn a true statement into a false one".
--QDinar (talk) 23:11, 19 September 2021 (UTC)
- I very much look forward and hope that more high quality machine translation will become available for everyone. I think there's a window of opportunity where Wikifunctions / Abstract Wikipedia will provide knowledge in high quality in languages where machine translation will not yet.
- I like the idea of showing the results when editing. Our first implementation of that idea is to do that with the testers. But as the system develops, we will probably be able to use more of the system to let the contributors understand the impact of their edits. -- DVrandecic (WMF) (talk) 23:16, 24 September 2021 (UTC)
Boilerplate functions
From my point of view Scratch is a good example of a low-code platform. It is easy to create a program on it. In Scratch there are boilerplate blocks with gaps, and it is possible with drag and drop to take the different parts, which have different looks, and connect them into a function if they can belong together. From my point of view, for at least some functions that principle could offer a lower barrier for creating functions, since the boilerplate templates could be translated into other languages to reach people with less coding knowledge, making it possible for them to create a function. Have you thought about offering a possibility like that in the user interface?--Hogü-456 (talk) 20:42, 25 September 2021 (UTC)
- I created a script with which it is possible to convert a script in a text file, written in a COBOL-like language, into code in R. The definitions of the structure of the sentences are in a CSV file, and you can find the code at https://public.paws.wmcloud.org/User:Hog%C3%BC-456/TexttoCode/Structured%20Text%20to%20Code/. This is an example of what functions in Wikifunctions could be used for, and I think it reduces the barrier to creating a program; if I create more examples, I will collect functions that way. For me it is important that it will be possible to use the functions in Wikifunctions offline. For example, in schools there are sometimes restrictions regarding web services, and from my point of view it is good if data is not transferred to another party when that is not necessary. What do you think about the program that I wrote? Do you think it can be helpful if more sentences and their code equivalents are defined?--Hogü-456 (talk) 20:58, 22 November 2021 (UTC)
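A rough sketch of the general idea in Python (the sentence patterns and the R output are invented for illustration, and this is not the actual script linked above): sentence structures from a CSV-like table are matched against input lines and rewritten into R code.

import re

# Invented sentence patterns mapping a controlled natural-language line to an R template;
# in the described setup, definitions like these would live in a CSV file.
patterns = [
    ("ADD {a} AND {b} GIVING {out}", "{out} <- {a} + {b}"),
    ("DISPLAY {x}",                  "print({x})"),
]

def to_r(line, patterns=patterns):
    # Try each pattern; turn "{name}" placeholders into regex groups, then fill the R template.
    for source, target in patterns:
        regex = "^" + re.sub(r"\{(\w+)\}", r"(?P<\1>\\S+)", source) + "$"
        match = re.match(regex, line)
        if match:
            return target.format(**match.groupdict())
    return None

print(to_r("ADD price AND tax GIVING total"))   # -> "total <- price + tax"
print(to_r("DISPLAY total"))                    # -> "print(total)"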
- Regarding "Will it be possible to run Wikifunctions functions offline?" - we hope so! We hope that there will be evaluation engines that can run offline, and where you can then have certain functions available to run them on your own hardware, yes. We really want to support the creation of such evaluation engines, and hope that they will help with having many people run Wikifunction functions in all kind of environments. --DVrandecic (WMF) (talk) 00:21, 24 November 2021 (UTC)
- Regarding "code that takes a COBOL-like language and translates it to R", I do hope that we will be able to support that kind of functions in Wikifunctions as well, basically a compiler or transpiler from a specification language you define to another language such as R. I took a look at your directory, but have to admit that I didn't exactly figure out how it works. But yes, having several layers of code build on top of Wikifunctions, for example to have a simple declaration of Wikidata queries which then gets compiled into SPARQL and executed - that would be very good to have in Wikifunctions! I hope I understood your suggestion. --DVrandecic (WMF) (talk) 00:25, 24 November 2021 (UTC)
- Regarding a Scratch-like interface, yes, that would be awesome. I am not saying everyone would need to learn Scratch in order to implement functions for Wikifunctions, but it would be great if, besides the usual text-based languages such as Python or JavaScript, we would also support Scratch as a programming language, including its UX. --DVrandecic (WMF) (talk) 00:26, 24 November 2021 (UTC)
- I am currently using Snap!. This is a further development of Scratch, and there it is possible to define your own blocks and export the results as XML. I tried that in the last days and wrote a program to create, out of the export, a result in the programming language R. This works so far, and I am able to write programs with it. I have not yet put all the functions I regularly use into blocks, but I am working on it. The result is published on my PAWS profile. I still need to check the license compatibility of the AGPL 3.0 and GPL 2.0 or later; this is currently the point where I am not completely sure whether it is allowed to combine such software. I only use the export of the Snap XML in the AGPL-licensed part of the program, and the structure I parse out of it is the beginning and the end of the blocks and the content of the blocks, which are all my own defined ones. So I currently see no problem there. Do you have experience with this kind of compatibility? Under what license am I allowed to publish the result, or is it not allowed to combine them?--Hogü-456 (talk) 22:12, 11 January 2022 (UTC)
- I asked the developer team of Snap! and they told me that the exported XML project files are free, and that the scripts published on the Snap! site are licensed CC-BY-NC-SA (or, optionally, CC-BY-SA). They also told me that there is a codification feature, so it is possible directly in Snap! to convert the result to another programming language. I do not yet understand the feature in detail, so I do not know how to export the generated source code. From my point of view, since Snap! has this feature, it can be helpful to use it, and it is an example of a low-barrier entry point for creating programs. Maybe there can be a cooperation with the development team of Snap!.--Hogü-456 (talk) 20:58, 18 January 2022 (UTC)
- That's pretty awesome! I would love for us to support Snap! or Scratch or something of our own, possibly based on Blockly, in Wikifunctions. I filed T301418 to keep that on the list of things to do for Wikifunctions. --DVrandecic (WMF) (talk) 21:48, 9 February 2022 (UTC)
why not to use a ready "abstract language"?
many natural language generation projects are listed in Abstract_Wikipedia/Related_and_previous_work/Natural_language_generation. why did you decide to develop a new coding standard, instead of using a ready-made system? --QDinar (talk) 19:35, 4 October 2021 (UTC), edited 19:37, 4 October 2021 (UTC)
- Because we don't know which one of these is the right one, so we offer a platform where the community can either re-use an existing one, or come up with their own. --DVrandecic (WMF) (talk) 00:27, 24 November 2021 (UTC)
- as far as i know, you planned to develop just one new coding standard, with many human languages on the outer surface, but the inner code with z-ids was going to be a language (determined by many constraints). have you now decided to allow other "abstract" languages to be added, like new human languages can be added to wikimedia projects (through the incubator wiki, and other possible stages)? --QDinar (talk) 18:56, 2 December 2021 (UTC)
- I was hoping we would have only one single abstract language, although it might include several ways to say the same thing. We have not yet decided where the abstract content would be stored, i.e. on an incubator wiki or some other place. This is a discussion we are going to have early next year. -- DVrandecic (WMF) (talk) 20:55, 10 December 2021 (UTC)
i would like you to allow many "abstract languages"... using only one language is like dictatorship, not freedom. also, allowing different languages would make it possible to separate your language project from wikimedia, or to develop several languages within wikimedia. your project may not gain a big community of developers, and other projects may already have big communities. but how can that be done? it seems the languages listed at that url are not multilingual like your project. such codes could be held on every language's wikipedia. they could be mixed with traditional code, included within tags like <code lang="..."></code>. several stages of the generation of the string could be made available to see in tabs near the "edit source" tab: the generated wiki code could be in one tab, and a code with structured text (with the structure of the human-language text), just before it is linearised into wiki code, could be shown in another tab, if available... in case a multilingual code is used, it could be included with tags like <includetext from="Z..." targetlang="..." />. its wikitext and structured text could also be shown in tabs. --QDinar (talk) 20:50, 14 December 2021 (UTC)
Mention the original name Wikilambda
There is a message box on top with a footnote, but nonetheless I want to suggest that a (near-)native English speaker add, in the text passage about Wikifunctions, some note about the original name Wikilambda, for several reasons:
- This name appears several times in the timeline and background sections.
- The extension still bears the name Extension:WikiLambda.
- The new logo for Wikifunctions contains a lambda.
Because of the translation system I do not simply want to add text myself, but my own suggestion would be: "Originally it was named Wikilambda, derived from the lambda calculus. The extension name Extension:WikiLambda and the Wikifunctions logo, which contains a lambda, are still reminiscent of this." Is this OK? — Speravir – 23:17, 28 October 2021 (UTC)
- @Speravir: Thank you for the suggestion! I've added it to Abstract Wikipedia#Background. Sorry for the delayed response. Quiddity (WMF) (talk) 21:33, 22 November 2021 (UTC)
- @Quiddity (WMF): Thank you nonetheless. — Speravir – 00:30, 23 November 2021 (UTC)
How will Abstract Wikipedia work from the editor's point of view?
I am a German with an interest in diplomats, using the VisualEditor to create and update articles. Abstract Wikipedia sounds like a great idea for my area of work. When I create an article about the new ambassador of Germany to, say, Sweden, it would be great if that article were immediately available to Swedish readers (and others worldwide) as well. And vice versa: there are ambassadors from many countries in Germany, so it would be a very efficient use of resources if we did not have to create and maintain articles about them in parallel in various languages.
Now, in the explanations and discussions I read a lot about programming. Is that what Abstract Wikipedia is going to be? Just another programming language?
Do you expect me to learn this new programming language and all its functions in order to contribute to Abstract Wikipedia? Or will that be a background functionality, so that I can continue to create the article using the VisualEditor and then push a button which will then translate the (in my case German) article into the objects required for Abstract Wikipedia?
--Wikipeter-HH (talk) 16:21, 29 November 2021 (UTC)
- @Wikipeter-HH: That's a great question! And in many ways the answer is "We don't know yet". There might be several ways that we will explore, here is just one possibility. I will describe it a bit, but if that isn't enough we can also make a few mockups.
- First, it will likely be neither like the VisualEditor nor like programming.
- So, let's assume we want to create a new biography for an ambassador. Let's say the first two sentences are "Malala Jones (born January 14, 1984 in Fairfax, VA) is the current ambassador of the United States to Nigeria. She is a former member of the girlband Foxy Fairies."
- When starting an article, we would need to select a constructor for the first sentence (selecting the right constructor will be one of the hardest parts). We'll likely have a constructor for "Biography start definition". Once we select that constructor, we will get a form that has several fields. In this case I could imagine fields such as "first name", "last name", "date of birth", "place of birth", "position".
- There would be a page describing the constructor and the fields, and we could play around to see how it creates sentences in different languages. I imagine that "position" is something like a specific, unique role (which is why the constructor is called a definition, it identifies a specific individual). So for position we would need to use another constructor.
- If we are lucky, there will be a constructor for ambassadors, which might ask "current" (a checkbox), "since", "until", "from", "to". So we choose the ambassador constructor, and another set of forms opens up and allows us to choose values.
- So we would be building up the sentence. The catalog of constructors defines the expressivity we have available.
- Once we are happy with the first sentence, we could add a second sentence (by clicking in the right place), and use a new constructor, e.g. "Person description", and it could ask for fields such as "person", "description", "from", "to", "location", etc. We again would select the Person (and if the renderers do their job right, it would either say "She", or "Jones", or whatever is appropriate), we would leave the "from", "to", and "location" fields free, but in the "description" field we could be happy to have a "band member noun phrase" constructor, which allows us to set "former", and add the band. The band could be an item from Wikidata, or a simple name.
- But in general, the rough idea is to have a lot of forms, to fill them up with more forms, or with items or constructors, and to have a very clicky and restricted interface. If something needs to be changed, we would again click on edit, and change it in the forms. It won't be as easy as writing an article in a specific language. But it will allow us to create content in many languages at once.
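- (To make the shape of that "forms within forms" content concrete, here is a minimal sketch of how the nested constructor calls for the two example sentences might be written down as plain data. All of the constructor and field names are hypothetical illustrations for this discussion, not a decided format.)
# Hypothetical abstract content for the two example sentences, written as
# nested Python dictionaries (one dictionary per constructor call).
abstract_content = [
    {
        "constructor": "Biography start definition",
        "first name": "Malala",
        "last name": "Jones",
        "date of birth": "1984-01-14",
        "place of birth": "Fairfax, VA",
        "position": {                      # a nested constructor fills this field
            "constructor": "Ambassador",
            "current": True,
            "from": "United States",
            "to": "Nigeria",
        },
    },
    {
        "constructor": "Person description",
        "person": "Malala Jones",          # renderers decide between "She"/"Jones"
        "description": {
            "constructor": "Band member noun phrase",
            "former": True,
            "band": "Foxy Fairies",        # could also point to a Wikidata item
        },
    },
]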
- One idea we will be exploring is to have a natural language input box, where you can just write in natural language, and then we have a classifier or parser that tries to figure out what the right constructors would be to create similar text. It might then switch the sentence "She is a former member of the girlband Foxy Fairies." to "She was a member of the girlband Foxy Fairies." (i.e. adjust the tense, drop the adjective), depending on what kind of constructors are available. So you would type in natural language text, it would guess the constructors, show you the text output in the languages you are comfortable with, and allow you to edit the forms of the constructors directly to fix errors.
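- (As a toy illustration of that round trip, assuming nothing more than a simple regex-based guesser: the constructor name, fields and rendering below are invented for this sketch and are not part of any planned design.)
# Toy sketch of the "type natural language, get constructors back" idea.
# The constructor name and the regex rule are invented for illustration.
import re

def guess_constructors(sentence: str):
    match = re.search(r"(?:is a former|was a) member of the \w*band (.+)\.",
                      sentence)
    if match:
        return [{"constructor": "Band membership",
                 "band": match.group(1),
                 "former": True}]
    return []   # nothing recognised; the user builds the forms by hand

def render_back(constructors, language="en"):
    # Hard-coded English rendering, with the tense already normalised.
    c = constructors[0]
    return f"She was a member of the band {c['band']}."

guessed = guess_constructors(
    "She is a former member of the girlband Foxy Fairies.")
print(render_back(guessed))   # -> She was a member of the band Foxy Fairies.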
- I hope that we will hear more ideas for the UX as we get closer to this, and I expect that our designers will be very busy creating usable workflows and user experiences.
- All the constructors I named here are just suggestions. I don't want to be prescriptive about what kind of constructors we should or will have.
- I hope this helps a bit! -- DVrandecic (WMF) (talk) 01:38, 11 December 2021 (UTC)
- Hi @DVrandecic (WMF):,
- Thanks for the detailed explanation. That helps me a lot in understanding the route you are pursuing. What you describe reminds me of ancestry.com: you can enter details of a person's life (birth, parents, marriage, children, ...) and it will create a story from that. However, that is a sequence of loose sentences and sometimes a bit boring ("His son John was born 12 February 1907. His daughter Mary was born 31 July 1909. ..."). In my view a Wikipedia article should be a coherent text that makes an interesting read. That is where authors writing in natural language make a difference. I do like the idea of the natural language input box that is fed into an (artificial-intelligence-based?) parser. That tool will actually be indispensable for getting the millions of existing articles into Abstract Wikipedia.
- I will keep an eye on the further developments and if you need a pilot user, please do not hesitate to contact me. --Wikipeter-HH (talk) 12:32, 11 December 2021 (UTC)
- @Wikipeter-HH: Thank you! I had this mock-up I did much earlier, and used in a few talks, but couldn't find it on Commons. So I uploaded it now. Maybe that helps, too.
- Yes, indeed, the series of sentences is expected to be more boring and monotonous than hand-written text. How fluent we can make the results will depend on how many and what kinds of constructors we have. -- DVrandecic (WMF) (talk) 02:10, 14 December 2021 (UTC)
What about implicit bias, THE TRUTH and fuzzy logic?
After reading here and there, and even on GitHub and elsewhere, I have many questions about the ontology behind this project, probably naive, and I am not sure if this is the right forum. Also, I tend to (ab)use hrefs, examples and metaphors, so do tell me if anything is unclear:
1. Why do you assume that TRUTH is universal?
2. What about counterexamples:
Given a constructor Superlative with the keys subject, quality, class, and location constraint, we can have the following abstract content:
Superlative(subject: Crimea, quality: large, class: peninsula, location constraint: Russia)
Superlative(subject: Taiwan, Hainan, quality: large, class: island, location constraint: China)
In Wikifunctions, we would have the following function signature:
generate text(superlative, language) : text
[...] The application of the function to the abstract content would result in the following output content:
(in English) Crimea is the largest peninsula in Russia.
(in Croatian) Krim je najveći poluotok u Rusiji.
...
(in English) Taiwan is the largest island in China, with Hainan being the second largest one.
(in French) Taïwan est la plus grande île de Chine, Hainan étant la deuxième plus grande...
or anything with "Allah", "Prophet", "Kosovo", "Palestine" in its "superlative subject" field, or other examples from the Lamest edit wars set?
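(Leaving the content dispute aside, and purely to make the mechanics of the quoted signature concrete: the following is an illustrative sketch in Python with a hard-coded English template. It is not Wikifunctions code; the field names simply follow the quoted example.)
# Illustrative sketch only (not Wikifunctions code): an English-only renderer
# with the quoted signature generate text(superlative, language) : text.
def generate_text(superlative: dict, language: str) -> str:
    if language != "en":
        raise NotImplementedError(
            "a real renderer needs per-language lexemes, agreement and case")
    # Toy superlative formation; real renderers need proper morphology.
    quality = superlative["quality"] + "st"          # "large" -> "largest"
    return "{subject} is the {q} {cls} in {loc}.".format(
        subject=superlative["subject"], q=quality,
        cls=superlative["class"], loc=superlative["location constraint"])

print(generate_text({"subject": "Crimea", "quality": "large",
                     "class": "peninsula",
                     "location constraint": "Russia"}, "en"))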
Indeed, "Romanian Wikipedia on the other hand offers several paragraphs of content about their [river] ports" but so did the Croatian one offer detailed information about their Jasenovac camp.
3. What about the resulting output being un-PC, with "GPT-3 Abstract WP making racist jokes, condoning terrorism, and accusing people of being rapists"? (Who is "a terrorist" in the first place?)
4. Has it been discussed somewhere else? Is there a FAQ for such advocati diaboli as myself?
Zezen (talk) 09:54, 14 December 2021 (UTC)
- The underlying question seems to be: How will the community decide on how to write content, especially for complex topics? Just like usual - via discussions, guidelines, policies, style guides, and other standard wiki processes. Quiddity (WMF) (talk) 19:00, 15 December 2021 (UTC)
- No. So mea culpa for being unclear.
- The underlying challenge (question) is that this implicitly abandons the WP:PILLARS.
- I can see now that Heather Ford had mentioned parts of these Points 1 and 2 above in Wikipedia@20's chapter, "The Rise of the Underdog": To survive, Wikipedia needs to initiate a renewed campaign for the right to verifiability. ... the ways in which unattributed facts violate the principle of verifiability on which Wikipedia was founded ... [consumers] will see those facts as solid, incontrovertible truth, when in reality they may have been extracted during a process of consensus building or at the moment in which the article was vandalized... search engines and digital assistants are removing the clues that readers could use to (a) evaluate the veracity of claims and (b) take active steps to change that information through consensus...
- Add to this Quid est veritas? or these basic THE TRUTH enwiki essays, including the WP:NOTTRUTH policy itself, for a more eloquent and strategic challenge @Quiddity (WMF) and the other gentle readers.
- Should you dislike epistemology, the "rights" and anything similarly vague, abstruse, perplexing or nebulous referenced in my original comment, do answer my challenge 2:
- generate text(superlative, language) : text -> Crimea is the largest peninsula in Russia.
- TRUE or FALSE? Shall we accept Abstract Wikipedia generating it?
- If you say TRUE (or FALSE, but yes, we accept it), then move to Challenge 3 with these Wired examples... Zezen (talk) 12:18, 16 December 2021 (UTC)
- This is not a logical processing machine. It is just another human language, albeit an artificial one, like Esperanto: you can say anything in it. So, I think, they accept it. As for challenge 3, I read that news several months ago; it is about ML, and this is not ML. --QDinar (talk) 17:02, 16 December 2021 (UTC)
- by "logical processing machine", i mean w:Logic programming systems, languages. as i saw, this is not positioned as one of them. it has some logic, but i have not seen intention to make it mathematically/logically accurate/perfect. otherwise there should be some theory, things like axioms, i assume. --QDinar (talk) 11:29, 20 December 2021 (UTC)
- But one question remains: where the traditional wikis used different text in different languages, this project has to have one text... Possible solutions: 1. show the different points of view; 2. simply do not use the abstract code in those cases (this was already suggested by user:DVrandecic (WMF)). You asked where this was discussed: check out the talk archives here, if you have not seen that they exist. --QDinar (talk) 17:08, 16 December 2021 (UTC)
Test wiki not working
@Lucas Werkmeister I can't create an account on the test wiki. It says my username is already taken. Wargo (talk) 12:19, 28 December 2021 (UTC)
- @Wargo you shouldn’t create an account, just log in via OAuth (Special:UserLogin should redirect you to metawiki). Lucas Werkmeister (talk) 14:43, 28 December 2021 (UTC)
- Yes, I tried this, but when I confirmed this application on Meta, it returned me to your site's login page with an error message. Wargo (talk) 14:53, 28 December 2021 (UTC)
- I suppose it is a bug. --Wargo (talk) 15:52, 28 December 2021 (UTC)
Recent edits
Could someone please look at the recent edits to Abstract Wikipedia/Early discussion topics and Abstract Wikipedia/Ideas? I'm inclined to say that they should be reverted because it's far too late to add more content to those pages, but would like a second opinion. * Pppery * it has begun 19:56, 4 March 2022 (UTC)
- Hi Pppery. Both those pages are still open for contributions/discussions. However I agree the latest contributions make it complicated to "mark for translation"... Does anyone have ideas on how to restructure either or both pages? (I'll continue trying to rethink them next week, along with marking up the bullet-list at the top of the first page with which items are already completed). Cheers, Quiddity (WMF) (talk) 00:06, 5 March 2022 (UTC)
- I've marked both pages for translation. I don't see any major restructuring needed. * Pppery * it has begun 15:54, 19 March 2022 (UTC)
Wikimedia Hackathon 2022
The Wikimedia Hackathon takes place from the 20th to the 22nd of May 2022. Will some people from the development team for Abstract Wikipedia and Wikifunctions attend the Hackathon this year? I think it is a good chance to talk with other people who are interested in the new Wikimedia project Wikifunctions and to get new ideas. --Hogü-456 (talk) 21:03, 20 March 2022 (UTC)
- Yes. We will have a session (currently scheduled for 14:00 UTC on the Friday, though details and timing may change), and we are working on clarifying some ideas about hacking projects (many based on suggestions/discussion in the Telegram/IRC, mailing list, and elsewhere).
- If you have any specific ideas, for either session-topics or hacking projects, please do add them here!
- More details from the team's side, soon. Quiddity (WMF) (talk) 19:57, 29 April 2022 (UTC)
Comment on Multilingual Wikipedia with proposed examples
Hello. I understand that the developers are rightly very focused on the P1 part of Wikifunctions at present, but I would like to make a comment about Multilingual Wikipedia (or Abstract Wikipedia). I had the idea that Multilingual Wikipedia was a project which would be implemented on Wikifunctions, but that apart from that it was independent and could in principle have been done on another platform. Under "What is Abstract Wikipedia?" the article page states
The new wiki of functions, Wikifunctions, will develop the coding infrastructure to make this vision possible.
But a wiki cannot develop a coding infrastructure; that has to be done by people. It would be a great help if you could give your vision of who will develop the coding infrastructure and how they will come to work on the project. I think that you are implying that the WMF will be involved, and that is a useful point. From this and other Abstract Wikipedia pages I get the impression that Multilingual Wikipedia is supposed to emerge automatically from volunteers who will be creating functions without any particular management. But apart from a little playing around, volunteers will only create wikifunctions if they have a motive and a specification to work from. First a framework needs to be defined, within which the Multilingual Wikipedia constructors would be developed and the requisite Wikidata data populated. I would be interested in any clues about how the process would play out.
I think there is a great need for very detailed proposals and on this page I offer two examples of how language text could be generated from abstract text, with a list of difficulties. I think that the rendering cannot be done in one pass, but instead there should be an initial pass which will create a parse tree and then a second pass to do the rendering with all the necessary information now available. I would be interested in any comments. Strobilomyces (talk) 19:45, 9 April 2022 (UTC)
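(A minimal sketch of the two-pass idea described above, assuming a toy tree of constructor calls: an "expand" pass pushes the syntactic features each node will need down the tree, and a "render" pass produces text once that information is available. The constructor names and the feature set are invented for illustration only, not taken from any proposal.)
# Illustrative two-pass renderer over a tree of constructor calls.
# Constructor names and the feature set are placeholders, not a proposal.
def expand(node, inherited=None):
    # First pass: annotate every node with the syntactic features (here just
    # grammatical gender) that its renderer will need, wherever in the tree
    # that information originally comes from.
    features = dict(inherited or {})
    features.update(node.get("features", {}))
    expanded = {**node, "features": features}
    expanded["args"] = [expand(child, features)
                        for child in node.get("args", [])]
    return expanded

def render(node, language):
    # Second pass: produce text, now that all features are already in place.
    if node["constructor"] == "pronoun":
        forms = {("en", "feminine"): "she", ("en", "masculine"): "he"}
        return forms[(language, node["features"]["gender"])]
    if node["constructor"] == "copula sentence":
        subject, complement = (render(a, language) for a in node["args"])
        return f"{subject.capitalize()} is {complement}."
    return node.get("text", "")

tree = {"constructor": "copula sentence",
        "features": {"gender": "feminine"},
        "args": [{"constructor": "pronoun"},
                 {"constructor": "literal", "text": "a former band member"}]}
print(render(expand(tree), "en"))   # -> She is a former band member.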
- @Strobilomyces That page is a lot of excellent work. Thank you. I’m not sure that two passes will be enough, but I prefer to think in terms of an end-to-end pipeline. In any event, I think there is no consensus yet on what “abstract text” should look like. We can look at the two ends of the pipeline as something like Wikidata (input) emerging as something like a Wikipedia article (output). But somewhere in between, we have an intermediate “pumping station” called “abstract content”. (Perhaps we need more than one kind of intermediate form, but I assume not.) Personally, I believe we shall make better progress by looking at the semantics of the finished articles (and/or at language-neutral templates for classes of articles) but we shall certainly need the ability to transform a single Wikidata statement directly into a sentence in any given language. If we tackle both ends of the pipeline at the same time, we are more likely (it seems to me) to begin developing a consensus around the form and function of the intermediate abstract content. (For previous thoughts, please see Talk:Abstract Wikipedia/Archive 2#Hybrid article.) GrounderUK (talk) 10:54, 12 April 2022 (UTC)
- @GrounderUK Hello. Thank you for your comment and for the link to the archived talk page, which I was not familiar with. Allow me to reproduce this important definition of article content "layers" by User:ArthurPSmith.
What I am envisioning is probably 3 or more layers of article content: (1) Language-specific, topic-specific content as we have now in the wikipedias, (2) Language-independent topic-specific content that would be hosted in a common repository and available to all languages that have appropriate renderers for the language-independent content (abstract wikipedia), (3) Language-independent generic content that can be auto-generated from Wikidata properties and appropriate renderers (generic in the sense of determined by Wikidata instance/subclass relations, for instance). I'm not entirely sure how the existing Reasonator and Article Placeholder functionalities work, but my impression was they are sort of at this 3rd or maybe even a lower 4th level, being as fully generic as possible. That probably makes them a useful starting point or lowest-functionality-level for this at least. The question of how to mix these various pieces together, and even harder how to present a useful UI for editing them, is definitely going to be tricky! ArthurPSmith (talk) 20:38, 28 July 2020 (UTC)
- For me the main Abstract Wikipedia task is layer 2 - to find a way of generating text in many languages from a single abstract text. This is the "hard problem", and I think that the other questions raised on the earlier talk page are minor implementation details. Layer 3 functions may be useful in templates or in something like Reasonator, but they come nowhere near the ambition of Abstract Wikipedia to render multilingual text. This is illustrated by Denny's example where you cannot put into the normal Wikidata structure the assertion that Marie Curie was the only person to win Nobel prizes in two different disciplines. The abstract text, whatever form it has, has to be created by a human, doesn't it? No-one thinks that a program can automatically think what to say about an item from the Wikidata claims?
- In the detailed examples of proposed abstract text which I have seen, I think that there is a good consensus on the structure of the abstract text; that it will be like a tree of constructor calls reflecting the structure of the sentence. The actual format doesn't matter much - whether it is text in lines, JSON-like, expressed with Wikidata properties, or whatever; that is a decision which can be made later. I don't understand how the abstract text can be something intermediate - I suppose you must be thinking of something like layer 3, but I don't understand how that can generate a useful encyclopedia article. Reasonator is something different and much more like standard computation. Surely the "input end" has to be the abstract text? The normal WD claims are just database-like entries about the subject, and the information which would construct a proper Wikipedia article just isn't there. For instance a proper article would include sentences which weren't about the subject of the article. You say "we shall certainly need the ability to transform a single WD statement directly into a sentence in any given language" (by a "statement" I suppose you mean an item claim similar to the ones which are already in WD). That is an easy goal, but I don't think it gets us any nearer to solving the main problem. I would be interested in seeing an example of how that could work to generate a typical actual Wikipedia sentence.
- When I talk about multiple passes (an "expand" pass and a "render" pass) I am saying that I think that there will have to be a first-pass function and a render-pass function for each constructor. You can't do it in a single pass as necessary syntactic information might come from anywhere in the tree. I don't know how that fits in with your pipeline.
- I have seen elsewhere the idea that just by chipping away at the edge of the problem (for instance by advancing Wikifunctions), we can eventually converge on a solution for Abstract Wikipedia. I completely disagree; in a software project you can go on forever in that way without really getting anywhere. That may be reminiscent of the Wiki way, but it will not work for software design. Instead it is necessary to concentrate on the most difficult parts of the problem and map out robust solutions there. Strobilomyces (talk) 08:20, 13 April 2022 (UTC)
- Yes, I agree that Arthur’s “level 2” is the “Abstract Wikipedia”, conceived of as a curated repository of “abstract content”. It is also the ‘intermediate “pumping station”’ that I referred to. Whether we can get to (useful default) abstract content directly from Wikidata using Wikifunctions is an open question, but it seems to me to be a useful aspiration, particularly for publications cited as sources in Wikipedias. I think this is a useful example because, of course, we want every Abstract Wikipedia “fact” to have a reliable source and appropriate sources may not be available in all languages. But this is not an especially hard language problem, since the Wikipedia target would (presumably) be our existing citation templates. It is nevertheless a far from trivial amount of necessary work. Well, if we can convert a populated citation template into a set of Wikidata statements and convert those statements into a default Abstract Wikipedia article, surely we can generate an Abstract Wikipedia article for any publication that has such statements in Wikidata. And if we find, for each supported language, a satisfactory representation, in encyclopaedic language, for the content of citation templates, we can automatically generate language-specific Wikipedia articles about publications more generally. This will already require some natural language features to be addressed to some extent, including genders, plurals, active and passive moods, as well as date representations and specialist vocabulary for types of author (poet, playwright, novelist etc). Without trying it, it seems likely that such fairly basic capabilities could usefully be extended to other creations and their creators. Looking more generally at infobox and other templates, it seems to me that there is a huge amount of useful work to be done, and how we prioritize that should be driven by the editors of the Wikipedias that will use the abstract content, as principal stakeholders (representing the readers of their Wikipedias). Whether we have a sufficiently robust architecture for delivering Wikifunctions solutions to these stakeholders rather depends on what their priorities turn out to be. That is not to say that we should delay decisions about the form and function of abstract content, but in making or deferring such decisions we should be very careful to reflect, with due humility, upon the needs of stakeholders who have yet to come forward and express their opinions. GrounderUK (talk) 12:19, 13 April 2022 (UTC)
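- (A rough sketch of the citation pipeline described above, purely as an illustration of the idea: plain property labels stand in for real Wikidata P-ids, and the English sentence template is a placeholder, not a worked design.)
# Toy pipeline: citation template fields -> statement dicts -> default sentence.
# Plain property labels are used instead of real Wikidata P-ids.
def template_to_statements(citation: dict) -> list:
    mapping = {"title": "title", "author": "author", "year": "publication year"}
    return [{"property": mapping[key], "value": value}
            for key, value in citation.items() if key in mapping]

def statements_to_sentence(statements: list, language: str) -> str:
    facts = {s["property"]: s["value"] for s in statements}
    if language != "en":
        raise NotImplementedError("each language needs its own renderer")
    return "{title} is a work by {author}, published in {year}.".format(
        title=facts["title"], author=facts["author"],
        year=facts["publication year"])

citation = {"title": "On the Origin of Species",
            "author": "Charles Darwin", "year": 1859}
print(statements_to_sentence(template_to_statements(citation), "en"))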
- I think I understand your thinking a bit better now; you are hoping that level 3 work will grow into a level 2 system (whereas I don't think that that is possible). Certainly the new system needs to support citations, and that would be a small problem inside Abstract Wikipedia. There are already many references to journal articles and books in Wikidata, and citations could be generated from those for any language WP (but this does not include translating the titles; that would be a level 2 problem and anyway legally difficult). But I don't understand why you say "surely we can generate an Abstract Wikipedia article for any publication that has such statements in Wikidata". Many publications have WP articles about them, but only citation-style information exists in WD; an article is quite a different thing from just a citation and is not derivable from it. Generating a respectable article is a level 2 problem and I don't think it can be solved incrementally like that. A solution for general text needs to be worked out if the project is to achieve the previously stated objectives. That is not to say that a multilingual citation solution should not be developed with Wikifunctions, but it is not Abstract Wikipedia. It may touch on some natural language features, but only in an easily manageable way.
- For me much of this page is very useful for knowing what the objective of Abstract Wikipedia is. I agree that the stake-holders must be consulted, and it may well turn out that Abstract Wikipedia is impractical and it would be better to just settle for a few level 3 systems (like a system for citations and templates without attempting to generate general text). Surely we should try to find the answer to this sooner rather than later? For the stakeholders to express their opinions I think they urgently need detailed proposals. Denny has described how the system should work in several places, such as here, but as I try to demonstrate in my proposal page, when you look into the detail it is terribly complicated. Strobilomyces (talk) 17:15, 13 April 2022 (UTC)
- Well, yes, it is terribly complicated! That is precisely why I do not see a “solution for general text” as a starting point. Even as a final destination, it seems too much to expect. I expect we shall always support some languages better than others and be better able to communicate some kinds of information. We need good solutions for selecting, structuring and supplementing information in a fairly language-neutral way. We also need to understand how we can deal with different cultural and editorial norms. For some languages (most, indeed all, initially), we will not have “respectable” articles for most subjects. Opinions will differ on whether we should focus on delivering more content or improving the quality of whatever happens to be available. Much as I look forward to seeing high quality natural language representations of some encyclopaedic content in some languages, I rather suspect that more (and more intelligible) content for more people will become our primary goal, with more and better natural language representations being seen as neither the speediest nor the most effective way to achieve that goal. Nevertheless, I support more and better natural language representations (more of them in more languages). And I believe that Wikifunctions can deliver these, if people are willing to contribute to that endeavour. Personally, I am comfortable with the current level of architecture definition but I am happy to collaborate on a further iteration. Watch out for me on your user page, @Strobilomyces! GrounderUK (talk) 13:07, 16 April 2022 (UTC)
- @GrounderUK Your view is very interesting and I believe that it is shared by some in the WMF. But it seems to me different from the vision expressed by User:DVrandecic (WMF), which would need a solution for general text. I suppose, then, that the users of Abstract WP would only see certain basic types of sentence. Anyway, it would certainly be interesting to see a detailed proposal.
- You say "I am comfortable with the current level of architecture definition" - do you mean you are confortable with the page Abstract_Wikipedia/Architecture? Or some other architecture page? Strobilomyces (talk) 19:45, 20 April 2022 (UTC)