Talk:Abstract Wikipedia/Archive 1


Supporting signatures and statements

  1.   Support denny (talk) 00:11, 5 May 2020 (UTC) (as proposer)
  2.   Support Seemplez (talk) 10:21, 17 June 2020 (UTC)
  3.   Support Ainali (talk) 19:33, 5 May 2020 (UTC) This is a big, ambitious goal if there ever was one.
  4.   Support Arkanosis 19:34, 5 May 2020 (UTC)
  5.   Support GoranSM (talk) 19:35, 5 May 2020 (UTC)
  6.   Support, but I'll also add a few comments :) --Amir E. Aharoni (talk) 19:40, 5 May 2020 (UTC)
    Thank you! --denny (talk) 19:49, 5 May 2020 (UTC)
  7.   Support Finn Årup Nielsen (fnielsen) (talk) 19:44, 5 May 2020 (UTC) This is indeed a very interesting project. I have not read all of Denny's writing on this topic, and I still haven't understood how users would be able to easily define "Constructors, Content, and Renderers", but I suppose that is part of the development process. The Wikilambda/Organization is not clear. Who is going to do it? Would Denny be free from Google? Would he be provided with funds from the Wikimedia Foundation? What would be the input from Magnus Manske?
    Yes, the development process involves tasks to figure out how to make the development of constructors, content, and renderers easy. That will be challenging.
    Regarding the organization, that is indeed a bit vague. I would love the Board to tell the Foundation to commit to the project, and then figure out the details (such as funding, personnel, etc). Before that, I think, it would not be fair to ask any individuals to commit. --denny (talk) 20:09, 5 May 2020 (UTC)
  8.   Support Mike Peel (talk) 19:49, 5 May 2020 (UTC)
  9.   Support Thadguidry (talk) 19:53, 5 May 2020 (UTC)
  10.   Support You may want to consider changing the name to WikiLambda since the lower case "L" and upper case "I" are indistinguishable and might be confusing. Fuzheado (talk) 20:01, 5 May 2020 (UTC)
    Yes, the name is just a preliminary strawman! I added your argument and a few others to the page on the name. At least the titles here have serifs! --denny (talk) 20:29, 5 May 2020 (UTC)
    I think that going old school with WikiWords for project names in general would be nice :) Zblace (talk) 17:12, 17 May 2020 (UTC)
    Added to naming page. --denny (talk) 17:46, 17 May 2020 (UTC)
  11.   Strong support-- Bodhisattwa (talk) 20:03, 5 May 2020 (UTC)
  12.   Strong support TiagoTorrent (talk) 20:04, 5 May 2020 (UTC) This is a very ambitious and promising initiative not only for the wiki community itself, but also for those working with language models in Computational Linguistics. As a member of the former group, I'm really supportive of this project and hope to be able to contribute to it with assorted framenetteries.
  13.   Support Interesting. Well-planned, and very well documented. I am interested and want to join in any capacity. There is a lot to read/watch, especially in the subpages/signpost etc. I'll get back with questions/suggestions (if any). All the best. -- Tito Dutta (talk) 20:11, 5 May 2020 (UTC)
  14.   Support. --Csisc (talk) 20:32, 5 May 2020 (UTC)
  15.   Support —M@sssly 20:33, 5 May 2020 (UTC)
  16.   Support -❙❚❚❙❙ JinOy ❚❙❚❙❙ 20:36, 5 May 2020 (UTC)
  17.   Support --Joalpe (talk) 20:38, 5 May 2020 (UTC)
  18.   Strong support Very excited about this.--Akorenchkin (talk) 20:42, 5 May 2020 (UTC)
  19.   Support Sturm (talk) 20:51, 5 May 2020 (UTC)
  20.   Support A logical extension of the work we already do, well-aligned with our mission, and has great promise. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:02, 5 May 2020 (UTC)
  21.   Support Blue Rasberry (talk) 21:03, 5 May 2020 (UTC)
  22.   Support --Casual (talk) 21:16, 5 May 2020 (UTC).
  23.   Support Wskent (talk) 21:24, 5 May 2020 (UTC)
  24.   Support --I9606 (talk) 21:39, 5 May 2020 (UTC)
  25.   Support Petermr (talk) 21:45, 5 May 2020 (UTC)
  26.   Support --GiFontenelle (talk) 21:51, 5 May 2020 (UTC)
  27.   Strong support Lazowik (talk) 21:56, 5 May 2020 (UTC)
  28. Very   Strong support Ederporto (talk) 22:34, 5 May 2020 (UTC)
  29.   Support Mitar (talk) 22:44, 5 May 2020 (UTC)
  30.   Support Very much in support ~ much needed! Bridging a gap so that the current Wikimedia movement can better serve its mission. Xinbenlv (talk) 22:52, 5 May 2020 (UTC)
  31.   Support - PKM (talk) 23:16, 5 May 2020 (UTC)
  32.   Support An optimal time for this! –SJ talk  23:32, 5 May 2020 (UTC)
  33.   Support Interesting project, may I propose the name WikiRosetta or WikiSetta?--BugWarp (talk) 01:16, 6 May 2020 (UTC)
    I am adding name proposals and considerations to the naming page. --denny (talk) 03:15, 6 May 2020 (UTC)
  34.   Support TiagoLubiana (talk) 01:41, 6 May 2020 (UTC)
  35.   Support --Palnatoke (talk) 04:35, 6 May 2020 (UTC)
  36.   Strong support This is an incredible idea. Arep Ticous 04:46, 6 May 2020 (UTC)
  37.   Strong support Looking at the state most Wikis are in, undeveloped, unreliable, and sometimes with nationalistic bias, this seems the logical next step. Even if it is more trickle-down than grassroots. --Hadmar von Wieser (talk) 05:33, 6 May 2020 (UTC)
  38.   Strong support - I like the idea, I would also like to join in any capacity, and I also feel that more discussion/suggestions are needed for the project 'name'. -Suyash Dwivedi (talk) 09:15, 6 May 2020 (UTC)
  39.   Strong support Given the proposer and the extent of the proposal, this is an experiment that needs to be conducted. --Sannita - not just another it.wiki sysop 09:30, 6 May 2020 (UTC)
  40.   Support A very good idea, even if it does not bring all I hope for, it is worth the effort. Edoderoo (talk) 09:50, 6 May 2020 (UTC)
  41.   Support -- Regards, ZI Jony (Talk) 09:55, 6 May 2020 (UTC)
  42.   Support let's hope it's the right time...--Alexmar983 (talk) 10:26, 6 May 2020 (UTC)
  43.   Support what a powerful idea! Pundit (talk) 12:16, 6 May 2020 (UTC)
  44.   Strong support for building a multilingual platform. John Samuel 13:53, 6 May 2020 (UTC)
  45.   Strong support What a great idea! --Humni (talk) 15:03, 6 May 2020 (UTC)
  46.   Support I think this is a very exciting concept with great and good effect if successful. And with great opportunity to learn from should it fail. DivadH (talk) 15:06, 6 May 2020 (UTC)
  47.   Strong support This idea holds great promise for non-English Wikipedias as well as for artificial intelligence tools. Gdm (talk) 15:39, 6 May 2020 (UTC)
  48.   Strong support Something like this will exist 10,000 years from now, so let's get started on it now. --Rosiestep (talk) 15:46, 6 May 2020 (UTC)
  49.   Support A worthwhile experiment with a lot of potential benefit, and no risk beyond some time investment. It is an idea whose time has come, or will be coming very soon, and it would be infinitely better implemented by Wikimedia than by a commercial entity. Ijon (talk) 16:53, 6 May 2020 (UTC)
  50.   Support Let's build YULdigitalpreservation (talk) 17:08, 6 May 2020 (UTC)
  51.   Support Brilliant new initiative. (talk) 19:39, 6 May 2020 (UTC)
  52.   Strong support I expect the project to provide the chance for better Wikidata item descriptions. ChristianKl22:38, 6 May 2020 (UTC)
  53.   Support Among other things, I think this should help us on Wikidata. Not to mention the great support it will be for editors of small wikipedia projects of course. Jane023 (talk) 07:35, 7 May 2020 (UTC)
  54.   Strong support I love the idea, it's really time to attempt to overcome language silos. Ls1g (talk) 09:45, 7 May 2020 (UTC)
  55.   Support This is a fantastic idea that could change not just Wikipedia but the world. I'm not convinced about the current proposed implementation of the "universal language", with eneyj, but I'm very much convinced about the general concept. Also, I like the name "Wikilambda": it suggests the defining of functions, while being obscure enough that it won't be easily misinterpreted to mean something more generic. "Project Eco" sounds cool too - though the danger of naming it after someone is that you run the risk that five years from now we'll find out that Umberto Eco used to kick puppies. Yaron Koren (talk) 03:43, 8 May 2020 (UTC)
  56.   Support --Emptyfear (talk) 10:47, 8 May 2020 (UTC)
  57.   Support All in - daring, interesting. If not done by us, who should dare?! --Mirer (talk) 02:54, 9 May 2020 (UTC)
  58.   Support --Tmv (talk) 03:00, 9 May 2020 (UTC)
  59.   Support --GerardM (talk) 06:51, 9 May 2020 (UTC) it is a step in the right direction and as a Wiki it will morph over time.
  60.   Support Celestinesucess (talk) 15:09, 9 May 2020 (UTC)
  61.   Support there are likely to be language-specific pitfalls in translation (languages are complicated) and this seems incredibly ambitious, but this seems like a great idea Zoozaz1 (talk) 15:58, 9 May 2020 (UTC)
  62.   Support --Jamie7687 (talk) 18:58, 9 May 2020 (UTC)
  63.   Support So this is about hosting? Shouldn't do any harm, should it? I hope it will be a welcoming community that will figure out how to do useful things and who won't blow up the servers with too many requests. SashiRolls (talk) 19:40, 9 May 2020 (UTC)
  64.   Support --Jmmuguerza (talk) 01:49, 10 May 2020 (UTC)
  65.   Support --Gloumouth1 (talk) 05:55, 11 May 2020 (UTC)
  66.   Strong support -- What a great idea; you could start by making universal infoboxes that are the same across all wikis and just adapt their labels based on the viewer's preferences. Back ache (talk) 08:32, 11 May 2020 (UTC)
  67.   Support After a very first glimpse, this proposal has awakened the computational linguist in me and triggered two keywords: frame semantics, context-free grammars. Really inspiring. --Hjfocs (talk) 15:22, 11 May 2020 (UTC)
  68.   Support --Epìdosis 15:56, 11 May 2020 (UTC)
  69.   Support Languages and semantics are still rather disconnected in multiple ways across Wikimedia platforms (and beyond), and this proposal outlines a promising direction towards bringing them more closely together, so let's explore how far we can move this forward. My main concerns right now are performance, which I expect to be worked out eventually, and the environmental footprint, which we should try to minimize right from the start. -- Daniel Mietchen (talk) 03:15, 12 May 2020 (UTC)
  70.   Support Exciting idea in principle, some additional thoughts below.--Eloquence (talk) 07:55, 13 May 2020 (UTC)
  71.   Strong support This is bold! I think it will set the ground for exploring many interesting (and needed!) language potentialities throughout Wikimedia projects and beyond. Btw, "Towards an abstract Wikipedia" is a great piece! EricaAzzellini (talk) 23:41, 13 May 2020 (UTC)
  72.   Support Let's try to be the first to generate natural language by computers! I think this could work because languages nowadays follow enough rules to allow computers to understand and synthesize them. But remember that there is a lot of work to do before the whole project works … --FF-11 (talk) 17:53, 14 May 2020 (UTC)
  73.   Strong support. Wikimedia has artificial intelligence projects. So, it can be part of those, and it could also use Wikidata to propose translations (Google translations are awful). 71 users supporting, wow! BoldLuis (talk) 15:54, 17 May 2020 (UTC)
  74.   Strong support I think it is a brilliant unifying project, as it also gives us a chance to question established and somewhat problematic norms of language power relations. Smaller Wikipedias have a toxic atmosphere, partly because of the gaps that are exploited in this 'quantified labor' of copy-paste work by admins/bureaucrats. --Zblace (talk) 17:12, 17 May 2020 (UTC)
  75.   Support; let's try it! —MisterSynergy (talk) 20:52, 17 May 2020 (UTC)
  76.   Strong support we really need to be better on scaling Wikipedia to more languages - Salgo60 (talk) 01:16, 18 May 2020 (UTC)
  77.   Strong support Glad to support by pooling resources with our Wikispore proposal; I think this effort might prove particularly fruitful for supporting medical translation of article intros as a first phase.--Pharos (talk) 02:00, 18 May 2020 (UTC)
  78.   Support, but it has to offer more than we have in local communities.Carn (talk) 06:22, 18 May 2020 (UTC)
  79.   Support (yes, finally). -- Mathias Schindler (talk) 09:08, 21 May 2020 (UTC)
  80.   Support looks really exciting and promising, interesting how the project will unfold. Kpjas (talk) 13:09, 21 May 2020 (UTC)
  81.   Support I want to see where this will go. --Monirec (talk) 04:22, 26 May 2020 (UTC)
  82.   Support Yes, it's really ambitious and complex, it could take years, but Wikimedia and the humanity need this to really overcome language and culture barriers: one global knowledge mission, one (main) project to accomplish it. -jem- (talk) 20:38, 26 May 2020 (UTC)
  83.   Support. Ambitious project but very much appreciated. I think this is best executed as another extension to Wikidata (like an editable Reasonator) but those are details we should sort out later. (Conflict of interest disclaimer: I am a final-year student and will be interested in applying to be a paid staff developer for this project if it goes ahead and starts hiring.) Deryck C. 12:30, 27 May 2020 (UTC)
  84.   Strong support very ambitious, but that's what I like. Dibbydib (talk) 03:48, 28 May 2020 (UTC)
  85.   Support I suppose I'd better add my voice (I'd love to have more of you joining us working on denny's github project though!) ArthurPSmith (talk) 00:44, 30 May 2020 (UTC)
  86.   Strong support Lanix1516 (talk) 01:41, 2 Jun 2020 (UTC) This can help many Wikipedias; Wikipedia has over 300 language editions, and this can help them grow and even create new ones.
  87.   Support I am not optimistic that it will lead to a useful product which fulfills the project objectives, but it is very interesting. It should lead to practical knowledge about the possibilities of multilingual projects and perhaps other indirect applications. It is worth putting resources into. Strobilomyces (talk) 17:47, 2 June 2020 (UTC)
  88.   Support I was a Wikimedian and enwiki rollbacker until I made a stupid mistake and got myself sitebanned one Christmas Eve. I wanted to be a template editor so badly, but I never became one, because of my poor judgement. If this "Template Commons" that I have been advocating for since 2016 was created, then it would probably push me to come back to the Wikimedia projects, including RC patrolling on Commons, enwiktionary, and enwikiquote. 2001:569:BD7D:6E00:84A2:7021:E798:5BE2 22:32, 2 June 2020 (UTC)
  89.   Support I'm excited for this. --JordenLang (talk) 18:23, 3 June 2020 (UTC)
  90. Abstract   Strong support, meaning that I strongly believe that Wikipedia ultimately ought to become centralized in a way that allows it to be multilingual. No opinion on any of the details of the proposed implementation, since I lack the technical expertise required to productively discuss those. One question I have: looking ahead to once the project gets off the ground, how will we prevent the duplication of editor effort when someone chooses to update only their local version or only Abstract Wikipedia? Sdkb (talk) 08:39, 7 June 2020 (UTC)
    @Sdkb: If they update only Abstract Wikipedia, the changes will automatically flow to the language Wikipedias that use that content. If they only update the local Wikipedias, then the situation is as it is today, there is nothing that triggers changes on the other projects. One hope would be that we would get more visibility in the differences between the language editions, and thus maybe make changes more visible. --denny (talk) 02:38, 8 June 2020 (UTC)
  91.   Support Elilopes (talk) 15:05, 11 June 2020 (UTC)
  92.   Support The right proposal at the right time. As was already said above, it's an idea worth exploring and investing in. Even if the result won't be perfect, it will take us steps further to having a more sustainable ecosystem of knowledge. So a strong support from me. Shani Evenstein. 23:13, 14 June 2020 (UTC)
  93.   Support --David L Martin (talk) 18:58, 26 June 2020 (UTC)
  94.   Strong support Definitely a project to look out for; I will love to be associated with it in any volunteer capacity. Rajeeb  (talk!) 19:32, 26 June 2019 (UTC)
  95.   Support Definitely worth exploring. 𝒬𝔔 00:23, 29 June 2020 (UTC)
  96.   Support One more good breakthrough towards the synergy of text and wikidata. — Ailbeve (talk) 17:59, 2 July 2020 (UTC)
  97.   Support --Aristotles (talk) 21:23, 2 July 2020 (UTC) I'm inclined to view Googlers' involvement in good faith. The initiative seems exciting.
  98.   Strong support But I am worried that there might be problems for other languages, including Asian languages (Chinese, Korean, Japanese, Thai, etc.), whose grammar might come out wrong. --Nakaret (talk) 03:13, 4 July 2020 (UTC)
  99.   Strong support There are so many challenges, but simply understanding and explaining what they are would be a great result.--GrounderUK (talk) 01:26, 5 July 2020 (UTC)
  100.   Strong support This is helpful especially for minority languages. I hope this pushes through. Kunokuno (talk) 08:48, 6 July 2020 (UTC)
  101.   Support.--جار الله (talk) 22:38, 6 July 2020 (UTC)
  102.   Strong support the field of natural language processing, which is benefiting from Wikipedia & other projects data, will get some further advancement from this project Tttrung (talk) 10:26, 7 July 2020 (UTC)
  103.   Support could really close the disparities between different language editions of wikipedia. Eltomas2003 (talk) 03:53, 8 July 2020 (UTC)
  104.   Support. Good idea for supporting new brand and rebranding! — Niklitov (talk) 16:47, 8 July 2020 (UTC)
  105.   Support This is a helpful project for community research of NLP. Alphama (talk) 07:54, 10 July 2020 (UTC)
  106.   Support Great idea! IWI (chat) 13:30, 14 July 2020 (UTC)

Statement of opposition

  1. (Nahuatl) Mild pointless opposition due to the ambiguity between contextualization and situationing of the Chinese box problem. My first choice is wait on this for two to five years while we do other work in machine learning instead. EllenCT (talk) 05:46, 8 May 2020 (UTC)
    @EllenCT: Do you mean the Chinese room argument? Just trying to be sure to understand your argument. --denny (talk) 15:14, 8 May 2020 (UTC)
    Yes, that's right. I love the idea of an interlingua, and of an encyclopedia written in an interlingua, but I'm sure it wouldn't be an improvement over encyclopedias written in natural languages, although it could be in at least a few and easily several years down the road. EllenCT (talk) 01:02, 27 May 2020 (UTC)
    @EllenCT: {{nah}} means "Nahuatl language", not "nope"... Deryck C. 22:53, 27 May 2020 (UTC)
  2. Sort of pointless opposition I believe the proposal needs some kind of demo, as too much is a bit fluffy for the moment. I am pretty sure the idea would work as it is, but I wonder if parts of this can be done by machine intelligence. Abstract Wikipedia can be created by machine translation, coding from natural language to a functional representation, perhaps cleaned up with human help. Wikilambda can then be created by machine translation, coding from the functional representation to natural language. Some parts of both Abstract Wikipedia and Wikilambda can be handcoded.
    In short, I agree with EllenCT, but I'm a bit more optimistic that parts of this can be made dualistic, supporting both machine intelligence and human intelligence. (Imagine black-box functions that hold learned machine models.)
    Slightly longer, I suspect language differences can make this extremely difficult. In particular, the functional representation must encode some cultural variations that are completely alien to other cultures, and that can create a lot of both in-project fighting and resistance against reuse. In parts of Norway «nordover» might refer to up along the river, no matter the actual direction of the river. In other languages they have a grammatical case for direction towards the sea. In Norwegian we have the prepositions “i” and “på”, which are really unpredictable. (Some believe it has to do with the form of male and female genitalia, or has the same origin. Typically “i” follows forms that go inwards, and “på” forms that go outwards.) There are a whole bunch of such small distinctions that must be handled, but aren't easily codified. I suspect the functional representation will be extremely verbose. — Jeblad 18:14, 25 May 2020 (UTC)
    Tentative oppose Neutral due to the concerns listed below. Kaldari (talk) 21:51, 8 May 2020 (UTC)
  3.   Strong oppose This project is proposed by Google (see here). It is very shocking that Wikimedia is becoming a lobby organisation for this company, which stands for monopolisation and against data protection on the Internet. Habitator terrae (talk) 11:36, 9 May 2020 (UTC)
    Discussion moved to #Google's involvement below. Deryck C. 22:03, 27 May 2020 (UTC)
  4. I am concerned that this would be another project that would drain editors' time and energy from Wikipedia, and potentially even replace Wikipedia. Google's Knowledge Graph, which seems to mostly use Wikipedia/Wikidata content, is already drawing more than enough readers away from Wikipedia. This proposed Lambda project would accelerate that development. With that, Wikipedia editors would experience a shift from writing educational articles to feeding software with data. Also, Denny, I understand that you wrote this proposal on your own time and in a private capacity. But I feel that it would be better if your proposal transparently disclosed your function at Google and described which interest Google may have in such a project. I am not saying that you are in a conflict of interest, but it sure looks like there might be the potential for such a conflict, and I would like to see that openly addressed. --Martina Nolte (talk) 15:52, 10 May 2020 (UTC)
    @Martina Nolte: Google's mission is to organize the world's information and make it universally accessible and useful. This mission aligns with the goals of this proposal. Honestly, I never needed anything more to get support inside of Google to work on this (and most of that work happened in my 20% time anyway). I hope that answers your question transparently. --denny (talk) 00:45, 11 May 2020 (UTC)
    The knowledge graph isn't just reuse of Wikipedia/Wikidata. It's a project on which hundreds of Google employees and hundreds of other contractors work. ChristianKl17:12, 11 May 2020 (UTC)
    @denny: For transparency reasons, I'd like to see a section on the proposal page that is dedicated to your role at Google and to Google's interest and investment in this project. --Martina Nolte (talk) 04:04, 12 May 2020 (UTC)
    I added that I work at Google to the page. --denny (talk) 14:13, 14 May 2020 (UTC)
    In the current form, the proposal will create an artificial and unfair competition between the "old" wiki and the equivalent language version of abstract wiki and will be confusing for pretty much everybody. I would be willing to support only if the multilanguage Wikipedia is dropped and the UX on Wikipedia is improved.--Strainu (talk) 11:04, 11 May 2020 (UTC)
    Each of the big wikis has its own set of policies; having a new Abstract Wikipedia project means that the new project can set its own policies without getting in the way of the policies of the individual Wikipedias. Plenty of Wikipedia editors are also unlikely to want to spend a lot more effort writing their texts in this abstract way. Between Simple English Wikipedia and enwiki, a user can already today find two different English Wikipedias that describe a topic. This would just add a third and improve knowledge diversity that way. ChristianKl17:12, 11 May 2020 (UTC)
    I don't like the idea of being welcoming to (...) contributions (...) paid (...) by (...) companies and editable (...) by bots. A project consisting of commercial articles created by bots seems not to be helpful for the image of the other Wikimedia projects. Although I have already expressed during a discussion on Wikipedia that I fear a slimming down of perspectives if only one single view is valid for all languages, I would support the project if paid editing were clearly rejected. (Some Wikipedias are entirely written by bots; this is not my main concern.) Sargoth (talk) 21:22, 23 May 2020 (UTC)
    @Sargoth: Regarding the first quote, welcoming to (...) contributions (...) paid (...) by (...) companies - this was meant to refer to contributions to the code base, not to the project content, sorry for not being clear. My assumption is that most of the development work for the project will be done through paid developers. I tried to clarify this further. Whether or not paid contributions to the content will be allowed is a different question, and I did not want to imply any endorsement of that. I guess (I cannot know, because this is part of the autonomy of the community) that Abstract Wikipedia will follow a nuanced policy similar to the other Wikipedias regarding paid editing. --denny (talk) 22:56, 24 May 2020 (UTC)
    OMG thank you denny for clarifying, makes total sense now. I have stricken (striked??) my statement. Regards --Sargoth (talk) 08:46, 25 May 2020 (UTC)
  5. Honestly I think this will become a translation platform for the English Wikipedia, draining the readers/contributors from the local language Wikipedias to the proposed project. Even worse than the Google Knowledge Graph - Google KG drains the reader, this drains the contributor. — regards, Revi 12:07, 8 June 2020 (UTC)
    And seriously, the so-called core team's role must be something like WMDE's role for Wikidata - if the core team interferes with policy, guidelines, conduct, etc., why is that on Wikimedia? Our value is that the community decides everything community-wide; if we can't have that, the so-called hosting org for the so-called core team should launch it themselves. — regards, Revi 12:12, 8 June 2020 (UTC)
    Agreed, the community should take everything, not the core team. I am very much imagining a similar relation as between WMDE and Wikidata. Just to make that clear. If there is something that contradicts that, let me know, and I will remove or clarify it. --denny (talk) 00:33, 9 June 2020 (UTC)
  6.   Strong oppose - Concerned about the foreseeable massive reinforcement of already overrepresented voices from the Global North, which would clearly contradict the WMF's goal of "Knowledge equity". Denis Barthel (talk) 13:21, 11 June 2020 (UTC),   Support if the problem gets solved. Denis Barthel (talk) 09:53, 15 June 2020 (UTC)
  7. Oppose. I don't see that this project's goal is possible to achieve, and I think that it is not desirable. Contents cannot be abstracted from actual text in a specific natural language, as can clearly be seen in translations. Moreover, this is the exact opposite of diversity. Any effort in this direction will lead to a standardization and homogenization that is definitely harmful for the Wikimedia projects. For this reason, the project is not desirable. Mautpreller (talk) 14:57, 3 July 2020 (UTC)
    @Mautpreller: The presumption is definitely false: there are a lot of topics for which there is simply no information in a lot of languages. You cannot make poorer information that simply does not exist. These are probably the main target of this project. It’s also unlikely to replace existing content, and it could be overridden if any community wants to. So … no, it will not make anything poorer. It’s also likely that complex ideas or syntax will be, at least at first, difficult to express, write and translate, so the incentive for contributors to write content in their own language will not disappear anytime soon. TomT0m (talk) 15:32, 3 July 2020 (UTC)
    I don't agree. My argument is twofold: This project cannot be successful in the sense in which it is conceived. The idea to transfer contents into a "world language" based on something like Chomsky's en:Deep structure cannot be realized, since this deep structure is only a theoretical proposition in the analysis of natural language but not a real and "corporeal" entity. So far about the possibility. But you can try to do this; only the result will be a very different kind of structure, you could call it an imperialist structure. If there are no Wikipedia articles about a subject in a natural language, this is not a deficit. It simply means that up to now no one has found it necessary and/or desirable to deal with this subject. If the communities see this as a problem, they can do something about it at their own speed and with their own means. This is self-determination. But if an "international" community (which inevitably will be a community dominated by the powerful language groups) predefines knowledge in an easily transferable "artificial language" form, this will result in overriding all the possibilities of autonomy. If this is successful (I am fully aware that this is not the idea, but it is what might actually be possible), it will be harmful. It will be a kind of standardization, homogenization, even colonialization. It will suppress any chance to find one's own way. So far about the desirability. Mautpreller (talk) 15:57, 3 July 2020 (UTC)
    Generating text articles is a matter for the future. There is a ru:Шаблон:ВД-Преамбула template in the Russian Wikipedia - it can add one line to an article about a human. But at first, as I imagine it, this will only be about some unified wiki cards. Carn (talk) 16:45, 3 July 2020 (UTC)
  8.   Strong oppose--尼普瑞斯 (talk) 04:07, 19 July 2020 (UTC)

Infographics and examples

I'd love to see this explained via: simple infographics of "projects input" and "example output".

Simple boxes-and-lines diagram(s), showing how all the source-material and existing projects [wikidata + wiktionary + wikilexeme + wikilambda + Abstract Wikipedia + Commons + ArticlePlaceholder + Reasonator + WDQS + ...???!!!],

will turn into: Various examples of the DEEPLY-WISHED-FOR output.

Bonus points for adding in some of the specific names from the "Related projects/proposals" section links, like WikiCalculator.

Double bonus for explaining how (I think?) this could be the next major step in the global open-source epic project to replace the formerly-open-source OpenCyc, and the sadly closed-source and restricted-access WolframAlpha (so good, but so closed), as well as enable more open-source massively-multilingual translation efforts, and eventually feeding into things like open-source voice-interfaces (list at mw:Voice assistants and Wikimedia).

E.g. Is this finally going to give us one of the final pieces in the puzzle of: A trustable system that I can verbally ask - in ANY language - "show me a graph of the population of the monarch butterfly species over the last 20 years", and it will dynamically do exactly that based on the latest and historic data, without also trying to sell me tourist packages next week?

Currently I can't figure out quite where this chunk fits within the massive goal-spectrum of "what exists and we work on/with", and "what exists but is closed-access/commercial", and "what doesn't exist yet, but should".

I'm a visual-thinker, so infographics and deconstructed-examples would help a lot! I hope these thoughts help you. :) Quiddity (talk) 02:33, 7 May 2020 (UTC)

I added a project architecture and the component diagram to Architecture. I hope this helps.
I like the idea of examples. I started a new page, Examples, and will collect the examples there, where they can be discussed, dissected, and deconstructed.
Regarding the comparison with Cyc and WolframAlpha, I think they are quite apt, with the obvious difference that this is an open project. Cyc is much more geared towards knowledge representation and is less aiming to be a catalog of all functions (although it conceivably could be), but WolframAlpha, yes, that's a fair comparison (it has major differences too, but a lot of it is comparable).
Regarding natural language question answering: well, I don't think that's a viable short-term goal. But it will offer a few of the building blocks needed to get there. I mean, technically you could implement a parse function that tries to understand natural language questions and then answer those in Wikilambda. And given the amount of linguistic and ontological knowledge Wikilambda will amass over time, this might be quite a feasible thing. But it's not my primary objective, and I wouldn't take it up in the plan. This stuff is really hard and would be a quite considerable extension of the current plan.
Thanks for your questions and suggestions! --denny (talk) 04:31, 9 May 2020 (UTC)
@Denny: That's great, thank you!
Re: big dreams: Yeah, my latter comments were mostly meant to help me/us understand how important this could be in the longer-term future.
Re: infographics and other projects: I'd really like to understand more about how this proposal fits with the other existing (and proposed) projects, whether in text or diagram form. Specifically ArticlePlaceholder + Reasonator + WDQS (+ WikiCalculator, etc?). I grok the desire to keep things clear and understandable, but I think it also helps to show the complexity that it will eventually require. E.g. calculations for converting between numeral systems as used in different languages. However I now also grok that the architecture is still undecided, which makes clear diagrams difficult! But those other existing projects should at least be mentioned somewhere, in the meantime.
Thanks again. Quiddity (talk) 19:13, 12 May 2020 (UTC)
@Denny: with respect to Cyc, I know they have (had?) a natural language generation system capable of producing both precise and friendlier less-precise sentence-level paraphrases from their system. I do not know if they have done any work towards being able to produce a very friendly narrative text like a Wikipedia article. (It seems outside of their mission, so I would assume not. Even so, it may be worthwhile to ask if they have any comments on this project.) --Chris.Cooley (talk) 06:33, 4 July 2020 (UTC)
@Chris.Cooley: I did chat with a former Cycer about this project, and he did not point me to anything in that direction. Might be worthwhile to reach out to check, but since OpenCyc was closed in 2018, I am not sure how much of it could even be used. --DVrandecic (WMF) (talk) 03:38, 15 July 2020 (UTC)

Initial set of types and functions

The page Wikilambda/Requirements has an interesting point:

Unlike with the other Wikimedia projects, the developers will take an active stand in setting up and kick-starting the initial set of types and functions, and creating the necessary functions in Wikilambda for Abstract Wikipedia, and helping with getting language Renderer communities started. Unlike with other projects, the development team of Abstract Wikipedia and Wikilambda will be originally more involved with the project, but aims to hand all of that over to the communities sooner rather than later.

This reminds me of the early days of Wikidata. In retrospect, one of the problems I see with Wikidata is that the developers made only the most generic functions for accessing data from the wikis. Because they are so generic and low-level, different wikis used them differently: French Wikipedia, Hebrew Wikipedia, and Catalan Wikipedia are all examples of projects that embed values from Wikidata quite a lot, but each of them wraps the generic functions in distinct modules and templates. It would probably be better if the developers made richer functions for data access, which would be shareable across projects as magic words or as Lua modules embedded in the extensions, rather than maintained as wiki pages.

What lesson can be learned from this for Wikilambda? Wikilambda and Wikidata are quite different, but what I can think of it is that the developers should make a variety of working examples that can be actually reused easily by different languages, without a lot of copying and forking. --Amir E. Aharoni (talk) 20:17, 5 May 2020 (UTC)

+1 Ijon (talk) 16:51, 6 May 2020 (UTC)
@Amire80: Thank you for your question, Amir! It's always a delicate question regarding the autonomy of the communities and how much the development team should go into creating content, and when you started quoting the paragraph, I thought you would go exactly the other way and ask to reconsider whether we really should get into content creation. I was surprised (and, to be honest, relieved) to see the question go the other way.
One main difference to Wikidata is that Wikidata is, in many ways, much more static, and much more limited in what can be expressed in Wikidata. In Wikilambda we very intentionally create a space where we can have complex functions, probably also Lua code and templates, provided from a central place that all Wikimedia projects can access. And it is all happening in a space that contributors can control! So instead of having just a small number of very basic functions that access Wikidata knowledge, contributors can decide themselves how comprehensive their interface should be. This would allow communities that want to, to share very comprehensive, high-level APIs across different projects, while other communities that prefer not to could go for other functions or not use it at all.
For translation and cross-wiki collaboration though, the relevant part is that all these functions can be accessed from all the projects in a uniform, global way. So if someone decides to create the "Citation to New York Times" function, all projects will have access to that function (modulo localization issues).
So I think that the problem you identified with Wikidata should not arise for Wikilambda. --denny (talk) 18:15, 6 May 2020 (UTC)
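(For concreteness, here is a minimal sketch of the kind of richer, shareable data-access function discussed above, written as a Scribunto-style Lua module. The module name, the choice of property P19 "place of birth", and the example item are only illustrative assumptions; where such wrappers would actually live, in Wikilambda or elsewhere, is exactly the open question.)
<syntaxhighlight lang="lua">
-- Sketch of a richer, shareable accessor: "the label of the place of birth of X".
local p = {}

function p.placeOfBirthLabel(frame)
	local qid = frame.args[1]                                  -- e.g. "Q937" (Albert Einstein)
	local statements = mw.wikibase.getBestStatements(qid, 'P19')
	if #statements == 0 then
		return ''                                              -- no place of birth recorded
	end
	local snak = statements[1].mainsnak
	if snak.snaktype ~= 'value' then
		return ''                                              -- "unknown value" or "no value"
	end
	-- Look up the label of the target item in this wiki's language,
	-- falling back to the raw item ID if no label exists.
	return mw.wikibase.getLabel(snak.datavalue.value.id) or snak.datavalue.value.id
end

return p
</syntaxhighlight>
A wiki could then invoke something like {{#invoke:SharedAccess|placeOfBirthLabel|Q937}} (hypothetical module name) instead of re-implementing the property lookup in local templates and modules.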

License for the code and for the output

The page Wikilambda/Requirements says:

All content of Abstract Wikipedia and Wikilambda will be made available under free licenses.

This is to be expected from any Wikimedia project, of course, but this particular project poses a curious challenge. What is the content of Abstract Wikipedia and Wikilambda?

I can think of three layers:

  1. The underlying data, from which information is built. This is Wikidata or something else.
  2. The renderer functions' code.
  3. The renderer functions' output.

I am not a lawyer, but something tells me that each of these should perhaps be under a different license.

I don't have much to say about the underlying data. Probably CC-0 or CC-BY-SA is OK for it, but lawyers should examine it.

The default license for Wikimedia projects is CC-BY-SA, and this is also the license for templates, modules, and gadgets, which are stored as wiki pages. Perhaps this is not the best license for them, given that templates, modules, and gadgets are software, and CC licenses are intended more for artistic works than for software. The renderer functions' code is probably also software, so maybe they should be under some other license.

And then there's the output of the renderers. What license is it under? If I understand correctly, it's supposed to look like text written by people, but it's not actually written by people: it's written by code, which, in turn, is written by people. Can it be licensed under any license at all? This is comparable to the output of modules and templates on current wiki projects, but something tells me that it will be far more complex, raising new questions about licensing.

So, I don't have solutions, but someone who knows copyright law better than I do should think deeper about it all. --Amir E. Aharoni (talk) 20:28, 5 May 2020 (UTC)

Ouch, you made my brain break for a moment!
That's a tough question and I don't have answers. Which means we should add this explicitly to the proposal to deal with that. Which I will do. Thanks for raising it!
It is obvious that we want all of it to be licensed in some free manner, but it is very non-obvious how to ensure that. It will probably involve some dual-licensing of some of the stuff, and figuring out whether the output is licensable at all. And whereas I am an ardent defender of CC-0 for data, I am not sure whether this would be the best approach for abstract content.
Thanks for raising this, I am adding it to the proposal. --denny (talk) 18:21, 6 May 2020 (UTC)
I added it here. It should be its own task, really, but renumbering is a pain. --denny (talk) 19:51, 6 May 2020 (UTC)
  • Given my understanding of this, which I'm not sure is correct at all, then the generated text can't be copyrighted, but the data and code to generate the text can be protected. — Jeblad 10:28, 11 May 2020 (UTC)

Repository to share templates and modules

The page Wikilambda/Tasks lists a cross-wiki repository to share templates and modules between the WMF projects as one of the goals, noting that it is a long-standing wish by the communities and that this will be part of Wikilambda.

It is indeed a long-standing wish: first requested in 2004, and a top-10 voted wish in the Community Wishlist. I am a big supporter of it, and I wrote the page Global templates about it.

I agree that it's closely related to Wikilambda. In more technical terms, a repository of shared modules and templates and a repository of shared text rendering functions will both need a better mechanism for sharing things across wikis in general—caching, tracking dependencies, propagating changes, and so on. This is also necessary for improving the performance of Commons and Wikidata.

And precisely because of this it makes sense to me that a repository of shared modules and templates should be developed not as a part of Wikilambda, but as an autonomous project, not coupled to Wikilambda too tightly. It will be much more immediately useful to all the current wiki editors, who are familiar with wikitext and Lua as development tools. It also makes sense given that Wikilambda explicitly strives to create a cross-wiki community of developers. Creating such a community will be much easier with familiar programming languages (wikitext and Lua) than with a new renderer language. This also corresponds to the notion expressed on the page Wikilambda/Requirements: "Deployed is better than perfect".

Once the repository and the community is created with familiar tools, it will be easier to build the new Wikilambda-specific functions upon this foundation. --Amir E. Aharoni (talk) 21:02, 5 May 2020 (UTC)

If I understand you correctly, you would suggest having an additional repository for global modules and templates, besides Wikilambda?
I was thinking of Wikilambda as the place where the shared modules and templates actually live - not introducing another place where the shared modules and templates live, and then using the same mechanism to additionally allow calling both Wikilambda and this shared repository.
I see the value in giving Lua and wikitext higher priority than the proposal currently does, as it would allow for an easier transfer of skills and knowledge. Yes, maybe that should indeed be done (the wikitext implementation is only mentioned in passing, for example). But I am actually wondering whether the work needed to move the templates to a central repository, internationalize them, and ensure the Wikipedias agree on them is really so much lower than writing them from scratch. --denny (talk) 03:57, 7 May 2020 (UTC)
Thanks for the response.
Not necessarily an additional repository. It can be on the same site. In conversations about a global template repository, the question of where this repository will live is discussed a lot: should it be on a new wiki? On Commons? On Meta? But really, this is one of the least important questions. Creating a new wiki is not a big deal.
The much more important questions are:
  1. How will the internals of delivering content from one wiki to another work? I am not an expert, but I have heard that the way in which content is delivered from Commons and Wikidata to other wikis could be better. This also applies to less famous global features, such as global user pages and global preferences. For templates and for Wikilambda functions it would be even more complicated, so the whole thing needs a rewrite. See https://phabricator.wikimedia.org/T201004
  2. How exactly can any given wiki community "have it both ways": use a template from the global repository when it's good enough, but override locally the parts that it wants to be different? This is obviously necessary, but the spec I proposed could be more detailed about this. Like, where does the parsing run: the global repo or the local repo? And how do you do the overriding: do you just create a template with the same name on the local wiki, or will you have to do something more complicated? Do the global templates have to be in a different namespace? And so on. This will probably require a bunch of technical RFC discussions. Complicated, but necessary.
Back to your question, when I say that a repository of shared templates needs to be autonomous, I'm just saying that it shouldn't be tied too strongly to the full release of Wikilambda, and to the Wikilambda branding (whatever it will be). The internal component for delivering content across wikis ("dependency engine", or whatever it's called) will probably be the same, so it should be developed early. Everyone will appreciate it, because it will probably also improve the performance of Commons and Wikidata. The site on which the Wikilambda functions and the global templates and modules are stored can be the same, too, but the templates and modules can probably be made available much earlier than Wikilambda functions.
If I understand correctly, even the design of the programming language in which these functions will be written is not finalized, let alone its implementation, whereas the programming languages for templates and modules are already familiar to lots of people and can start being used very early.
So what I definitely don't want to see is that the release of Wikilambda is the blocker for the release of global templates. Neither should global templates be the blocker for Wikilambda. The "dependency engine" is probably the real blocker for both things, so as far as I can see, the sequence of release phases should be:
  1. The internal Dependency engine, because it's needed for both things
  2. Global modules, because they are familiar to many editors, because many templates depend on them, and because they are in an extension and not in core, so it's easier to make them global
  3. Global templates, after modules because they are more complicated than modules, but before Wikilambda because they are familiar to everyone
  4. Wikilambda functions, because it's the most innovative and futuristic feature
It's also comparable to how Wikidata was released in phases: the first release only made interlanguage links available. It was a major and highly-demanded improvement of a familiar feature, so everyone loved it. --Amir E. Aharoni (talk) 05:18, 7 May 2020 (UTC)
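(As a concrete illustration of the "have it both ways" question in point 2 above, here is a Lua sketch assuming a hypothetical global repository that local wikis could load modules from. The module names are invented, and the cross-wiki loading itself is precisely what the "dependency engine" would first have to provide; this is not how things work today.)
<syntaxhighlight lang="lua">
-- Wrapper module on an individual wiki: prefer a local override if one exists,
-- otherwise fall back to the shared implementation.
local ok, impl = pcall(require, 'Module:Infobox/local')   -- hypothetical local override
if not ok then
	impl = require('Module:Global/Infobox')               -- hypothetical shared module
end

local p = {}

function p.render(frame)
	-- Delegate to whichever implementation was found.
	return impl.render(frame)
end

return p
</syntaxhighlight>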
Oh, denny, I initially missed your question towards the end:
But I am actually wondering whether the work needed to move the templates to a central repository and internationalizing them and ensuring the Wikipedias agree on them is really so much lower than writing them from scratch.
I've been trying to write another document about this, mw:Global templates/Transition, but for now it's very far from being complete.
The very, very first thing that should probably happen, even long before a global modules and templates repo is available, is a way to make modules and templates nicely translatable (Phabricator: T238417, T238411). This will be necessary for the global modules and templates, and it would be immediately useful for templates on multilingual sites like Commons, Wikidata, and Meta. At the moment, templates are internationalized on them using the TNT system, but it could be much better. Once a robust internationalization system becomes available, I'm sure people will start adapting templates to use it.
A super-short summary of what I imagine should happen once a global repo is available:
  1. The communities will move the internal, ubiquitous modules, such as Wikidata and Arguments.
  2. The communities will move the simplest and most ubiquitous templates, such as {{tl}}, {{Bot}}, and maybe {{Citation needed}} and {{Quote}}. Their code will need some updating to make strings translatable, but it won't be very big.
  3. The communities will move templates that are more complicated, but mostly the same on a lot of sites, such as {{Cite web}}.
  4. The communities will hopefully agree on a common internal structure for the more complicated templates, such as infoboxes. Note: "common internal structure" doesn't mean that the visual appearance of the infobox, or even the information that it presents will be the same. It can be, but this is optional. I'm only talking about the internals. There are several families of infobox implementation: English Wikipedia, French Wikipedia, Russian Wikipedia, Spanish Wikipedia, and some others. Each of them is copied to several other sites and modified there. Many of their ideas are essentially the same, but the internals are significantly different. It won't be easy, but I hope that with some good will the template developers from these different sites will find a way to share some code. This may end up being a full rewrite, although I hope that at least some existing code will be reused. It's also possible that we'll end up with several implementations of the same thing on the global repo, at least for a transition stage, and that's OK, too.
The first three points will probably take no more than half a year. I gave just a short list of examples, but the real number of templates and modules that will become common is actually hundreds, if not thousands.
The last point, migrating the complex templates, will probably take longer. It will require more difficult community discussions and development work from the community. But that's fine, because it took so many years to get to where we are now, and it's OK to take things at whatever pace the community wants. --Amir E. Aharoni (talk) 05:55, 7 May 2020 (UTC)
The taxobox system in ru.wp is awful; Putnik is now working to copy data from many local templates into tables on Commons. I think there will be cases where the logic or the data will not be the same for different wikis. The references from Wikidata are not always good for local communities (both in the completeness of the design of the links and in reliability/credibility). Editions like en.wp, with a large community, can push through some of their standards. But such work is worth doing, and a repository with normal version control and normal debugging capabilities will be tempting. (I tried to copy the Lua environment with the different mw functions locally, but failed at the task.) If such additional capabilities are not planned, it is not certain that such a project will lure people from their local communities. Carn (talk) 22:09, 17 May 2020 (UTC)
  • I am working at ru.wp on some modules that format dates in different ways, indicate the calendar, etc. When I tried to take a module from en.wp that formats durations between dates (years, months, weeks), I found that, due to grammatical differences between the languages, there was too much to redo, and I decided to make it from scratch.
  • However, the mathematical part of modules with the same functions can usually stay completely unchanged. Therefore, I support the gradual transfer of modules from all Wikipedias to one centralized place. I personally do not understand the technical aspects of calling a module from another wiki; in fact, if this technical problem can be solved without undue stress, it should be done. A multilingual project will bring more people who are willing to do technical work. Carn (talk) 21:56, 17 May 2020 (UTC)
@Carn: Thanks! Yes, I also think that it would be beneficial to put some of the functionalities of templates and modules, or even the complete templates and modules, into a centralized place so they can be shared among the communities. We will have to see how this will play out exactly, but I think Wikilambda would be a great place to store such functionality. But I agree, the details can get rather complicated - the good thing is that each community will be able to take it at their pace.
If you want to support the project, don't forget to also leave your signature on top! --denny (talk) 01:03, 18 May 2020 (UTC)
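(A minimal Lua sketch of the split Carn describes above: the language-independent arithmetic lives once in a shared place, while only the grammar-dependent rendering stays per language. The module layout and function names are invented for illustration.)
<syntaxhighlight lang="lua">
-- Shared, language-independent part: whole years between two years.
-- (Deliberately simplified; the real date logic would live here once, centrally.)
local core = {}

function core.yearsBetween(fromYear, toYear)
	return toYear - fromYear
end

-- Per-language renderers: only these need to know about grammar.
local render = {}

function render.en(n)
	return n == 1 and '1 year' or (n .. ' years')
end

function render.ru(n)
	-- Russian plural forms for "year": год / года / лет.
	local n100, n10 = n % 100, n % 10
	local word
	if n100 >= 11 and n100 <= 14 then
		word = 'лет'
	elseif n10 == 1 then
		word = 'год'
	elseif n10 >= 2 and n10 <= 4 then
		word = 'года'
	else
		word = 'лет'
	end
	return n .. ' ' .. word
end

-- Example: render.ru(core.yearsBetween(1999, 2020)) --> "21 год"
</syntaxhighlight>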

Implicit article creation

I love the "Implicit article creation" idea on the page Wikilambda/Components. However, to get this to actually work and be used by a lot of people rather than a few open data, free knowledge, or language development enthusiasts, it will have to be easily findable by the general public.

The general public comes to Wikipedia mostly by typing the names of the things they are looking for in their language into common search engines. This means that things will have to have their names translated into lots of languages. This can probably be achieved by translating lots of Wikidata labels. Sometimes it can be done by bots or AI, but this is not totally reliable and scalable, so it has to involve humans.

The current tools for massive crowd-sourced translation of Wikidata labels are not up to the task. I am aware of two main ways to do it: editing labels in Wikidata itself, which is fine for adding maybe a dozen of labels, but quickly gets tiring, and using Tabernacle, which appears to be more oriented at massive batch translations, but is too complicated to actually use for most people.

Therefore, a massive and integrated label-translation tool with an easy, modern frontend, that can be used by lots of people, is a necessity, and should be part of the project. --Amir E. Aharoni (talk) 21:18, 5 May 2020 (UTC)

+1 Ijon (talk) 16:56, 6 May 2020 (UTC)
That sounds right. Adding this as a task. I am stealing your text for that. --denny (talk) 04:03, 7 May 2020 (UTC)
One thing about labels that I run up against constantly in my work on paintings, is that we want a single item to describe one painting, but it would be nice to have the label field reflect the title of the painting over time, as it is presented in collections and exhibitions. The current Wikidata limitations of the "label/alias/short description" could possibly be solved by this project, by enabling "history pages" of some sort, that can reflect who claimed what when. Jane023 (talk) 07:44, 7 May 2020 (UTC)
@Jane023: Does a property such as P1448 (or some other) not allow for that? And in a qualifier you could mention the catalog and/or the time? --denny (talk) 03:33, 8 May 2020 (UTC)
Yes of course, but then it is not something that can be found easily in the inline search function. I am thinking more about "history" in the way we mean the "history page". There is a difference between documented titles and descriptions & non-documented titles and descriptions. So for example, there is a movement to change the word "slave" to "enslaved person", which emphasizes the enslaver as the active participant in the state of the person depicted. This kind of thing is better handled outside of Wikidata I think, since it really is something that is outside the scope of art collections and art collectors, though it remains relevant to certain paintings. Just one example; there are of course many more aspects that illustrate the limits of short descriptions. We currently use short descriptions massively in inline search. Jane023 (talk) 09:50, 8 May 2020 (UTC)
@Jane023: Probably a good way to help with that would be a bot that copies all names from something like P1448 to the aliases. Then it would also be available for search. --denny (talk) 02:06, 9 May 2020 (UTC)
Yes, but copy from where? Generally I am searching Wikidata to link a portrait of someone (or something like a church) to something, and the label I am using is coming from some language art catalog or Wikipedia (generally some catalog or Wikipedia I can actually read, but occasionally from something I can't read or indeed even have the fonts for on my computer, so I copy/paste into inline search). When I am successful, I do add this into the English alias field, but of course I can only see and edit the alias fields for the languages I have enabled (it used to be more, but I pared it back to four: en, de, fr, nl). Jane023 (talk) 06:52, 9 May 2020 (UTC)

Community risks of deploying natural language article content

I'm very interested in the possibilities of applying data techniques to semantics, but I can't help but feel very sensitive to the social implications of a natural language production project in the Wikipedia article realm.

It has a strong potential to result in a situation where non-Western community and knowledge building growth is suppressed by a technological system largely run by a small minority of Western contributors.

The implied targets of this technology are the smaller Wikipedia communities, which are in various states of development, but also comprise one of Wikipedia's few growth areas in terms of active users. A known driver of the development of Wikipedia communities is the absence of knowledge (article content). An important driver of a community's capacity to generate and maintain knowledge is for it to have engaged in its own knowledge creation processes.

The semi-automated rollout of auto-generated natural language content may reduce incentives for community building in small Wikipedias, creating a dynamic where speakers of a certain language become consumer-oriented communities, lacking in content-generating abilities. Stunting of certain communities like this directly threatens our collective ability to gather knowledge (assuming these communities have something novel to contribute). This community consideration is notably absent from the project's primary goals (its raison d'etre mainly seems to be one of productivity - getting more content from existing contributors out to more readers).

One only need look at the deleterious effect of globalised free trade on developing economies in the 20th century to see that this kind of result is not a "what-if scenario", but a well-known and well-studied phenomenon that can permanently undermine the capacities of developing communities. Can we add this as a known risk for Wikilambda?

On a positive note - I do support the application of this technology outside of the article natural language content realm (unifying templates, math, data tables, infoboxes, multi-lingual modules etc). Sillyfolkboy (talk) 18:43, 6 May 2020 (UTC)

Thank you for this important question!
I actually discuss this issue in the Wikipedia@20 chapter and name loss of knowledge diversity as a possible risk in the technical paper (no, I obviously don't expect everyone to read all documents before commenting here, I just wanted to point to the chapter in case you want more context).
I raise a number of arguments for why this could be a benefit for smaller communities and could, in fact, increase the participation of members of smaller communities in global knowledge creation. I think that Abstract Wikipedia has the potential to massively shake up the incentive infrastructure that currently exists, to the advantage of the smaller communities. Here are just a few of the arguments:
  1. We have already had 20 years of Wikipedia, and we see how far this has taken us in the different languages. I am not convinced that continuing the current trend will have a major effect on the majority of language communities in the next five years.
  2. By being able to rely on a common pool of knowledge for some topics, Wikipedias with a small number of contributors are in fact freed up to focus on the topics they particularly care about. I base this on my own experience: when I started the Croatian Wikipedia, I really wanted to write about the island I am from, the villages of my mom and dad, etc. But it felt weird to write a full-fledged article about a village of 160 when there was no article about Nigeria. Abstract Wikipedia will allow a smaller community to decide whether they really want to write an article on a far-away city or on a chemical element themselves, or whether they want to focus on topics of local interest.
  3. Furthermore, and even more interestingly, some of these contributors might feel inclined to carry that knowledge to Abstract Wikipedia, where they would contribute to knowledge that can make it into other editions as well. Just as we see with Commons and Wikidata, we *do* have many contributors who are not Western contributors. In fact, I just read an article about how Indian contributors use Wikidata to share numbers of COVID cases between different Wikipedias.
  4. One reason the English Wikipedia receives so many contributors is that it is the de facto global Wikipedia. Many people who speak English besides their native languages decide whether to contribute to their local Wikipedia or to the English one (a similar dynamic happens with other languages perceived as more prestigious, such as French in some African cultures or Russian in some Asian ones). Additionally, they may face the question of whether they want to bring knowledge about their local cultures and localities to the perceived global audience, or whether they want to bring it to people who presumably already know it much better, i.e. their own folks. This dynamic leads to more contributors to the English Wikipedia than to the more local Wikipedias. Abstract Wikipedia offers the chance of shaking up that dynamic and leading to more equitable access to knowledge creation.
  5. Another incentive system that comes into play is that half of the Wikipedias have ten or fewer active contributors (using the 5+ definition). It might seem unachievable to ever reach decent coverage and currency with so few people, and therefore potential contributors might prefer to engage in other activities. But with Abstract Wikipedia we will be able to make this goal much more achievable, and thus add positive feedback that would lead to more people contributing.
As said, I wrote this in more detail in the mentioned paper. Thank you for raising the point! I think it is a very important point, and it is crucial to worry about it. But for the reasons I mentioned I think that the project will have a much more positive effect than you fear. --denny (talk) 03:16, 7 May 2020 (UTC)

I believe the risks are substantial, but not insurmountable. EllenCT (talk) 05:47, 8 May 2020 (UTC)

Question from utilisateur:lambda

Will wikicite and wikilambda be working together in any way? SashiRolls (talk) 21:01, 6 May 2020 (UTC)

I sure hope so. It is, as always, a decision by the communities, but I would expect that most references in the multilingual content would come from Wikicite. Since Abstract Wikipedia will follow the core rules of the Wikipedia projects, the content will need to be as richly referenced as we are used to from Wikipedia already. And then it seems rather straightforward to use the structured representations of those references that already exist in Wikidata.
This would probably look similar to how Wikicite already does it with the Cite template.
That is the most straightforward use of Wikilambda. Much more complex things can be done by writing functions that automatically check if a certain reference actually supports a certain claim in Abstract Wikipedia, which would be really cool. But that's future dreaming :) --denny (talk) 02:08, 8 May 2020 (UTC)
I copied the statement given above (in your Cite link) to a couple sandboxes and it does not appear to work as written on en.wp or on fr.wp. :/ I would have loved to see that method work... maybe the statement just needs tweaking?
I think maybe we agree that translating the reference templates would be a crucial aspect of multilingual content translation. Obviously it isn't the most exciting problem, but verifiability (en:WP:V) is pretty important. I think this is a good concrete use case, before moving on to tackle all the more complicated wikitext sentences about e.g. recording people with cellphones, or e.g. beating every donkey with an owner. SashiRolls (talk) 13:03, 8 May 2020 (UTC)
@SashiRolls: Yeah, now we get to the problem that the modules mentioned in the example are not available to all Wikimedia projects. So you should probably try it out on Wikidata. Funnily enough this is something we would like to solve with Wikilambda too: create a central place for the functionality of such templates and modules where they can be written only once and be available to all the projects. Your example would have worked then! :) --denny (talk) 02:09, 9 May 2020 (UTC)

Is this feasible from a linguistic point of view?

Does the theoretical foundation actually support the feasibility of the idea? I am not a specialist but my reading of lay summaries (and stories embedded in works like Don't Sleep, there are Snakes) of various linguistic theoreticians is that there is no universal structure to all languages, or indeed that some languages cannot be reduced to formal grammars (a la computer science). (Copied from another place, questioner wanted to remain anonymous)

That is a fundamental, important question. In fact, I also thought that this might not be possible, and I read up on a lot of literature around the topic. One of my favorite books on it is by Umberto Eco, who was an expert on translation and who wrote The Search for the Perfect Language, which gives an overview of many failed attempts at what I am proposing here. On the other side of the spectrum there are people such as Anna Wierzbicka and her book Semantic Primes and Universals. In short, I don't think linguists regard this as a fully solved question, but most of them seem to subscribe to the idea of the commensurability of (natural) languages, i.e. everything you can express in one language you can express in every other language (that does not mean that you have a single word for everything that has a single word in another language, but you can always describe it and express the same meaning). Chomsky goes so far as to posit a deep structure that is the same across all languages, saying that all languages are basically just dialects of each other, based on that deep structure.
I am sure that Daniel Everett would disagree with me on the premise of my project though.
Regarding formal grammars, the situation is more nuanced: it is a well-known hard problem that parsing arbitrary natural language input based on a formal grammar fails. This is why neural methods have recently made so much progress - because the symbolic ones keep failing. Fortunately, Abstract Wikipedia does not need parsing - merely generation. And for generating text, you don't need anywhere near as much coverage as for parsing. That's a much easier task, because you only need to encode one way to say something, not understand all the ways something may be said.
So, whereas it would be very helpful if linguists had solved your riddle of reducing natural languages to formal grammars, the proposal explicitly does not rely on it. --denny (talk) 03:31, 7 May 2020 (UTC)
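To make the generation-versus-parsing asymmetry concrete, here is a minimal, purely illustrative sketch in JavaScript (the language Reasonator already uses); the statement shape and function name are invented for illustration and are not part of any actual Wikilambda design. A renderer only has to commit to one surface form per abstract statement, whereas a parser would have to recognise every paraphrase of it.

  // Hypothetical abstract statement: "X was born in Y in Z" (all names invented).
  const statement = {
    type: 'BirthEvent',
    person: { label: { en: 'Nikolaï Kurilov' } },
    place: { label: { en: 'Andryushkino' } },
    year: 1949
  };

  // Generation: one English realisation per statement type is enough.
  function renderBirthEventEn(s) {
    return `${s.person.label.en} was born in ${s.place.label.en} in ${s.year}.`;
  }

  console.log(renderBirthEventEn(statement));
  // -> "Nikolaï Kurilov was born in Andryushkino in 1949."

  // Parsing would have to go the other way and recognise every phrasing:
  // "Born in 1949 in Andryushkino, ...", "..., a native of Andryushkino (b. 1949)", etc.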
I've already pointed out my doubts and questions regarding this aspect of the project in the paper, but, in a nutshell and just to bring the discussion to a larger audience, it is not that simple, and not that impossible (in my opinion, of course, people are free and welcome to disagree). The main point is that Wikilambda must find a way to encode how languages construe meaning differently, using language structure as clues for comprehenders to adequately infer the meaning intended by the text. Provided that constructors are modeled so as to allow for coercion and other types of meaning-oriented operations - and that's the hard part - there's light at the end of the tunnel. Going "universal grammar" does not seem to be the right way of pursuing this. If the system embraces constructions and their semantic/pragmatic functions, the odds of being able to generate some text that actually resembles human language increase significantly. --TiagoTorrent (talk) 19:32, 12 May 2020 (UTC)
@TiagoTorrent: Yes, thank you, I still need to answer your email! Sorry for taking so long!
I think the proposal does this: the knowledge notation is quite similar to constructions (I mean, it is even called Constructors). I am not really that interested in discovering a universal grammar (for this proposal, I mean; in general, sure :) ), but I am thinking about a very pragmatic approach towards having cross-lingual Constructors and Renderers. The references to UG and Semantic Primes are more a kind of wishful thinking ten years down the line, and to show that this is not just my silly idea, but that smarter people than me had similar silly ideas. --denny (talk) 02:50, 18 May 2020 (UTC)

VisualEditor support

As far as I can see, Visual Editor support is mentioned in the development plan, but way too briefly. It must be explicit and much more prominent.

The pages Wikilambda/Tasks and Wikilambda/Components mention new magic words. Every added complication in wikitext is a problem, but fine, I guess that they will be necessary.

However, the support for them must be available in both wiki syntax editing and visual editing from the start. As far as I know, this wasn't done for Wikidata: the {{#statements}} parser function is not available in the VE toolbar. It can be inserted only by editing in wikitext or by inserting a template that has it, but this, too, is only possible if the editor knows the template's name. It must be fixed for this and other Wikidata-specific parser functions, and it must be done for all Wikilambda functions from the start.

And VE support shouldn't be just about being able to add and edit the new wikitext. It should also be available, for example for the following scenario:

  1. A reader sees text in Wikipedia, without knowing that it was auto-generated by Wikilambda.
  2. The reader thinks that the text should be improved and boldly tries clicking the "edit" button for the first time. Visual Editor opens.
  3. The reader, who is now trying to become an editor, sees that what will actually be edited is not the raw text, but a piece of code.

If at this point the reader doesn't have a really, really easy way to improve the text without learning to code Wikilambda renderers, the reader is not likely to become an editor. With clever user experience design and Visual Editor integration, this can be prevented. The simplest thing I can think of is that VE can allow editing the auto-generated content as raw text, and an experienced renderer developer will be notified about the need to edit this page in a structured way, but perhaps there can also be something smarter.

Thanks :) --Amir E. Aharoni (talk) 17:23, 8 May 2020 (UTC)

@Amire80: Yes, there is a lot of UX work to be figured out. There is already a task planned to figure out the UX flow for editing Abstract Content: P2.3 mentions design research that will have a generous amount of time, P2.4 deals with the mobile flow, and P2.15 is about exactly the kind of flow you describe, where we don't assume that everyone needs to learn Abstractese in order to contribute. So, in short, I agree with the problem, and I think that your suggested solution is compatible with P2.15. --denny (talk) 02:51, 9 May 2020 (UTC)

Kaldari's concerns

The fundamental problems I imagine such a project would have are:

  1. Language is fuzzy, not mathematical. Wikidata can't decide (even after years of debate) whether the state of being a male or female should be called "sex" or "gender" in English, so they have decided to use one property to handle both concepts. The "inception" property is so vague it has 57 English-language aliases listed on Wikidata, including both "start date" and "completed". These are just two examples limited to English. When you start comparing across languages, the fuzziness multiplies exponentially. In some cases, like biographies or chemical elements, you have clean 1-to-1 relationships, but for anything conceptual it's a roll of the dice. For example, "love" in English covers romantic love, platonic love, familial love, and erotic love, while in other languages (e.g. Greek, Spanish, Japanese) these are usually split into different words that cover various connotations, and there may not be a unifying word/concept. Even a relatively simple concept like "pie" is an ontological mess when you start looking across languages. Wikidata typically glosses over these ontological misalignments and just makes do as best it can. The worst that can happen (currently) is splintered or confusing interwiki links, which almost no one seems to care about. Once you start trying to create sentences out of this mess, the mess will become glaringly obvious.
  2. Different language Wikipedias have different world-views. Your essay acknowledges this, but downplays it. First, you only cite examples of how this is a bad thing: neo-nazis running Czech Wikipedia and Bavarians objectifying women. Second, you say that it doesn't matter anyway since each Wikipedia will be able to choose which articles to use from Abstract Wikipedia. The problem is biases aren't just restricted to a small set of articles that can be ignored, but are expressed across vast swaths of content in subtle ways. For example, you would assume that articles about plant species are relatively free of bias and a good example of the kind of content that small Wikipedias would prefer to import rather than write themselves. Yet even here, cultural biases make a big difference. On English Wikipedia, due to WP:MEDRS, we have erased most mentions of the medicinal qualities of plants (even if there are sources for it). Chinese Wikipedia, on the other hand, often includes such information. Resolving conflicting world-views isn't simply a matter of reaching compromises on specific statements. It involves issues of emphasis, tone, coverage, and sourcing standards, none of which is easily dealt with by software, which brings me to the next issue…
  3. We've hardly figured out how to effectively resolve content disputes on English Wikipedia. Resolving them on a multilingual Wikipedia will be an order of magnitude more difficult. This isn't as much an issue on Wikidata, as Wikidata basically just accepts everything, and if there's a dispute, you just add all the conflicting claims. There are no questions of balance, emphasis, or tone, and for the most part, no one cares about sourcing, much less sourcing standards. A multilingual Wikipedia, however, will need to deal with all of these problems. And these problems will be compounded by issues #1 and #2. The inevitable result is that English-speakers will dominate the discourse and the articles will reflect the biases and world-view of the English speaking world. This project will thus perpetuate the colonization of non-English cultures into the Western world-view whether we intend for it to or not.

I'm interested to hear your thoughts on these issues and whether you have ideas for addressing them. My apologies if some of these have already been discussed elsewhere. Kaldari (talk) 21:49, 8 May 2020 (UTC)

Thank you, these are really good and hard questions. It will be honestly a pleasure to discuss them. I don't expect to have answers for all of your concerns that will entirely convince you, but I hope that we can have a conversation that leads to more insight for everyone involved. I apologize for my long answers.

Concept misalignment between languages

Re Question #1:
Yes, language is fuzzy, not mathematical. Wikidata goes far further in enforcing an ontology in its structure than Abstract Wikipedia would. Wikidata aims to be precise enough to support answering structured queries and reasoning scenarios. Abstract Wikipedia merely needs to be precise enough to allow generating natural language text in different languages. Not more than that. This allows for much more fuzziness.
Abstract Wikipedia developed out of the realization that Wikidata (and any other knowledge representation based on formal semantics) is inherently different than natural language, which is why this is a different approach: instead of starting with formal logic or mathematics, let's start with natural language, but abstract from the surface peculiarities of the individual languages.
Let's take the English word "love" and your example. In Spanish we have different words for parts of the concept that the English word "love" covers, say "amor" and "flechazo", and let's assume for simplicity that in English the word "love" covers both entirely (often it's not that easy, and if wanted I can go through a more complicated example as well). Let us assume a more or less typical sentence that could show up in Wikipedia: "Her love for the Senator inspired her to write the poem." Now, in the abstract sentence we would need to decide whether we want to go for "amor" or "flechazo" when translating to Spanish, and therefore there would be two distinct entities available at this point, one which translates to "amor" and the other to "flechazo", but in this case both of them would translate to "love" in English. Note that, since we never need to parse the English sentence, it is usually OK to lose specificity when generating the natural language text; in fact, the abstract representation will almost always contain more information and be more specific than any of the individual natural language translations (but we need to mark which parts of the content must be present in all translations and which ones are only there in order to enable certain translations).
But how do we get to this point? Does this mean that every time an English-speaking contributor wants to use the English word "love" they first have to learn all the differences of that word in all other languages and make the precise choice from these? Well, some might, and it might be really interesting and eye-opening, but requiring that would probably lead to reduced participation. So instead we must ensure that there is a way for the English contributor to use the underspecified "love" that results in a good text for English, but at the same time the text will be flagged for a Spanish-speaking contributor as "does not render", and they need to go in and edit it, adding the specification, so that the sentence renders.
This assumes that the difference is indeed in the mapping of the conceptual space to language, and that there is one common conceptual space, although sliced up differently, between different languages, and that the points in that space are stable across languages. This is my assumption for your point #1, and I would like to stay with that assumption for point #1, and devote the point #2 to the situation where that is not the case. But it makes it easier to discuss matters of world-view and surface-language independently. (Unless we subscribe to the strong Sapir-Whorf hypothesis, which would state that this is impossible as both are the same).
Background reading: Anna Wierzbicka and Cliff Goddard on semantic primes, John McWhorter against Sapir-Whorf, and, well, Sapir and Whorf for Sapir-Whorf. --denny
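As a purely illustrative sketch of the "love"/"amor"/"flechazo" situation above (in JavaScript, with invented object shapes that are not taken from the proposal): the abstract content can carry a concept that is specific enough for English but underspecified for Spanish, and the renderer for a given language either produces a word or flags the sentence as needing refinement by a speaker of that language.

  // Hypothetical per-language lexicalisation table (all names invented for illustration).
  const lexicon = {
    en: { Love: 'love', RomanticLove: 'love', LoveAtFirstSight: 'love' },
    es: { RomanticLove: 'amor', LoveAtFirstSight: 'flechazo' } // no direct word for the broad 'Love'
  };

  // Abstract fragment of "Her love for the Senator inspired her to write the poem."
  const fragment = { concept: 'Love' }; // underspecified: fine for English, not for Spanish

  function renderConcept(frag, lang) {
    const word = lexicon[lang][frag.concept];
    if (word === undefined) {
      // Flag for a speaker of that language: refine 'Love' into
      // 'RomanticLove' or 'LoveAtFirstSight' so the sentence renders.
      return { ok: false, needsRefinement: frag.concept };
    }
    return { ok: true, text: word };
  }

  console.log(renderConcept(fragment, 'en')); // { ok: true, text: 'love' }
  console.log(renderConcept(fragment, 'es')); // { ok: false, needsRefinement: 'Love' }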
@Denny: Re Question #1: You make some good points here, but I would like to know more specifics (which I'm sorry if you've already outlined elsewhere). Let's say we're generating an article on Abstract Wikipedia for Eden Atwood (Q5336817). Eden Atwood's sex or gender is intersex (Q1097630) female (Q6581072) (both of which are equally accurate and have normal rank). Would it be possible via WikiLambda to automatically choose the correct English pronoun for Atwood (in this case "she")? In other words, would WikiLambda's syntax be flexible enough to make complicated decisions or would it require the creation of new properties to resolve complexities (for example, a "preferred pronoun" property in this case)? --Kaldari
@Kaldari: Thank you for asking the questions! I still hope I can convince you to drop your opposition. So, next round :)
Re Question #1: Wikilambda will contain almost arbitrary code, so from that point of view, the answer could be "trivially yes", since Wikilambda will be flexible enough to accommodate that if someone makes it accommodate it.
Now, this is not really a satisfying answer to your question, I am afraid. So let's dig a bit deeper.
When creating Content in Abstract Wikipedia, we must be able to talk about a person that has no Wikidata item, because Wikipedia articles constantly talk about people who don't have Wikidata items. So we must be able to introduce a person inside the abstract Content notation and refer to it. If the person has a Wikidata item, OK, we can go to Wikidata and maybe set a few things on the person by default, but we must always be able to refer to a new person without a Wikidata item and then enter all necessary information for that person in the Abstract Content.
This will also include which pronoun to use in what language. Again, if the person has a Wikidata item, awesome, we can use the preferred pronoun property P6553 if available, or other properties we can reasonably make a default guess from, but we must also be able to override that and set it specifically. That should allow the fine-grained decision on which pronouns to use.
To make it clear, the list of pronouns does not have to be restricted to 'he' or 'she'. If a person has a preferred pronoun such as 'they' or 'zie', technically that will be easily accommodated. Now, again, as with English Wikipedia, it is up to the communities to make the editorial decision whether they will use the pronoun a person has explicitly stated for themselves or whether they want to deny the person that courtesy. I know how I would argue on that topic, but it will not be up to me to make that decision.
This decision would be made by each of the language communities individually. Hopefully we can be smart about inferring some pronouns across languages, but that will be an interesting challenge.
The good thing though is that this won't be subject to drive-by edits. I just checked a few of the articles on English Wikipedia where this is relevant - and the state is rather sorry. --denny
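A minimal sketch of that pronoun logic, assuming invented data shapes (only the property ID P6553, "preferred pronoun", is real): an explicit setting in the abstract Content wins, then the pronoun stated on the Wikidata item, then a per-language default chosen by that community.

  // Hypothetical representation of a person inside the abstract Content.
  function pronounFor(person, lang) {
    if (person.pronounOverride && person.pronounOverride[lang]) {
      return person.pronounOverride[lang];   // set explicitly in the abstract Content
    }
    if (person.wikidata && person.wikidata.P6553 && person.wikidata.P6553[lang]) {
      return person.wikidata.P6553[lang];    // preferred pronoun stated on the Wikidata item
    }
    return { en: 'they' }[lang] || '';       // fallback, to be decided by each language community
  }

  // Example with hypothetical data: the item states an English pronoun.
  const person = { wikidata: { P6553: { en: 'she' } } };
  console.log(pronounFor(person, 'en')); // "she"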
@Kaldari and Denny: I hope you don't mind me refactoring this discussion into subsections, since there are clearly two separate topics (or perhaps three, I think there is some overlap between 2 & 3 but little overlap with 1). I hope the subheadings help readers understand what we're discussing. If you disagree feel free to revert :) Deryck C. 13:17, 28 May 2020 (UTC)

Resolving competing worldviews

Re Question #2:
Croatian, not Czech.
I don't buy the claim that this affects large swaths of content, and I stand by my argument that it is about a small set of articles (more on the size of these sets in my response to your question #3). I also don't buy the argument that different languages (and by extension their Wikipedias) have different world-views. Yes, different cultures and subcultures might have different world-views, and these sometimes align with language barriers, but not always, and the cases where they don't allow us to investigate whether your claim or mine is more helpful. English is a wonderful example. It is spoken as an official language in countries as diverse as England, India, the US, Australia, New Zealand, Uganda, Botswana, Zimbabwe, and Singapore. Or Portuguese, which is spoken in Brazil and in Portugal. And so on. These places have very different cultures and very different world-views, but we don't have a Wikipedia for en-IN, one for en-US, and one for en-UK.
On the other side, we do have different Wikipedias for Bosnian, Serbian, Croatian, and even Serbocroatian. And these four languages, I claim, are sufficiently similar to not induce a particular difference in world-view. It's not the language that creates that difference - it's the culture and the society.
We chose not to have a Wikipedia for Red and one for Blue America. We chose not to have one Wikipedia for Portugal and one for Brazil. If the English Wikipedia can reconcile the differences between the world-view of someone from Bristol and someone from Mumbai, why should there be an irreconcilable difference between someone from Taipei and someone from Vancouver?
If there are specific types of content - such as the medicinal properties of Ginseng, which seem to have a section in that article on English Wikipedia, by the way - if there are specific types of well-sourced content that a specific Wikipedia language community wants to ban - say "no images of the Prophet on Arabic Wikipedia" - they can, as said, override that content locally, or even mark the content up in the Abstract representation and suppress its rendering in a given language.
But to be honest, I would hope that in most cases we see much more diverse knowledge coming together in the Abstract Wikipedia than in any of the individual languages currently. I would love to read many more sections saying "Carrots in Nigeria" if there is some specific aspect. In the Edinburgh talk I take an example of the article on Adenoidectomy, a medical condition, and go in detail through the differences in coverage in different language Wikipedias. I doubt that any of those differences was due to world-view or language, and had we one article that brought that content together, I think everyone would have benefitted.
Re Question #3:
(Quick aside: I object to the claim that on Wikidata no one cares about sourcing. The vast majority of claims on Wikidata are sourced. In fact, the source density on Wikidata is far better than on English Wikipedia. As this doesn't really have an impact on the current discussion, I just want to state my objection to your claim.)
Edit wars are quite rare (I know, it doesn't feel like it). Protection is rare (0.67% is the number I found in a paper by Hill and Shaw, 2015). Most edits are uncontested and most pages are open to editing. If this offers a solution for only those pages that don't have edit wars and are in need of protection - well, I think that's still more than worth it.
If an edit war breaks out over cultural differences, well, I assume that there is sufficient interest to override the article with a local version. That is also a way around the difficulty of having discussions about problematic topics in Abstract Wikipedia. Because, no, you are right, there is no good solution for these discussions. The only thing I have in mind is to use some templates for having those discussions. This has the advantage of potentially being massively de-escalating, but honestly, I have doubts as to how effective it will be in resolving issues.
I don't think that this project will necessarily perpetuate the "colonization of non-English cultures into the Western world-view" more than the current situation, in fact, I have several reasons to believe that it might improve on the current situation (but note, I am not comparing to a hypothetical fair world, I am comparing to the current world):
First, it will allow people from all kinds of world-views to work together on a common resource, instead of only allowing people who speak English to work on the English Wikipedia, which is the de facto global Wikipedia. It diversifies the editor pool compared to today.
Second, it will allow people in smaller Wikipedia communities more freedom to devote time to topics of particular local interest, as they can simply adopt articles from Abstract Wikipedia on topics that are of less local interest. This is true both for the local Wikipedia, in case they choose to override the article (which is likely), and for the Abstract Wikipedia, in order to share their perspective with the world.
Third, it allows us to make local differences more visible, which helps with studying and understanding them. Currently, we don't know whether the differences in the Adenoidectomy article between the different language versions are intentional or accidental. With an Abstract Wikipedia we will be able to get a better understanding of that question. And I think that will be extremely valuable in order to make the diversity of knowledge in the world more visible than it currently is, more visible to everyone.
Thank you for your questions! I really think that these are very important questions, and I would like to hear your thoughts in response. --denny (talk) 00:35, 9 May 2020 (UTC)
These concerns are valid, but, like with https://phabricator.wikimedia.org/T69659#6099209 , where Kaldari expressed similar doubts, they are not a reason not to do it. There are a lot of areas in which these problems don't get in the way. --Amir E. Aharoni (talk) 06:48, 9 May 2020 (UTC)
I agree that discussing these questions before writing up any sort of grant proposal is important. For the moment I gather this is just an "opening salvo" proposal which is to generate goodwill, not to obtain funding. As such I'm not sure what the point is of symbolically supporting or opposing.
Again, it is (I think) key to come back to the notion of the reference. If I've understood correctly, the idea behind an abstract Wikipedia is to avoid translating from an already existing *.wp page into another *.wp, but rather to write a page at wikilambda using Zxxx-speak.
One of the goals mentioned in the proposal is: "[to] allow more people to read more content in their language". However, the source of "verifiability" in a given wiki-entry is the references, not the wikitext. (Cf. the five disclaimers) Even after you've managed to change the labels for the different fields describing the references into abstract numbers like Z4820, that reference remains in the language that you are translating from, so it does not actually " allow more people to read more content in their language".
Don't get me wrong, IMO pulling references from various language entries and standardizing their representation using language-neutral field labels at this proposed abstractor-of-quintessences-wiki would have use-value. Is that value sufficient to justify the carbon footprint it will generate once the bots get rolling? SashiRolls (talk) 12:58, 9 May 2020 (UTC)
@SashiRolls: I believe that most readers unfortunately stop at Wikipedia, meaning, that they read the Wikipedia article (or rather, the parts of the article they are interested in), and rarely check the references. Also, for many languages there simply won't be references for many claims in their language. It probably is still better to have a reference, even if it goes to a language the reader doesn't read, than have no reference at all, or than not having the information at all. I don't think any Wikipedia requires that all their references be in the language of the given Wikipedia (although I am sure all Wikipedias prefer their references be in the language of the given Wikipedia). Or did I misunderstand your point? --denny (talk) 23:38, 10 May 2020 (UTC)
Re Question #2: I'm not really convinced by your English Wikipedia example. I think English Wikipedia has a very long history of excluding its colonial cultures such as India, Pakistan, and South Africa, going all the way back to the 2007 deletion battle over Mzoli's. Even today, articles on Indian and Pakistani politicians are routinely deleted from English Wikipedia due to not having any sources in English or not having enough digitized sources (which, IMO, is much more due to American and European bias than legitimate notability enforcement). I'm less concerned about smaller wikis being able to block content from Abstract Wikipedia (your "images of the Prophet" example) than I am about English-speakers dictating the balance of content within articles on Abstract Wikipedia, especially in cases of widely diverging world-views. I hope I'm wrong, but this is still a concern for me.
Re Question #3: (Quick aside: I don't have actual numbers, but my impression is that the sourcing on Wikidata is sparse and poor quality. A huge percentage of the sources (maybe even the majority) are from Wikipedias, and it is routine for editors to change values without updating sources. Where does your claim about higher source density come from? Does that exclude Wikipedia sources?) I'm glad to hear that you are thinking about the challenges of cross-language dispute resolution even if there isn't an obvious solution yet. My experience on Wikidata is that whoever is the most persistent typically wins disputes (probably because dispute resolution is not well developed or supported there). I would love to hear more ideas for something better than that.
Thanks for taking the time to address my concerns and discuss this in more detail. Kaldari (talk) 18:41, 15 May 2020 (UTC)
Re Question #2: I am surprised you are using Mzoli's as an example, as the most contentious thing about it was that it was created because, well, Jimmy sat down in a restaurant and ate there, as the legend goes. The way I remember it, it was more of a kind of power play between different parts of the community regarding this seemingly frivolous origin, but that didn't involve a particular colonial background. Looking at the talk page and its archive, I couldn't find much indication of the claim that the discussion was due to a colonial bias.
But the problem is more fundamental. If you think that English Wikipedia is a failure regarding its coverage of Pakistan, then I am afraid that Abstract Wikipedia will have a hard time becoming a success by your criteria. That is, as far as I can see it, such a high bar to meet, and one which I don't want to promise to hit. But also, to be honest, when Abstract Wikipedia hits the depth and coverage of the English Wikipedia on Pakistan or South Africa, I will be celebrating success - that would be a huge amount of knowledge unlocked for many other language editions!
Nevertheless, I see some advantages for achieving good coverage compared to English Wikipedia:
  1. Abstract Wikipedia will naturally not require English-speaking sources for its content, so that problem goes away.
  2. Abstract Wikipedia will have a more diverse community than English Wikipedia, which will naturally work on a more diverse set of topics.
  3. The coverage goals of Abstract Wikipedia won't be guided by English Wikipedia but by Wikidata.
But on the other side, Abstract Wikipedia will have significant disadvantages compared to English Wikipedia, which you have already listed.
I mean, I don't think that Abstract Wikipedia and Wikilambda will solve the problem of bias that the Wikimedia projects have. But I honestly believe that it will help with improving the situation compared to the current one. And that's all I aim for.
Re Question #3: about 75% of statements on Wikidata have references. Only 6% of the statements in Wikidata use Wikipedia as a reference. Compared to Wikipedia, even if you take the featured articles, you won't find such a high reference density, never mind average articles. (source)
Regarding dispute resolution: isn't persistence a big part of how disputes are resolved on Wikipedia as well? Or are you saying that you want to see something better than in Wikipedia? (I am trying to understand the goal posts)
A big part of dispute resolution on Wikidata seems to involve the formalization and stronger enforcement of constraints and clean-up processes. And for these the conflict resolution seems to be stratified similarly to other wikis with a comparable activity level. If it is about individual disputes over data values, I would expect them to be much rarer than such disputes on any Wikipedia. Do you have a few examples of the disputes you have in mind?
Please don't let me wiggle out of an answer. I hope that this helps with answering your questions! Thank you! --denny (talk) 02:10, 18 May 2020 (UTC)
@Denny: I appreciate your frank responses, and as I don't want to be the main voice blocking consensus, I've struck my oppose vote. I do, however, still want to continue our conversation (and I apologize that I'm slow at responding here). You ask above what sort of disputes I have in mind that would require a better conflict resolution system than what is present on Wikidata. Let's take Vladimir Lenin as an example. On English Wikipedia, Lenin's article presents him as a brutal and controversial figure. The lead mentions "violent campaigns" of suppression, "tens of thousands killed or interned in concentration camps", "political repression and mass killings", etc. It even uses an ominous-looking photo in the infobox. On Russian Wikipedia, Lenin's article is more apologetic. The lead doesn't mention anything about violence, suppression, concentration camps, or mass killings. Instead it touts his legacy as "the most significant revolutionary statesman in world history" and only has a single sentence mentioning that some people have a "negative assessment of Lenin's activity". It also uses a more dignified, statesmen-like photo for the infobox. I imagine that any Abstract Wikipedia article about Lenin will quickly be subject to differing views about how much emphasis should be placed on Lenin's violent political suppression versus his legacy of revolutionary political achievement. The two extremes of this debate will speak different languages and be immersed in completely different historical realities about who Lenin was, complete with competing sets of references. How would you imagine such a dispute playing out on Abstract Wikipedia? Kaldari (talk) 21:06, 27 May 2020 (UTC)

@Kaldari, Denny, and Amire80: I have a pretty confident guess of how inter-cultural disputes will play out, because we already have at least two projects where this is happening: Chinese Wikipedia and Commons. And I call the two mechanisms the voting mechanism and the judge mechanism.

  • On Commons, many voting templates have evolved since discussants often don't share a common language. There are two main types of intractable discussions on Commons: One type involves aesthetics (is this pretty enough to be featured?), the other type involves copyright. The aesthetics type is essentially decided by majority vote ("voting" mechanism). For anything with a copyright nuance in it, the elected admins essentially serve as judges who rule on the facts and policies; non-admin opinions are largely immaterial and Commons admins regularly overrule a majority consensus where they believe there is a copyright rule that should sway the result otherwise ("judge" mechanism); essentially creating an over-class of admin-judges and an under-class of everyone else.
  • The Chinese Wikipedia comprises four cultures divided by a common language: Mainland Chinese, Taiwanese, Hongkongese, and Overseas Chinese. Again, most "worldview" disputes are eventually resolved by majority vote ("voting" mechanism) and voting templates are used in every single discussion involving competing opinions. In a small number of cases where there are written policies, admins may wade in and make decisions based on policy rather than voting ("judge" mechanism).

This is not a criticism of the "voting" mechanism, the "judge" mechanism, or indeed the "consensus" mechanism used in projects where most discussants share a common language and compatible worldviews. I'm simply predicting, based on the fact that the "voting" and "judge" mechanisms are already in use on some Wikimedia projects to handle worldview disputes, that this will be what happens on Abstract Wikipedia. We may also see a lot of arguments being written out in Z-statement forms, much like how Wikidata discussions are peppered with {{statement|Q123|P456|Q789|P1011|Q1213}} as a way of stating one's position in a discussion. Deryck C. 22:47, 27 May 2020 (UTC)

@Kaldari, Deryck Chan, and Amire80: Great example regarding Lenin (although my question was about disputes on Wikidata, but let's pivot the discussion, I think this is the more interesting and harder one anyway). The funny thing is - for me the different representation of Lenin is a motivating example (don't forget, my background is from the Croatian Wikipedia). I would think it an advantage if the contributors of the Russian and English Wikipedia would develop a common understanding of Lenin. I mean, we don't have a blue and a red article about Donald Trump, and I think that's a strength, not a bug.
There are several possible answers, and Deryck already listed the experience we already have with similar situations on multilingual wikis, and I expect to see similar processes develop in Abstract Wikipedia too.
Furthermore, as I said before, I think cases such as Lenin are rare. And I would be honestly shocked if either the Russian or the English Wikipedia would use the content from Abstract Wikipedia to replace their local Lenin article. So the easiest way to resolve this conflict is for both Wikipedias to continue to have their own articles, and that would work as intended.
Here's my actual hope: that we can, in Abstract Wikipedia, introduce some of the more stringent policies from Wikidata regarding sources. Let's look at the Lenin article: the English Wikipedia article doesn't cite a single reference in the lead (this is so unusual that I guess it is an editorial decision that I didn't easily find documented - it doesn't seem to square with en:WP:LEADCITE in my opinion, but hey, that's pretty much what I meant by the difference in reference density between Wikidata and Wikipedia. And this is a featured article.) So I would hope that Abstract Wikipedia would have many more sources for such statements. And then the issue shifts from one of "should we have this statement here" to one of "how much weight do we give to this statement".
I don't think that the Lenin article in either enwp or ruwp is complete: enwp could use more text on the current perception of Lenin in Russia, and ruwp could use more text on the perception of Lenin in most Western countries. I have seen similar situations on hrwp and srwp, where two articles wildly diverge, and then sometimes the very same editors come to an agreement on enwp. Again, since we won't force the result onto ruwp, my assumption is that this will, overall, lead to a major improvement of the overall situation and allow more people access to more knowledge.
Again, this is not a perfect answer, but rather pointing to bits and pieces. I agree that the situation will be harder than on Commons, because I guess discussing images is easier than discussing text, and harder than on Wikidata for similar reasons, but a mix of voting and judging, paired with the option to opt out any time, and mixed with a lot of "this applies only to a small subset of articles" will get us quite far I think - and at the same time it will make the intentional differences between the Wikipedia editions more visible.
Thank you also for striking your oppose vote, it is much appreciated. And yes, the discussion doesn't stop because of that. --denny (talk) 01:25, 28 May 2020 (UTC)
(Putting my English Wikipedia admin hat on) Yes it is an editorial guideline that the lead section of an article should not contain inline citations, because it is a summary of the rest of the article. The reader should be looking for relevant citations in the rest of the article. With the latest coronavirus coverage, this style guideline has made it difficult to quickly translate the gist of an article into another language. One radical solution to this (for Abstract Wikipedia) is to include a logical relation which says a certain statement in the lead section is a summary of another statement further down the article. Deryck C. 21:15, 28 May 2020 (UTC)

Comparison to Reasonator, part 1: Text generation

There's something that I wanted to be sure about. Reasonator is a tool that appears to be quite similar to what Wikilambda is proposing. Am I understanding correctly?

It can take info from Wikidata, and present it in ways that are different from how it is shown on wikidata.org. One of its notable functions is auto-generating a few sentences of prose at the top of the page. Here is an example:

The text generation for the different languages is implemented in the file auto_long_desc.js in the Reasonator source repository. As far as I can see, only English, Dutch, and French are implemented, although it's quite possible I missed something.

So, is Wikilambda supposed to be similar to this? The most basic differences I can spot are:

Feature: Storage
  Reasonator: Git repository. To modify something, you need to run Git commands, submit a pull request, and each pull request must be reviewed and deployed by somebody with permissions.
  Wikilambda: Wiki pages. By default, everyone can edit everything, unless the page is protected. Every change is immediately deployed, and all the pages that use this function are immediately updated ("use a function" means: transclude, invoke, depend on, etc.).

Feature: Programming language
  Reasonator: Nearly generic JavaScript.
  Wikilambda: JavaScript, Lua, wikitext, or something new. And if it's JavaScript or Lua, it will not be totally generic, but will include a library of existing functions.

There are some other differences, but they are much more complicated, so I'll put them in separate sections. But do I understand it correctly till here? --Amir E. Aharoni (talk) 07:44, 9 May 2020 (UTC)

I checked out a Wikidata stub I created recently to see how this worked. The "related media" that Reasonator generates is quite curious! SashiRolls (talk) 13:29, 9 May 2020 (UTC)
The main difference is that Reasonator is based on the structured data in Wikidata, and is thus necessarily limited, because Wikidata is massively limited. I was trying to illustrate that in the Signpost with the narration of how the Mayor of San Francisco changed. It almost doesn't matter what data you have in Wikidata - there is basically no way Reasonator could reconstruct that narrative from Wikidata's data (and that's no fault of Reasonator, that's due to the limited expressivity of Wikidata compared to natural language).
Similar is true for articles such as music, physics, or pizza. Here's Reasonator on music.
So yes, we should start with the capabilities of Reasonator, but if we stopped there I would think this project a failure. We should go much further and extend the expressivity of what we can say in Abstract Wikipedia well beyond of that what can be said in Wikidata. (This comes with other disadvantages, such as the inability to query the content of Abstract Wikipedia in the way we can query the content of Wikidata, but that's a different story. Both projects will have their advantages and disadvantages.) --denny (talk) 04:13, 10 May 2020 (UTC)

@Amire80 and SashiRolls: Thanks for bringing up this comparison with Reasonator. That's my first reaction to denny's proposal too - what he's proposing is Reasonator+ and will be best implemented as an expansion of Wikidata. I would turn denny's statement, that Abstract Wikipedia would be a failure if it stopped where Reasonator stopped, around on its head. We should be saying that Wikidata's coverage hasn't gone far enough and we need to expand on it.

I am of the opinion that Wikidata is an auxlang. Items are nouns, Properties are verbs, Qualifiers are participles. The best way Abstract Wikipedia can be implemented is to piggyback on Wikidata. Rather than creating new Z nodes to represent concepts and relations, we should be using existing Wikidata Ps and Qs. Where existing Properties (verbs) aren't expressive enough, we create new Wikidata Properties. Wikilambda's role is then to provide the functionality to build the natural language generators that render the statements on an item, and for editors to curate (for each language) which statements should be rendered into natural language, in what order, and with what typesetting. This might best be done as new namespaces similar to EntitySchema on Wikidata: each Property will have its accompanying set of generators per language, and each Item will have its accompanying set of rendered text (or this could also be done on the corresponding Wikipedia pages). Deryck C. 23:09, 27 May 2020 (UTC)

I hope that is the case (and we'll have enough time to figure that out), but I don't believe it is possible. Wikidata simply lacks the expressibility to capture a story. I like to use the story of the transition of the Mayor of San Francisco from 2017 to 2018, where one can capture a lot in Wikidata statements, but it would be really hard to do so in a way that allows telling that story in natural language.
But again, we'll figure it out. We will likely start with the functionality that Autodesc currently provides, and then build from there, and if we can get away without the Abstract Wikipedia, that would be so much easier. My prototypes and work in that area indicate that this is not the case though. We'll see! --denny (talk) 01:02, 28 May 2020 (UTC)
Actually, I can see what the missing features are.
  1. Reasonator only generates sentences where the item itself is the subject of the sentence. But there are often important sentences in the narrative of a topic where the topic itself isn't even part of the sentence (fictional example: John Doe's father was born during the Second World War and moved to the UK in 1957).
  2. To write a nicely readable story, the order of statements in an article most likely need to be customised.
We can address both of these by making Abstract Wikipedia a canvas where editors can drag and drop any statements, including statements that belong to a different Wikidata item. But I'm skeptical about your criticism of Wikidata's expressibility to tell a story: my hunch is that prose features that can be expressed neither as i) Wikidata statements nor as ii) juxtapositions of different statements on a page are probably non-universal. We've already had a few of these debates with Wikidata properties, e.g. "uncle", "as" (formerly P794). Deryck C. 21:11, 28 May 2020 (UTC)

Comparison to Reasonator, part 2: Findability in external search engines

This is related to the previous section about Reasonator and also to the section about search engines and Implicit article creation.

One thing that Reasonator doesn't seem to be able to do now is to make the prose that it auto-generates findable in external search engines, like Google, Bing, and Yandex. Using the Nikolaï Kurilov in French link as an example, here's the French prose that it auto-generates:

Nikolaï Kurilov est un peintre, écrivain et poète. Il est né le 11 juin 1949 à Andryushkino.

If I google for "Nikolaï Kurilov est un peintre, écrivain et poète", I don't find this generated text. This probably happens because the text is generated completely dynamically on the client side and not stored anywhere.

Reasonator also doesn't appear in search engine results if I just search for "Nikolaï Kurilov".

Because of this, Reasonator is familiar to some Wikidata enthusiasts, but not to the general public.

To sum things up, Wikilambda should be different: it must make its output available to search engines. --Amir E. Aharoni (talk) 08:21, 9 May 2020 (UTC)

Yes, I fully agree. That's explicated in Components F2 and F3 - making the rendered content available and findable in external search engines.
In fact, we could even be cheeky and make the content findable even without the local Wikipedias incorporating it, which would be required by F2 and F3. Hmm. Not sure what I think about that :) --denny (talk) 04:19, 10 May 2020 (UTC)
Thanks for the links!
I don't think there is anything wrong about making the generated text available even if it's not incorporated into a Wikipedia, as long as it is very clearly labeled, in a human-readable and machine-readable way. It must say that it was machine generated, and that it was not written, reviewed, or endorsed by the human editors of the Wikipedia in that language. It should probably also somehow indicate the maturity of the code that generated it—for example, if the renderer function for this language is outdated, the text should probably be marked as such, or maybe not shown at all.
But other than that, I cannot see anything unethical about making information available.
Maybe you could also ask some lawyers or brand people about it. --Amir E. Aharoni (talk) 13:35, 10 May 2020 (UTC)

Comparison to Reasonator, part 3: Feature synchronization

All of this is wild guesses, but I'll try to explain myself as clearly as I can.

In Reasonator, everything is just generic, raw JavaScript code, including the code that auto-generates prose. I assume that there is no feature in Reasonator that checks whether different features are implemented in different languages.

In Wikilambda, each text generation function should be a first-class feature. For example, if I care about maintaining good support for Hebrew in Wikilambda, I want to see a structured list of available features, and to see what is the status of each of them in Hebrew.

It would look more or less like this:

Function implementation statistics
Function name | Status in Hebrew
Generating lead paragraphs for articles about cities in France | Implemented
Generating lead paragraphs for articles about cities in Italy | Not implemented
Generating lead paragraphs for articles about Russian scientists | Needs update
Generating lead paragraphs for articles about asteroids | Implemented

This is comparable to seeing the status of translation of different projects on translatewiki. For each project, I see what % of the strings in it were translated and what % need update, and as a volunteer translator I can choose: get something from 98% to 100%, get something from 40% to 60%, get something from 0% to 10%, etc.

In addition, there will probably be hundreds of functions, so it must be possible to filter and categorize this list :) --Amir E. Aharoni (talk) 08:48, 9 May 2020 (UTC)
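As a rough sketch of the kind of status overview described above, one could imagine computing the per-language report from a function catalog. The catalog structure and names below are invented for illustration only; Wikilambda's actual data model is still to be designed.

```python
# Illustrative only: a made-up catalog mapping function names to per-language status.
catalog = {
    "lead_paragraph_city_france": {"he": "implemented", "it": "implemented"},
    "lead_paragraph_city_italy": {"he": None, "it": "implemented"},
    "lead_paragraph_russian_scientists": {"he": "needs update"},
    "lead_paragraph_asteroids": {"he": "implemented"},
}

def status_report(language):
    """Return (function name, status) pairs for one language."""
    report = []
    for name, per_language in catalog.items():
        status = per_language.get(language) or "not implemented"
        report.append((name, status))
    return report

for name, status in status_report("he"):
    print(f"{name}: {status}")
```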

I really like this idea. I mention in the technical paper (Section 9.3) that it would be an interesting piece of future work to display something like that, a list of unimplemented pieces in the Renderers and how much impact they would have if they were implemented. I am not sure if we can have an online solution for this, or if this would require a big mapreduce run regularly, but I think that would indeed be really helpful. I also really like your design sketch and the ideas here. I added a task so as not to lose this idea. Please take a look and feel free to improve it! Thank you. --denny (talk) 23:07, 10 May 2020 (UTC)
Similar to-do lists are in use elsewhere, e.g. at Scholia via its "missing" pages aimed at facilitating curation, which complement the profiles and exist for things like topics, organizations or authors. -- Daniel Mietchen (talk) 02:15, 12 May 2020 (UTC)
@Daniel Mietchen: very cool, thanks for the pointer, Daniel! I added that too. --denny (talk) 02:21, 18 May 2020 (UTC)

lexical data

Wikilambda shines or fails not for lack of developers giving it a go but for lack of lexical data. The Wikimedia Foundation's ambition is to reach out for cooperation and collaboration. At the last Wikidata meetup in Berlin, a linguist asked for attention. He represented the languages where we are hardly functional as a Wikipedia for readers. So when linguists get a front-row seat in helping define processes, and when we work on processes to include their lexical data in Wikidata, it will prove enabling for Wikilambda.

When you consider Wikidata, for me as a source of information it is dysfunctional. I invariably use Reasonator to get a grip on the data existing in and linking to a Wikidata item. I use English as my language; when I replace it with, for instance, Zulu, a really important feature of Reasonator becomes apparent. Items where no Zulu label is available show in a different language. They are underlined in red, and when you double-click on them, you can add a Zulu label. When you want more people from other languages to get involved, a feature like this, i.e. Reasonator's red underline, needs to be integrated with Wikidata.

The key point is that this is not to facilitate queries, our current overriding concern, but usability for people who do not feel comfortable in our main language, and an enabler for Wikilambda. Thanks, GerardM (talk) 07:55, 9 May 2020 (UTC)

This sounds like a nice idea for user experience and inviting contributors! --Amir E. Aharoni (talk) 08:03, 9 May 2020 (UTC)
@GerardM: That's a great idea! I wonder if there could be a view on a rendered article in Wikilambda that is kind of an invitation to contribute, very much like what you describe for Reasonator in Zulu (I immediately went to try it out, that's cool!). Because you're right, more often than not this will probably just be a lexeme missing in Wikidata, and not a grammatical renderer in Wikilambda, or some specific piece of information that wasn't necessary for another language, and it would be great to be able to easily get these in. In fact, one could imagine a nice interface that allows for going to articles where specific information is needed, somehow combined with the interface Amir mentioned in the previous question. I am adding your idea as a task. Thank you for the great idea! --denny (talk) 23:32, 10 May 2020 (UTC)

Missing the most important question: why a new Wikipedia?

I combed through the proposal page and the technical plan and did not find the answer to this question. I fully understand the need for a centralized code repository and for the changes in Wikidata and Wikipedia, but I fail to see the need or opportunity for a multilingual Wikipedia.

Let me try to explain why I think a new wikipedia is a bad idea and then put out some alternative proposals:

  1. It's confusing for readers. Let's say that I'm a reader searching for a subject that is not covered on my language wiki. I stumble over the multilingual Wikipedia in Google results, click on it, and read the article. From there, I want to find out more about something that my language wiki does have an article about. Will I be redirected there? If yes, why am I being taken to another website? If not, maybe I'm missing some information? Do the answers to the questions above depend on my Wiki's options?
  2. It's confusing for editors.
    1. I want to write about something, but abstract Wikipedia already has a cool article about it. How can I improve that article? Will all the links to my article on abstract wiki magically point to the new article?
    2. I have a new class of data regarding a subject. In order for it to appear on my wiki I have to edit on Wikidata, then change some code on another wiki then wait a few minutes/hours, maybe even days/months if review is required (it probably should)? That's not how Wikipedia used to work... What if I don't know how to program? Who do I talk to? What's with all these new "templates" (actually magic words) that people are adding in articles?
  3. It's confusing for programmers and backwards-unfriendly. What happens to existing modules? What happens to templates like {{ill}}? Will layout-conversion gadgets still work with pseudo-pages?
  4. It will create an artificial and unfair competition between the "old" wiki and the equivalent language version of abstract wiki, encouraging local wikis to be even more bot-dependent in order to have a significant number of articles and remain competitive on Google.
  5. Lack of support for other projects. I don't understand why you want to limit yourselves just to Wikipedia. Wikivoyages could benefit just as much from this feature.


My alternative proposals are as follows:

  1. Drop abstract Wikipedia as a standalone project and create it instead as some kind of test wiki for the code developed in Wikilambda. No articles should be indexed by Google, all local edits should be purged on new deployments.
  2. Make sure Wikilambda is fully backwards-compatible:
    1. No other language except Lua and JS please
    2. All code on Wikilambda should be usable and extendable on local projects (this implies that local projects are in control over the final look of the pseudo-pages)
  3. Prepare Wikilambda to become the central repository of any kind of code (e.g. central gadgets) even if it doesn't happen in phase 1
  4. Seamless local experience on Wikipedia:
    1. Pseudo-pages should be rendered as if local articles (just like File pages are done now)
    2. Pseudo-pages should be editable as wikitext or VE (i.e. when I click "edit source" I should see the wikitext corresponding to the page; when I save the page, a real local page is created)
    3. When a local page is created, all links to wikidata/abstract/whatever now "magically" point to the real page (just like for {{ill}}).
    4. [Optional, maybe as gadget?] Links to abstract are converted on page edit if a local page exists.
    5. [Optional] I can edit data directly from the pseudo-articles just like I can from wikidata-enabled infoboxes on some Wikis (this is risky, maybe enable just for some user groups?)
  5. Support for all other projects that request it.

I hope you will consider at least some of my proposals for future iterations so I can support this promising project. :)--Strainu (talk) 11:00, 11 May 2020 (UTC)

@Strainu: Thank you for your question. I am sure I can show you that a number of your suggestions have already been taken into account. I guess I need to apologize for not writing more clearly - and there sure is a lot of content to go through.
I fully agree with your point that there should *not* be a new, multilingual Wikipedia, but that the Content from what is called "Abstract Wikipedia" in the proposal should be integrated seamlessly into the existing Wikipedias as they choose. So, an existing Wikipedia, say Romanian, should be able to say "on this topic, we don't have a good article, but in Abstract Wikipedia there's some decent content, and there are some good renderers, let's take that and show it in the Romanian Wikipedia." This is described in Components F2 and F3. I do *not* want to set up a Wikipedia that competes with the existing language Wikipedias; I want to give the existing language Wikipedias the ability to take content from the Abstract Wikipedia and display it seamlessly in their own Wikipedia.
"Abstract Wikipedia" is a bit of a misnomer. So that's not a new Wikimedia project I suggest, or an actual Wikipedia, it's just a name for the development tasks. There won't be a thing called "Abstract Wikipedia" at the end, a new site, or such. That's the name for the development to extend Wikidata so it can hold the abstract Content for the articles which then gets rendered by the functions in Wikilambda to be displayed seamlessly in the Wikipedias. The architecture discusses this. It also discusses different options, and one of the options is as you suggest to have all the content in Wikilambda - but I think that this would have a number of disadvantages. It seems better suited to have the content in Wikidata, and then the Wikipedias use that. This is discussed in the architecture page. But what I definitely agree is, that there should not be a new Wikipedia, competing with the existing ones.
As it is integrated seamlessly, there should be no confusion for the readers, because there is no Abstract Wikipedia page. There are only the language Wikipedias. There also should be no confusion for editors. And links should point to the pages of the local Wikipedia, whether they are stored locally or generated from the Abstract Content. But there's never a question about one or the other. The local content always wins.
No waiting for hours after changing Wikilambda code and data in Wikidata is expected. We still have to figure out the caching mechanism, but in the worst case there's something akin to a purge button that creates the page in real-time. This may take a few seconds, but that's it.
All Wikimedia projects will be able to call functions in Wikilambda and use Content from Wikidata or Wikilambda. This support is not limited to the Wikipedias. The Wikipedias get a little bit of special treatment, because they can store content next to the respective Wikidata item, but Wikivoyage and all other projects will be able to call Wikilambda functions, and they can store content in Wikilambda. If there is sufficient demand from the community, we could even add a form of Abstract Wikivoyage to Wikidata, where we can store this content as well, but that's not really needed to get started; it just might be more convenient in the long run. That would be rather easy to add. For the other projects I am not sure how useful that feature would be, as they are in general much less aligned across languages in the first place.
The workflow you describe - click on edit, materialize the text, and edit it manually - is already in the project plan, described at Component F2.
So I hope you'll be happy to hear that many of your points of contention are already taken care of, and I really hope that this will even lead you to change your vote. But also if you have further questions, or if I didn't answer something to your satisfaction, please feel free to ask.
Again, thank you for your questions! --denny (talk) 20:22, 11 May 2020 (UTC)

Thanks for your response Denny. It does indeed cover most of my concerns. I think most of the confusion comes from the project tagline in the proposal table: A multilingual Wikipedia and a wiki for functions. Rephrasing that (and any other appearances) would go a long way towards calming the fears of people like me, who appreciate all that wikidata has to offer, but prefer to focus on developing the local projects.

That being said, I have some follow-up issues, specifically regarding content creation and multi-project support:

  1. Component F2 describes a complex process which includes creating a virtual sitelink on wikidata. How can we make sure at that stage that a valid renderer is available? Also, how is a valid renderer defined? If we have a well-stocked wikidata item but content for just a few properties, you would end up with a stub that might not fulfill local inclusion criteria.
  2. The same component says: If a user clicks on editing the article, they can choose to either go to Wikidata and edit the abstract Content (preferred), or start a new article in the local language from scratch, or materialize the current translation as text and start editing that locally.. That sounds very much like competition to me, so it would be preferable to leave the choice of the default to the local Wikipedias, including the option to have only local edits.
  3. Still on F2, creating a virtual article when a local article is deleted seems like an escape from local notability rules to Wikidata notability, which is way more relaxed - you make sure the item has all the required information, then create a local article which gets deleted and tadaa! One solution could probably be for local policies to apply to Wikidata sitelinks, which sounds like a bureaucracy nightmare.
  4. Expanding on the ownership (policy-wise) of the sitelinks, there is also the question of manual and automated oversight. Having a local process for title creation would ease the pain. For instance, use another special page, Special:CreateAbstract, which could be integrated with all (well, ok, probably only the major) anti-vandalism tools that currently exist on Wikipedia, e.g. Flagged Revisions, Abuse Filter, ORES etc. Going through the local hoops would help with the quality problem (are the existing renderers enough to create a decent article?). This would also solve the problem of reflecting sitelink changes in the Wikipedia RC for patrollers to investigate: while Wikidata changes can be included locally, the sheer size of Wikidata quickly floods even medium wikis such as Romanian, so a local log would help. Throttling would also be a requirement, I believe.
  5. Moving on to component F3, I kind of fail to see how that is the least amount of work for a community. I might not be fully aware of how Google search works these days, but how would the virtual pages be discovered?
  6. Regarding multi-project support, if you keep all the code in one place, I think renderers should also be project-aware (or project-specific, but that would lose some of the reuse benefits), as the same data can be phrased differently in a tourism guide vs. an encyclopedia. A more elegant option (although probably more technically challenging) would be to allow local overrides of the renderers (a la the MediaWiki namespace). That's what I meant by reusing and extending the Wikilambda code. Strainu (talk) 22:33, 11 May 2020 (UTC)
@Strainu: Thanks for the follow up!
I changed the tagline to "Multilingual content for Wikipedia and a wiki for functions", because, it is, indeed, not a new Wikipedia. That's a good point, thanks.
Regarding F2, the concrete user story of how to create the pages obviously still needs to be refined, but the idea is indeed that the local Wikipedia contributors look at the rendered article first, and only then integrate the article into their Wikipedia. That way they decide, based on the resulting output, whether the potential article fulfils local inclusion criteria or not. Also, deleting the sitelink again would remove the inclusion. But again, these workflows will be refined and discussed with the communities. There's plenty of time for that: the work on including articles into the Wikipedias won't start until the second year, and the work on the design of these is scheduled to start in the middle of the first year, so there is plenty of time planned for making sure these workflows make sense and discussing them with the communities.
Regarding F3, the virtual pages would be discoverable by search engines just like any other article in that Wikipedia. This would only work if we can actually automatically create the name of the article, which might or might not be possible for a given article. It would be the least work because they don't have to go through potentially millions of articles and include each of them explicitly. Once Wikilambda can render an article, including its title, it would be included.
The implicit integration along F3 requires a lot of trust in the Abstract Wikipedia project from the local Wikipedia, and so I expect the local Wikipedia to explicitly ask for this tight integration. This shouldn't be a default.
The UX flow for how to deal with edits directly in the Wikipedias needs to be refined. Again, there will be enough time for that, and I am convinced the local Wikipedias will have a lot of input in this process. I imagine a similar approach as we did with Wikidata, where a few Wikipedia communities volunteered to be early adopters, and we figure out the exact flows with them.
What I would like to see is indeed if you and others write design proposals for how certain workflows should look (just as I did). I am happy to put them into the proposal, or link to them, and then, if we do this thing, when we get around to them, we already have suggestions and can integrate different proposals and discuss them. So, please, if you want to write down a proposal for how the explicit integration should work, or similar, do so, and I will be happy to either link it or integrate it into the proposal at the appropriate places. This can only help with making the proposal stronger and the project have a more helpful result for everyone.
What do you mean with "throttling"? --denny (talk) 15:57, 14 May 2020 (UTC)
@Strainu: I also linked to this discussion in the proposal now, so we don't miss it. Thanks! --denny (talk) 16:10, 14 May 2020 (UTC)
By "throttling" I mean limit the number of virtual pages a rogue user can create within a certain ammount of time on a certain wiki and/or cross-wiki.--Strainu (talk) 20:27, 14 May 2020 (UTC)
Does such a thing exist for creating new pages on a wiki? If yes, then there's a precedent, and I think that might be a sensible feature, if a wiki wants that. --denny (talk) 21:33, 14 May 2020 (UTC)

@Denny: not per se AFAIK, but the same effect can be obtained through AbuseFilter or even with $wgRateLimits Strainu (talk) 08:56, 19 May 2020 (UTC)

@Strainu: Yes, that might be a good way to implement it. I was wondering if any projects actually use this as a policy? --denny (talk) 21:24, 19 May 2020 (UTC)
Great! Can do the same, then. --denny (talk) 22:57, 24 May 2020 (UTC)

Actually, per [1] there is a proper create ratelimit (doesn't appear on MediaWiki.org, weirdly).

Wikidata seems to be using them, per phab:T184948. Strainu (talk) 06:15, 21 May 2020 (UTC)

Way too early for votes

There are way too many open questions, and it doesn't make sense to run any vote now. Iron out the open questions, and make a better project proposal. — Jeblad 17:12, 11 May 2020 (UTC)

@Jeblad: Hi Jeblad, and thanks for spreading the word in the Norwegian community! The project proposal is already more detailed than the Wikidata proposal was at the point we started working on that together. Also there have been discussions about this idea going at least back to 2013.
So I am unsure which open questions you refer to - I certainly tried to answer all questions that I am aware of - and how to improve the project proposal, other than by actually starting to work on it, which would lead to all the actual refinement, just as it did with Wikidata.
The project plan leaves the hardest questions - how to represent the Abstract Content, how to integrate into the Wikipedias, how to ensure that editing doesn't meet too high a barrier - all to Year 2 of the project. That gives us, once the project has started, a full year to work with the communities and refine that part. I really wouldn't know what value to expect from delaying the process to propose this project further.
For opening the vote, I followed the instructions as described at the new project proposal page. Which additional conditions do you envision we should meet before running for a vote? --denny (talk) 20:39, 11 May 2020 (UTC)

Move Abstract Wikipedia to its own domain name

Given that users feared that Abstract Wikipedia might be confused with the language editions by users who find a website via Google, and might possibly reduce knowledge diversity that way, I think it would make sense not to call the project currently called Abstract Wikipedia that, but something like UniversalPedia. ChristianKl 11:40, 12 May 2020 (UTC)

There will be a naming discussion and decision once the project starts. It's part of Task P1.1. I am adding your suggestion to the naming page. --denny (talk) 15:28, 14 May 2020 (UTC)
I personally think this is best executed as a new subproject within Wikidata and does not require its own domain name. Although it would be wise to pre-emptively buy abstractwikipedia [dot] org, as Denny (or sb else?) has already done for wikilambda.org (which redirects to Denny's technical paper). Deryck C. 22:06, 27 May 2020 (UTC)

Version system

This is only a very immature idea.

Currently the way to request an edit to a high-risk template or module is to create a draft version on a sandbox page and request that it be merged with the main version. I propose a version system for Wikilambda, which may be a complement to "Freeze and thaw entities".

  • solid function: a function whose dependencies may be determined, i.e. we can explicitly define a list of other functions (which must also be solid functions), and the function will not call any functions outside the list regardless of parameters. For example, the MediaWiki code {{ {{{1|}}} }} is not a solid function, as it may invoke arbitrary other functions specified by a parameter. Some parts of this proposal (the build system) only apply to solid functions.
  • version: a combination of two parts, the code of a specific function and a build specification (described below). Anyone can create new versions of a function, just like submitting a patch for review; though a new version does not immediately change the behavior of any existing pages. Versions are immutable.
  • branch: an identifier; includes public branches (including the master branch) and temporary branches. A branch may be protected as a whole (though not recommended) or per specific function (a cascade protection may be introduced). For each function, a branch is expressed as two pointers: a version pointer and a build pointer.
  • version pointer: a specific version per branch and function. Changing the version pointer will not affect existing pages per se, but will affect the build process.
  • build specification: a list of version pointers or versions per function which a specific function directly depends on. Some special values are allowed, such as "default", which refers to the master version, but in a temporary branch refers to the version in the temporary branch if it exists. It may also be possible to override the version of a specific function that a function does not directly depend on, though this should be considered a future feature. (Usually, users don't need to edit this.)
  • build configuration: a list of versions per function calculated from a build specification. This is derived data.
  • build: a function (or module, if we introduce the concept of a module as a set of functions) and the dependencies thereof, in a specific version, according to a build configuration. Builds are immutable. Usually they are generated in a back-end process, but builds may also be generated manually. A build may also include precompiled code of the function. Not all builds are materialized, as builds may be recreated from a build configuration.
  • build pointer: a specific build (if generated) per branch and function, which will be used if the function is invoked. The master build is used by default. Changing a version pointer will make all affected build pointers outdated, and those in use will be regenerated.

For example, this would be the way to update the equivalent of Module:String in Wikilambda. Given that it is heavily in use, the master branch can only be modified by users in a specific "core-function-editor" user group. If others want to contribute a patch:

  1. Create a temporary branch.
  2. Users are free to modify functions in this temporary branch.
  3. Once the edits are completed, create builds of the functions involved. Alternatively, builds are automatically generated on every edit, for every function affected.
  4. Users may use {{LAMBDA:Temp123@capitalize(“san francisco”)}} to call the function in a specific branch (the @ should be a character disallowed in function names or branch names). It is also possible to render a page in a specific branch of functions (i.e. each function will be replaced by its equivalent in the specific branch). Unit tests may be performed against a specific branch.
  5. Once it is done, users may ask for this branch to be merged into the master branch. The code review process is out of scope of this proposal.
  6. Alternatively, a branch may be forked to another name, which will create a new function.
  7. A consumer may use a specific version of a function, {{LAMBDA:2.0@capitalize(“san francisco”)}} - this version system provides a way to create stable versions of functions, so that clients can prevent local pages from being broken, whereas it is not possible to use an old or static version of a template or Lua module today.

--GZWDer (talk) 15:37, 12 May 2020 (UTC)
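As a purely illustrative sketch of the version/branch/build concepts proposed above (all names and structures here are invented, and nothing in it is part of the actual proposal), the data model could look roughly like this:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Version:
    function: str     # e.g. "capitalize"
    code: str         # immutable source of this version
    build_spec: dict  # dependency name -> pinned Version, or "default"

@dataclass
class Branch:
    name: str                                             # e.g. "master" or "Temp123"
    version_pointer: dict = field(default_factory=dict)   # function name -> Version

def resolve_build(branch, master, function):
    """Derive a build configuration: prefer the branch's version, fall back to
    master, and resolve "default" dependencies the same way (pinned dependencies'
    own dependencies are omitted for brevity)."""
    version = branch.version_pointer.get(function) or master.version_pointer[function]
    config = {function: version}
    for dep, pinned in version.build_spec.items():
        if pinned == "default":
            config.update(resolve_build(branch, master, dep))
        else:
            config[dep] = pinned
    return config

master = Branch("master", {
    "capitalize": Version("capitalize", "...", {}),
    "headline": Version("headline", "...", {"capitalize": "default"}),
})
temp = Branch("Temp123", {"capitalize": Version("capitalize", "... patched ...", {})})

# In the temporary branch, "default" resolves to the patched capitalize version.
print(resolve_build(temp, master, "headline"))
```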

Waterfall isn't a good design philosophy for software projects. Deciding all details of Wikilambda before staffing the project is likely a bad idea. I think denny provided enough data to make a decision to have a Wikilambda project. Afterwards, the developers can work through the various important details. Other Wikimedia communities will be able to decide how they want to relate to Wikilambda, like they can now decide how to relate to Wikidata. ChristianKl 20:19, 13 May 2020 (UTC)
@GZWDer: Thank you for your proposal.
I really hope we can get away without reimplementing the good parts of git. The idea is that functions are small and rather stable in their specification, and instead of changing a function it would be recommended to create a new one. I am not sure how far this idea will get us, but to take one example - why would the specification of capitalize ever change?
I do like the idea of calling branches explicitly in case this would be needed, and thank you for your detailed design doc on how to do a version system. I am afraid that at some point this might be needed, and I am linking to your proposal from the task description so that we have that as a possible starting point. Thank you! --denny (talk) 16:07, 14 May 2020 (UTC)
If the Unicode Consortium adds a bold version of ä, you would want your capitalize function to also be able to capitalize the new bold character. My intuition would be that you want functions to be changeable, but needing approval in some cases, like on GitHub. ChristianKl 22:03, 17 May 2020 (UTC)
If functions are totally immutable, then for each edit a large number of new functions have to be created.--GZWDer (talk) 02:30, 18 May 2020 (UTC)
@ChristianKl and GZWDer: Good example, ChristianKl. I would have thought that this would lead to a new function fulfilling the new specification. But I wonder if that is actually in any way different than just being able to call a branch. With the advantage that, using the branch-based solution, you could auto-update. Hmm. Good point.
Maybe something like "auto-update if you don't break these tests over here". Ah, this needs design docs. For now, I added some text so we don't forget this.
Regarding the large number of functions, GZWDer, yes, that is correct. But is that a problem? --denny (talk) 02:33, 18 May 2020 (UTC)
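A toy sketch of the "auto-update only if the existing tests still pass" idea floated above; the registry and test format are made up for illustration and are not part of any design doc:

```python
def accept_update(registry, tests, name, new_implementation):
    """Replace a function's implementation only if every stored test still passes."""
    for arguments, expected in tests.get(name, []):
        if new_implementation(*arguments) != expected:
            return False        # reject the update, keep the old implementation
    registry[name] = new_implementation
    return True

registry = {"capitalize": str.capitalize}
tests = {"capitalize": [(("san francisco",), "San francisco")]}

# An updated implementation that capitalizes every word would fail the stored
# test above and therefore be rejected rather than silently swapped in.
accepted = accept_update(registry, tests, "capitalize", lambda s: s.title())
print(accepted)   # False
```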

How will qikistraction work? Don't you need data relation statements in Wikidata first to connect ideas?

 
[Image: Triad Triopoly / Property relations]

I notice that there are not very many qualifiers included in Wikidata, in the traditional linguistic sense of adjectives / participles. To get an idea of what I'm driving at cf. d:Wikidata:Property_proposal/banned_in where a proposal is written up that might be of interest. So far, not too much traction for introducing data statement relators as qixidata statements. This is also not really being discussed that much at wikipediocracy in a thread about the landlord game. Oh well, such is life. I fear I may never understand the mysteries surrounding this/these λx.y black box(es). ^^ SashiRolls (talk) 01:00, 15 May 2020 (UTC)

@SashiRolls: Yes, you are right, Wikidata is indeed missing out on a lot of qualifiers. Wikidata was, from the beginning, meant to be very bare bones. The expressivity is severely limited (even though it is already far more expressive than most other comparable knowledge graphs!)
But Abstract Wikipedia must go well beyond what can be expressed in Wikidata in order to become interesting. It is not the goal to limit Abstract Wikipedia to just what is already in Wikidata - although that's probably a start. But we will need to be able to express more knowledge than that. This is why we introduce a new notation. The Signpost article discusses this in more detail. --denny (talk) 02:44, 18 May 2020 (UTC)

Feasibility of the language generation

The following is a discussion from Twitter with Mark Dingemanse that I copy here with his permission.

Hmm. Do you think universal translation is possible between programming languages? And even if it was (in the sense of yielding same outcome), would it be readable? Would you prefer reading, say, autotranslated Python over 'pythonic' python?

Most disassembled and ported code is unreadable. If it's like that for (well-defined) programming languages, I don't know why one would be optimistic that autotranslations would be preferable over idiomatic writing (what you call 'from scratch') in natural languages.

For a limited class of 'brute facts' this may help; but the structural and semantic differences between, say, Chukchi and Farsi surely exceed those between C# and F# languages... and that's even before considering cultural and communicative context. -- Mark Dingemanse

Thanks! I understand the arguments, but note that the goal is not to take natural language input, disassemble it, and reassemble natural language output (although that would be a cool project, too). The goal is to create abstract content which gets instantiated using concrete templates. So we can make the abstract content be expressed on a sufficiently high level.
Can you find a concrete example in Wikipedia, where a sentence or short paragraph wouldn't be expressible in two languages of your choice? I would be curious to work through that. Doesn't have to be on Twitter necessarily, but we'll find a good place to work through the examples. I really would like to have a selection of problematic sentences. -- Denny

Okay this is way too interesting. Let's take an example of yours, "mayor" — seemingly a clear enough, easily definable term. As in: "The current mayor is former District 5 Supervisor and President of the Board of Supervisors London Breed" (from en:Mayor of San Francisco).

  • Staying within Germanic languages, let's start easy, with German. Bürgermeister or Oberbürgermeister? Depends, among other things, on the kind of analogical mapping you want to do (and there are multiple possibilities, none neutral) (en:Mayor#Germany).
  • Or take Swedish, where the cognate term borgmästare was actually abandoned in the 1970s. Perhaps this one's easy: Swedes may well just use "mayor" for the mayor of San Francisco — but that's boring and issues would still arise with historical articles (sv:Borgmästare).
  • Moving to Slavic, how about Polish? We'll probably use a calque from German ('burmistrz') but the semantic space is again being warped by alternatives, and partial incommensurability is demonstrated by key terms remaining untranslated in this paper.
  • Colonialism has an ugly habit of erasing cultural institutions and indigenous voices and vocabulary — and even then the result is rarely simple translational equivalence, as seen in the use of Spanish alcalde (judge/mayor/{...}) in the Americas (paper).
  • Moving further afield, let's take Samoan, where the office of pulenu'u is a weird mesh of locally organized administration and colonial era divide-and-conquer policies. "Mayor" might be translated as pulenu'u but it would definitely have a semantic accent (paper).

On the very useful notion of "semantic accent", see Werner 1993. It is this kind of careful anthropological linguistic work that most strongly brings home, to me, the (partial) incommensurability of the worlds we build with words.

I have here focused on languages for which there at least appears to be an available (if not fully equivalent) translation, but of course there will also be those that simply haven't lexicalised the concept — think of languages spoken by egalitarian hunter-gatherers.

One might say that they could surely adopt the concept & term from another language and get it. Sure. And there's the rub: ontologies are rarely neutral. The English term "mayor" is supported by and realized in its own linguistically & culturally relative ontology.

While I've mostly taken the English > other language direction here (which may seem easier because of globalization, cultural diffusion, calqueing, etc.), clearly the problems are at least as bad if you try going the other direction, starting from other culturally relative notions.

If even a seemingly innocuous term like "mayor" is subject to this kind of warping of semantic spaces (if it's available at all), that doesn't bode well for many other concepts. Which is why, even if I like the idea, I'm skeptical about a concrete Abstract Wikipedia. -- Mark

That is a surprisingly simple example to solve (because I got lucky and I can cheat): there's a Wikidata item for the mayor of San Francisco, and we can collect the realisation for this particular office in all languages that the project supports.
Now, we usually won't be that lucky. But it points in the direction of the solution: we can have exceptional items like these, with their realisation in different languages, or we have types of items that are realised in the same way across several offices or cities.
These then can share a construction for realising the office, which can then take the city as the argument. Items are used for exceptional singletons. And we have some fallback for the rest. This would allow us the flexibility of using the right word in each language. -- Denny
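A toy sketch (not from the proposal) of the lookup order described above: per-item exceptions first, then a construction shared by a type of office per language, then a fallback. All data, including the item identifier, is made up, and real lexical knowledge and grammatical agreement would come from Wikidata lexemes and renderers.

```python
EXCEPTIONS = {
    ("Q-mayor-of-SF", "sm"): "pulenu'u",    # exceptional singleton realisation (hypothetical ID)
}

CONSTRUCTIONS = {
    ("mayor", "de"): lambda city: f"Oberbürgermeister von {city}",   # ignores case/gender
    ("mayor", "pl"): lambda city: f"burmistrz miasta {city}",        # ignores declension
}

def realise_office(item, office_type, language, city, fallback="mayor"):
    """Pick the most specific realisation available, falling back gracefully."""
    if (item, language) in EXCEPTIONS:
        return EXCEPTIONS[(item, language)]
    construction = CONSTRUCTIONS.get((office_type, language))
    if construction is not None:
        return construction(city)
    return fallback        # e.g. the English term, glossed and linked on first use

print(realise_office("Q-mayor-of-SF", "mayor", "de", "San Francisco"))
# Oberbürgermeister von San Francisco
```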

Yes, I went for your home turf so we can better interrogate assumptions.

TL;DR Wikidata may help us autofill some slots, but social ontologies are never language-agnostic so the project risks perpetuating rather than transcending (Anglo) worldviews.

Some phrases do a lot of work in your reply. E.g.,

  • "The realisation for this particular office" — words in natural lgs never just realise some pristine underlying predicate-argument relation: they (also) construe, colour, frame, warp, even undermine it (see: semantic accent)
  • "...in all languages that the project supports" — ground truth data would be great but let's critically consider the source: usually, translations of a culturally dominant (=Anglo) Wikipedia produced by a strongly skewed demographic. Is this not a recipe for ontological bias?
  • "items are used for exceptional singletons" — identified how? And what if it's not a 99:1 distribution but more like 20:30:35:15 between 4 partly overlapping concepts each shading into different semantic areas, as in the German : Slavic : Samoan : Mexican Spanish examples?
  • "... using the right word in each language." — do you actually think this exists for each language, including, say, !Kung or Hadza? Can we suffice with autogenerated pulenu'u in Samoan or would it benefit from a touch of human contextualization & interpretation after all?

I think Wikidata is promising for brute physical facts like the periodic table and biochemistry. But the social facts we live with —from politics to personhood and kinship to currency— are never fully language-independent, so any single ontology will be biased & incomplete -- Mark

Thank you for staying on my turf, I appreciate that.
This is proposed as a Wikimedia project. This has several advantages for the project.
  1. We already have articles for San Francisco in 168 languages, and ~80 mention the mayor. That's the baseline. If we can do as well as the current system regarding translating "mayor", I'm happy. As far as I can tell, most of your arguments apply to translations in general and are not specific to the Abstract Wikipedia proposal.
  2. Once we reach that baseline it's a perpetual work in progress. Our goal is not perfection, but continuous iterative improvement. We start from where we are now, and improve.
  3. What is an exception and what is not is decided by the community and their work.
  4. The word "mayor" or the translation can *link* to the respective article, which makes it less necessary to explain the whole context in-place. It's hypertext, we can provide more information by linking, and readers can find what exactly "mayor" means with a click.

We can also automatically use a realisation such as "alcalde (ingl. "mayor", el cargo del departamento ejecutivo)" the first time. The realisation doesn't have to be a single word, it can be a phrase and a link the first time. (Forgive Spanish errors, I learned on Duolingo). -- Denny

BTW this solution evoked a Gödel-like jolt for me, as it seems to admit that the ontology (of 'mayor') may be irrevocably Anglo & could fold back onto itself in countless localised cross-references — which, if allowed omnidirectionally, would create a dense thicket of xrefs. -- Mark
Wouldn't that be awesome?
It makes me wonder if it would unfold somehow in the way of Wierzbicka's program towards translation and semantic primes. --denny (talk) 04:19, 16 May 2020 (UTC)

I enjoyed the conversation, thank you. My skepticism comes from a place of fondness for Wikipedia and free knowledge, having been a regular contributor & admin in the early years (as well as a member of WP:CSB). Great to hear that community involvement remains key! -- Mark

Thank you too, I enjoyed it as well! I think these are very important questions, and it is crucial to discuss them.
This proposal won't solve many of these existing issues, but it may solve some others. --denny (talk) 04:16, 16 May 2020 (UTC)

Governance

  • I found that there is no section regarding governance of the community and contributors for this project in the proposal. Can the proposer share their views and plans for how governance will work in the new project, given it's multilingual? What will look the same or different between Wikidata and WikiLambda? Xinbenlv (talk) 19:33, 21 May 2020 (UTC)
@Xinbenlv: Great question, thanks. The proposal intentionally doesn't cover this. This is a technical proposal regarding how to implement the project, but the governance will be left to the new communities that will form. This has served Wikimedia projects well so far, and I believe that such an autonomy of the projects is important. --denny (talk) 02:40, 22 May 2020 (UTC)

@Denny and Xinbenlv: Are we right to assume that, though the details will be left to the new community when it forms, this will be a Wikimedia project, and so WMF (or an affiliate) will take ultimate legal responsibility and provide the computing infrastructure? Deryck C. 22:19, 27 May 2020 (UTC)

Yes, absolutely. --denny (talk) 00:55, 28 May 2020 (UTC)

call, write, maintain, and use

Just nitpicking: The current Wikilambda page says that Wikilambda is "a catalog of all kind of functions that anyone can call, write, maintain, and use".

Is there a difference between "call" and "use"? --Amir E. Aharoni (talk) 08:28, 24 May 2020 (UTC)

@Amire80: Yes. "use" is wider - functions, particularly the composition of functions from other functions, can be used in far more creative ways than simply calling them. They can also be used to analyse other code bases for patterns, or to train automatic coding systems, or to visualise how certain algorithms work, and towards other educational and technical goals. So, whereas "call" is technically subsumed by "use", it is a very important usage, so I called it out explicitly, but I don't limit it to it. --denny (talk) 23:00, 24 May 2020 (UTC)

Performance

Python is a slow programming language (compared with C or C++), so I do not expect the majority of functions to be written in Python (we have already encountered plenty of Lua memory errors and timeouts). Purely functional implementations are slow too (this is why we introduced Lua). Lua is a bit better, but only if we do the whole thing in Lua (not moving data to and fro between Lua and PHP).

Here are some ideas:

  1. Try to transpile the main function call to Lua. This may be a big challenge as we develop more features. Code implemented in Python and JavaScript may still need to be invoked via proxy functions.
  2. Use WebAssembly (see the next section) and compile Python and JavaScript code to WebAssembly and put them together. This requires more work and the combined code may still not be optimized. I doubt whether this is possible.
    By the way, adding WebAssembly as one of the function implementation options will make it easy to reuse existing C/C++/Rust code. Note that a WebAssembly instance is stateful, so all data should be passed via parameters (unless we call functions with an environment, which may be a new datatype, though I don't recommend such a solution).

--GZWDer (talk) 14:18, 30 May 2020 (UTC)

@GZWDer: Thanks, yes, those are great considerations, and yet, I have to admit, I think they should be secondary (and that's a risk for the project). My first goal is to make it as simple as possible to allow as many people as possible to participate in contributing to Wikilambda and creating and maintaining renderers for specific languages. Figuring out how to make it fast enough is a secondary goal - worst case, we'll have some offline computation pipeline. But I aim to prioritize broad participation over computational efficiency. Now, I understand that this is a risk. Compared to the functional implementation in the MediaWiki template language we should have some advantages that will hopefully let us avoid those pitfalls.
Another reason why I think we might avoid these issues is that we will slowly grow into the problems. The beginning won't be problematic anyway, and only if the project becomes a resounding success will we slowly run into efficiency issues. Just as with Wikidata, I would suggest we start simple and learn on the way how the system behaves, instead of trying to predict it and guess where the bottlenecks will be. We can and should still monitor these proactively and resolve them ideally before they become growth-stoppers, but I don't want to try to solve them prematurely.
Yes to the transpilation to a single target, be it Lua or WebAssembly (and I really think WebAssembly could be a super interesting target!). One hope is that the way functions can be composed from lower-level functions in Wikilambda will allow us to create such a transpiler. My hope is that a year or two after launch the system will be active and interesting enough for researchers in programming languages and compilers to take note and try to come up with evaluation strategies and evaluators that are far beyond what we will initially implement. That worked well for Semantic MediaWiki and Wikidata before, where most of my early code and algorithms have been replaced by significantly better optimised solutions. I hope to achieve a similar dynamic here.
And yes, I think your ideas are spot on! My sense is that your ideas would lead to a 100x-1000x speed up compared to what I have sketched out in the current plan.
I hope that makes sense as a strategy? What do you think? --denny (talk) 14:53, 4 June 2020 (UTC)
I checked the project plan, and whereas it mentions WebAssembly as a supported language, it does not yet talk about transpiling an implementation from a function composition. I added that as a task, because I really think that's a brilliant idea and I wouldn't want to lose it. Thank you for the suggestion! --denny (talk) 15:00, 4 June 2020 (UTC)
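A toy illustration of the transpilation idea discussed above: a function composition, described as data, is flattened into source for a single target (here a Lua-like expression) instead of being interpreted call by call. The composition format is invented for the example and is not the plan's actual notation.

```python
composition = {
    "call": "concat",
    "args": [
        {"call": "capitalize", "args": [{"arg": "first_name"}]},
        {"value": " "},
        {"call": "capitalize", "args": [{"arg": "last_name"}]},
    ],
}

def to_lua(node):
    """Recursively emit a single Lua expression for a composition tree."""
    if "value" in node:
        return repr(node["value"])
    if "arg" in node:
        return node["arg"]
    args = ", ".join(to_lua(child) for child in node["args"])
    return f'{node["call"]}({args})'

print(to_lua(composition))
# concat(capitalize(first_name), ' ', capitalize(last_name))
```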

Client-side code and interactive design

(This is a bit outside the project proposal; if you don't like it you can just ignore the whole section)

Wikilambda may generate any text; for example it may generate an SVG from a Commons dataset (this would be more useful with Wikilambda/Plan#Task_O21:_Stream_type). But as w:Wikipedia:Lua#Lua_input_limitations says, it cannot create a box that calculates the square root of a number you type in, or recalculate a piece of the Mandelbrot set depending on which part of the parent set you click on. This may be overcome in the following ways (these solutions can also be used to build gadgets):

  1. Parse and process all code via something like ResourceLoader and generate (compact) client-side JavaScript code. This only works if no Lua or Python code is embedded.
  2. Use a server-side rendering process. This may call backend Lua or Python (via web requests), but DOM structure may only be manipulated by JavaScript.
  3. Build the user interface using an arbitrary language and display it via an HTML canvas. A bit less efficient; I don't want to imagine a Twinkle completely based on canvas.

--GZWDer (talk) 14:18, 30 May 2020 (UTC)

I think I understand the scenario, but it sounds like a more appropriate implementation for such a thing would be in JavaScript packaged as an extension or a gadget, or a backend tool, probably written in PHP, and packaged as an extension with a form in the frontend. --Amir E. Aharoni (talk) 19:50, 31 May 2020 (UTC)
@GZWDer: In my understanding, I don't see Wikilambda being a great place to actually design the UX or the whole app, but only the functions. Yes, you could put it in the way you describe, but that would be more a proof of concept. I was imagining Task O26 to be about how to make the functionality inside Wikilambda easily accessible to an HTML-based UX or an app-based UX.
Put differently, the box and the client-side JavaScript would be written in traditional HTML/JS/CSS, and then they would call the respective function in Wikilambda to calculate the square root. Or you have a canvas that displays a Mandelbrot visualisation created from Wikilambda, and the controls to select how to zoom or navigate, and then Wikilambda can calculate a new picture by easily allowing the JavaScript to invoke the function.
Wikilambda is not an app-hosting platform, but just functions. A decent app will need to add more product around it. And these, as Amir points out, should probably live in the classical way.
But that's just my current understanding - I would be surprised if things don't go in unexpected ways :) And I really enjoy your boundary-pushing thoughts on that. --denny (talk) 03:49, 5 June 2020 (UTC)
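A sketch of the separation described in this thread: the UI lives outside Wikilambda and only calls a function to get a result back. The endpoint and parameter names below are purely hypothetical; no such API exists yet.

```python
import json
import urllib.parse
import urllib.request

def call_wikilambda(function_name, *arguments):
    """Call a Wikilambda function over a hypothetical HTTP API and return its result."""
    query = urllib.parse.urlencode({
        "function": function_name,
        "args": json.dumps(arguments),
    })
    url = f"https://wikilambda.example.org/api/evaluate?{query}"   # hypothetical endpoint
    with urllib.request.urlopen(url) as response:
        return json.load(response)["result"]

# A plain HTML/JS "square root box" would do the equivalent of:
#   call_wikilambda("square_root", 2.0)
# and display the returned value, keeping all UI code outside Wikilambda.
```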

Deferred update

We should introduce a new datatype for computed values, which are computed in a separate process independent of page rendering. Even if the performance issue is solved, there are several things that cannot be run on the fly, like SPARQL queries (this is more a task for Wikidata development and out of scope for Wikilambda, but I hope Wikidata will support parameterized queries with instances), SQL queries (see next section), machine learning, etc.--GZWDer (talk) 14:18, 30 May 2020 (UTC)

@GZWDer: Yes! Most of the compute will be done offline, I assume, and just cached and displayed on read. But yes, some smarter approaches are meant to be covered by Task O4. Feel free to extend the task description! --denny (talk) 04:15, 5 June 2020 (UTC)
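A minimal sketch of the "computed value" idea as discussed here: the expensive computation runs in a separate refresh step, and page rendering only ever reads the cached result. The names and structure are illustrative only.

```python
import time

class ComputedValue:
    def __init__(self, compute):
        self.compute = compute          # stand-in for e.g. a SPARQL query or ML model call
        self.result = None
        self.computed_at = None

    def refresh(self):
        """Run by a background job, never during page rendering."""
        self.result = self.compute()
        self.computed_at = time.time()

    def read(self):
        """Cheap read used while rendering a page; may be slightly stale."""
        return self.result

population_of_berlin = ComputedValue(lambda: 3_600_000)   # stand-in for a real query
population_of_berlin.refresh()       # background process
print(population_of_berlin.read())   # page rendering
```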

Database

We should create two or three new datatypes for

  • Relational database table
  • K-V store, or more generally, "document" store - See [2] for example
  • triple store (optional)

Note this is different from a Commons dataset, as the table, which may be several MB or even GB, is usually not loaded as a whole. Large databases should be stored externally (in a service such as tools-db) and "derived" to Wikilambda.

A table may be:

  • Primary - stored in a Wikipage, which may not be longer than 2MB (the limit may be increased to 10MB, but there is still a limit)
  • Derived but cached - a table is generated via a SQL query or another external source, like SPARQL or a stream. The result is semi-permanent and stored somewhere, but not in database dumps. Such a table may not be larger than 1GB.
  • Derived but not cached - instead of a table it should more accurately be called a view. We can create views for Wikimedia database, which may be several TB as a whole, and also external databases in a service such as tools-db.

--GZWDer (talk) 14:18, 30 May 2020 (UTC)

Relational database: Definitely, we need it. Wikidata is great as a data store for a lot of things, but it's too clever for simple tabular data, which is sometimes needed. JSON can be used for this, too, but it's way too loose and unstructured. Already now it would be useful for Wikipedia—for climate tables, tables of parliament members, pandemic statistics, and lots of other things. And when Wikilambda appears, it would be useful for Wikilambda, too.
K-V store: How would it be different from a relational database table with two columns?
Triple store: Isn't that what Wikidata does?
Sorry, perhaps I'm showing ignorance in the last two points :) --Amir E. Aharoni (talk) 19:47, 31 May 2020 (UTC)
  • I now find a use for a local triple store: I scraped ~30000 pages from an external website and want to compare its data with Wikidata. If we use SQL we have to either make a query for each statement (for comparison) or download all triples of a specific property, both of which are awkward to use. Alternatively the whole external database can be converted to a set of triples and federated queries may be used.--GZWDer (talk) 15:20, 4 June 2020 (UTC)
@GZWDer: Just to make sure I understand: since Wikilambda is stateless, these stores (be they relational, KV, or graph) are basically input parameters, right? So just as we would call the Wikidata SPARQL endpoint or a Commons datatable? Or are you thinking of them more like temporary tables for the way a specific function call is implemented? --denny (talk) 03:57, 5 June 2020 (UTC)
@Denny: This means the stores are treated as special "values". They can be generated from SPARQL results or the Wikimedia database etc. A store may be "primary", i.e. directly given, or "derived", i.e. the return value of other functions, whether it is materialized (cached) or not (this includes "deriving" data from a future dedicated service for community-maintained databases similar to tools-db). It's possible to call a Commons datatable, but this means the whole table has to be loaded into memory - even if only a small part is needed.--GZWDer (talk) 04:22, 5 June 2020 (UTC)
@GZWDer: Thanks for the clarification! Yes, agreed, that should be possible. External stores should be accessible via REST, and internally they should be representable as user-generated types. And then we'll probably need to make it more efficient, I would guess, but do you think that should cover it? --denny (talk) 14:33, 5 June 2020 (UTC)
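A sketch of the clarification above: a store is just a special value that is either "primary" (rows given directly) or "derived" (returned by another function, materialized on demand). Everything here, including the class and field names, is illustrative.

```python
class TableValue:
    def __init__(self, rows=None, derive=None):
        self._rows = rows        # primary: rows given directly
        self._derive = derive    # derived: a function producing rows (e.g. a query)

    def rows(self):
        if self._rows is None and self._derive is not None:
            self._rows = self._derive()      # materialize (cache) on first use
        return self._rows

# Primary table, stored on a wiki page:
climate = TableValue(rows=[("January", -1.9), ("July", 19.1)])

# Derived table, e.g. the result of a SPARQL or SQL query run elsewhere:
mayors = TableValue(derive=lambda: [("San Francisco", "London Breed")])

print(climate.rows(), mayors.rows())
```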

Two small questions

Hi there, just a couple of small questions about the project.

  1. In Wikilambda/Components#F1: new Special Page: Abstract, you say: The special page displays the Content from the selected Q-ID or the Q-ID sitelinked to the respective article rendered in the selected language. What if there is already a label for the selected Q-ID? Can it be used as the title?
  2. In Wikilambda/Components#F5: New Magic Word: ABSTRACT WIKIPEDIA, you say: The magic word is replaced with the wikitext resulting from Rendering the Content on the Wikidata item that is connected to this page through sitelinks. What if I want to use that content inside a "bigger article" such as a list-like one (i.e. "List of discontinued ministries")?

Thank you in advance! --Sannita - not just another it.wiki sysop 12:21, 1 June 2020 (UTC)

For the second question, a possible solution is to allow one Content to contain multiple parts, which may be individually transcluded, and one part can also call another part. There are two possible ways to identify the parts: 1. use sequential IDs like Content:Q62#123, but the ID itself will be meaningless; 2. identify them via roles, which are references to QIDs.--GZWDer (talk) 14:34, 1 June 2020 (UTC)
@Sannita: Thanks for the questions!
Re Question 1: Yes, absolutely, and it should! Wikilambda will be able to query Wikidata and use the results for rendering.
Re Question 2: the ABSTRACT WIKIPEDIA magic word (which is really a placeholder) also can be called with parameters, one of them being a Wikidata QID. So if you want to use the content of an item in a larger article, you can just call it with the magic word and the Item QID.
Regarding the idea of parts, yes, that's right, that is a better approach. I already changed the glossary so that Content is understood as an instantiated Constructor, and thus we can render at any level - from a whole article down to a single phrase. This leads to a follow-up question of how to address a specific piece of Content, but that needs to be answered anyway in order to allow for graceful degradation of an article (i.e. to be able to say "only show this paragraph if that sentence over there was rendered"). So, I think the flow described here should work. --denny (talk) 04:26, 5 June 2020 (UTC)
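A toy sketch of "Content is an instantiated Constructor": an abstract piece of content is data, and per-language renderers turn it (or any part of it) into text. The constructor name, data, and renderers are invented for the example; real renderers would draw occupation lexemes and grammar from Wikidata.

```python
content = {
    "constructor": "person_intro",
    "person": "Nikolaï Kurilov",
    "occupations": ["painter", "writer", "poet"],
}

renderers = {
    ("person_intro", "en"): lambda c: (
        f"{c['person']} is a {', '.join(c['occupations'][:-1])} and {c['occupations'][-1]}."
    ),
    ("person_intro", "fr"): lambda c: (
        f"{c['person']} est un {', '.join(c['occupations'][:-1])} et {c['occupations'][-1]}."
    ),
}

def render(content, language):
    """Render the same abstract Content in any language with a registered renderer."""
    return renderers[(content["constructor"], language)](content)

print(render(content, "fr"))
# The occupation words are left untranslated here; in a real system they would
# come from Wikidata lexemes in the target language.
```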

A question

Will the current language subdomains of Wikipedia disappear and be merged into Abstract Wikipedia? --Agusbou2015 (talk) 14:48, 7 June 2020 (UTC)

@Agusbou2015: No. The language subdomains would remain the main channel to distribute content. But they could use some of the content from Abstract Wikipedia to become more complete and more current. But Abstract Wikipedia is a background project, just like Wikidata, and does not aim to replace the individual Wikipedias. --denny (talk) 02:35, 8 June 2020 (UTC)

Terminology

Abstract Wikipedia/June 2020 announcement says "new wiki project", which might mean anything. The resolution says it's actually approved as a new Wikimedia project, a term which has a specific meaning. (It can have a new domain, separate MediaWiki installations etc.) Also, no idea what "wiki-project" means, is it even standard English? Nemo 19:11, 2 July 2020 (UTC)

@Nemo bis: thanks for pointing this out! It means to be a proper new sister project / Wikimedia project. The architecture is still fluid - the current proposal can be found on the content page - but yes, it is meant to include a new Wikimedia project. And I am looking forward to exploring what standard English is :) --denny (talk) 19:56, 2 July 2020 (UTC)

Abstract Wikipedia was approved

The Wikimedia Foundation Board of Trustees has approved Abstract Wikipedia as a new sister project. You can read the full announcement and also join the new dedicated mailing list. I will be updating the proposal and also answer the open questions above (and the new ones that will likely come below) in the next few days. Thank you! With gratitude and excitement, --denny (talk) 19:51, 2 July 2020 (UTC)

λ.wikipedia.org? Carn (talk) 21:41, 2 July 2020 (UTC)
"Abstract Wikipedia" is not the final name. The final name is to be determined.--GZWDer (talk) 22:44, 2 July 2020 (UTC)

"programming language" for WikiLambda content

In this Github repo, you have outlined some code examples for the proposed WikiLambda project. Therein, it is also claimed that “the current implementations are strongly inspired by Grammatical Framework.” (GF, a programming language for writing grammars of natural languages, see en:Grammatical Framework). Looking at the projects listed on their website, it appears that use cases for this language have been pretty academic until now, despite the language being more than 20 years old.

As a successful WikiLambda needs a community with a lot of technical and linguistic expertise, I am wondering how much of a community there already is out there (or isn't, in other words), because a rich set of readily available resources would clearly be helpful to get a WikiLambda running. I am also interested to learn whether there are other relevant programming languages besides GF available for the proposed task, and why GF (or some derivative thereof?!) has been chosen over them for WikiLambda. —MisterSynergy (talk) 18:16, 7 May 2020 (UTC)

@MisterSynergy: oh, the implementation is inspired by GF, but it is *not* the GF language. I basically took some of the ideas from GF and reimplemented them. The actual languages in the current implementation of AbstractText are JavaScript and Python, and the proposal suggests going for JavaScript and Lua, with more to be added based on demand. The functions can then be combined, but no, I didn't explore actually using GF as an implementation language (the AceWiki project did so, but they restricted it to a sublanguage of OWL). I agree that the pool of people who already know GF would unfortunately be limited.
If the text is misleading and makes it look like the language is GF instead of being inspired by it, let me know, and I'll change it. I mention GF mainly in order to acknowledge the intellectual heritage. --denny (talk) 03:40, 8 May 2020 (UTC)
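To make the idea of combining small functions a bit more concrete, here is a minimal sketch in Python (one of the two languages of the current AbstractText implementation mentioned above); the helper names are invented purely for this illustration and are not taken from AbstractText or the proposal:

 # Illustrative sketch only: these helpers are invented for this example.
 def ucfirst(text):
     """Capitalize the first character of a string."""
     return text[:1].upper() + text[1:]

 def concatenate(*parts):
     """Join string parts with single spaces."""
     return " ".join(parts)

 def render_sentence(subject, verb, obj):
     """Compose the two small functions above into a slightly larger one."""
     return ucfirst(concatenate(subject, verb, obj)) + "."

 print(render_sentence("berlin", "is the capital of", "Germany"))
 # -> Berlin is the capital of Germany.

Each piece is a pure function of its inputs, so it can be written, tested, and reused independently of the others.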
denny, the PDF at wikilambda.org gives the impression that it will be a new programming language.
Lua and JavaScript (and wikitext!) have the advantage of being familiar to a lot of current technically-oriented wiki editors, but if a different programming language is more appropriate for building such tools than Lua, JavaScript, and wikitext, then it may be fine. I leave it to the engineers to decide whether going for a different language is worth the cost of getting community people to learn a new programming language. Before 2012, not a lot of people in the Wikimedia community knew Lua, and some learned it to write Scribunto modules, so it's conceivable that it will happen again.
As some people probably know, Scribunto can work with other languages in theory. In practice, it exclusively works with Lua, but adding JavaScript support has been requested for a long time. If the Wikilambda project creates such support, it will be another fulfillment of an old community wish. (This may be even more relevant given that JavaScript usage in the MediaWiki universe is probably going for a thorough overhaul.)
If it is done using an existing programming language like Lua or JS, it will require a thorough, featureful, and well-documented library, but I guess you know that :) --Amir E. Aharoni (talk) 08:57, 8 May 2020 (UTC)
Oh, I am surprised to hear that the paper gives the impression of creating a new programming language. I really need to improve my clarity! Section 5.3: "Implementations of functions in Wikilambda can either be native in a programming language such as JavaScript, WebAssembly, or C++, or can be composed from other functions." If you have suggestions for what could reduce that impression, that would be helpful for version 2 of the paper. --denny (talk) 17:50, 8 May 2020 (UTC)
Figures 1, 2, 3, and 6 in the PDF look a lot like code in a programming language. But it's quite possible that I misread something—I had no choice but to read it while locked at home with small children running around :) --Amir E. Aharoni (talk) 19:50, 8 May 2020 (UTC)
Ah, I see. Yeah, no, not really, at least not in the traditional sense. I should make it clear in these figures that they are just sketches. Thanks! --denny (talk) 00:56, 9 May 2020 (UTC)
Thanks denny for your answer. I think I was apparently not clear enough in my initial comment; I am referring in particular to the “Natural language generation” section of your GitHub manuscript and the functions presented therein. Initially I had the impression that you had come up with something completely new, but the reference to GF indicates that this is not (entirely) the case, though it is not entirely wrong either. Looking at the provided examples, I do not find the proposed language that actually generates the natural language output very intuitive to work with (I have to mention that I do not have much experience in this field). I also realize that the examples shown yield pretty basic SPO-like output (see the sketch after this comment): grammatically correct, but far from natural language that would be appealing to a human audience. So let me drop some more specific questions here:
  • How should this scale up to a point where an Abstract Wikipedia would eventually be useful? How many community members would be needed for the function programming (order of magnitude), and how should they learn the technical basics if this is completely new?
  • How much will these functions depend on each other? Based on the subclass mess we often see in Wikidata, and also the way Lua modules in Wikipedias are modified, there had better not be too much interdependency between different functions :-)
  • When we consider this part of WikiLambda to be “programming”, is a wiki-based approach desirable at all? This sort of version control is IMO inferior to using a Git repository, and it also makes it extremely difficult to perform proper testing.
MisterSynergy (talk) 12:25, 9 May 2020 (UTC)
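As a rough illustration of the kind of basic SPO-like output discussed in the comment above, here is a minimal Python sketch; the constructor shape and the function name are invented for this example and are not taken from the AbstractText code:

 # Minimal sketch of subject-predicate-object style output. A real
 # renderer would also need articles, agreement, lexeme lookups, etc.
 def render_instance_of(constructor):
     """Render a tiny 'instance of' constructor as an English sentence."""
     return f"{constructor['subject']} is a {constructor['class']}."

 example = {"subject": "San Francisco", "class": "city"}
 print(render_instance_of(example))
 # -> San Francisco is a city.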
@MisterSynergy: Thanks for clarifying. Those are hard questions and I don't have hard answers for them, only intuitions. Those are also important questions, so keep them coming!
The nice thing about Wikilambda is that the implementation I am suggesting for the natural language generation might be completely wrong, and yet Wikilambda and most of the other work proposed in the project would still be a useful foundation for trying a completely different approach. For example, Lsjbot and Reasonator show that there are useful results we can get with a much simpler NLG approach that is closer to mail merge (basically templates with fillers and options). And since Wikilambda is actually agnostic towards the NLG technology on top of it, we could just do that (in fact, the proposal suggests that in P2.14). So even if I got that part of the proposal entirely wrong, there will still be a lot of useful results.
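To make the "mail merge" idea above a bit more concrete, here is a minimal, hypothetical Python sketch of a template with fillers and one option; the names and the data shape are invented for this illustration and are not part of the proposal:

 # Hypothetical "mail merge" sketch: a fixed template, slots filled from
 # structured data, and one option that only applies when data is present.
 def render_city_stub(data):
     """Fill a fixed English template from a small data record."""
     sentence = f"{data['name']} is a city in {data['country']}."
     # Option: mention the population only if we actually have a value.
     if data.get("population"):
         sentence += f" It has a population of {data['population']:,}."
     return sentence

 print(render_city_stub({"name": "Ulm", "country": "Germany",
                         "population": 126000}))
 # -> Ulm is a city in Germany. It has a population of 126,000.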
The proposal is, in general, a high-risk proposal. It is much riskier than Wikidata was (which a lot of people called infeasible before we did it). That is why it is structured in a way where we have intermediate milestones, such as Wikilambda, which have value in themselves. It was similar for Wikidata (even if Wikidata had failed as a knowledge base, hey, at least we cleaned up the interwiki links mess). Wikilambda will be useful as a catalog of functions, even if Abstract Wikipedia should run into insurmountable challenges (which I don't expect - I expect we will get somewhere with the approach; it might just fall short of producing proper, substantive articles and remain at something like small stubs). And if Wikilambda should run into insurmountable challenges in creating a catalog of functions, hey, at least it will help with centralizing some of the functionality of templates and modules and make that accessible to more Wikimedia projects.
So there is some risk mitigation going on in the proposal. I have looked into a number of systems in NLG in the last few years, and there are a number of reasons why GF is currently my favorite one. But again, it doesn't have to be the one that succeeds. Time will tell. So, why GF?
One reason is that for Wikilambda I am quite intentionally choosing functions as the main building block. And one could ask, hey, why? Objects are much more prevalent among software engineers. Why choose functions? The reason is that functions are extremely local. There is no state, there are no globals; everything is in the input and all you get is the output. In between, the implementation can do whatever it wants, but we don't have much 'spooky action at a distance' going on. And that's really important if we want people from all over the world to all contribute small pieces to a large puzzle, one piece at a time, without constantly getting in each other's way. We can create the function interface, specifying the inputs and outputs; then we can independently write and agree on tests for the interface, and independently create several implementations for each interface, which can test each other and use those tests to check that they are correct, etc. And each of the functions is meant to be quite smallish, building the puzzle further and further. Which is why I hope we can avoid too much versioning mess. (The proposal suggests introducing freezing to further avoid the versioning mess.)
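A minimal sketch, assuming Python, of the "interface first, then independent tests and implementations" workflow described above; the function names and the test format are invented for this illustration:

 # Illustrative sketch: one function interface (a signature plus agreed
 # test cases), two independently written implementations, and a check
 # that runs the shared tests against each implementation.
 SHARED_TESTS = [("abc", "cba"), ("", ""), ("a", "a")]

 def reverse_string_slicing(text):
     """Implementation 1: Python slice notation."""
     return text[::-1]

 def reverse_string_loop(text):
     """Implementation 2: an explicit loop, written independently."""
     result = ""
     for character in text:
         result = character + result
     return result

 def passes_shared_tests(implementation):
     """Check one implementation against the interface's shared tests."""
     return all(implementation(given) == expected
                for given, expected in SHARED_TESTS)

 assert passes_shared_tests(reverse_string_slicing)
 assert passes_shared_tests(reverse_string_loop)

Because every implementation is a pure function of its inputs, the shared tests can be run against each one in isolation, which is exactly the locality argued for above.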
Also, functions seem, based on my experience teaching computer science, to be quite easily understood as a concept, and I have the hope that we can lure more people to Wikilambda than with any other paradigm. Furthermore, they have the advantage of being safely runnable in many different contexts, with comparatively little overhead.
And now the last piece: GF is also based on a functional approach! So we have an NLG approach that has been shown to be promising, built on a functional approach, and we have the listed advantages of using functions in Wikilambda, and there we go, the whole thing seems to fit together.
Having said all this as a preamble, let's get to your three questions:
  • How many contributors will we need? - It depends on the number of Constructors and that's a big unknown. I hope they are in the low thousands. My hope is that we can get very far with <10 contributors per language, and these don't have to be continuously active! Grammar doesn't change that much. And then some more core members who work on shared functionality, maybe another ~100? These numbers assume volunteers in their spare time.
  • Dependency between functions? - As said, the assumption is that function specifications freeze, and therefore the dependency remains controlled.
  • Worse than git? - Yes, in many ways. But git also leads to a much higher barrier to entry than I think we can afford. Due to the very local nature of functions, we can also test locally and provide the results through the browser. Put differently: we don't need to run integration tests constantly; we'll get very far with unit tests, which can be highly localized. I think it is more important to reach out to a bigger pool of developers, who won't always be able to set up a development environment and git. I think there's potential for democratizing coding quite a bit with this project. (Section 10.6 in the technical paper is on that.)
Thanks again for asking! As said, these are important questions and I hope I didn't try to weasel out of answering them. --denny (talk) 00:28, 11 May 2020 (UTC)
I would suggest that functions can be referenced by version, which could help solve the problem with dependencies. Whenever some function uses another function, you could choose to reference either a specific version of that function, or say that the reference should always point to the newest version of the function. Each logical change could be a new version of the function. So if the function lambda_to_positive_integer(lambda_add(positive_integer_to_lambda(left), positive_integer_to_lambda(right))) was changed to lambda_to_positive_integer(lambda_add(positive_integer_to_lambda(right), positive_integer_to_lambda(left))), that would automatically increase the version number by one. In wiki software today, each revision can potentially be a large change; if one version has an implementation in JavaScript and the next version changes that JavaScript substantially, that could be problematic. If the changes were made directly to the lambda and each logical change was stored, and if you could generate the JavaScript from the lambda, then you could perhaps also generate the JavaScript for each intermediate step of the change when needed, and simplify code merging. Fuelbottle (talk) 21:32, 3 July 2020 (UTC)
@Fuelbottle: See Talk:Abstract_Wikipedia#Version_system, though this is still not an idea ready for implementation.--GZWDer (talk) 02:57, 4 July 2020 (UTC)
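As a rough Python sketch of the versioned-reference suggestion above (the structure and names are invented here and are not part of any existing implementation), a caller could either pin a specific version of a function or always follow the latest one:

 # Hypothetical sketch: every saved logical change appends a new version,
 # and callers reference either a specific version or the latest one.
 FUNCTION_VERSIONS = {
     "add": [
         lambda left, right: left + right,   # version 1
         lambda left, right: right + left,   # version 2: arguments swapped
     ],
 }

 def resolve(name, version="latest"):
     """Return the requested version of a stored function."""
     versions = FUNCTION_VERSIONS[name]
     if version == "latest":
         return versions[-1]
     return versions[version - 1]

 print(resolve("add", version=1)(2, 3))  # 5, pinned to version 1
 print(resolve("add")(2, 3))             # 5, always the newest version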