Talk:Abstract Wikipedia
This page is for discussions related to the Abstract Wikipedia page.
What about implicit bias, THE TRUTH and fuzzy logic?
After reading here and there, and even on GitHub and elsewhere, I have many questions about the ontology behind this project, probably naive ones, and I am not sure if this is the right forum. Also, I tend to (ab)use hrefs, examples and metaphors, so do tell me if anything is unclear:
1. Why do you assume that TRUTH is universal?
2. What about counterexamples:
Given a constructor Superlative with the keys subject, quality, class, and location constraint, we can have the following abstract content:
Superlative(subject: Crimea, quality: large, class: peninsula, location constraint: Russia)
Superlative(subject: Taiwan, Hainan, quality: large, class: island, location constraint: China)
In Wikifunctions, we would have the following function signature:
generate text(superlative, language) : text
[...] The application of the function to the abstract content would result in the following output content:
(in English) Crimea is the largest peninsula in Russia.
(in Croatian) Krim je najveći poluotok u Rusiji.
...
(in English) Taiwan is the largest island in China, with Hainan being the second largest one.
(in French) Taïwan est la plus grande île de Chine, Hainan étant la deuxième plus grande...
or anything with "Allah", "Prophet", "Kosovo", "Palestine" in its "superlative subject" field, or other examples from the Lamest edit wars set? (A minimal code sketch of this constructor-and-renderer pair follows at the end of this comment.)
Indeed, "Romanian Wikipedia on the other hand offers several paragraphs of content about their [river] ports" but so did the Croatian one offer detailed information about their Jasenovac camp.
3. What about the resulting output being un-PC, with "GPT-3 Abstract WP making racist jokes, condoning terrorism, and accusing people of being rapists"? (Who is "a terrorist" in the first place?)
4. Has it been discussed somewhere else? Is there a FAQ for such advocati diaboli as myself?
Zezen (talk) 09:54, 14 December 2021 (UTC)
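To make challenge 2 concrete, here is a minimal runnable sketch of the Superlative constructor and an English renderer, assuming invented Python data structures (the real Wikifunctions design is not specified here):

```python
from dataclasses import dataclass

@dataclass
class Superlative:
    """Abstract content for '<subject> is the <quality>st <class> in <location>'."""
    subject: str
    quality: str
    cls: str                    # 'class' is reserved in Python, hence 'cls'
    location_constraint: str

def generate_text(s: Superlative, language: str) -> str:
    # Naive English-only renderer; a real renderer would take lexemes,
    # agreement and word order for each language from Wikidata.
    if language == "en":
        return f"{s.subject} is the {s.quality}st {s.cls} in {s.location_constraint}."
    raise NotImplementedError(f"no renderer for {language}")

print(generate_text(Superlative("Crimea", "large", "peninsula", "Russia"), "en"))
# Crimea is the largest peninsula in Russia.
```

The sketch illustrates the point of the challenge: the renderer emits whatever abstract content it is given, and nothing in the pipeline itself judges whether the claim is true or neutral.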
- The underlying question seems to be: How will the community decide on how to write content, especially for complex topics? Just like usual - via discussions, guidelines, policies, style guides, and other standard wiki processes. Quiddity (WMF) (talk) 19:00, 15 December 2021 (UTC)
- No. So mea culpa for being unclear.
- The underlying challenge (question) is the implicit abandonment of WP:PILLARS that this entails.
- I can see now that Heather Ford had mentioned parts of these Points 1 and 2 above in Wikipedia@20's chapter, "The Rise of the Underdog": To survive, Wikipedia needs to initiate a renewed campaign for the right to verifiability. ... the ways in which unattributed facts violate the principle of verifiability on which Wikipedia was founded ... [consumers] will see those facts as solid, incontrovertible truth, when in reality they may have been extracted during a process of consensus building or at the moment in which the article was vandalized... search engines and digital assistants are removing the clues that readers could use to (a) evaluate the veracity of claims and (b) take active steps to change that information through consensus...
- Add to this Quid est veritas? or these basic THE TRUTH enwiki essays, including the WP:NOTTRUTH policy itself, for a more eloquent and strategic challenge, @Quiddity (WMF) and other gentle readers.
- Should you dislike epistemology, the "rights" and anything similarly vague, abstruse, perplexing or nebulous referenced in my original comment, do answer my challenge 2:
- generate text(superlative, language) : text -> Crimea is the largest peninsula in Russia.
- TRUE or FALSE? Shall we accept Abstract Wikipedia generating it?
- If you say TRUE (or FALSE, but yes, we accept it), then move to Challenge 3 with these Wired examples... Zezen (talk) 12:18, 16 December 2021 (UTC)
- this is not a logical processing machine. this is just another human language, though an artificial one, like esperanto. you can say anything in it. so, i think, they accept it. as for challenge 3, i read about that news several months ago, and it is about ML, and this is not ML. --QDinar (talk) 17:02, 16 December 2021 (UTC)
- by "logical processing machine", i mean w:Logic programming systems, languages. as i saw, this is not positioned as one of them. it has some logic, but i have not seen intention to make it mathematically/logically accurate/perfect. otherwise there should be some theory, things like axioms, i assume. --QDinar (talk) 11:29, 20 December 2021 (UTC)
- but a question remains: traditional wikis used different texts in different languages, while this has to have one text... solutions: 1. show different points of view. 2. just do not use the abstract code in those cases; this was already suggested by user:DVrandecic (WMF). you asked where it was discussed? check out the talk archives here, if you have not seen that they exist. --QDinar (talk) 17:08, 16 December 2021 (UTC)
Test wiki not working
@Lucas Werkmeister Can't create account on test wiki. It says my username is already taken. Wargo (talk) 12:19, 28 December 2021 (UTC)
- @Wargo you shouldn’t create an account, just log in via OAuth (Special:UserLogin should redirect you to metawiki). Lucas Werkmeister (talk) 14:43, 28 December 2021 (UTC)
Recent edits
Could someone please look at the recent edits to Abstract Wikipedia/Early discussion topics and Abstract Wikipedia/Ideas? I'm inclined to say that they should be reverted because it's far too late to add more content to those pages, but would like a second opinion. * Pppery * it has begun 19:56, 4 March 2022 (UTC)
- Hi Pppery. Both those pages are still open for contributions/discussions. However I agree the latest contributions make it complicated to "mark for translation"... Does anyone have ideas on how to restructure either or both pages? (I'll continue trying to rethink them next week, along with marking up the bullet-list at the top of the first page with which items are already completed). Cheers, Quiddity (WMF) (talk) 00:06, 5 March 2022 (UTC)
- I've marked both pages for translation. I don't see any major restructuring needed. * Pppery * it has begun 15:54, 19 March 2022 (UTC)
Wikimedia Hackathon 2022
The Wikimedia Hackathon takes place from the 20th to the 22nd of May 2022. Will some people from the development team for Abstract Wikipedia and Wikifunctions attend the Hackathon this year? I think it is a good chance to talk with other people who are interested in the new Wikimedia project Wikifunctions and to get new ideas.--Hogü-456 (talk) 21:03, 20 March 2022 (UTC)
- Yes. We will have a session (currently scheduled for 14:00 UTC on the Friday, though details and timing may change), and we are working on clarifying some ideas about hacking projects (many based on suggestions/discussion in the Telegram/IRC, mailing list, and elsewhere).
- If you have any specific ideas, for either session-topics or hacking projects, please do add them here!
- More details from the team's side, soon. Quiddity (WMF) (talk) 19:57, 29 April 2022 (UTC)
Comment on Multilingual Wikipedia with proposed examples
Hello. I understand that developers are rightly very focused on the P1 part of Wikifunctions at present, but I would like to make a comment about Multilingual Wikipedia (or Abstract Wikipedia). I had the idea that Multilingual Wikipedia was a project which would be implemented on Wikifunctions, but that apart from that it was independent and in principle could have been done on another platform. Under "What is Abstract Wikipedia?" the article page states
The new wiki of functions, Wikifunctions, will develop the coding infrastructure to make this vision possible.
But a wiki cannot develop a coding infrastructure; that has to be done by people. It would be a great help if you could give your vision of who will develop the coding infrastructure and how they will come to work on the project. I think that you are implying that the WMF will be involved, and that is a useful point. From this and other Abstract Wikipedia pages I get the impression that Multilingual Wikipedia is expected to emerge automatically from volunteers who will be creating functions without any particular management. But apart from a little playing, the volunteers will only create wikifunctions if they have a motive and a specification to work from. First a framework needs to be defined, within which the Multilingual Wikipedia constructors would be developed and the requisite Wikidata data populated. I would be interested in any clues about how the process would play out.
I think there is a great need for very detailed proposals and on this page I offer two examples of how language text could be generated from abstract text, with a list of difficulties. I think that the rendering cannot be done in one pass, but instead there should be an initial pass which will create a parse tree and then a second pass to do the rendering with all the necessary information now available. I would be interested in any comments. Strobilomyces (talk) 19:45, 9 April 2022 (UTC)
- @Strobilomyces That page is a lot of excellent work. Thank you. I’m not sure that two passes will be enough, but I prefer to think in terms of an end-to-end pipeline. In any event, I think there is no consensus yet on what “abstract text” should look like. We can look at the two ends of the pipeline as something like Wikidata (input) emerging as something like a Wikipedia article (output). But somewhere in between, we have an intermediate “pumping station” called “abstract content”. (Perhaps we need more than one kind of intermediate form, but I assume not.) Personally, I believe we shall make better progress by looking at the semantics of the finished articles (and/or at language-neutral templates for classes of articles) but we shall certainly need the ability to transform a single Wikidata statement directly into a sentence in any given language. If we tackle both ends of the pipeline at the same time, we are more likely (it seems to me) to begin developing a consensus around the form and function of the intermediate abstract content. (For previous thoughts, please see Talk:Abstract Wikipedia/Archive 2#Hybrid article.) GrounderUK (talk) 10:54, 12 April 2022 (UTC)
- @GrounderUK Hello. Thank you for your comment and for the link to the archived talk page, which I was not familiar with. Allow me to reproduce this important definition of article content "layers" by User:ArthurPSmith.
What I am envisioning is probably 3 or more layers of article content: (1) Language-specific, topic-specific content as we have now in the wikipedias, (2) Language-independent topic-specific content that would be hosted in a common repository and available to all languages that have appropriate renderers for the language-independent content (abstract wikipedia), (3) Language-independent generic content that can be auto-generated from Wikidata properties and appropriate renderers (generic in the sense of determined by Wikidata instance/subclass relations, for instance). I'm not entirely sure how the existing Reasonator and Article Placeholder functionalities work, but my impression was they are sort of at this 3rd or maybe even a lower 4th level, being as fully generic as possible. That probably makes them a useful starting point or lowest-functionality-level for this at least. The question of how to mix these various pieces together, and even harder how to present a useful UI for editing them, is definitely going to be tricky! ArthurPSmith (talk) 20:38, 28 July 2020 (UTC)
- For me the main Abstract Wikipedia task is layer 2 - to find a way of generating text in many languages from a single abstract text. This is the "hard problem", and I think that the other questions raised on the earlier talk page are minor implementation details. Layer 3 functions may be useful in templates or in something like Reasonator, but they come nowhere near the ambition of Abstract Wikipedia to render multilingual text. This is illustrated by Denny's example where you cannot put into the normal Wikidata structure the assertion that Marie Curie was the only person to win Nobel prizes in two different disciplines. The abstract text, whatever form it has, has to be created by a human, doesn't it? No-one thinks that a program can automatically decide what to say about an item from the Wikidata claims, do they?
- In the detailed examples of proposed abstract text which I have seen, I think that there is a good consensus on the structure of the abstract text; that it will be like a tree of constructor calls reflecting the structure of the sentence. The actual format doesn't matter much - whether it is text in lines, JSON-like, expressed with Wikidata properties, or whatever; that is a decision which can be made later. I don't understand how the abstract text can be something intermediate - I suppose you must be thinking of something like layer 3, but I don't understand how that can generate a useful encyclopedia article. Reasonator is something different and much more like standard computation. Surely the "input end" has to be the abstract text? The normal WD claims are just database-like entries about the subject, and the information which would construct a proper Wikipedia article just isn't there. For instance a proper article would include sentences which weren't about the subject of the article. You say "we shall certainly need the ability to transform a single WD statement directly into a sentence in any given language" (by a "statement" I suppose you mean an item claim similar to the ones which are already in WD). That is an easy goal, but I don't think it gets us any nearer to solving the main problem. I would be interested in seeing an example of how that could work to generate a typical actual Wikipedia sentence.
- When I talk about multiple passes (an "expand" pass and a "render" pass), I am saying that I think there will have to be a first-pass function and a render-pass function for each constructor. You can't do it in a single pass, as necessary syntactic information might come from anywhere in the tree. I don't know how that fits in with your pipeline. (A toy sketch of the two-pass idea follows at the end of this comment.)
- I have seen elsewhere the idea that just by chipping away at the edge of the problem (for instance by advancing Wikifunctions), we can eventually converge on a solution for Abstract Wikipedia. I completely disagree; in a software project you can go on forever in that way without really getting anywhere. That may be reminiscent of the Wiki way, but it will not work for software design. Instead it is necessary to concentrate on the most difficult parts of the problem and map out robust solutions there. Strobilomyces (talk) 08:20, 13 April 2022 (UTC)
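To illustrate the two-pass idea in the comment above, here is a toy Python sketch with an invented lexicon and feature set; in the real design such features would come from Wikidata lexemes:

```python
# Toy two-pass renderer. Pass 1 ("expand") annotates the tree with
# syntactic features that may originate elsewhere in the tree; pass 2
# ("render") emits text once all agreement information is local.
# The lexicon, features, and French target are invented for the example.

LEXICON = {"island": ("île", "f"), "river": ("fleuve", "m")}  # word -> (French form, gender)
ADJECTIVES = {"big": {"m": "grand", "f": "grande"}}
ARTICLES = {"m": "le", "f": "la"}

def expand(tree: dict) -> dict:
    """Pass 1: look up the noun and copy its gender onto the phrase."""
    noun_fr, gender = LEXICON[tree["noun"]]
    return {**tree, "noun_fr": noun_fr, "gender": gender}

def render(tree: dict) -> str:
    """Pass 2: article and adjective agree with the noun's gender."""
    g = tree["gender"]
    return f"{ARTICLES[g]} {ADJECTIVES[tree['adjective']][g]} {tree['noun_fr']}"

print(render(expand({"adjective": "big", "noun": "island"})))  # la grande île
print(render(expand({"adjective": "big", "noun": "river"})))   # le grand fleuve
```

The article and adjective cannot be chosen until the noun has been looked up, which is the kind of non-local dependency that motivates a separate expand pass.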
- Yes, I agree that Arthur’s “level 2” is the “Abstract Wikipedia”, conceived of as a curated repository of “abstract content”. It is also the ‘intermediate “pumping station”’ that I referred to. Whether we can get to (useful default) abstract content directly from Wikidata using Wikifunctions is an open question, but it seems to me to be a useful aspiration, particularly for publications cited as sources in Wikipedias. I think this is a useful example because, of course, we want every Abstract Wikipedia “fact” to have a reliable source and appropriate sources may not be available in all languages. But this is not an especially hard language problem, since the Wikipedia target would (presumably) be our existing citation templates. It is nevertheless a far from trivial amount of necessary work. Well, if we can convert a populated citation template into a set of Wikidata statements and convert those statements into a default Abstract Wikipedia article, surely we can generate an Abstract Wikipedia article for any publication that has such statements in Wikidata. And if we find, for each supported language, a satisfactory representation, in encyclopaedic language, for the content of citation templates, we can automatically generate language-specific Wikipedia articles about publications more generally. This will already require some natural language features to be addressed to some extent, including genders, plurals, active and passive voices, as well as date representations and specialist vocabulary for types of author (poet, playwright, novelist etc). Without trying it, it seems likely that such fairly basic capabilities could usefully be extended to other creations and their creators. Looking more generally at infobox and other templates, it seems to me that there is a huge amount of useful work to be done, and how we prioritize that should be driven by the editors of the Wikipedias that will use the abstract content, as principal stakeholders (representing the readers of their Wikipedias). Whether we have a sufficiently robust architecture for delivering Wikifunctions solutions to these stakeholders rather depends on what their priorities turn out to be. That is not to say that we should delay decisions about the form and function of abstract content, but in making or deferring such decisions we should be very careful to reflect, with due humility, upon the needs of stakeholders who have yet to come forward and express their opinions. GrounderUK (talk) 12:19, 13 April 2022 (UTC)
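As a rough illustration of the citation pipeline described above: a hedged Python sketch mapping citation-template fields to Wikidata-style statements and then to a default English sentence (P50, P577 and P1476 are real Wikidata properties; the mapping and the renderer are invented for the example):

```python
# Sketch: citation-template fields -> Wikidata-style statements -> a
# default English sentence. Mapping and renderer are illustrative only.

FIELD_TO_PROPERTY = {"author": "P50", "title": "P1476", "year": "P577"}

def template_to_statements(fields: dict) -> dict:
    """Turn {{cite book}}-style fields into property/value pairs."""
    return {FIELD_TO_PROPERTY[k]: v for k, v in fields.items()
            if k in FIELD_TO_PROPERTY}

def statements_to_sentence(statements: dict) -> str:
    """Render a default article-opening sentence from the statements."""
    return (f"{statements['P1476']} is a {statements['P577']} book "
            f"by {statements['P50']}.")

cite = {"author": "A. Author", "title": "An Example Book", "year": "2001"}
print(statements_to_sentence(template_to_statements(cite)))
# An Example Book is a 2001 book by A. Author.
```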
- I think I understand your thinking a bit better now; you are hoping that level 3 work will grow into a level 2 system (whereas I don't think that that is possible). Certainly the new system needs to support citations, and that would be a small problem inside Abstract Wikipedia. There are already many references to journal articles and books in Wikidata, and citations could be generated from those for any language WP (but this does not include translating the titles; that would be a level 2 problem and anyway legally difficult). But I don't understand why you say "surely we can generate an Abstract Wikipedia article for any publication that has such statements in Wikidata". Many publications have WP articles about them, but only citation-style information exists in WD - an article is quite a different thing from just a citation and it is not derivable from it. Generating a respectable article is a level 2 problem and I don't think it can be solved incrementally like that. A solution for general text needs to be worked out if the project is to achieve the previously stated objectives. That is not to say that a multilingual citation solution should not be developed with Wikifunctions, but it is not Abstract Wikipedia. It may touch on some natural language features, but only in an easily-manageable way.
- For me much of this page is very useful for knowing what the objective of Abstract Wikipedia is. I agree that the stakeholders must be consulted, and it may well turn out that Abstract Wikipedia is impractical and it would be better to just settle for a few level 3 systems (like a system for citations and templates without attempting to generate general text). Surely we should try to find the answer to this sooner rather than later? For the stakeholders to express their opinions I think they urgently need detailed proposals. Denny has described how the system should work in several places, such as here, but as I try to demonstrate in my proposal page, when you look into the detail it is terribly complicated. Strobilomyces (talk) 17:15, 13 April 2022 (UTC)
- Well, yes, it is terribly complicated! That is precisely why I do not see a “solution for general text” as a starting point. Even as a final destination, it seems too much to expect. I expect we shall always support some languages better than others and be better able to communicate some kinds of information. We need good solutions for selecting, structuring and supplementing information in a fairly language-neutral way. We also need to understand how we can deal with different cultural and editorial norms. For some languages (most, indeed all, initially), we will not have “respectable” articles for most subjects. Opinions will differ on whether we should focus on delivering more content or improving the quality of whatever happens to be available. Much as I look forward to seeing high quality natural language representations of some encyclopaedic content in some languages, I rather suspect that more (and more intelligible) content for more people will become our primary goal, with more and better natural language representations being seen as neither the speediest nor the most effective way to achieve that goal. Nevertheless, I support more and better natural language representations (more of them in more languages). And I believe that Wikifunctions can deliver these, if people are willing to contribute to that endeavour. Personally, I am comfortable with the current level of architecture definition but I am happy to collaborate on a further iteration. Watch out for me on your user page, @Strobilomyces! GrounderUK (talk) 13:07, 16 April 2022 (UTC)
- @GrounderUK Your view is very interesting and I believe that it is shared by some in the WMF. But it seems to me different from the vision expressed by User:DVrandecic (WMF), which would need a solution for general text. I suppose, then, that the users of Abstract WP would only see certain basic types of sentence. Anyway, it would certainly be interesting to see a detailed proposal.
- You say "I am comfortable with the current level of architecture definition" - do you mean you are confortable with the page Abstract_Wikipedia/Architecture? Or some other architecture page? Strobilomyces (talk) 19:45, 20 April 2022 (UTC)
How to make functions popular
I read the status update and it raises an important question. From my point of view functions are interesting, and when I tell people what Wikifunctions is about, I say that it is a collection of rules that are used to generate texts in different languages from an abstract notation. The collection also includes other functions that can be used outside of that for other topics. The example of computing the volume of a pyramid is an interesting description and helps to understand it better (a sketch follows below). I think it is possible to find a sentence and include the computations based on that sentence. I think it is important to explain what the functions for generating text are needed for, and what can be done based on decision tables and the predefinition of the words and the forms of the words in a database. This is something where I am sometimes not sure if it is correct when I say that the functions for generating the text are located in Wikifunctions. At the beginning, I think, more detailed information is more interesting from a technical point of view. In Germany there are some podcasts related to the Chaos Computer Club; it could be interesting to talk to someone there and ask if they are interested in doing an episode about Wikifunctions. As far as I know, Wikipedia was also promoted at an early phase at the Chaos Communication Congress.--Hogü-456 (talk) 20:55, 9 May 2022 (UTC)
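Taking up the pyramid example mentioned above, such a function could look like the following minimal Python sketch (the name and signature are illustrative, not an existing Wikifunctions function):

```python
def pyramid_volume(base_area: float, height: float) -> float:
    """Volume of a pyramid: one third of the base area times the height."""
    return base_area * height / 3.0

print(pyramid_volume(9.0, 4.0))  # 12.0
```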
Wikimania Hackathon 2022
Will the development team of the Wikimedia Foundation for Wikifunctions attend Wikimania, or the Hackathon that happens during Wikimania 2022? I am interested in meeting some of the people who work in that team at the Wikimedia Foundation. During the Hackathon my plan is to work further on the conversion of spreadsheet functions into code. I am interested in creating graphical user interfaces for programs in the programming language R, and in finding ways to create web interfaces and connect them with the program in the programming language of choice in the background. Currently I can write programs and can create user interfaces based on web pages, but I do not know how to connect these two things. Do you have experience with graphical user interfaces for the programming language R, or with how to create a web application that transfers data to the server, where the data is processed using the programming language of choice and then delivered back? From my point of view, enabling people to create user interfaces and connect them with the code in the background is a big challenge, and something that is important for me and useful for Wikifunctions and its users. Hogü-456 (talk) 19:37, 20 July 2022 (UTC)
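One common pattern for the round trip described above (a web page posts data to a server, a background program processes it, and the result is sent back) is sketched below; Python is used for consistency with the other examples on this page, and calling the real Rscript command-line tool stands in for a fuller R integration:

```python
# Minimal sketch: an HTTP endpoint that forwards posted numbers to R
# (via the Rscript CLI) and returns the result. Error handling omitted.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        numbers = json.loads(body)                     # e.g. [1, 2, 3]
        # Hand the data to R and read back the printed result.
        r_code = f"cat(mean(c({','.join(map(str, numbers))})))"
        result = subprocess.run(["Rscript", "-e", r_code],
                                capture_output=True, text=True).stdout
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"mean": float(result)}).encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()
```

A web page would then send its form data to this endpoint (for example with JavaScript's fetch()) and display the JSON reply.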
- @Hogü-456 Yes! We just had our session proposal accepted. There will also be another related session by Mahir256. Some of the developers will also be around during the Hackathon (depending on timezones, of course).
- I'm not sure who is familiar with R, but I'd suggest writing some specific details about your plan/idea/questions for further discussion and detailing (perhaps in a user-subpage to start with, which could then be transferred to phabricator once it's clearer). Hope that helps. Quiddity (WMF) (talk) 21:33, 22 July 2022 (UTC)
FAQ
Abstract Wikipedia/FAQ seems to be outdated: We plan to have this discussion in early 2022. --Ameisenigel (talk) 19:04, 21 August 2022 (UTC)
- @Ameisenigel Thanks, I've changed that item to "late 2022" for now, and will take a closer look at the rest of the page later. Quiddity (WMF) (talk) 18:26, 22 August 2022 (UTC)
- Thanks! --Ameisenigel (talk) 18:36, 22 August 2022 (UTC)
When will the Code of Conduct be drafted?
I noticed that there is no Code of Conduct yet for Wikifunctions. However, the beta is already out, and editors are starting to come in. Could somebody give details on what a Code of Conduct could look like, and when it would be released? 2601:647:5800:1A1F:CCA8:DCA6:63BA:A30A 01:00, 2 September 2022 (UTC)
- There will be discussion about this before launch. We are already planning for it. -- DVrandecic (WMF) (talk) 19:42, 24 October 2022 (UTC)
- Please see Abstract Wikipedia/Updates/2022-11-17 for a newsletter post on this topic, and request for input and ideas. Thanks! Quiddity (WMF) (talk) 02:35, 18 November 2022 (UTC)
Translation accuracy?
"In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way. A particular language Wikipedia can translate this language-independent article into its language. Code does the translation" -> this sounds like machine translation to me. How do we make sure that the translation is 100% accurate? It's impossible for the machine translation to be always correct. X -> machine translation -> Y. X & Y are 2 different languages. Depending on which languages they are, the accuracy could be as low as 50%. Nguyentrongphu (talk) 00:59, 12 November 2022 (UTC)
- Hi @Nguyentrongphu. There are many slow ongoing discussions about how the system could/should work. In a nutshell, it will not be using a plain machine-translation system; instead, there will be some kind of system(s) for editors to write "abstract sentences", that use (rely upon) the structured data in Wikidata's Lexemes and Items, to create properly localized sentences. A recent overview of a few aspects, including comparisons to some existing Wikimedia tools, is in Abstract Wikipedia/Updates/2022-06-07. Following the links from that page will lead to many more details and discussions. I hope that helps! Quiddity (WMF) (talk) 19:46, 14 November 2022 (UTC)
- The first approach is basically automatic translation using Wikidata items. The end results are almost identical to typical machine translation.
- The second approach looks to me like a version of machine translation with some tweaks: machine translation + some tweaking by humans + a lot of sentence simplification. Even then, it's still flawed in some ways. If translation could be done automatically and correctly, the world wouldn't need translators or interpreters anymore. The human tweaking process is labor-intensive, though. Based on what I read, it's done manually sentence by sentence. Instead of tweaking the function, one could use that time to translate the article manually (probably faster), and the article would sound more natural (grammatically) and be more correct (quality of translation). Sadly, I don't see any utility in this approach unless AI (artificial intelligence) becomes much more advanced in the future (20 more years, perhaps?).
- If understanding the gist of an article is all one needs, Google translation is doing just fine (for that purpose), with 6.5 million articles available in English Wikipedia to "read" in any language. If articles were 100% machine-translated to another Wikipedia language, they would be deleted. Nguyentrongphu (talk) 23:52, 14 November 2022 (UTC)
- Re: Google/machine translation - Unfortunately, those systems only work for some languages (whichever ones a company decides are important enough, versus the 300+ that are supported in Wikidata/Wikimedia), and as you noted above, the results are very inconsistent, with some results being incomprehensible. We can't/won't/don't want to use machine translation, for all the reasons you've described, and more.
- Re: "one can just use that time to just translate the article manually" - One benefit of the human-tweaked template-style abstract-sentences, and one reason why it is best if they are simple sentences, is that they can then potentially be re-used in many articles/ways.
- E.g. Instead of having a bot that creates thousands of stub articles about [species / villages / asteroids / etc], as occurred at some wikis in years past (e.g. one example of many) (and some of which have since been mass-deleted, partially because they were falling badly out of date), we can instead have basic info automatically available and updated from a coordinated place (like some projects do with Wikidata-powered Infoboxes). And instead of having to constantly check whether a new fact exists for each of those thousands of articles in hundreds of languages (such as a newer population count for a village, or an endangered classification for a species), it could be "shown if available".
- As an over-simplified example: An article-stub about a species of animal could start with just the common-name and scientific-name (if that is all that is available). But then it could automatically add a (human-tweaked/maintained) sentence about "parent taxon", or "distribution", or "wingspan" or "average lifespan" when that info is added to Wikidata for that species. Or even automatically add a "distribution map" to the article, if that information becomes available (e.g. d:Q2636280#P8485) and if the community decides to set it up that way. (A toy code sketch of this "shown if available" assembly appears after this comment.)
- I.e. the system can multiply the usefulness of a single-sentence (to potentially be used within many articles in a language), and also multiply the usefulness of individual facts in Wikidata (to many languages).
- It also provides a starting point for a manually-made local article, and so helps to overcome the "fear of a blank page" that many new and semi-experienced editors have (similarly to the way that ArticlePlaceholder is intended to work, e.g. nn:Special:AboutTopic/Q845189). I.e. Abstract Wikipedia content is not intended as the final perfect state for an article, but rather to help fill in the massive gaps of "no article at all" until some people decide to write detailed custom information in their own language.
- If you're interested in more technical details (linguistic and programming), you might like to see Abstract Wikipedia/Updates/2021-09-03 and Abstract Wikipedia/Updates/2022-08-19.
- I hope that helps, and apologize for the length! (It's always difficult to balance/guess at everyone's different desires for conciseness vs detail). Quiddity (WMF) (talk) 03:51, 15 November 2022 (UTC)
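To make the "shown if available" idea above concrete, here is a toy Python sketch in which a stub is assembled from whichever facts happen to exist, so the article grows as Wikidata is filled in (the sentence templates and the wingspan numbers are invented for the example):

```python
# Sketch: assemble a species stub from optional facts. A sentence is
# emitted only if all of its facts are available, so articles improve
# automatically as Wikidata is filled in. Templates are illustrative.

SENTENCES = [
    (("taxon_name",), "{taxon_name} is a species of moth."),
    (("parent_taxon",), "It belongs to the family {parent_taxon}."),
    (("wingspan_min", "wingspan_max"),
     "It has a wingspan of {wingspan_min} to {wingspan_max} mm."),
]

def build_stub(facts: dict) -> str:
    parts = [template.format(**facts)
             for required, template in SENTENCES
             if all(key in facts for key in required)]  # "shown if available"
    return " ".join(parts)

print(build_stub({"taxon_name": "Abablemma bilineata"}))
# With two more facts in Wikidata, the same article gains a sentence
# (the numbers here are illustrative, not real data):
print(build_stub({"taxon_name": "Abablemma bilineata",
                  "wingspan_min": 15, "wingspan_max": 18}))
```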
- Thank you! I like your very detailed answer. I think I understand everything now. Abstract Wikipedia is basically an enhanced version of machine translation (plus human tweaking) with the ultimate goal of creating millions of stubs in less developed Wikipedias. While it certainly has its own merits, I'm not so sure the benefits outweigh the cost (a lot of money + years of effort invested into it). First, good quality articles can't be composed of just simple sentences. Second, creating millions of stubs is a good seeding event, but bots can do the job just fine (admittedly, one has to check for new information once in a while; once every 5 years is fine). Plus, machine translation can also be fine-tuned to focus on creating comprehensible stubs, and that has been done already. Third, it's true that Google translation does not include all languages, but it covers enough to serve 99.99% of the world population. Fourth, any information one can gain from a stub, one can also get from reading a Google translation of English Wikipedia. Stubs are not useful except as a seeding event. Again, that job has been done by bots for many Wikipedias for more than 10 years. Sadly, with the current utility of Abstract Wikipedia, one can't help but feel that this is a wasteful venture. Money and effort could be better spent elsewhere to get us closer to "the sum of all human knowledge". I don't know the solution myself, but this is unlikely to be the solution we've been looking for. Nguyentrongphu (talk) 22:30, 17 November 2022 (UTC)
- @Nguyentrongphu Thanks, I'm glad the details were appreciated! A few responses/clarifications:
- Re: stubs - The abstract articles will be able to go far beyond stubs. Long and highly detailed articles could be created, with enough sentences. And then when someone adds a new abstract-sentence to an abstract-article, it will immediately be available in all the languages if/when they have localized the elements in that sentence's structure. -- I.e. Following on from my example above: Most species stubs start off with absolutely minimal info (e.g. w:Abablemma bilineata), but if there was an abstracted sentence for "The [animal] has a wingspan of [x] to [y] mm." (taken from w:Aglais_io#Characteristics), they could then add it to the Abstract Wikipedia article for "Abablemma bilineata", and the numerical facts into Wikidata (via d:Property:P2050), and suddenly the articles in hundreds of languages are improved at once! (A toy sketch of this fan-out appears after this comment.)
- Re: bots - Bots are good at simple page creations, or adding content to rigid structures, but not so good at updating existing pages with new details in specific ways, because us messy and inconsistent humans have often edited the pages to change things around.
- Re: machine-translation and Enwiki - The problems with that include that they don't help spread the local knowledge that is hidden away in the other Wikipedias, which don't have machine-translation support. It also excludes monolingual speakers from contributing to a shared resource. And they have to know the English (etc) name for a thing in order to even find the English (etc) article. -- E.g. the article on a village, or on a cultural tradition, or locally notable person, might be very detailed in the local language version, but still remain a stub or non-existent at most/all other Wikipedias for many more decades, with our current system. See for example this image.
- Re: "good quality articles can't be composed of just simple sentences" - I agree it probably won't be "brilliant prose" (as Enwiki used to refer to the Featured Article system (w:WP:BrilliantProse)), but simple sentences can still contain any information, and that is vastly better than nothing.
- I hope that helps to expand how you see it all, and resolve at least some of your concerns. :) Quiddity (WMF) (talk) 00:00, 18 November 2022 (UTC)
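To picture the fan-out described in the reply above (one new abstract sentence, localized once per language, improving many articles at once), here is a toy sketch; the per-language patterns are hand-written stand-ins for real lexeme-driven renderers, and the numbers are illustrative:

```python
# Sketch: one abstract "wingspan" sentence, rendered into every language
# that has localized its pattern. Patterns are naive stand-ins for
# lexeme-based renderers (note that even French elision, de -> d',
# already breaks such a simple pattern in the general case).

WINGSPAN_PATTERNS = {
    "en": "The {animal} has a wingspan of {lo} to {hi} mm.",
    "fr": "L'envergure de {animal} est de {lo} à {hi} mm.",
    "de": "Die Flügelspannweite von {animal} beträgt {lo} bis {hi} mm.",
}

def render_wingspan(animal: str, lo: int, hi: int) -> dict:
    """One abstract sentence fans out to all localized languages."""
    return {lang: pattern.format(animal=animal, lo=lo, hi=hi)
            for lang, pattern in WINGSPAN_PATTERNS.items()}

for lang, sentence in render_wingspan("Aglais io", 50, 55).items():
    print(lang, sentence)
```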
- "It also provides a starting point for a manually-made local article, and so helps to overcome the 'fear of a blank page'" + "far beyond stubs" -> you're contradicting yourself. It can't be that far beyond stubs.
- Abstract sentences can only work if all the articles involved share a similar basic structure: for example, species, villages, asteroids, etc. All species share some basic information structure, but things quickly diverge afterward (after the introduction). With this constraint in mind, it's impossible to go far beyond stubs (introduction level at best). It sounds like wishful thinking to me, which is not practical.
- "Because us messy and inconsistent humans have often edited the pages to change things around" -> Abstract Wikipedia will also face this problem too. "Adding a new abstract-sentence to an abstract-article" -> what if someone has already added that manually or changed an article in some ways beforehand? It's impossible for machine to detect whether or not an abstract sentence (or a sentence with similar meaning) has been added since there are infinite different ways that someone else may have already changed an article. Plus, the adding location is also something of concern. If an article has been changed in some ways beforehand, how does machine know where to add? Adding in randomly will make the article incoherent.
- Far beyond stubs + the fact that Abstract Wikipedia is only possible with simple sentences -> it sounds like Abstract Wikipedia is trying to create a Simple French Wikipedia, a Simple German Wikipedia, a Simple Chinese Wikipedia, etc. (similar to Simple English Wikipedia). This is a bad idea. Nobody cares about Simple English Wikipedia; non-native English speakers don't even bother with it. This is an encyclopedia, not Dr Seuss books.
- "And they have to know the English (etc) name for a thing in order to even find the English (etc) article" -> Google translation does come in handy in these situations (help them find out the English name). Again, Google translation supports enough languages to serve 99.99% (estimation) of the world population.
- "The problems with that include that they don't help spread the local knowledge that is hidden away in the other Wikipedias" -> we need more man power for this huge goal and task. Abstract Wikipedia is unlikely to solve this problem. Local knowledge is unlikely to fit the criteria to utilize abstract sentences. Local knowledge is not simply a species, village, asteroid or etc.
- "The article on a village, or on a cultural tradition, or locally notable person, might be very detailed in the local language version, but still remain a stub or non-existent at most/all other Wikipedias for many more decades" -> Google translation works so far. Local language version -> Google translation -> translate to a reader's native language. That's good enough to get the gist of an article.
- I'm not talking about Featured Article system. I'm talking about this. It's impossible to even reach this level with Abstract Wikipedia. We need a human to do the work to actually achieve Good Article level.
- "It also excludes monolingual speakers from contributing to a shared resource" -> this shared resource is heavily constrained by abstract sentences. The criteria to utilize abstract sentences is also quite limited. Also, each Wikipedia language needs someone (or some people) to maintain abstract sentences. Plus, building and maintaining abstract sentences requires a very intensive process (manually translating it is easier, much more efficient and sound better instead of just simple sentences). It won't make a big impact as one would hope for, not any more impact than articles created by bots. Even today, many Wikipedias still retain millions of articles created by bots as seeding events.
- Abstract Wikipedia is useful only for the languages that are not supported by Google translation. Spending too much money, time and effort to serve 0.01% of the world population is not a good idea. I'm not saying to ignore them altogether, but this is not a good, efficient solution. It ultimately comes down to the benefit vs cost analysis that I mentioned earlier. There is no easy solution, but we (humanity) need to discuss a lot more, and more thoroughly, to move forward.
- P.S.: this is a good scholarly debate, which is very stimulating and interesting! I like it! Nguyentrongphu (talk) 23:45, 18 November 2022 (UTC)