Community Wishlist Survey 2021/Editing/Spellchecker

Spellchecker

  • Problem: One of the most important aspects of the copy-editing workflow is finding and fixing spelling mistakes and typos.
  • Who would benefit: Editors, who would have less frustration in their work, and readers, who would read higher-quality articles.
  • Proposed solution: There is a gadget on Persian Wikipedia, called Check Dictation, which I expect could serve as inspiration and be turned into an extension. When an editor who has enabled the gadget views an article, they see a list of mistakes at the top of the page, and the mistakes are color-coded inside the article. It uses different colors for different issues: typos, bad wikitext, informal words, links to disambig pages, and many more types. Here's an example: File:Rechtschreibung-fawiki.png. You can also define a per-article list of okay words (an example). The code for the gadget can be found here, but it's heavily hard-coded to fawiki and could be improved drastically.
  • More comments:
  • Phabricator tickets:
  • Proposer: Amir (talk) 18:31, 16 November 2020 (UTC)

Discussion

I think most operating systems and browsers support spellchecking on their side, so this is not needed in MediaWiki. --GPSLeo (talk) 18:40, 16 November 2020 (UTC)

@GPSLeo You wouldn't see the typos unless you go into edit mode. Finding them in articles while reading is not doable with browsers and operating systems. Amir (talk) 19:33, 16 November 2020 (UTC)
Ah, you want a tool to find mistakes in articles while reading, not just while editing. I did not get this. Now I understand and think this could be useful. --GPSLeo (talk) 21:23, 16 November 2020 (UTC)
The Chrome spellcheck does not work for me when editing. Keepcalmandchill (talk) 03:45, 17 November 2020 (UTC)

There is a similar user script for the MOS called en:User:Ebrahames/Advisor.js on EN.WP. I don't think I've seen a spelling gadget. I also tend to disagree that a spelling gadget is necessary. (Mis)Spellings can be context dependent. --Izno (talk) 21:55, 16 November 2020 (UTC)

@Izno The spelling gadget would just highlight potential spelling mistakes. Even in the fawiki tool, you can mark highlights as false positives on a per-article basis. Amir (talk) 03:33, 22 November 2020 (UTC)

I usually just use Grammarly to check grammar (not sponsored). Félix An (talk) 02:27, 17 November 2020 (UTC)

Would this also take regional variants of English into consideration? English Wikipedia articles can vary depending on regional relevance or on a "first come, first served" basis. Tenryuu (talk) 02:29, 17 November 2020 (UTC)

English is not the only language with spelling variances, so good question. --Izno (talk) 18:08, 17 November 2020 (UTC)

Note that Wikisource also contains various variants of a language: the language of a 100-year-old work is different from today's language, but it is also correct. There should be some project-specific spellchecker which allows local variants. JAn Dudík (talk) 14:09, 18 November 2020 (UTC)

I think the points made by other users about language variation are good, but as long as the changes are not automated and a human is always involved, that person should be able to recognize when a word was incorrectly marked as a misspelling and not act to fix it. For languages that have detailed Wiktionaries, they might be a good source to use for checking what is and isn't a recognized spelling. This orange links gadget has functionality that might also be relevant to this proposal. —The Editor's Apprentice (talk) 19:21, 20 November 2020 (UTC)

@Ladsgroup: thanks for posting this. How does the Check Dictation tool work? Does it use some open-source Persian spellchecker? Or is it handmade with a list of common misspellings? I ask because the Growth team is building "structured tasks", which use machine learning to help newcomers find specific edits to make, e.g. adding wikilinks. Here are notes from a conversation about how to make spellchecking possible across languages, and we're thinking about whether it would have to be done language by language. -- MMiller (WMF) (talk) 17:19, 23 November 2020 (UTC)

@MMiller (WMF) The code for it is w:fa:مدیاویکی:Gadget-CheckDictation.js and it seems to call a service on Cloud VPS (I didn't write this gadget, so I'm not 100% sure of its internals), but I assume it uses a Unix library for spellchecking. As I said, it has an exception list for each page as well [1]
The fun thing is that this was originally developed to find spelling mistakes, but it grew to cover basically any sort of copy-editing issue, from links to disambig pages, to unclosed links/templates, and much more. Amir (talk) 00:28, 24 November 2020 (UTC)

I would support the idea, but in the context of a typographic checker, not just a spellchecker. It would check grammar, adjectives, orthography, etc. MarioSuperstar77 (talk) 21:06, 24 November 2020 (UTC)

  • I'm merging a similar wish:
    • Problem: Arabic: Having a built-in proofreader for text, similar to what the Word program does
    • Proposer: عمر الشامي (talk) 21:09, 22 November 2020 (UTC)

SGrabarczuk (WMF) (talk) 20:25, 3 December 2020 (UTC)

A spellchecker and a grammar checker would both be useful tools. They should be separate tools. The spellchecker should have the ability to set the English variety. I have encountered many articles that use several varieties, and it would be a useful tool for editing to the desired variety. User-duck (talk) 18:32, 8 December 2020 (UTC)

English Wikipedia already has an active spellchecking project that finds spelling errors and a small number of manual of style violations in the latest database dump - see en:Wikipedia:Typo_Team/moss. We're currently doing this by making wiki pages full of lists and relying on editors to go through the lists. It's taking years to get to all the likely typos, and though we're catching up, of course more are added all the time. Any UI that increases automation of this task, either by interested volunteers working from lists or by capturing work done by folks who just happened to be reading the article, would be very helpful. We are slowly starting to advertise problematic cases to readers using tags in the articles themselves (see en:Template:Typo help inline). This sort of tag could be a hook for a little interactive UI that resolves the spelling issue into a small number of bins (add to dictionary, proper noun, change to correct spelling, unsure). Or a reader-centric spell checker could find typos on its own without help from tags. Though there's something to be said for storing "not a typo" sorts of information in the article itself, so that if a different spelling or grammar checker comes by later, we won't duplicate work. As for dialect detection...many English Wikipedia articles also have templates declaring the preferred dialect, and in some cases the category membership associates an article with a specific country, too. But even without these things in most cases I think it's pretty easy to tell which dialect a page is mostly or completely written in. Wiktionary already knows which words go with which dialect, and we can simply count up the number that are unique to one or the other. Any reader's web browser's built-in spell checker is probably going to properly handle only their own dialect, and that's too cumbersome for most readers to change. (So it's helpful to build a new system that's smart enough to deal with multiple dialects.) 
-- Beland (talk) 08:37, 12 December 2020 (UTC)
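
The counting approach to dialect detection described in the comment above could be sketched roughly as follows. This is only an illustration: the word lists are tiny hand-written stand-ins for Wiktionary data, and the names `DIALECT_WORDS` and `guess_dialect`, as well as the tie handling, are assumptions rather than part of any existing tool.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists standing in for Wiktionary dialect data.
DIALECT_WORDS = {
    "en-US": {"color", "center", "organize", "traveled", "gray"},
    "en-GB": {"colour", "centre", "organise", "travelled", "grey"},
}


def guess_dialect(text):
    """Return the dialect with the most dialect-specific words.

    Returns None when there is no evidence or the top two dialects tie.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for dialect, vocab in DIALECT_WORDS.items():
        # Count every occurrence of a word that is unique to this dialect.
        counts[dialect] = sum(1 for word in words if word in vocab)
    ranked = counts.most_common()
    top_count = ranked[0][1]
    if top_count == 0 or (len(ranked) > 1 and ranked[1][1] == top_count):
        return None
    return ranked[0][0]
```

A page with mixed usage returns whichever dialect has more hits, which matches the idea of telling which dialect a page is "mostly" written in; an explicit dialect template on the page, where present, would presumably take priority over this guess.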

Voting

  •   Support Eridian314 (talk) 18:29, 8 December 2020 (UTC)
  •   Support Per my comment, I support a typographic checker. A spellchecker is not enough when someone messes up their grammar and punctuation. MarioSuperstar77 (talk) 18:32, 8 December 2020 (UTC)
  •   Support User-duck (talk) 18:32, 8 December 2020 (UTC)
  •   Support Jax MN (talk) 18:40, 8 December 2020 (UTC)
  •   Support Armaanikaks (talk) 18:48, 8 December 2020 (UTC)
  •   Support Shoeper (talk) 18:50, 8 December 2020 (UTC)
  •   Support Movses (talk) 19:09, 8 December 2020 (UTC)
  •   Support --NGC 54 (talk / contribs) 19:19, 8 December 2020 (UTC)
  •   Support DerFussi 19:53, 8 December 2020 (UTC)
  •   Support CrystallineLeMonde (talk) 20:09, 8 December 2020 (UTC)
  •   Support It's Been Emotional (talk) 20:28, 8 December 2020 (UTC)
  •   Support Kisnaak (talk) 21:23, 8 December 2020 (UTC)
  •   Support Pmau (talk) 21:24, 8 December 2020 (UTC)
  •   Oppose Duplicates functionality that modern browsers have built in anyway. If some tweaks are needed to get the browser spellcheckers to work on Wikipedia, though, I'd support that. {{u|Sdkb}}talk 05:56, 9 December 2020 (UTC)
    @Sdkb As I said in the proposal, this is bigger than just helping while editing; it's about finding such errors before you even get to editing the articles. And it's not just spellchecking: it's also about finding and highlighting all sorts of issues, including but not limited to links to disambig pages, before editing the article. Amir (talk) 18:37, 9 December 2020 (UTC)
  •   Support Munfarid1 (talk) 09:54, 9 December 2020 (UTC)
  •   Support OrCer (talk) 10:40, 9 December 2020 (UTC)
  •   Support Magol (talk) 11:33, 9 December 2020 (UTC)
  •   Support 1Mmarek (talk) 11:52, 9 December 2020 (UTC)
  •   Support Very practicable Jjkorff (talk) 13:42, 9 December 2020 (UTC)
  •   Support Monozigote (talk) 17:26, 9 December 2020 (UTC)
  •   Support Amir (talk) 18:31, 9 December 2020 (UTC)
  •   Support Mardetanha talk 18:34, 9 December 2020 (UTC)
  •   Support // Lollipoplollipoplollipop :: talk 05:44, 10 December 2020 (UTC)
  •   Support Libcub (talk) 19:06, 10 December 2020 (UTC)
  •   Oppose I see this as having the potential to cause significant issues, specifically where languages have regional usages that differ in both spelling and grammar. The simplest illustration is en.wp, where we have us/uk/au/ca and a host of other variations, along with extensive discussions over "ize vs ise" or "Color vs Colour". As highlighted by Sdkb above, the browsers already do a good job; creating a tool is just a waste of the limited available resources. Gnangarra (talk) 03:21, 11 December 2020 (UTC)
  •   Support In Persian Wikipedia we use it and it is very good and convenient.--Cgl02 (talk) 15:51, 11 December 2020 (UTC)
  •   Support StringRay (talk) 16:22, 11 December 2020 (UTC)
  •   Oppose This proposal is currently open-ended and isn't defined/discrete enough. Its scope is sprawling, which would edge out lots of other improvements. Is this for grammar edits within the editor? Or for logged-in editors while reading? How would it need to be built to accommodate our multilingual community? I think the proof of concept would need to be built out further to understand exactly what is being implemented. I also see marginal benefit when browser-based tools already exist. czar 16:57, 11 December 2020 (UTC)
  •   Support It is appropriately open-ended. We need the basic functionality, and there is no shortage of ways to use it. DGG (talk) 01:17, 12 December 2020 (UTC)
  •   Support taking an iterative approach to reduce the enormous backlog by increasing editor productivity and crowdsourcing more work to readers. "Frequently misspelled words" lists are easier than a full-blown spell checker but would provide user feedback to refine a UI. Letting projects configure regexps to flag common style issues would be easy and generate hundreds of thousands of fixes. Flagging too much all at once might be bad, so we might have time to build the next iteration while the current round of typos are still being fixed. Per-language and per-project tuning will be needed. Building a spell checker is not hard; building an in-house grammar checker is a multi-year project that could get added in a future wishlist. It would be interesting to try leapfrogging by hooking in an off-the-shelf spelling or grammar checker, but I would strongly prefer it be open source, maybe starting with easy-to-find typos (perhaps even just "frequently misspelled words" lists?) in easy-to-check languages, refining the UI, scaling up to broader coverage, and only later tackling harder problems like grammar. -- Beland (talk) 08:54, 12 December 2020 (UTC)
  •   Support Rdyornot (talk) 22:27, 12 December 2020 (UTC)
  •   Support Edgars2007 (talk) 10:27, 13 December 2020 (UTC)
  •   Oppose This is a browser and OS feature, and is out-of-scope for MW.  — SMcCandlish ¢ >ʌⱷ҅ʌ<  06:16, 15 December 2020 (UTC)
  •   Support Vacant0 (talk) 18:39, 15 December 2020 (UTC)
  •   Support GiFontenelle (talk) 21:10, 15 December 2020 (UTC)
  •   Support Bgrus22 (talk) 22:30, 15 December 2020 (UTC)
  •   Oppose per SMcCandlish ◅ SebastianHelm (talk) 12:59, 16 December 2020 (UTC)
  •   Support Katzmann83 (talk) 14:24, 16 December 2020 (UTC)
  •   Support I am not sure about en.wiki, but in lt.wiki there is a wikify tool that checks not only spelling but also common formatting errors, such as no space after a dash, double line breaks, unclosed parentheses, wiki formatting errors, etc. This should be right next to the preview button, or even work together with it and highlight potential errors straight away. Wolfmartyn (talk) 14:29, 16 December 2020 (UTC)
  •   Support as long as it is not just using American English spellings. Charlesjsharp (talk) 16:58, 16 December 2020 (UTC)
  •   Strong oppose way out of scope, and seems at cross-purposes to its own motivation. You want people to be able to "fix problems" who can't themselves recognize those problems? You want to make things simple in an area as complex and context-driven as texts throughout history and the world? Don't try to flag things as broken to people who weren't motivated enough to use the spellcheckers they already have in their browser or platform. You're encouraging bad behaviour. Shenme (talk) 07:24, 17 December 2020 (UTC)
  •   Strong support really pertinent to Wikipedia! I've been using Grammarly for quite a while now, and of course, it came with loads of foibles and mistakes, both major and minor, that already exacerbates the current status quo of my erroneous editing. Hence, I strongly support your concept, and I hope it succeeds! JN Dela Cruz (talk) 16:36, 18 December 2020 (UTC)
  •   Oppose Grüße vom Sänger ♫(Reden) 22:18, 18 December 2020 (UTC) Every decent browser does that, no need for a duplicate.
  •   Support Neon Richards (talk) 23:11, 18 December 2020 (UTC)
  •   Neutral, but why is this a proposal on meta-wiki? This is a mega-task with 300+ different gadgets. Or is this only supposed to be for English? Will it then be dropped onto all wikis, where every word will henceforth be underlined in red? If not, do you expect Wikimedia to develop a Lingala or Cherokee spellchecker when Google doesnʼt even have it as a sorting algorithm, let alone a translate option? SMcCandlishʼs "out-of-scope" is actually an understatement: youʼre biting off more than you can chew, or even choke on... Seb az86556 (talk) 22:26, 19 December 2020 (UTC)
  •   Support must include Shagil Kannur (talk) 10:34, 20 December 2020 (UTC)
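
Two ideas from the discussion and votes above, a "frequently misspelled words" list with project-configurable regexps (Beland) and a per-article list of accepted words (as in the fawiki gadget), could be combined along the following lines. This is a minimal sketch under stated assumptions: the word list, the two style rules, and the names `COMMON_MISSPELLINGS`, `STYLE_RULES`, and `find_issues` are all hypothetical and are not taken from the Check Dictation code.

```python
import re

# Hypothetical data: a real deployment would load these per language and per project.
COMMON_MISSPELLINGS = {"teh": "the", "recieve": "receive", "occured": "occurred"}
STYLE_RULES = [
    (re.compile(r"\s+,"), "space before comma"),
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), "repeated word"),
]


def find_issues(text, page_exceptions=frozenset()):
    """Return (offset, matched_text, message) tuples sorted by offset.

    page_exceptions holds lowercase words that editors have marked as
    acceptable for this particular page (false positives).
    """
    issues = []
    for match in re.finditer(r"[A-Za-z]+", text):
        key = match.group().lower()
        if key in COMMON_MISSPELLINGS and key not in page_exceptions:
            suggestion = COMMON_MISSPELLINGS[key]
            issues.append(
                (match.start(), match.group(), f"possible typo, did you mean '{suggestion}'?")
            )
    for pattern, message in STYLE_RULES:
        for match in pattern.finditer(text):
            issues.append((match.start(), match.group(), message))
    return sorted(issues)
```

The function only reports candidates and never rewrites the text, in line with the point made in the discussion that a human should decide whether a highlighted word is really a mistake.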