Community Wishlist Survey 2021/Admins and patrollers/Improve anti spam mechanisms

Improve anti spam mechanisms

Problem: "Wikimedia's captchas are fundamentally broken: they keep users away but allow robots in" (T158909). This was sadly true in 2017 and it remains true in 2020 (T241921). While a proposal to enable better ones exists (T141490), its implementation is being delayed due to a lack of testing/metrics. Year after year, stewards and other volunteers spend most of their time blocking spambots and cleaning up after them. Every month, stewards and administrators have to manually block and clean up after thousands of spambots that should not have been allowed to register in the first place. Moreover, this abusive spambot registration occurs mostly on small and scarcely watched wikis. While the global SpamBlacklist and AbuseFilter are enormously helpful when it comes to preventing spam edits, we could do better and prevent spambots from registering at all. We need a long-term strategy that spares volunteers from this continuous hindrance. Existing proposals (in addition to those mentioned above): Revamp anti-spamming strategies and improve UX (2015), Automatically detect spambot registration using machine learning (2017) (#aicaptcha), enable MediaWiki extension StopForumSpam (Phabricator workboard · Beta Cluster deployment request (2017)). CheckUser shows that most spambots we detect register and edit using IPs or ranges blacklisted by one or more anti-spam sites such as StopForumSpam and analogous DNSBL sites. Filtering out traffic originating from those would also help address this.
Who would benefit: All users.
Proposed solution: I guess it depends on how Community Tech would like to address this issue. My informal proposal (which may not be the path that the developers have in mind) would be as follows: (a) short term: Deploy improved FancyCaptcha, (b) medium term: enable MediaWiki extension StopForumSpam (passive mode: do not send data about our users, just receive the data they have about toxic IPs/networks), and (c) long term: AICaptcha.
More comments: —
Phabricator tickets: See above, but T125132 contains an accurate summary of the issue. Of interest: T125132 (restricted), T212667, T230304 (restricted).
Proposer: —MarcoAurelio (talk) 19:05, 16 November 2020 (UTC)[reply]
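
For readers unfamiliar with how a DNSBL check works, here is a minimal sketch in Python of the general lookup convention mentioned in the problem statement: the IPv4 octets are reversed, prepended to the blocklist's zone, and looked up in DNS; listed addresses conventionally resolve to an address in 127.0.0.0/8. This is not how the StopForumSpam extension is implemented. The zone `dnsbl.example.org` and the function names are hypothetical, and the sketch fails open (a lookup error counts as "not listed"), which is only one possible policy.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(
    ip: str,
    zone: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the blocklist zone has an entry for this IP.

    `resolve` is injectable so the logic can be tested without network access.
    Any lookup failure is treated as "not listed" (fail open).
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False
    # Listed entries conventionally resolve inside 127.0.0.0/8.
    return answer.startswith("127.")


if __name__ == "__main__":
    # Hypothetical zone; substitute a blocklist you are permitted to query.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

Because only the queried address leaves the site, a lookup like this fits the "passive mode" idea in the proposal: nothing about the user beyond the IP being checked is sent to the list operator.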

Discussion

I believe the Captcha work is already underway at Phab:T250227. Samwalton9 (talk) 00:01, 17 November 2020 (UTC)[reply]
hCaptcha might be another possibility, but I am not sure how many people would agree to use a third-party system. In any case, if it is decided that hCaptcha is the way to go, Community Tech could still get involved. —MarcoAurelio (talk) 18:49, 17 November 2020 (UTC)[reply]
I would probably not use Google's reCAPTCHA and instead use a simple in-house developed "type the letters you see" captcha. If the network the user is on blocks Google but not Wikipedia, they may not be able to edit. Félix An (talk) 02:29, 17 November 2020 (UTC)[reply]
Indeed. I am not proposing to use reCAPTCHA or other third-party system. User privacy is important to me. —MarcoAurelio (talk) 18:49, 17 November 2020 (UTC)[reply]
Query: how will this be usable by those who do not have "latinized" keyboards? (Examples: users from many Asian and Middle Eastern countries) Not inherently opposed, just not sure how this will work when many of the projects that would benefit the most are languages for which "latin" letters are not standard. I can see how producing a captcha of some sort that uses the same alphabet as the project may reduce spam, which is often in a different language than the project. Risker (talk) 18:46, 20 November 2020 (UTC)[reply]
I guess this needs to be analyzed so that we arrive at a solution that is as inclusive as possible regardless of cultural background. Instead of typing random words, as currently happens, users could be offered a mosaic of pictures from Commons and asked to click on the ones that are cats/dogs/cars/rivers/etc., for example? Or maybe solve an easy math question (e.g. How much is 3 + 7?). I feel that, ideally, the solution would be the AICaptcha work started some time ago, where the system is able to identify and exclude non-humans from registering without the need for captchas. That, I guess, can take some time; but maybe we can take this opportunity to restart that work and end this situation of cross-wiki volunteers having to deal with hundreds of spambots every day. I think that not doing anything about this is no longer an option. —MarcoAurelio (talk) 19:00, 28 November 2020 (UTC)[reply]
Please, be very careful. For me as a user, captchas are increasingly annoying. A normal user, which includes one not using Windows, should NEVER see a captcha. --Shoeper (talk) 15:02, 23 November 2020 (UTC)[reply]
- I may be missing something here, Shoeper, but I don't see any way that you could never see a captcha unless you were being tracked across the internet. If you come to a new site which does not have any information about you, and they need to be sure you aren't a bot, a captcha is how they do it. Natural-language questions are also useful, but have to be changed regularly and are hard to make non-culturally-biassed. I'd suggest that offering the user some simple editing tasks, suggested-edits style, might be an effective way to test. Pre-Google, reCAPTCHA asked users to digitize a couple of words from scanned public-domain books, using overlap between users to validate. I understand this is too easy for modern bots anyway, and Google now asks everyone to train their proprietary driverless car algorithms (using the same consensus-of-humans method to verify). But we have no shortage of bot-undoable tasks on the wikis. HLHJ (talk) 01:22, 24 November 2020 (UTC)[reply]
  - (People could be used to improve Wikidata, but it would probably require the user to research something, leading to a bad user experience.) The increasing use of captchas on the internet is worrying. Non-technical users and women are underrepresented on Wikipedia. I fear adding a captcha is going to worsen that situation further. But tbh, although I'd like to improve Wikipedia and try it from time to time, it never was a real pleasure. Looks like it is getting even worse.--Shoeper (talk) 18:13, 28 November 2020 (UTC)[reply]
    - Wikimedia already uses Captchas, but they're broken so bots easily get in, and some people struggle with them. Real people being blocked by broken captchas is certainly a concern for me and I'd like to find a solution that is both effective and inclusive. I think AICaptcha is the solution to this as it'd use no captcha at all. In the meanwhile if you are having problems with the Wikimedia captchas, you can ask for a global captcha-exemption permission. See details at this page. —MarcoAurelio (talk) 19:00, 28 November 2020 (UTC)[reply]
    - Basically the choice is between an unlimited/unrestricted flow of rubbish coming in (which is driving away regulars due to the amount of work), restricting everything that looks like rubbish (which stops spambots but also shuts out new genuine editors), or a 'click here' OK-box which is a nuisance (an extra click, though most people won't be bothered too much) but still basically lets an unlimited/unrestricted flow of rubbish in. A captcha is a middle path: it is (when there is a good captcha) rather restrictive for spambots (except for the really intelligent ones, which cost the spammer money), and a nuisance for genuine editors (I myself do not care about the occasional captcha, though it should be reasonable; I agree that some (new) editors will be genuinely annoyed, but far fewer than if you fully restrict or make people click away the OK-box every single time). --Dirk Beetstra ^{T C} (en: U, T) 10:47, 30 November 2020 (UTC)[reply]
It is important to note that this is, probably, not a matter of shifting a tradeoff between being better at keeping bots out and being better at allowing humans in - it is very likely that both can be improved at the same time. The core capability that we are missing here is some kind of analytics to evaluate captcha changes - there are easy options to tweak the captcha algorithm in a way that probably improves all the parameters, but we cannot actually measure those parameters currently, so we'd have to fly blind. That has kept those changes from being made for a long time. --Tgr (talk) 03:28, 13 December 2020 (UTC)[reply]
The only concern I have here is accessibility for blind users. As if Google's reCAPTCHA was blind-friendly in the first place anyway (their audio feature is broken)... But I wonder how the three suggested implementations would handle that. Pandakekok9 (talk) 03:04, 15 December 2020 (UTC)[reply]
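
As an illustration of the analytics gap Tgr describes above, here is a small Python sketch of the kind of measurement that would be needed: for each captcha variant, the share of human attempts that pass and the share of bot attempts that pass. It is only a sketch under stated assumptions. The `Attempt` record, the variant names, and especially the `is_bot` ground-truth label are hypothetical; no such labelled data is described on this page, and obtaining it is the hard part.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Attempt:
    variant: str   # hypothetical captcha variant name, e.g. "old" or "new"
    is_bot: bool   # ASSUMED ground-truth label; not available on this page
    passed: bool   # whether the captcha was solved


@dataclass(frozen=True)
class Rates:
    human_pass_rate: Optional[float]  # higher is better
    bot_pass_rate: Optional[float]    # lower is better


def pass_rates(attempts: Iterable[Attempt]) -> Dict[str, Rates]:
    """Compute per-variant pass rates for humans and bots separately."""
    # counts[variant][is_bot] = [passed, total]
    counts = defaultdict(lambda: {False: [0, 0], True: [0, 0]})
    for attempt in attempts:
        bucket = counts[attempt.variant][attempt.is_bot]
        bucket[0] += int(attempt.passed)
        bucket[1] += 1

    def rate(bucket):
        passed, total = bucket
        # None means "no data", which is different from a 0% pass rate.
        return passed / total if total else None

    return {
        variant: Rates(rate(by_label[False]), rate(by_label[True]))
        for variant, by_label in counts.items()
    }
```

A change that raises `human_pass_rate` while lowering `bot_pass_rate` improves both sides at once, which is the situation Tgr suggests is likely achievable once it can be measured.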

Voting

Support Would be useful to have DannyS712 (talk) 18:01, 8 December 2020 (UTC)[reply]
Support Camouflaged Mirage (talk) 18:13, 8 December 2020 (UTC)[reply]
Support 𝐖𝐢𝐤𝐢𝐁𝐚𝐲𝐞𝐫 👤💬 18:14, 8 December 2020 (UTC)[reply]
Support Stryn (talk) 18:21, 8 December 2020 (UTC)[reply]
Support Waddie96 (talk) 18:27, 8 December 2020 (UTC)[reply]
Support Sgd. —Hasley 18:33, 8 December 2020 (UTC)[reply]
Support Rs chen 7754 19:05, 8 December 2020 (UTC)[reply]
Support Count Count (talk) 19:08, 8 December 2020 (UTC)[reply]
Support I support Community Tech focusing on anti-spam measures マイキ (talk) 19:14, 8 December 2020 (UTC)[reply]
Support ExtremPilotHD (talk) 19:34, 8 December 2020 (UTC)[reply]
Support ToBeFree (talk) 20:21, 8 December 2020 (UTC)[reply]
Support CrystallineLeMonde (talk) 20:27, 8 December 2020 (UTC)[reply]
Support Silver hr (talk) 20:34, 8 December 2020 (UTC)[reply]
Support MarioSuperstar77 (talk) 21:13, 8 December 2020 (UTC)[reply]
Support Martin Urbanec (talk) 22:14, 8 December 2020 (UTC)[reply]
Support Improving this is mandatory. Not sure about the best technical implementation. Dagelf (talk) 08:08, 9 December 2020 (UTC)[reply]
Support Ellif (talk) 08:51, 9 December 2020 (UTC)[reply]
Support Thomas Kinz (talk) 09:03, 9 December 2020 (UTC)[reply]
Support Wiki13 (talk) 10:08, 9 December 2020 (UTC)[reply]
Support Matěj Suchánek (talk) 11:15, 9 December 2020 (UTC)[reply]
Support Sakretsu (炸裂) 11:31, 9 December 2020 (UTC)[reply]
Support This seems needed on smaller wikis and wikis with fewer active editors, in order to preserve existing content on these projects and avoiding damaging their reputation. — Bilorv (talk) 11:34, 9 December 2020 (UTC)[reply]
Support ‐‐1997kB (talk) 12:40, 9 December 2020 (UTC)[reply]
Support Em-mustapha ^{User | talk} 15:08, 9 December 2020 (UTC)[reply]
Support Rafael ^{(stanglavine) msg} 18:34, 9 December 2020 (UTC)[reply]
Support — WinnerWolf99 ^talk_{What did I break now?} 20:12, 9 December 2020 (UTC)[reply]
Support Shev123 (talk) 22:22, 9 December 2020 (UTC)[reply]
Support GeneralNotability (talk) 23:34, 9 December 2020 (UTC)[reply]
Support Rainald62 (talk) 00:19, 10 December 2020 (UTC)[reply]
Support - Darwin ^Ahoy! 01:48, 10 December 2020 (UTC)[reply]
Support JPxG (talk) 06:06, 10 December 2020 (UTC)[reply]
Support - yona B. (D) 07:01, 10 December 2020 (UTC)[reply]
Support. Meiræ 15:14, 10 December 2020 (UTC)[reply]
Support Libcub (talk) 17:53, 10 December 2020 (UTC)[reply]
Support MER-C (talk) 18:36, 10 December 2020 (UTC)[reply]
Support Hiàn (talk) 18:58, 10 December 2020 (UTC)[reply]
Support Wutsje (talk) 04:26, 11 December 2020 (UTC)[reply]
Support BoldLuis (talk) 10:34, 11 December 2020 (UTC)[reply]
Support Dreamy Jazz ^{talk to me | enwiki} 16:31, 11 December 2020 (UTC)[reply]
Support --Teukros (talk) 17:55, 11 December 2020 (UTC)[reply]
Support TohaomgTohaomg (talk) 19:32, 11 December 2020 (UTC)[reply]
Support --Alaa :)..! 01:09, 12 December 2020 (UTC)[reply]
Support — AfroThundr ^{(u · t · c)} 05:05, 12 December 2020 (UTC)[reply]
Support Tom Ja (talk) 11:33, 12 December 2020 (UTC)[reply]
Support Gdarin | talk 17:28, 12 December 2020 (UTC)[reply]
Support Trizek ^{from FR} 18:42, 12 December 2020 (UTC)[reply]
Support Kew Gardens 613 (talk) 02:47, 13 December 2020 (UTC)[reply]
Support Tgr (talk) 03:28, 13 December 2020 (UTC)[reply]
Support Dirk Beetstra ^{T C} (en: U, T) 06:16, 13 December 2020 (UTC)[reply]
Support Novak Watchmen (talk) 19:23, 13 December 2020 (UTC)[reply]
Support ~ Amory (u • t • c) 19:43, 13 December 2020 (UTC)[reply]
Support --Achim (talk) 08:52, 14 December 2020 (UTC)[reply]
Support ·addshore· ^{talk to me!} 12:54, 14 December 2020 (UTC)[reply]
Support Michel Bakni (talk) 13:59, 14 December 2020 (UTC)[reply]
Support.—Teles «Talk ˱_{C L @ S}˲» 16:58, 14 December 2020 (UTC)[reply]
Support WhatamIdoing (talk) 18:49, 14 December 2020 (UTC)[reply]
Support Tutwakhamoe (talk) 21:23, 14 December 2020 (UTC)[reply]
Support —Thanks for the fish! ^{talk•contribs} 22:27, 14 December 2020 (UTC)[reply]
Support WTM (talk) 00:18, 15 December 2020 (UTC)[reply]
Support As top priority. The sooner we can get rid of Google's broken reCAPTCHA, the better. --Pandakekok9 (talk) 02:58, 15 December 2020 (UTC)[reply]
Support — Draceane ^talk_contrib. 12:47, 15 December 2020 (UTC)[reply]
Support Imnotminkus (talk) 15:42, 15 December 2020 (UTC)[reply]
Support Atsme^📞📧 11:44, 17 December 2020 (UTC)[reply]
Support Base (talk) 18:45, 17 December 2020 (UTC)[reply]
Support Owleksandra (talk) 18:46, 17 December 2020 (UTC)[reply]
Support RLuts (talk) 18:46, 17 December 2020 (UTC)[reply]
Support Kocgs (talk) 20:30, 17 December 2020 (UTC)[reply]
Strong support yes, that is fundamental in Wikipedia :))) JN Dela Cruz (talk) 05:43, 19 December 2020 (UTC)[reply]
Support Joejose1 (talk) 16:27, 19 December 2020 (UTC)[reply]
Support Yes. At least new, better captchas are needed. Vachovec1 (talk) 21:42, 19 December 2020 (UTC)[reply]
Support I think we could develop a reCaptcha-like mechanism that uses more complex pictures and audio that is not easily recognizable by bots. WikiAviator (talk) 04:47, 20 December 2020 (UTC)[reply]
Support 郑洲扬 (talk) 11:55, 20 December 2020 (UTC)[reply]
Support Geonuch (talk) 12:56, 20 December 2020 (UTC)[reply]
Support though ensuring access to visually impaired users is important. Barkeep49 (talk) 15:44, 20 December 2020 (UTC)[reply]
Support Sudonet (talk) 19:53, 20 December 2020 (UTC)[reply]
Support Ahmad^talk 02:52, 21 December 2020 (UTC)[reply]
Support —2d37 (talk) 09:48, 21 December 2020 (UTC)[reply]
Support David1010 (talk) 11:52, 21 December 2020 (UTC)[reply]
Support use of Support template under protest (I generally refuse to use template), as usual. — regards, Revi 16:49, 21 December 2020 (UTC)[reply]
Support Schniggendiller (talk) 16:59, 21 December 2020 (UTC)[reply]