Community Wishlist Survey 2021/Admins and patrollers/Overhaul spam-blacklist


Overhaul spam-blacklist

Problem: The current blacklist system is archaic; it does not allow for levels of blacklisting, and is confusing to editors. The main problem is that the spam blacklist does not discriminate by namespace, user status, material linked to, etc. The blacklist is a crude, black-and-white choice: it is not possible to block additions only by non-autoconfirmed editors, or to allow additions only by admins, nor is it possible to allow links in certain namespaces or on certain wikis (or certain wiki-flavours, e.g. disallow everywhere except on all of wikitravel). Giving warnings is not possible either (on en.wikipedia, we implemented XLinkBot, which reverts and warns; warning IPs and 'new' editors that a certain link is in violation of policies/guidelines would be a less bitey solution).
Who would benefit: Editors on all Wikipedias
Proposed solution: Basically, replace the current mw:Extension:SpamBlacklist with a new extension with an interface similar to mw:Extension:AbuseFilter, where instead of conditions, the field contains a set of regexes that are interpreted like the current spam-blacklists, providing options (similar to the current AbuseFilter) to decide what happens when an added external link matches the regexes in the field (see more elaborate explanation in collapsed box).
Note: technically, the current AbuseFilter is capable of doing what would be needed, except that in its current form it is extremely heavyweight to use for the number of regexes that are on the blacklists, or one would need to write a large number of rather complex AbuseFilters. The suggested filter is basically a specialised form of the AbuseFilter that only matches regexes against added external links. Alternatively, it could be incorporated into the current AbuseFilter as a specialized and optimized 'module'.

description of suggested implementation


Take the current AbuseFilter, create a copy of the whole extension, name it ExternalLinkFilter, take out all the code that interprets the rules ('conditions').
Make 2 fields in replacement for the 'conditions' field:
- one text field for regexes that block added external links (the blacklist). Can contain many rules (one on each line, like current spam-blacklist).
- one text field for regexes that override the block (whitelist overriding this blacklist field; that is generally more simple, and cleaner than writing a complex regex, not everybody is a specialist on regexes).
Leave all the other options:
- Discussion field for evidence (or better, a talk-page like function)
- Enabled/disabled/deleted (turn it off when not needed anymore, delete when obsolete)
- 'Flag the edit in the edit filter log' - maybe nice to be able to turn it off, to get rid of the real rubbish that doesn't need to be logged
- Rate limiting - catch editors that start spamming an otherwise reasonably good link
- Warn - could be a replacement for en:User:XLinkBot
- Prevent the action - as is the current blacklist/whitelist function
- Revoke autoconfirmed - make sure that spammers are caught and checked
- Tagging - for certain rules to be checked by RC patrollers.
- I would consider to add a button to auto-block editors on certain typical spambot-domains (a function currently taken by one of Anomie's bots on en.wikipedia).

This should overall be much more lightweight than the current AbuseFilter (all it does is regex-testing as the spam-blacklist does, only it has to cycle through maybe thousands of AbuseFilters). One could consider expanding it to have rules blocked or enabled on only certain pages (for heavily abused links that actually should only be used on their own subject page). Another consideration would be to have a 'custom reply' field, telling the editor who is blocked by the filter why the edit was blocked.
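The two-field design described above (one field of blocking regexes, one field of overriding regexes) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of any existing extension: the class `LinkFilter`, the helpers `compile_lines` and `check_edit`, the `#` comment convention, and the URL-prefix wrapper around each line are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


def compile_lines(text: str) -> List[re.Pattern]:
    """Compile one regex per line, skipping blank lines and '#' comments."""
    patterns = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            # Assumed wrapper: match the fragment after the scheme and any
            # leading host characters, case-insensitively.
            patterns.append(
                re.compile(r"https?://[a-z0-9_\-.]*(?:" + entry + ")", re.IGNORECASE)
            )
    return patterns


@dataclass
class LinkFilter:
    """Hypothetical filter: a blacklist field, a whitelist field, and actions."""
    block_lines: str
    allow_lines: str = ""
    actions: Tuple[str, ...] = ("disallow",)
    _block: List[re.Pattern] = field(init=False, repr=False)
    _allow: List[re.Pattern] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Compile once, when the filter is created or edited.
        self._block = compile_lines(self.block_lines)
        self._allow = compile_lines(self.allow_lines)

    def matches(self, url: str) -> bool:
        # The whitelist field overrides the blacklist field of the same filter.
        if any(p.search(url) for p in self._allow):
            return False
        return any(p.search(url) for p in self._block)


def check_edit(filters: Iterable[LinkFilter], added_urls: Iterable[str]):
    """Return (url, actions) for every added link that trips a filter."""
    filters = list(filters)
    return [(url, f.actions) for url in added_urls for f in filters if f.matches(url)]
```

Because each filter carries its own whitelist, an exception such as a single acceptable page on an otherwise blocked domain becomes one extra line in the second field, rather than a more complicated blocking regex.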

Possible expanded features (though highly desired)

Create a separate userright akin to AbuseFilterEditor for being able to edit spam filters (on en.wikipedia, most admins do not touch (or do not dare to touch) the blacklist, while there are non-admin editors who often help on the blacklist).
Add namespace choice (checkboxes like in search, so one can choose not to blacklist something in one particular namespace, with the addition of 'all', 'content-namespace only' and 'talk-namespace only' options).
- some links are fine in discussions but should not be used in mainspace, others are a total no-no
- some image links are fine in the file-namespace to tell where it came from, but not needed in mainspace (e.g. flickr is currently on revertlist on en.wikipedia's XLinkBot)
Add user status choice (checkboxes for the different roles, or like the page-protection levels)
- disallow IPs and new users to use a certain link (e.g. to stop spammers from creating socks, while leaving it free to most users).
- warn IPs and new users when they use a certain link that the link often does not meet inclusion standards (e.g. twitter feeds are often discouraged as external links when other official sites of the subject exists; like the functionality of en:User:XLinkBot).
Block or whitelist links matching regexes on specific pages (disallow linking throughout except on the subject page) - coding akin to the title blacklist
Block links matching regexes when added by a specific user/IP/IP-range (disallow specific users from using a domain) - coding akin to the title blacklist
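The namespace, user-status and per-page options listed above could combine along these lines. This is a minimal sketch under assumptions: `ScopedRule` and `evaluate` are hypothetical names, and the rule shown is only an example. The namespace numbers (0 for mainspace, 6 for File) and the `autoconfirmed` group name follow MediaWiki's usual conventions.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ScopedRule:
    """Hypothetical rule: a link regex plus the scope in which it applies."""
    pattern: str
    action: str                                   # e.g. "warn" or "disallow"
    namespaces: Optional[FrozenSet[int]] = None   # None means all namespaces
    exempt_groups: FrozenSet[str] = frozenset()   # user groups the rule skips
    allowed_pages: FrozenSet[str] = frozenset()   # pages where the link is fine


def evaluate(rule: ScopedRule, url: str, namespace: int,
             user_groups: Iterable[str], page_title: str) -> Optional[str]:
    """Return the rule's action if it applies to this added link, else None."""
    if not re.search(rule.pattern, url, re.IGNORECASE):
        return None
    if rule.namespaces is not None and namespace not in rule.namespaces:
        return None
    if rule.exempt_groups & set(user_groups):
        return None
    if page_title in rule.allowed_pages:
        return None
    return rule.action


# Example rule: warn IPs and new users who add flickr links in mainspace only.
flickr = ScopedRule(
    pattern=r"flickr\.com",
    action="warn",
    namespaces=frozenset({0}),
    exempt_groups=frozenset({"autoconfirmed"}),
)
```

With this rule, a new user adding a flickr link to an article gets a warning, while the same link on a File page, or added by an autoconfirmed user, passes untouched.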

Downsides

We would lose a single full list of material that is blacklisted (the current blacklist is known to work as a deterrent against spamming). That list could however be independently created based on the current rules (e.g. by bot).
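A bot could rebuild such a single list from the individual filters, roughly as sketched here. The dictionary keys (`enabled`, `actions`, `block_lines`) are assumptions made for this example, not an existing data format.

```python
from typing import Dict, Iterable, List


def consolidated_blacklist(filters: Iterable[Dict]) -> List[str]:
    """Collect the blocking regexes of all enabled, disallowing filters."""
    entries = set()
    for f in filters:
        # Only filters that actually prevent the edit belong on the public list.
        if not f.get("enabled", False) or "disallow" not in f.get("actions", ()):
            continue
        for line in f["block_lines"].splitlines():
            entry = line.split("#", 1)[0].strip()  # drop '#' comments and blanks
            if entry:
                entries.add(entry)
    return sorted(entries)
```

The result is a deduplicated, sorted list that a bot could publish to a wiki page on a schedule, so the deterrent effect of one visible list is kept.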

Modular approach: make the current AbuseFilter 'modular', where upon creation of a new filter, you can define a 'type' of filter. That module can be a module like the current existing AbuseFilter, or specialised modules like this spam-blacklist filter described above.
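One way to picture the modular approach is a small registry that maps a filter 'type' to the code that evaluates it. The sketch below is hypothetical throughout: the registry `FILTER_TYPES`, the decorator `filter_type`, the two module names and the shape of the `edit` dictionary are invented for illustration.

```python
import re
from typing import Callable, Dict

# Registry: filter type name -> function(definition, edit) -> bool
FILTER_TYPES: Dict[str, Callable[[str, dict], bool]] = {}


def filter_type(name: str):
    """Decorator that registers an evaluator under a filter type name."""
    def register(fn):
        FILTER_TYPES[name] = fn
        return fn
    return register


@filter_type("link-regex")
def link_regex(definition: str, edit: dict) -> bool:
    # Specialised module: one regex per line, tested only against added links.
    patterns = [re.compile(l.strip(), re.IGNORECASE)
                for l in definition.splitlines() if l.strip()]
    return any(p.search(url) for url in edit.get("added_links", []) for p in patterns)


@filter_type("added-text")
def added_text(definition: str, edit: dict) -> bool:
    # Stand-in for a general-purpose module: a single regex on the added text.
    return re.search(definition, edit.get("added_text", ""), re.IGNORECASE) is not None


def run_filter(ftype: str, definition: str, edit: dict) -> bool:
    """Dispatch to the module chosen when the filter was created."""
    return FILTER_TYPES[ftype](definition, edit)
```

The logging, warning and tagging options would stay shared across modules; only the matching step differs per type.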

More comments: Previous discussions: Community_Wishlist_Survey_2017/Miscellaneous/Overhaul_spam-blacklist and Community Wishlist Survey 2019/Admins and patrollers/Overhaul spam-blacklist
Phabricator tickets: task T6459 (where I proposed this earlier); specific comment on task T16719: Brion, April 5, 2010
- Related tickets: task T243484, task T18326, task T224921
Proposer: Dirk Beetstra ^{T C} (en: U, T) 06:13, 17 November 2020 (UTC)

Discussion

This would be very useful and seems to be kind of urgent! Zblace (talk) 08:16, 18 November 2020 (UTC)
Having poked my head in, there would indeed be firm value both on reducing the complexity side of things and also for the more nuanced components - at the moment there's an ongoing call having to be made between major collateral damage and allowing spam to avoid severing a sometimes useful domain. Nosebagbear (talk) 10:10, 18 November 2020 (UTC)
The lack of an SBL-override right for sysops and bots has been criticized for some years, and nothing has happened, though it is highly required. --Achim (talk) 17:47, 19 November 2020 (UTC)
Absolutely support this. Also we need to rename spam blacklists to link-blocklist for all projects. JzG (talk) 18:52, 20 November 2020 (UTC)
This would be good too, but renaming an extension that is in use is very complicated, see phab:T254649. – Ammarpad (talk) 21:05, 22 November 2020 (UTC)
I am not proposing to rename any extension (the phab ticket does, and I support that, maybe we should do a wishlist for that as well to show how much the community wants that). In its simplest form I suggest to make a copy of the AbuseFilter extension and name it (e.g.) 'ExternalLinkFilter'. Then from the copy you rip out the part that is executing the 'code' that is the core of the AbuseFilter, and replace that with a textbox containing regexes and code which matches these regexes against the added external links (like what the current spam-blacklist does). The rest are additions to that part of the code. --Dirk Beetstra ^{T C} (en: U, T) 05:52, 23 November 2020 (UTC)
Well, in other words, what you mean is "rewrite the extension" or "make new extension" completely, neither is simple, obviously. Note that I am equally frustrated with the weird limitations of this extension, I am just acknowledging the reality. – Ammarpad (talk) 06:32, 23 November 2020 (UTC)
@Ammarpad: yes, in a way it is going to be a near rewrite from scratch. I guess my suggestion to copy and adapt the AbuseFilter is more based on the idea that many of the controls there are very suitable for the needed control (basically all outside of the actual AbuseFilter-code). --Dirk Beetstra ^{T C} (en: U, T) 05:34, 24 November 2020 (UTC)
Can someone explain in a simpler way what this would do? Thanks! — WinnerWolf99 ^talk_{What did I break now?} 21:28, 11 December 2020 (UTC)
@WinnerWolf99: The proposal suggests that instead of editing on the MediaWiki: namespace to adjust the spam blacklists and whitelists, there should be a dedicated special page for such blacklists just like the abuse filter. The current implementation of the blacklist is also very limited in that it only disallows the edit from being performed, and that admins and bots are not exempted from using text that matches the regex of the blacklists. The proposal borrows some ideas from the abuse filter, which allows different actions (like silent logging, warning, and disallowing). If you want an even simpler explanation, this mock-up of the new blacklist interface might help you. Pandakekok9 (talk) 01:09, 13 December 2020 (UTC)
OK, voting support below — WinnerWolf99 ^talk_{What did I break now?} 16:16, 14 December 2020 (UTC)
Late to the comment period, but one potentially useful feature would be the ability to blacklist all domains on a given IP - mentioned by Beetstra at w:en:Special:Diff/995659682. GeneralNotability (talk) 13:56, 22 December 2020 (UTC)

Voting

Support Sagivrash (talk) 18:44, 8 December 2020 (UTC)
Support This seems like something that would be a clear benefit to the projects Thryduulf (talk: meta · en.wp · wikidata) 19:56, 8 December 2020 (UTC)
Support —MarcoAurelio (talk) 19:58, 8 December 2020 (UTC)
Support Filipović Zoran (talk) 20:17, 8 December 2020 (UTC)
Support CrystallineLeMonde (talk) 20:28, 8 December 2020 (UTC)
Support Pi.1415926535 (talk) 20:43, 8 December 2020 (UTC)
Support MarioSuperstar77 (talk) 20:58, 8 December 2020 (UTC)
Support Jan Myšák (talk) 22:37, 8 December 2020 (UTC)
Support * Pppery * _{it has begun} 01:51, 9 December 2020 (UTC)
Support – Ammarpad (talk) 04:20, 9 December 2020 (UTC)
Support Sidishandsome (talk) 04:46, 9 December 2020 (UTC)
Support Samwalton9 (talk) 09:26, 9 December 2020 (UTC)
Support ‐‐1997kB (talk) 12:41, 9 December 2020 (UTC)
Support Sgd. —Hasley 13:10, 9 December 2020 (UTC)
Support — Rhododendrites ^talk \\ 22:07, 9 December 2020 (UTC)
Support Minorax (talk) 22:10, 9 December 2020 (UTC)
Support GeneralNotability (talk) 23:29, 9 December 2020 (UTC)
Support Meiræ 14:10, 10 December 2020 (UTC)
Support Libcub (talk) 18:01, 10 December 2020 (UTC)
Support KevinL (aka L235 · t) 01:02, 11 December 2020 (UTC)
Support Dreamy Jazz ^{talk to me | enwiki} 16:29, 11 December 2020 (UTC)
Support Theklan (talk) 17:18, 11 December 2020 (UTC)
Support Rename it too. czar 18:10, 11 December 2020 (UTC)
Support, one of most needful ideas. Фред-Продавец звёзд (talk) 18:52, 11 December 2020 (UTC)
Support Wutsje (talk) 19:39, 11 December 2020 (UTC)
Support Robins7 (talk) 22:08, 11 December 2020 (UTC)
Support Strainu (talk) 09:37, 12 December 2020 (UTC)
Support Klaas `Z4␟` V: 11:54, 12 December 2020 (UTC)
Support Obviously. More flexibility would be very useful to our admins. --Pandakekok9 (talk) 01:10, 13 December 2020 (UTC)
Support Tgr (talk) 03:50, 13 December 2020 (UTC)
Support as proposer. Long overdue. --Dirk Beetstra ^{T C} (en: U, T) 06:09, 13 December 2020 (UTC)
Support Wikibenchris (talk) 08:38, 13 December 2020 (UTC)
Support ~ Amory (u • t • c) 19:43, 13 December 2020 (UTC)
Support.--HakanIST (talk) 08:04, 14 December 2020 (UTC)
Support --Achim (talk) 09:20, 14 December 2020 (UTC)
Support Sadads (talk) 11:42, 14 December 2020 (UTC)
Support ·addshore· ^{talk to me!} 12:55, 14 December 2020 (UTC)
Support — WinnerWolf99 ^talk_{What did I break now?} 16:16, 14 December 2020 (UTC)
Support WhatamIdoing (talk) 18:50, 14 December 2020 (UTC)
Support --ThunderingTyphoons! (talk) 21:37, 14 December 2020 (UTC)
Support —Thanks for the fish! ^{talk•contribs} 22:21, 14 December 2020 (UTC)
Support Spencer (talk) 15:43, 15 December 2020 (UTC)
Support TrangaBellam (talk) 10:39, 17 December 2020 (UTC)
Support RLuts (talk) 18:53, 17 December 2020 (UTC)
Support Base (talk) 18:54, 17 December 2020 (UTC)
Support Owleksandra (talk) 18:54, 17 December 2020 (UTC)
Support Joejose1 (talk) 16:24, 19 December 2020 (UTC)
Support This would be a great improvement. Vachovec1 (talk) 21:47, 19 December 2020 (UTC)
Support Geonuch (talk) 13:03, 20 December 2020 (UTC)
Support Barkeep49 (talk) 15:46, 20 December 2020 (UTC)
Support Ahmad^talk 03:02, 21 December 2020 (UTC)
Support Schniggendiller (talk) 16:36, 21 December 2020 (UTC)
Support use of Support template under protest (I generally refuse to use template), as usual. — regards, Revi 16:46, 21 December 2020 (UTC)
Support Thibaut (talk) 16:59, 21 December 2020 (UTC)