Community health initiative/AbuseFilter

In 2017 the Wikimedia Foundation's Anti-Harassment Tools team explored ways to improve the AbuseFilter extension so communities can use the powerful tool to prevent and monitor potential harassment, similar to how vandalism and spam are prevented and monitored.

This page documents a feature the Wikimedia Foundation's Anti-Harassment Tools team has built. Development of this feature is complete.

No future development work is planned on AbuseFilter by the Anti-Harassment Tools team.

Goals

Improve AbuseFilter so admins have stronger tools at their disposal to prevent, identify, and monitor harassing behavior.
Alleviate performance concerns so communities don't need to pick and choose which filters are active.
Add functionality to allow more sophisticated detection (ORES, anti-spoof, regex captures).

Background

On ENWP

The AbuseFilter extension is enabled on most wikis. On the English Wikipedia it is called 'Edit filter' because it is used to log and monitor events other than abuse. Despite that name, filters can also act on other logged actions, not just edits. Wikipedia:Edit filter explains the features, and the tool can be found at Special:AbuseFilter.

There are ~220 users (most of whom are administrators) on ENWP who have permission to create or modify these filters, known as "edit filter managers", or EFMs. Edit filters are written in a language similar to other high-level programming languages. Full documentation can be found at mw:Extension:AbuseFilter/Rules format, with some additional documentation at w:en:Wikipedia:Edit filter/Documentation. In short, filters can search for actions (e.g. a user making many rapid edits, a user removing a large section of text, an account creation) or strings (e.g. curse words, spam URLs) and compare them against information about the user (e.g. username, registration date, lifetime edit count) and the page they are editing (e.g. namespace, title, recent contributors). In addition to basic logging, a triggered filter can take any of four actions:

Block the user — this is not used on ENWP.
Prohibit, or "disallow", the edit from being saved.
Warn the user with a custom message and require a confirmation before publishing.
Tag the edit (en:Wikipedia:Tags).
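As a rough illustration of the model described above, a filter can be thought of as a condition over the edit, the user, and the page, paired with one or more actions. The sketch below is in Python rather than the AbuseFilter rules language. The `Edit` fields, the `Filter` class, and the thresholds are hypothetical names chosen for the example, not the extension's real variables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Action(Enum):
    BLOCK = "block"        # listed above as not used on ENWP
    DISALLOW = "disallow"  # prevent the edit from being saved
    WARN = "warn"          # show a message and require confirmation
    TAG = "tag"            # tag the edit


@dataclass
class Edit:
    # Hypothetical fields, loosely modeled on the kinds of data listed above.
    user_editcount: int
    page_namespace: int
    bytes_removed: int
    added_text: str


@dataclass
class Filter:
    filter_id: int
    condition: Callable[[Edit], bool]
    actions: Tuple[Action, ...]


def triggered_actions(filters: List[Filter], edit: Edit) -> Dict[int, Tuple[Action, ...]]:
    """Return the actions of every filter whose condition matches the edit."""
    return {f.filter_id: f.actions for f in filters if f.condition(edit)}


# Illustrative filter: a new user removing a large section from an article.
large_removal = Filter(
    filter_id=1,
    condition=lambda e: (
        e.user_editcount < 10 and e.page_namespace == 0 and e.bytes_removed > 2000
    ),
    actions=(Action.WARN, Action.TAG),
)
```

Calling `triggered_actions([large_removal], some_edit)` returns a mapping from filter ID to the actions that filter would take, which is empty when no filter matches.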

All hits to enabled filters are publicly logged. However, edit filter managers can set the visibility of a particular filter to private, so that the filter details are only visible to EFMs. This privacy also applies when searching the logs for hits to a specific filter. When reviewing a user's filter log, hits to private filters are still listed, but the ID of the filter is not disclosed.

Performance management

When a user publishes an edit, their revision is processed through as many edit filters as possible, in increasing numerical order, until 1,000 conditions are hit. In essence, each boolean operation is considered one "condition". If the edit runs through 1,000 conditions but not all filters, the remaining filters are skipped. This is to keep the filters lean and reduce the delay between a user submitting an edit and the edit being saved.
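The budget behaviour described above can be sketched as a small simulation. This is a simplified model, not the extension's implementation: it assumes each filter has a fixed, known condition cost (`condition_costs` is a hypothetical input), and it counts a filter as run if it starts before the limit is reached, even if it pushes the total past the limit.

```python
from typing import Dict, List, Tuple

CONDITION_LIMIT = 1000  # the limit quoted in the text


def run_filters(
    condition_costs: Dict[int, int], limit: int = CONDITION_LIMIT
) -> Tuple[List[int], List[int]]:
    """Simulate which filters run for one edit.

    condition_costs maps filter ID to the number of conditions that filter
    consumes (hypothetical fixed costs; real costs depend on the edit).
    Returns (filters that ran, filters that were skipped).
    """
    ran: List[int] = []
    skipped: List[int] = []
    used = 0
    for filter_id in sorted(condition_costs):  # increasing numerical order
        if used >= limit:
            skipped.append(filter_id)  # budget exhausted: skip the rest
            continue
        used += condition_costs[filter_id]
        ran.append(filter_id)
    return ran, skipped
```

With costs of 400, 500, 200, and 50 for filters 1 to 4, filters 1 to 3 run (using 1,100 conditions in this simplified model) and filter 4 is skipped, which is why a new filter can silently push a higher-numbered one out of the budget.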

This results in a system where EFMs must manage the inventory of conditions: if all conditions are used and a new filter is desired, conditions must be re-allocated from other filters. This is time-consuming and requires experience and patience. To ensure all filters are running, EFMs must manually monitor the top line of text on Special:AbuseFilter after modifying any filter:

Of the last 7,694 actions, 0 (0.00%) have reached the condition limit of 1,000, and 74 (0.96%) have matched one of the filters currently enabled.
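The percentages in that line are plain ratios of the two counts to the number of recent actions. A minimal sketch that reproduces the wording from the three counts is shown below. The function name and parameters are hypothetical, and it assumes `total` is greater than zero.

```python
def status_line(total: int, limit_hits: int, matches: int, limit: int = 1000) -> str:
    """Format a summary line in the style shown on Special:AbuseFilter."""
    return (
        f"Of the last {total:,} actions, {limit_hits:,} ({limit_hits / total:.2%}) "
        f"have reached the condition limit of {limit:,}, and {matches:,} "
        f"({matches / total:.2%}) have matched one of the filters currently enabled."
    )
```

For the figures quoted above, `status_line(7694, 0, 74)` gives 0.00% and 0.96%, since 74 / 7,694 is about 0.0096.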

Problems to solve

This is an unprioritized laundry list, not a backlog.

Performance management

Tracked in Phabricator:
Task T166802 resolved

We've added performance monitoring to AbuseFilter on a handful of wikis, which can be viewed here: https://grafana.wikimedia.org/d/000000393/mediawiki-abusefilter-profiling?orgId=1

We've also added logging for filters that take over 800 ms. These are currently privately logged, but we'd like to build them into the AbuseFilter UI directly. We've also re-enabled the once-disabled per-filter profiling on the Portuguese and English Wikipedias to determine whether the profiling itself causes a performance degradation.
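The slow-filter logging amounts to comparing each filter's measured runtime against a threshold. A minimal sketch follows, assuming per-filter runtimes in milliseconds are already available as a dictionary. The function and its input format are hypothetical and do not reflect how the extension stores its profiling data.

```python
from typing import Dict, List

SLOW_THRESHOLD_MS = 800.0  # threshold mentioned in the text


def slow_filters(
    runtimes_ms: Dict[int, float], threshold_ms: float = SLOW_THRESHOLD_MS
) -> List[int]:
    """Return IDs of filters whose runtime exceeded the threshold, slowest first."""
    over = [fid for fid, ms in runtimes_ms.items() if ms > threshold_ms]
    return sorted(over, key=lambda fid: runtimes_ms[fid], reverse=True)
```

A filter that takes exactly 800 ms is not reported here, matching the "over 800" wording.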

From what we've found, there is still condition inventory on ENWP to run more filters, so we do not need to 'fix' performance, but rather help Edit Filter Managers more easily find the maximum number of filters they can enable. To accomplish this, we'd like to bring the tracking we've implemented out of these dashboards and into the Special:AbuseFilter UI directly. Once we know whether the per-filter profiling causes a performance degradation, we will begin an on-wiki discussion about how to surface the data in the Special:AbuseFilter UI.

Potential things for the future

Filters are combined for performance reasons, so it's hard to know if certain parts of the regular expressions are still needed (e.g. some vandalism trend that has died off).
- These combo-regex filters can have severe effects on large edits to large article pages.
Give people better feedback on the performance of a filter. ("Hey, this filter needs to be optimized. Here are some best practices on how to improve it.")
Database updates to the abuse filter log table could happen via post-processing (not that big of a deal for Anti-Harassment).
Can we optimize execution time on the backend?
The larger the diff, the slower AF runs on publish. This can be troublesome on page-blanking or mass content removal reverts and undos.
Is runtime a better way to show the performance of each filter?

Warning effectiveness

Tracked in Phabricator:
Task T166804

AbuseFilter currently has a 'warn' feature that displays a message when a user trips a filter. It would be low effort to test and implement improvements to these messages' effectiveness.

Allow a different type of display per filter (pop-up, above the edit window, etc.)

Functionality

Tracked in Phabricator:
Task T166805

There are many limitations to the edit filter's functionality.

Binary decisions, no heuristics. Should we explore Detox, ORES, or other machine learning integration?
Subroutines — Check if another filter was tripped. Allow one rule to trigger / call other rules, i.e. to standardize common elements and remove redundancies. Tracked in T186960.
Notifications? — Echo? Watchlist? IRC feed? A new Special:AbuseWatchlist?
- When a filter is tripped (tracked in T179495)
- When other managers edit/create filters
Detect edit wars, or likelihood that an edit is part of an edit war?
Ability to set an expiry for a filter, so that it can automatically be disabled after some period of time. Tracked in T20246.
Add additional variables:
- Age since page creation / total number of edits, or some other way to tell if a page is newish or oldish
- Number of recent edits / time since previous edits, or some other way of detecting floods in the AF rules (separate from the throttle mechanism)
- Indicate edits performed via revert, undo, etc.
- new_categories / old_categories
- new_media / old_media
Add additional functions:
- A function like contains_any but for integers, such as contains_any_int(article_namespace, 1, 5, 10). Currently this can only be done with less-user-friendly regex, such as article_namespace rlike "^(1|5|10)$", or with multiple comparisons that require one condition for each item in the array. Added with this commit.
- ~~Add ability to store the regex "captures" into a variable, which can be used in other parts of the filter.~~ (added with T179957)
Add additional outcomes of the filters:
- Require editor to complete a captcha before saving edit.
- Possibly, apply temporary semiprotection to the targeted page.
- Deferred changes — see task T118696 & Wikipedia:Deferred_changes
~~Clarify the UI that setting a filter to both 'warn' and 'disallow' may be redundant (but still leave it as an option.)~~ It's not redundant.
Throttling — Allow users to make a certain action for N infractions, then warn for N infractions, then disallow on any further infractions.
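Two of the ideas in the list above are easy to sketch: the integer counterpart to contains_any, and the escalating throttle. The Python below is only an analogy to the rules language. `contains_any_int` here is a stand-in written for illustration, `namespace_matches_regex` mirrors the regex workaround quoted above, and `escalating_outcome` assumes the same N is used for the allow and warn stages.

```python
import re
from enum import Enum


def contains_any_int(value: int, *candidates: int) -> bool:
    """Stand-in for the proposed function: plain integer membership."""
    return value in candidates


def namespace_matches_regex(namespace: int) -> bool:
    """Python analogue of the regex workaround: rlike "^(1|5|10)$"."""
    return re.search(r"^(1|5|10)$", str(namespace)) is not None


class Outcome(Enum):
    ALLOW = "allow"
    WARN = "warn"
    DISALLOW = "disallow"


def escalating_outcome(prior_infractions: int, n: int) -> Outcome:
    """Allow the first n infractions, warn on the next n, disallow afterwards."""
    if prior_infractions < n:
        return Outcome.ALLOW
    if prior_infractions < 2 * n:
        return Outcome.WARN
    return Outcome.DISALLOW
```

The membership check and the regex agree for every namespace number, but the former states the intent directly. With `n=3`, a user's first three infractions are allowed, the next three produce a warning, and the seventh onward is disallowed.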

UI improvement

Better per-filter discussions
Categorization and/or search of filters, so that filter managers can easily find out if the filter they want already exists, or if an existing one can be modified to meet their needs. The search part was implemented with T87455.
Don't accept bad filters. Set up automatic testing for a new filter (run it on the past 1,000 edits or so) and check that it does not disallow the majority of them. If it does, it's a bad filter and we shouldn't accept it. Then we can get rid of the automatic disabling of filters, which is clearly not the right solution and is not fair to people who had to face the bad filter and had their legitimate edits declined.
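The pre-acceptance check suggested above could look roughly like the following sketch. It treats a filter as a predicate over edit text and rejects it when it would have matched more than half of a sample of past edits. The function name, the predicate signature, and the 0.5 cutoff (one reading of "majority") are assumptions made for illustration.

```python
from typing import Callable, Sequence


def is_bad_filter(
    condition: Callable[[str], bool],
    past_edits: Sequence[str],
    max_match_ratio: float = 0.5,
) -> bool:
    """Return True when the filter would have matched a majority of past edits."""
    if not past_edits:
        return False  # nothing to judge against
    matched = sum(1 for edit in past_edits if condition(edit))
    return matched / len(past_edits) > max_match_ratio
```

In practice `past_edits` would be the most recent 1,000 or so edits, and a filter flagged by this check would be rejected before it is enabled rather than disabled after the fact.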

Anti-Spoof

Tracked in Phabricator:
Task T166816

It can be easy to get around filters that use AntiSpoof. To address this, the Anti-Harassment Tools team added more equivset coverage and implemented a new function in AbuseFilter to make it easier to compare potentially spoofed words to others. We will not be investing in this area in the future.
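To illustrate the general idea of comparing potentially spoofed words, the sketch below maps look-alike characters to a canonical form before searching for target words. The mapping is a toy table invented for this example, not the real AntiSpoof equivset data, and `normalized_contains_any` is a hypothetical helper, not the function the team added.

```python
# Toy confusables table for illustration only; NOT the real AntiSpoof equivset.
TOY_EQUIVSET = {"0": "O", "1": "I", "3": "E", "4": "A", "5": "S", "@": "A", "$": "S"}


def normalize(text: str) -> str:
    """Uppercase the text and map each confusable character to its canonical form."""
    return "".join(TOY_EQUIVSET.get(ch, ch) for ch in text.upper())


def normalized_contains_any(haystack: str, *needles: str) -> bool:
    """Check whether any needle appears in the haystack after normalizing both."""
    norm_haystack = normalize(haystack)
    return any(normalize(needle) in norm_haystack for needle in needles)
```

Broader equivset coverage corresponds to a larger mapping table: each added pair closes one more substitution that would otherwise slip past a plain string comparison.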
