Talk:Community health initiative/AbuseFilter

Exploring how the AbuseFilter can be used to combat harassment

The AbuseFilter is a feature that evaluates every submitted edit, along with certain other logged actions, and checks them against community-defined rules. If a filter is triggered, the edit may be rejected, tagged, or logged, a warning message may be shown, and/or the user's autoconfirmed status may be revoked.

Currently there are 166 active filters on English Wikipedia, 152 active filters on German Wikipedia, and 73 active filters here on Meta. All small and medium-sized wikis have 29 global filters enabled as part of the Global AbuseFilter feature.

One example global filter is filter #96, “online shopping spam”, which identifies brand-new users who add known spam phrases to new articles. When triggered, it displays a warning to the user but allows them to save their changes. It also tags the edit with ‘meta spam id’ for future review. It is triggered about a dozen times every week.
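
For illustration, a simplified rule along these lines could catch that kind of edit. The phrases and thresholds below are invented placeholders, not the actual contents of filter #96, and the warn and tag actions are configured separately in the filter's settings rather than in the rule text:

    /* Sketch only -- not the real filter #96 */
    action == "edit" &
    old_size == 0 &                    /* the page is being created */
    page_namespace == 0 &              /* in the main (article) namespace */
    user_editcount < 10 &              /* brand-new account */
    added_lines irlike "(buy cheap|free shipping|discount code)"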

AbuseFilter is a powerful tool and we believe it can be extended to handle more user conduct issues. The Anti-Harassment Tools software development team is looking into three major areas:


1. Improving its performance so more filters can run per edit

We want to make the AbuseFilter extension faster so more filters can be enabled without having to disable other useful filters. We’re investigating its current performance in task T161059. Once we better understand how it performs, we’ll create a plan to make it faster.


2. Evaluating the design and effectiveness of the warning messages

There is a filter on English Wikipedia — #50, “Shouting” — which warns when an unconfirmed user makes a mainspace edit that consists solely of capital letters. When the filter is tripped, it displays a warning message to the user above the edit window:

 
[Image: the warning message shown above the edit window, from en:MediaWiki:Abusefilter-warning-shouting. Each filter can specify a custom message to display.]
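
A rule for a filter like this might look roughly as follows (a sketch, not the actual text of filter #50):

    /* Sketch: all-capital additions by unconfirmed editors in mainspace */
    page_namespace == 0 &
    !("confirmed" in user_groups) &
    length(added_lines) > 25 &                 /* ignore trivially short edits */
    added_lines == ucase(added_lines) &        /* contains no lowercase letters */
    added_lines != lcase(added_lines)          /* ...but does contain at least one letter */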


These messages help dissuade users from making harmful edits. Sometimes requiring a user to take a brief pause is all it takes to avoid an uncivil incident.

We think the warning function is incredibly important, but we are curious whether the presentation could be more effective. We’d like to work with any interested users to design a few variations so we can determine which placement (above the edit area, below, as a pop-up, etc.), visuals (icons, colors, font weights, etc.), and text most effectively convey the intended message for each warning. Let us know if you have any ideas or if you’d like to participate!


3. Adding new functionality so more intricate filters can be crafted

We’ve already received dozens of suggestions for functionality to add to AbuseFilter, but we need your help to winnow this list so we can effectively build filters that help combat harassment.

In order to do this, we need to hear your input. What limitations in the AbuseFilter prevent us from creating effective user conduct filters? What design changes to the warning messages should we explore? How might the AbuseFilter feature be expanded to better monitor user conduct problems across all small and medium wikis, using global filters? Join our discussion below.

Thank you!

— The Anti-Harassment Tools team (posted by Trevor Bolliger, WMF Product Manager 🗨 )

Efficiency

I discussed this briefly with MusikAnimal on IRC and wanted to summarize some of the ideas that emerged from that discussion here for consideration.

Currently, each of our abuse filters checks at least one condition, often more, before evaluation moves on to the next filter. Many of these conditions overlap, so this is insanely inefficient. A more ideal abuse filter system would have the following features; I will explain how they interconnect below:

  • Filters can be triggered without producing any logs. The default would be to log, but this could be unchecked.
  • Filters can activate other filters as an action upon completion. At the completion of the activated filter, the program would return to the original filter to continue any additional prescribed actions (including possibly activating other filters).
  • Filters can be set to a "dormant" or "ready" state, in which they are considered active filters but will not run unless called by another filter.
  • Filters can perform a different set of actions if conditions are not met. This set of actions could possibly only include activating other filters, if the alternative is considered too messy.

Once you have these four things, you have enough to set up a system of filter subroutines. Imagine this simple situation:

I want three filters. Filters 1 and 2 each start off with a check that the user is autoconfirmed. Filter 3 starts off with a check that the user is not autoconfirmed. They then proceed to do radically different things. In the current setup, there will be three separate autoconfirmed-related conditions checked. Alternatively, with the changes I highlighted above, we could set up a Filter 0 to check if an editor is autoconfirmed. If yes, Filters 1 and 2 activate (now launching right into their specialized tasks). If no, Filter 3 activates. We've cut the number of autoconfirmed-related conditions down to one. Bullet points 2, 3, and 4 are all obviously necessary to implement this. Bullet point 1 prevents massive log-spam due to a simple autoconfirmed check being logged on every other edit.
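
Concretely, the setup might look something like this (the notation is invented, since none of this functionality exists yet):

    Filter 0  (logging off, runs on every edit):
        "autoconfirmed" in user_groups
        on match:    run Filter 1, run Filter 2
        on no match: run Filter 3

    Filter 1, Filter 2  (dormant; only run when called):
        /* start straight in on their specialized conditions */

    Filter 3  (dormant; only runs when called):
        /* handles the non-autoconfirmed case */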

When you consider how many filters we have, how many start with simple checks of autoconfirmed status, and how many edits have to run through the filters, the potential resource savings are substantial. Basically, this set of changes would turn our system of abuse filters into a web of more-efficient subroutines.

We could later focus on aesthetic changes to increase usability; mainly, an ideal interface would show the conditions of activation that each filter has. But that's more complicated and we can handle that in the Notes section for now. ~ Rob13Talk 23:23, 12 March 2017 (UTC)

Hi Rob, thank you for taking the time to share your thoughts about how to tackle some of the problems/opportunities with AbuseFilter. Your concept of subroutines makes perfect sense — good explanation. A lot of the filters I see have similar condition checks, with minor nuances given the exact type of abuse they're detecting.
Our first step is to measure the actual performance of AbuseFilter on ENWP to understand the room for improvement. I'd like to avoid an entire overhaul if possible — we may be able to find other ways to increase the efficiency of the feature. I do agree that performance improvements should be prioritized over UI improvements. — Trevor Bolliger, WMF Product Manager 🗨 18:08, 14 March 2017 (UTC)

Condition Limiter / Performance

Currently, the edit filter condition limit system is awkward, poorly tuned to the problem it seeks to address, and inadequate.

First, a bit of history.

The current limit is 1,000 conditions. For many years on enwiki, this was a frequently encountered barrier that called for old filters to be rewritten or disabled to allow new filters to be created. A whole dark art grew up around writing filters to minimize their condition count. In April 2016, the engineering underpinning the condition limiter was changed. While not directly affecting execution time, this revision changed when and how conditions were totaled. For the typical abusefilter rule, the number of conditions counted post-April 2016 was lower than before (sometimes dramatically lower). This created extra space for the addition of new rules. Since April 2016, I am not aware of the condition limit ever having been reached on enwiki. So, it went from being a frequent, endlessly annoying constraint to being a non-issue.

At the same time that changes were made to how conditions were scored, devs also removed all of the per-filter tools that edit filter managers (EFMs) could use to see how many conditions each filter was consuming. At present, EFMs have essentially no tools at all by which to judge the performance of individual filters.

We have encountered, in the wild, cases of individual filters causing severe performance issues. To give an example, there was a filter on enwiki with a complicated menu of abusive terms expressed as a regex. There were roughly 120 alternatives in the regex, joined by "|". Though that is obviously a large number, the runtime impact is still negligible on the typical small edit. However, if you feed it a huge edit, evaluating the regex could hang the server for tens of seconds. In addition to the potential for vandalism, very large edits can also happen naturally in response to vandalism (e.g. restoring a page that was blanked), or when large pages are refactored. After this problem was discovered, a precondition was added to prevent the regex from being evaluated on edits longer than 10,000 characters. That was sufficient to ameliorate the immediate issue, but doesn't solve the underlying problem. The huge, complicated regex rule only required about 5 conditions to evaluate. Targeting evaluated conditions doesn't catch cases like this, and there were no diagnostics or limits to identify such problems.
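
For what it's worth, the shape of that precondition was roughly as follows (the pattern here is a placeholder, not the filter's actual regex):

    length(added_lines) < 10000 &                 /* cheap guard, evaluated first */
    added_lines irlike "(term1|term2|term3)"      /* ~120 alternatives in the real filter */

Since conditions are evaluated left to right with short-circuiting, the expensive regex is simply skipped on very large edits.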

Which brings me to my first suggestion. The condition limit should be replaced with a runtime limit. This would more directly target performance as it affects real users. (Of course, runtime statistics have their own issues, such as varying by server hardware or load, but that still seems more relevant than the artificial condition approach.)

Secondly, EFMs need tools to identify performance-impacting filters. Prior to 2016 there were some limited tools, focused on counting the average and maximum conditions triggered by each filter. These were disabled because the tools themselves became a performance burden as the number of AF rules grew large. It's fine to worry about the performance of the diagnostic tools, but giving EFMs no tools for detecting performance burdens is also likely to allow performance problems to persist. I would suggest logging an average runtime per filter (perhaps using sampling), as well as logging any exceptionally long runtimes (or the max runtime during some period of time), to catch filters that behave badly only in specific cases (such as huge edits).

Third, right now all filters must be run on all edits. This is inefficient, even if the first condition checked quickly shows the filter doesn't apply. As a result of this, there is strong resistance to using AF rules to target narrow issues, such as persistent abuse that only targets a single page. There are cases where AF rules could be effective against vandals with a narrow fetish, or against narrow controversial issues (e.g. vandalism of images used in the Muhammed article), but where there is a reluctance to use filters that are perceived to have a very narrow scope. Historically, because of worries about breaching the condition limit, filters that didn't produce a certain rate of hits would be disabled regardless of whether they were effective. In terms of counter-vandalism strategy, I think this is a mistake. Right now, persistent vandalism often results in pages with long-term protection. If a targeted edit filter could ameliorate the vandalism without the need for protection, then that should be preferable to protection. (Not always possible, of course, but it should be a tool in our toolkit.) To make this possible we need ways to efficiently organize and deploy rules that are narrow in their application (e.g. affecting only one or a small number of pages), without creating a significant burden against unrelated edits.
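
To make that concrete, a narrowly scoped rule could lead with a cheap page check so it costs essentially nothing on unrelated edits (the page name and pattern here are just placeholders):

    page_title == "Muhammad" &                                    /* evaluated first; false for nearly every edit */
    !("autoconfirmed" in user_groups) &
    ("[[File:" in removed_lines | "[[Image:" in removed_lines)    /* e.g. someone stripping out images */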

Fourth, as a purely internal issue, I think it is weird and inefficient that the AF parser is required to tokenize and parse every rule during every execution. It would make much more sense to me to store AF rules in the database using some pre-tokenized form so that they merely have to be evaluated and not also interpreted during each execution.

Dragons flight (talk) 12:45, 25 March 2017 (UTC)

Thank you for joining this conversation, Dragons. We have an investigation ticket — T161059 — to understand the current performance impact of the 1,000-condition limit in AbuseFilter. There will be two goals for our work on 'performance': 1) ensure that we don't lose good edits due to AF checks, and 2) allow EFMs to write and manage all desired filters without having to hassle with condition inventory management. We're fortunate to have a few ideas for how to solve this, and I'm hoping the investigation ticket will give us good news! — Trevor Bolliger, WMF Product Manager 🗨 19:15, 28 March 2017 (UTC)


Functionality

The following is a list of suggestions for improving AF functionality; a rough sketch of how a couple of them might be used follows the list.

  • Additional variables:
    • Age since page creation / total number of edits, or some other way to tell if a page is newish or oldish
    • Number of recent edits / time since previous edits, or some other way of detecting floods in the AF rules (separate from the throttle mechanism)
    • Indicate edits performed via revert, undo, etc.
    • new_categories / old_categories
    • new_media / old_media
  • As outcome:
    • Require editor to complete a captcha before saving edit.
    • Possibly, apply temporary semiprotection to the targeted page.
  • Workflow:
    • Allow one rule to trigger / call other rules, i.e. to standardize common elements and remove redundancies.

More ambitiously, one could consider overhauling the UI, which provides no search functions and only rudimentary discussion functions.

Also, right now the only way to present a customized explanation of the edit filter problem is with the "warn" function. After issuing a warning, the editor always has the option to continue anyway. If the filter is set to "warn" and "disallow", the disallow only stops them after they confirm that they want to continue. This is a weird user experience. Probably not a huge problem, since most people who experience this issue are vandals, but we really shouldn't be presenting a confirm option immediately before disallowing the action.

Dragons flight (talk) 13:57, 25 March 2017 (UTC)

Oh, these are all great suggestions. I'll work them into the main article. If you think of any more, please feel free to directly edit the content page.
We've considered updating the UI so it is easier for new communities to start using AbuseFilter. The ability to search is something we hadn't thought of yet, and would certainly be useful for EFMs! — Trevor Bolliger, WMF Product Manager 🗨 19:15, 28 March 2017 (UTC)

Comments from BethNaught

These are all good suggestions so far. I'd like to add a couple of my own (sorry if they've already been covered and I've missed them):

  • It should be possible to set a filter to "throttle" but still log instances where the filter matched but no actions were triggered. So, for example, if a user makes three matching edits, the first two are logged but not blocked, while the third is both logged and blocked. In my own experience using filters, this would reduce the need for duplication. If there is an action which in an isolated case may be suspicious or innocent, but when done many times is definitely suspicious, this change would facilitate both monitoring and prevention.
  • The filter should be able to choose an outcome based on conditions. For example,

if ( abusive_behaviour() == true ) {
    if ( "confirmed" in user_groups ) { action: throttle, disallow }
    else { action: disallow }
}

This would help, say, if a sockpuppet can be positively identified by certain actions when using IP addresses, but only weakly when using accounts. You could consolidate separate filters for the two cases.

Thanks, BethNaught (talk) 07:47, 13 May 2017 (UTC)

Beth, thank you for your suggestion. I appreciate the clarity and example you provided. I think throttle functionality would be a natural fit for the AbuseFilter (the 'warn' is a form of throttling, in a way) and could potentially be an important tool in curbing harassment. I've added it to Community_health_initiative/AbuseFilter#Problems_to_solve. Feel free to edit the page if I've miscaptured something. — Trevor Bolliger, WMF Product Manager 🗨 21:30, 15 May 2017 (UTC)

Suggestions from MER-C

Excellent suggestions so far. I'll add a few more:

  • Please add the functions specified here: phab:T147765. I've already had a spammer (unintentionally?) exploit this loophole and can easily imagine some LTA harasser doing the same.
  • There are some filters that do not have to be run immediately before saving a page (e.g. filter set to log only). These filters should be run after the edit to free up more CPU cycles for filters that need to run as the edit is made. Furthermore, as they are deferred, these filters should be allowed to consume more resources.
  • Allow checkusers to set a filter that examines all edits from an IP address range, whether logged in or not. We have LTAs on ranges that are too large or busy to block, and the ability to (say) lower the account creation throttle per IP to 2 per day for select ranges would be useful.
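
For the third point, roughly something like this (both the variable and the function are invented, purely to illustrate; access to the raw IP would need to be restricted, e.g. to checkusers):

    action == "createaccount" &
    ip_in_range( user_ip, "192.0.2.0/24" )        /* hypothetical variable and function */

with the filter's throttle action then set to allow, say, 2 account creations per day from that range.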

Hope these are helpful. MER-C (talk) 04:36, 22 June 2017 (UTC)

@MER-C: These are helpful, thank you for sharing! Your second bullet is currently my favorite tactic for freeing up more performance, but I don't want to jump to that solution without proper investigation from our developers.
As for your third bullet — to make sure I understand this correctly, it sounds like you are suggesting a new condition for filters to only run if a user is within a specified IP range — correct? (We'd have to solve for privacy, and your suggestion of restricting it to checkusers is likely the best tactic.) Do you think the ranges would need to differ per filter? Or could there be a shared list of "suspicious IP ranges"? e.g. Filter 1 checks Range A, Filter 2 checks Range B — or — Filters 1 and 2 check the list of suspicious IP ranges (which contains A and B). — Trevor Bolliger, WMF Product Manager 🗨 17:14, 22 June 2017 (UTC)
Return to "Community health initiative/AbuseFilter" page.