Community health initiative/Policy and enforcement research

This page is currently a draft. More information pertaining to this may be available on the talk page.

Several teams at the Wikimedia Foundation have been looking at what makes a Wikimedia community healthy, and which factors work against a healthy contributing environment. As part of the “policy and enforcement growth” component of the Community Health initiative, some analysis of the under-researched area of dispute resolution and policy enforcement is badly needed before improvements can be discussed. The first stage in this effort is both a quantitative and a qualitative look at the English Wikipedia's Administrators' Noticeboard/Incidents (AN/I): how it is used, what its strengths are, and where its weaknesses lie. The survey and data analysis detailed below are designed to give a better understanding of how this specific process works: where it performs well, and where it could be improved.

Why AN/I?

The Administrators' Noticeboard/Incidents page is one of the longest-running and most-frequented forums for user disputes and problems on the Wikimedia projects. It has hundreds of archives and tens of thousands of threads, making it a rich source of data on how we manage user problems on the projects.

Why look at English Wikipedia only?

The Community Health initiative will be working with members of our many projects, not just those on English Wikipedia. The AN/I research is a good starting point, though, due to the age and high usage of the board. Many projects have not yet developed similar processes, and lessons learned from large projects can be very valuable to communities looking to expand and refine how they deal with problems.

What does the Foundation plan to do with this research?

This research will help inform the Wikimedia Foundation’s efforts to support community development, whether through software and technical improvements, or through discussions and proposals to improve processes and policies. The results of this research will be shared with contributors, and will help communities make data-driven decisions about how they develop their community norms and workflows.

Qualitative survey - draft questions

Below are some of the questions being considered for a survey to run in the last quarter of 2017. These questions are not the final draft; we would very much appreciate feedback from contributors who see important aspects that are not covered, or questions that are poorly framed or confusing. We will be looking for feedback throughout the early fall of 2017.

Use of ANI

AIM: We want to know how frequently people use ANI as a community-led process for reporting, reviewing and deliberating on incidents. We want to be able to differentiate between the experiences of different groups of users: reviewers, reporters, and people who do both.

How often have you reported incidents to ANI in the last 12 months? (never / once or twice / 3-10 times / about once a month / more often)
How often have you been a participant (i.e. not making a new report, being reported, or closing an existing report) in discussions on ANI in the last 12 months? (never / once or twice / 3-10 times / about once a month / more often)
How often have you been involved in an incident reported on ANI in the last 12 months? (never / once or twice / 3-10 times / about once a month / more often)
In the last 12 months, how often have you been admonished or sanctioned as a result of being involved in an incident that was reported to ANI? (never / once or twice / 3-10 times / about once a month / more often)
In the last 12 months, how often did you visit ANI to follow or read about reports on incidents you were not involved in? (never / once or twice / 3-10 times / about once a month / more often)
How often have you seen someone file a false or spurious AN/I report to antagonize, annoy, or harass someone? (never / once or twice / 3-10 times / about once a month / more often)

Satisfaction with decision making on ANI

AIM: We want to know whether people are satisfied with the discussion process on ANI, the outcomes, and the tools used to enforce outcomes.

Are you satisfied with the way reports are handled on ANI?
Are you satisfied with the number of people taking an active part in discussions on ANI?
Are you satisfied with the variety (e.g. admins, heavy users, newer users, members of different Wikiprojects) of people taking an active part in discussions on ANI overall?
Are you satisfied with the way non-admin users discuss on ANI?
Are you satisfied with the role administrators take on ANI?
Have you ever disagreed with an AN/I outcome, such as the wrong person being sanctioned?
Do you agree with the general process of how AN/I reports work? (e.g. posting to the board and opening things up for conversation)
Are you satisfied with the outcomes from ANI reports?
Do you feel the closing result to a report on ANI is too harsh? (never/rarely/sometimes/often/all the time)
Do you feel the closing result to a report on ANI is not harsh enough? (never/rarely/sometimes/often/all the time)
Do you feel that threads typically stay on ANI for the appropriate length of time? (never/rarely/sometimes/often/all the time)
How do you feel about the following tools and processes that are used in reported incidents?

When page protection is used, do you think it is used

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

When temporary blocks are used, do you think they are used

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

When indefinite user blocks are used, do you think they are used

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

When interaction bans are used, do you think they are used

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

When topic bans are used, do you think they are used

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

When warnings are issued, do you think they are issued

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

When closures with no action are used, do you think they are used

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

When problems get redirected to other noticeboards, do you think that happens

Generally appropriately
Generally inappropriately
Mixture of the above
Unsure

Experiences with ANI

AIM: We want to know whether people accept and trust ANI as a community-led process for reporting, reviewing and deliberating on incidents. We especially want to find out if there is a lack of trust in the process for any reason.

What does ANI do well? [open answer]
Which specific types of problems are dealt with well at ANI? [Scaled answer]
- Sock puppetry
- Personal attacks
- “Bot” or automated script problems
- Impersonation accounts
- Long-term user disputes
- Short-term user disputes
- Topic-related problems
What types of problems does AN/I not work well for?
Do you feel confident taking part in most discussions on reported incidents?
How do you feel about other participants in ANI discussions?
- sufficiently skilled and experienced
- somewhat skilled and experienced
- not skilled and experienced enough
Have you avoided reporting one or more incidents to ANI in the last 12 months, because you did not think it would be handled appropriately there?
- If so, how often?
Why did you think those incidents would not be handled appropriately?
Have you avoided reporting an incident or taking part in a discussion on ANI in the last 12 months because you were afraid of retribution of any kind?
- If so, how often?
Would you recommend making a report to ANI to another editor who is involved in a dispute?

Long answer question

AIM: We want to find possibilities for improvement.

If you could change one thing about ANI, what would it be?

Quantitative analysis - draft methodology

A second, complementary approach is to look at the extensive archives of AN/I to gain a data-based picture of how it is being used. This quantitative approach is challenging, as the noticeboard is an open, minimally-structured wiki page. This makes automated data collection difficult, but we are examining several approaches that will help in that area.

Questions to answer

Here are some of the questions we hope to answer with this work:

What is the relative frequency of reports?
Who is participating?

In particular - reported editors, reporting editors, admins, uninvolved editors, involved users

How are cases closed/resolved?
How is this work spread among the admin group? Are a small number of admins closing most cases, or is it more varied?
How many cases were archived but not formally closed?
What is the breakdown of reasons for filing AN/I?
What is the average length of threads, and how does that relate to closure types?
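
As an illustration of the question about how closing work is spread among admins, the following minimal Python sketch computes the share of closures made by the most active closers. It is not the project's method: the function name and the sample list of closer names are invented for illustration, and real input would have to come from parsed archive data, which this sketch does not attempt.

```python
from collections import Counter


def top_closer_share(closers, top_n):
    """Return the fraction of closures made by the top_n most active closers."""
    if not closers:
        return 0.0
    counts = Counter(closers)
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / len(closers)


# Hypothetical data: one entry per closed thread, naming who closed it.
sample_closers = [
    "AdminA", "AdminA", "AdminA",
    "AdminB", "AdminB",
    "AdminC", "AdminD", "AdminE",
]

# Share of all closures made by the two most active closers.
print(top_closer_share(sample_closers, top_n=2))
```

A value close to 1 for a small top_n would suggest that a few admins close most cases; a lower value would suggest the work is more evenly spread.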

Techniques

The Anti-Harassment Tools team has already carried out a “trial run” of this project. We are currently looking at the results of this trial, and at what we can learn from the difficulties encountered, to help inform the larger project. Some types of data are easy to collect, such as how long threads stay open and their word count. Others, such as the types or categories of reports and how they were actioned, are more challenging. We are looking both at better automated approaches and at ways to review the archives manually and efficiently. Some techniques we are using to glean information from AN/I threads include:

Was a policy linked in the thread?
Were certain keywords used in the heading? In the initial report? In the following discussion?
Of reports that are formally closed, what keywords or policy links are used in the closing statement/template?
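
The checks listed above, together with the easier measures of word count and time open, could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual tooling: it assumes threads begin at level-2 wikitext headings, that policy links look like [[WP:...]] or [[Wikipedia:...]], and that signatures carry timestamps in the form "14:05, 3 September 2017 (UTC)". The keyword list, function names, and sample page text are all hypothetical.

```python
import re
from datetime import datetime

# Assumption: each thread starts at a level-2 heading such as "== Title ==".
HEADING_RE = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)
# Assumption: policy links look like [[WP:NPA]] or [[Wikipedia:No personal attacks]].
POLICY_LINK_RE = re.compile(r"\[\[\s*(?:WP|Wikipedia):([^\]|#]+)", re.IGNORECASE)
# Assumption: signature timestamps look like "14:05, 3 September 2017 (UTC)".
TIMESTAMP_RE = re.compile(r"(\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4}) \(UTC\)")
# Hypothetical keyword list, chosen only for this example.
KEYWORDS = ("harassment", "edit war", "personal attack", "sock")


def split_threads(page_text):
    """Split a page into (heading, body) pairs at level-2 headings."""
    parts = HEADING_RE.split(page_text)
    # parts[0] is any text before the first heading; the rest alternates
    # heading, body, heading, body, ...
    return list(zip(parts[1::2], parts[2::2]))


def describe_thread(heading, body):
    """Collect simple measures for one thread."""
    stamps = [
        datetime.strptime(raw, "%H:%M, %d %B %Y")
        for raw in TIMESTAMP_RE.findall(body)
    ]
    # Whole days between the earliest and latest timestamp in the thread.
    open_days = (max(stamps) - min(stamps)).days if stamps else None
    policies = sorted({p.strip().upper() for p in POLICY_LINK_RE.findall(body)})
    return {
        "heading": heading,
        "word_count": len(body.split()),
        "open_days": open_days,
        "policy_links": policies,
        "heading_keywords": [k for k in KEYWORDS if k in heading.lower()],
    }


# Invented sample text standing in for an archive page.
SAMPLE_PAGE = """\
== Personal attack by ExampleUser ==
ExampleUser insulted me, see [[WP:NPA]]. ReporterOne 14:05, 3 September 2017 (UTC)
:Warned the user. AdminOne 09:30, 5 September 2017 (UTC)

== Page move question ==
Where should this go? ReporterTwo 10:00, 6 September 2017 (UTC)
"""

for thread_heading, thread_body in split_threads(SAMPLE_PAGE):
    print(describe_thread(thread_heading, thread_body))
```

Real archive pages are far less regular than this sample (nested subsections, collapsed discussions, custom signatures, closing templates), so pattern matching of this kind would likely need manual spot-checking alongside it.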

We welcome any ideas on how to dive deeper into the data, and any suggestions on questions beyond those listed above. This work is scheduled for the last months of 2017, with analysis being presented in early 2018. More information can be found in the Phabricator task for this project.