Talk:Universal Code of Conduct/2021 consultations/Research

Talk to us about this research!

To better understand the perspectives of individuals in the Wikimedia community who have experienced harassment, Wikimedia Foundation researched our community members’ knowledge of, and comfort with, existing enforcement and reporting processes.

Currently, an Executive Summary is available in English, Spanish, and German. Please share the results of the research with your community by posting a link to the report, or translating the report into your language.

You can discuss the results or raise questions on this talkpage or by contacting communityhealthwg@wikimedia.org. We will be collecting questions for review by staff and will have answers available starting June 28, 2021.

SPoore (WMF) Senior Strategist, Trust & Safety (talk) 13:58, 23 June 2021 (UTC)

I think an important step toward understanding would be a suitable language. So far, the page overleaf is available in two languages. If that seems sufficient to you: rethink your understanding of "universal" ... Sicherlich ^Post 11:04, 24 June 2021 (UTC)

Various thoughts: process, recommendations, further research, and what seems to be getting very short mention

Various thoughts on process and recommendations:

Research methods: given it being surfaced to all ArbComs (a positive), is there a reason for one notable difference in the languages of the noticeboards "Notices for this survey were also posted on the village pumps (or equivalent) on Italian, Spanish, French, German, Polish and Arabic Wikipedias." - why was en:WP:VPWMF not utilised?

The study correctly points out it was only a pilot study, but that very nature makes me dispute how reasonable it is to be making recommendations with applicability to the entire meta-project. The current timing would indicate that UCOC recommendations are going to be drafted before a full-scale version of this report (something I would read with interest) can be created. That is, to me, very poor form, especially as it's not merely a risk of small sample size, but a confirmed non-representative form - both of these should be resolved.

Do we have a full list of survey and interview questions (both base and follow-up, for the latter)?

In terms of the distinctly draft recommendations, I have mixed views and would ask if issues and potential issues corresponding to each will also be part of any fuller survey?

Overwhelmingly, participants report that our existing enforcement systems are overly complicated and difficult to understand. - this certainly seems fair. One thing that I'd love to see for en-wiki, for example (given we have at least 7 conduct boards, plus private arbcom email) is something like a "conduct wizard" that will give simple questions and feed the person to the right place, and assist in dropping in evidence and so on.

Clarify and streamline the reporting process, for both reporters and administrators. - sort of akin to the above. Combining conduct boards does come with its own sets of issues, including a clogging effect, or complexity of having different rulesets applying to different issues on the same page (an example being discretionary sanctions on en:WP:AE vs en:WP:ANI or simple en:WP:AIV cases vs complex en:WP:ANI aspects).

Make it easier to surface incidents of harassment to administrators - this is certainly true, but there are legitimate disagreements on its ultimate reach. We already get a large number of content disputes incorrectly brought to a conduct board, and were you, say, to put a "report this" button next to every edit, this rate would skyrocket.

Provide more flexible and varied outcomes for reporting - en:WP:MEDCAB hasn't existed for a long time, but something akin would be interesting. I'm not sure how readily a conduct mediation method may scale, but would be interesting to look into. I'm somewhat concerned by "This could include allowing reporters to have input on the outcome of reports," - the current phrasing isn't especially clear as to what exactly may be meant by it. I would have thought most communities were already at least fairly open to reducing the sanction that most thought suitable if all/the sole victim themselves felt that was warranted, but the default assumption by the nature of the reporter reporting something is they view it severely unless otherwise stated, so giving them remit to further escalate seems...odd.

Make the reporting process transparent and not just visible. - this is also the logical location to talk about several other aspects. One is that the first finding notes "Some of them used a specific jargon term, “boomerang”". The finding in general seems to aggregate all consequences to the reporter as boomeranging. Except, that's not appropriate. A check back through the last 7 or so ANI pages found that a significant fraction, between 15% and 25%, had at least significant consideration of a boomerang sanction - that is, a sanction placed because the conduct of the reporter was (or was also) at fault. That filters out cases where just the reported individual requested it, but the variability on "significant" leads to a margin. Of those that progressed to actual boomerang blocks, the large majority to me were not controversial. Many times disputes have both parties at fault, and to use the author's phrasing, a "moral imperative" necessitates evenhandedness - prohibiting formal boomerang action would be a major unfairness and promote significant first-mover advantage.

Distinct from that is reprisal action, whether that be (further) harassment [a word that traditionally does not define well] from the reported, hounding from new unhappy editors, or similar. It could be considered "unwarranted boomeranging" as a far clearer phrase than the current merging.

Make the reporting process transparent and not just visible. - while some degree of making these more searchable would certainly be a positive, especially cross-projects, I am confused. The first two lines seem to contradict each other - the first says it's not transparent, but the second then does state they "can not only see [complexity and past evidence]" - is there a negative missing? While jargon has the negative of making a conduct process harder for newcomers, if you endeavoured to remove it, that would have a countervailing negative effect on those who account for the large majority of editor activity on these boards (primarily admins etc). It's not jargon used for the sake of jargon.

Provide better guidelines or specific training for administrators to resolve disagreement while avoiding escalation into full-blown harassment. - I mean, I don't think anyone disagrees that content disputes can flare into conduct disputes, though there aren't any numbers attached to this (a recurring issue, so it makes it hard to identify rate issues, which means ultimately any changes would also have to rely on anecdotal indications of whether they were helping). Admins specifically don't have special authority in content disagreements - nor should they. Encouraging dispute resolution methods to be further rolled out, and with better awareness of them, are of course aspects that, again, don't seem likely to attract disagreement, except that their viability is often contingent on the number of experienced editors willing to support them.

Do private reporting systems impact rates of reporting? Does this privacy impact rates of enforcement? - onto the "further research" areas. I find it of immense concern that any consideration of the negatives that may come to reported parties from an anonymous or highly private system on request (for non-private evidence) is not even in the bolded header. Any study should specifically seek out those who have been accused of misconduct, whether through purely public onwiki routes, private from community (but shared with accused) evidence routes such as ARBCOMs handling offwiki evidence, or T&S purely private cases. For a full picture, this would likely need to try emailing individuals who have been blocked as well - not great, but it can't claim to be full field research without it, I suspect.

"A second key question is to figure out who this current state of visibility best serves. As this Targets of Harassment project indicates, would-be reporters are badly served by this publicly visible system. Therefore, we ought to figure out who benefits, if anyone, from this system, and how they make use of this publicly visible information" is not under a header that logically applies, and should be a dedicated field. A few potential areas come to mind:

The accused - it's not clear what form of private reporting the author would prefer, but conducting a defence necessitates knowing the evidence, and conducting an assessment of evidence must know that all of the evidence is there. Context, and parties' perceptions, plus offwiki evidence that the reported party may not know to provide, are all factors that should be included. That excludes the significant usecase of unblocks - an individual blocked through this route, how are they supposed to request an appeal when reviewing admins always want a clear demonstration the user understands the issues and how they will avoid them in the future?
A second facet of this is who makes the decisions. On en-wiki, any non-clear-cut conduct case will have numerous pairs of eyes on it, with multiple participants - and usually large numbers for edge cases. Even private ARBCOM cases should have between 12-15 pairs of eyes viewing them. A private setup could never guarantee this level of viewpoints for every significant conduct complaint raised. These individuals, often disagreeing, disagree not just on whether a reported party is at fault, but (most commonly) as to scale of that and what appropriate actions should be taken. Shrinking the number of individuals assessing would inherently increase issues raised by viewpoint perspectives of single editors.
The full community - cases being public helps users be aware of "where the axe will fall" - what does a Community consider bad, tolerable, and good, and if bad, how bad. This information doesn't cope well with being summarised, as applying it to instances seen elsewhere becomes nonviable.
The watchers - that is, anyone judging whether a user makes good judgments, or, critically, going back and reviewing past judgements. Admin candidates get past actions reviewed, as do arbitrator candidates. Admins who routinely have poor judgements are called up for it, both formally and informally, as do non-admins on conduct boards (which is a huge number).
The reporters - if cases are held privately, it also reduces the information available to potential reporters, both knowing what is good, what is bad, and what is likely. Nosebagbear (talk) 23:13, 23 June 2021 (UTC)

Hello @Nosebagbear:, thank you for your close review of this project. I’m posting to let you know that I’m working on a response for the points that you bring up, but in the interests of thoroughness it will take me a while to finish it - I appreciate your patience. —User:CLo (WMF) (talk) 17:17, 29 June 2021 (UTC)

Thanks for the points you’ve raised. I’ve taken a long time to reply because I wanted to honestly assess my ability to take action on them. As you know, as a researcher I am in a position where I make recommendations, but the ultimate decision of whether to take them up (and if so, to what extent) rests with far more people than just myself. While I am personally and professionally interested in this subject, there are a lot of priorities to balance, so I like to think of projects like these (as well as future ones) as individual contributions to a body of accumulated knowledge. This is all to say - yes, I have read your comments multiple times over, and I think you make some valid points (and I’m glad to see we do agree on some of them). —User:CLo (WMF) (talk) 23:37, 1 July 2021 (UTC)

Hi Clo - I do look forward to seeing your full reply to each point, with pure regard just to your specific reply here:

" I am in a position where I make recommendations, but the ultimate decision of whether to take them up (and if so, to what extent) rests with far more people than just myself" - this is true. However, critically true is that you, presumably, feel that your recommendations will be given some weight - otherwise why else make them. As such, as with any of us who make recommendations, we assume a responsibility to have done all possible work to ensure that these recommendations are accurate and have a strong basis. While any advice always remains just that, that does not eliminate, indeed, does not reduce the responsibility on the advice-giver to be able to prove they "did their homework" and didn't give it too early.

In terms of "individual contributions to a body of accumulated knowledge", that would be fine if you'd picked a small area and covered it in depth, but here you picked a big area and covered it in only very preliminary detail. At that point, you have some research that would be excellent for indicating which would be the logical next steps of research, but not for actually giving recommendations on that basis when you know it's not complete - and that not all (indeed, likely not many) of the absent areas will be gathered in suitable depth prior to the UCOC drafting committee creating any drafts.

It continues to escape me why a skilled researcher like yourself felt confident giving such bold recommendations, as well as asserting moral obligations, when there was so much yet to be considered - even in the fields you looked at. Nosebagbear (talk) 23:45, 1 July 2021 (UTC)

General response to themes raised in feedback

Hello everyone,

I want to thank the people who have commented on this talk page as well as other places for their points raised regarding this research. I wanted to address a few common themes that I see recurring in many of these replies in a single place rather than fragmenting them across different replies and platforms, so here goes.

Firstly, when we were planning this pilot research project, we made a conscious decision to prioritize the viewpoint of people making reports. This is not because we don’t want to hear from functionaries - far from it. It is mostly a reflection of the fact that most of the feedback that we have received so far has taken place on Meta-Wiki or on-wiki consultation pages, and this means that we tend to hear from long-time users that also have a deep interest and expertise in our current reporting and enforcement systems, who are often current or former functionaries themselves. On top of that, prior research that I have conducted on this topic has looked very specifically at challenges facing functionaries and other governance work (e.g. patrollers).

I wanted to purposefully plan research questions for this particular project to look at the topic of enforcement and reporting from a non-functionary-centric point of view. I believe it is valuable to take a step back and try to re-evaluate our existing processes from the point of view of someone who doesn’t understand them, who nevertheless has to use them for lack of alternatives. This absolutely does not preclude future research from focusing on challenges specific to functionaries, especially given what we have now heard about persistent capacity issues, the difficulties of working with existing tools, and lack of support for dealing specifically with complex harassment cases, to name just a few obstacles that this round of research has identified.

To elaborate on how this research will be used to guide any UCoC decisions or even future product decisions, this is not and has never been intended to be the only research guiding any of those decisions. I want to reiterate that the primary value of this kind of research is to start asking questions from different points of view than what we’ve already heard, to highlight future avenues for further study, and to continue collecting information on the topic to add to our body of knowledge. This study is very much a continuation of prior work on the topic as mentioned in the further links at the end of the main article, and is not a ‘stopping point’ for this kind of work. For myself I am keen to do more research on the subject, hopefully covering a wide range of guiding questions.

I am grateful for the kind offers from participants in this round of research and others on this consultation page, and schedule permitting, I personally would love to have those conversations.

There are also general concerns over how these recommendations might increase the burden on administrators who are already under-capacity and struggling with existing workloads. Our goal is not to heap more work onto administrators while ignoring these very valid concerns. My hope is that doing the work to make some important systematic changes will lead to an overall improvement in the working environment for administrators. If we can figure out why some tasks (particularly those relating to dispute resolution) are disproportionately technically complex, or are procedurally complex (especially when that complexity is lopsided), that would shed some light on how to improve these systems for everyone who has to interact with them.

All this is to say that the feedback we have received in the past, the discussions we’ve both hosted and observed, and this particular round of research all point to widespread dissatisfaction with the status quo of enforcement. I understand the worry that accepting these recommendations will mean creating even more work for already over-burdened administrators. However, I hope that this does not mean that we reject attempts to improve structural or underlying issues that could ultimately lead to a better working environment for everyone involved in enforcement today.

With regard to translation, since the survey was offered to participants in Spanish we prioritized translating the Executive Summary into Spanish and notifying the Spanish language communities of the report. Since the report is now posted on Meta and marked for translation, we invite volunteers to translate the report and share it with your community. Foundation staff have already translated it into German. On a procedural note, this was my first time using extensive translation markup in Meta-Wiki. I personally hope to improve my familiarity with it, both to ease future volunteer translations and so that we can more easily upload professionally-translated content in a way that fits in with how translations are usually handled here.

I hope this response goes some way toward answering some of the common concerns raised so far in this discussion. Thank you all again for your time and effort giving detailed feedback on this project. —User:CLo (WMF) (talk) 20:33, 29 June 2021 (UTC)

Hi @CLo (WMF): - I don't know whether your reply to mine indicates you're creating a distinct set of replies to my commentary - if so, then these should be taken as specific points raised as general concerns rather than just specifically how they (don't) relate to the aspects I raise.

As it is, this doesn't really answer any of my concerns. I didn't say it wasn't a good group to start with, I questioned that you should publish any formal recommendations to the UCOC drafting committee at this stage. I'd also note that "functionaries" is a tiny group: Stewards, oversighters, checkusers, bureaucrats and, functionally, arbitrators. The vast majority of the Community lies between your two groups - the vast majority of all conduct issues the Communities face are not handled by functionaries. The large majority don't have their outcome decided even by admins.

You state you don't want to heap more work onto admins, but you don't suggest how that might be possible. I'm also interested how you handle the distinct issue of heaping more power onto admins - we selected our admins to execute Community decisions, not replace them, and all the areas where that has changed are extremely well defined and extremely fraught. en:WP:Discretionary sanctions, for example, have the most rules, case law, and Community unhappiness of any area - and your proposals would go well beyond them.

You share an interest in further research, but I would note that you have two branches - further questions and "asking the same questions but of an actual representative (not just reporters or functionaries) cohort". That will require reaching out to a range of local communities at their village pump equivalents, not just meta. Nosebagbear (talk) 20:55, 29 June 2021 (UTC)

Hi @CLo (WMF): - I was a bit disappointed not to receive either a response to my specific reply here, or to the more specific points I noted in June, of which around half are, to some degree, covered in your general response at the start of July, and half are not. Given that there is UCOC discussion with some reference to this research, the need for these aspects to be resolved, or, if not resolved, noted down in the direct research (rather than just by an editor), has now become critical, and this needs to be done before half of the UCOC consultation time has elapsed. Nosebagbear (talk) 22:59, 13 September 2021 (UTC)
