Research:Opportunities for Supporting Community-Scale Communication
This page documents a planned research project.
Information may be incomplete and change before the project starts.
We envision a socio-technical framework for supporting the communication process involved in community-scale collaborations. To develop this framework, we propose a mixed-methods exploration of the communication challenges surrounding collaboration on Wikipedia and of potential technological solutions that can help collaborators address these challenges.
Timeline
In the first year of the project we will focus on understanding the opportunities for automated communication support. We will start with an analysis of existing communication data from community-wide deliberation spaces—such as those concerning Articles for Deletion (Mayfield and Black 2019)—to identify common communication challenges. Building on these observational insights, we will design structured interviews with active Wikipedia editors, aimed at understanding what they perceive to be the most challenging aspects of communication, how they currently approach these challenges, and what they see as opportunities for automated communication support.
Having identified key opportunities together with community members, in the second year of the project we will engage in participatory design of prototype communication support tools. We will iterate on the design of these prototypes together with community members and rely on user studies to examine how the assistance these tools provide might fit into editors' workflows and the extent to which they are effective in improving the collaboration process.
Methods
Data analysis. For the data analysis we will focus on the large-scale deliberation spaces included in WikiConv (Hua et al. 2018), spanning five languages (English, German, Greek, Russian, Mandarin). We will use the same methodology we used to develop this dataset to update it with recent conversations, focusing on those involving community-wide discussions and deliberations. We will apply existing methods for analyzing conversational dynamics that we developed in previous work (such as politeness and coordination) and introduce new tools specifically geared towards capturing collaboration issues (hesitance, groupthink, deliberation deadlock).
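To illustrate the kind of per-conversation signals such an analysis produces, the sketch below computes simple conversational-dynamics features (turn counts, participation balance, and a naive marker-based hesitance score) over a toy discussion thread. The hedging-marker list and the feature definitions are illustrative placeholders, not the actual models used in our prior work.

```python
from collections import Counter

# Toy discussion thread: (speaker, text) pairs standing in for one
# WikiConv conversation. Real analyses would run over full corpora.
THREAD = [
    ("A", "I propose we merge the two articles."),
    ("B", "Maybe, but I'm not sure that is the right call."),
    ("C", "Perhaps we could wait for more sources?"),
    ("A", "I think the sources already support a merge."),
    ("B", "Possibly. I guess I could go along with it."),
]

# Illustrative hedging markers, used here as a crude hesitance proxy.
HEDGES = {"maybe", "perhaps", "possibly", "guess", "not sure"}

def thread_stats(thread):
    """Compute simple conversational-dynamics features for one thread."""
    turns = Counter(speaker for speaker, _ in thread)
    n = len(thread)
    # Participation balance: 1.0 means perfectly even turn-taking.
    balance = min(turns.values()) / max(turns.values())
    # Fraction of turns containing at least one hedging marker.
    hedged = sum(any(h in text.lower() for h in HEDGES) for _, text in thread)
    return {
        "num_turns": n,
        "num_speakers": len(turns),
        "balance": balance,
        "hesitance": hedged / n,
    }

stats = thread_stats(THREAD)
```

In practice, features like these would be computed at corpus scale (e.g., via ConvoKit transformers) rather than with hand-written marker lists.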
Interviews and surveys. Similar to our prior work exploring algorithmic support for proactive moderation (Schluger et al. 2022), we will use structured interviews with active Wikipedia editors. Initial interview questions will be developed around three main themes:
1) Types of communication breakdowns and challenges;
2) Strategies currently used by editors to overcome these challenges;
3) Attitudes toward automation and tool support.
Prototypes. The exact design of the prototype communication support tools will depend on the specific support opportunities revealed by our data analysis and interviews. We will rely on our expertise in co-designing user-facing communication support tools together with community stakeholders (Schluger et al. 2022; Chang et al. 2022).
Policy, Ethics and Human Subjects Research
Community impact plan. We will engage with the community transparently from the outset, prioritizing reciprocity and mutual benefit. This includes early outreach on relevant Wikipedia forums, sharing preliminary findings, and involving editors in shaping research questions. We will adhere to established community norms, obtain feedback through iterative engagement, and ensure that any tools or findings are communicated clearly and accessibly. We will open-source the code for the prototypes and engage the community in their development.
We will share our evaluation of the potential long-term impacts of the communication support tools with the community. The decision of whether to broadly deploy communication support tools that result from this work will be left to the community.
Human Subjects Research. We will seek Cornell IRB approval for the parts of the study involving human subjects.
Data Management Plan. All data collected during the study will be stored securely and confidentially on Cornell servers. We will not collect any personally identifying information, with the sole exception of participants who voluntarily choose to disclose their email address (for participants who prefer not to disclose this information, we will fully support the Wikipedia email system or User talk page posts as methods of communication).
Open-weight AI models. Throughout the project we will use open-weight AI models (apart from using closed models for comparisons) in order to facilitate future adoption in the Wikimedia ecosystem.
Results
Evaluation. The evaluation of this project will occur in two main phases: the diagnostic phase, focused on validating our understanding of communication challenges, and the prototype phase, focused on assessing the usefulness, usability, and impact of the proposed communication support tools.
Evaluation during the diagnostic phase ensures that the insights we generate are accurate, meaningful to community members, and useful for informing tool design. We will cross-reference findings from discussion data analysis with insights from structured interviews to verify consistency. We will monitor for thematic saturation in our structured interviews with Wikipedia editors—that is, the point at which additional interviews stop yielding substantially new insights. To ensure that our interpretations reflect the lived experiences of editors, we will present preliminary themes and taxonomies of communication challenges back to community members.
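Thematic saturation can be monitored with a simple stopping rule. The sketch below flags saturation once a fixed number of consecutive interviews introduce no codes that were not already seen; the window size and the example codes are hypothetical placeholders.

```python
def saturation_point(interview_codes, window=2):
    """Return the 1-based index of the interview at which saturation is
    reached, i.e., `window` consecutive interviews added no new codes.
    Returns None if saturation is never reached."""
    seen = set()
    no_new_streak = 0
    for i, codes in enumerate(interview_codes, start=1):
        new = set(codes) - seen
        seen |= set(codes)
        no_new_streak = 0 if new else no_new_streak + 1
        if no_new_streak >= window:
            return i
    return None

# Hypothetical codes extracted from four successive interviews.
interviews = [
    {"notification overload", "tone misread"},
    {"tone misread", "deadlock"},
    {"deadlock"},
    {"tone misread"},
]
```

With the example data above, interviews 3 and 4 yield no new codes, so the rule fires at interview 4.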
In the prototype phase we will test the usability of the support tools using think-aloud protocols and task-based evaluations with active Wikipedia editors, assessing how intuitive and non-disruptive the tools are within real workflows. We will conduct field deployments of early prototypes (e.g., as browser extensions or third-party tools) and study how they integrate into day-to-day editing practices. We will hold iterative design and review sessions with Wikipedia editors and moderators to gather structured feedback, prioritize design revisions, and co-refine the tools. After tool usage, we will survey and interview participants to understand perceived changes in communication clarity, efficiency, inclusiveness, and civility. To understand the broader potential for adoption and long-term impact of the communication support tools, we will conduct quantitative analysis of discussion thread dynamics (e.g., thread length, conflict markers, resolution indicators) before and after tool use. We will also solicit feedback from community leaders and Wikimedia affiliates to assess the relevance and sustainability of the interventions.
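The before/after comparison of thread dynamics can be sketched as follows; the metric definitions and the conflict-marker list are hypothetical stand-ins for the trained classifiers a real analysis would use.

```python
from statistics import mean

# Hypothetical surface markers of conflict; a real analysis would use
# trained classifiers rather than keyword matching.
CONFLICT_MARKERS = {"wrong", "nonsense", "ridiculous"}

def thread_metrics(thread):
    """Per-thread metrics: length and fraction of turns with a conflict marker."""
    conflict = sum(
        any(m in text.lower() for m in CONFLICT_MARKERS) for text in thread
    )
    return {"length": len(thread), "conflict_rate": conflict / len(thread)}

def compare(before_threads, after_threads):
    """Average each metric over threads observed before vs. after tool use."""
    def avg(threads):
        ms = [thread_metrics(t) for t in threads]
        return {k: mean(m[k] for m in ms) for k in ms[0]}
    return {"before": avg(before_threads), "after": avg(after_threads)}

# Toy data: two threads from before deployment, two from after.
before = [
    ["This is wrong.", "No, you are wrong.", "This is nonsense."],
    ["I disagree.", "That seems ridiculous to me."],
]
after = [
    ["I see your point.", "Let us find a compromise."],
    ["Could you clarify?", "Sure, here is a source.", "Thanks, that helps."],
]
report = compare(before, after)
```

A deployed analysis would additionally control for confounds (topic, participants, time of year) rather than comparing raw averages.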
Results. Results will be added here as the project progresses.
Resources
ConvoKit: An open-source toolkit for analyzing conversational data.
References
Chang, Jonathan P., Charlotte Schluger, and Cristian Danescu-Niculescu-Mizil. “Thread With Caution: Proactively Helping Users Assess and Deescalate Tension in Their Online Discussions.” In Proceedings of CSCW, 545:1-545:37, 2022. https://doi.org/10.1145/3555603.
Hua, Yiqing, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, and Lucas Dixon. “WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community.” In Proceedings of EMNLP. EMNLP, 2018. https://www.aclweb.org/anthology/D18-1305.
Mayfield, Elijah, and Alan W. Black. “Analyzing Wikipedia Deletion Debates with a Group Decision-Making Forecast Model.” 3 (2019): 26.
Schluger, Charlotte, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, and Karen Levy. “Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support.” Proceedings of the ACM on Human-Computer Interaction 6, no. CSCW2 (November 11, 2022): 370:1-370:27. https://doi.org/10.1145/3555095.