Artificial intelligence/Bellagio 2024

On February 19–23, 2024, a group of 21 Wikimedians, academics, and practitioners met at the Rockefeller Foundation's Bellagio Center to draft an initial research agenda on the implications of artificial intelligence (AI) for the knowledge commons. We aimed to focus attention (and therefore resources) on the vital questions volunteer contributors have raised, including both the promise of AI systems and their risks and negative impacts on the open Internet.

We are optimistic that the use of machine learning and other AI approaches can improve the efficiency of operations on these platforms and the work of volunteers, and can support efforts to reach a new generation of readers and contributors. At the same time, we are concerned about the potential negative impact of the use of AI on the motivations of contributors as well as the misuse of these technologies to discourage volunteers and disrupt their work in the peer-produced knowledge commons ecosystem.

Below, we publish our initial thinking on potential research directions that may eventually become a shared research agenda. Our hope is that many researchers across industry, government, and nonprofit organizations will adopt the final research agenda to help support and guide their own research. By focusing research efforts on topics that benefit the knowledge commons and help volunteers, we aim to inform and guide product development, public policy, and public opinion about the future direction of AI-related technology.

A note on AI ethics


The development, evaluation, deployment, and measurement of AI tools raise many ethical concerns, both in general and in the context of the knowledge commons. Articulating these risks and developing principles and guidelines to shape research in this area is both a significant undertaking and a critically important aspect of every part of the research agenda outlined here. Efforts to develop these principles and guidelines should proceed in parallel with the research itself. Researchers engaged in any aspect of the work described here have a responsibility to consider the harms and impacts of their research. As ethical principles and guidelines are developed, they should be used to critically assess and shape all the work outlined below; as that work is conducted, we hope the results will in turn shape our knowledge of ethical research.

Research areas

  This section is currently a draft.

This is a summary of potential research areas that the research agenda may eventually pursue. It represents some initial brainstorming and work that we are sharing here to gather early feedback and direction from Wikimedians, other knowledge commons communities, and researchers, with the aim of publishing a more stable agenda in March or April 2024.

The four potential research areas are:

  1. Characterize and monitor the use of AI in the knowledge commons over time
  2. Develop AI tools for the knowledge commons
  3. Evaluate the effect and impact of deploying AI tools
  4. Empower knowledge commons communities in the AI era

Characterize and monitor the use of AI in the knowledge commons over time


Knowledge commons platforms are one of the greatest success stories of the Internet. As the latest wave of automation sweeps through people's digital work and lives, there are concerns about the disruption it may cause for knowledge equity around the world, for the communities of volunteers engaged in these initiatives, and for the integrity of the knowledge they help create.

Robust current research on the extent of these changes is lacking. This lack of data makes it difficult for these communities (and their broader ecosystem of partners, supporters, and collaborators) to address current and potential harms or to make the most of the new capabilities of foundational and frontier models. Used wisely, these models hold the promise of addressing ongoing knowledge commons challenges such as community growth, contributor experience, and content quality.

Proposed research

  • Current and future uses of AI. AI did not start with the launch of ChatGPT in November 2022. Many AI tools are already deployed in knowledge commons communities, and popular knowledge commons platforms like Wikipedia have employed machine learning tools for more than a decade. However, our understanding of how actively these tools are used and how they can be improved is limited, especially when it comes to newer generative capabilities. We also lack an understanding of whether contributors find such capabilities helpful, and of what measures are needed to empower all contributors to use them. We need to explore how AI could lead to new ways for people to contribute, including those who are, for a variety of reasons, not currently part of these communities. Our "State of AI in the knowledge commons" research agenda could include:
    • A review of currently deployed systems, including (where available) quantitative and qualitative evidence of use and impact.
    • A survey of contributors' experience and opinions of AI assistants, as well as broader issues such as their perceptions of how knowledge commons are used towards the development of AI models and applications.
    • A hub for AI assistants in use, how they work, and what they are for, including datasets, related resources, and ways for the community to provide feedback and contribute to their further development.
  • Contributors' motivations. To attract new contributors and help make knowledge commons communities sustainable in the face of ongoing challenges, it is essential to deepen our understanding of the reasons people do or do not contribute. To have real impact, this research will need to be mindful of the diversity of existing and prospective contributors across the world, including countries and demographics that are currently underrepresented. Our assumption, supported by some evidence from platforms such as GitHub and StackOverflow, is that the mainstream availability of tools such as ChatGPT could fundamentally change both levels of participation and contribution practices, with mixed effects. This means we will first need to revisit and refresh the existing frameworks used to study community motivations so they account for the impact of AI assistants, informed by ongoing research in responsible AI, along with an up-to-date account of contribution profiles. These findings would then inform discussions through workshops and other established community engagement channels.
  • Community values and preferences around AI. AI has been, especially since the launch of ChatGPT, the subject of public debates and controversies over its capabilities, harms, and benefits. Experts have argued for the importance of ensuring that AI models and tools are fair and equitable, and that they represent the values of those who use or are affected by them, in order to foster trust and adoption. This includes being mindful of underrepresented voices. While our theoretical and practical understanding of AI ethics is evolving, there are already examples of knowledge commons communities that have reacted to contact with AI in positive and less positive ways (e.g., classifying edits on Wikipedia, the Reddit moderator rebellion, DeviantArt's AI policies, and the ArtStation "No AI Art" protest). This research theme would study, learn from, and build on these examples to design a survey that collects recent preferences from a range of knowledge commons communities, summarizes common values, and identifies areas where opinions diverge over time. This could help communities manage their own expectations, make better choices about how to engage with the technology, inform policies on terms of use, and decide if and how they would use their position to influence positive change. In a field changing as rapidly as AI, these activities would be carried out regularly, and analyzing the results over time would allow us to understand changes and future trends, for instance in the form of an observatory. This could be designed in an open, collaborative way that allows new knowledge commons communities and the wider ecosystem to suggest new areas of inquiry or contribute new data points and research methods.

Develop AI tools for the knowledge commons


There are many potential places where AI can be used to improve knowledge commons processes or outputs. Research can aid in the development of new techniques and tools to do so. Tools can broadly be classified into two groups:

  1. Tools focused on content contribution that make contributing easier or more effective. This is important because maintaining the commons is simply too much work for too few people. AI can help boost productivity and content quality, and identify threats to the integrity of content.
  2. Tools focused on content consumption that can improve the user experience, for example by making content more discoverable.

The proposed research areas below are focused on the Wikimedia ecosystem, but can hopefully serve as an inspiration for other knowledge commons projects too.

Proposed research


We envision a set of initiatives that can advance the development of AI tools for content contribution and consumption.  

  • Identify activities that could benefit from AI. An important first step for this research is to identify and categorize, together with downstream users of knowledge commons platforms, which consumer and contributor tasks could benefit from AI assistance. This effort should define which tools can be developed in support of the identified tasks, and what principles researchers should follow as they work on the development of those tools. For example, multilinguality is an important consideration for natural language processing tools designed to serve the Wikimedia communities.
    • Example of a research project: the "Wiki Guide." A powerful example of a novel tool supporting Wikipedia could be a "Wiki Guide," a general-purpose LLM-based assistant and point of entry for Wikipedia, both for consumption and contribution. Examples for readers include search, question answering, translation, and text simplification, while contributors could benefit from newcomer onboarding, task suggestions (such as adding links), text improvement, image captioning, and early warnings about rule-breaking and downstream conflicts. The Wiki Guide could also orchestrate existing AI tools and guide users in leveraging them. Developing a Wiki Guide would require major community engagement, with questions around the feasibility of AI-guided tasks, which training and testing data to curate, how to evaluate the models, and how to align them with community preferences. For AI researchers, the Wikiverse is a "sweet spot" to focus on: a multi-faceted environment with an unbounded number of tasks that is, at the same time, a well-delineated domain. We envision the Wiki Guide as multimodal, going beyond plain text to images, the knowledge graph, and functions and code for Wikifunctions.
  • Engage the AI community via benchmarks and datasets: Knowledge commons communities may have specific needs that AI can help with but where the technology is not yet ready to be used at scale. The AI communities of researchers and practitioners are eager to find interesting problems, and interested in contributing back to the knowledge commons. AI benchmarks and datasets that focus on knowledge commons problems can help create communities around difficult and interesting AI challenges, while guiding the research community toward problems where AI holds promise to actually make a difference. These benchmarks can act as a force multiplier: together we can build more and better solutions based on agreed-upon evaluation metrics and test sets. With these benchmarks we can run bake-offs, e.g., on Kaggle, or shared tasks at conferences. Examples include Wikipedia talk-page summarization across languages and image alt-text generation.
  • Track provenance: In a world in which more of the content we interact with may be generated or assisted by machines, we will need to rethink provenance. Much of the credibility that knowledge projects have now comes not just from the content itself but knowledge of the processes and inputs used to create it. Current AI tools often disassociate the content from its context and background in ways that threaten that credibility. Additionally, many reusers depend on knowing this information to make effective use of the content, and many contributors' motivation comes from the ability to attach credit to their work. The objective of this research direction is to encourage the development of tools to track provenance of content, so that when needed, readers and contributors can easily identify machine-generated content.

Evaluate the effect and impact of deploying AI tools


Empirical research is needed on the effect of AI tools on knowledge commons. This includes research on the impact of generative AI tools deployed by platforms and individual contributors, individually and in aggregate. It should also seek to measure the impact of deployments of generative AI that draw data from or shift audiences away from knowledge commons in ways that affect the commons themselves. It also includes research to characterize the risks and harms caused—both realized and potential—by bad faith actors using these tools on knowledge commons.

For example, if a contributor to a knowledge commons adopts an AI tool, does it increase their productivity? Does it change or shift their behavior or the kinds of tasks they take on? If so, how? Does it increase or decrease their motivation or satisfaction? If a community deploys an AI tool, does it increase group-level productivity? Does it change or shift aggregate behavior? How does it affect knowledge artifacts and reader experiences (e.g., article quality)? Will bad-faith actors use AI tools to attack a knowledge commons project? Will communities be able to defend against them?

This work will seek to provide empirical evidence to answer these questions. The goal is to inform future development and deployment of tools, and to build evidence to help evaluate and justify existing deployments. This work will draw from research that seeks to characterize and measure AI tools over time (see the subsection "Current and future uses of AI") and will seek to inform work to develop new tools (see "Develop AI tools for the knowledge commons") and to empower knowledge commons communities (see "Empower knowledge commons communities in the AI era").

Proposed research


We imagine that a series of steps will be necessary to carry out this work, including:

  • Identifying outcomes of interest. Important early steps along this research dimension involve identifying factors likely to be affected by AI. This can involve drawing from existing research on individual- and community-level measures of behavior and health. The result should be a series of outcomes, features, indicators, and measures that are likely to be affected by AI. We see opportunities for theory-driven, literature-based, critical, qualitative, and quantitative work. This work should include features likely to be affected positively (e.g., measures of editor productivity) or neutrally (e.g., human editors shifting behavior in response to new tasks that can be automated), as well as features likely to be harmed directly (e.g., increases in AI-assisted attacks) or indirectly (e.g., individual demotivation caused by the introduction of a tool).
  • Identifying AI-based interventions. Similarly, the research in this area must draw from a set of interventions that capture the introduction of AI tools hypothesized to affect the outcomes of interest. This might involve tools deployed by platforms (perhaps collected as part of the work described in "Characterize and monitor the use of AI in the knowledge commons over time"). It might also involve tools that predate the current generation of LLM-based generative AI but are similar along dimensions that allow research to draw analogies and generalize (e.g., bots engaged in other forms of algorithmic content generation). Because much AI-generated content may not be labeled, or may be used by humans in tool-assisted editing, one topic to explore is measuring the amount of AI-generated content used in Wikimedia projects.
  • Data collection. Once features are identified, data should be collected that will allow researchers to answer questions about the relationships between AI deployments and changes in key indicators or features. Much of this data is likely already being measured and collected, and we imagine that this work will leverage a large number of datasets as well as existing research on individual behavior in knowledge commons and group-level measures of community health.

The core of this work will involve a series of attempts to generate and test theories about whether and how AI tools cause changes in features of interest. We see this happening through a range of methodologies that include, but are not limited to:

  • Lab-based experiments. Experiments in a controlled environment will help understand how the introduction or involvement of AI tools affects individual attitudes and behaviors or the behavior of small groups. They are also likely to give us the ability to disentangle the effects of many variables that change over time.
  • Field experiments. Field experiments will evaluate the effect of tools through A/B testing among contributors in real knowledge commons. We hope that those who are deploying tools consider experimental designs and work with researchers to help structure their deployments experimentally. We also hope that they measure a range of outcome variables to inform their own decision-making and evaluation and to contribute to knowledge about what we believe are likely very complex and multidimensional effects of AI tools.
  • Observational research. As AI tools are deployed, they create a wide range of opportunities for observational studies. This might include studies that seek to characterize changes over time in individual-level or group-level measures and to describe how these changes correspond to the introduction of AI tools. Although much of this work will be correlational in nature or rely on statistical controls, a range of quasi-experimental or natural experiments are likely possible. Similarly, researchers may be able to take advantage of staged deployments to identify potential control groups (e.g., in the form of difference-in-differences designs).
  • Inductive research to identify new effects or explain results. Inductive, interpretive, and qualitative research can play an important role in identifying new potential outcomes to measure, identifying limitations in existing measures, and generating proposals for new relationships. We also hope that critical and interpretive work can help explain potential mechanisms between relationships measured in quantitative results and explain puzzles that emerge from this work.
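To make the experimental methodologies above concrete, here is a minimal sketch of how a field-experiment comparison might be analyzed: a simple permutation test on an outcome variable such as weekly edits. The data are fabricated illustrative numbers, and the tool, group sizes, and outcome measure are all hypothetical.

```python
import random
import statistics

# Illustrative (fabricated) weekly edit counts for contributors in a
# hypothetical A/B test of an AI editing assistant.
control   = [4, 7, 2, 5, 6, 3, 8, 5, 4, 6]
treatment = [6, 9, 4, 7, 8, 5, 10, 7, 6, 8]

# Observed effect: difference in mean weekly edits between groups.
observed = statistics.mean(treatment) - statistics.mean(control)

def permutation_p_value(a, b, n_iter=10_000, seed=0):
    """One-sided permutation test: how often does randomly relabeling
    contributors produce a mean difference at least as large as observed?"""
    rng = random.Random(seed)
    pooled = a + b
    obs = statistics.mean(b) - statistics.mean(a)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = (statistics.mean(pooled[len(a):])
                - statistics.mean(pooled[:len(a)]))
        if diff >= obs:
            count += 1
    return count / n_iter

p = permutation_p_value(control, treatment)
print(f"mean difference: {observed:.1f} edits/week, p = {p:.3f}")
```

Real deployments would of course need larger samples, pre-registered outcomes, and attention to the multidimensional effects discussed above, but even a simple design like this illustrates why structuring rollouts experimentally makes evaluation possible at all.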

Empower knowledge commons communities in the AI era


Recent advances in AI would not be possible without the communities that build knowledge commons. Knowledge commons are widely known as some of the most important sources of data on which AI systems are developed and trained. However, knowledge commons communities have little to no influence over how these AI systems are built and used. This is a particular challenge when these systems begin to affect knowledge commons communities (e.g., by threatening to reduce participation in them) or when they violate core community values (e.g., citation and provenance). As knowledge commons-based AI systems grow in prominence across society, there is growing demand for new mechanisms to ensure that knowledge commons communities have an influence in the AI ecosystem commensurate with the value they create for it. These mechanisms must be powerful enough to effect change while remaining compatible with the openness principles that are core to many knowledge commons communities.

Proposed research


Some of the research includes but is certainly not limited to:

  • Revisiting licensing: New AI systems use data in ways that were difficult to predict when the current family of open content licenses was developed, or when communities and individuals decided to attach these licenses to their content. This has led to growing interest in new types of licenses (or new mechanisms to express and enforce community preferences) that, for instance, empower communities to express preferences about the use of their content in AI training or retrieval-augmented generation (e.g., explicit opt-in or consent). Do new types of licenses need to be developed to support community values and norms around the use of their content in AI systems? There are also questions as to whether such licenses would be enforceable, either legally or practically, and whether licensing is the right mechanism for influencing the behavior of downstream technologies. What other legal, normative, and policy strategies might complement or replace licensing in this regard?
  • Collective action across knowledge commons communities: Existing research suggests that collective action is essential for content producers to successfully influence the AI ecosystem. What types of community structures and tools are needed to facilitate collective action across knowledge commons communities with respect to the behavior of AI models and systems? Would a knowledge commons coalition be a sufficient institution, and what would such a coalition look like? What types of shared infrastructure would be needed to facilitate consensus among the leaders and members of multiple commons communities? Can we adapt ideas from works councils to help influence the developers of AI systems on questions on which knowledge commons communities are key stakeholders?
  • Supporting open AI systems: While knowledge commons communities often would not, and cannot, restrict the use of their knowledge to specific AI systems regardless of those systems' behavior, they can pay special attention to AI systems whose behavior better matches their values, especially transparency and openness more generally. What can knowledge commons communities do to best support these types of AI systems? For instance, are there knowledge creation drives that could differentially benefit these systems? How can knowledge commons communities best contribute to the larger public data repositories now being developed? Should these public repositories have usage rules that reflect the values of the communities that contributed to them, and how would such rules be enforced?
  • Additional influence mechanisms: What other mechanisms for influence do knowledge communities have? For instance, are there normative standards that can be included in professional conduct guides, and how might those normative standards be enforced? For communities that want to restrict usage of the knowledge they developed by a certain set of actors, what are all the ways they can do that?
  • Knowledge commons communities as representatives of the broader truth infrastructure for the web: How can knowledge commons communities use their leverage to advocate for the needs of other parts of the truth infrastructure of the web (e.g., journalism) on which they often rely? How can knowledge commons communities partner with institutions that create knowledge in different ways toward shared goals, and what are those goals?

Participants in the 2024 Bellagio symposium


(Listed in alphabetical order)

Get involved


Questions and comments on the proposed research agenda are encouraged on the talk page.