Future Audiences/Experiments: conversational/generative AI

Future Audiences Objective 2 Key Result 2: Test a hypothesis around conversational AI knowledge seeking, to explore how people can discover and engage with content from Wikimedia projects.

Rationale

Large language models (LLMs) and tools/applications built on them – e.g., ChatGPT, Google Bard, Microsoft Copilot – present both opportunities and risks to our movement. AI/ML technologies have been a part of our movement for over a decade and have assisted human contributors to our projects with, e.g., content translation, vandalism patrol, and structured newcomer tasks. With the latest generation of AI/ML tools, there may be more opportunities to make searching, consuming, and/or contributing to Wikimedia projects easier, more intuitive, and more accessible for more people.

On the other hand, AI assistants may pose a serious risk to the sustainability of our movement if they become primary entry points for knowledge-seeking and their output does not provide attribution for, or pathways to contributing to, Wikimedia projects or communities. They may also generate large quantities of low-quality content that could overburden our moderation systems.

There is a high degree of legal, technical, and social complexity to the use of AI within the Wikimedia context, and for this year we are committed to gathering more data and insights to inform how to think about future strategic investments in this space.

Experiments

Report of the ChatGPT experiment

Wikipedia ChatGPT plugin

Status: Concluded

Hypothesis: If we create a Wikipedia ChatGPT plugin that summarizes content from Wikipedia and attributes/links to our projects, we can better understand how users want to interact with knowledge via AI assistants, and how/whether our content might improve their experience.

[Status: as of 2 February 2023, experiment concluded. Explanation and details]

Success criteria and results: details

Screenshot of MVP Wikipedia ChatGPT plugin response

Priority	Assumption	Data/metrics	Success condition
MVP
P0	People using AI assistants will want to receive knowledge from Wikipedia for some of their queries	# of plugin queries/day; # of query sessions	~1000s of queries/day; queries, users, and/or queries per user per day increase over time (indicating sustained usage/usefulness)
P0	Knowledge arrives with fidelity to end-users	Relevance (does ChatGPT find and summarize back relevant Wikipedia content?); Accuracy (does ChatGPT correctly summarize knowledge from Wikipedia?); Attribution (does ChatGPT follow our instructions for attributing and linking to Wikipedia?)	Qualitative assessment of user queries & results
P0	People want to get knowledge in non-English languages from Wikipedia via an AI assistant	All above broken out by languages tested

Preliminary results

Priority	Assumption	Results	Conclusion
MVP
P0	People using AI assistants will want to receive knowledge from Wikipedia for some of their queries	~500-1000 queries per day in first month since launch; queries and queries per user per day trending up week over week	Plugin has modest adoption but seems to be providing value to users who have enabled it
P0	Knowledge arrives with fidelity to end-users	Relevance: 84%; Accuracy: 85-89% (based on different quality coders' results); Attribution: 68%	Overall, relevance and accuracy are fairly high (on par with Wikipedia according to some reliability studies), but attribution is inconsistent and needs further investigation.
P0	People want to get knowledge in non-English languages from Wikipedia via an AI assistant	Of the languages analyzed (German, English, French, Japanese, Russian), English was generally higher on accuracy and attribution. More significant accuracy and attribution issues were noted in some non-English languages (e.g., no attribution in Russian and much lower accuracy rates, 30-70%, in Russian and German).	Understanding why quality differs across languages is an important next step.

"Citation Needed" browser extension

Status: In progress

Hypothesis: If we leverage Wikipedia’s reputation for independence and reliability, making it available across the web, people will use it to verify claims on the internet.

Minimum Viable Product

As a user of Google Chrome, I can install a browser extension that allows me to:

Select passages of text I come across on the web
Receive back information about whether the claim(s) in this text match any relevant content on Wikipedia
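
The page does not describe how the extension finds relevant Wikipedia content, so the following is only a minimal sketch of the first half of that flow: turning a selected passage into a list of candidate Wikipedia articles. It assumes the public MediaWiki Action API search endpoint on English Wikipedia; the function names, the User-Agent string, and the result limit are hypothetical, and deciding whether a claim actually matches the article text is out of scope here.

```python
import json
from typing import Dict, List
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: English Wikipedia's Action API; the real extension may use another backend.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(selected_text: str, limit: int = 3) -> str:
    """Build a full-text search URL for the passage the user selected."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": selected_text,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_candidates(response: dict) -> List[Dict[str, str]]:
    """Reduce a search response to article titles and links for display."""
    hits = response.get("query", {}).get("search", [])
    return [
        {
            "title": hit["title"],
            "url": "https://en.wikipedia.org/wiki/" + quote(hit["title"].replace(" ", "_")),
        }
        for hit in hits
    ]


def find_candidates(selected_text: str) -> List[Dict[str, str]]:
    """Query the API over the network and return candidate articles."""
    # Hypothetical User-Agent; a real client should identify itself properly.
    request = Request(
        build_search_url(selected_text),
        headers={"User-Agent": "citation-needed-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=10) as resp:
        return extract_candidates(json.load(resp))
```

Calling `find_candidates("The Eiffel Tower is 330 metres tall")` would return up to three candidate articles with links, which keeps attribution to Wikipedia visible in whatever the extension shows the user.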

Key research questions

Do people on the internet want Wikipedia content when not on our website?
Do people trust Wikipedia content and brand as a reliable source of information?
Can we reach new audiences? Or create new opportunities for current audiences to use Wikipedia?

"Add A Fact"

Hypothesis: If we make it easy for off-platform readers to add claims/facts from third-party websites, their contributions can help sustain and grow content in a potential future where most people consume Wikimedia content off-platform.

Key research questions

Do people on the internet want to contribute good-faith information to Wikipedia?
Who are the people who would be interested in doing this? i.e.:
- The general public
- People who are Wikipedian-like in some way – e.g., Reddit moderators or other subgroups on the Internet (fandoms, communities, fact-checkers, etc.)
- Existing Wikipedians
How might we deliver these contributions into existing or new pipelines for human review/oversight/addition to Wikipedia?

Other ideas

If you have more ideas, please leave them on the talk page!
