Research:Wikipedia type Articles Generated by LLM (Not for Publication on Wikipedia)

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

We are students working on a research project in conjunction with the Stanford Open Virtual Assistant Lab (OVAL).

Our research team is building an LLM-based system that can generate a full-length Wikipedia page (capped at 2.5k tokens, or roughly 1.25k words) for a given topic without the need for supplemental information (e.g., human-written outlines, curated references, etc.). Besides automatic evaluation, we would like frequent Wikipedia editors to collaborate with us by scoring the articles and providing feedback.

Our goal is purely educational research, and we do not intend to publish these LLM-generated articles on Wikipedia. Ideally, our LLM will generate Wikipedia-style articles with citations and distinct sub-points. We will score each article on (1) Well Written, (2) Verifiable with no original research, (3) Broad in its coverage, and (4) Qualitative comments (the first three criteria for a Good Article, plus qualitative comments). We would have actual Wikipedia editors score a subset of the generated articles as a way to verify that our own scoring is within reason.
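As an illustration only, the comparison between our own scores and the editors' scores could be sketched as follows. This is a minimal sketch under stated assumptions, not our actual pipeline: the 1-5 numeric scale, the `Rating` record, and the helper names `mean_scores` and `mean_abs_gap` are all hypothetical, and the page does not specify how scores are recorded or compared.

```python
from dataclasses import dataclass
from statistics import mean

# The three Good Article criteria named above; qualitative comments are free text.
CRITERIA = ("well_written", "verifiable", "broad_coverage")


@dataclass
class Rating:
    """One rater's scores for one article (hypothetical 1-5 scale)."""
    article_id: str
    rater: str
    scores: dict  # criterion name -> integer score
    comments: str = ""


def mean_scores(ratings):
    """Average each criterion per article across all raters."""
    by_article = {}
    for rating in ratings:
        by_article.setdefault(rating.article_id, []).append(rating)
    return {
        article_id: {c: mean(r.scores[c] for r in group) for c in CRITERIA}
        for article_id, group in by_article.items()
    }


def mean_abs_gap(auto, human):
    """Mean absolute difference between two score tables over shared articles."""
    gaps = [
        abs(auto[a][c] - human[a][c])
        for a in auto if a in human
        for c in CRITERIA
    ]
    return mean(gaps) if gaps else None
```

A small gap on the shared subset would suggest that our scoring is within reason; what counts as "small" is a judgment call that this sketch does not settle.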

The Stanford Open Virtual Assistant Lab is a supporter of Wikipedia and its editors. We want to do all we can to guarantee our work is not detrimental to Wikipedia.



If you are interested, please fill out this form, which includes a consent form in compliance with IRB standards: Link[1]

We are hoping to gather a group of 10-15 frequent Wikipedia editors to score a sample of around 30 articles. We know that this can be time-consuming, especially when checking verifiability and LLM hallucinations. To ease this, we will use a UI that shows each article alongside the relevant sections of its citations, so that they are immediately available and the process is streamlined.



We are hoping to have human evaluators by the end of November or beginning of December.

Policy, Ethics and Human Subjects Research


IRB approval: August 23, 2021. In this work, we study the automatic Wikipedia generation problem as a way to push the frontier of automatic expository writing and automatic knowledge curation. All the studies and evaluations in this work are designed to prevent the dissemination of misinformation: we do not publish generated content online, and we implement strict accuracy checks. We avoid any disruption to Wikipedia or related communities, as our system does not interact with live pages. Also, although we try to generate grounded articles, we believe there is no privacy issue related to this work, as we only use information publicly available on the Internet.

The primary risk of our work is that the Wikipedia articles written by our system are grounded in information from the Internet, which may contain biased or discriminatory content. Currently, our system relies on a search engine to retrieve high-quality information but does not include any post-processing module. We believe that improving the retrieval module to cover different viewpoints well, and adding a content-sifting module to the current system, will be a critical next step toward better neutrality and balance in the generated articles. In our experiment, we manually go through all the topics in the test set to ensure the topics themselves are not biased or discriminatory.

Another limitation we see from an ethical point of view is that we only consider writing English Wikipedia articles in this work. Extending the current system to a multilingual setup is a meaningful direction for future work, as many interesting topics do not yet have Wikipedia pages in non-English languages.



Results pending