Research:Prioritization of Wikipedia Articles/Importance

Tracked in Phabricator:
task T257869
18:51, 20 October 2020 (UTC)
Duration:  2020-May – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

The goal of this research is to better understand what factors affect the importance of a given Wikipedia article, so that tools can be built to support ranking articles by importance.


There have been a number of studies of article importance over the years. A brief listing of the major works follows, but a fuller survey and synthesis of the approaches taken by researchers can be found in this literature review. Additional research:

  • Automated classification of article importance
  • Measuring article importance
  • Wikipedia Diversity Observatory provides some robust approaches to identifying what content is relevant to a given language edition's "culture" (as a measure of language-specific importance of articles) and is expanding its range to also look at aspects related to equity such as gender, sexuality, and race.
  • Lewoniewski et al. examined article importance across English, French, Russian, and Polish[1] (and associated write-up).
  • Wulczyn et al. studied how to recommend articles to be translated (of which importance relates to ranking).[2]
  • Gorbatai looked at the misalignment between pageviews (as a measure of importance) and quality of Wikipedia articles.[3] Warncke-Wang et al. extended and formalized this misalignment framework and analyzed English, French, Russian, and Portuguese Wikipedias.[4]
  • Stvilia et al. mostly focused on article quality scales across wikis but did a small analysis of how quality related to articles of local importance.[5]

On wiki, article importance also appears in a number of ways.

In particular, the largest source of data on article importance comes from ratings by WikiProjects: e.g., how important is the article Jimmy Carter to WikiProject Human rights? As a result, many projects on article importance have focused on this dataset. To make it easier to gather data on WikiProject importance assessments, and to better understand how much an article's importance depends on which project is labeling it, a small analysis was conducted of English WikiProjects.
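As a minimal sketch of working with this dataset: on English Wikipedia, per-WikiProject ratings are exposed by the PageAssessments extension via the Action API (`prop=pageassessments`). The snippet below flattens a response of that shape into a lookup table; the sample response is hardcoded for illustration, and the specific ratings shown are invented, not real assessments.

```python
# Sketch: flattening the JSON shape returned by the MediaWiki Action API for
# prop=pageassessments into {(title, project): importance}.
# The sample below is illustrative only -- no live API call is made, and the
# ratings are invented placeholders, not actual WikiProject assessments.

sample_response = {
    "query": {
        "pages": {
            "15992": {
                "title": "Jimmy Carter",
                "pageassessments": {
                    "Human rights": {"class": "B", "importance": "Mid"},
                    "Biography": {"class": "B", "importance": "High"},
                },
            }
        }
    }
}

def importance_by_project(response):
    """Map each (article title, WikiProject) pair to its importance rating."""
    ratings = {}
    for page in response["query"]["pages"].values():
        for project, assessment in page.get("pageassessments", {}).items():
            ratings[(page["title"], project)] = assessment.get("importance")
    return ratings

ratings = importance_by_project(sample_response)
```

A table in this form makes it straightforward to compare how different WikiProjects rate the same article, which is exactly the context-dependence question the analysis above examines.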

Importance According to Wikipedians

In seeking to build tools that support the ranking of articles by importance, it is essential to understand the values of the stakeholder group at whom these tools will primarily be targeted. We therefore began our inquiry with the following question: how do Wikipedians determine which articles are more important than others? More specifically, what are the criteria they use in making these determinations? We wanted to understand this at a broad and domain-neutral level, so we decided to use Vital articles as our primary dataset, rather than focusing on WikiProjects as much of the previous research in this space has.

To determine what criteria Wikipedians use in evaluating the importance of an article, we examined the talk page discussions associated with the Vital articles lists. It is standard for a user to post a proposal on one of the Vital articles talk pages (there is one for each of the 5 levels) and seek consensus before making a major change, such as removing or replacing an existing Vital article. Other users then provide justifications for and against these proposals based on their own competing conceptions of article importance. These talk pages therefore provide us with rich discussion content that is particularly well-suited to answering the question of how Wikipedians view article importance.

We adopted a Grounded Theory-based approach in this analysis. We first stratified and sorted our Vital articles discussion data so that we would cycle through all 5 levels equally as we went down the list of discussion content. For each sentence in each user comment, we first asked whether the user contributed to the discussion beyond simply indicating support for or opposition to a previously stated argument or proposal; this filtered out content that could not provide actual insight into users' reasoning. If a sentence contained potentially useful content, two researchers summarized each distinct statement made by the user in it and created a code for it. We ended this phase when we had approximately 300 open codes, each corresponding to a distinct paraphrased user statement, often a justification for or against a proposal.
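The stratified ordering described above can be sketched as a simple round-robin over the 5 levels: take one discussion item from each level in turn until every level is exhausted. The per-level item names below are hypothetical placeholders, not actual discussion threads.

```python
from itertools import zip_longest

# Sketch of the stratified sort: interleave discussion items so that all
# 5 Vital articles levels are cycled through equally. The item labels are
# hypothetical placeholders standing in for talk page discussions.

by_level = {
    1: ["level1-disc1", "level1-disc2"],
    2: ["level2-disc1"],
    3: ["level3-disc1", "level3-disc2"],
    4: ["level4-disc1"],
    5: ["level5-disc1"],
}

def round_robin(strata):
    """Yield one item from each stratum per pass until all are exhausted."""
    ordered = []
    for group in zip_longest(*strata.values()):
        ordered.extend(item for item in group if item is not None)
    return ordered

order = round_robin(by_level)
# First pass covers all five levels once; later passes pick up the
# remaining items from levels with more discussions.
```

Ordering the data this way ensures the open coding reaches every level early, rather than exhausting one level's discussions before seeing the others.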

Then, through iterative thematic clustering, we separated the paraphrased statements into categories based on the justification criteria expressed or implied in them. For example, the sentence “If sport receives enough support then I think we should add an almost equivalent female dominated activity to balance things out (maybe dance)” was assigned the code “If Sport added, counterbalance with female dominated activity,” and was situated alongside several others in a cluster titled “Equity” by the end of the process. In total, 10 positive criteria (used to argue that something is more important) and 3 negative criteria (used to argue that something is less important) emerged from our data.


References

  1. Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold (2016). "Quality and Importance of Wikipedia Articles in Different Languages" (PDF). Information and Software Technologies (Springer International Publishing): 613–624. doi:10.1007/978-3-319-46254-7_50. 
  2. Wulczyn, Ellery; West, Robert; Zia, Leila; Leskovec, Jure (11 April 2016). "Growing Wikipedia Across Languages via Recommendation". arXiv:1604.03235 [cs]. 
  3. Gorbatai, Andreea D. (3 October 2011). "Exploring underproduction in Wikipedia". Proceedings of the 7th International Symposium on Wikis and Open Collaboration (Association for Computing Machinery): 205–206. doi:10.1145/2038558.2038595. 
  4. Warncke-Wang, Morten; Ranjan, Vivek; Terveen, Loren; Hecht, Brent (2015). "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities" (PDF). Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015. Retrieved 18 August 2020. 
  5. Stvilia, Besiki; Al-Faraj, Abdullah; Yi, Yong Jeong (1 December 2009). "Issues of cross-contextual information quality evaluation—The case of Arabic, English, and Korean Wikipedias" (PDF). Library & Information Science Research 31 (4): 232–239. ISSN 0740-8188. doi:10.1016/j.lisr.2009.07.005.