Research:Prioritization of Wikipedia Articles/Importance/Vital Articles

Tracked in Phabricator: Task T257869
Created: 18:51, 20 October 2020 (UTC)
Duration: 2020-May – 2022-March
This page documents a completed research project.


To build tools that support ranking articles by importance, it is essential to understand the values of the stakeholder group these tools will primarily serve. We therefore began our inquiry with the following question: how do Wikipedians determine which articles are more important than others? More specifically, what criteria do they use in making these determinations? We wanted to understand this at a broad, domain-neutral level, so we used Vital articles as our primary dataset rather than focusing on WikiProjects, as much of the previous research in this space has done.

Methods


To determine what criteria Wikipedians use in evaluating the importance of an article, we examined the talk page discussions associated with the Vital articles lists. It is standard for a user to post a proposal on one of the Vital articles talk pages (there is one for each of the 5 levels) and seek consensus before making a major change, such as removing or replacing an existing Vital article. Other users then provide justifications for and against these proposals based on their own competing conceptions of article importance. These talk pages therefore provide us with rich discussion content that is particularly well-suited to answering the question of how Wikipedians view article importance.

We adopted a Grounded Theory-based approach in this analysis. We began by stratifying and sorting our Vital articles discussion data so that we would cycle through all 5 levels equally as we went down the list of discussion content. For each sentence in each user comment, we asked whether the user contributed to the discussion about Vital articles beyond simply indicating support for or opposition to a previously stated argument or proposal. This step was intended to exclude discussion content that could not provide insight into users’ reasoning. If the sentence contained potentially useful content, two researchers summarized each distinct statement the user made in it and created a code for that statement. We ended this phase when we had approximately 300 open codes, each corresponding to a distinct paraphrased user statement, often a justification for or against a proposal.
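The ordering step described above, cycling through all 5 levels equally, can be illustrated with a round-robin interleave. This is only a sketch of one plausible reading: the source does not specify the exact procedure, and the function name `interleave_by_level` and the sample discussion identifiers are hypothetical placeholders rather than project data.

```python
from itertools import zip_longest

# Sentinel marking "this level has run out of discussions".
_MISSING = object()


def interleave_by_level(discussions_by_level):
    """Return discussions in round-robin order across levels.

    Assumption: `discussions_by_level` maps a Vital articles level (1-5)
    to a list of discussion items, already sorted within each level.
    """
    levels = sorted(discussions_by_level)
    ordered = []
    # Take one item from every level per pass; skip levels that are exhausted.
    for batch in zip_longest(
        *(discussions_by_level[level] for level in levels),
        fillvalue=_MISSING,
    ):
        ordered.extend(item for item in batch if item is not _MISSING)
    return ordered


# Hypothetical placeholder identifiers, not real talk page threads.
sample = {
    1: ["L1-a", "L1-b"],
    2: ["L2-a"],
    3: ["L3-a", "L3-b", "L3-c"],
    4: ["L4-a"],
    5: ["L5-a", "L5-b"],
}
order = interleave_by_level(sample)
print(order)
```

With this ordering, stopping the coding phase at any point leaves each level represented about equally, so far as its supply of discussions allows.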

Then, through iterative thematic clustering, we separated the paraphrased statements into categories based on the justification criteria expressed or implied in them. For example, the sentence “If sport receives enough support then I think we should add an almost equivalent female dominated activity to balance things out (maybe dance)” was assigned the code “If Sport added, counterbalance with female dominated activity,” and was situated alongside several others in a cluster titled “Equity” by the end of the process. In total, 8 criteria emerged from our data. Interestingly, only 5 of these criteria relate to an article's importance based on its inherent characteristics. The other 3 criteria, termed global criteria, relate to an article’s ability to promote or impede the encyclopedia's values with regard to the global composition of high-quality content.

Results

Table: Criteria used by Vital articles contributors to justify an article's priority.

Importance Criterion | Example Quote
Everyday Significance | "An activity [sleep] that takes up 1/3 of your lifetime seems to be pretty vital to me."
Cultural Significance | "Sports have in some form been a part of the vast majority of cultures for much of there history."
Historical Significance | "The concept [bourgeoisie] has had a massive role in human history."
Enduring Significance | "The repercussions [of the 2019-20 coronavirus pandemic] will be felt for many decades, at the very least."
Breadth | "Folklore is the broader and more fundamental article [compared to Myth]."

Global Criterion | Example Quote
Balance | "If sport receives enough support then I think we should add an almost equivalent female dominated activity to balance things out (maybe dance)."
Non-redundancy | "Everything on Earth is covered by Earth, and everything beyond Earth is of interest pretty much only for astronomy, which is covered by Science."
Completeness | "The only type of activism we lack is women’s rights - of which i would support Emmeline Pankhurst."