Research:Prioritization of Wikipedia Articles/Importance

Tracked in Phabricator:
Task T155541

Created

18:51, 20 October 2020 (UTC)

Contact

Mo Houtti

University of Minnesota

Collaborators

Loren Terveen

University of Minnesota

Isaac Johnson

Wikimedia Foundation

Duration: 2020-May – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

The goal of this research is to better understand which factors affect the importance of a given Wikipedia article, so that approaches can be developed to better align Wikipedia edit recommender systems with community values such as content equity.

Background

There have been a number of studies of article importance over the years. A brief listing of the major works follows, but a fuller account of past work, with a synthesis of the approaches taken by researchers, can be found in this literature review. Additional research:

Automated classification of article importance
Measuring article importance
Wikipedia Diversity Observatory provides some robust approaches to identifying what content is relevant to a given language edition's "culture" (as a measure of language-specific importance of articles) and is expanding its range to also look at aspects related to equity such as gender, sexuality, and race.
Lewoniewski et al. examined article importance across English, French, Russian, and Polish^[1] (and associated write-up).
Wulczyn et al. studied how to recommend articles to be translated (where importance feeds into the ranking).^[2]
Gorbatai looked at the misalignment between pageviews (as a measure of importance) and quality of Wikipedia articles.^[3] Warncke-Wang et al. extended and formalized this misalignment framework and analyzed English, French, Russian, and Portuguese Wikipedias.^[4]
Stvilia et al. mostly focused on article quality scales across wikis but did a small analysis of how quality related to articles of local importance.^[5]
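
The misalignment idea mentioned in the list above (pageviews as a proxy for importance, compared against article quality) can be illustrated with a minimal sketch. This is not the method of any cited paper: the percentile-rank gap, the function names, and the toy articles are all assumptions made for illustration, and the quality scale is assumed to follow English Wikipedia's assessment classes.

```python
# Assumed ordinal scale of quality classes, lowest to highest.
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]

def percentile_ranks(values):
    """Map each value to the fraction of values strictly below it."""
    n = len(values)
    return [sum(v < x for v in values) / n for x in values]

def misalignment(articles):
    """Return {title: demand rank - quality rank}.

    Positive values mean the article attracts more views than its
    quality would suggest; negative values mean the opposite.
    """
    views = percentile_ranks([a["views"] for a in articles])
    quality = percentile_ranks(
        [QUALITY_ORDER.index(a["quality"]) for a in articles]
    )
    return {a["title"]: v - q for a, v, q in zip(articles, views, quality)}

# Hypothetical articles; the numbers are illustrative, not real data.
articles = [
    {"title": "A", "views": 9000, "quality": "Stub"},
    {"title": "B", "views": 100, "quality": "FA"},
    {"title": "C", "views": 1000, "quality": "C"},
]
gaps = misalignment(articles)
```

In this toy example, article A is heavily viewed but low quality, so its gap is positive, while article B is the reverse.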

On wiki, there are also a number of ways in which article importance appears:

Vital articles lists -- e.g., enwiki, meta
WikiProject importance assessments -- e.g., enwiki
Offline wiki inclusion criteria

WikiProjects

Main article: Research:Prioritization_of_Wikipedia_Articles/Importance/WikiProjects

The largest source of data on article importance is ratings by WikiProjects -- e.g., how important is the article for Jimmy Carter to WikiProject Human rights? As a result, many projects on article importance have focused on this dataset. A small analysis of English Wikipedia emphasized that article importance is highly contextual -- i.e., WikiProjects rarely agree on the importance of an article.
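
One simple way to quantify how often WikiProjects agree is the share of WikiProject pairs that give the same article the same rating. The sketch below is an assumption about how such a check could be written, not the analysis referenced above; the project names, article names, and ratings are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical ratings: article -> {WikiProject: importance class}.
# Illustrative values only, not real assessment data.
ratings = {
    "Article A": {"Project X": "Top", "Project Y": "Low", "Project Z": "Mid"},
    "Article B": {"Project X": "High", "Project Y": "High"},
    "Article C": {"Project X": "Low"},
}

def pairwise_agreement(ratings):
    """Return the share of WikiProject pairs that rated an article identically.

    Articles rated by a single WikiProject contribute no pairs.
    Returns None when there are no pairs at all.
    """
    agree = total = 0
    for by_project in ratings.values():
        for a, b in combinations(by_project.values(), 2):
            total += 1
            agree += a == b
    return agree / total if total else None

agreement = pairwise_agreement(ratings)
```

A low value indicates that importance depends heavily on which WikiProject is doing the rating.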

Importance According to Wikipedians

Main article: Research:Prioritization_of_Wikipedia_Articles/Importance/Vital_Articles

A first step towards building tools that take article importance into account is to better understand how Wikipedians determine it. Our first research study focused on English Wikipedia's Vital articles, which has a rich history of discussions about what makes content important or not. We analyzed these discussions and derived eight themes describing the criteria used to justify the importance of a given article. Notably, only five of these criteria relate to the article in isolation, while the other three relate to its context within the larger corpus of Wikipedia content.

Personalization and Equity

Main article: Research:Prioritization_of_Wikipedia_Articles/Importance/SuggestBot

Our second study takes a deeper look at a long-running edit recommender system on Wikipedia (SuggestBot^[6]) with the goal of understanding the trade-offs between personalization and content equity on Wikipedia.

References

[1] Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold (2016). "Quality and Importance of Wikipedia Articles in Different Languages" (PDF). Information and Software Technologies (Springer International Publishing): 613–624. doi:10.1007/978-3-319-46254-7_50.
[2] Wulczyn, Ellery; West, Robert; Zia, Leila; Leskovec, Jure (11 April 2016). "Growing Wikipedia Across Languages via Recommendation". arXiv:1604.03235 [cs].
[3] Gorbatai, Andreea D. (3 October 2011). "Exploring underproduction in Wikipedia". Proceedings of the 7th International Symposium on Wikis and Open Collaboration (Association for Computing Machinery): 205–206. doi:10.1145/2038558.2038595.
[4] Warncke-Wang, Morten; Ranjan, Vivek; Terveen, Loren; Hecht, Brent (2015). "Misalignment Between Supply and Demand of Quality Content in Peer Production Communities" (PDF). Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015. Retrieved 18 August 2020.
[5] Stvilia, Besiki; Al-Faraj, Abdullah; Yi, Yong Jeong (1 December 2009). "Issues of cross-contextual information quality evaluation—The case of Arabic, English, and Korean Wikipedias" (PDF). Library & Information Science Research 31 (4): 232–239. ISSN 0740-8188. doi:10.1016/j.lisr.2009.07.005.
[6] Cosley, Dan; Frankowski, Dan; Terveen, Loren; Riedl, John (28 January 2007). "SuggestBot: using intelligent task routing to help people find work in wikipedia". Proceedings of the 12th international conference on Intelligent user interfaces. IUI '07 (New York, NY, USA: Association for Computing Machinery): 32–41. ISBN 978-1-59593-481-9. doi:10.1145/1216295.1216309.
