Wikipedia Administrative Pages Analytics/Research on Wikipedia Administrative Pages

On this page, we briefly summarize the current state of the art in research on Wikipedia administrative pages. We start by explaining the different types of administrative pages and their main namespaces, then we motivate the need for a systematic approach to analyzing them, and finally we enumerate potential areas that could be explored in future research.

Administrative Pages

Wikipedia is one of the most studied objects on the Internet. In the past twenty years, scholars have studied its content and the processes that lead to its creation, its readership, its general consumption, and its societal impact, among many other topics.

Yet, over the years, administrative pages have never been approached as a whole.

One specific type of admin page is “Policies & Guidelines.” The main findings about them in the academic literature are:

  1. Numerous rules and policies have been created that serve a wide variety of functions to maintain Wikipedia (Butler et al. 2008).
  2. Norms are aimed at managing content but also at addressing essential challenges of online communication in cooperative work (Reagle Jr. 2010).
  3. Newcomers find it difficult to have their contributions to policies accepted, which affects their retention (Halfaker et al. 2013).
  4. Participants (newcomers) in Wikipedia edit-a-thons experience moments of frustration when navigating a complex bureaucracy of policies and procedures (Gluza et al. 2021).
  5. Rule-making activity is similar across the five largest Wikipedias, a finding that replicates and extends prior work on English-language Wikipedia alone (Hwang and Shaw, 2022).

However, there are other types of administrative pages. Admin pages are defined as those pages that support administration and governance in order to further Wikipedia's goals. Leaving user pages and specific content discussion pages aside, all those pages dedicated to creating documentation, encouraging debate, and coordination are essential to Wikipedia administration.

Although they are key to newcomers and existing editors, Help pages are scarcely addressed as a main topic of research. Help pages do not have the prescriptive role of policies, but clarify some aspects of how to behave or use certain resources. In a similar manner, Wikipedia essays provide another layer of reflection upon different work practices that policies and help pages do not target (Morgan and Zachry, 2010).

WikiProjects are spaces in which editors coordinate to achieve a specific goal within Wikipedia (e.g., create articles about a topic, organize an event, etc.). WikiProjects are key to opening unstructured spaces for editors to engage in collaborative editing (Morgan et al. 2013) and structured ones for students and self-regulated learning (Ng, 2016).

The Wikipedia space for discussion named “Village Pump” is used as a forum for discussions about the operations, new ideas for tools, technical issues, policies, or specific articles. New ideas often emerge from these discussions. Similar forums like “Teahouse” provide a space for discussions, but are more aimed at newcomers. Morgan and Halfaker (2018) found that new editors invited to the Teahouse are retained at a higher rate than editors who do not receive an invite.

Administrative and Content Namespaces

Although it is arguable whether WikiProjects, the Village Pump, and other general discussion pages are part of the administrative pages, they are all included in namespace 4 (Wikipedia). This namespace is used for pages with information or discussion about Wikipedia itself, i.e., the administrative pages described above. In addition, namespace 12 (Help) contains information intended to help use Wikipedia or its software, although there is a large thematic overlap between help pages in the Wikipedia namespace and help pages in the Help namespace.

Finally, other namespaces employed to categorize and manage content are categories (namespace 14) and portals (namespace 100). Portals serve as "main pages" for content (articles) on specific topics or areas, in a similar way to the Main Page. Nonetheless, they are hard to maintain, as the articles about the topic might change quickly and the editors who created the portal might not notice. Categories help organize content pages (i.e., the Wikipedia articles, which are in namespace 0 or Main namespace) and administrative pages by grouping together pages on similar subjects. Categories have been extensively studied in the academic literature on Wikipedia as a source of information for topical analysis.
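The namespace numbers mentioned in this section can be captured in a small lookup table. The following is a minimal sketch in Python, not part of the project's tooling: the grouping into "administrative" and "content management" sets simply follows the description above, and `classify_namespace` is a hypothetical helper introduced only for illustration.

```python
# Namespace numbers as described in this section.
NAMESPACES = {
    0: "Main",       # content pages (articles)
    4: "Wikipedia",  # information or discussion about Wikipedia itself
    12: "Help",      # help with using Wikipedia or its software
    14: "Category",  # groups pages on similar subjects
    100: "Portal",   # "main pages" for specific topics or areas
}

# Grouping assumed for illustration, following the text above.
ADMINISTRATIVE = {4, 12}
CONTENT_MANAGEMENT = {14, 100}


def classify_namespace(ns: int) -> str:
    """Return a coarse label for a namespace number (hypothetical helper)."""
    if ns == 0:
        return "content"
    if ns in ADMINISTRATIVE:
        return "administrative"
    if ns in CONTENT_MANAGEMENT:
        return "content management"
    return "other"
```

Any namespace not discussed in this section falls into the "other" label, so the sketch makes no claim about how those should be treated.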

Areas of Wikipedia Research

As noted above, many aspects of the Wikipedia community and its content have been extensively researched in order to explain its success in many dimensions. However, administrative pages are not included in the broad literature reviews of research on Wikipedia content, editor activity, or readership (Okoli et al. 2012; Okoli et al. 2014; Mesgari et al. 2015).

The following are some of the most prominent areas in Wikipedia research. They are also areas that could benefit from dedicating more attention to administrative pages:

Topical analysis (Halavais and Lackaff, 2008; Kittur et al. 2009).

Content gaps (Miquel-Ribé and Laniado, 2021; Redi et al., 2020).

Content biases and diversity (Callahan and Herring, 2011; Beytía and Wagner, 2022).

Contextualization and cross-language analysis (Hecht and Gergle, 2010; Hecht, 2013; Samoilenko et al. 2016).

Content quality models (Lewoniewski et al. 2019; Halfaker and Geiger, 2020).

Content readability and style (Setia et al. 2021; Liu et al. 2021).

Participation and inequality (Ortega et al. 2008; Shaw and Hargittai, 2018).

Cooperation and peer-to-peer communication (Wilkinson and Huberman, 2007; Kittur and Kraut, 2008).

Readership (Warncke-Wang et al. 2015; Lehmann et al. 2014).

Vandalism (Shachaf and Hara, 2010; Adler et al. 2011).

Most of these areas have focused mainly on content (articles) and have only occasionally used administrative pages for a broader view of a subject. However, we believe that a more in-depth study of admin pages is required. Administrative pages are not only the gateway for newcomers, but also have the potential to explain community health and engagement.

Case for a Systematic Analysis of Administrative Pages

While the research community pays a lot of attention to content gaps (on gender, culture, and geography, among others), we do not know which policies or help pages have been created across the more than 300 Wikipedia language editions and which are missing (i.e., this is also a content gap).
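The notion of a missing policy or help page can be made concrete as a set difference between two language editions. The sketch below is only an illustration under a strong assumption: that pages in different editions have already been matched to a shared identifier. The function `missing_pages` and the identifiers in the example are made up and do not correspond to real pages.

```python
def missing_pages(reference: set[str], other: set[str]) -> set[str]:
    """Pages present in the reference edition but absent from the other one."""
    return reference - other


# Made-up identifiers standing in for pages matched across editions.
edition_a = {"policy-1", "policy-2", "help-1"}
edition_b = {"policy-1", "help-1", "help-2"}

# Pages that edition_b lacks relative to edition_a.
gap = missing_pages(edition_a, edition_b)
```

The comparison is directional: swapping the arguments gives the pages that the first edition lacks relative to the second.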

We argue that a systematic understanding of the different types of admin pages across languages would enable recommending key areas to work on, as well as changes to include more, and more diverse, editors in the editing process.

In this research project, we want to identify the different types of admin pages and create a dataset that qualifies every page along with valuable metrics to understand its completeness, popularity, engagement, and inclusion. By providing analytics on the study of administrative pages, we may be able to open new avenues of research and at the same time raise awareness among Wikipedians on how to improve them.
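As a rough idea of what one row of such a dataset could look like, here is a minimal Python sketch. The class `AdminPageRecord` and all of its field names are hypothetical; the four metrics are named after the dimensions listed above, but how each would be measured is not defined here, so they default to `None`.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AdminPageRecord:
    """One administrative page with placeholder metrics (hypothetical schema)."""
    language: str    # language edition code, e.g. "en"
    namespace: int   # e.g. 4 (Wikipedia) or 12 (Help)
    title: str
    page_type: str   # label for the type of admin page (assumed free text)
    # Metric definitions are left open; None means "not yet computed".
    completeness: Optional[float] = None
    popularity: Optional[float] = None
    engagement: Optional[float] = None
    inclusion: Optional[float] = None


record = AdminPageRecord(
    language="en",
    namespace=4,
    title="Wikipedia:Village pump",
    page_type="discussion forum",
)
row = asdict(record)  # plain dict, convenient for writing to a table or file
```

Keeping the metrics optional lets page types be assigned first and the measurements filled in later.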


References

  • Butler, B., Joyce, E., & Pike, J. (2008, April). Don't look now, but we've created a bureaucracy: the nature and roles of policies and rules in wikipedia. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1101-1110).
  • Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2013). The rise and decline of an open collaboration system: How Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist, 57(5), 664-688.
  • Hwang, S., & Shaw, A. (2022, May). Rules and Rule-Making in the Five Largest Wikipedias. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 16, pp. 347-357).
  • Gluza, W., Turaj, I., & Meier, F. (2021, September). Wikipedia Edit-a-thons and Editor Experience: Lessons from a Participatory Observation. In 17th International Symposium on Open Collaboration (pp. 1-9).
  • Morgan, J. T., & Zachry, M. (2010, November). Negotiating with angry mastodons: the wikipedia policy environment as genre ecology. In Proceedings of the 16th ACM international conference on Supporting group work (pp. 165-168).
  • Reagle Jr, J. M. (2010). “Be Nice”: Wikipedia norms for supportive communication. New Review of Hypermedia and Multimedia, 16(1-2), 161-180.
  • Morgan, J. T., Gilbert, M., McDonald, D. W., & Zachry, M. (2013, August). Project talk: Coordination work and group membership in WikiProjects. In Proceedings of the 9th International Symposium on Open Collaboration (pp. 1-10).
  • Ng, E. M. (2016). Fostering pre-service teachers' self-regulated learning through self-and peer assessment of wiki projects. Computers & Education, 98, 180-191.
  • Morgan, J. T., & Halfaker, A. (2018, August). Evaluating the impact of the Wikipedia Teahouse on newcomer socialization and retention. In Proceedings of the 14th International Symposium on Open Collaboration (pp. 1-7).
  • Mesgari, M., Okoli, C., Mehdi, M., Nielsen, F. Å., & Lanamäki, A. (2015). “The sum of all human knowledge”: A systematic review of scholarly research on the content of Wikipedia. Journal of the Association for Information Science and Technology, 66(2), 219-245.
  • Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2014). Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership. Journal of the Association for Information Science and Technology, 65(12), 2381-2403.
  • Okoli, C., Mehdi, M., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2012). The people’s encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia. Available at SSRN 2021326.
  • Miquel-Ribé, M., & Laniado, D. (2021). The Wikipedia Diversity Observatory: helping communities to bridge content gaps through interactive interfaces. Journal of Internet Services and Applications, 12(1), 1-25.
  • Redi, M., Gerlach, M., Johnson, I., Morgan, J., & Zia, L. (2020). A taxonomy of knowledge gaps for wikimedia projects (second draft). arXiv preprint arXiv:2008.12314.
  • Halavais, A., & Lackaff, D. (2008). An analysis of topical coverage of Wikipedia. Journal of computer-mediated communication, 13(2), 429-440.
  • Kittur, A., Chi, E. H., & Suh, B. (2009, April). What's in Wikipedia? Mapping topics and conflict using socially annotated category structure. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 1509-1512).
  • Hecht, B., & Gergle, D. (2010, April). The tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 291-300).
  • Hecht, B. J. (2013). The mining and application of diverse cultural perspectives in user-generated content (Doctoral dissertation, Northwestern University).
  • Samoilenko, A., Karimi, F., Edler, D., Kunegis, J., & Strohmaier, M. (2016). Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity. EPJ data science, 5, 1-20.
  • Warncke-Wang, M., Ranjan, V., Terveen, L., & Hecht, B. (2015). Misalignment between supply and demand of quality content in peer production communities. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 9, No. 1, pp. 493-502).
  • Lewoniewski, W., Węcel, K., & Abramowicz, W. (2019). Multilingual ranking of Wikipedia articles with quality and popularity assessment in different topics. Computers, 8(3), 60.
  • Halfaker, A., & Geiger, R. S. (2020). Ores: Lowering barriers with participatory machine learning in wikipedia. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2), 1-37.
  • Setia, S., Iyengar, S. R. S., Verma, A. A., & Dubey, N. (2021, September). Is Wikipedia easy to understand?: a study beyond conventional readability metrics. In International Conference on Computational Collective Intelligence (pp. 175-187). Springer, Cham.
  • Liu, Y., Medlar, A., & Glowacka, D. (2021, July). Can Language Models Identify Wikipedia Articles with Readability and Style Issues?. In Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval (pp. 113-117).
  • Beytía Reyes, P., & Wagner, C. (2022). Visibility layers: A framework for systematising the gender gap in Wikipedia content. Internet Policy Review, 11(1), 1-22.
  • Callahan, E. S., & Herring, S. C. (2011). Cultural bias in Wikipedia content on famous persons. Journal of the American society for information science and technology, 62(10), 1899-1915.
  • Shaw, A., & Hargittai, E. (2018). The pipeline of online participation inequalities: The case of Wikipedia editing. Journal of communication, 68(1), 143-168.
  • Ortega, F., Gonzalez-Barahona, J. M., & Robles, G. (2008, January). On the inequality of contributions to Wikipedia. In Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008) (pp. 304-304). IEEE.
  • Wilkinson, D. M., & Huberman, B. A. (2007, October). Cooperation and quality in wikipedia. In Proceedings of the 2007 international symposium on Wikis (pp. 157-164).
  • Kittur, A., & Kraut, R. E. (2008, November). Harnessing the wisdom of crowds in wikipedia: quality through coordination. In Proceedings of the 2008 ACM conference on Computer supported cooperative work (pp. 37-46).
  • Halfaker, A., Keyes, O., & Taraborelli, D. (2013, February). Making peripheral participation legitimate: reader engagement experiments in wikipedia. In Proceedings of the 2013 conference on Computer supported cooperative work (pp. 849-860).
  • Lehmann, J., Müller-Birn, C., Laniado, D., Lalmas, M., & Kaltenbrunner, A. (2014, September). Reader preferences and behavior on Wikipedia. In Proceedings of the 25th ACM conference on Hypertext and social media (pp. 88-97).
  • Shachaf, P., & Hara, N. (2010). Beyond vandalism: Wikipedia trolls. Journal of Information Science, 36(3), 357-370.
  • Adler, B. T., Alfaro, L. D., Mola-Velasco, S. M., Rosso, P., & West, A. G. (2011, February). Wikipedia vandalism detection: Combining natural language, metadata, and reputation features. In International Conference on Intelligent Text Processing and Computational Linguistics (pp. 277-288).