Wikimedia Research Newsletter

Vol: 3 • Issue: 7 • July 2013

Napoleon, Michael Jackson and Srebrenica across cultures, 90% of Wikipedia better than Britannica, WikiSym preview

With contributions by: Taha Yasseri, Han-Teng Liao, Piotr Konieczny, Jonathan Morgan and Tilman Bayer

Multilingual ranking analysis: Napoleon and Michael Jackson as Wikipedia's "global heroes"

An arXiv preprint titled "Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles"[1], authored by a group of physicists from France, examines the Wikipedia articles on individuals and their position in the hyperlink network of the articles in each Wikipedia language edition. Nine language editions are studied. The authors try to locate the most "important" individuals ("heroes") in each language edition by calculating ranking scores: PageRank and CheiRank, as well as 2DRank, which combines the two. After the lists of the highest-ranked individuals in each language edition are compiled (with 30 individuals in each list), overlaps between the lists are investigated, and local and global "heroes" are introduced. The lists of "global heroes" are topped by Napoleon for PageRank, and Michael Jackson for 2DRank. It is shown that both local and global heroes exist, and that while global heroes gain their central position in the network due to links from multiple other central nodes, local heroes are mostly notable because of the large number of links pointing directly to them. Finally, based on the nationality (language of origin) of the highly ranked individuals, a network of languages is constructed, and the position of each language in this network is analysed by calculating rank scores. The authors also analyzed the activities of those important individuals, and found politicians and scientists to be quite often among the most important ones.
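To illustrate the two basic scores mentioned above, here is a minimal sketch in Python. It is not the authors' code: the toy link graph, the page names and the damping factor of 0.85 are assumptions made for illustration only. PageRank rewards pages that receive links from other well-ranked pages, while CheiRank is PageRank computed on the same network with all link directions inverted, so it rewards pages with many outgoing links. 2DRank is not covered here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its score over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def reverse_links(links):
    """Return the same graph with every link direction inverted."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return reversed_links


def cheirank(links, damping=0.85, iterations=100):
    """CheiRank: PageRank of the link-inverted network."""
    return pagerank(reverse_links(links), damping, iterations)


# Hypothetical toy graph: "Hub" links out to everything, "Star" is linked by everything.
toy_links = {
    "Hub": ["A", "B", "C", "Star"],
    "A": ["Star"],
    "B": ["Star"],
    "C": ["Star"],
    "Star": ["A"],
}

pr = pagerank(toy_links)
cr = cheirank(toy_links)
print("Top by PageRank:", max(pr, key=pr.get))
print("Top by CheiRank:", max(cr, key=cr.get))
```

In this toy graph the heavily linked-to page ("Star") leads the PageRank list, while the page with the most outgoing links ("Hub") leads the CheiRank list, which is the kind of contrast the two scores are meant to capture.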

Wikipedia as Cultural Reference: Srebrenica Massacre, Art and Menstruation

Art: Concept-sharing relationship between eight selected language versions of Wikipedia (from the DMI Summer School 2013)
Editor's note: the contributing editor of this section, Han-Teng Liao, participated in the DMI Summer School 2013, but is not affiliated with the DMI or the University of Amsterdam.

The chapter "Wikipedia as Cultural Reference" in Richard A. Rogers' book "Digital Methods"[2] can be read as an example of "digital methods" applied to Wikipedia, or as a contribution to the emerging literature comparing the same or similar encyclopedia articles across language versions and cultures in the global Wikipedia projects. Not to be confused with "big methods", "virtual methods", etc.[3], the Digital Methods Initiative (DMI) is a school of Internet researchers at the University of Amsterdam, led by Rogers, that aims to 'create a platform to display the tools and methods to perform research that ... take advantage of "web epistemology"'. The DMI has so far built some basic Wikipedia research tools that help social scientists analyze cross-lingual images, anonymous edits, tables of contents, etc. Thus, as part of Rogers' research agenda of advocating "digital methods", the Wikipedia projects become both a data set and a set of analytical devices that can be repurposed for social research: "as a cultural reference, a vigilant community, a scandal machine and a controversy diagnostic machine"[4].

Self-described as "cultural research with Wikipedia", the chapter compares the Srebrenica articles (The Fall of Srebrenica, the Srebrenica Massacre, and the Srebrenica Genocide) across six language versions: Dutch, English, Bosnian, Croatian, Serbian, and Serbo-Croatian. Using various kinds of data, ranging from creation dates, edits by interlanguage article editors and top ten editors, and numbers of victims to tables of contents, referenced websites and images used, the findings show that the principle of neutral point of view does not automatically make Wikipedia articles universal (or at least similar) across language versions. The differences, especially those specific to the wiki medium, can be used for cultural analysis of the selected topics. The resulting content is found to reflect the dynamics among power editors defending their sources and content using Wikipedia policies. Among these "umbrella articles", the English version is highly contested among many interlanguage editors, while the Serbo-Croatian version, maintained by very few editors, is much softer and more unifying in tone.

A visualisation of the Wikipedia-related images on menstruation articles across different language editions (from the DMI Summer School 2013)

Adopting and extending the digital methods, two groups of participants at the DMI Summer School 2013 examined the cross-language-version differences on two topics: art and menstruation. The "Cross Lingual Art Spaces on Wikipedia" project (by Sangeet Kumar, Garance Coggins, Sarah Mc Monagle, Stephan Schlögl, Han-Teng Liao, Michael Stevenson, Federica Bardelli, and Anat Ben-David) sought to find the universal and specific articulations of the concept of art through (1) images and (2) concepts (i.e. strongly related articles), producing an image network visualization for 154 language versions and a concept network visualization for eight selected language versions. A Wikidata scraping tool was developed to identify different names for the same content, a process called "concept reference disambiguation".

The second project, "Menstruation Across Cultures Online" (by Astrid Bigoni, Loes Bogers, Zuzana Karascakova, Emily Stacey and Sarah Mc Monagle) looked at the cultural differences of Wikipedia images and Google autocomplete suggestions to find associated images and search queries. In addition, the English version of the article on menstruation was compared with other English-language sources such as Urban Dictionary and Twitter, producing an interesting cross-platform comparative tag cloud. While not full research articles, the research outcomes of the two projects nonetheless demonstrated the potential directions for cross-cultural and cross-platform comparison, when Wikipedia projects are compared among themselves or with other online platforms that contain user-generated content and/or activities.

Decline of adminship candidatures on Polish Wikipedia

A conference paper titled "Does the Acquaintance Relation Close up the Administrator Community of Polish Wikipedia?"[5] investigates why the Polish Wikipedia community of administrators is growing more slowly than expected, as measured by a decrease in successful RfAs. The paper presents a useful literature review of related academic work on RfA, and is a welcome study of the under-researched population of editors at non-English Wikipedias. It seems to focus on the computer science dimension, with a well-developed statistics section but little theoretical discussion. In this reviewer's opinion, it would have been stronger if the authors had engaged with more social science theory, such as the iron law of oligarchy.

The authors suggest at first that such a decline may occur because administrators are chosen on the basis of acquaintance, thus creating a closed group which people lacking the right connections cannot join. Later, they conclude that this is unlikely, instead pointing to growing expectations about new candidates. Both of those would be valid hypotheses, but neither is clearly tied to any theory or previous study. The authors' analysis of the data is problematic; at one point they contradict themselves, noting that "[One of the observed phenomena] could indicate, however, that the community is closing up after all", although their conclusion later states: "Our conclusion is that it cannot be claimed with certainty that the Polish Wikipedia community is closing up."

The authors also misunderstand how the WP:RFA process works on English Wikipedia, noting that one of the key differences between Polish and English Wikipedia is voting, as in "in the case of English version of Wikipedia, new administrators are elected not by voting, but by discussion". That the authors are ready to take such policy claims at face value does cast a little doubt on the applicability of their findings.

Overall, the paper presents some interesting statistical data on trends in an understudied community, and contributes to our understanding of the governance of Wikipedia. The analysis of the collected data is, however, rather lacking, particularly because of its weak ties to the literature on leadership, volunteer motivation and related social science areas.

90% of Wikipedia articles have "equivalent or better quality than their Britannica counterparts" in blind expert review

A Portuguese-language dissertation at the Universidade de Évora, titled "Colaboração em Massa ou Amadorismo em Massa?" ("Mass collaboration or mass amateurism?")[6] compared the quality of English Wikipedia with that of Encyclopaedia Britannica. As summarized in English on the author's blog, a representative random sample of 245 article pairs from both encyclopedias was generated, and "reformatted to hide [their] source and then graded by an expert in its subject area using a five-point scale. We asked experts to concentrate only on some [...] intrinsic aspects of the articles' quality, namely accuracy and objectivity, and discard the contextual, representational and accessibility aspects. Whenever possible, the experts invited to participate in the study are University teachers, because they are used to grading students' work not using the reputation of the source." They rated "90% of the Wikipedia articles ... as having equivalent or better quality than their Britannica counterparts".

First WikiSym 2013 papers available

The annual WikiSym research conference is taking place in Hong Kong from August 5 to 7. Since June, the organizers have been featuring the abstracts of the conference's papers on the conference blog, with online publication of full texts planned for August 5. But several authors have already made their papers available elsewhere:

  • Barnstars: "A Preliminary Study on the Effects of Barnstars on Wikipedia Editing"[7] analyzed 21,299 barnstars awarded to 14,074 editors on the English Wikipedia, and found that users tended to be less active in article editing after receiving or presenting barnstars. Although there has been previous research questioning the effectiveness of barnstars, the authors here stop short of concluding that barnstars don't work, but instead hypothesize that the observed effect may be simply because an editor's high activity period "subsequently catches the attention of other editors, who are then more likely to reward them with barnstars."
  • News coverage on Wikipedia and Wikinews: Researcher Brian Keegan, who has published various research papers on how Wikipedia editors cover breaking news events, uses[8] sociologist Thomas Gieryn's concept of boundary-work to explore "how Wikipedia's response to the 9/11 attacks expanded the role of the encyclopedia to include newswork" in the early years of the project, and describes the "failure of Wikinews", which according to the author "illustrates the pitfalls of misappropriating professional newswork norms as well as the challenges of sustaining online communities."
  • Software library for analyzing collaboration networks on Wikipedia: "Analyzing Multi-Dimensional Networks within MediaWikis"[9] presents a software library for analyzing "a variety of relationships about the content, history, and editors of its articles such as hyperlinks between articles, discussions among editors, and editing histories", using NodeXL.
  • "An Actionable Quality Model for Wikipedia": Co-authored by the late John Riedl (see "Briefly" section), this paper contains both an overview of existing efforts to assess article quality on Wikipedia and a proposal for a new "simple model of article quality with actionable features".[10]
  • "Temporal Analysis of OpenStreetMap users activity": Taha Yasseri from the Oxford Internet Institute, who already has a paper on Circadian and Weekly Patterns of Wikipedia Editorial Activity, together with Giovanni Quattrone and Afra Mashhadi from University College London, studied the temporal patterns of user activity on OpenStreetMap, the wiki-based collaborative mapping project. By applying Principal Component Analysis, they show how the pattern of editing has been changing over the years, most likely due to the increased use of mobile devices by the mappers. The study compares mappers in London and Rome, showing a faster change in London than in Rome.[11]

Survey participation bias analysis: More Wikipedia editors are female, married or parents than previously assumed

The fact that Wikipedia's editing community has a huge gender gap (with vastly more male than female editors contributing to the encyclopedia) was first brought to wider attention by a 2008 survey of Wikipedia readers and editors, whose results were published by UNU-MERIT and the Wikimedia Foundation in 2010. It found that only 17.8% of US-based editors were female, and 12.7% globally. As reported in the Signpost at the time, some concerns were voiced about the possible impact of participation bias on the results (an effect which is frequent in volunteer web surveys), for example because the survey had also found a gender gap in Wikipedia readers (39.9% female in the US), in contrast to other research which estimated the gender ratio among readers closer to 50%.

A new PLoS ONE paper titled "The Wikipedia Gender Gap Revisited: Characterizing Survey Response Bias with Propensity Score Estimation"[12] has made it possible for the first time to quantify this participation bias for the subset of US-based editors. Using a method for propensity adjustment for web surveys first published in a 2011 statistical paper, the authors compare the 2008 survey with Pew Research data from around the same time, which is assumed to be free of the same kind of bias because it was based on a different methodology (a phone survey), and which had found 49.0% of US Wikipedia readers to be female. The authors write: "We estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%)." Likewise, they find evidence that the proportion of editors who are "married, or parents, [had] been underestimated, while the proportions of immigrants and students [had] been overestimated."
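The "27.5% higher" and "26.8% higher" figures quoted above are relative increases, not differences in percentage points. A short check in Python, using only the percentages quoted from the paper (the function name is a hypothetical helper introduced for this illustration):

```python
def relative_increase(adjusted, original):
    """Relative change from `original` to `adjusted`, in percent."""
    return (adjusted - original) / original * 100.0


# Percentages of female editors as quoted above: (adjusted estimate, original survey).
us_adjusted, us_original = 22.7, 17.8
all_adjusted, all_original = 16.1, 12.7

# Relative increase: how much larger the adjusted share is than the original one.
print(round(relative_increase(us_adjusted, us_original), 1))    # 27.5
print(round(relative_increase(all_adjusted, all_original), 1))  # 26.8

# Absolute difference in percentage points, for comparison.
print(round(us_adjusted - us_original, 1))    # 4.9
print(round(all_adjusted - all_original, 1))  # 3.4
```

In other words, the adjusted estimate for US editors is 4.9 percentage points above the original survey figure, which corresponds to a share that is about 27.5% larger.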

The authors emphasize that their results do not negate the existence of the gender gap in general ("the basic takeaways in regards to the underrepresentation of women in the WMF/UNU-MERIT survey remain intact"), and actually call for "the Wikimedia Foundation's strategic goal to increase female editorship to 25% [...] to be raised in light of these adjusted estimates." They observe that their method is not applicable to the three subsequent editor surveys conducted by the Wikimedia Foundation in 2011/12 (the most recent one by this reviewer), because those surveys focused solely on editors, and therefore the necessary reader comparison data (e.g. the data from Pew Research surveys) is not available. Still, the paper's results will definitely have a positive impact on the research efforts by the Foundation and others to better understand the demographics of the Wikipedia editing community.


Briefly

Art: Image-sharing relationship between 154 language versions of Wikipedia (from the DMI Summer School 2013)
  • "Researching collaboration for a better world: John T. Riedl (1962–2013)": A blog post by Dario Taraborelli in memory of computer scientist John Riedl and his numerous contributions to the understanding of Wikipedia, ranging from the development of SuggestBot to research on vandalism, deletion, quality control, editor retention and the gender gap.[13]
  • "Coordination and Learning in Wikipedia: Revisiting the dynamics of exploitation and exploration": An academic paper published for researchers of the sociology of organizations, under the volume topic of "Managing 'Human Resources' by Exploiting and Exploring People's Potentials", applies the learning theory of the exploration vs. exploitation trade-off to understand the evolution of Wikipedia.[14] The authors identify three periods in the evolution of Wikipedia: (i) the establishment/take-off period from 2001 to 2002, (ii) the growth/consolidation period from 2003 to 2006, and (iii) the maturation/sustainability period from 2007 onwards.
  • Overview of Wikipedia and other online encyclopedias in China: An academic blog post[15] shares research materials for journalists covering the Wikimania 2013 and WikiSym + OpenSym 2013 events to be held in Hong Kong. It provides up-to-date information on Chinese-language user-generated content and online encyclopedias.
  • "Peer production online community infrastructures": An academic conference paper[16] that examines the role of centralized and decentralized governance and platform architectures in determining a social software system's excludability: the degree to which users can control who contributes to or consumes the system's resources. Closed-source software platforms like Facebook and Twitter are the most excludable. Users have no control over the design of the platform and no ownership of the content, and the system owners have the right and the power to arbitrarily censor content or block contributors. Free software platforms like Kune allow both decentralized architecture and decentralized governance: they can be hosted anywhere, and users themselves can decide how the platform and its content are used. Peer-to-peer network services, especially darknets, are the least excludable. These services are decentralized and anonymous, so users potentially have more privacy and information security, but these features also facilitate their use in criminal activity. Wikipedia exists somewhere in the middle: the use of the CC BY-SA license for content, and community-created policies for governance, reduce excludability. But the Wikimedia Foundation's ownership of the production servers (along with the technical power vested in administrators) makes Wikipedia's architecture and governance more centralized, introducing a degree of excludability.
  • Education Program case study: A paper titled "Wikipedia as a Tool for Teaching Policy Analysis and Improving Public Policy Content Online"[17] shares project objectives and lessons learned from having a class at the Trachtenberg School of Public Policy and Public Administration at George Washington University participate in a Wikipedia writing assignment as part of the Wikimedia Foundation's 2010/11 Public Policy project.
  • "Digital citizens" in the classroom: Similarly, a conference paper titled "Becoming Digital Citizens: Using Wikipedia to Enhance the Classroom"[18] describes the outcome of one course participating in the Wikipedia Education Program, including a small survey among participating students (10 respondents). Another paper about the Education Program appeared in First Monday recently[19].
  • Using P2P techniques to support Wikipedia hosting: According to a simulation by two German computer scientists[20], the Wikimedia Foundation "can reduce the traffic needed for article lookups in case of Wikipedia up to 72%" by having participants in a P2P network store and serve some articles from their machines, while still also serving them from a central installation (cloud).
  • Dissertation about vandalism: A dissertation titled "Damage detection and mitigation in open collaboration applications"[21] examines the subject of vandalism on Wikipedia. The author is well known to Wikipedians as the programmer of the widely used "STiki" vandalism-fighting tool, and for conducting a controversial vandalism experiment himself in 2010.
  • Maintenance tag analysis: A thesis titled "Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia" [22] examines the use of cleanup tags on the English Wikipedia. Some of the author's previous work was covered earlier in this newsletter (e.g. "{{Citation needed}} more effective than {{unreferenced}}").
  • Wikibooks case study: In "The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability"[23], a group of researchers (including Wikimedian Magnus Manske) describe their use of Wikibooks as a platform to write a handbook about Next Generation Sequencing (NGS). Another paper titled "Analysis of Existing Technological Platforms for the Collaborative Production of Open Textbooks"[24] contains a summary of the advantages and drawbacks of Wikibooks compared to similar platforms.
  • Scraping Wikipedia tables: A conference paper describes "Methods for Exploring and Mining Tables on Wikipedia"[25], with an online demo available.
  • Wiktionary and OmegaWiki compared: A paper analyzing the usefulness of Wiktionary and OmegaWiki for translation applications[26] summarizes the differences between the two platforms as follows: "While the openness and flexibility of Wiktionary has attracted many users, leading to a resource of considerable size and richness, the non-standardized structure of entries also leads to difficulties in the integration into translation applications. OmegaWiki, on the other hand, does not suffer from this problem, but the self-imposed limitations to maintain integrity also constrain its expressiveness and, along with that, the range of information which can be represented in the resource." The authors propose a method for using both at the same time, by automatically aligning the two resources at the level of word senses with good precision. This yields a substantial increase of coverage, especially concerning available translations.
  • Quoting Wikipedians in research papers: try to ask them: On the group blog "Ethnography Matters",[27] researcher Heather Ford explored the ethical dilemma of how to quote online statements by members of collaborative communities such as Wikipedia in research papers: anonymously or by name? Ford arrives at the conclusion that "For now ... I'll use my best efforts to contact those whose statements and conversations on Wikipedia I want to quote. More generally, I'm going to continue to talk to Wikipedians about what they think about these issues."
  • "Algorithmic governance" of Wikipedia: A conference paper titled "Work-to-Rule: The Emergence of Algorithmic Governance in Wikipedia" [28] "collected qualitative and quantitative data from Wikipedia in order to show how a community's consensus gradually converts social mechanisms into algorithmic mechanisms".


  1. Young-Ho Eom, Dima L. Shepelyansky: Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles
  2. Rogers, Richard A. (2013). "Wikipedia as Cultural Reference". Digital methods. Cambridge, Massachusetts; London: The MIT Press. pp. 165–202. ISBN 9780262018838. (Note: a previous version of this chapter is freely accessible as a conference paper for the Wikipedia Academy Deutschland 2012.)
  3. For the five methodological views on the implications of digitization for social research, see Marres, Noortje (2012). "The redistribution of methods: on intervention in digital social research, broadly conceived". The Sociological Review 60: 139–165. ISSN 1467-954X. doi:10.1111/j.1467-954X.2012.02121.x. Retrieved 2013-07-28.  (Note. A pdf file can be accessed via the author's university website.)
  4. See a slideshow for the DMI 2013 summer school by Erik Borra on Repurposing Wikipedia
  5. Justyna Spychała, Piotr Turek, Mateusz Adamczyk: "Does the Acquaintance Relation Close up the Administrator Community of Polish Wikipedia? Analysing Polish Wikipedia Administrator Community with use of Multidimensional Behavioural Social Network"
  6. Fernando Silvério Nifrário Rodrigues: Colaboração em Massa ou Amadorismo em Massa? Um Estudo Comparativo da Qualidade da Informação Científica Produzida Utilizando os Conceitos e Ferramentas Wiki. Universidade de Évora, 2012 English synopsis
  7. Kwan Hui Lim, Amitava Datta and Michael Wise: A Preliminary Study on the Effects of Barnstars on Wikipedia Editing. PDF WikiSym '13, Aug 05-07 2013, Hong Kong
  8. Brian C. Keegan: A History of Newswork on Wikipedia: WikiSym '13, Aug 05-07 2013, Hong Kong
  9. Brian C. Keegan, Arber Ceni, Marc A. Smith: "Analyzing Multi-Dimensional Networks within MediaWikis". PDF WikiSym '13, Aug 05-07 2013, Hong Kong
  10. Morten Warncke-Wang, Dan Cosley, John Riedl: Tell Me More: An Actionable Quality Model for Wikipedia. WikiSym '13, Aug 05-07 2013, Hong Kong PDF
  11. Taha Yasseri, Giovanni Quattrone, Afra Mashhadi: Temporal Analysis of Activity Patterns of Editors in Collaborative Mapping Project of OpenStreetMap. WikiSym '13, Aug 05-07 2013, Hong Kong PDF
  12. Benjamin Mako Hill, Aaron Shaw: "The Wikipedia Gender Gap Revisited: Characterizing Survey Response Bias with Propensity Score Estimation" PLoS ONE Volume: 8, Issue: 6, DOI:10.1371/journal.pone.0065782
  13. Taraborelli, Dario (July 30, 2013). "Researching collaboration for a better world: John T. Riedl (1962 – 2013)". Wikimedia Blog. Retrieved July 31, 2013. 
  14. Aaltonen, Aleksi; Kallinikos, Jannis (2012). "Coordination and Learning in Wikipedia: Revisiting the dynamics of exploitation and exploration" (PDF). Research in the Sociology of Organizations (Emerald Group Publishing Limited). Retrieved July 31, 2013. 
  15. Liao, Han-Teng (July 30, 2013). "Chinese conditions on user-generated content and online encyclopedias: press-friendly background materials". Oxford Internet Institute Blog. Retrieved July 31, 2013. 
  16. De Rosnay, Melanie (2013). Peer production online community infrastructures. First Conference on Internet Science. The FP7 European Network of Excellence in Internet Science. Retrieved July 31, 2013.
  17. Donna Lind Infeld and William C. Adams: Wikipedia as a Tool for Teaching Policy Analysis and Improving Public Policy Content Online. Journal of Public Affairs Education / JPAE 19 (3), 445–459 (Summer 2013) PDF
  18. Sarah Hernandez, Natalie Rector: Becoming Digital Citizens: Using Wikipedia to Enhance the Classroom PDF
  19. Amy Roth, Rochelle Davis, Brian Carver: Assigning Wikipedia editing: Triangulation toward understanding university student engagement
  20. Lars Bremer and Kalman Graffi: "Symbiotic Coupling of P2P and Cloud Systems: The Wikipedia Case". In: IEEE ICC'13: Proc. of the IEEE International Conference on Communications. PDF
  21. Andrew G. West: Damage detection and mitigation in open collaboration applications. Dissertation in Computer and Information Science, University of Pennsylvania, May 2013
  22. Maik Anderka: Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia
  23. Jing-Woei Li et al.: The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability. Briefings in Bioinformatics, doi:10.1093/bib/bbt045
  24. Xavier Ochoa, Gladys Carrillo, Ana Casali, Claudia Deco, Valeria Gerling: Analysis of Existing Technological Platforms for the Collaborative Production of Open Textbooks
  25. Chandra Sekhar Bhagavatula, Thanapon Noraset, Doug Downey: Methods for Exploring and Mining Tables on Wikipedia. IDEA’13, August 11th, 2013, Chicago, IL, USA. PDF
  26. M Matuschek, CM Meyer, I Gurevych: "Multilingual Knowledge in Aligned Wiktionary and OmegaWiki for Translation Applications"
  27. Heather Ford: Onymous, pseudonymous, neither or both? Ethnography Matters blog, June 27, 2013
  28. Claudia Müller-Birn, Leonhard Dobusch, James D. Herbsleb: "Work-to-Rule: The Emergence of Algorithmic Governance in Wikipedia" C&T '13 June 29 - July 02 2013, Munich, Germany PDF
