Research:User Engagement in Wikipedia: The Influence of Cultural Identity

Mari-Carmen Marcos
Duration: 2011-11 – 2013-12
Open data project: no URL provided
This page documents a completed research project.

Key Personnel

This research is carried out by a team of researchers based at Universitat Pompeu Fabra, Barcelona, in collaboration with members of Universitat Politècnica de Catalunya. Key personnel on the project include:

  • Marc Miquel i Ribé, Universitat Pompeu Fabra (UPF)
  • Mari-Carmen Marcos, PhD, Universitat Pompeu Fabra (UPF)
  • Horacio Rodríguez, PhD, Universitat Politècnica de Catalunya (UPC)

Project Summary

The main goal of this study is to understand the influence that identification with 'local content' exerts on the User Engagement process between Wikipedia and its readers and editors.

Our hypothesis is that this new factor has a positive effect, increasing the other factors of the User Engagement process. To test it, we propose four specific goals, which together cover the relationship between the user (reader and editor) and Wikipedia:

  1. To identify and quantify the scope of content related to cultural identity, verifying and measuring its existence and characteristics in several WP language editions.
  2. To evaluate over time the influence of Cultural Identity as a UE factor on both editing/discussion community dynamics and navigation/reading.
  3. To understand how content related to Cultural Identity is a consequence both of user involvement/activity in Wikipedia and of attachment to the user's cultural background.
  4. To assess how Cultural Identity can affect the reading experience with respect to the elements of the Wikipedia article layout.
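As a rough illustration of goal 1, the scope of local content in an edition can be summarised as the share of its articles, and of the edits made to them, that belong to the local-content set. The sketch below assumes articles have already been classified; the `ArticleStats` record, its fields and the two ratios are hypothetical choices for illustration, not the project's confirmed metric.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ArticleStats:
    """Hypothetical per-article record; field names are illustrative."""
    edition: str    # language edition code, e.g. "ca"
    is_local: bool  # True if the article was classified as 'local content'
    edits: int      # number of edits in the observed period


def local_content_share(articles):
    """Return, per edition, the share of articles and of edits that are local content."""
    # Per edition: [articles, local articles, edits, edits on local articles]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for a in articles:
        t = totals[a.edition]
        t[0] += 1
        t[2] += a.edits
        if a.is_local:
            t[1] += 1
            t[3] += a.edits
    return {
        edition: {
            "article_share": local / n if n else 0.0,
            "edit_share": local_edits / edits if edits else 0.0,
        }
        for edition, (n, local, edits, local_edits) in totals.items()
    }


if __name__ == "__main__":
    sample = [
        ArticleStats("ca", True, 30),
        ArticleStats("ca", False, 10),
        ArticleStats("en", True, 5),
        ArticleStats("en", False, 45),
    ]
    print(local_content_share(sample))
```

Comparing the edit share with the article share gives a first, crude signal of whether editors concentrate disproportionate effort on local content; tracking both per month would extend this toward goal 2.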

In the computational processing, the hypothesis will be tested on sets of up to 20 languages. When focusing on a specific community, we will take the Catalan Wikipedia community as the case study.

Background Information

User Engagement (UE) is one of the most popular concepts on the Internet. It emerged once the importance of user-centered design was acknowledged, and it describes the quality of the user experience between an object and a user. It is used in both the academic and the professional spheres, with slightly different meanings. It can be applied to any technological device and is usually studied through user behavior.

One of its working frameworks integrated different psychological theories (Flow, Play and Aesthetics) and showed that engagement is composed of factors such as attention, usability, aesthetics, novelty, endurability and involvement (O’Brien, 2008). Although the framework has been used by several studies, it should be noted that the meaning of the content was generally not included as a factor.

It therefore seems undeniable that familiarity with the content and with the technological interface alters usage, and therefore the UE process. Moreover, identification with a culture (so-called cultural identity) has been described as one of the main human drives or motivations for accomplishing a goal. Identification with content can thus be seen as identification with a particular set of meanings from a culture.

Wikipedia (WP) is a free, collaboratively edited and multilingual Internet encyclopedia available in 285 languages. It is built by communities of volunteers, both registered and anonymous, who decide its content by consensus. As a result, each language edition can devote more attention to certain topics that other editions neglect. This is the case of 'local content', a set of articles covering topics related to the language or culture, such as territory, language, traditions and social dynamics.
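One possible heuristic for gathering candidate 'local content' is to walk an edition's category graph downward from a seed category (for example, the category for the territory associated with the language) and keep every article found within a few levels. This is a minimal sketch assuming the category graph has already been extracted into plain dictionaries; the function name, the toy category names and the `max_depth` cut-off are hypothetical, and the project's analytical model drew on textual, relational and quantitative structures rather than categories alone.

```python
from collections import deque


def collect_local_articles(category_children, category_articles, seed, max_depth=3):
    """Breadth-first walk down a category graph from a seed category.

    category_children: dict mapping a category to its subcategories.
    category_articles: dict mapping a category to the articles it directly contains.
    Returns the set of articles found within max_depth levels below the seed.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    articles = set()
    while queue:
        category, depth = queue.popleft()
        articles.update(category_articles.get(category, ()))
        if depth == max_depth:
            continue  # do not descend further than the cut-off
        for child in category_children.get(category, ()):
            if child not in seen:  # Wikipedia category graphs can contain cycles
                seen.add(child)
                queue.append((child, depth + 1))
    return articles


if __name__ == "__main__":
    # Toy, hypothetical category graph with a cycle back to the seed.
    children = {
        "Catalunya": ["Cultura de Catalunya", "Geografia de Catalunya"],
        "Cultura de Catalunya": ["Tradicions de Catalunya"],
        "Tradicions de Catalunya": ["Catalunya"],
    }
    members = {
        "Catalunya": ["Catalunya"],
        "Cultura de Catalunya": ["Sardana"],
        "Geografia de Catalunya": ["Montserrat"],
        "Tradicions de Catalunya": ["Castells"],
    }
    print(sorted(collect_local_articles(children, members, "Catalunya", max_depth=2)))
```

The depth cut-off matters because category trees drift away from the seed topic quickly; a shallow limit trades recall for precision.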

Along with its popularity, WP is becoming a well-researched object of study. Nonetheless, it has not been explored as a product of UE, which is a natural lens when a user can easily become a producer, and its multilingual character is often neglected, with studies considering only the English edition. Our hypothesis is that the UE process between Wikipedia and its readers/editors is also influenced by their cultural identification.

In this research we propose to map how this cultural identification is reflected in 'local content', taken as its representative set of articles, and to explore the relationship between users and WP through several studies. To this end, we divide the object of study into different spaces to obtain a holistic approach and reliable results.

Finally, the study will propose an improvement to the UE framework and disseminate its results in academia. No less important, its conclusions will be useful for the Wikipedia community, which will gain a better understanding of its members and will be able to base interface change proposals on the insights.


Data Processing (Data Analysis and Natural Language Processing)

In the first phase, Wikipedia itself was the main object of research, and it therefore had to be approached by computational means and methods. In order to draw general conclusions, we chose 20 language editions, ranging from the most edited to very small ones. We proposed an analytical model using different Wikipedia informational structures: textual, relational and quantitative. This involved all kinds of elements, such as edits, links, text and categories, among others. We used techniques such as TF-IDF, PageRank and semantic relatedness.
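For the textual structure, TF-IDF scores a term highly when it is frequent in one article but rare across the collection, which makes it a natural way to surface vocabulary characteristic of a group of articles. The following is a plain-Python sketch of the standard formulation (length-normalised term frequency times log(N / df)); it assumes articles are already tokenized into lists of words, and it does not reproduce whatever variant or preprocessing the project actually used.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count divided by the document length;
    IDF is log(N / df), where df is the number of documents containing the term.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [
        ["sardana", "dance", "catalonia"],
        ["castells", "catalonia", "tradition"],
        ["football", "club", "dance"],
    ]
    for w in tf_idf(docs):
        print(max(w, key=w.get), w)
```

Terms that occur in every document receive a weight of zero, so shared vocabulary drops out and edition- or topic-specific terms stand out.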

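On the relational side, PageRank ranks articles by their centrality in the link graph, which could be used, for example, to check whether local-content articles occupy central positions within an edition. The sketch below is a plain power-iteration implementation assuming the internal links have been extracted into a dictionary; the damping factor of 0.85 is the conventional default, not a value taken from this project.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed link graph.

    links: dict mapping each article to the list of articles it links to.
    Articles with no outgoing links (dangling) spread their rank uniformly.
    Returns a dict of ranks that sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling articles is redistributed to every article.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    toy_links = {
        "Sardana": ["Catalonia"],
        "Castells": ["Catalonia"],
        "Catalonia": ["Sardana"],
    }
    for article, score in sorted(pagerank(toy_links).items(), key=lambda kv: -kv[1]):
        print(f"{article}: {score:.3f}")
```

For graphs with millions of articles a sparse-matrix library would be the practical choice; the dictionary version only shows the update rule.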
Ethnography and Qualitative Research

In the second phase, we propose using qualitative methods to approach different Wikipedia language communities. We have already conducted two informal surveys of the Catalan community (through the Amical Viquipèdia association), which gave a good indication of a cultural motivation to collaborate on local content. However, qualitative techniques are necessary to obtain well-founded conclusions. This will involve recruitment, interview and analysis periods.

User Testing: Eye Tracking and Think Aloud Methods

In the third and final phase, we propose 'Eye Tracking' and 'Think Aloud' as mature User Experience methodologies to understand how users currently relate to 'local content'. All degrees of interest in 'local content' will need to be taken into account. This involves recruitment, testing and analysis periods. Later, it will also be possible to continue the research by proposing and using metrics to see whether interface changes alter user behavior.
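A common way to relate eye-tracking data to the article layout (goal 4) is to define areas of interest (AOIs) on the page and aggregate fixation counts and dwell time per AOI. The following is a minimal sketch assuming fixations have already been detected by the eye-tracking software and that AOIs are axis-aligned rectangles in screen pixels; the class names, AOI names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal screen position in pixels
    y: float            # vertical screen position in pixels
    duration_ms: float  # fixation duration in milliseconds


@dataclass
class AreaOfInterest:
    name: str  # e.g. "infobox", "lead" (illustrative)
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation):
        return self.left <= fixation.x < self.right and self.top <= fixation.y < self.bottom


def dwell_time_by_aoi(fixations, areas):
    """Sum fixation durations and counts per area of interest.

    A fixation is credited to the first area that contains it;
    fixations outside every area are grouped under "outside".
    """
    totals = {area.name: {"dwell_ms": 0.0, "fixations": 0} for area in areas}
    totals["outside"] = {"dwell_ms": 0.0, "fixations": 0}
    for fixation in fixations:
        name = next((a.name for a in areas if a.contains(fixation)), "outside")
        totals[name]["dwell_ms"] += fixation.duration_ms
        totals[name]["fixations"] += 1
    return totals


if __name__ == "__main__":
    areas = [
        AreaOfInterest("infobox", 800, 100, 1100, 600),
        AreaOfInterest("lead", 100, 100, 780, 300),
    ]
    fixations = [
        Fixation(900, 200, 250),
        Fixation(300, 150, 180),
        Fixation(950, 400, 120),
        Fixation(50, 50, 90),
    ]
    print(dwell_time_by_aoi(fixations, areas))
```

Comparing these per-AOI totals between participants with high and low interest in local content is one way such data could be related to the layout question.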


The research will be presented at relevant conferences, seminars and journals.

Wikimedia Policies, Ethics, and Human Subjects Protection

Our foremost priority is to conduct our research in an ethical, respectful, and non-disruptive manner. We will ensure that we conform to strict standards of informed consent and transparency in data collection methods. All participants will be informed about our affiliation, purpose and research goals. We will make every effort to address any risks associated with participation in this study.

Benefits for the Wikimedia community - Fit to Strategy

This work will help to:

  1. Propose useful guidelines for an engaging 'User Experience' in newer MediaWiki versions.
  2. Identify the motivations of editors, both registered and anonymous.
  3. Give a better understanding of Wikipedia content, its strengths and its gaps.
  4. Reinforce a new multicultural neutral point of view (MNPOV).
  5. Help in the overall goal of spreading all human knowledge to all languages.


January-March 2012

  • Develop the API for Wikipedia analysis
  • Process new data from Wikipedia languages
  • Draft report

April-June 2012

  • Data Analysis

June-September 2012

  • Data Analysis
  • Prototyping
  • Eye Tracking Testing

January-March 2013

  • Survey Creation
  • Data Analysis

April-December 2013

  • User testing
  • Data Analysis
  • Drafting, writing and publishing


At the moment, the project has no funding.


Attfield, S.; Kazai, G.; Lalmas, M. & Piwowarski, B. 2011. Towards a science of user engagement. WSDM Workshop on User Modelling for Web Applications, Hong Kong, China, 9 February 2011.

Peterson, E.T. 2006. How do you calculate engagement? Part I. Web Analytics Demystified (blog).

Peterson, E.T. & Carrabis. 2008. Measuring the immeasurable: visitor engagement. Web Analytics Demystified.

Yom-Tov, E.; Lalmas, M.; Dupret, G.; Baeza-Yates, R.; Donmez, P. & Lehmann, J. 2012. The Effect of Links on Networked User Engagement. World Wide Web Conference (WWW 2012), Lyon, France, 16-20 April 2012 (poster).

O’Brien, H. & Toms, E. 2008. What is user engagement? A conceptual framework for defining user engagement with technology. JASIST 59(6):938-955.

O’Brien, H.; Toms, E.; Kelloway, K. & Kelley, E. 2010. The development and evaluation of a survey to measure user engagement. JASIST 61(1):50-69.

O’Brien, H. 2011. Exploring user engagement in online news interactions. In Proc. JASIST.

Jennings, M. 2000. Theory and models for creating engaging and immersive e-commerce websites. ACM SIGCPR.

Lehmann, J.; Lalmas, M.; Yom-Tov, E. & Dupret, G. 2012. Models of User Engagement. 20th Conference on User Modeling, Adaptation, and Personalization (UMAP 2012), Montreal, 16-20 July 2012.

McCay-Peet, L.; Lalmas, M. & Navalpakkam, V. 2012. On Saliency, Affect and Focused Attention. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Austin, Texas, 5-10 May 2012.

Halavais, A. & Lackaff, D. 2008. An analysis of topical coverage of Wikipedia. Journal of Computer-Mediated Communication 13(2):429-440.

Hecht, B. & Gergle, D. 2010. The Tower of Babel meets web 2.0: user-generated content and its applications in a multilingual context, 291-300. ACM.

Hecht, B. & Gergle, D. 2009. Measuring self-focus bias in community-maintained knowledge repositories. In C&T ’09: Proceedings of the Fourth International Conference on Communities and Technologies, 11-20. ACM, New York, NY, USA.

Butler, B.; Joyce, E. & Pike, J. 2008. Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in Wikipedia. In CHI ’08: Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, 1101-1110. ACM, New York, NY, USA.

Ortega, F. & Gonzalez Barahona, J.M. 2007. Quantitative analysis of the Wikipedia community of users. In WikiSym ’07: Proceedings of the 2007 International Symposium on Wikis, 75-86. ACM. Montreal, Québec, Canada.

Gabrilovich, E. & Markovitch, S. 2007. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. Twentieth International Joint Conference on Artificial Intelligence (IJCAI ’07), 1606-16

Kittur, A.; Chi, E.H. & Suh, B. 2009. What’s in Wikipedia? Mapping topics and conflict using socially annotated category structure. In CHI ’09: Proceedings of the 27th International Conference on Human Factors in Computing Systems, 1509-1512. ACM. Boston, MA, USA.

Miquel, M. & Rodríguez, H. 2011. Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages. Recent Advances in Natural Language Processing (RANLP), Hissar, Bulgaria, 12-14 September 2011.

Nastase, V. & Strube, M. 2008. Decoding Wikipedia categories for knowledge acquisition. In AAAI’08: Proceedings of the 23rd National Conference on Artificial Intelligence, 1219-1224. AAAI Press. Chicago, Illinois.

Nov, O. 2007. What motivates Wikipedians? Communications of the ACM, 60-64. New York, NY, USA.

Pfeil, U.; Zaphiris, P. & Ang, C.S. 2006. Cultural Differences in Collaborative Authoring of Wikipedia. Journal of Computer-Mediated Communication 12(1).

Yang, H.-L. & Lai, C.-Y. 2010. Motivations of Wikipedia content contributors. Computers in Human Behavior 26(6).


Marc Miquel – MSc in Telecommunication and degree in Humanities - marcmiquel @