Research:Wikimedia France Research Award/papers submission

Everyone can submit any number of research papers in accordance with the following criteria.

Papers have to:

  1. have made an important impact on the understanding of the workings of Wikipedia or other free knowledge projects;
  2. have been published between January 2003 and December 2011 in peer-reviewed publications (e.g. journals or proceedings), and be of reasonable length (no PhD theses, books…). The research can be published or translated in any language, but an English translation should be available;
  3. be available in open access (can be a preprint).

Many examples can be found in Academic studies of Wikipedia, in the Wikimedia Research Newsletter archives, and in WikiLit.

Submission period: August 1 to August 7, 2012


August 9: paper submission is now closed.

Papers submission

Paper_Title

  • Publication source (the paper has to be peer-reviewed, e.g. journal or proceedings)
  • Author(s), Year
  • Summary
  • (optional) Quick assessment of paper importance. Please explain in a few words why this paper is important for the Wikimedia projects and/or free knowledge; you can use a citation index or a similar indicator
  • Link(s) to the full text

Wikipedia: community or social movement?

  • Interface. Volume 1 (2): 212 - 232 (November 2009)
  • Konieczny, Piotr, 2009
  • In recent years a new realm for study of political and sociological phenomena has appeared, the Internet, contributing to major changes in our societies during its relatively brief existence. Within cyberspace, organizations whose existence is increasingly tied to this virtual world are of interest to social scientists. This study will analyze the community of one of the largest online organizations, Wikipedia, the free encyclopedia with millions of volunteer members. Wikipedia was never meant to be a community, yet it most certainly has become one. This study asks whether it is something even more – whether it is an expression of online activism, and whether it can be seen as a social movement organization, related to one or more of the Internet-centered social movement industries (in particular, the free and open-source software movement industry).
  • The first serious consideration of whether the Wikipedia movement is a social movement
  • [1]

Wikipedia: A Key Tool for Global Public Health Promotion

  • Journal of Medical Internet Research: J Med Internet Res 2011;13(1):e14
  • Heilman JM, Kemmann E, Bonert M, Chatterjee A, Ragar B, Beards GM, Iberri DJ, Harvey M, Thomas B, Stomp W, Martone MF, Lodge DJ, Vondracek A, de Wolff JF, Liber C, Grover SC, Vickers TJ, Meskó B, Laurent MR, 2011
  • The Internet has become an important health information resource for patients and the general public. Wikipedia, a collaboratively written Web-based encyclopedia, has become the dominant online reference work. It is usually among the top results of search engine queries, including when medical information is sought. Since April 2004, editors have formed a group called WikiProject Medicine to coordinate and discuss the English-language Wikipedia’s medical content. This paper, written by members of the WikiProject Medicine, discusses the intricacies, strengths, and weaknesses of Wikipedia as a source of health information and compares it with other medical wikis. Medical professionals, their societies, patient groups, and institutions can help improve Wikipedia’s health-related entries. Several examples of partnerships already show that there is enthusiasm to strengthen Wikipedia’s biomedical content. Given its unique global reach, we believe its possibilities for use as a tool for worldwide health promotion are underestimated. We invite the medical community to join in editing Wikipedia, with the goal of providing people with free access to reliable, understandable, and up-to-date health information.
  • Although published less than a year ago, it has already been cited 19 times (Google Scholar) and tweeted over 100 times (http://www.jmir.org/stats/viewTweets/all/1589)
  • http://www.jmir.org/2011/1/e14/

Semantic Wikipedia

  • WWW '06 Proceedings of the 15th international conference on World Wide Web, Pages 585 - 594
  • Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, Rudi Studer, 2006
  • Wikipedia is the world's largest collaboratively edited source of encyclopaedic knowledge. But in spite of its utility, its contents are barely machine-interpretable. Structural knowledge, e.g. about how concepts are interrelated, can neither be formally stated nor automatically processed. Also the wealth of numerical data is only available as plain text and thus cannot be processed by its actual meaning. We provide an extension to be integrated in Wikipedia, that allows the typing of links between articles and the specification of typed data inside the articles in an easy-to-use manner. Enabling even casual users to participate in the creation of an open semantic knowledge base, Wikipedia has the chance to become a resource of semantic statements, hitherto unknown regarding size, scope, openness, and internationalisation. These semantic enhancements bring to Wikipedia benefits of today's semantic technologies: more specific ways of searching and browsing. Also, the RDF export, that gives direct access to the formalised knowledge, opens Wikipedia up to a wide range of external applications, that will be able to use it as a background knowledge base. In this paper, we present the design, implementation, and possible uses of this extension.
  • Paper Importance : 422 citations on Google Scholar, 93 on ACM, led to Semantic MediaWiki and Wikidata
  • http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.9834&rep=rep1&type=pdf

DBpedia: A Nucleus for a Web of Open Data

  • Lecture Notes in Computer Science, Volume 4825/2007
  • Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary Ives, 2007
  • DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
  • Paper Importance : 824 citations on Google Scholar
  • http://www.cis.upenn.edu/~zives/research/dbpedia.pdf

Wikipedia as Participatory Journalism: Reliable Sources?

  • Paper presented at the 5th International Symposium on Online Journalism, April 16 - 17, 2004, Austin, Texas, United States.
  • Lih, Andrew, 2004
  • Wikipedia is an Internet-based, user contributed encyclopedia that is collaboratively edited, and utilizes the wiki concept -- the idea that any user on the Internet can change any page within the Web site, even anonymously. Paradoxically, this seemingly chaotic process has created a highly regarded reference on the Internet. Wikipedia has emerged as the largest example of participatory journalism to date -- facilitating many-to-many communication among users editing articles, all working towards maintaining a neutral point of view -- Wikipedia’s mantra. This study examines the growth of Wikipedia and analyzes the crucial technologies and community policies that have enabled the project to prosper. It also analyzes Wikipedia’s articles that have been cited in the news media, and establishes a set of metrics based on established encyclopedia taxonomies and analyzes the trends in Wikipedia being used as a source.
  • Paper importance: 235 citations according to Google Scholar
  • open ; citeseer

Governance in Social Media: A case study of the Wikipedia promotion process

  • 4th Int'l AAAI Conference on Weblogs and Social Media
  • Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg, 2010
  • As a case study of social-media governance, we have investigated the Wikipedia promotion process from the perspective of the voters engaged in group decision-making. We have identified several forms of relative assessment that play an important role in how voters make decisions; these include how relative characteristics of voters and candidates affect the probability of positive votes, as well as how voters’ decisions depend on the state of the election at the time they cast their votes. We have also investigated the temporal dynamics of the elections, identifying ordering effects that contrast with standard theories of herding and information cascades. This style of analysis suggests a range of further interesting questions related to governance and deliberation. It would be interesting to connect our findings on the relative merit of voters and candidates more closely to the recent work of Burt (2009) and others on the role that relative comparison plays in social networks. We would also like to try integrating our analyses of temporal dynamics in elections with Bayesian models of information cascades (Banerjee 1992). Finally, we believe that the style of analysis used here could be productively combined with textual analysis of the content of discussions that arise as part of deliberation on social-media sites; such a hybrid of textual and structural approaches could well yield further insights.
  • Paper importance:
  • full text: [2]

He says, she says: conflict and coordination in Wikipedia

  • In Proc. SIGCHI Conf. Human factors in computing systems
  • Aniket Kittur, Bongwon Suh, Bryan A. Pendleton, Ed H. Chi, 2007
  • Throughout this paper we have presented methods to characterise conflict in Wikipedia at the global, article, and user levels. First we presented details of the growth of conflict and coordination costs at the global level across Wikipedia’s history. We then showed that conflicts at the local article level can be modeled and predicted using a machine learner. Finally, we depicted the conflicts that occur at the user level, demonstrating the use of visualization in making sense of disputes between users. We believe that the characterization on the growth of conflict and coordination costs provides insights into how a Wiki solution to collaboration in hypertext systems can scale to very large sizes, with potential implications for the study of other large groupware and organizational memory systems. The methods developed for predicting conflict from simple metrics and visualizing user conflict also present novel ways to analyze large scale online collaborative systems in which users interact to produce knowledge. Further research is needed to explore how these findings generalize to other collaborative knowledge systems.
  • Paper importance: 237 citations according to Google Scholar
  • full text: [3]

Creating, destroying, and restoring value in Wikipedia

  • GROUP 2007
  • Priedhorsky, R., Chen, J., Lam, S. K., Panciera, K., Terveen, L., & Riedl, J., 2007
  • (Abstract) Wikipedia’s brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed. Using several datasets, including recent logs of all article views, we show that frequent editors dominate what people see when they visit Wikipedia, and that this domination is increasing. Similarly, using the same impact measure, we show that the probability of a typical article view being damaged is small but increasing, and we present empirically grounded classes of damage. Finally, we make policy recommendations for Wikipedia and other wikis in light of these findings.
  • Paper importance: 161 citations according to Google Scholar
  • open ACM

Readers are not free-riders: Reading as a form of participation on Wikipedia

  • CSCW 2010
  • Antin, J., & Cheshire, C., 2010
  • Abstract: The success of Wikipedia as a large-scale collaborative effort has spurred researchers to examine the motivations and behaviors of Wikipedia’s participants. However, this research has tended to focus on active involvement rather than more common forms of participation such as reading. In this paper we argue that Wikipedia’s readers should not all be characterized as free-riders – individuals who knowingly choose to take advantage of others’ effort. Furthermore, we illustrate how readers provide a valuable service to Wikipedia. Finally, we use the notion of legitimate peripheral participation to argue that reading is a gateway activity through which newcomers learn about Wikipedia. We find support for our arguments in the results of a survey of Wikipedia usage and knowledge. Implications for future research and design are discussed.
  • Importance:
  • open ACM

Foucault@wiki: First steps towards a conceptual framework for the analysis of wiki discourses

  • WikiSym 2006
  • Pentzold, C., & Seidenglanz, S., 2006
  • Abstract: In this paper, we examine the discursive situation of Wikipedia. The primary goal is to explore principal ways of analyzing and characterizing the various forms of communicative user interaction using Foucault’s discourse theory. First, the communicative situation of Wikipedia is addressed and a list of possible forms of communication is compiled. Second, the current research on the linguistic features of Wikis, especially Wikipedia, is reviewed. Third, some key issues of Foucault’s theory are explored: the notion of ‘discourse’, the discursive formation, and the methods of archaeology and genealogy, respectively. Finally, first steps towards a qualitative discourse analysis of the English Wikipedia are elaborated. The paper argues that Wikipedia can be understood as a discursive formation that regulates and structures the production of statements. Most of the discursive regularities named by Foucault are established in the collaborative writing processes of Wikipedia, too. Moreover, the editing processes can be described in Foucault’s terms as discursive knowledge production.
  • Importance: ....
  • open ACM

Community, consensus, coercion, control: CS*W or how policy mediates mass participation

  • GROUP 2007
  • Kriplean, T., Beschastnikh, I., McDonald, D. W., & Golder, S. A., 2007
  • Abstract: When large groups cooperate, issues of conflict and control surface because of differences in perspective. Managing such diverse views is a persistent problem in cooperative group work. The Wikipedian community has responded with an evolving body of policies that provide shared principles, processes, and strategies for collaboration. We employ a grounded approach to study a sample of active talk pages and examine how policies are employed as contributors work towards consensus. Although policies help build a stronger community, we find that ambiguities in policies give rise to power plays. This lens demonstrates that support for mass collaboration must take into account policy and power.
  • Importance: ...
  • open ACM

Don't look now, but we've created a bureaucracy: The nature and roles of policies and rules in Wikipedia

  • CHI : SIGCHI conference on Human factors in computing systems
  • Butler, B., Joyce, E., & Pike, J., 2008
  • Abstract: Wikis are sites that support the development of emergent, collective infrastructures that are highly flexible and open, suggesting that the systems that use them will be egalitarian, free, and unstructured. Yet it is apparent that the flexible infrastructure of wikis allows the development and deployment of a wide range of structures. However, we find that the policies in Wikipedia and the systems and mechanisms that operate around them are multi-faceted. In this descriptive study, we draw on prior work on rules and policies in organizations to propose and apply a conceptual framework for understanding the natures and roles of policies in wikis. We conclude that wikis are capable of supporting a broader range of structures and activities than other collaborative platforms. Wikis allow for and, in fact, facilitate the creation of policies that serve a wide variety of functions.
  • Importance: ...
  • open, open author preprint, ACM

Talk Before You Type: Coordination in Wikipedia

  • HICSS 2007
  • Fernanda B. Viégas, Martin Wattenberg, Jesse Kriss, Frank van Ham, 2007
  • Abstract: Wikipedia, the online encyclopedia, has attracted attention both because of its popularity and its unconventional policy of letting anyone on the internet edit its articles. This paper describes the results of an empirical analysis of Wikipedia and discusses ways in which the Wikipedia community has evolved as it has grown. We contrast our findings with an earlier study [11] and present three main results. First, the community maintains a strong resilience to malicious editing, despite tremendous growth and high traffic. Second, the fastest growing areas of Wikipedia are devoted to coordination and organization. Finally, we focus on a particular set of pages used to coordinate work, the “Talk” pages. By manually coding the content of a subset of these pages, we find that these pages serve many purposes, notably supporting strategic planning of edits and enforcement of standard guidelines and conventions. Our results suggest that despite the potential for anarchy, the Wikipedia community places a strong emphasis on group coordination, policy, and process.
  • Paper importance: 197 citations according to Google Scholar
  • open IEEE

Us vs. Them: Understanding Social Dynamics in Wikipedia with Revert Graph Visualizations

  • VAST 2007
  • Bongwon Suh, Ed H. Chi, Bryan A. Pendleton, Aniket Kittur, 2007
  • Abstract: Wikipedia is a wiki-based encyclopedia that has become one of the most popular collaborative on-line knowledge systems. As in any large collaborative system, as Wikipedia has grown, conflicts and coordination costs have increased dramatically. Visual analytic tools provide a mechanism for addressing these issues by enabling users to more quickly and effectively make sense of the status of a collaborative environment. In this paper we describe a model for identifying patterns of conflicts in Wikipedia articles. The model relies on users’ editing history and the relationships between user edits, especially revisions that void previous edits, known as “reverts”. Based on this model, we constructed Revert Graph, a tool that visualizes the overall conflict patterns between groups of users. It enables visual analysis of opinion groups and rapid interactive exploration of those relationships via detail drill-downs. We present user patterns and case studies that show the effectiveness of these techniques, and discuss how they could generalize to other systems.
  • Importance: ...
  • open1 open2 IEEE

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

  • Proceedings of the 20th International Joint Conference on Artificial Intelligence
  • Evgeniy Gabrilovich, Shaul Markovitch, 2007
  • Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
  • Paper importance: 627 citations according to Google Scholar
  • open1, open2 citeseer

Measuring wikipedia

  • International Conference of the International Society for Scientometrics and Informetrics
  • Voss, Jakob, 2005
  • Wikipedia, an international project that uses Wiki software to collaboratively create an encyclopaedia, is becoming more and more popular. Everyone can directly edit articles and every edit is recorded. The version history of all articles is freely available and allows a multitude of examinations. This paper gives an overview on Wikipedia research. Wikipedia’s fundamental components, i.e. articles, authors, edits, and links, as well as content and quality are analysed. Possibilities of research are explored including examples and first results. Several characteristics that are found in Wikipedia, such as exponential growth and scale-free networks, are already known in other contexts. However, the Wiki architecture also possesses some intrinsic specialities. General trends are measured that are typical for all Wikipedias but vary between languages in detail.
  • Paper importance: 255 citations according to Google Scholar
  • open, E-Lis

Large-scale named entity disambiguation based on Wikipedia data

  • Proceedings of EMNLP-CoNLL
  • Silviu Cucerzan, 2007
  • This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information extracted from Wikipedia and the context of a document, as well as the agreement among the category tags associated with the candidate entities, the implemented system shows high disambiguation accuracy on both news stories and Wikipedia articles.
  • Paper importance: 289 citations according to Google Scholar
  • open

A Content-Driven Reputation System for the Wikipedia

  • Proceedings of the 16th International World Wide Web Conference
  • B. Thomas Adler, Luca de Alfaro, 2007
  • We present a content-driven reputation system for Wikipedia authors. In our system, authors gain reputation when the edits they perform to Wikipedia articles are preserved by subsequent authors, and they lose reputation when their edits are rolled back or undone in short order. Thus, author reputation is computed solely on the basis of content evolution; user-to-user comments or ratings are not used. The author reputation we compute could be used to flag new contributions from low-reputation authors, or it could be used to allow only authors with high reputation to contribute to controversial or critical pages. A reputation system for the Wikipedia could also provide an incentive for high-quality contributions.
  • Paper importance: 235 citations according to Google Scholar
  • open bepress

Harnessing the wisdom of crowds in wikipedia: quality through coordination

  • Proceedings of the 2008 ACM conference on Computer supported cooperative work
  • Aniket Kittur, Robert E. Kraut, 2008
  • Wikipedia's success is often attributed to the large numbers of contributors who improve the accuracy, completeness and clarity of articles while reducing bias. However, because of the coordination needed to write an article collaboratively, adding contributors is costly. We examined how the number of editors in Wikipedia and the coordination methods they use affect article quality. We distinguish between explicit coordination, in which editors plan the article through communication, and implicit coordination, in which a subset of editors structure the work by doing the majority of it. Adding more editors to an article improved article quality only when they used appropriate coordination techniques and was harmful when they did not. Implicit coordination through concentrating the work was more helpful when many editors contributed, but explicit coordination through communication was not. Both types of coordination improved quality more when an article was in a formative stage. These results demonstrate the critical importance of coordination in effectively harnessing the "wisdom of the crowd" in online production environments.
  • Paper importance: 179 citations according to Google Scholar
  • open

Viable wikis: struggle for life in the wikisphere

  • Proceedings of the international symposium on Wikis
  • Camille Roth, 2007
  • Wikis are collaborative platforms enabling collective elaboration of knowledge, the most famous and possibly the most successful thereof being the Wikipedia. There are currently plenty of other active open-access wikis, with varying success: some recruit many users and achieve sustainability, while others strive to attract sufficient active contributors, irrespective of the topic of the wiki. We make an exploratory investigation of some factors likely to account for these various destinies (such as distinct policies, norms, user incentives, technical and structural features), examining the demographics of a portion of the wikisphere. We underline the intertwining of population and content dynamics and emphasize the existence of different periods of development of a wiki-based community, from bootstrapping by founders with a pre-established set of rules, to more stable regimes where constant enrollment and training of new users balances out the occasional departure of more advanced users.
  • Paper importance: 3 citations according to Google Scholar
  • open

Can history be open source? Wikipedia and the future of the past

  • The Journal of American History
  • R. Rosenzweig, 2006
  • History is a deeply individualistic craft. The singly authored work is the standard for the profession; only about 6 percent of the more than 32,000 scholarly works indexed since 2000 in this journal's comprehensive bibliographic guide, “Recent Scholarship,” have more than one author. Works with several authors, common in the sciences, are even harder to find. Fewer than 500 (less than 2 percent) have three or more authors. Historical scholarship is also characterized by possessive individualism. Good professional practice (and avoiding charges of plagiarism) requires us to attribute ideas and words to specific historians; we are taught to speak of “Richard Hofstadter's status anxiety interpretation of Progressivism.” And if we use more than a limited number of words from Hofstadter, we need to send a check to his estate. To mingle Hofstadter's prose with your own and publish it would violate both copyright and professional norms. A historical work without owners and with multiple, anonymous authors is thus almost unimaginable in our professional culture. Yet, quite remarkably, that describes the online encyclopedia known as Wikipedia, which contains 3 million articles (1 million of them in English). History is probably the category encompassing the largest number of articles.
  • Paper importance: 157 citations according to Google Scholar
  • pdf open, html open: http://chnm.gmu.edu/essays-on-history-new-media/essays/?essayid=42
  • French translation of the article: http://clioweb.free.fr/debats/rosen-fr.htm

Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary

  • Proceedings of the Conference on Language Resources and Evaluation
  • Torsten Zesch, Christof Müller and Iryna Gurevych, 2008
  • Recently, collaboratively constructed resources such as Wikipedia and Wiktionary have been discovered as valuable lexical semantic knowledge bases with a high potential in diverse Natural Language Processing (NLP) tasks. Collaborative knowledge bases however significantly differ from traditional linguistic knowledge bases in various respects, and this constitutes both an asset and an impediment for research in NLP. This paper addresses one such major impediment, namely the lack of suitable programmatic access mechanisms to the knowledge stored in these large semantic knowledge bases. We present two application programming interfaces for Wikipedia and Wiktionary which are especially designed for mining the rich lexical semantic information dispersed in the knowledge bases, and provide efficient and structured access to the available knowledge. As we believe them to be of general interest to the NLP community, we have made them freely available for research purposes.
  • Paper importance: 134 citations according to Google Scholar
  • open

Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia

  • GROUP International Conference on Supporting Group Work
  • Susan Bryant, Andrea Forte, Amy Bruckman
  • An empirically derived description of how an early sample of expert Wikipedians went from being readers, to novice Wikipedians, and, eventually, experts. Uses the theoretical lenses of activity theory and legitimate peripheral participation to organize and present data about transformation of participation among editors.
  • Paper Importance : This was among the first scholarly papers to offer a theoretically informed account of how new editors first join the Wikipedia community and get better at making contributions. It has been cited by over 350 other works according to Google Scholar, but, perhaps a more important measure of impact, it has appeared on many graduate course syllabi and PhD qualifying exam reading lists in the past 6 years: Carnegie Mellon, Georgia Institute of Technology, University of Washington, University of California at Irvine, University of California at Berkeley, University of Minnesota, among others. This paper is foundational in that many contemporary Wikipedia researchers read (and perhaps critiqued!) it before they went on to become published Wikipedia researchers themselves.
  • http://andreaforte.net/BryantForteBruckBecomingWikipedian.pdf

Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History

  • Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
  • Oliver Ferschke, Torsten Zesch and Iryna Gurevych, 2011
  • We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. By using a dedicated storage format, our toolkit massively decreases the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface to access the revision data. The language-independent design allows to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia’s edit history.
  • Paper importance: 3 citations according to Google Scholar
  • open

Understanding collaboration in Wikipedia

  • First Monday, 2011, vol. 16, no. 12.
  • Kimmons, Royce.
  • Previous attempts at studying collaboration within Wikipedia have focused on simple metrics like rigor (i.e., the number of revisions in an article’s revision history) and diversity (i.e., the number of authors that have contributed to a given article) or have made generalizations about collaboration within Wikipedia based upon the content validity of a few select articles. By looking more closely at metrics associated with each extant Wikipedia article (N=3,427,236) along with all revisions (N=225,226,370), this study attempts to understand what collaboration within Wikipedia actually looks like under the surface. Findings suggest that typical Wikipedia articles are not rigorous, in a collaborative sense, and do not reflect much diversity in the construction of content and macro–structural writing, leading to the conclusion that most articles in Wikipedia are not reflective of the collaborative efforts of the community but, rather, represent the work of relatively few contributors.
  • open at First Monday

Adhocratic governance in the internet age: a case of Wikipedia

  • Journal of Information Technology & Politics, 2010, vol. 7, no. 4, pp. 263-283.
  • Konieczny, Piotr
  • In recent years, a new realm has appeared for the study of political and sociological phenomena: the Internet. This article will analyze the decision-making processes of one of the largest online communities, Wikipedia. Founded in 2001, Wikipedia—now among the top-10 most popular sites on the Internet—has succeeded in attracting and organizing millions of volunteers and creating the world's largest encyclopedia. To date, however, little study has been done of Wikipedia's governance. There is substantial confusion about its decision-making structure. The organization's governance has been compared to many decision-making and political systems—from democracy to dictatorship, from bureaucracy to anarchy. It is the purpose of this article to go beyond the earlier simplistic descriptions of Wikipedia's governance in order to advance the study of online governance, and of organizations more generally. As the evidence will show, while Wikipedia's governance shows elements common to many traditional governance models, it appears to be closest to the organizational structure known as adhocracy.
  • open at Taylor & Francis Online

WP:Clubhouse? An Exploration of Wikipedia’s Gender Imbalance

  • WikiSym 2011 (7th International Symposium on Wikis and Open Collaboration)
  • Shyong (Tony) K. Lam, Anuradha Uduwage, Zhenhua Dong, Shilad Sen, David R. Musicant, Loren Terveen, and John Riedl
  • Wikipedia has rapidly become an invaluable destination for millions of information-seeking users. However, media reports suggest an important challenge: only a small fraction of Wikipedia’s legion of volunteer editors are female. In the current work, we present a scientific exploration of the gender imbalance in the English Wikipedia’s population of editors. We look at the nature of the imbalance itself, its effects on the quality of the encyclopedia, and several conflict-related factors that may be contributing to the gender gap. Our findings confirm the presence of a large gender gap among editors and a corresponding gender-oriented disparity in the content of Wikipedia’s articles. Further, we find evidence hinting at a culture that may be resistant to female participation.
  • Winner of Best Long Paper Award at WikiSym 2011
  • PDF

Ranking of Wikipedia articles in search engines revisited: Fair ranking for reasonable quality?

  • Journal of the American Society for Information Science & Technology 62(2011)1, 117-132
  • Dirk Lewandowski, Ulrike Spree
  • This paper aims to review the fiercely discussed question of whether the ranking of Wikipedia articles in search engines is justified by the quality of the articles. After an overview of current research on information quality in Wikipedia, a summary of the extended discussion on the quality of encyclopedic entries in general is given. On this basis, a heuristic method for evaluating Wikipedia entries is developed and applied to Wikipedia articles that scored highly in a search engine retrieval effectiveness test and compared with the relevance judgment of jurors. In all search engines tested, Wikipedia results are unanimously judged better by the jurors than other results on the corresponding results position. Relevance judgments often roughly correspond with the results from the heuristic evaluation. Cases in which high relevance judgments are not in accordance with the comparatively low score from the heuristic evaluation are interpreted as an indicator of a high degree of trust in Wikipedia. One of the systemic shortcomings of Wikipedia lies in its necessarily incoherent user model. A further tuning of the suggested criteria catalog, for instance, the different weighing of the supplied criteria, could serve as a starting point for a user model differentiated evaluation of Wikipedia articles. Approved methods of quality evaluation of reference works are applied to Wikipedia articles and integrated with the question of search engine evaluation.
  • Most-downloaded current paper in JASIST
  • open at arXiv; PDF at publisher

Studying cooperation and conflict between authors with history flow visualizations

  • Proceedings of the SIGCHI conference on Human factors in computing systems (CHI '04). ACM, New York, NY, USA, 575-582.
  • Fernanda B. Viégas, Martin Wattenberg, and Kushal Dave, 2004
  • The Internet has fostered an unconventional and powerful style of collaboration: "wiki" web sites, where every visitor has the power to become an editor. In this paper we investigate the dynamics of Wikipedia, a prominent, thriving wiki. We make three contributions. First, we introduce a new exploratory data analysis tool, the history flow visualization, which is effective in revealing patterns within the wiki context and which we believe will be useful in other collaborative situations as well. Second, we discuss several collaboration patterns highlighted by this visualization tool and corroborate them with statistical analysis. Third, we discuss the implications of these patterns for the design and governance of online collaborative social spaces. We focus on the relevance of authorship, the value of community surveillance in ameliorating antisocial behavior, and how authors with competing perspectives negotiate their differences.
  • I think this paper has contributed like no other to boosting the popularity of Wikipedia research across many disciplines. The work is appealing, valuable, and easily understandable for a wide audience. It is a pioneering contribution to understanding the foundations of collaborative editing in Wikipedia at a very early stage (published in 2004). 604 citations so far, according to Google Scholar.
  • http://domino.watson.ibm.com/cambridge/research.nsf/0/53240210b04ea0eb85256f7300567f7e/$FILE/TR2004-19.pdf
  • http://web.media.mit.edu/~fviegas/papers/history_flow.pdf
  • http://opensource.mit.edu/papers/viegaswattenbergdave.pdf

What Motivates Wikipedians?

  • Communications of the ACM, 2007, Vol. 50 (11) pp.60-64.
  • Nov, Oded, 2007
  • In order to increase and enhance user-generated content contributions, it is important to understand the factors that lead people to freely share their time and knowledge with others.
  • The paper addresses one of the fundamental issues to the success and survival of Wikipedia: understanding why people contribute and how to support ongoing contribution.
  • http://faculty.poly.edu/~onov/Nov_What%20Motivates%20Wikipedians_CACM_print_version.pdf

Feedback Mechanisms and their Impact on Motivation to Contribute to Wikis in Higher Education

  • Proceedings of WikiSym '11, 215–216.
  • Athanasios Mazarakis and Clemens van Dinther, 2011
  • The success of Wikis depends very strongly on the user participation and the willingness to edit. In this paper we examine within an experiment which influence different kinds of feedback have on the motivation to edit a Wiki page. The results indicate a positive impact of feedback on the willingness to participate in the Wiki for any of the used feedback mechanisms.
  • It is important because it can effectively stop the decrease of authors in Wikipedia. Feedback mechanisms are an easy way to give system-neutral, automatic feedback. They are not meant to increase the quality of articles; instead, the purpose is to raise the participation of already registered authors with at least one edit.
  • http://www.im.uni-karlsruhe.de/Upload/Publications/e73361db-1a19-449b-a17e-312be8fc4a11.pdf

Cultural Configuration of Wikipedia: Measuring Autoreferentiality in Different Languages

  • Proceedings of RANLP '11, 316-322
  • Marc Miquel-Ribé and Horacio Rodríguez, 2011
  • Among the motivations to write in Wikipedia given by the current literature there is often coincidence, but none of the studies presents the hypothesis of contributing for the visibility of the own national or language related content. Similar to topical coverage studies, we outline a method which allows collecting the articles of this content, to later analyse them in several dimensions.
  • To prove its universality, the tests are repeated for up to twenty language editions of Wikipedia. Finally, through the best indicators from each dimension we obtain an index which represents the degree of autoreferentiality of the encyclopedia.
  • http://aclweb.org/anthology-new/R/R11/R11-1044.pdf