Wikimedia Research Newsletter

Vol: 9 • Issue: 05 • May 2019

Wikipedia more useful than academic journals, but is it stealing the news?

With contributions by: Tilman Bayer and Smallbones

"Is Wikipedia stealing the news?"

A paper in the current issue of First Monday[1] "analyzes Wikipedia’s breaking news practices and the ways the Internet is changing perceptions of news", based on a case study of the article 2014 Sydney hostage crisis.

The author is a lecturer in journalism at the University of Sydney, and co-organiser of an upcoming academic conference co-sponsored by Wikimedia Australia ("The Worlds of Wikimedia™: communicating and collaborating across languages and cultures"). In a press release by the university, somewhat provocatively titled "Is Wikipedia stealing the news?" (see also podcast, starting at 21:55), she describes Wikipedia as "a competitor to media organisations" and states:

Wikipedia contributors don't undertake the core role of journalists, which is to produce new work. Contributors' news gathering practices are solely "aggregation and assemblage", and it is important to recognise that the journalistic labour that underpins a Wikipedia page has been funded by media organisations and appropriated without economic consideration.

The case study in the paper itself includes:

  • a detailed timeline of reactions to the event (e.g. police arriving seven minutes after the hostage-taking, the first journalist tweeting about it after eight minutes, and the Wikipedia article being created in less than two hours)
  • an explanation of relevant Wikipedia policies and guidelines (e.g. no original research and WP:NOTNEWS)
  • some statistics about the article's revision history and the traffic it received
  • a classification of the references used, using the three categories "Local news media", "International news media", and "Non-mainstream media"
  • an examination of discussions on the article's talk page, showing "just how closely the behaviour of non-journalists resembles that of a professional newsroom."

The author also interviewed a senior Wikipedian involved in the article.

The paper criticizes the "reasoning" behind some of Wikipedia's policies and protocols around news as "contradictory. The claim [in WP:NOTNEWS] that breaking news should not be emphasized or treated differently doesn’t fit with the specific parameters set by their ‘current event’ template. The entry also claims that Wikipedia is not written in ‘news style’ which also doesn’t hold up to scrutiny ... The 2014 Sydney hostage crisis page clearly conforms: the lead sentence contains who, what, when, and where [the Five Ws], is written in past tense and the information is presented according to an inverted pyramid structure."

Alongside the presence of other Wikipedia features such as the "In the News" section on the main page and the use of infoboxes to summarize essential information, the author interprets this as a vindication of traditional news-writing practices: "Over the decades since [Wikipedia's founding], through trial and error and negotiation, the community has adopted a form for presenting information that is readily recognisable as employing news conventions ... . This demonstrates the ongoing versatility of news writing style as an efficient form of communication that extends beyond legacy newspapers, where it originated, and into new forms as they emerge on the Internet." She acknowledges the quality work done by the Wikipedia volunteers, with talk pages "show[ing] just how closely the behaviour of non-journalists resembles that of a professional newsroom."

While these conclusions are backed by detailed observations about Wikipedia, the paper offers few arguments to substantiate the appropriation and competition claims highlighted in the press release. In a Facebook discussion with Wikipedians, the author distanced herself from the "stealing" headline, but otherwise seemed to stand by these concerns. Her use of terms like "appropriated", "in the economic sense", "payment" etc. suggests an underlying assumption of property rights in facts that is at odds with the legal and economic system that has long underpinned the news business in Western countries. In copyright law, this system rests on the idea–expression divide, or, specifically in Australia, on the seminal court decision Victoria Park Racing & Recreation Grounds Co Ltd v Taylor, which asserted: "The law of copyright does not operate to give any person an exclusive right to state or to describe particular facts. A person cannot by first announcing that a man fell off a bus or that a particular horse won a race prevent other people from stating those facts". It seems that Avieson disagrees with this, at least when the first person is a journalist and those "other people" are Wikipedia editors. Given that journalists themselves routinely rely on the "labour" of other journalists without compensating them (most newspaper articles don't consist exclusively of original reporting), and on that of their sources (paying sources is a highly controversial practice, even when they undertake substantial effort or risk to provide information to the journalist), it's hard to escape the impression that this paper falls into a common trap of Wikipedia criticism: berating the open, volunteer community project for practices that are in fact common in traditional, commercial media as well.

Conferences and events

See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines, and the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

"Wikipedia can be more useful than academic journal articles" for learning about certain technologies

From the abstract:[2]

This article analyses five technology-enhanced learning-related terms on Wikipedia, assessing their usefulness in relation to academic journal articles concerning the same terms. Data were obtained about the word lengths of the Wikipedia articles, the numbers of Wikipedia edits and numbers of academic journal publications over the first 5 years after the creation of the first Wikipedia entry. ... The article argues that Wikipedia can be more useful than academic journal articles in the new and emerging phases of a technology, because of the volume of information made available, together with the speed of its publication and the updating of its contents.

"The Network Structure of Successful Collaboration in Wikipedia"

From the abstract:[3]

... we compare the network mechanisms underlying the production of the complete set of featured articles, with the network mechanisms of a contrasting sample of comparable non-featured articles in the English-language edition of Wikipedia. Estimates of relational event models suggest that contributors to featured articles display greater deference toward the reputation of their team members. Contributors to featured articles also display a weaker tendency to follow the behavioral norms predicted by the theory of structural balance, and hence a weaker tendency toward polarization.

(See also our earlier review of a paper by the same authors: "Articles receiving the most attention (by editors) overall lack the depth of quality found in featured articles")

"Negotiation processes on Wikipedia talk pages in the case of the White Rose"

Paper/book chapter in German[4]; its title translates as "How does communicative memory become cultural memory? Negotiation processes on Wikipedia talk pages in the case of the White Rose".

"Application of SEO Metrics to Determine the Quality of Wikipedia Articles and Their Sources"

From the abstract:[5]

Based on the fact that most of [Wikipedia's] references are web pages, it is possible to get more information about their quality by using citation analysis tools. ... This paper presents general results of Wikipedia analysis using metrics from the Toolbox SISTRIX, which is one of the leading providers of indicators for Search Engine Optimization (SEO). In addition to the preliminary analysis of the Wikipedia articles as separate web pages, we extracted data from more than 30 million references in different language versions of Wikipedia and analyzed over 180 thousand most popular hosts.

(See also related earlier coverage)

Wikipedia biographies show how the invention of printing shaped the history of science and art

From the abstract:[6]

Here we combine a common causal inference technique (instrumental variable estimation) with a dataset on nearly forty thousand biographies from Wikipedia (Pantheon 2.0), to study the effect of the introduction of printing in European cities on Wikipedia’s digital biographical records. By using a city’s distance to Mainz as an instrument for the adoption of the movable type press, we show that European cities that adopted printing earlier were more likely to become the birthplace of a famous scientist or artist during the years following the invention of printing.
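
For readers unfamiliar with instrumental variable (IV) estimation, the core idea fits in a few lines. With one instrument z and one treatment x, the IV estimate of the effect of x on y is cov(z, y) / cov(z, x): only the variation in x that is driven by the instrument is used, so an unobserved confounder that affects both x and y no longer biases the result. The sketch below is a toy illustration under stated assumptions, not a reproduction of the paper's analysis: all data are synthetic and hypothetical (loosely mimicking "distance to Mainz", "adoption of printing" and "famous births"), and the function names and coefficients are made up for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def ols_slope(x, y):
    """Naive regression slope of y on x (biased if x is confounded)."""
    return cov(x, y) / cov(x, x)

def iv_estimate(z, x, y):
    """Just-identified IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

# Hypothetical synthetic data; every coefficient here is invented for illustration.
rng = random.Random(42)
TRUE_EFFECT = 2.0
z, x, y = [], [], []
for _ in range(5000):
    distance = rng.uniform(0.0, 1.0)  # instrument (e.g. scaled distance to Mainz)
    u = rng.gauss(0.0, 1.0)           # unobserved confounder (e.g. local prosperity)
    adoption = -4.0 * distance + u + rng.gauss(0.0, 1.0)  # closer cities adopt more/earlier
    outcome = TRUE_EFFECT * adoption + 3.0 * u + rng.gauss(0.0, 1.0)
    z.append(distance)
    x.append(adoption)
    y.append(outcome)

print(f"naive OLS slope: {ols_slope(x, y):.2f}")       # inflated by the confounder u
print(f"IV estimate:     {iv_estimate(z, x, y):.2f}")  # close to TRUE_EFFECT
```

The instrument only works if distance affects the outcome solely through adoption (the exclusion restriction); the toy data are constructed so that this holds by design, which real-world data cannot guarantee.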

"What is the central bank of Wikipedia?"

From the abstract:[7]

We analyze the influence and interactions of 60 largest world banks for 195 world countries using the reduced Google matrix algorithm for the English Wikipedia network with 5 416 537 articles. While the top asset rank positions are taken by the banks of China, with China Industrial and Commercial Bank of China at the first place, we show that the network influence is dominated by USA banks with Goldman Sachs being the central bank.
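
The reduced Google matrix condenses a huge link network into a small matrix over a chosen subset of nodes (here, bank and country articles) while still accounting for indirect paths through all other articles. The minimal pure-Python sketch below assumes the standard formulation G_R = G_rr + G_rs (1 − G_ss)^{-1} G_sr, where r is the chosen subset and s the rest of the network, and the usual damping factor of 0.85; the six-article link graph and all function names are hypothetical, and the matrix inverse is approximated by fixed-point iteration. It is a toy illustration, not the authors' code, which operates on the full English Wikipedia network.

```python
def google_matrix(links, n, alpha=0.85):
    """Column-stochastic Google matrix: G[i][j] = probability of moving from article j to i."""
    G = [[(1.0 - alpha) / n for _ in range(n)] for _ in range(n)]
    for j in range(n):
        targets = links.get(j, [])
        if targets:                      # follow one of j's outgoing links
            for i in targets:
                G[i][j] += alpha / len(targets)
        else:                            # dangling article: jump uniformly
            for i in range(n):
                G[i][j] += alpha / n
    return G

def reduced_google_matrix(G, r, iterations=500):
    """G_R = G_rr + G_rs (1 - G_ss)^{-1} G_sr for the node subset r."""
    s = [k for k in range(len(G)) if k not in r]
    # Solve X = G_sr + G_ss X by fixed-point iteration, so X = (1 - G_ss)^{-1} G_sr.
    G_sr = [[G[a][b] for b in r] for a in s]
    X = [row[:] for row in G_sr]
    for _ in range(iterations):
        X = [[G_sr[ai][bi] + sum(G[a][c] * X[ci][bi] for ci, c in enumerate(s))
              for bi in range(len(r))]
             for ai, a in enumerate(s)]
    # Direct transitions inside r plus all indirect paths that pass through s.
    return [[G[a][b] + sum(G[a][c] * X[ci][bi] for ci, c in enumerate(s))
             for bi, b in enumerate(r)]
            for a in r]

def pagerank(G, iterations=200):
    """PageRank as the stationary vector of G, found by power iteration."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p

# Hypothetical toy network: article j links to the articles in links[j]; article 5 has no links.
links = {0: [2, 3], 1: [0, 2], 2: [0], 3: [0, 1, 4], 4: [5], 5: []}
G = google_matrix(links, 6)
GR = reduced_google_matrix(G, r=[0, 1])  # e.g. two "bank" articles
for row in GR:
    print([round(v, 3) for v in row])
```

Each column of G_R still sums to 1, and the PageRank values of the selected articles, renormalized to sum to 1, form a stationary vector of G_R, which is what makes the small matrix a faithful summary of the full network's influence among those nodes.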

"Generating Wikipedia by Summarizing Long Sequences"

From the abstract:[8]

We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. ... We show that this model can generate fluent, coherent multi-sentence paragraphs and even whole Wikipedia articles.

See also media coverage
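
The pipeline in the abstract first uses a cheap extractive step to select salient passages from the source documents before the neural model writes the article. As a hedged sketch of what such a first stage can look like, the code below ranks source paragraphs by tf-idf overlap with the article title and keeps the top k. Which extractive method the paper relies on is not stated above; tf-idf scoring, the smoothed idf formula, the example paragraphs and the function names are all assumptions for illustration, and the abstractive (neural) stage is not shown.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_paragraphs(title, paragraphs, k=2):
    """Return the k paragraphs with the highest tf-idf overlap with the title, in source order."""
    docs = [tokenize(p) for p in paragraphs]
    n = len(docs)
    df = Counter()                      # document frequency of each word
    for d in docs:
        df.update(set(d))
    query = set(tokenize(title))

    def score(d):
        if not d:
            return 0.0
        tf = Counter(d)
        # Term frequency times smoothed idf: rarer title words count more.
        return sum(tf[w] / len(d) * (math.log((1 + n) / (1 + df[w])) + 1.0)
                   for w in query if w in tf)

    ranked = sorted(range(n), key=lambda i: (-score(docs[i]), i))
    return [paragraphs[i] for i in sorted(ranked[:k])]

# Hypothetical source paragraphs for an article titled "Printing press".
title = "Printing press"
paragraphs = [
    "Mainz is a city on the Rhine.",
    "Gutenberg built a printing press in Mainz.",
    "The printing press spread across European cities.",
    "Rivers were important trade routes.",
]
for p in rank_paragraphs(title, paragraphs):
    print(p)
```

Keeping the selected paragraphs in their original order, rather than in score order, preserves some of the source's narrative flow for the downstream model.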

"Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries"

From the abstract:[9]

... we propose a classification based method for automatic detection of controversial articles and categories in Wikipedia. Next, we demonstrate how to use the obtained results for the estimation of the controversy level of search queries. The proposed method can be incorporated into search engines as a component responsible for detection of queries related to controversial topics. The method is independent of the search engine’s retrieval and search results recommendation algorithms, and is therefore unaffected by a possible filter bubble. Our approach can be also applied in Wikipedia or other knowledge bases for supporting the detection of controversy and content maintenance.


  1. Avieson, Bunty (2019-04-30). "Breaking news on Wikipedia: collaborating, collating and competing". First Monday 24 (5). ISSN 1396-0466. doi:10.5210/fm.v24i5.9530. 
  2. Flavin, Michael; Hulova, Katerina (2018-11-23). "An inferior source? Quantitatively analysing the production and revision of five technology-enhanced learning-related terms on Wikipedia". Research in Learning Technology 26. ISSN 2156-7077. doi:10.25304/rlt.v26.2103.  CC BY 4.0
  3. Lerner, Juergen; Lomi, Alessandro (2019-01-08). The Network Structure of Successful Collaboration in Wikipedia. 52nd Annual Hawaii International Conference on System Sciences. pp. 2622–2631. ISBN 9780998133126.
  4. Heinrich, Horst-Alfred; Gilowsky, Julia (2018). "Wie wird kommunikatives zu kulturellem Gedächtnis? Aushandlungsprozesse auf den Wikipedia-Diskussionsseiten am Beispiel der Weißen Rose". (Digitale) Medien und soziale Gedächtnisse. Soziales Gedächtnis, Erinnern und Vergessen – Memory Studies. Springer VS, Wiesbaden. pp. 143–167. ISBN 9783658195120.
  5. Lewoniewski, Włodzimierz; Härting, Ralf-Christian; Węcel, Krzysztof; Reichstein, Christopher; Abramowicz, Witold (2018). "Application of SEO Metrics to Determine the Quality of Wikipedia Articles and Their Sources". In Damaševičius, Robertas; Vasiljevienė, Giedrė (eds.). Information and Software Technologies. Communications in Computer and Information Science. Springer International Publishing. pp. 139–152. ISBN 9783319999722. doi:10.1007/978-3-319-99972-2_11.
  6. Jara-Figueroa, C.; Yu, Amy Z.; Hidalgo, César A. (2019-02-20). "How the medium shapes the message: Printing and the rise of the arts and sciences". PLOS ONE 14 (2): e0205771. ISSN 1932-6203. doi:10.1371/journal.pone.0205771. Retrieved 2019-03-24.
  7. Demidov, Denis; Frahm, Klaus M.; Shepelyansky, Dima L. (2019-02-21). "What is the central bank of Wikipedia?". arXiv:1902.07920 [physics, q-fin]. 
  8. Liu, Peter J.; Saleh, Mohammad; Pot, Etienne; Goodrich, Ben; Sepassi, Ryan; Kaiser, Lukasz; Shazeer, Noam (2018-01-30). "Generating Wikipedia by Summarizing Long Sequences". 
  9. Zielinski, Kazimierz; Nielek, Radoslaw; Wierzbicki, Adam; Jatowt, Adam (2018-01-01). "Computing controversy: Formal model and algorithms for detecting controversy on Wikipedia and in search queries". Information Processing & Management 54 (1): 14–36. ISSN 0306-4573. doi:10.1016/j.ipm.2017.08.005. 
