Wikimedia Blog/Drafts/607 journalists: How does Wikipedia respond to breaking news?

Published 8/17/15

Title ideas

  • 607 journalists: How does Wikipedia respond to breaking news?
  • ...

Summary

A brief, one-paragraph summary of the post's content, about 20-80 words. On the blog, this will be shown in the chronological list of posts or in the featured post carousel on top, next to a "Read more" link.

  • Wikipedia often deals with breaking news that is developed and expanded rapidly by its community of volunteer editors. 607 Journalists is a thesis that looks deeper into this development, investigating the speed of development, verifiability of article text, and the range of contributors to the article overall.

Body


Wikipedia is capable of covering news like any news agency. Photo by Kai Mörk, freely licensed under CC BY 3.0 (Germany).

Over almost fifteen years, the scope of topics that Wikipedia covers has continued to grow. Now, the free online encyclopedia covers everything from music, film and video games to geography, history and the sciences. It also contains information on news topics, updated swiftly and collaboratively by thousands of volunteers as the news breaks.

To investigate aspects of this phenomenon, such as the speed with which breaking news is covered on Wikipedia, the verifiability of information added over time, and the distribution of edits among Wikipedia's editors, I selected an article for further analysis in the form of a dissertation. Others have done research into this area; their work, methods and outcomes heavily influenced this study. In particular, Brian Keegan's work was instrumental in guiding the direction for this research; his 2013 study into breaking news, co-authored with Darren Gergle and Noshir Contractor, covers a far wider range than this thesis did.

The article selected was "Shooting of Michael Brown", which covered the killing of 18-year-old Michael Brown in Ferguson, Missouri, by police officer Darren Wilson. The incident attracted much press attention fuelled by protest in the town, a suburb of St. Louis. The article's history was observed until January 12, 2015.

Comparing page views and daily edit counts for the article highlights key elements in the story's development.

The resulting data was split into two "peaks" in the development of this story: the initial media scramble after protests began in mid-August, and the Ferguson grand jury's decision not to indict Wilson for the teenager's death in late November. Each "peak" represented 500 individual "revisions" of the article in question. The use of peaks in this case allowed for cross-case analysis, that is, a direct comparison between two case studies.

The first peak was defined as the 500 edits made between 09:38 UTC on 16 August 2014 and 17:54 UTC on 18 August 2014 (a period of 2 days, 8 hours and 16 minutes), and the second, between 00:57 UTC on 23 November 2014 and 22:36 UTC on 01 December 2014 (a period of 8 days, 21 hours and 39 minutes).
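The window lengths quoted above can be checked with a few lines of Python. This is only a minimal sketch: the helper name window_length and the timestamp format are assumptions made for illustration, not part of the thesis.

```python
from datetime import datetime, timezone


def window_length(start: str, end: str) -> tuple[int, int, int]:
    """Return (days, hours, minutes) between two 'YYYY-MM-DD HH:MM' UTC timestamps."""
    fmt = "%Y-%m-%d %H:%M"
    begin = datetime.strptime(start, fmt).replace(tzinfo=timezone.utc)
    finish = datetime.strptime(end, fmt).replace(tzinfo=timezone.utc)
    delta = finish - begin
    # timedelta stores whole days separately from the leftover seconds.
    hours, remainder = divmod(delta.seconds, 3600)
    return delta.days, hours, remainder // 60


# Start and end times of the two peaks, as given in the text.
first_peak = window_length("2014-08-16 09:38", "2014-08-18 17:54")
second_peak = window_length("2014-11-23 00:57", "2014-12-01 22:36")

print(first_peak)   # (2, 8, 16)
print(second_peak)  # (8, 21, 39)
```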

Speed of editing


Graphing the speed of editing across both peaks of development. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

Notably, pageviews and edit rates didn't line up as one might expect. Instead, there was a great flurry of edits a few days after the article was created, presumably as the editing community learned of the article's existence. The speed of editing was incredibly fast during this initial period of rioting and press attention, though highly inconsistent. The mean editing rate across this period was 18.57 edits per hour, more than eleven times the overall average for the article.

Media coverage, however, had a much more acute impact on pageviews: when the grand jury's decision not to indict Darren Wilson was announced in November, almost half a million people visited the article in just one day. A somewhat surprising observation was that this second peak resulted in much slower rates of editing. The mean for this period was just 7.21 edits per hour, roughly two and a half times slower than in the first. It was also very inconsistent, mirroring the first peak.

In terms of text added to the article, the first peak, which was observed over a much shorter period of time, saw an average of 501.02 bytes of text added per hour, some 3.6 times quicker than the rate of the second peak. By then, however, the article was much longer, and the likely cause is that there wasn't much left to add by that point.

Use of sources

To judge the article's accuracy is a very difficult task, which would by its very nature be subjective and require an in-depth knowledge of what happened in Ferguson that afternoon. To this end, the verifiability of the article was looked at instead—specifically, the volume of sources per kilobyte of text, referred to for this study as the article's "reference density".


"Reference densities" over each peak. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

Ten samples were taken systematically for this research from each peak, and their references tallied. This was used in conjunction with the page's size in kilobytes to find the "reference density" — the number of references per kilobyte of page text.
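As a rough illustration of the "reference density" measure, the calculation can be sketched as below. The function name, the choice of 1,000 bytes per kilobyte, and the sample figures are all assumptions made for this example; they are not taken from the thesis data.

```python
def reference_density(reference_count: int, page_size_bytes: int,
                      bytes_per_kb: int = 1000) -> float:
    """Return the number of references per kilobyte of page text.

    bytes_per_kb is an assumption: pass 1024 if binary kilobytes are preferred.
    """
    if page_size_bytes <= 0:
        raise ValueError("page size must be positive")
    return reference_count / (page_size_bytes / bytes_per_kb)


# Hypothetical sample: a revision of 60,000 bytes citing 120 references.
sample_density = reference_density(120, 60_000)
print(sample_density)  # 2.0
```

Repeating this for each of the ten sampled revisions in a peak would give the series of densities that is compared over time.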

In both peaks, the reference density steadily increased over time. It was significantly higher overall in the earlier peak, when the article was shorter and rapidly changing information required more verification. This rise in reference density over time likely indicates Wikipedians' desire to ensure information added is not removed as unverifiable.

The majority of sources used in the article were from publications which focus on print media. This is more obvious in the second peak than in the first: by then, the local newspaper the St. Louis Post-Dispatch had become much more common among the article's sources.


Locations of sources used within the article per peak. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

Relatedly, it was discovered that a high volume of the sources were from media based in the state of Missouri, local to the shooting location itself. The proportion falling into this category actually increased by the second peak, from just over 18 percent to just over a fifth of all sources. Other local sources which were regularly used in the article included the St. Louis American and broadcasters KTVI and KMOV.

It was the state of New York which provided the majority of sources, however; this seems to indicate that editors tend towards big-name, reputable sources such as the New York Times and USA Today, which both placed highly on ranking lists. Notably, the state of Georgia was almost exclusively represented by national broadcaster CNN, yet still made up around 10 percent of all sources used.

Range of contributors

Finally, the editing patterns of users were looked into to judge the distribution of edits among them. To do this, users were placed into categories based on their rates of editing—which, for the purposes of this study, was defined as their mean edits per day. Categories were selected to divide editors as evenly as possible for the analysis, and six bots were excluded to prevent the skewing of results.

Edits/day | Category | Users | % of users | Users with status | % of category with status
40+ | Power users | 27 | 4.49% | 20 | 74.07%
10–40 | Highly active users | 73 | 12.15% | 38 | 52.05%
5–10 | Very active users | 67 | 11.15% | 26 | 38.81%
1–5 | Active users | 105 | 17.47% | 19 | 18.10%
0.1–1 | Casual users | 92 | 15.31% | 4 | 4.35%
0.01–0.1 | Infrequent users | 62 | 10.32% | 0 | 0%
<0.01 | Very infrequent users | 13 | 2.16% | 0 | 0%
IPs | Anonymous users | 162 | 26.96% | 0 | 0%
Total/average | | 601 | 100% | 107 | 17.80%
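The bracketing in the table can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and treating each lower bound as inclusive (so that exactly 10 edits per day counts as "highly active") is an assumption, since the text does not say how boundary values were handled.

```python
# Lower bound (mean edits per day) for each named category, highest first.
CATEGORIES = [
    (40.0, "Power users"),
    (10.0, "Highly active users"),
    (5.0, "Very active users"),
    (1.0, "Active users"),
    (0.1, "Casual users"),
    (0.01, "Infrequent users"),
]


def categorize(edits_per_day: float, is_anonymous: bool = False) -> str:
    """Return the category name for an editor's mean edits per day."""
    if is_anonymous:
        # IP editors form their own category regardless of editing rate.
        return "Anonymous users"
    for lower_bound, name in CATEGORIES:
        if edits_per_day >= lower_bound:  # assumption: lower bounds are inclusive
            return name
    return "Very infrequent users"


print(categorize(55))     # Power users
print(categorize(0.5))    # Casual users
print(categorize(0.001))  # Very infrequent users
```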

Clearly, the majority of users in the highly active and power users brackets hold some kind of status, whether that be the "rollback" tool given out by administrators, or elected roles such as administrator or bureaucrat. This at least implies that more daily edits can translate roughly into experience or trust on the project.

Looking at data added per category, highly active users were responsible for over half of the total content added to the article. However, breaking it down into mean content added per edit for each category provided some intriguing results.


Mean content added per edit, in bytes, per experience category. Image by Joe Sutherland, freely licensed under CC BY-SA 4.0.

While the highly active users take this crown too, it is a much closer race. Perhaps unintuitively, "casual" editors (those with fewer than one edit per day, but more than 0.1) added an average of 95.81 bytes per edit, and the category directly below that added 93.70 bytes per edit. This suggests that article editing is not just done by the heavily active users on Wikipedia, but by a wide range of users with vastly different editing styles and experience.

Edits to the article were most commonly made by a very small group of users. Indeed, 58 percent of edits made to the article were by the top ten contributors, while over half of contributors made just one edit. Text added to the article followed the same pattern, though more pronounced: the same top ten contributed more than two-thirds of the article's content. This lends weight to theories that Wikipedia articles tend to be worked on by a core "team", while others contribute with more minor edits and vandalism reversion.
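The two concentration figures above (the share of edits made by the top contributors, and the share of contributors with a single edit) could be computed from a revision history along these lines. This is an illustrative sketch: the function names and the tiny sample list are made up, and the thesis may have derived its figures differently.

```python
from collections import Counter


def top_contributor_share(edit_authors: list[str], top_n: int = 10) -> float:
    """Fraction of all edits made by the top_n most active contributors."""
    counts = Counter(edit_authors)
    if not counts:
        raise ValueError("no edits supplied")
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / sum(counts.values())


def single_edit_share(edit_authors: list[str]) -> float:
    """Fraction of contributors who made exactly one edit."""
    counts = Counter(edit_authors)
    if not counts:
        raise ValueError("no edits supplied")
    return sum(1 for count in counts.values() if count == 1) / len(counts)


# Made-up sample: one author name per revision, not real data from the article.
authors = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
print(top_contributor_share(authors, top_n=1))  # 0.6
print(single_edit_share(authors))               # 0.5
```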

Overall, the study shows that Wikipedia works on breaking news much like a traditional newsroom: verifiability is held in high regard, and a "core group" of editors tends to contribute the vast majority of the content. Editing rates, however, do not match up as obviously with peaks of media activity, which is worth investigating more qualitatively in future.

If you're interested in reading the full thesis, it's available from my website. For more academic research into Wikipedia, consider subscribing to the monthly Wikimedia Research newsletter.

Joe Sutherland, Wikimedia Foundation communications intern

Notes

Ideas for social media messages promoting the published post:

Twitter (@wikimedia/@wikipedia):

(Tweet text goes here - max 117 characters)


  • ...