Wikimedia Blog/Drafts/607 journalists: How does Wikipedia respond to breaking news?
Title ideas
- 607 journalists: How does Wikipedia respond to breaking news?
A brief, one-paragraph summary of the post's content, about 20-80 words. On the blog, this will be shown in the chronological list of posts or in the featured post carousel on top, next to a "Read more" link.
- Wikipedia often covers breaking news, with articles developed and expanded rapidly by its community of volunteer editors. 607 journalists is a dissertation that looks deeper into this development, investigating the speed with which articles grow, the verifiability of the text added, and the range of contributors to the article overall.
Over almost fifteen years, the scope of topics that Wikipedia covers has continued to grow. The free online encyclopedia now covers everything from music, film and video games to geography, history and the sciences. It also contains information on news topics, updated swiftly and collaboratively by thousands of volunteers as stories break.
To investigate aspects of this phenomenon, such as the speed with which breaking news is covered on Wikipedia, the verifiability of information added over time, and the distribution of edits among Wikipedia's editors, I selected a single article for in-depth analysis as the subject of a dissertation. Others have researched this area before; their work, methods and outcomes heavily influenced this study. In particular, Brian Keegan's work was instrumental in guiding the direction of this research; his 2013 study of breaking news, co-authored with Darren Gergle and Noshir Contractor, covers a far wider range of articles than this thesis does.
The article selected was "Shooting of Michael Brown", which covered the killing of 18-year-old Michael Brown in Ferguson, Missouri, by police officer Darren Wilson. The incident attracted much press attention fuelled by protest in the town, a suburb of St. Louis. The article's history was observed until January 12, 2015.
The resulting data was split into two "peaks" in the development of this story: the initial media scramble after protests began in mid-August, and the Ferguson grand jury's decision not to indict Wilson for the teenager's death in late November. Each "peak" comprised 500 individual "revisions" of the article in question. Using two peaks allowed for cross-case analysis: a direct comparison between two case studies.
The first peak was defined as the 500 edits made between 09:38 UTC on 16 August 2014 and 17:54 UTC on 18 August 2014 (a period of 2 days, 8 hours and 16 minutes), and the second, between 00:57 UTC on 23 November 2014 and 22:36 UTC on 01 December 2014 (a period of 8 days, 21 hours and 39 minutes).
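The mean editing rate for a peak can be estimated directly from revision timestamps. Below is a minimal sketch of that arithmetic, using a handful of hypothetical timestamps rather than the study's actual 500-revision dataset (the study's reported means may also have been windowed differently):

```python
from datetime import datetime

# Hypothetical revision timestamps within the first peak's boundaries;
# the study used 500 revisions pulled from the article's edit history.
revisions = [
    datetime(2014, 8, 16, 9, 38),
    datetime(2014, 8, 16, 11, 2),
    datetime(2014, 8, 17, 14, 30),
    datetime(2014, 8, 18, 17, 54),
]

# Elapsed time between the first and last revision, in hours.
span_hours = (revisions[-1] - revisions[0]).total_seconds() / 3600
edits_per_hour = len(revisions) / span_hours
print(f"{edits_per_hour:.2f} edits per hour over {span_hours:.1f} hours")
```

The same division applied to bytes of text added, rather than revision counts, gives the bytes-per-hour figures discussed below.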
Speed of editing
Notably, pageviews and edit rates didn't line up as one might expect. Instead, there was a great flurry of edits a few days after the article was created, presumably as the editing community learned of its existence. Editing was remarkably fast during this initial period of rioting and press attention, though highly inconsistent. The mean editing rate across this period was 18.57 edits per hour, more than eleven times the overall average for the article.
Media coverage, however, has a much more acute impact on pageviews: when the grand jury's decision not to indict Wilson was announced in November, almost half a million people visited the article in a single day. Somewhat surprisingly, this second peak saw much slower editing. The mean for this period was just 7.21 edits per hour, roughly two and a half times slower than in the first peak. It was also highly inconsistent, mirroring the first peak.
In terms of text added to the article, the first peak, which was observed over a much shorter period of time, saw an average of 501.02 bytes of text added per hour, some 3.6 times quicker than the rate of the second peak. By then, however, the article was much longer, and the likely explanation is that there simply wasn't much left to add.
Use of sources
Judging the article's accuracy would be a very difficult task, inherently subjective and requiring in-depth knowledge of what happened in Ferguson that afternoon. Instead, the article's verifiability was examined: specifically, the volume of sources per kilobyte of text, referred to in this study as the article's "reference density".
Ten samples were taken systematically from each peak and their references tallied. These counts were combined with the page's size in kilobytes to calculate the reference density at each sampled point.
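The reference density calculation itself is a simple ratio. A sketch with hypothetical sample values follows; the real study took ten systematic samples per peak from the article's revision history:

```python
# Hypothetical sampled revisions: (reference count, page size in kilobytes).
# Three illustrative samples are shown; the study used ten per peak.
samples = [(120, 45.0), (150, 52.3), (180, 60.1)]

for refs, size_kb in samples:
    density = refs / size_kb  # references per kilobyte of page text
    print(f"{refs} refs / {size_kb} kB = {density:.2f} refs per kB")
```

Tracking this ratio across samples is what reveals the steady rise in density described below.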
In both peaks, the reference density steadily increased over time. It was significantly higher overall in the earlier peak, when the article was shorter and rapidly-changing information required more verification. This rise in reference density over time likely indicates Wikipedians' desire to ensure information added is not removed as unverifiable.
The majority of sources used in the article were from publications which focus on print media. This was more pronounced in the second peak than the first, by which point the local newspaper the St. Louis Post-Dispatch had become much more common among the article's sources.
Relatedly, a high volume of the sources came from media based in the state of Missouri, local to the shooting itself. The proportion falling into this category actually increased by the second peak, from just over 18 percent to just over a fifth of all sources. Other local sources regularly used in the article included the St. Louis American and broadcasters KTVI and KMOV.
It was the state of New York that provided the most sources, however; this suggests that editors tend towards big-name, reputable outlets such as the New York Times and USA Today, both of which placed highly on the ranking lists. Notably, the state of Georgia was represented almost exclusively by national broadcaster CNN, yet still made up around 10 percent of all sources used.
Range of contributors
Finally, the editing patterns of users were looked into to judge the distribution of edits among them. To do this, users were placed into categories based on their rates of editing—which, for the purposes of this study, was defined as their mean edits per day. Categories were selected to divide editors as evenly as possible for the analysis, and six bots were excluded to prevent the skewing of results.
[Table: contributors by activity category (highly active users, very active users, very infrequent users, etc.), with a column showing how many in each category hold some form of advanced status]
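The categorisation step can be sketched as a simple threshold function. The thresholds and editor data below are illustrative assumptions, not the study's actual boundaries, which were chosen to divide editors as evenly as possible:

```python
# Hypothetical editors mapped to their mean edits per day.
editors = {"Alice": 42.0, "Bob": 3.5, "Carol": 0.6, "Dave": 0.02}

def category(mean_edits_per_day):
    """Assign an activity bracket; thresholds here are illustrative only."""
    if mean_edits_per_day >= 10:
        return "highly active"
    if mean_edits_per_day >= 1:
        return "very active"
    if mean_edits_per_day >= 0.1:
        return "casual"
    return "very infrequent"

for name, rate in editors.items():
    print(name, "->", category(rate))
```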
Clearly, the majority of users in the highly active and power user brackets hold some kind of status, whether that be the "rollback" tool handed out by administrators or an elected role such as administrator or bureaucrat. This at least implies that a higher daily edit rate translates roughly into experience or trust on the project.
Looking at data added per category, highly active users were responsible for the majority of the content added to the article: over half of the total. Breaking it down into mean content added per edit for each category, however, produced some intriguing results.
While the highly active users take this crown too, it is a much closer race. Perhaps unintuitively, "casual" editors, those averaging fewer than one edit per day but more than 0.1, added an average of 95.81 bytes per edit, and the category directly below that added 93.70 bytes per edit. This suggests that article editing is not done solely by Wikipedia's most active users, but by a wide range of editors with vastly different styles and levels of experience.
Edits to the article were most commonly made by a very small group of users. Indeed, 58 percent of edits to the article were made by the top ten contributors, while over half of contributors made just one edit. Text added to the article followed the same pattern, though more pronounced: the same top ten contributed more than two-thirds of the article's content. This lends weight to theories that Wikipedia articles tend to be worked on by a core "team", while others contribute more minor edits and vandalism reversion.
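This kind of concentration can be measured by tallying revisions per username. A sketch with a hypothetical edit log follows; the study used the article's full revision history and its top ten contributors:

```python
from collections import Counter

# Hypothetical edit log: one username per revision.
edit_log = ["A"] * 30 + ["B"] * 20 + ["C"] * 8 + ["D", "E", "F", "G"]

counts = Counter(edit_log)
top = counts.most_common(2)  # the study looked at the top ten contributors
top_share = sum(n for _, n in top) / len(edit_log)
single_edit_users = sum(1 for n in counts.values() if n == 1)
print(f"top editors made {top_share:.0%} of edits; "
      f"{single_edit_users} of {len(counts)} contributors made one edit")
```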
Overall, the study shows that Wikipedia works on breaking news much like a traditional newsroom: verifiability is held in high regard, and a "core group" of editors contributes the vast majority of the content. Editing rates, however, do not line up as neatly with peaks of media activity, a question worth investigating more qualitatively in future research.
Joe Sutherland, Wikimedia Foundation communications intern
Ideas for social media messages promoting the published post:
(Tweet text goes here - max 117 characters) ---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|------/