Research:From the media to Wikipedia: the relationship between news and vandalism in the encyclopedia during the 2019 Social Outburst

15:02, 27 September 2022 (UTC)
Dr. Ana María Castillo Hinojosa
Duration:  2022-June – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

This project studies the relationship between news coverage in Chilean media and the vandalism detected on Spanish Wikipedia during the 2019 Social Outburst in Chile. The detailed edit history of each article makes it possible to locate the edits that were reverted and labeled as vandalism, classify them, and determine which topics attract the most vandalism in times of crisis. This would show whether activity on Wikipedia, and vandalism in particular, is related to media coverage, which in turn would help anticipate this type of edit and clarify how Wikipedia can contribute to the verification of information.

Goals and Impact

One of the most important outcomes of studying how news affects different platforms is a better understanding of how the media shapes our society. The difference in this case is that, although the news shapes the algorithmic spheres of our social networks, it cannot affect “what Wikipedia shows us”, since Wikipedia is an information platform that does not use algorithms to select what readers see. One expected result is to recognize patterns within the media, for example which outlet has the most impact on Wikipedia; traditional media probably have the most impact, since they have greater visibility in the territory.

Understanding the vandalism that happens during a sociopolitical crisis helps us prepare for the next one. Concrete results on how the news affects vandalism can inform a review of article protection policies, for example by identifying which articles could be protected in advance instead of waiting for them to be vandalized. In addition, we cannot assume that “the news tells the truth”: we have seen, especially in Latin America, that media outlets chase likes and do not always verify information. This investigation will therefore also encounter misinforming news, which is very likely to affect Wikipedia directly.

Furthermore, in line with movement strategies, this research is directly concerned with identifying issues of impact and understanding the role misinformation plays in the Wikipedia environment in times of crisis. This helps us understand how our projects (not only Spanish Wikipedia) can be misused or manipulated, by detecting threats with significant potential for harm, such as misinformation.

Research questions

  • Which news content produced the most malicious edits on Spanish Wikipedia during the 2019 Social Outburst in Chile?
  • Which types of vandalism occurred most often on Spanish Wikipedia, given the topics covered by the media during the 2019 Social Outburst in Chile?
  • Which types of articles on Spanish Wikipedia were vandalized most as a result of media news content during the 2019 Social Outburst in Chile?


Several studies, including Geiß et al. 2016[1], show a relationship between mentions of a topic in the news and visits to Wikipedia. Using different methods, these studies find that when an event or topic is repeatedly mentioned in the media, visits to its Wikipedia article and to related articles increase. Notably, some of these visits turn into edits, and some of those edits are vandalism.

During the 2019 Social Outburst in Chile, there was a contradiction between the reality that was collectively thought or imagined and the reality reported by the hegemonic media.[2] The media also took different approaches to the news covering the event. For example, the hegemonic media concentrated on narrating events from the perspective of the “high command”, while independent media narrated them through interviews with civilians.[3] That study analyzes, among others, the case of Televisión Nacional (TVN), whose discourse pointed to civilians as responsible for most cases of violence in the streets, while outlets such as Piensa Prensa offered a very different version that pointed to the police as the cause of the violence.

This position is not new: ever since mass media appeared, some authors have regarded them as instruments of manipulation, in which control comes from the homogenization of taste and the reproduction of a version of reality that can be shaped by whoever is responsible for the outlet. Today this is reflected in the fact that the main media outlets in Chile still belong to companies owned by elite groups, which influence the public agenda and the way information is treated. The role played by the media in the 2019 Social Outburst is therefore key to understanding the impact it had on different platforms, such as Wikipedia.

Treating the Social Outburst as a time-bounded event makes it possible to locate the news about the events that occurred. Furthermore, the fact that this coverage took varied approaches raises the question of which topics were the most influential and led people to seek information elsewhere. Because Wikipedia is one of the most visited sites in Chile, it can be used as an object of study: visits can be analyzed together with the edit history of the articles that, as a result of the news, were the most visited during the social crisis. The detailed edit history of these articles makes it possible to locate the edits that were reverted and labeled as vandalism, classify them, and determine which topics generate the most edits of this type. This would show whether what happens on Wikipedia is related to the media and whether the vandalism is related to the information that appears in the news. The aim is to anticipate this type of edit and to understand how Wikipedia can contribute to the verification of information.


Information collection techniques


To build the news corpus for this investigation, a weekly clipping (review and filing of news items) will be carried out, covering the period between October 18th and November 15th. This technique consists of the selective collection of information, in our case news, in order to later “classify the documentation according to pre-established criteria, thus gathering it over time in press dossiers”.[4] Likewise, once this information has been categorized, the collected documents acquire meaning as a group.[5] The technique thus allows a selection of the main news items related to the 2019 Social Outburst in Chile. Finally, the content will be classified into categories in order to identify the topics and statements of the news items quantitatively, and this information will then be cross-referenced with qualitative analysis techniques to meet the stated objectives.

Web Scraping

This technique, also known as web harvesting, is used to extract data from the World Wide Web and save it in a file or database for analysis.[6] Here, an algorithm (a computer program that selects edits) will be used to compile the vandalism analyzed in the investigation. The algorithm will search the history of SeroBOT, a bot that reverts vandalism on Spanish Wikipedia, and identify the reverted edits that are relevant to the article “Chile”. To do so, it will use the “what links here” feature for each edit made by SeroBOT over a given period. If an article is related to Chile, it very likely contains the word “Chile” in its text and links to that article, so the program will recognize it and add the edit to the corpus. After this first iteration, the algorithm will repeat the process with other articles, for example “Gobierno de Chile”, “Protestas chilenas 2019-2022” and “Metro de Santiago” (because the Social Outburst began with the rise in the metro fare).
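The selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual program: the function names are hypothetical, the parameters follow the public MediaWiki Action API (`list=usercontribs` for a user's edits and `list=backlinks` for “what links here”), and the example omits the network calls and works on already-downloaded data.

```python
from typing import Dict, Iterable, List, Set

# Spanish Wikipedia endpoint of the MediaWiki Action API.
API_URL = "https://es.wikipedia.org/w/api.php"


def usercontribs_params(user: str, start: str, end: str) -> Dict[str, str]:
    """Query parameters listing a user's edits between two ISO timestamps."""
    return {
        "action": "query",
        "format": "json",
        "list": "usercontribs",
        "ucuser": user,
        "ucstart": start,  # with ucdir=newer, ucstart is the older bound
        "ucend": end,
        "ucdir": "newer",
        "ucprop": "ids|title|timestamp|comment",
        "uclimit": "max",
    }


def backlinks_params(seed_title: str) -> Dict[str, str]:
    """Query parameters for the 'what links here' list of a seed article."""
    return {
        "action": "query",
        "format": "json",
        "list": "backlinks",
        "bltitle": seed_title,
        "blnamespace": "0",  # main (article) namespace only
        "bllimit": "max",
    }


def select_related(contribs: Iterable[dict], backlink_titles: Set[str]) -> List[dict]:
    """Keep the bot edits whose page is among the pages linking to the seed."""
    return [edit for edit in contribs if edit["title"] in backlink_titles]


def build_corpus(contribs: List[dict], backlinks_by_seed: Dict[str, Set[str]]) -> List[dict]:
    """Repeat the selection for each seed article, de-duplicating by revision id."""
    corpus: Dict[int, dict] = {}
    for titles in backlinks_by_seed.values():
        for edit in select_related(contribs, titles):
            corpus.setdefault(edit["revid"], edit)
    return list(corpus.values())
```

In practice, each parameter dictionary would be sent to `API_URL` with an HTTP client, following the API's continuation tokens until the result set is exhausted. Selecting on the page title is a simplification: it captures every SeroBOT revert on a page that links to the seed article, whether or not the reverted text itself mentions Chile.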

Information analysis techniques

Thematic content analysis

The information analysis technique used in this research will be thematic content analysis, which aims to discover the meaning of a message. Specifically, it consists of classifying and/or coding the various elements of a message into categories so that its meaning emerges adequately.[7] It should be noted that thematic content analysis is “a process to be used with qualitative information. It is not another qualitative method, but a process (...) that allows the translation of qualitative information into quantitative data, if this is desired by the researcher”.[8] In the case of the news, the content dealing with the 2019 Social Outburst in Chile will be analyzed, which will show quantitatively and qualitatively the key issues that the media in the sample emphasized around the social revolt. In the case of malicious edits, thematic content analysis may be used to characterize the types of vandalism that were generated most frequently, in line with the previously stated objectives.
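As a rough illustration of how coded categories become quantitative data, the sketch below tallies how many texts fall into each category. It rests on strong assumptions: the codebook, its categories, and its keywords are hypothetical, and simple keyword matching stands in for the coding that researchers would normally do by reading each item.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical codebook mapping each category to indicative keywords.
CODEBOOK: Dict[str, List[str]] = {
    "police": ["carabineros", "policía"],
    "government": ["gobierno", "presidente"],
    "transport": ["metro", "pasaje"],
}


def code_text(text: str, codebook: Dict[str, List[str]]) -> List[str]:
    """Return every category with at least one keyword present in the text."""
    lowered = text.lower()
    return [
        category
        for category, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def tally_categories(texts: Iterable[str], codebook: Dict[str, List[str]]) -> Counter:
    """Count how many texts were assigned to each category."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(code_text(text, codebook))
    return counts
```

The same tally could be applied to headlines and to reverted edits, so that the category frequencies of the two corpora can be compared.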

Critical discourse analysis

To analyze both the news and the vandalism, critical discourse analysis (CDA) will be used, defined as the “study of the ways in which the abuse of social power, dominance and inequality are practiced, reproduced and fought by texts and speech in the social and political context”.[9] This technique is justified by the sociopolitical context of the research, namely the 2019 Social Outburst in Chile. CDA also makes it possible to relate text and context, in order to explore “how the socially produced ideas and objects (in texts) that inhabit the world (reality) were initially created, and how they are maintained and supported in a place in time (the context)”.[10] CDA will therefore make it possible to give a deeper meaning to the news and to the vandalism.

Monthly Update

November 2022
  • Began collecting news data through Media Cloud.
  • We found that the term "Estallido social" was relatively new in the news, so we expanded the search to other keywords such as "gobierno" or "manifestación". Moreover, the image below shows that the proposed period is appropriate for the investigation, because on October 18th 2019 the percentage is over 30% and at the end of the period, on November 25th 2019, it is still over 20%.
Graphic exported from Media Cloud showing the frequency of news items containing the searched keywords between October 15th 2019 and November 25th 2019.
October 2022
  • We learned about Media Cloud and discovered that one of the features we wanted to use on the platform is no longer available as of June 2022.
  • Began exploring and learning about the Wikimedia API.
August–September 2022
  • Meeting with Ana Castillo (collaborator from University of Chile)
  • Search for a fellow
  • Data scientist hired
July 2022
  • Worked on the justification of the investigation
  • Defined the methodology
  • Meeting with possible collaborators
June 2022
  • Worked on the theoretical framework
  • Defined the type of investigation
  • Research project proposal accepted




  1. Geiß, S.; Leidecker, M.; Roessing, T. (2016). "The interplay between media-for-monitoring and media-for-searching: How news media trigger searches and edits in Wikipedia". New Media & Society: 2740–2759.
  2. Elias Valenzuela, Arturo Alfonso (2020). "Medios de Comunicación e imaginario social en la rebelión del 18 de octubre en Chile: Una relación contradictoria". Universidade Federal da Integração Latino-Americana.
  3. Cabrera Cares, Ignacio (2020). "El trato mediático y el uso de Twitter en el estallido social chileno". Repositorio institucional de la Universidad de La Laguna.
  4. De la Fuente, V. (2012). "Archivos y centros de documentación del periodismo gráfico argentino". Departamento Archivos, Biblioteca Nacional Mariano Moreno.
  5. Ferrara, G.; Rodríguez, D. (2017). "¿Archivos De Redacción O Centros De Documentación Periodística? La Importancia Y Problemáticas De Su Tratamiento Archivístico". Técnicas Archivísticas.
  6. Zhao, B. (2017). "Web Scraping". Encyclopedia of Big Data.
  7. Mayer, R.; Ouellet, F. (1991). "Metodología de investigación para trabajadores sociales". Boucherville, Gaëtan Morin Éditeur.
  8. Fraga, Cecilia; Maidana, Valeria; Paredes, Diego; Vega, Lorena (2007). "Transforming Qualitative Information: Thematic Analysis and Code Development".
  9. Van Dijk, T. (1999). "El análisis crítico del discurso". Anthropos N°186.
  10. Urra, E.; Muñoz, A.; Peña, J. (2013). "El análisis del discurso como perspectiva metodológica para investigadores de salud". Enfermería Universitaria, Volume 10, Issue 2: 50–57.