Research:From the media to Wikipedia: the relationship between news and vandalism in the encyclopedia during the 2019 Social Outburst
This project examines the relationship between news published by the media in Chile and the vandalism detected on Spanish Wikipedia during the 2019 Social Outburst in Chile. The detailed history of each article makes it possible to locate the edits that were reverted and cataloged as vandalism, classify them, and determine which topics generate the most vandalism in times of crisis. This would show whether what happens on Wikipedia is related to the media and whether the vandalism is related to their coverage, making it possible to anticipate this type of edit and understand how Wikipedia can contribute to the verification of information.
Goals and Impact
One of the most important impacts of studying how news affects different platforms is understanding how the media shapes our society. The difference in this case is that, although the news shapes the algorithmic spheres of our social networks, it cannot affect “what Wikipedia shows us”, since Wikipedia is an information platform without algorithms involved. One expected result is to recognize patterns within the media, for instance which outlet has the most impact on Wikipedia; traditional media probably have the most impact because they have greater visibility in the territory. Understanding the vandalism that happens during a sociopolitical crisis helps us prepare for the next one. Concrete results on how the news affects vandalism can inform a review of article protection policies, perhaps helping to identify which articles can be protected in advance rather than waiting for them to be vandalized. In addition, we cannot rely entirely on the idea that “the news tells the truth”: we have seen, especially in Latin America, that the media opt for likes and do not always verify the information. This investigation will therefore also encounter misinforming news that is very likely to affect Wikipedia directly.
Furthermore, in line with the movement strategy, this research relates directly to identifying topics for impact and to understanding what role misinformation plays in the Wikipedia environment in times of crisis. In this way we can learn how our projects (not only Spanish Wikipedia) can be misused or manipulated, by detecting threats with significant potential for harm, such as misinformation.
Research questions
- Which news content from the media produced the most malicious edits on Spanish Wikipedia during the 2019 Social Outburst in Chile?
- Which types of vandalism occurred most often on Spanish Wikipedia, based on the topics analyzed in the media during the 2019 Social Outburst in Chile?
- Which types of articles on Spanish Wikipedia were most vandalized as a result of the media's news content during the 2019 Social Outburst in Chile?
Several studies, including Geiß et al. 2016, show a relationship between mentions of a topic in the news and visits to Wikipedia. Using different methods, these studies find that if an event or topic is repeatedly mentioned in the media, visits to its Wikipedia article and to similar articles increase. Notably, in some cases these visits turn into edits, some of which are vandalism.
During the 2019 Social Outburst in Chile, there was a contradiction between the reality collectively thought or imagined and the reality reported by the hegemonic media. The media also took different approaches to the news covering the event. For example, the hegemonic media concentrated on narrating the events from the perspective of the “high command”, while the independent media narrated them through interviews with civilians. The study analyzes, for example, the case of Televisión Nacional (TVN), whose discourse pointed to civilians as responsible for most cases of violence in the streets, while media such as Piensa Prensa offered a very different version that pointed to the police as the cause of the violence. This position is not new: since the media first appeared, some authors have regarded them as a means of manipulation, where control comes from the homogenization of taste and the reproduction of a version of reality that can be shaped by whoever is responsible for the media. Today this is reflected in the fact that the main media in Chile still belong to companies owned by elite groups, which influence the public agenda and the informative treatment of what is transmitted. The role played by the media in the 2019 Social Outburst is therefore key to understanding the impact it had on different platforms, such as Wikipedia.
Treating the Social Outburst as a time-bounded event makes it possible to locate the news around the events that occurred. Furthermore, the fact that the coverage took varied approaches raises the question of which topics were the most influential and led people to seek information elsewhere. As Wikipedia is one of the most visited sites in Chile, it can be used as an object of study: visits can be analyzed together with the edit history of the articles that, as a result of the news, were the most visited during the social crisis. The detailed history of these articles makes it possible to locate the edits that were reverted and cataloged as vandalism, classify them, and determine which issues generate this type of edit most often. This would show whether what happens on Wikipedia is related to the media and whether the vandalism has any relation to the information that appears in the news. The aim is to be able to anticipate this type of edit and understand how Wikipedia can contribute to the verification of information.
Information collection techniques
To build the news corpus for the investigation, a weekly clipping (review and filing) will be carried out covering the period between October 18th and November 15th, 2019. This technique is characterized by the selective collection of information, in our case news, in order to later “classify the documentation according to pre-established criteria, thus gathering it over time in press dossiers”. Likewise, because this information can be categorized, the collected documents gain meaning when grouped. The technique thus allows a selection of the main news items related to the 2019 Social Outburst in Chile. Finally, the content will be classified into categories in order to identify quantitatively the topics and statements of the news items, and this information will then be cross-referenced with qualitative analysis techniques to meet the objectives set.
Web Scraping
This technique, also known as web harvesting, extracts data from the World Wide Web and saves it in a file or database so that it can be analyzed. Here, an algorithm (that is, a computer program that performs the selection) will compile the vandalism used in the investigation. The algorithm will search SeroBOT's history and identify any reverted edits that are relevant to the article “Chile”. It will do so by applying the “what links here” feature to each edit made by SeroBOT over a period of time. If an article is related to Chile, it very likely contains the word “Chile” in its text and links to that article, so the program will recognize it and add it to the corpus. Once the first iteration is finished, the algorithm will repeat the process with other articles, for example “Gobierno de Chile”, “Protestas chilenas 2019-2022” and “Metro de Santiago” (because the Social Outburst began with the rise in the metro fare).
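The selection step described above can be sketched in Python. This is only an illustration under stated assumptions, not the project's actual program: it assumes the MediaWiki Action API modules `list=usercontribs` and `list=backlinks` are used to read SeroBOT's history and the “what links here” results, and the function names, the date window, and the sample records are hypothetical. The URL builders do not perform any network request; the filtering logic runs on made-up data.

```python
from urllib.parse import urlencode

# Assumption: the public Action API endpoint of Spanish Wikipedia.
API_URL = "https://es.wikipedia.org/w/api.php"

# Seed articles named in the text; the process is repeated for each one.
SEED_ARTICLES = ["Chile", "Gobierno de Chile",
                 "Protestas chilenas 2019-2022", "Metro de Santiago"]


def backlinks_url(title, limit=500):
    """Build the request URL for one page of 'what links here' results."""
    params = {"action": "query", "list": "backlinks", "bltitle": title,
              "blnamespace": 0, "bllimit": limit, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def bot_contribs_url(user, start, end, limit=500):
    """Build the request URL for a bot's article edits between two timestamps."""
    params = {"action": "query", "list": "usercontribs", "ucuser": user,
              "ucstart": start, "ucend": end, "ucdir": "newer",
              "ucnamespace": 0, "ucprop": "ids|title|timestamp|comment",
              "uclimit": limit, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def related_reverts(contribs, linked_titles):
    """Keep only the bot edits made on pages found in linked_titles."""
    return [c for c in contribs if c["title"] in linked_titles]


def build_corpus(contribs, links_by_seed):
    """Run the filter once per seed article, skipping edits already collected."""
    corpus, seen = [], set()
    for seed in SEED_ARTICLES:
        # The seed article does not link to itself, so add it explicitly.
        linked = links_by_seed.get(seed, set()) | {seed}
        for edit in related_reverts(contribs, linked):
            if edit["revid"] not in seen:
                seen.add(edit["revid"])
                corpus.append(edit)
    return corpus


# Made-up records standing in for API responses (hypothetical revision IDs).
sample_contribs = [
    {"revid": 101, "title": "Metro de Santiago"},
    {"revid": 102, "title": "Luna"},
    {"revid": 103, "title": "Sebastián Piñera"},
]
sample_links = {"Chile": {"Sebastián Piñera", "Metro de Santiago"}}

print(bot_contribs_url("SeroBOT", "2019-10-18T00:00:00Z", "2019-11-15T23:59:59Z"))
print([edit["title"] for edit in build_corpus(sample_contribs, sample_links)])
```

In a real run, the two URL builders would be fetched page by page (following the API's continuation values) to fill `contribs` and `links_by_seed`. Keying the deduplication on the revision ID keeps an edit from being counted twice when its article links to more than one seed.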
Information analysis techniques
Thematic content analysis
The information analysis technique used in this research will be thematic content analysis, which aims to discover the meaning of a message. Specifically, it consists of classifying and/or coding the various elements of a message into categories so that its meaning emerges adequately. Thematic content analysis is “a process to be used with qualitative information. It is not another qualitative method, but a process (...) that allows the translation of qualitative information into quantitative data, if this is desired by the researcher”. In the case of the news, the content dealing with the 2019 Social Outburst in Chile will be analyzed, which makes it possible to identify, quantitatively and qualitatively, the key issues that the media in the sample emphasized around the social revolt. In the case of malicious edits, thematic content analysis may be used to characterize the types of vandalism that occurred most frequently, in line with the objectives stated above.
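As a rough illustration of how coded qualitative material becomes quantitative data, the sketch below tallies category shares per data set. The category labels and the coded segments are invented for the example and are not the project's coding matrix or results.

```python
from collections import Counter

# Hypothetical coded segments: (data set, category) pairs assigned by a human coder.
coded_segments = [
    ("news", "economy"),
    ("news", "violence"),
    ("news", "politics"),
    ("news", "violence"),
    ("vandalism", "insult"),
    ("vandalism", "politics"),
]


def category_shares(segments, dataset):
    """Return the share of each category among the segments of one data set."""
    counts = Counter(cat for src, cat in segments if src == dataset)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}


print(category_shares(coded_segments, "news"))
```

The shares only summarize how often each code was applied; interpreting why a category dominates still relies on the qualitative reading of the texts.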
Critical discourse analysis
To analyze both the news and the vandalism, critical discourse analysis (CDA) will be used, defined as the “study of the ways in which the abuse of social power, dominance and inequality are practiced, reproduced and fought by texts and speech in the social and political context”. This technique is justified by the sociopolitical context on which the research focuses, the 2019 Social Outburst in Chile. In addition, CDA makes it possible to relate text and context, in order to explore “how the socially produced ideas and objects (in texts) that inhabit the world (reality) were initially created, and how they are maintained and supported in a place in time (the context)”. CDA will therefore make it possible to give a deeper meaning to the news and to the vandalism.
Monthly Update
|July 2023||Finished writing the report.|
|June 2023||In-depth analysis with all the variables. Writing up the results and working on the report.|
|May 2023||Finished the separate analysis for the 3 sets of data (news, vandalism, articles).|
|April 2023||Began applying the matrix to the list of vandalism and articles that the computational model gave us.|
|March 2023||News analysis done. First findings show a discourse centered on the economy, politics, and violence. La Tercera is used as a platform for the political parties, and it frames the protest as something violent. Worked with the Data Scientist to refine the model for the vandalism and the articles.|
|February 2023||Continued the analysis. First results from the Data Scientist. Vacations for the team.|
|January 2023||Started analyzing news and applying the matrix. The data collected here was uploaded to Taguette. A number of the main articles about the Social Outburst were protected for a certain period of time, leaving us with less vandalism to analyze than we would have wanted. Therefore, it was decided to expand the coverage and also analyze the content of the main articles about the Social Outburst.|
|December 2022||Collected news and cleaned up data. Created the CDA matrix that will be used in the analysis.|
|November 2022||Began data collection of news through Media Cloud. We found that the term "Estallido social" was relatively new in the news, so we expanded the search to other keywords like "gobierno" or "manifestación". Moreover, the accompanying image shows that the proposed period of time is accurate for the investigation, because on October 18th 2019 the percentage is over 30% and at the end, on November 25th 2019, the percentage is over 20%.|
|October 2022||We learned about Media Cloud and discovered that one of the features we wanted to use on the platform is no longer available as of June 2022. Began exploring and learning about the Wikimedia API.|
|August–September 2022||Meeting with Ana Castillo (collaborator from University of Chile). Search for a fellow.|
|July 2022||Worked on the justification of the investigation. Defined the methodology. Met with possible collaborators.|
|June 2022||Worked on the theoretical framework. Defined the type of investigation. Research project proposal accepted.|
The results obtained by Seguel and Farias, who identified a marked discourse by La Tercera regarding the political legitimacy of the government, were reinforced. This trend was reflected in the analyzed media discourses, which supported the government's management in the context of the Chilean Social Outburst. It was also confirmed that La Tercera employed information omission strategies in its coverage. In particular, police repression had little visibility in the analyzed dataset. Omitting this crucial dimension from the coverage of events can significantly affect the audience's understanding of the facts. It is concerning that, whereas the predominant discourse on Wikipedia focuses on police repression and human rights violations during the Social Outburst, the perspective offered by other media is markedly different. This finding is particularly interesting because it highlights how the representation of events diverges depending on the source consulted. The absence of a broader and more balanced narrative in certain media limits an overall and accurate understanding of what really happened during the Social Outburst.
Another notable aspect of La Tercera's discourse is the tendency to reinforce stereotypes regarding the protesters. Through its media coverage, the outlet tends to associate protesters with acts of violence, generating a negative and stigmatized image of those who participated in the Social Outburst. It is important to emphasize that La Tercera's discourse is strongly influenced by its nature as a company controlled by a Chilean business family linked to the right-wing political sector. This link between business interests and media can influence the ideological and editorial orientation of content, which in turn can affect the representation and construction of social and political reality. 
When analyzing vandalism on Wikipedia, two extremes were identified in terms of edited content. On one hand, some vandalism followed the discourse marked by La Tercera, reproducing stereotypes and disqualifications towards the left-wing sector, which was also associated with the protesters through statements by politicians in the outlet. On the other hand, some vandalism blamed the government and identified it as the main cause of the situation, even demanding the president's resignation. These acts of vandalism can be interpreted as a manifestation of the lack of human rights coverage by the media, where vandalism becomes an additional form of protest and expression.
It is worth noting that, although it is not possible to establish a direct correlation due to the anonymous nature of vandalism, these events provide clues as to how information sources interact and how certain media discourses can influence Wikipedia edits.
Regarding the influence of La Tercera on Wikipedia, it was observed that the outlet's discourses are mainly reflected in the economy and politics sections. In the economic field, a strong correlation was identified between the discourse used by La Tercera and the references used in Wikipedia. On the other hand, La Tercera's political coverage is also reflected to a lesser extent in Wikipedia content related to politics, as the outlet's numerous political news articles provide valuable context for encyclopedia editors. It is important to highlight that the lack of a large number of La Tercera discourses in Wikipedia can be considered positive, as it demonstrates a consensus among editors to provide content with the least possible bias.
From Wikipedia's perspective, the fundamental role of its neutrality and objectivity policy is worth highlighting, especially in the context of the Social Outburst. When conducting a critical discourse analysis, it is evident that as a space for collaboration and debate, Wikipedia promotes active community participation to ensure that the information presented is impartial and reflects different perspectives.
This is highly relevant to how information is delivered and to its quality. Technological evolution has introduced new elements into the way information is produced and consumed. Social media and algorithms play a significant role in content dissemination, leading to the proliferation of misinformation and the emergence of an infodemic. The media, including those with economic interests like La Tercera, are immersed in this dynamic and compete for interactions and audience in an increasingly saturated information environment. Given this scenario, the need for information spaces that promote centrality and impartiality in the treatment of topics becomes evident.
In times of crisis, such as the Social Outburst, attention to information becomes crucial, and it is in this context that Wikipedia plays a fundamental role. However, Wikipedia cannot exist without reference sources. The link between the media system and Wikipedia thus becomes evident, as news is one of the main sources used as references on the platform. The media system is not neutral and is often influenced by particular interests, which can introduce stereotypes and biases into the information generated. Faced with this reality, the question arises of how to balance the influence of interest-driven media discourse on Wikipedia, even when news is used as a source of information.
This study has shed some light on this issue, which is ultimately resolved through the functioning of the platform itself. To ensure objectivity on a given topic, it is necessary to diversify sources and seek a wide range of perspectives. Furthermore, collaboration and debate play a crucial role in delivering quality, bias-free information. The collaborative nature of Wikipedia, where different editors contribute and discuss information, helps mitigate the influence of interest-driven media discourse. Active community participation fosters a critical perspective and promotes the search for objectivity in the construction of collective knowledge.
Finally, in reflecting on the findings presented in this thesis, there is a need to question and critically examine how information is constructed and presented in the media during periods of crisis. The close relationship between media discourses and political and economic interests highlights the importance of having a diversity of sources.
References
- Geiß, S.; Leidecker, M.; Roessing, T. (2016). "The interplay between media-for-monitoring and media-for-searching: How news media trigger searches and edits in Wikipedia". New Media & Society: 2740–2759.
- Elias Valenzuela, Arturo Alfonso (2020). "Medios de Comunicación e imaginario social en la rebelión del 18 de octubre en Chile: Una relación contradictoria". Universidade Federal da Integração Latino-Americana.
- Cabrera Cares, Ignacio (2020). "El trato mediático y el uso de Twitter en el estallido social chileno". Repositorio institucional de la Universidad de La Laguna.
- De la Fuente, V. (2012). "Archivos y centros de documentación del periodismo gráfico argentino". Departamento Archivos, Biblioteca Nacional Mariano Moreno.
- Ferrara, G.; Rodríguez, D. (2017). "¿Archivos De Redacción O Centros De Documentación Periodística? La Importancia Y Problemáticas De Su Tratamiento Archivístico". Técnicas Archivísticas.
- Zhao, B. (2017). "Web Scraping". Encyclopedia of Big Data.
- Mayer, R.; Ouellet, F. (1991). "Metodología de investigación para trabajadores sociales". Boucherville, Gaëtan Morin Éditeur.
- Fraga, Cecilia; Maidana, Valeria; Paredes, Diego; Vega, Lorena (2007). "Transforming Qualitative Information: Thematic Analysis and Code Development".
- Van Dijk, T. (1999). "El análisis crítico del discurso". Anthropos N°186.
- Urra, E.; Muñoz, A.; Peña, J. (2013). "El análisis del discurso como perspectiva metodológica para investigadores de salud". Enfermería Universitaria, Volume 10, Issue 2: 50–57.
- Seguel, E.; Farias, A. (2022). "Construcción discursiva del “Estallido Social” en tres medios de ciberprensa chilena".
- "Social Mobilization and Media Framing In The Journalistic Coverage Of Oil Survey Permits In The Mediterranean".