Research:Disinformation Literature Review

Contact: Diego Saez Trumper (Wikimedia Foundation)

Collaborators: Jonathan T. Morgan (Wikimedia Foundation)


This page documents a completed research project.

The aim of this study is to identify key areas of research that can help fight disinformation on Wikipedia. To address this problem, we perform a literature review that tries to answer three main questions:

What is disinformation?
What are the most popular mechanisms to spread online disinformation?
Which mechanisms are currently being used to fight disinformation?

For each of these three questions, we first take a general approach, considering studies from fields such as journalism and communications, sociology, philosophy, information science and political science, and then compare those studies with the current situation in the Wikipedia ecosystem.

What is disinformation?

We found that disinformation can be defined as non-accidentally misleading information that is likely to create false beliefs. While the exact definition varies across authors, they tend to agree that disinformation differs from other types of misinformation because it requires an intention to deceive the receiver. A more actionable way to scope disinformation is to treat it as a problem of information quality. On Wikipedia, information quality is mainly governed by the neutral point of view and verifiability policies.

A taxonomy of (mis)information, based on Zhou and Zafarani's work
Type	Authenticity	Intention
Disinformation	False	Bad
Misinformation	False	Unknown
Mal-Information	True	Bad
Fake News	False	Bad
Satire News	False	Not Bad
Imposter Content	False	Unknown
Fabricated Content	False	Bad
Manipulated Content	Unknown	Bad
Rumor	Unknown	Unknown
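As a minimal sketch (not part of the original study), the taxonomy table can be encoded as a lookup from each information type to its two attributes. The enum names and the `matching_types` helper are hypothetical illustrations. Querying the lookup makes one property of the table explicit: authenticity and intention alone do not single out disinformation, since fake news and fabricated content share the same (False, Bad) profile.

```python
from enum import Enum


class Authenticity(Enum):
    TRUE = "True"
    FALSE = "False"
    UNKNOWN = "Unknown"


class Intention(Enum):
    BAD = "Bad"
    NOT_BAD = "Not Bad"
    UNKNOWN = "Unknown"


# Rows of the taxonomy table, copied as-is: type -> (authenticity, intention).
# The data structure itself is a hypothetical illustration, not the authors' code.
TAXONOMY = {
    "Disinformation": (Authenticity.FALSE, Intention.BAD),
    "Misinformation": (Authenticity.FALSE, Intention.UNKNOWN),
    "Mal-Information": (Authenticity.TRUE, Intention.BAD),
    "Fake News": (Authenticity.FALSE, Intention.BAD),
    "Satire News": (Authenticity.FALSE, Intention.NOT_BAD),
    "Imposter Content": (Authenticity.FALSE, Intention.UNKNOWN),
    "Fabricated Content": (Authenticity.FALSE, Intention.BAD),
    "Manipulated Content": (Authenticity.UNKNOWN, Intention.BAD),
    "Rumor": (Authenticity.UNKNOWN, Intention.UNKNOWN),
}


def matching_types(authenticity: Authenticity, intention: Intention) -> list:
    """Return, sorted by name, every type whose row matches both attributes."""
    return sorted(
        name
        for name, (auth, intent) in TAXONOMY.items()
        if auth == authenticity and intent == intention
    )
```

For example, `matching_types(Authenticity.FALSE, Intention.BAD)` returns `["Disinformation", "Fabricated Content", "Fake News"]`, while `matching_types(Authenticity.TRUE, Intention.BAD)` returns only `["Mal-Information"]`.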

What are the most popular mechanisms to spread online disinformation?

The mechanisms used to spread online disinformation include the coordinated action of online brigades, the use of bots, and other techniques for creating fake content. Underresourced topics and communities are especially vulnerable to such attacks. The use of sock-puppets is one of the most important problems for Wikipedia.

Social attacks classified by the type of weakness they exploit
Weakness exploited	Description	Example
Social System	When reputation systems are hacked to introduce disinformation.	Using bots or sock-puppets to over-represent an opinion or to confirm false information.
Lack of Information	When a lack of information is exploited to introduce disinformation.	Spreading disinformation during ongoing events such as natural disasters, or manipulating search engine results on topics without enough information.

Summary of the most popular mechanisms used to spread online disinformation.
Mechanism	Description	Type	Vulnerability of Wikipedia
Bots	Software used to automate the spread of messages, creating the impression that many people hold a specific opinion about, or interest in, a topic.	Technical	Low
Sock-puppets	Multiple online identities used for purposes of deception.	Social	High
Web Brigades	A set of coordinated users who introduce fake content by exploiting the weaknesses of communities and systems.	Social	High
Click farms	Large groups of low-paid workers hired to perform micro-tasks that deceive online systems.	Social	Medium
Deepfake	An AI technique for human image synthesis that can be used to create fake videos of celebrities or other notable people.	Technical	Low
Data Voids	Exploiting missing data to manipulate search results.	Social	Medium
Circular reporting	A situation where a piece of information appears to come from multiple independent sources but in reality comes from only one source.	Social	High
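Circular reporting can be illustrated as a graph problem. The following is a minimal sketch under stated assumptions, not a method from the reviewed literature. It assumes a hypothetical `derived_from` mapping that records which source each outlet copied its claim from (building such a mapping is the hard part in practice). It follows those links back to each source's origin and flags a claim when several cited sources collapse onto fewer independent origins. Sources that only copy each other in a loop are treated as a single origin.

```python
from __future__ import annotations


def root_source(source: str, derived_from: dict[str, str]) -> str:
    """Follow hypothetical 'copied from' links until reaching an original source."""
    path: list[str] = []
    while source in derived_from:
        if source in path:
            # Sources that only cite each other form a loop; use one
            # stable representative (the smallest name) for the whole loop.
            cycle = path[path.index(source):]
            return min(cycle)
        path.append(source)
        source = derived_from[source]
    return source


def independent_origins(cited: list[str], derived_from: dict[str, str]) -> set[str]:
    """Map every cited source to its origin and keep the distinct origins."""
    return {root_source(s, derived_from) for s in cited}


def is_circular(cited: list[str], derived_from: dict[str, str]) -> bool:
    """True when the cited sources trace back to fewer independent origins."""
    return len(independent_origins(cited, derived_from)) < len(set(cited))
```

With `links = {"blog_b": "news_a", "news_c": "blog_b"}`, citing `["news_a", "blog_b", "news_c"]` looks like three sources, but all of them resolve to `"news_a"`, so `is_circular` returns `True`.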

Which mechanisms are currently being used to fight disinformation?

The techniques used to fight disinformation on the internet include manual fact-checking by agencies and communities, as well as automatic techniques that assess the quality and credibility of a given piece of information. Machine learning approaches can be fully automatic or can serve as tools for human fact-checkers. Wikipedia, and especially Wikidata, play a double role here: automatic methods use them as ground truth to determine the credibility of information, and at the same time (and for that very reason) they are the target of many attacks. Currently, the main defense of Wikimedia projects against fake news is the work of community members, especially patrollers, who use a mix of techniques to detect and control disinformation campaigns on Wikipedia.
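The double role of Wikidata can be illustrated with a toy claim checker. This is a minimal sketch, not a method from the reviewed literature. A small in-memory dictionary stands in for a knowledge base such as Wikidata. Claims are (subject, property, value) triples with hypothetical human-readable labels; real Wikidata uses Q and P identifiers. The three verdict labels are an assumption borrowed from common fact-checking setups. Returning "refuted" when the value is missing assumes the knowledge base is complete for that property, which real systems cannot assume.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NOT_ENOUGH_INFO = "not enough info"


def check_claim(kb: dict, subject: str, prop: str, value: str) -> Verdict:
    """Check a (subject, property, value) claim against a toy knowledge base.

    `kb` is a hypothetical stand-in for Wikidata: it maps
    (subject, property) pairs to the set of values it accepts.
    """
    known = kb.get((subject, prop))
    if known is None:
        # The knowledge base says nothing about this property.
        return Verdict.NOT_ENOUGH_INFO
    # Simplifying assumption: the stored values are complete for this property.
    return Verdict.SUPPORTED if value in known else Verdict.REFUTED
```

Because the checker trusts the knowledge base, editing the knowledge base changes the verdict. If `kb = {("Mount Example", "country"): {"Country A"}}`, the claim that Mount Example is in "Country B" is refuted until someone adds "Country B" to that set, after which the same claim is reported as supported. This is why a source used as ground truth also becomes a target.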

Conclusion

We conclude that, to keep Wikipedia as free as possible from disinformation, it is necessary to help patrollers detect disinformation early and assess the credibility of external sources. More research is needed to develop tools that use state-of-the-art machine learning techniques to detect potentially dangerous content, empowering patrollers to deal with attacks that are becoming increasingly complex and sophisticated.

Full Document

Full paper on arXiv