Grants:Programs/Wikimedia Research Fund/Beyond Fake News: Auditing the Online Information Ecosystem

status: not funded
Beyond Fake News: Auditing the Online Information Ecosystem
start and end dates: July 2023 - July 2024
budget (USD): 47,500 USD
fiscal year: 2022-23
applicant(s): Swapneel Mehta

Overview

Applicant(s)

Swapneel Mehta

Affiliation or grant type

New York University

Author(s)

Swapneel Mehta

Wikimedia username(s)

User:SwapneelM; Swapneel Mehta

Project title

Beyond Fake News: Auditing the Online Information Ecosystem

Research proposal

Description

Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.

We propose an information auditing tool that monitors the impact of any news source in the online information ecosystem. This tool will quantify the impact that Wikipedia’s Perennial Sources and WikiNews articles have on multiple social channels with the goal of providing an “audit” of the spread of content from the source.

The specific focus of this project is to enforce knowledge integrity by enabling users to audit the impact of Wikipedia articles and Perennial Sources on social media. The system is envisioned to contextualize the reliability of any source with quantitative evidence from beyond Wikipedia. We study the coordinated networks promoting content from a source of interest and create content-agnostic metrics, in addition to content-based ones, to audit its impact.

We have already developed a viable prototype (https://parrot.report) jointly with The Times (UK) to study the spread of 8,000 Russian media articles from two unreliable sources that have historically spread disinformation on Twitter. The prototype uses cloud infrastructure to collect data through queries to Twitter’s API; Google Cloud services (including BigQuery, DataFlow, and Cloud Compute) together with Neo4j for analysis; and a Next.js-based platform to serve the results. We would like to scale this prototype beyond Twitter and support a broader set of sources for Wikipedia. We have received access to data from mainstream and fringe social media websites that will help surface potentially manipulated campaigns around narratives on sensitive, emerging topics in the civic and public health domains.

Having worked at social media companies, we know that it is hard to identify manipulated narratives purely from content, and the advent of large language models will only exacerbate the issue. To address this, we are developing an alternative system that relies on user behavior, the historical sharing of content, and network structure to surface coordinated inauthentic behavior and information operations.
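As an illustration of what a content-agnostic signal could look like, the following is a minimal sketch and not the prototype's actual method. It assumes hypothetical share records of the form (account, URL, timestamp) and links two accounts whenever they share the same URL within a short time window; the record layout, the 60-second window, and the two-URL threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical share records: (account_id, url, unix_timestamp_seconds).
shares = [
    ("a", "https://example.org/1", 0),
    ("b", "https://example.org/1", 30),
    ("c", "https://example.org/1", 5000),
    ("a", "https://example.org/2", 100),
    ("b", "https://example.org/2", 110),
    ("d", "https://example.org/2", 9000),
]


def co_share_edges(records, window_seconds=60):
    """For each account pair, count the distinct URLs both shared within the window."""
    by_url = defaultdict(list)
    for account, url, ts in records:
        by_url[url].append((ts, account))

    weights = defaultdict(int)
    for events in by_url.values():
        pairs = set()  # count each pair at most once per URL
        for (t1, a1), (t2, a2) in combinations(sorted(events), 2):
            if a1 != a2 and abs(t2 - t1) <= window_seconds:
                pairs.add(tuple(sorted((a1, a2))))
        for pair in pairs:
            weights[pair] += 1
    return dict(weights)


def flag_pairs(weights, min_urls=2):
    """Return account pairs that co-shared at least min_urls distinct URLs (assumed threshold)."""
    return sorted(pair for pair, w in weights.items() if w >= min_urls)


edges = co_share_edges(shares)
suspicious = flag_pairs(edges)
print(edges, suspicious)
```

The sketch never inspects the text of a post, only who shared what and when, which is why a measure of this kind can in principle carry over across languages.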

This project will enable the timely detection of disinformation campaigns on social media that co-occur with malicious editing of Wikipedia articles linking state-sponsored domains to emerging topic areas like the Russian invasion of Ukraine. It will strengthen information provenance efforts including fact-checking, measuring biased content sharing, and enriching metadata about sources in an increasingly digital ecosystem.

Personnel

  • Dr. Jaan Altosaar, Ph.D., Role: Advisor; CEO, One Fact Foundation, USA
  • Ahmed Medien, Role: Events Lead; Project Manager, Hacks/Hackers, Canada
  • Dr. Zhouhan Chen, Role: Technical Advisor; Founder, Safe Link Networks, USA
  • Jay Gala, Role: Research Engineer; AI Resident, IIT Madras, India
  • Deep Gandhi, Role: Research Engineer; M.Sc. Student, University of Alberta, Canada
  • Dhara Mungra, Role: Research Engineer; Data Scientist II, Bombura, USA
  • Jhagrut Lalwani, Role: Research Engineer; Undergraduate Student, VJTI, India
  • Raghav Jain, Role: Research Engineer; AI/NLP Researcher, IIT Patna, India
  • Mudra Nagda, Role: Web Designer; Graduate Student and Design Intern at Google, Georgia Institute of Technology, USA

Budget

Approximate amount requested in USD.

47,500 USD

Budget Description

Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).

Project staffing and technical costs for cloud services.

USD 12,000 - Project Lead (Research and Development) at USD 48/hr

USD 24,000 - Research Engineering at USD 40/hr

USD 8,000 - 6 months of cloud-based data analysis expenses (based on current spend)

USD 1,500 - Web Design and User Experience Researcher (consulting) at USD 50/hr

USD 2,000 - Indirect costs
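As a quick consistency check, the line items above can be summed and compared with the requested amount. The sketch below copies the amounts and hourly rates from the budget description; the implied hours are derived values, not figures stated in the application.

```python
# Line items from the budget description: (label, amount_usd, hourly_rate_usd or None).
items = [
    ("Project Lead", 12000, 48),
    ("Research Engineering", 24000, 40),
    ("Cloud-based data analysis", 8000, None),
    ("Web Design and User Experience Researcher", 1500, 50),
    ("Indirect costs", 2000, None),
]

# Total of all line items, to compare with the requested 47,500 USD.
total = sum(amount for _, amount, _ in items)

# Hours implied by each hourly-rated item (derived, not stated in the application).
implied_hours = {label: amount / rate for label, amount, rate in items if rate}

print(total)
print(implied_hours)
```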

Impact

Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.

Language models have made it trivial to generate polarizing news stories that can reach millions of people in minutes via social platforms, causing an 'infodemic'.

This project advances ‘knowledge as a service’ and knowledge integrity. It is central to Wikipedia’s mission to develop systems that can aggregate and understand emerging narratives, especially in crises, when the information landscape evolves faster than the platforms themselves can track. We provide a system that empowers every member of the community to cooperatively audit sources and articles of interest. Our auditing metrics are content-agnostic and can be applied in multilingual settings, which makes them a practical method for low-resource applications.

Dissemination

Plans for dissemination.

Our work will be hosted on a live website and remain accessible to the Wikimedia community. We also intend to publish the aggregated research output of the “auditing” reports in scientific journals, publish shorter periodic reports summarizing the campaigns we discover, and share our analyses in a reproducible manner. We will build on existing projects within the Wikimedia community, including TwikiL and Iffy News, and invite feedback via office hours at 'Unicode Research'.

Past Contributions

Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.

Platforms:

https://parrot.report - MVP for journalism

https://onefact.org - Non-profit focused on price transparency in healthcare

https://informationtracer.com

Research on Cross-platform Causal Effects and Coordinated Behavior:

https://openreview.net/forum?id=wMxp5eVhMVe

https://misinfocon.com/estimating-harms-from-coordinated-behavior-on-social-networks-897ce7a5447c

https://nyudatascience.medium.com/cds-phd-student-swapneel-mehta-presents-examining-the-causal-effect-of-twitters-interventions-on-54bab63b5b0

https://simppl.org - long-term project on simulating how information spreads online

Communities we run:

https://unicode-research.netlify.app/ - our research group

https://nyu-mll.github.io/nyu-ai-school-2022/

https://djunicode.in


I agree to license the information I entered in this form, excluding the pronouns, countries of residence, and email addresses, under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, the application itself, and all the information entered by me in this form, excluding the pronouns, countries of residence, and email addresses of the personnel, will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of my research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.

Yes