WikiCred/2022 CFP/Iffy.news

This project will make Wikidata/Wikipedia a more effective indicator of news-site credibility. The barrier is inconsistent methods of categorizing news outlets, causing unreliable search results and incomplete information. We can fix that. By standardizing the connection between a news website and its domain name, we can then programmatically insert other data (when missing), such as:

  • Circulation
  • First year in print
  • Owner
  • Global Traffic Rank
  • Year online
  • Credibility ratings
Iffy.news and Wikidata
A WikiCred 2022 Grant Proposal
Project Type: Technology
Author: Barrett Golding (hearvox)
Contact: Iffy.news
Requested amount: US$6,700
Award amount: Unknown
What is your idea?

Implementing structured-data standards will improve search accuracy for both machines and humans. The screenshots below show how: (1) a search for bostonglobe.com failed ("No match was found"), even though the Boston Globe's Wikidata page lists its website. But (2 & 3) adding the domain name as an Alias (4) makes it the top result:

Entering a domain name as an item's Alias makes it searchable by domain
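The alias lookup above can be reproduced programmatically. The sketch below builds a request to Wikidata's wbsearchentities API module, whose search matches item labels and aliases, so a domain name stored as an Alias becomes findable. This is a minimal illustration, not the project's actual code:

```python
from urllib.parse import urlencode

def build_search_url(domain):
    """Build a wbsearchentities request URL; the search matches item
    labels and aliases, so a domain stored as an Alias is found."""
    params = {
        "action": "wbsearchentities",
        "search": domain,       # e.g. "bostonglobe.com"
        "language": "en",
        "type": "item",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)

url = build_search_url("bostonglobe.com")
```

Fetching that URL returns JSON search results in which, once the domain is an Alias, the news outlet's item appears at the top.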


Why is it important?
Wikidata item for the Boston Globe, listing the old URL with the incorrect (http://) protocol

Right now, Wikipedia articles and Wikidata items list domain names in many different ways. A news outlet might have an Infobox, which might list a "Website", which might be either its domain name or its URL. Below that there might be an External Links section, with the site listed as either "Official Website" or the domain name. And those links are often the old, now-incorrect http URL instead of the site's new https protocol.

All that unstructured data prevents machines from reliably identifying news outlets by their domain name, which in turn prevents the data from being reliably read or written programmatically.

Using APIs, and in consultation with Wikidata editors, we will give news outlets a standardized association with their website: the (correct) URL as the "official website" statement and the domain name as an Alias. Once a news outlet is machine-identifiable by its unique domain name, we will insert additional data via the API (data already stored in Iffy.news and other databases, ready to use).
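Adding a domain name as an Alias can be done with Wikidata's wbsetaliases API module. The sketch below only assembles the POST parameters; a real edit also requires an authenticated session and a CSRF token, and the QID shown is a placeholder, not an actual item:

```python
def build_alias_edit(qid, domain, token="<csrf-token>"):
    """Assemble POST parameters for a wbsetaliases edit that adds a
    domain name as an English alias. A real edit needs an
    authenticated session and a CSRF token (placeholder here)."""
    return {
        "action": "wbsetaliases",
        "id": qid,        # item QID, e.g. "Q123" (placeholder)
        "add": domain,    # alias to add, e.g. "bostonglobe.com"
        "language": "en",
        "format": "json",
        "token": token,
    }

edit = build_alias_edit("Q123", "bostonglobe.com")
```

Scripted edits like this would be reviewed with experienced Wikidata editors before any batch run.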

For the roughly 2,000 outlets that already have a "Media Bias Fact Check ID", we can add their MBFC credibility and factual-reporting ratings. Then we could experiment:

  • At Wikipedia, building a user script that identifies the credibility of the sources cited in an article's References.
  • At an external site, pulling in Wikidata on news sources, using Iffy.news' Fact-check Search tool as a demonstration.
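The core of the first experiment is a domain-to-rating lookup. The actual user script would be JavaScript; the sketch below shows the idea in Python, with a made-up ratings table (real values would come from MBFC data stored in Wikidata, keyed by domain name):

```python
from urllib.parse import urlparse

# Hypothetical ratings table; real values would come from MBFC
# credibility ratings stored in Wikidata, keyed by domain name.
RATINGS = {
    "example-reliable.com": "high",
    "example-iffy.com": "low",
}

def rate_reference(url):
    """Look up a citation URL's credibility rating by its domain."""
    domain = urlparse(url).netloc.removeprefix("www.")
    return RATINGS.get(domain, "unrated")
```

A user script would run this lookup over every link in an article's References section and annotate each with its rating.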


Link(s) to your resume or anything else (CV, GitHub, etc.) that may be relevant

LinkedIn
GitHub
WordPress
Bio


Is your project already in progress?
World map showing news outlets with a Wikipedia (en) article
News outlets with a Wikipedia (en) article

This project builds on the knowledge gained in our previous WikiCred grant, documented at MisinfoCon, Iffy.news, and DataJournalism.com. Among the results was a SPARQL query for all news outlets with Wikipedia (en) articles. We already have databases of USA news-outlet domains and scripts that pull and update site-related data.
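The query from the previous grant is not reproduced here, but a minimal SPARQL query in the same spirit looks like this: newspapers (Q11032, including subclasses) that have an official website (P856) and an English Wikipedia article, runnable against the Wikidata Query Service endpoint:

```python
from urllib.parse import urlencode

# Minimal analogous query (not the project's exact SPARQL):
# newspapers with an official website and an en-Wikipedia article.
QUERY = """
SELECT ?outlet ?outletLabel ?website WHERE {
  ?outlet wdt:P31/wdt:P279* wd:Q11032 ;   # instance of (a subclass of) newspaper
          wdt:P856 ?website .             # official website
  ?article schema:about ?outlet ;
           schema:isPartOf <https://en.wikipedia.org/> .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

ENDPOINT = ("https://query.wikidata.org/sparql?"
            + urlencode({"query": QUERY, "format": "json"}))
```

Fetching ENDPOINT returns JSON rows of outlets and their websites, which our scripts can then cross-reference against the Iffy.news domain databases.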


How is this project relevant to credibility and Wikipedia?

Our work will strengthen a core Wikipedia/Wikidata principle: identifying and categorizing credible sources.

All articles must strive for verifiable accuracy, citing reliable, authoritative sources. — Wikipedia, "Five Pillars"


What is the ultimate impact of this project?

This project significantly improves Wikidata/Wikipedia as a tool for evaluating news-source reliability, helping journalists, researchers, and the public.


Can your project scale?

The datasets we have list USA English news outlets. We will document our process and publish our SPARQL, JavaScript, and API code in a GitHub repo. Others can then follow our instructions to auto-gather news-outlet data for other countries and languages and programmatically insert it into Wikidata/Wikipedia.


Why are you the people to do it?

It needs to be done. Iffy.news has the databases, the API skills, the Wikidata familiarity, and the journalism/documentation experience to do it.


What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

Mis/disinformation is the enemy of diversity, inclusion, and accessibility. This project helps stop mis/disinfo at its source.


What are the challenges associated with this project and how will you overcome them?
Example of updating an old, incorrect (http://) URL: the Chicago Tribune's Wikidata item, noting "Reason for deprecated rank: migration of website from HTTP to HTTPS"
  • Establish a machine-readable, unique relationship between news outlets in Wikidata/Wikipedia and their domain name.
  • With the help of experienced Wikidata editors, determine what statements should be added or updated for news-outlet Items (then script API calls to safely and accurately insert that data).
  • With the help of experienced Wikipedia editors, determine what information should be added or updated for news-outlet articles (then script safe, accurate API calls).
  • Demonstrate how external sites can use all this new Wikidata/Wikipedia data.
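One small but recurring piece of the URL-update work is normalizing whatever "Website" value an item currently holds (http URL, https URL, or bare domain) into a canonical https URL plus a bare domain name. A minimal sketch, assuming inputs in any of those three forms:

```python
from urllib.parse import urlparse

def normalize_site(value):
    """Turn a stored 'Website' value (http URL, https URL, or bare
    domain) into a canonical (https URL, bare domain) pair."""
    # Bare domains lack "//", so urlparse would read them as a path;
    # prefix "//" to make the domain parse as the network location.
    parsed = urlparse(value if "//" in value else "//" + value)
    domain = parsed.netloc.removeprefix("www.")
    https_url = "https://" + parsed.netloc + parsed.path.rstrip("/")
    return https_url, domain
```

The https URL would go into the "official website" statement (with the old http value deprecated, as in the Chicago Tribune example), and the bare domain into the Alias.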


How will you spend your funds?
  1. Research and Programming $4,000
  2. Technical Writing $2,000
  3. Wiki Editors (Consultants) $700


How long will your project take?

6 months


Have you worked on projects for previous grants before?