Machine learning models/Proposed/Reference Verification for Wikidata

Most Wikidata claims are drawn from external resources, so they carry numerous reference URLs that designate the original provenance of each claim. However, these references need to be verified to ensure their quality, because the web documents behind the URLs change over time and some claims may become outdated, e.g., a claim about someone's occupation. To verify a claim's reference, an editor must visit the web document via its URL, read the entire text, and then determine whether the page contains sentences that support the claim. This is a non-trivial and labor-intensive task for Wikidata editors, and it is becoming increasingly challenging due to the rapid growth of Wikidata.

Model card
This page is an on-wiki machine learning model card.
A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s): Gabriel Amaral, Odinaldo Rodrigues, and Elena Simperl
Model owner(s): King's KG Lab
Model interface: https://www.wikidata.org/wiki/Wikidata:ProVe
Past performance: https://app.swaggerhub.com/apis/YihangZhao/pro-ve_api/1.0.0
Publications: https://www.semantic-web-journal.net/content/prove-pipeline-automated-provenance-verification-knowledge-graphs-against-textual-sources
Code: https://github.com/King-s-Knowledge-Graph-Lab/RQV
Uses PII: No
In production? This is hosted by the King's KG team at King's VM.
Which projects? https://www.wikidata.org/wiki/Wikidata:WikiProject_Reference_Verification#cite_note-1
This model takes a pair of a claim and a provenance sentence (extracted from the claim's reference URL) and predicts whether the provenance sentence supports the claim.


With advances in machine learning and language models, we can avoid exhaustive manual inspection of large amounts of text, such as verifying a reference URL against its claims. We have trained three language models based on BERT and T5 with our own training dataset, built on a crowdsourced labeled set. The first, a T5-based model, verbalizes a Wikidata claim into a natural-language sentence so it can be compared with sentences extracted from the web document behind the claim's reference URL. The second, a BERT-based model, finds the sentences most relevant to the verbalized sentence among those extracted from the web document. The third, a BERT-based model fine-tuned on the crowdsourced training dataset, determines whether a sentence extracted from the web document supports the verbalized sentence.
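As a rough illustration of how the three stages fit together, the sketch below uses the Hugging Face transformers and sentence-transformers libraries. The checkpoint names and the example claim are placeholders rather than the actual ProVe models and data, and the sentence-retrieval stage is approximated with an off-the-shelf bi-encoder instead of the fine-tuned BERT ranker.

# Minimal sketch of a three-stage ProVe-style pipeline (illustrative only).
# Checkpoint names are placeholders, not the actual ProVe models.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          AutoModelForSequenceClassification)
from sentence_transformers import SentenceTransformer, util
import torch

# Stage 1: T5-based verbalization of a Wikidata claim triple.
verb_tok = AutoTokenizer.from_pretrained("t5-base")              # placeholder
verb_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")    # placeholder
claim_triple = "Douglas Adams | occupation | novelist"           # hypothetical claim
ids = verb_tok(claim_triple, return_tensors="pt").input_ids
verbalized = verb_tok.decode(verb_model.generate(ids, max_length=64)[0],
                             skip_special_tokens=True)

# Stage 2: retrieve the sentences from the reference web page that are most
# relevant to the verbalized claim (approximated here with a bi-encoder).
retriever = SentenceTransformer("all-MiniLM-L6-v2")               # placeholder
page_sentences = ["Douglas Adams was an English author.", "He wrote novels."]
scores = util.cos_sim(retriever.encode(verbalized, convert_to_tensor=True),
                      retriever.encode(page_sentences, convert_to_tensor=True))
evidence = page_sentences[int(scores.argmax())]

# Stage 3: BERT-based classification of the (claim, evidence) pair into
# SUPPORTS / REFUTES / NOT ENOUGH INFO.
cls_tok = AutoTokenizer.from_pretrained("bert-base-uncased")      # placeholder
cls_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)                            # placeholder
inputs = cls_tok(verbalized, evidence, return_tensors="pt", truncation=True)
probabilities = torch.softmax(cls_model(**inputs).logits, dim=-1)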

Motivation


The motivation behind these models is to help Wikidata editors determine whether reference URLs support the corresponding Wikidata claims. By adopting these language models, a Wikidata editor can avoid reading the entire web document behind a reference URL to check whether it supports the claim.

These models are already deployed on King's VM and are running live. You can test our tool as a Wikidata gadget on a Wikidata item page to see their results. Further details are available here: https://www.wikidata.org/wiki/Wikidata:WikiProject_Reference_Verification

Users and uses

Use this model for
  • Checking whether a provenance sentence supports a claim sentence
  • Finding the sentences most relevant to a given sentence within a long text
Don't use this model for
  • Fact-checking for general purposes
  • Checking the supportiveness of very long sentences; this model is designed for simple sentences like Wikidata claims.
  • Number-centric sentences such as birth and death dates; this model is designed to understand language-centric sentences rather than numerical data.
  • High-level logical inference like unseen fact checks or transitive fact checks.
Current uses
This model powers the ProVe gadget, which displays supportiveness-checking results on Wikidata item pages as a Wikidata widget. The model is currently hosted on King's VM.

Ethical considerations, caveats, and recommendations


Model


Performance


Implementation



Model architecture


  • Window size: 512
  • Embeddings dimension: 768
  • Vocab size: 30,522
  • Total number of embeddings params: 23,440,896
  • Model architecture: BertForSequenceClassification
  • Number of hidden layers: 12
  • Number of attention heads: 12
  • Intermediate size: 3072
  • Problem type: Single label classification (3 classes)
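The figures above match a standard BERT-base configuration. As a hedged sketch, an equivalent classifier could be instantiated with the Hugging Face transformers library as follows; the label-to-id mapping is an assumption based on the output schema below, not a documented property of the deployed model.

from transformers import BertConfig, BertForSequenceClassification

# BERT-base hyperparameters as listed above; label names follow the output schema.
config = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=512,
    num_labels=3,
    id2label={0: "SUPPORTS", 1: "REFUTES", 2: "NOT ENOUGH INFO"},  # assumed order
    label2id={"SUPPORTS": 0, "REFUTES": 1, "NOT ENOUGH INFO": 2},
    problem_type="single_label_classification",
)
model = BertForSequenceClassification(config)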
Output schema
{
  sentences: [<sentence-1>, <sentence-2>],
  results: {
    SUPPORTS: <score ∈ (0, 1)>,
    REFUTES: <score ∈ (0, 1)>,
    NOT ENOUGH INFO: <score ∈ (0, 1)>
  }
}
Example input and output
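A hypothetical input/output pair following the schema above; the sentences and scores are illustrative only, not actual model output.

Input (claim sentence, evidence sentence):
  "Douglas Adams was a novelist."
  "Douglas Adams was an English author and screenwriter."

Output:
{
  sentences: ["Douglas Adams was a novelist.",
              "Douglas Adams was an English author and screenwriter."],
  results: {SUPPORTS: 0.91, REFUTES: 0.02, NOT ENOUGH INFO: 0.07}
}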
Model I/O API development

Data


Training dataset information on GitHub: https://github.com/gabrielmaia7/RSP?tab=readme-ov-file


Data pipeline
Training data
Test data

Licenses

  • Code:
  • Model:

Citation


Cite this model as:

@misc{name_year_modeltype,
   title={Model card title},
   author={Lastname, Firstname (and Lastname, Firstname and...)},
   year={year},
   url={this URL}
}