Machine learning models/Proposed/Reference Verification for Wikidata

Most Wikidata claims are drawn from external resources, so they carry numerous reference URLs that designate the original provenance of each claim. However, these references need to be verified to ensure their quality, because the web documents behind the URLs change over time and some claims may become outdated, e.g., a claim about someone's occupation. To verify a claim's reference, an editor must visit the web document via its URL, read the entire text, and then determine whether the page contains sentences that support the claim. This is a non-trivial and labor-intensive task for Wikidata editors, and it is becoming increasingly challenging due to the rapid growth of Wikidata.

Model card
This page is an on-wiki machine learning model card.
A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s): Gabriel Amaral, Odinaldo Rodrigues, and Elena Simperl
Model owner(s): King's KG Lab
Model interface: https://www.wikidata.org/wiki/Wikidata:ProVe
Past performance: https://app.swaggerhub.com/apis/YihangZhao/pro-ve_api/1.0.0
Publications: https://www.semantic-web-journal.net/content/prove-pipeline-automated-provenance-verification-knowledge-graphs-against-textual-sources
Code: https://github.com/King-s-Knowledge-Graph-Lab/RQV
Uses PII: No
In production? This is hosted by the King's KG team at King's VM.
Which projects? https://www.wikidata.org/wiki/Wikidata:WikiProject_Reference_Verification#cite_note-1
This model takes a pair of a claim and a provenance sentence (extracted from the claim's reference URL) and predicts whether the provenance sentence supports the claim.


With advances in machine learning and language models, we can avoid exhaustive manual inspection of large amounts of text, such as verifying a reference URL against its claims. We have trained three language models based on BERT and T5 with our own training dataset, built on a crowdsourced labeled set. The first, a T5-based model, verbalizes a Wikidata claim into a natural-language sentence so it can be compared with sentences extracted from the web document behind the claim's reference URL. The second, a BERT-based model, finds the sentences most relevant to the verbalized sentence among those extracted from the web document. The third, a BERT-based model fine-tuned on the crowdsourced training dataset, determines whether a sentence extracted from the web document supports the verbalized sentence.
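As a rough illustration of how the three stages fit together, the sketch below uses the Hugging Face transformers and sentence-transformers libraries. The checkpoint names and the example claim are placeholders rather than the actual ProVe models and data, and the sentence-retrieval stage is approximated with an off-the-shelf bi-encoder instead of the fine-tuned BERT ranker.

# Minimal sketch of a three-stage ProVe-style pipeline (illustrative only).
# Checkpoint names are placeholders, not the actual ProVe models.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          AutoModelForSequenceClassification)
from sentence_transformers import SentenceTransformer, util
import torch

# Stage 1: T5-based verbalization of a Wikidata claim triple.
verb_tok = AutoTokenizer.from_pretrained("t5-base")              # placeholder
verb_model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")    # placeholder
claim_triple = "Douglas Adams | occupation | novelist"           # hypothetical claim
ids = verb_tok(claim_triple, return_tensors="pt").input_ids
verbalized = verb_tok.decode(verb_model.generate(ids, max_length=64)[0],
                             skip_special_tokens=True)

# Stage 2: retrieve the sentences from the reference web page that are most
# relevant to the verbalized claim (approximated here with a bi-encoder).
retriever = SentenceTransformer("all-MiniLM-L6-v2")               # placeholder
page_sentences = ["Douglas Adams was an English author.", "He wrote novels."]
scores = util.cos_sim(retriever.encode(verbalized, convert_to_tensor=True),
                      retriever.encode(page_sentences, convert_to_tensor=True))
evidence = page_sentences[int(scores.argmax())]

# Stage 3: BERT-based classification of the (claim, evidence) pair into
# SUPPORTS / REFUTES / NOT ENOUGH INFO.
cls_tok = AutoTokenizer.from_pretrained("bert-base-uncased")      # placeholder
cls_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)                            # placeholder
inputs = cls_tok(verbalized, evidence, return_tensors="pt", truncation=True)
probabilities = torch.softmax(cls_model(**inputs).logits, dim=-1)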

Motivation


The motivation behind these models is to help Wikidata editors determine whether reference URLs support the corresponding Wikidata claims. By adopting these language models, a Wikidata editor can avoid reading the entire web document behind a reference URL to check whether it supports the claim.

These models are already deployed on King's VM and are running live. You can test our tool as a Wikidata gadget on a Wikidata item page to see their results. Further details are available here: https://www.wikidata.org/wiki/Wikidata:WikiProject_Reference_Verification

Users and uses

Use this model for
  • Checking whether a provenance sentence supports a claim sentence
  • Finding the sentences most relevant to a given sentence within a long text
Don't use this model for
  • Fact-checking for general purposes
  • Checking the supportiveness of very long sentences; this model is designed for simple sentences like Wikidata claims.
  • Number-centric sentences such as birth and death dates; this model is designed to understand language-centric sentences rather than numerical data.
  • High-level logical inference like unseen fact checks or transitive fact checks.
Current uses
This model powers the ProVe gadget, which displays supportiveness-checking results on Wikidata item pages as a Wikidata widget. The model is currently hosted on King's VM.

Ethical considerations, caveats, and recommendations


Model


Performance


Implementation



Model architecture


  • Window size: 512
  • Embeddings dimension: 768
  • Vocab size: 30,522
  • Total number of embeddings params: 23,440,896
  • Model architecture: BertForSequenceClassification
  • Number of hidden layers: 12
  • Number of attention heads: 12
  • Intermediate size: 3072
  • Problem type: Single label classification (3 classes)
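The figures above match a standard BERT-base configuration. As a hedged sketch, an equivalent classifier could be instantiated with the Hugging Face transformers library as follows; the label-to-id mapping is an assumption based on the output schema below, not a documented property of the deployed model.

from transformers import BertConfig, BertForSequenceClassification

# BERT-base hyperparameters as listed above; label names follow the output schema.
config = BertConfig(
    vocab_size=30522,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=512,
    num_labels=3,
    id2label={0: "SUPPORTS", 1: "REFUTES", 2: "NOT ENOUGH INFO"},  # assumed order
    label2id={"SUPPORTS": 0, "REFUTES": 1, "NOT ENOUGH INFO": 2},
    problem_type="single_label_classification",
)
model = BertForSequenceClassification(config)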
Output schema
{
  sentences: [<sentence-1>, <sentence-2>],
  results: {
    SUPPORTS: <score ∈ (0, 1)>,
    REFUTES: <score ∈ (0, 1)>,
    NOT ENOUGH INFO: <score ∈ (0, 1)>
  }
}
Example input and output
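A hypothetical input/output pair following the schema above; the sentences and scores are illustrative only, not actual model output.

Input (claim sentence, evidence sentence):
  "Douglas Adams was a novelist."
  "Douglas Adams was an English author and screenwriter."

Output:
{
  sentences: ["Douglas Adams was a novelist.",
              "Douglas Adams was an English author and screenwriter."],
  results: {SUPPORTS: 0.91, REFUTES: 0.02, NOT ENOUGH INFO: 0.07}
}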
Model I/O API development

Data


Training dataset information on GitHub: https://github.com/gabrielmaia7/RSP?tab=readme-ov-file


Data pipeline
Training data
Test data

Licenses

  • Code:
  • Model:

Citation


Cite this model as:

@misc{name_year_modeltype,
   title={Model card title},
   author={Lastname, Firstname (and Lastname, Firstname and...)},
   year={year},
   url={this URL}
}