Machine learning models/Production/Spanish Wikiquote goodfaith edit
Model card | |
---|---|
This page is an on-wiki machine learning model card. | |
Model Information Hub | |
Model creator(s) | Aaron Halfaker (User:EpochFail) and Amir Sarabadani |
Model owner(s) | WMF Machine Learning Team (ml@wikimediafoundation.org) |
Model interface | ORES homepage |
Code | ORES GitHub, ORES training data, and ORES model binaries |
Uses PII | No |
In production? | Yes |
Which projects? | Spanish Wikiquote |
This model uses data about a revision to predict the likelihood that the revision is in good faith. | |
Motivation
Not all damaging edits are vandalism. This model is intended to differentiate between edits that are intentionally harmful (bad faith, i.e. vandalism) and edits that are not intended to be harmful (good edits and good-faith damage). The model predicts whether a given revision was made in good faith and returns a probability for each label as a measure of its confidence. It was inspired by research on Wikipedia's quality control system and on the potential for vandalism detection models to double as "goodfaith newcomer" detection systems.[1]
Users and uses
- This model should be used for prioritizing the review and potential reversion of vandalism on Spanish Wikiquote.
- This model should be used for detecting goodfaith contributions by editors on Spanish Wikiquote.
- This model should not be used as an ultimate arbiter of whether or not an edit ought to be considered good faith.
- The model should not be used outside of Spanish Wikiquote.
- Spanish Wikiquote uses the model as a service for facilitating efficient edit reviews or newcomer support.
- On an individual basis, anyone can submit a properly-formatted API call to ORES for a given revision and get back the result of this model.
https://ores.wikimedia.org/v3/scores/eswikiquote/441558/damaging
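As a minimal sketch of such a call, the Python below builds a request URL with the same shape as the example URL on this page (which requests the `damaging` model) and can optionally fetch it. The helper names `score_url` and `fetch_score` are hypothetical; only the URL pattern is taken from this page. Passing `"goodfaith"` as the model name to query this card's model is an assumption based on the card's subject.

```python
import json
import urllib.request

# Base of the example URL shown on this page.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def score_url(context: str, rev_id: int, model: str) -> str:
    """Build a scoring URL of the form <base>/<context>/<rev_id>/<model>."""
    return f"{ORES_BASE}/{context}/{rev_id}/{model}"


def fetch_score(context: str, rev_id: int, model: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON response. This performs a network request."""
    url = score_url(context, rev_id, model)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Rebuilds the example URL from this page without touching the network.
print(score_url("eswikiquote", 441558, "damaging"))
```

Keeping URL construction separate from the network call makes the request shape easy to check offline; `fetch_score` is only needed when a live result is wanted.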
Ethical considerations, caveats, and recommendations
Spanish Wikiquote decided to use this model. Over time, the model has been validated through use in the community.
This model is known to assign lower good-faith probabilities to edits by newer editors.
Internal or external changes that could make this model deprecated or no longer usable are:
- Data drift renders the model's training data no longer usable.
- The model no longer meets the desired performance metrics in production.
- The Spanish Wikiquote community decides to stop using this model.
Model
Performance
Test data confusion matrix:
Label | n | Predicted True (~True) | Predicted False (~False) |
---|---|---|---|
True | 8361 | 8214 | 147 |
False | 561 | 241 | 320 |
Test data sample rates:
Rate | True | False |
---|---|---|
sample | 0.937 | 0.063 |
population | 0.936 | 0.064 |
Test data performance:
Statistic | True | False |
---|---|---|
match_rate | 0.947 | 0.053 |
filter_rate | 0.053 | 0.947 |
recall | 0.982 | 0.57 |
precision | 0.971 | 0.691 |
f1 | 0.977 | 0.625 |
accuracy | 0.956 | 0.956 |
fpr | 0.43 | 0.018 |
roc_auc | 0.978 | 0.88 |
pr_auc | 0.984 | 0.658 |
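Several of these statistics can be reproduced by hand from the confusion matrix. The sketch below recomputes recall and false-positive rate for both labels, which depend only on the matrix rows. It also computes an accuracy weighted by the population rates from the sample-rates table; that weighting is an assumption about how the reported figure was derived, not something this page states.

```python
# Confusion matrix from the table above: rows are actual labels,
# columns are predicted labels (~True, ~False).
tp, fn = 8214, 147   # actual True:  predicted True, predicted False
fp, tn = 241, 320    # actual False: predicted True, predicted False

# Recall: share of each actual class that the model labels correctly.
recall_true = tp / (tp + fn)
recall_false = tn / (tn + fp)

# False-positive rate for a label: share of the *other* class given that label.
fpr_true = fp / (fp + tn)
fpr_false = fn / (fn + tp)

# Assumption: accuracy is weighted by the population rates (0.936 / 0.064)
# listed in the sample-rates table rather than by raw sample counts.
pop_true, pop_false = 0.936, 0.064
weighted_accuracy = pop_true * recall_true + pop_false * recall_false

for name, value in [
    ("recall (True)", recall_true),
    ("recall (False)", recall_false),
    ("fpr (True)", fpr_true),
    ("fpr (False)", fpr_false),
    ("accuracy (population-weighted)", weighted_accuracy),
]:
    print(f"{name}: {value:.3f}")
```

Precision and F1 also depend on the class balance, so recomputing them from the raw counts alone can differ slightly in the third decimal from the values in the table.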
Implementation
Model parameters:
{
"type": "GradientBoosting",
"params": {
"scale": true,
"center": true,
"labels": [
true,
false
],
"multilabel": false,
"population_rates": null,
"ccp_alpha": 0.0,
"criterion": "friedman_mse",
"init": null,
"learning_rate": 1,
"loss": "deviance",
"max_depth": 7,
"max_features": "log2",
"max_leaf_nodes": null,
"min_impurity_decrease": 0.0,
"min_impurity_split": null,
"min_samples_leaf": 1,
"min_samples_split": 2,
"min_weight_fraction_leaf": 0.0,
"n_estimators": 700,
"n_iter_no_change": null,
"presort": "deprecated",
"random_state": null,
"subsample": 1.0,
"tol": 0.0001,
"validation_fraction": 0.1,
"verbose": 0,
"warm_start": false
}
}
Output schema:
{
"title": "Scikit learn-based classifier score with probability",
"type": "object",
"properties": {
"prediction": {
"description": "The most likely label predicted by the estimator",
"type": "boolean"
},
"probability": {
"description": "A mapping of probabilities onto each of the potential output labels",
"type": "object",
"properties": {
"true": {
"type": "number"
},
"false": {
"type": "number"
}
}
}
}
}
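A client that consumes scores may want to confirm that a score object has the shape this schema describes before using it. The following is a minimal sketch using only the standard library; the function name `check_score` is hypothetical, and the check that the two probabilities sum to 1 is an added assumption that the schema itself does not state.

```python
import math


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_score(score: dict, tol: float = 1e-6) -> bool:
    """Return True if `score` matches the shape described by the schema above."""
    prediction = score.get("prediction")
    probability = score.get("probability")
    if not isinstance(prediction, bool) or not isinstance(probability, dict):
        return False
    values = [probability.get("true"), probability.get("false")]
    if not all(_is_number(v) for v in values):
        return False
    # Assumption (not part of the schema): the probabilities sum to 1.
    return math.isclose(sum(values), 1.0, abs_tol=tol)


# Illustrative values only.
print(check_score({"prediction": False, "probability": {"true": 0.02, "false": 0.98}}))
```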
Input:
https://ores.wikimedia.org/v3/scores/eswikiquote/441558/damaging
Output:
{
"eswikiquote": {
"models": {
"damaging": {
"version": "0.5.0"
}
},
"scores": {
"441558": {
"damaging": {
"score": {
"prediction": false,
"probability": {
"false": 0.980556466810999,
"true": 0.01944353318900106
}
}
}
}
}
}
}
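The score itself sits several levels deep in the response, keyed by wiki, revision ID, and model name. The sketch below walks that nesting for the example output above and applies a cutoff to the `true` probability. The helper `extract_score` and the `THRESHOLD` value are hypothetical and chosen only for illustration; a suitable cutoff depends on how the score is used.

```python
# The example response shown above, as a Python dict.
response = {
    "eswikiquote": {
        "models": {"damaging": {"version": "0.5.0"}},
        "scores": {
            "441558": {
                "damaging": {
                    "score": {
                        "prediction": False,
                        "probability": {
                            "false": 0.980556466810999,
                            "true": 0.01944353318900106,
                        },
                    }
                }
            }
        },
    }
}


def extract_score(resp: dict, context: str, rev_id: int, model: str) -> dict:
    """Return the score object for one revision and model."""
    # Revision IDs are string keys in the JSON response.
    return resp[context]["scores"][str(rev_id)][model]["score"]


score = extract_score(response, "eswikiquote", 441558, "damaging")
p_true = score["probability"]["true"]

THRESHOLD = 0.5  # hypothetical cutoff, not a recommended value
flagged = p_true >= THRESHOLD

print(f"prediction={score['prediction']} p(true)={p_true:.4f} flagged={flagged}")
```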
Data
Licenses
- Code: MIT license
- Model: MIT license
Citation
Cite this model card as:
@misc{
Triedman_Bazira_2023_Spanish_Wikiquote_goodfaith,
title={ Spanish Wikiquote goodfaith model card },
author={ Triedman, Harold and Bazira, Kevin },
year={ 2023 },
url={ https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Spanish_Wikiquote_goodfaith_edit }
}
[1] Halfaker, A., Geiger, R. S., & Terveen, L. G. (2014, April). Snuggle: Designing for efficient socialization and ideological critique. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 311-320).