Machine learning models/Production/gogologo
Wikimedia Commons is a multimedia repository of publicly usable files, which must be released under a free license.
Hence, files that are subject to copyright are candidates for deletion.
Copyright is complex, and misunderstanding it can lead to infringement.
Following an analysis of deletion requests, we observed that a significant number of media files are deleted due to copyright violations, for reasons ranging from freedom of panorama to threshold of originality.
Non-free logo images both stand out as the second most frequent reason for deletion and represent a fairly unambiguous target that fits a machine learning task.
Model card | |
---|---|
This page is an on-wiki machine learning model card. | |
Model Information Hub | |
Model creator(s) | MFossati_(WMF) |
Model owner(s) | WMF Structured Content team |
Model interface | Commons API |
Code | GitLab |
Uses PII | No |
In production? | Yes |
Which projects? | Commons |
Given an image file on Commons, detect whether it's a logo. | |
Content on Commons is usually curated by the community, with specialized contributors who patrol new uploads and delete inappropriate files. We argue that automatically detecting problematic media can reduce moderators' workload. Therefore, we trained an image classifier on available Commons images to predict whether a given input image is a logo. The model is publicly available through a Commons API endpoint, and should be used to distinguish logo images from non-logo ones at a high level. It is not suitable for distinguishing specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to get classified as logos.
Motivation
Copyrighted logo images on Wikimedia Commons are the second most frequent reason for media deletion, according to an analysis of deletion requests. This model detects them to facilitate content moderation through automatic identification of problematic media.
Users and uses
Ethical considerations, caveats, and recommendations
The model offers a high-level distinction between graphic images (typically logos) and photographic ones. It is not suitable for distinguishing specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to get classified as logos.
Model
Performance
- Test dataset: available Commons images
- Number of test samples: 47,976 (half belonging to commons:Category:Logos, half random)
- accuracy: 96.9
- AUC precision/recall: 98.8
- AUC ROC: 99
- loss: 10.2
Metrics definitions
- Accuracy
- Area under the curve (AUC), computed separately for each class and then averaged across classes; see also en:Receiver operating characteristic#ROC curves beyond binary classification
  - AUC precision/recall
  - AUC ROC
- Loss: the model's loss function, i.e., categorical cross-entropy
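As a rough illustration of two of the metrics above, the following sketch computes accuracy and categorical cross-entropy for a handful of made-up predictions. The label layout (index 0 = out_of_domain, index 1 = logo) follows the class order used when building the training set; the sample values and function names are hypothetical and are not taken from the evaluation.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class matches the one-hot label."""
    hits = 0
    for true, pred in zip(y_true, y_pred):
        if true.index(max(true)) == pred.index(max(pred)):
            hits += 1
    return hits / len(y_true)


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(t * log(p)) over samples, with probabilities clipped away from 0."""
    total = 0.0
    for true, pred in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(true, pred))
    return total / len(y_true)


# Hypothetical one-hot labels and predicted probabilities: [out_of_domain, logo]
labels = [[0, 1], [1, 0], [0, 1], [1, 0]]
probs = [[0.1, 0.9], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]

print(accuracy(labels, probs))
print(categorical_cross_entropy(labels, probs))
```

In this toy batch the third sample is a logo scored below 0.5, so it counts as a miss for accuracy and contributes the largest term to the loss.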
Implementation
Image classifier with an EfficientNetV2 backbone pre-trained on the ImageNet classification task (i.e., the efficientnetv2_b0_imagenet preset from [1]), fine-tuned on available Commons images.
# Layers & their parameters
Input = 0
EfficientNetV2 backbone = 5,919,312
Global average pooling 2D = 0
Predictions = 2,562
# Parameters
Total = 17,644,408 (67.31 MB)
Trainable = 5,861,266 (22.36 MB)
Non-trainable = 60,608 (236.75 KB)
Optimizer = 11,722,534 (44.72 MB)
# Dataset
Validation split = 0.2
Image size = (224, 224)
Batch size = 64
# Data augmentation
Contrast factor = 0.11
Rotation factor = 0.16
Translation factor = 0.084
# Model
Classes = 2
Epochs = 25
Optimizer = Adam
Learning rate = 1e-2
Loss = categorical cross-entropy
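The parameter counts listed above are internally consistent, which the following arithmetic sketch checks. The breakdown of the optimizer figure (two Adam slots per trainable parameter plus two scalar variables) and the 1,280-feature input to the prediction layer are assumptions that happen to match the reported numbers; the source does not state them.

```python
# Figures reported in the listing above
backbone = 5_919_312
predictions = 2_562
trainable = 5_861_266
non_trainable = 60_608
optimizer = 11_722_534
total = 17_644_408

# Layer parameters split into trainable and non-trainable ones
assert backbone + predictions == trainable + non_trainable

# The reported total also includes the optimizer state
assert trainable + non_trainable + optimizer == total

# Assumption: dense layer on 1,280 pooled features with 2 classes (weights + biases)
features, classes = 1_280, 2
assert features * classes + classes == predictions

# Assumption: Adam keeps two slots per trainable parameter, plus two scalars
assert 2 * trainable + 2 == optimizer

print("parameter counts are consistent")
```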
Output schema:
{
"filename": <Commons file name>,
"target": "logo",
"prediction": <logo probability score (0,1)>,
"out_of_domain": <non-logo probability score (0,1)>
}
Input:
$ curl 'https://commons.wikimedia.org/w/api.php?action=mediadetection&format=json&formatversion=2&filename=Kanion_Co.png'
Output:
{
"predictions": [
{
"filename": "Kanion_Co.png",
"target": "logo",
"prediction": 0.9978,
"out_of_domain": 0.0022
}
]
}
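A minimal Python sketch of the same request follows. The endpoint and parameters are copied from the curl call above; the helper names, the User-Agent string, and the 0.5 decision threshold are illustrative assumptions, not part of the API.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_url(filename):
    """Build the request URL for one Commons file name."""
    params = {
        "action": "mediadetection",
        "format": "json",
        "formatversion": 2,
        "filename": filename,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def is_logo(response, threshold=0.5):
    """Return True if the first prediction's logo score reaches the threshold.

    The threshold is a hypothetical choice; the model card does not prescribe one.
    """
    return response["predictions"][0]["prediction"] >= threshold


def fetch(filename):
    """Call the endpoint and decode the JSON response (requires network access)."""
    request = urllib.request.Request(
        build_url(filename),
        headers={"User-Agent": "logo-check-example/0.1"},  # hypothetical client name
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Offline check against the sample output shown above
sample = json.loads(
    '{"predictions": [{"filename": "Kanion_Co.png", "target": "logo",'
    ' "prediction": 0.9978, "out_of_domain": 0.0022}]}'
)
print(build_url("Kanion_Co.png"))
print(is_logo(sample))
```

Calling `fetch("Kanion_Co.png")` would issue the live request; the offline sample keeps the example runnable without network access.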
Data
- Download a dataset of Commons image thumbnails from the API:
  - one half belongs to commons:Category:Logos and its sub-categories, as returned by this PetScan query
  - the other half is a random sample of available images
- Build the training & validation sets:
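import keras

# The class order fixes the one-hot label layout: index 0 = out_of_domain, index 1 = logo
train, val = keras.utils.image_dataset_from_directory(
    INPUT_DIR,  # root directory with one sub-directory per class
    label_mode='categorical',
    class_names=('out_of_domain', 'logo'),
    batch_size=64, image_size=(224, 224),
    seed=1984, validation_split=0.2, subset='both',
)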
- Augment the training set:
import tensorflow as tf


def augment(image, augmentation_layers):
    # Apply each augmentation layer in sequence
    for layer in augmentation_layers:
        image = layer(image)
    return image


augmentation_layers = [
    keras.layers.RandomContrast(0.11, seed=1984),
    keras.layers.RandomFlip(seed=1984),
    keras.layers.RandomRotation(0.16, seed=1984),
    keras.layers.RandomTranslation(
        height_factor=0.084,
        width_factor=0.084,
        seed=1984,
    ),
]

# Augment the images only; labels pass through unchanged
train = train.map(
    lambda img, label: (
        augment(img, augmentation_layers),
        label,
    ),
    num_parallel_calls=tf.data.AUTOTUNE,
)
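To make the augmentation factors concrete, the sketch below converts them into contrast, degree, and pixel ranges for 224×224 inputs. It assumes the documented Keras semantics for a single float factor: rotation as a fraction of a full turn, translation as a fraction of the image side, and contrast as a symmetric offset around 1. The helper name is hypothetical.

```python
def augmentation_ranges(image_size=(224, 224), contrast=0.11,
                        rotation=0.16, translation=0.084):
    """Translate augmentation factors into human-readable (min, max) ranges."""
    height, width = image_size
    return {
        # Contrast multiplier sampled around 1.0
        "contrast_factor": (1.0 - contrast, 1.0 + contrast),
        # Rotation factor is a fraction of 360 degrees, in both directions
        "rotation_degrees": (-rotation * 360.0, rotation * 360.0),
        # Translation factors are fractions of the image height and width
        "shift_pixels_height": (-translation * height, translation * height),
        "shift_pixels_width": (-translation * width, translation * width),
    }


for name, bounds in augmentation_ranges().items():
    print(name, bounds)
```

Under these assumptions, images are rotated by up to about 57.6 degrees either way and shifted by up to about 18.8 pixels along each axis.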
Licenses
- Code: GNU General Public License v3.0
- Model: Creative Commons CC0 1.0