Machine learning models/Production/gogologo


Wikimedia Commons is a multimedia repository of publicly usable files, which must be released under a free license. Hence, files that are subject to copyright are candidates for deletion. Copyright is complex to understand, and misunderstandings can lead to infringement. Following an analysis of deletion requests, we observed that a significant share of media files are deleted due to copyright violations, for reasons typically ranging from freedom of panorama to threshold of originality. Non-free logo images stand out as the second most frequent reason for deletion and represent a fairly unambiguous target that fits a machine learning task.

Model card
This page is an on-wiki machine learning model card.
A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s): MFossati_(WMF)
Model owner(s): WMF Structured Content team
Model interface: Commons API
Code: GitLab
Uses PII: No
In production?: Yes
Which projects?: Commons
Given an image file on Commons, detect whether it's a logo.

Content on Commons is usually curated by the community, with specialized contributors who patrol new uploads and delete inappropriate files. We argue that automatic approaches to detecting problematic media can ease the moderators' burden. Therefore, we trained an image classifier on available Commons images to predict whether a given input image is a logo. The model is publicly available through a Commons API endpoint and should be used to distinguish logo images from non-logo ones at a high level. On the other hand, it is not suitable for drilling down into specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to be classified as logos.

Motivation


Copyrighted logo images on Wikimedia Commons are the second most frequent reason for media deletion, according to an analysis of deletion requests. This model detects them and aims to facilitate content moderation through automatic identification of problematic media.

Users and uses

Use this model for
distinguishing logo images from non-logo ones.
Don't use this model for
fine-grained classification of graphics like diagrams or road signs.
Current uses
Monthly datasets of logo uploads, announced on the Commons Administrators' noticeboard, e.g., November 2024.

Ethical considerations, caveats, and recommendations


The model offers a high-level distinction between graphic images (typically logos) and photographic ones. It is not suitable for drilling down into specific classes of graphic images: for instance, coats of arms, glyphs, chemical structures, or flags are likely to be classified as logos.

Model


Performance

  • Test dataset: available Commons images
  • # test samples: 47,976 (half belonging to commons:Category:Logos, half random)
  • accuracy: 96.9%
  • AUC precision/recall: 98.8%
  • AUC ROC: 99%
  • loss: 10.2

Metrics definitions



Implementation

Model architecture

An image classifier with an EfficientNetV2 backbone pre-trained on the ImageNet classification task (i.e., the efficientnetv2_b0_imagenet preset from [1]), fine-tuned on available Commons images.

# Layers & their parameters
Input = 0
EfficientNetV2 backbone = 5,919,312
Global average pooling 2D = 0
Predictions = 2,562

# Parameters
Total = 17,644,408 (67.31 MB)
Trainable = 5,861,266 (22.36 MB)
Non-trainable = 60,608 (236.75 KB)
Optimizer = 11,722,534 (44.72 MB)

# Dataset
Validation split = 0.2
Image size = (224, 224)
Batch size = 64

# Data augmentation
Contrast factor = 0.11
Rotation factor = 0.16
Translation factor = 0.084

# Model
Classes = 2
Epochs = 25
Optimizer = Adam
Learning rate = 1e-2
Loss = categorical cross-entropy
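
The hyperparameters above map onto a fairly standard Keras fine-tuning setup. The following is a minimal sketch of how such a classifier could be assembled and trained; it loads the EfficientNetV2-B0 backbone via keras.applications rather than the efficientnetv2_b0_imagenet preset used in production, and assumes the train and val datasets built in the Data pipeline section below.

import keras

# Sketch only: EfficientNetV2-B0 backbone + global average pooling + 2-class softmax head.
# The production model uses the efficientnetv2_b0_imagenet preset; keras.applications is
# used here purely to keep the example self-contained.
backbone = keras.applications.EfficientNetV2B0(
    include_top=False,
    weights='imagenet',
    input_shape=(224, 224, 3),
)

inputs = keras.Input(shape=(224, 224, 3))
x = backbone(inputs)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(2, activation='softmax', name='predictions')(x)
model = keras.Model(inputs, outputs)

# Compile and fine-tune with the hyperparameters listed above.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-2),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
model.fit(train, validation_data=val, epochs=25)
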
Output schema
{
  "filename": <Commons file name>,
  "target": "logo",
  "prediction": <logo probability score (0,1)>,
  "out_of_domain": <non-logo probability score (0,1)>
}
Example input and output

Input:

$ curl 'https://commons.wikimedia.org/w/api.php?action=mediadetection&format=json&formatversion=2&filename=Kanion_Co.png'

Output:

{
  "predictions": [
    {
      "filename": "Kanion_Co.png",
      "target": "logo",
      "prediction": 0.9978,
      "out_of_domain": 0.0022
    }
  ]
}
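
For programmatic use, the same endpoint can be queried from Python. This is just a sketch using the requests library with the parameters shown in the curl call above; the response is parsed according to the output schema.

import requests

# Query the Commons media detection API for a single file (illustrative sketch).
response = requests.get(
    'https://commons.wikimedia.org/w/api.php',
    params={
        'action': 'mediadetection',
        'format': 'json',
        'formatversion': 2,
        'filename': 'Kanion_Co.png',
    },
)
for prediction in response.json()['predictions']:
    print(prediction['filename'], prediction['prediction'])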

Data

Data pipeline
  • Download a dataset of Commons image thumbnails from the API:
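For instance, a minimal sketch that resolves 224-pixel thumbnails through the standard imageinfo API; the filename is illustrative, and the actual download code lives in the GitLab repository linked above.

import requests

# Sketch: resolve a 224-px thumbnail URL for a Commons file via the imageinfo API
# and download it. In the actual pipeline, thumbnails end up under INPUT_DIR,
# organised into one sub-directory per class (out_of_domain, logo).
def thumbnail_url(filename, width=224):
    response = requests.get(
        'https://commons.wikimedia.org/w/api.php',
        params={
            'action': 'query',
            'format': 'json',
            'formatversion': 2,
            'titles': f'File:{filename}',
            'prop': 'imageinfo',
            'iiprop': 'url',
            'iiurlwidth': width,
        },
    )
    return response.json()['query']['pages'][0]['imageinfo'][0]['thumburl']

with open('Kanion_Co.png', 'wb') as f:
    f.write(requests.get(thumbnail_url('Kanion_Co.png')).content)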
  • Build the training & validation sets:
import keras

train, val = keras.utils.image_dataset_from_directory(
    INPUT_DIR,
    label_mode='categorical',
    class_names=('out_of_domain', 'logo'),
    batch_size=64, image_size=(224, 224),
    seed=1984, validation_split=0.2, subset='both',
)
  • Augment the training set:
import tensorflow as tf

# Apply each augmentation layer in sequence to a batch of images.
def augment(image, augmentation_layers):
    for layer in augmentation_layers:
        image = layer(image)
    return image

augmentation_layers = [
    keras.layers.RandomContrast(0.11, seed=1984),
    keras.layers.RandomFlip(seed=1984),
    keras.layers.RandomRotation(0.16, seed=1984),
    keras.layers.RandomTranslation(
        height_factor=0.084,
        width_factor=0.084,
        seed=1984,
    ),
]
# Augment the training set only; the validation set is left untouched.
train = train.map(
    lambda img, label: (
        augment(img, augmentation_layers),
        label,
    ),
    num_parallel_calls=tf.data.AUTOTUNE,
)
Training data
24k samples, half logos, half out of domain.
Test data
48k samples, half logos, half out of domain.

Licenses
