Research:A System for Large-scale Image Similarity

13:41, 4 August 2021 (UTC)
Duration:  2021-08 – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

After several requests from different parts of the movement, the Research team is working on an image similarity tool. The tool will take an image as input and return the most similar images in Wikimedia Commons.


We will compute image "embeddings", namely compact image representations containing numerical summaries of the main image characteristics. We will then compute the similarity between images based on these embeddings.
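The page does not yet specify which embedding model or similarity measure will be used. The sketch below illustrates the general idea under stated assumptions: each image has already been turned into a fixed-length vector of floats, similarity is measured with cosine similarity (a common choice, assumed here), and the "most similar images" are found by brute-force scoring of every indexed image. The function names `cosine_similarity` and `most_similar`, the file names, and the 3-dimensional toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Brute-force nearest-neighbour search: score every indexed image, keep the top k."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from an image model
# and typically have hundreds of dimensions.
index = {
    "File:Cat_1.jpg": [0.9, 0.1, 0.0],
    "File:Cat_2.jpg": [0.8, 0.2, 0.1],
    "File:Bridge.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]  # embedding of the user's input image (assumed)

print([name for name, _ in most_similar(query, index, k=2)])
# prints ['File:Cat_1.jpg', 'File:Cat_2.jpg']
```

Scoring every image is linear in the size of the collection, which would be slow for all of Wikimedia Commons. Approximate nearest-neighbour indexes (for example the Faiss library) are a common way to trade a little accuracy for much faster lookups at that scale; whether this project adopts one is part of the investigation listed below.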


  1. Investigate the best tools and methods for efficiently computing image similarity at scale
  2. Implement a first prototype for large-scale image similarity
  3. Iterate on the prototype and implement a public-facing tool