AI Sauna/AI for Wikimedia Commons

AI for Wikimedia Commons

Description

Experiment with existing AI models to classify, categorize, describe, and title Commons images, including in support of Wiki Loves Monuments (WLM).

The team

What were the roles of each?

Created by: Ferdinando Traversa, Jens B., User:TuukkaH, User:Tohaomg, User:Piracalamina

Results

Our method

We started by exploring existing AI models to see whether they are suitable for our goals.

For example, an AI model could evaluate whether an image is suitable for WLM (e.g. it shows no fingers, no people, etc.). In Italy, the panorama view section always receives many images that are not panorama views, so using AI to screen them could be worthwhile.

Category suggestions
Picture | Question | Model | Answer | Good enough?
Archivo:Palacio Santos, interior 25.jpg | What 10 Wikimedia Commons Categories can I use for this picture? | blip-2 | building, ceiling, floor, tile, tiled, tiled floor, tiled, tiled floor, tiled, tiled floor, tile | no
Four old files from c:Category:Photographs_of_Donostia-San_Sebastián_from_the_Foto_Car_Collection | Asked for captioning | blip-2 | Captions are very generic and not context-aware. | The model can't access Commons images directly; they must be downloaded and uploaded. The captioning is not bad, but also not useful.
Four old files from c:Category:Photographs_of_Donostia-San_Sebastián_from_the_Foto_Car_Collection | Asked for image descriptions, then asked for relevant Wikipedia articles for each description | llava-13-b | The description is very good and could be useful for describing the images, but it lacks cultural references. | The model can't access Commons images directly; they must be downloaded and uploaded. With small tweaks, the image description could become a valid description. Some of the suggested articles don't exist, but interestingly they could (and even should) exist.
File:Statue-de-la-liberte-new-york.jpg | Asked for Wikidata item of the image (Statue of Liberty) | blip-2 | "The image features the Statue of Liberty, a famous landmark in New York City, United States. The Wikidata item for the Statue of Liberty is Q1120." | It recognizes a very evident element, but Q1120 is the item for the Nice model, not the Statue of Liberty.
A photo of Barack Obama, then four images from the Wikipedia pages of four very well-known Basque personalities | Asked to recognize those | blip-2 | Barack Obama was easily recognized. All the others were recognized as another, unrelated person. | How well known must someone be for the model to recognize them?
File:Club Uruguay, Montevideo, Uruguay.JPG | Which of the following Wikimedia Commons categories would you suggest for this picture: Arches in Uruguay,Balconies in Uruguay,Balustrades in Uruguay,Bartizans in Uruguay,Bathtubs in Uruguay,Battlements in Uruguay,Dome interiors in Uruguay,Coffered ceilings in Uruguay | llava-13-b | Based on the image, I would suggest the following Wikimedia Commons categories: Arches in Uruguay, Doors in Uruguay, and Wooden doors in Uruguay. The image features a large wooden door with an arched top, which is a notable architectural element in the scene. | Yes! It can be problematic that it makes up categories, but these categories make sense. We should avoid returning a parent category of a more accurate category that is also suggested (we can't suggest both Doors in Uruguay and Wooden doors in Uruguay).
File:Club Uruguay, Montevideo, Uruguay.JPG | Which of the following Wikimedia Commons categories would you suggest for this picture: Arches in Uruguay,Balconies in Uruguay,Balustrades in Uruguay,Bartizans in Uruguay,Bathtubs in Uruguay,Battlements in Uruguay,Dome interiors in Uruguay,Coffered ceilings in Uruguay | blip-2 | Bartizans in Uruguay | Not really the most correct, but it sticks to the prompt.
File:Alberobello_-_Chiesa_del_Carmine_-_2023-09-23_11-11-22_001.jpg | In which of these category do this picture belong?

Alberobello train station
Buildings in Alberobello
Città dei Trulli (folk group)
Night in Alberobello
Streets in Alberobello
Urban squares in Alberobello

blip-2 | Streets in Alberobello | very good
File:Barletta_-_Basilica_del_Santo_Sepolcro_-_2023-09-04_17-39-07_004.jpg | In which of these categories do this picture belong?

Architecture in Barletta‎

Barletta in art‎

Art in Barletta‎

Culture of Barletta‎

Disfida di Barletta‎

Economy of Barletta‎

Events in Barletta‎

History of Barletta‎

Landscapes of Barletta‎

Maps of Barletta‎

Palazzo Gentile (Barletta)‎

People of Barletta

Porto di Barletta‎

Roman Catholic Archdiocese of Trani-Barletta-Bisceglie‎

Streets in Barletta‎

Teleferica di Barletta‎

Barletta train station

blip-2 | Architecture in Barletta | very good
File:Barletta_-_Basilica_del_Santo_Sepolcro_-_2023-09-04_17-39-07_004.jpg | In which of these categories do this picture belong?

Architecture in Barletta‎

Churches in Barletta

Barletta in art‎

Art in Barletta‎

Culture of Barletta‎

Disfida di Barletta‎

Economy of Barletta‎

Events in Barletta‎

History of Barletta‎

Landscapes of Barletta‎

Maps of Barletta‎

Palazzo Gentile (Barletta)‎

People of Barletta

Porto di Barletta‎

Roman Catholic Archdiocese of Trani-Barletta-Bisceglie‎

Streets in Barletta‎

Teleferica di Barletta‎

Barletta train station

blip-2 | Churches in Barletta | Very good. It was able to identify that it is a church.
In which of these categories do this picture belong?

Agenzia nazionale per le nuove tecnologie, l'energia e lo sviluppo economico sostenibile‎ (6 C, 2 F)

Ancient Roman architecture in Rome‎

Art Déco architecture in Rome‎

Art Nouveau architecture in Rome

Baroque architecture in Rome‎

Byzantine architecture in Rome‎

Eclectic architecture in Rome‎

Gothic architecture in Rome‎

Gothic Revival architecture in Rome‎

Mannerist architecture in Rome‎

Neoclassical architecture in Rome‎

Paleochristian architecture in Rome‎

Rationalist architecture in Rome‎

Renaissance architecture in Rome‎

Romanesque architecture in Rome‎

Romanesque Revival architecture in Rome‎

Cityscapes of Rome

Culture of Rome‎

Economy of Rome‎

Films set in Rome‎

Films shot in Rome‎

Geography of Rome

Government of Rome‎

Historic Centre of Rome, the Properties of the Holy See in that City Enjoying Extraterritorial Rights and San Paolo Fuori le Mura‎ (5 C, 11 F)

History of Rome‎

Infrastructure in Rome‎

Inscriptions in Rome‎

Nature of Rome‎

Objects in Rome‎

Panoramics in Rome‎

People of Rome‎

Public services in Rome‎

Honorary citizens of Rome‎

Satellite pictures of Rome‎

Science in Rome‎

Things named after Rome‎

Twin towns of Rome‎

Villages in Rome‎

Archivio di Stato (Rome)‎

Art of Rome‎

Books about Rome‎

Books published in Rome

Bookshops in Rome

Catacombs of Rome‎

Cinema of Rome‎

Cinemas in Rome‎

Cuisine of Rome‎

Devotion in Rome‎

Education in Rome

Festivals in Rome‎

Folkstudio‎

GBR (television channel)‎

Gruppo dei Romanisti‎

Guilds of Rome‎

Homosexuality in Rome‎

Klimt 1918‎

Libraries in Rome‎

Media of Rome‎

Museums in Rome‎

Music of Rome‎

Playgrounds in Rome‎

Polish Institutes in Rome‎

Politics in Rome‎

Printers from Rome‎

Published in Rome‎

Rome Prize‎

Signs in Rome

Spoken Rome‎

Sports in Rome‎

Symbols of Rome‎

The Lust Syndicate‎

Theatres in Rome

Culture of the Vatican City‎

École française de Rome

blip-2 | Ancient Roman architecture in Rome | spot on!
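
The llava-13-b result above raises the point that a parent category should not be returned together with its more specific child (Doors in Uruguay alongside Wooden doors in Uruguay). A minimal sketch of such a filter, in Python, could look like the following. The `get_parents` callback and the `PARENTS` mapping are hypothetical stand-ins for a real Commons category lookup, and only direct parents are checked.

```python
from typing import Callable, Iterable


def drop_redundant_parents(
    suggestions: Iterable[str],
    get_parents: Callable[[str], Iterable[str]],
) -> list[str]:
    """Remove every suggestion that is a direct parent of another suggestion."""
    suggested = list(dict.fromkeys(suggestions))  # de-duplicate, keep order
    redundant = set()
    for category in suggested:
        for parent in get_parents(category):
            if parent != category and parent in suggested:
                redundant.add(parent)
    return [c for c in suggested if c not in redundant]


# Hypothetical parent data, for illustration only (not fetched from Commons).
PARENTS = {"Wooden doors in Uruguay": ["Doors in Uruguay"]}

filtered = drop_redundant_parents(
    ["Arches in Uruguay", "Doors in Uruguay", "Wooden doors in Uruguay"],
    lambda name: PARENTS.get(name, []),
)
print(filtered)
```

With the illustrative mapping, only Arches in Uruguay and Wooden doors in Uruguay remain. Handling grandparents would require walking the parent chain, which this sketch leaves out.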

Ideas

  1. Call the API that gives us possible Commons categories from a coordinate (talk to Edward)
  2. Ask the llava-1.6 model: Which of the following Wikimedia Commons categories would you suggest for this picture: [categories]
  3. Check if the suggested categories exist
  4. Fetch each category's subcategories
  5. Ask llava-1.6 (?) to pick from the subcategories
  6. Repeat?
  7. Return the most specific categories
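
The steps above can be sketched as a bounded drill-down loop. This is only an illustration under stated assumptions, not the team's implementation: `ask_model`, `category_exists`, and `fetch_subcategories` are hypothetical callbacks (the last one could, for instance, be backed by the MediaWiki Action API's `list=categorymembers` query with `cmtype=subcat`), the candidates are assumed to come from step 1, and the toy category tree is invented for the example.

```python
from typing import Callable


def suggest_categories(
    candidates: list[str],
    ask_model: Callable[[list[str]], list[str]],
    category_exists: Callable[[str], bool],
    fetch_subcategories: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> list[str]:
    """Drill down from candidate categories to the most specific ones."""
    # Steps 2-3: ask the model, then drop suggestions that do not exist.
    chosen = [c for c in ask_model(candidates) if category_exists(c)]
    for _ in range(max_rounds):  # step 6: repeat, but only a bounded number of times
        refined: list[str] = []
        went_deeper = False
        for category in chosen:
            subcategories = fetch_subcategories(category)  # step 4
            picks: list[str] = []
            if subcategories:
                # Step 5: let the model pick among the subcategories.
                picks = [c for c in ask_model(subcategories) if category_exists(c)]
            if picks:
                refined.extend(picks)
                went_deeper = True
            else:
                refined.append(category)  # nothing more specific; keep as is
        chosen = list(dict.fromkeys(refined))  # de-duplicate, keep order
        if not went_deeper:
            break
    return chosen  # step 7: the most specific categories found


# Toy stand-ins for the model and for Commons (hypothetical data).
TREE = {"Buildings": ["Churches", "Houses"], "Streets": []}
LIKED = {"Buildings", "Churches"}

result = suggest_categories(
    ["Buildings", "Streets"],
    ask_model=lambda options: [o for o in options if o in LIKED],
    category_exists=lambda n: n in TREE or any(n in kids for kids in TREE.values()),
    fetch_subcategories=lambda n: TREE.get(n, []),
)
print(result)
```

Passing the model and the Commons lookups in as callbacks keeps the loop testable without network access or a running model.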

Resources we used

What resources did you use? Were they useful?

Conclusion

What did you learn?

  • Diffusing/cleaning up categories with AI seems promising. A specifically trained model could yield even better and more precise results.
  • We should not expect the model to know about Wikimedia Commons; instead, we should give it a set of categories to choose from.
  • We should run our non-proprietary models on our own servers.
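
Giving the model a set of categories could be sketched as follows: build the prompt from the candidate list, then keep only the candidates that actually appear in the model's free-text answer. The prompt wording is the one used in our tests; the function names `build_prompt` and `pick_offered` are hypothetical, and matching by substring is a simplifying assumption.

```python
PROMPT_PREFIX = (
    "Which of the following Wikimedia Commons categories "
    "would you suggest for this picture: "
)


def build_prompt(candidates: list[str]) -> str:
    """Join the candidate categories into the prompt used in the experiments."""
    return PROMPT_PREFIX + ",".join(candidates)


def pick_offered(answer: str, candidates: list[str]) -> list[str]:
    """Return the offered candidates mentioned in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [c for c in candidates if c.lower() in lowered]


candidates = [
    "Arches in Uruguay", "Balconies in Uruguay", "Balustrades in Uruguay",
    "Bartizans in Uruguay", "Bathtubs in Uruguay", "Battlements in Uruguay",
    "Dome interiors in Uruguay", "Coffered ceilings in Uruguay",
]
answer = (
    "Based on the image, I would suggest the following Wikimedia Commons "
    "categories: Arches in Uruguay, Doors in Uruguay, and Wooden doors in Uruguay."
)
print(build_prompt(candidates))
print(pick_offered(answer, candidates))
```

Substring matching would also report a candidate such as Doors in Uruguay when the answer only mentions Wooden doors in Uruguay, so a stricter parser may be needed in practice.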

What next

Do you wish to continue exploring this? What was not covered? What did you get curious about?

Try Apple AIM with training specific to our categories.

Links, images, documentation

Upload at least one image to Wikimedia Commons for the image of the page banner.