Generate alt-texts for historical images

Description

Can we make historical images searchable using AI, without relying (much) on the metadata, similar to the prototype by the National Museum of Norway? The AI Sauna Resources include two image sets and a vector database, and LUMI is available for running AI models; these could be combined into a prototype.
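As a rough illustration of the idea, the sketch below embeds images and free-text queries into a shared vector space with a CLIP model and ranks images by cosine similarity. The model name, file paths, and the in-memory index are illustrative stand-ins for whatever embedding model and vector database the actual prototype would use.

```python
# Sketch: text-to-image search over a folder of photographs, no metadata needed.
# Model choice and paths are assumptions, and a real prototype would store the
# vectors in a vector database instead of keeping them in memory.
from pathlib import Path

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # maps images and text into one space

# Embed every image once, up front.
paths = sorted(Path("images/").glob("*.jpg"))
image_vecs = model.encode([Image.open(p) for p in paths], normalize_embeddings=True)

def search(query: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top_k image file names most similar to a free-text query."""
    query_vec = model.encode([query], normalize_embeddings=True)
    scores = image_vecs @ query_vec.T  # cosine similarity, since vectors are normalized
    best = np.argsort(-scores[:, 0])[:top_k]
    return [(paths[i].name, float(scores[i, 0])) for i in best]

print(search("a steam locomotive at a railway station"))
```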

The team

Created by: Osma Suominen

Team members: main coding by Osma; prompting and documentation by Mona Lehtinen, Harri Hihnala, Julia Isotalo, Lu Chen, and Vertti Luostarinen.

Process

ALT-text best practices

  • Describe what can be seen in the picture; try to avoid biases and your own interpretations
  • Use plain language
  • Important info first
    • "What is in the forefront / foreground?"
    • Don't start describing the background before the main subjects
  • Don't start with "image of" or "photo of"
  • Don't include information that's already in the written image description or adjacent text.
  • Additional information, such as the name of the photographer, is not included in the ALT-text
  • Always end with a period.
  • Include text within the image
    • How to prompt for this? If the text is handwritten and not easily readable, write "photograph includes/has handwritten text"
  • Recognise whether the subject is widely known, and use its specific name (for example, not just "a large building" but "the central railway station")
    • Can we get this information from the metadata for the AI?
  • Consider the context of the page
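Some of these rules can be checked mechanically once an alt-text has been generated. Below is a minimal post-processing sketch; the function and its rule set are our own illustration, not part of the project's code.

```python
# Sketch: flag violations of the mechanically testable best practices above.
# The rule set is deliberately small and illustrative.
BANNED_OPENINGS = ("image of", "photo of", "picture of", "photograph of")

def check_alt_text(alt: str) -> list[str]:
    """Return a list of best-practice violations found in a generated alt-text."""
    problems = []
    text = alt.strip()
    if text.lower().startswith(BANNED_OPENINGS):
        problems.append('starts with "image of" / "photo of"')
    if not text.endswith("."):
        problems.append("does not end with a period")
    return problems

print(check_alt_text("Photo of a large building"))
# ['starts with "image of" / "photo of"', 'does not end with a period']
```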

Prompts

This is an alt text description. What can be seen in the front? what can be seen in the back? Is the photo coloured or black and white? indicate in the description if there's text in the picture. Do not use words image or picture in the description. Don't count the amount of things.
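For reference, a single interactive call along these lines can be made with the Replicate Python client. The model slug and version hash below are assumptions (the exact version we used is not recorded here), and the client expects a REPLICATE_API_TOKEN in the environment.

```python
# Sketch: one call to llava-13b on replicate.com with the prompt above.
# VERSION_HASH is a placeholder; copy the current hash from the model's page.
import replicate

PROMPT = (
    "This is an alt text description. What can be seen in the front? "
    "what can be seen in the back? Is the photo coloured or black and white? "
    "indicate in the description if there's text in the picture. Do not use "
    "words image or picture in the description. Don't count the amount of things."
)

with open("example.jpg", "rb") as image_file:
    output = replicate.run(
        "yorickvp/llava-13b:VERSION_HASH",  # assumed model slug
        input={"image": image_file, "prompt": PROMPT},
    )

# The model streams tokens; join them into a single alt-text string.
alt_text = "".join(output).strip()
print(alt_text)
```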

Testing

We first tested prompts interactively with the llava-13b model on replicate.com, then ran batch processing with a Jupyter Notebook on LUMI.
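In outline, the batch step could look like the cell below, which loads a LLaVA checkpoint with Hugging Face transformers and writes one JSON line per image. The checkpoint name, prompt template, and file paths are assumptions; the actual notebook is in the GitHub repository linked at the end of this page.

```python
# Sketch: batch-generate alt-texts with a locally hosted LLaVA model, as one
# might run it on LUMI. Checkpoint, prompt template, and paths are assumptions.
import json
from pathlib import Path

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

CHECKPOINT = "llava-hf/llava-1.5-13b-hf"  # assumed checkpoint
ALT_PROMPT = "This is an alt text description. ..."  # full prompt as quoted above

processor = AutoProcessor.from_pretrained(CHECKPOINT)
model = LlavaForConditionalGeneration.from_pretrained(
    CHECKPOINT, torch_dtype=torch.float16, device_map="auto"
)

def generate_alt_text(image: Image.Image) -> str:
    """Run one image through the model and return the generated alt-text."""
    prompt = f"USER: <image>\n{ALT_PROMPT} ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=150)
    text = processor.decode(output[0], skip_special_tokens=True)
    return text.split("ASSISTANT:")[-1].strip()

with open("alt_texts.jsonl", "w", encoding="utf-8") as out:
    for path in sorted(Path("images/").glob("*.jpg")):
        record = {"file": path.name, "alt_text": generate_alt_text(Image.open(path))}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
```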

Results

Example outputs for the test images can be found in the GitHub repository linked at the end of this page.

Our method

The data set was pre-made for testing. We started by gathering best practices for alt-texts. We then manually tried the llava-13b model on replicate.com: we chose pictures from the data set and prompted the model to generate an alt-text, aiming to find a good prompt for the task and to see what the results looked like. For batch processing the same task, a Jupyter Notebook was created and run on LUMI.

Resources we used

Computational resources

LUMI supercomputer, Jupyter Notebook, replicate.com ...

Data set

The data set consists of 5947 old photographs (taken up to 1917). It comes from the collections of the Helsinki City Museum and was obtained through the Finna.fi discovery service. The data set, with a full description, can be found on Hugging Face.
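Loading such a dataset usually takes a single call with the datasets library; the dataset identifier below is a placeholder for the real Hugging Face path given in the dataset's description.

```python
# Sketch: load the photograph dataset from Hugging Face.
# The dataset id is a placeholder, not the real path.
from datasets import load_dataset

dataset = load_dataset("ORG/helsinki-city-museum-photos", split="train")
print(len(dataset))  # should report 5947 examples
print(dataset[0])    # one record: the image plus whatever metadata fields exist
```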

Conclusion

In general, making a good prompt is important and can be difficult. As it stands, the model works surprisingly well but is not perfect: the setup cannot be trusted to produce human-quality alt-texts on its own, so human intervention is still needed.

What next

How about tagging / indexing the produced alt-texts? And how could the automatically generated alt-texts be improved?

Links, images, documentation

GitHub repository with code and results