Generate alt-texts for historical images

Description

Can we make historical images searchable using AI, without relying (much) on the metadata, similar to the prototype by the National Museum of Norway? The AI Sauna Resources include two image sets and a vector database, and LUMI is available for running AI models; these could be combined into a prototype.
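
As a sketch of what such a prototype could look like: a CLIP-style model embeds photographs and free-text queries into the same vector space, which makes the images searchable without metadata. The model name, image folder, and in-memory NumPy search below are illustrative assumptions, not the team's code; the page does not name the vector database provided in the AI Sauna Resources, so plain NumPy stands in for it here.

```python
from pathlib import Path

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

# CLIP maps images and texts into a shared embedding space
model = SentenceTransformer("clip-ViT-B-32")

paths = sorted(Path("images").glob("*.jpg"))  # hypothetical folder of photographs
image_embs = model.encode(
    [Image.open(p) for p in paths], normalize_embeddings=True
)

def search(query: str, top_k: int = 5) -> None:
    """Print the top_k photographs most similar to a free-text query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = image_embs @ q  # cosine similarity, since embeddings are normalized
    for i in np.argsort(-scores)[:top_k]:
        print(f"{scores[i]:.3f}  {paths[i].name}")

search("a horse-drawn carriage on a snowy street")
```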

The team

Created by: Osma Suominen

Team members: Main coding by Osma. Prompting and documentation by Mona Lehtinen, Harri Hihnala, Julia Isotalo, Lu Chen and Vertti Luostarinen.

Process

ALT-text best practices

  • Describe what can be seen in the picture; try to avoid biases and your own interpretations
  • Use plain language
  • Important info first
    • "What is in the forefront / fore ground?"
    • No use to start describing the background before the main characters
  • Don't start with "image of" or "photo of"
  • Don't include information that's already in the written image description or adjacent text.
  • Additional information, such as the name of the photographer, is not included in the ALT-text
  • Always end with a dot. (A small checker sketch for a few of these mechanical rules follows this list.)
  • Include text within the image
    • How do we prompt this? If the text is handwritten and isn't easily readable, write "photograph includes/has handwritten text"
  • Understand whether the subject is widely known, and use its specific name (for example, not just "a large building" but "the central railway station")
    • Can we get this information from the metadata for the AI?
  • Consider the context of the page
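
Several of these rules are mechanical, so they can be checked automatically. Below is a small illustrative checker for three of them (ending punctuation, forbidden openers, forbidden words). It is a sketch of the idea only, not part of the team's pipeline, and it cannot judge content-level rules such as bias or subject order.

```python
import re

# Openers the guidelines forbid at the start of an alt-text
FORBIDDEN_OPENERS = ("image of", "photo of", "picture of")

def check_alt_text(alt: str) -> list[str]:
    """Return a list of guideline violations found in a candidate alt-text."""
    problems = []
    text = alt.strip()
    if not text.endswith("."):
        problems.append("should end with a dot")
    if text.lower().startswith(FORBIDDEN_OPENERS):
        problems.append('should not start with "image of" / "photo of"')
    if re.search(r"\b(image|picture)\b", text, re.IGNORECASE):
        problems.append('avoid the words "image" and "picture"')
    return problems

print(check_alt_text("Image of a large building"))
# -> all three rules are flagged
```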

Prompts

This is an alt text description. What can be seen in the front? what can be seen in the back? Is the photo coloured or black and white? indicate in the description if there's text in the picture. Do not use words image or picture in the description. Don't count the amount of things.

Testing

Testing was done by prompting the llava-13b model on replicate.com, and by batch processing with a Jupyter Notebook on LUMI.
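
For illustration, a manual test against Replicate might look like the sketch below, using the prompt from the previous section. The model identifier ("yorickvp/llava-13b") and the image path are assumptions, and a REPLICATE_API_TOKEN environment variable is needed; the team's actual code is in the GitHub repository linked at the end of the page.

```python
import replicate  # requires the REPLICATE_API_TOKEN environment variable

PROMPT = (
    "This is an alt text description. What can be seen in the front? "
    "what can be seen in the back? Is the photo coloured or black and white? "
    "indicate in the description if there's text in the picture. "
    "Do not use words image or picture in the description. "
    "Don't count the amount of things."
)

with open("photo.jpg", "rb") as image:  # hypothetical test photograph
    output = replicate.run(
        "yorickvp/llava-13b",  # assumed model name on replicate.com
        input={"image": image, "prompt": PROMPT},
    )

# The model streams tokens, so join the iterator into one alt-text string
print("".join(output))
```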

Results


Our method

The data set was pre-made for testing. We started by gathering best practices for alt-texts. Then we manually tried the llava-13b model on replicate.com: we chose pictures from the data set and prompted the model to generate alt-texts, aiming to find a good prompt for the task and to evaluate the results. For batch processing the same task, a Jupyter Notebook was created and run on LUMI.
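
A batch run on a GPU node could look roughly like the following sketch, which loads a LLaVA checkpoint with Hugging Face transformers and writes one alt-text per photograph to a CSV file. The checkpoint id, folder, and prompt template here are assumptions; the notebook the team actually ran is in the GitHub repository linked below.

```python
import csv
from pathlib import Path

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-13b-hf"  # assumed checkpoint comparable to llava-13b
IMAGE_DIR = Path("images")              # hypothetical folder of dataset photographs

PROMPT = (
    "This is an alt text description. What can be seen in the front? "
    "what can be seen in the back? Is the photo coloured or black and white? "
    "indicate in the description if there's text in the picture. "
    "Do not use words image or picture in the description. "
    "Don't count the amount of things."
)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

with open("alt_texts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file", "alt_text"])
    for path in sorted(IMAGE_DIR.glob("*.jpg")):
        image = Image.open(path).convert("RGB")
        # LLaVA-1.5 expects the <image> token where the pixels are injected
        inputs = processor(
            text=f"USER: <image>\n{PROMPT} ASSISTANT:",
            images=image,
            return_tensors="pt",
        ).to(model.device)
        out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
        decoded = processor.decode(out[0], skip_special_tokens=True)
        writer.writerow([path.name, decoded.split("ASSISTANT:")[-1].strip()])
```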

Resources we used

Computational resources

LUMI supercomputer, Jupyter Notebook, replicate.com ...

Data set

The data set consists of 5947 old photographs (taken up to 1917) from the collections of the Helsinki City Museum, obtained through the Finna.fi discovery service. The data set, with a full description, can be found on Hugging Face.
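
Loading the photographs with the Hugging Face `datasets` library could look like the sketch below; the dataset id is a placeholder, since this page does not spell out the repository name.

```python
from datasets import load_dataset

# Placeholder dataset id; see the Hugging Face page mentioned above for the real one
ds = load_dataset("example-org/helsinki-city-museum-photos", split="train")

print(len(ds))  # expected: 5947 photographs (up to 1917)
print(ds[0])    # one record: the image together with its Finna metadata
```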

Conclusion

In general, making a good prompt is important and can be difficult. As the setup stands, the model works surprisingly well but is not perfect: it cannot be trusted to produce human-quality alt-texts on its own, so human intervention is needed.

What next

How about tagging or indexing the produced alt-texts? And how could the automatically generated alt-texts be improved?

Links, images, documentation

GitHub repository with code and results