GLAM CSI/User story – Image reconciliation uploader
Persona: Casey, art museum collections manager
- Background: Casey has been working as a collections manager at a fine art museum for the past year. The museum recently released high-resolution versions of its open access collection to the public. Casey knows that the museum uploaded a mix of low- and high-resolution images to Wikimedia Commons many years ago, and wants to supplement what Commons has now.
- Goals: Provide greater public access to higher-resolution images of the museum's fine art collections, with better metadata to make them more discoverable.
- Skills: Knowledgeable about working with image metadata using Python and other tools such as OpenRefine. They have uploaded files with Upload Wizard, but don't have experience with bulk upload tools.
- Challenges: Limited technical support for large-scale digital projects, navigating the complexities of copyright permissions for artists' works, and engaging a broader audience beyond the local community.
User Story: Consolidating and supplementing images on Wikimedia Commons
As Casey, the art museum collections manager...
I want to supplement existing Wikimedia Commons uploads with new high-resolution images and metadata from our collections
So that the collections on Commons are more complete and discoverable for re-use within Wikimedia projects, and for the rest of the world
User Scenario: Image reconciliation and uploading
Step | Narrative | Notes |
---|---|---|
1 | Casey finds a Wikimedia Commons category of files uploaded by their institution over the years, and wants to supplement it with new or updated files. | |
2 | Casey needs to find out what images are currently in Commons and compare that to what is now available from the institution. | |
3 | Casey creates a dump of Commons filenames and relevant metadata. | |
4 | Casey compares the institution's content to what is on Commons to eliminate duplicates where possible, resulting in a working list of files. | |
5 | Casey then uploads the relevant files to Wikimedia Commons in the proper categories, knowing that there could be duplicates of the same image but that additional processes (bot cleanup) can help resolve any duplication later. Metadata is written to Structured Data on Commons. | |
6 | As a post-process, Casey enhances the media files with Structured Data on Commons, and resolves any remaining duplicates. | |
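Steps 2–3 above (finding out what is already on Commons and dumping filenames plus metadata) can be sketched against the MediaWiki Action API, using `generator=categorymembers` together with `prop=imageinfo`. This is a minimal sketch; the category name in the usage example is a hypothetical placeholder, and a real run should respect the API etiquette guidelines (user agent, rate limits).

```python
"""Sketch: dump filenames and basic image metadata for a Commons category
via the MediaWiki Action API (stdlib only)."""
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"


def make_query_params(category, cont=None):
    """Build query params: generator=categorymembers walks the category,
    prop=imageinfo fetches width/height/size/sha1 in the same request."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmtype": "file",
        "gcmlimit": "500",
        "prop": "imageinfo",
        "iiprop": "size|sha1|url",
    }
    if cont:
        params.update(cont)  # continuation tokens from the previous reply
    return params


def category_files(category):
    """Yield (title, width, height, byte_size, sha1) for each file."""
    cont = None
    while True:
        url = API + "?" + urllib.parse.urlencode(make_query_params(category, cont))
        req = urllib.request.Request(url, headers={"User-Agent": "glam-recon-sketch/0.1"})
        data = json.load(urllib.request.urlopen(req))
        for page in data.get("query", {}).get("pages", {}).values():
            info = page["imageinfo"][0]
            yield page["title"], info["width"], info["height"], info["size"], info["sha1"]
        cont = data.get("continue")
        if not cont:
            return


if __name__ == "__main__":
    # Hypothetical category name; replace with the institution's real category.
    for row in category_files("Category:Example images"):
        print(row)
```

The same dump could equally be produced with pywikibot or PetScan; the point is only that filename, pixel dimensions, byte size, and SHA-1 checksum are all retrievable in one pass for later comparison.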
User Journey: Engaging with the Wikimedia Community and Beyond
Phase | Narrative | Challenges | Tools and links |
---|---|---|---|
Preparation | Casey identifies a collection of objects and images their institution is interested in contributing to and enhancing on Commons. | | Spreadsheet or database tool |
Permissions | The institution ensures rights are cleared for uploading images from their collection. | The Wikimedia Commons permissions system is rather complex if the uploader does not own the copyright | Commons:Volunteer_Response_Team (requires an email to the VRTS system) |
Discovery | Casey finds a Wikimedia Commons category of files uploaded by their institution over the years, and wants to supplement it with new or updated files. The category also contains some files uploaded by volunteers, making things a bit more complex. | Uploads performed over time by multiple parties may be inconsistent | PetScan, pywikibot |
Normalization | Casey needs to find out what images are currently in Commons and compare that to what is available from the institution. They use various tools to explore and consolidate files and clean up the Commons category tree. | Data and category cleanup required before proceeding | Cat-a-lot |
Metadata evaluation | Casey inspects the metadata of the Commons files to see if the description fields contain any unique identifiers, such as a URL pointing to the original file or object page, or an accession or catalog number, so that files can be matched exactly with the institution's records. | Metadata in Commons templates (e.g. Artwork) is "semi-structured" and not easy to work with, requiring coding solutions | pywikibot, PAWS |
 | | Metadata in Structured Data on Commons is still not mature, and the query service (WCQS) is hard to use in a scripting/bot environment | Wikimedia Commons Query Service (WCQS) or OpenRefine |
Comparison | Casey creates a dump of Commons filenames and relevant metadata – unique identifiers, resolution, and file size. Casey then compares them to the available files from the institution to see if they are the same or different. | Matching files by checksum alone is imperfect, so comparison of resolution and file size is also needed | Google Sheets and/or OpenRefine, or pywikibot |
Task list generation | Casey can eliminate duplicates if the object number, file size, and resolution are the same. Otherwise, a list of files to be uploaded is generated, with relevant metadata. | Generated upload list may contain duplicates of previous uploads | Google Sheets, Python |
Upload | Casey bulk uploads the files to Wikimedia Commons in the proper categories, knowing that there could be duplicates of the same image in different resolutions. | Bulk upload options to Commons are varied, depending on the complexity of the metadata | Pattypan, url2Commons, Flickypedia, pywikibot, or OpenRefine |
 | | Run a bot cleanup procedure to help resolve any duplicated uploads | pywikibot |
 | | Perform more individual categorization, though Cat-a-lot runs slower than in the past due to API limits | Cat-a-lot gadget |
 | | Write relevant data to Structured Data on Commons | QuickStatements/PetScan for SDC, pywikibot, or OpenRefine |
Feedback and Follow-up | Casey monitors the usage of uploaded images, gathers feedback from the Wikimedia community, and assesses metrics from relevant tools. | Requires Commons- and GLAM-specific metrics tools | GLAMorgan |
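The "Comparison" and "Task list generation" phases above can be sketched as a small pure-Python routine: match on accession number first, then fall back to checksum and pixel dimensions to decide what to upload. The record field names (`accession`, `sha1`, `width`, `height`, `size`) are assumptions for illustration, not a fixed schema.

```python
"""Sketch of the comparison/task-list step: decide which institutional
files to upload by matching on accession number, then comparing SHA-1,
pixel dimensions, and byte size. Field names are illustrative assumptions."""


def build_upload_list(commons_records, institution_records):
    """Return institution records that are new to Commons, or strictly
    higher-resolution than the existing Commons copy.

    Each record is a dict with keys: accession, sha1, width, height, size.
    """
    on_commons = {r["accession"]: r for r in commons_records}
    to_upload = []
    for rec in institution_records:
        existing = on_commons.get(rec["accession"])
        if existing is None:
            to_upload.append(rec)      # not on Commons yet
        elif rec["sha1"] == existing["sha1"]:
            continue                   # byte-identical duplicate, skip
        elif rec["width"] * rec["height"] > existing["width"] * existing["height"]:
            to_upload.append(rec)      # higher-resolution replacement
        # same or lower resolution: skip; leave edge cases for human review
    return to_upload


# Toy data for illustration.
commons = [
    {"accession": "1998.12.3", "sha1": "aa", "width": 800, "height": 600, "size": 90_000},
]
institution = [
    {"accession": "1998.12.3", "sha1": "bb", "width": 4000, "height": 3000, "size": 5_000_000},
    {"accession": "2001.4.7", "sha1": "cc", "width": 3000, "height": 2000, "size": 3_000_000},
]
tasks = build_upload_list(commons, institution)
print([t["accession"] for t in tasks])  # → ['1998.12.3', '2001.4.7']
```

In practice the generated list still needs a duplicate check at upload time (as the table notes), since accession numbers in old Commons descriptions may be missing or inconsistently formatted.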