
Expanding what is possible around GLAMs on the Wikimedia projects
A White Paper as Guidance for Future Work
developed as part of the FindingGLAMs project

Case Study 1: FindingGLAMs – Finding the Galleries, Libraries, Archives and Museums

Key facts

Time: August 2018 – February 2020

Organizations involved: UNESCO, Archives Portal Europe

Wikimedia/free knowledge communities involved: Wikimedia Sverige, Wikimedia Foundation

Keywords: GLAM, Wikidata

Key conclusions

  • Wikidata is a suitable platform to develop a global database of cultural heritage institutions, as it is free, structured, multilingual and open to edit for everyone.
  • By surveying the availability of GLAM datasets, we have brought the community's attention to the wealth of pre-existing data, which – even if it cannot be imported to Wikidata due to copyright restrictions – can be used e.g. as sources in Wikipedia articles.
  • Due to its scope and flexibility, Wikidata has a steep learning curve. In order to recruit new users more easily, the development of user-friendly tools is crucial.
  • Communication and documentation are important tools in reaching out to Wikidata editors. In our project, they helped us make clear why work on GLAM data is important, and how to start contributing.

Background

Having an overview of the cultural heritage institutions of the world is important to help them gain protection through visibility. In many natural and man-made disasters, cultural heritage institutions have been damaged, and a lack of basic information (e.g. their location) has made the response more difficult. This project works towards helping solve this problem. Making this information visible and explorable will let more people learn about cultural heritage from many different cultures, creating new insights and new knowledge.

The Wikimedia platforms are used and edited around the world – as of February 2020, Wikipedia has 299 active language versions. On Wikimedia Commons, photos, artworks and other media sourced from and related to cultural heritage institutions are collected. Wikidata, a free and open database, has a flexible structure that is well suited for storing data about everything under the sun, which can easily be edited, queried, analyzed and re-used. All of this is created by volunteers and available to everyone, for free. In short, the Wikimedia platforms are the obvious choice for creating a truly global, multilingual, accessible and free database of cultural heritage institutions.

Wikidata contains over 150,000 items representing the world's cultural heritage institutions (galleries, museums, libraries and archives). While it is hard to say how this corresponds to the total number of GLAMs around the world, it is clear that much remains to be done. For example, IFLA estimates that there are over 2.5 million libraries in the world;[1] there are 97,188 on Wikidata.[2] At the country level, it becomes obvious not only how much data is missing, but also that the degree of coverage differs greatly across nations: Kyrgyzstan, Turkmenistan and Madagascar are represented with 7, 9 and 10 GLAMs respectively, despite each country having millions of inhabitants.[3] Even without official data about the number of cultural heritage institutions in those countries, it is easy to see that a great deal of work is needed if Wikidata is to show even a moderately accurate picture of Asia and Africa.

Problem

The focus of this case study was adding data about the world's cultural heritage institutions to Wikidata and improving the data already there, as well as researching and evaluating different ways of doing that. We were particularly interested in investigating how to recruit new Wikidata contributors from the GLAM sector, as they have both professional expertise in the topic and an inherent interest in sharing their knowledge with the general public.

Implementation

Dataset index: surveying what we know

Many datasets of cultural heritage institutions already exist. They differ greatly in format, scope, completeness, verifiability and copyright terms. Some of them can be uploaded directly to Wikidata – as long as they are in a machine-readable, structured format and covered by a Wikidata-compatible license, such as CC0 or Public Domain. But even if they cannot be imported en masse, they are still valuable to Wikimedians: they give us an insight into areas of the world where Wikidata lacks data, and can be used as references. That’s why, throughout the whole project, we have been compiling an index of GLAM datasets.[4] It combines our own research with the findings of Wikimedians around the world, who have contributed their local knowledge and language skills.

Data uploads: every little bit helps

We imported a small part of the identified datasets to Wikidata. The selection of datasets to upload was fully determined by their copyright status; while it is common knowledge that Wikidata editors do not always agree on what data is acceptable to upload, we stayed on the safe side and only uploaded datasets that were clearly licensed CC0 or public domain.

OpenRefine was a crucial tool in the upload process, due to its flexibility; our data came from different sources and in different formats, but could all be explored and normalized in OpenRefine. The robust reconciliation capabilities were key to matching the data to existing Wikidata items and avoiding creating duplicates.
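The idea behind reconciliation can be illustrated with a minimal sketch. This is not OpenRefine's algorithm and not the code used in the project; it is a simplified stand-in using Python's standard `difflib`, and the item IDs, labels and the 0.85 threshold are hypothetical. It shows why normalizing names before comparing them helps avoid creating duplicates, and why near matches are better sent to a human for review than merged automatically.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def reconcile(name, existing, threshold=0.85):
    """Return (status, item_id) for one incoming institution name.

    status is "match" when a normalized label is identical, "review" when
    the best label is similar but not identical, and "new" otherwise.
    The threshold is an illustrative assumption, not a recommended value.
    """
    target = normalize(name)
    best_id, best_score = None, 0.0
    for item_id, label in existing.items():
        score = SequenceMatcher(None, target, normalize(label)).ratio()
        if score > best_score:
            best_id, best_score = item_id, score
    if best_score == 1.0:
        return "match", best_id
    if best_score >= threshold:
        return "review", best_id
    return "new", None


# Hypothetical labels standing in for items that already exist on Wikidata.
existing_items = {
    "Q-hypothetical-1": "Stockholm Public Library",
    "Q-hypothetical-2": "Gothenburg City Museum",
}

for incoming in ["stockholm  public library.", "Stockholm Public Libary",
                 "Uppsala University Archive"]:
    print(incoming, "->", reconcile(incoming, existing_items))
```

In practice a reconciliation service also compares other fields (type, country, identifiers), which a name-only comparison like this one cannot do.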

FindingGLAMs Campaign: developing user-friendly tools for newcomers

While uploading datasets is an efficient way to quickly increase the number and quality of GLAMs on Wikidata using reliable sources, we cannot rely on it alone if we want to achieve our goal: putting all the world's GLAMs on Wikidata. Many of the published datasets are copyrighted and thus cannot be copied en masse. More importantly, such datasets do not exist for every country and every cultural sector in the world. This is where the power of Wikidata – its community – really comes into play. Every day, Wikidatans manually create, review, update and enrich thousands of items. Just like Wikipedia, Wikidata exists thanks to millions of individual contributions.

Increasing the number of editors interested in cultural heritage is thus one way of coming closer to our ambitious goal. That is why we wanted to make it easier for newcomers – especially GLAM staff – to take their first steps on Wikidata. We know from our experience in education that Wikidata has a steeper learning curve than e.g. Wikipedia. Newcomers have to internalize a lot of information before they feel comfortable editing: how the data in Wikidata is structured, how to filter and find data using the Wikidata Query Service[5] (which requires learning at least the basics of SPARQL), and most importantly – how the particular types of items they are interested in are normally modelled. The last element in particular is often not documented well, if at all; active editors rely on their experience and knowledge of unwritten rules. Even something as simple as finding the right place to ask for help might be difficult.
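To give a sense of what "the basics of SPARQL" means here, the following sketch builds a query that counts the libraries in one country and the URL that would send it to the Wikidata Query Service. It is an illustration rather than a query used in the project: the function names are made up for this example, and the choice of library (Q7075) and Sweden (Q34) is only a sample. P31 (instance of), P279 (subclass of) and P17 (country) are the Wikidata properties the query relies on. The code only constructs the request and does not contact the service.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def count_query(type_qid: str, country_qid: str) -> str:
    """Build SPARQL that counts items of a type (or a subtype) in a country."""
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31/wdt:P279* wd:{type_qid} ;   # instance of the type or of a subclass
        wdt:P17 wd:{country_qid} .          # located in this country
}}
""".strip()


def request_url(query: str) -> str:
    """Build a GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = count_query("Q7075", "Q34")  # library, Sweden
print(query)
print(request_url(query))
```

Even this small query requires knowing three property IDs, two item IDs and the path syntax for subclasses, which is the kind of knowledge a newcomer does not yet have.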

As Wikidata can be edited remotely via an API, we decided to develop a user-friendly web application specifically for browsing, displaying and editing data about cultural heritage institutions. The application would enable users to learn the basics of Wikidata editing without having to interact with its actual interface. Most importantly, it would provide a GLAM-specific editing form, presenting the user with some fields commonly used in GLAM items, such as administrative location, geographical coordinates, social media handles, etc. On Wikidata, one has to know which properties to choose and where to find them; in our application, users would be guided along the way.
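As a rough sketch of how such a form could translate into API edits, the code below maps two form fields to request parameters for the Wikibase `wbcreateclaim` action. It is not Monumental's code: the form field names, the helper functions and the sample values are hypothetical, and the token is a placeholder that a real application would obtain after logging in. The item used is the Wikidata Sandbox (Q4115189); P625 (coordinate location) and P2002 (Twitter username) are existing properties. Nothing is sent over the network.

```python
import json

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def create_claim(item_qid, property_id, value, csrf_token):
    """Parameters for one wbcreateclaim request; the value is JSON-encoded."""
    return {
        "action": "wbcreateclaim",
        "entity": item_qid,
        "property": property_id,
        "snaktype": "value",
        "value": json.dumps(value),
        "token": csrf_token,
        "format": "json",
    }


def coordinate_value(lat, lon, precision=0.0001):
    """Globe-coordinate value on Earth (Q2); the precision is an assumption."""
    return {
        "latitude": lat,
        "longitude": lon,
        "precision": precision,
        "globe": "http://www.wikidata.org/entity/Q2",
    }


def form_to_requests(item_qid, form, csrf_token):
    """Turn filled-in form fields (hypothetical names) into API requests."""
    requests_to_send = []
    if "latitude" in form and "longitude" in form:
        value = coordinate_value(form["latitude"], form["longitude"])
        requests_to_send.append(create_claim(item_qid, "P625", value, csrf_token))
    if form.get("twitter_handle"):
        requests_to_send.append(
            create_claim(item_qid, "P2002", form["twitter_handle"], csrf_token)
        )
    return requests_to_send


# Sample input: made-up values, sandbox item, placeholder token.
form = {"latitude": 59.3434, "longitude": 18.0546, "twitter_handle": "example_glam"}
for params in form_to_requests("Q4115189", form, "PLACEHOLDER_TOKEN"):
    print(params["property"], params["value"])
```

The point of the design is that the user only sees labelled fields, while the mapping from field to property ID lives in the application.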

We hired a contractor to develop the application and the first usable version, named Monumental, was ready in May 2019.[6] This version prioritized libraries in Sweden, and we tested it with a number of Swedish librarians. Our goal was to have the application finished in late 2019 or early 2020, and to use it in a global campaign aimed at GLAM staff, especially in regions with little GLAM coverage on Wikidata.

Due to reasons beyond our control, the development of the tool could not be finished in time, and we had to revise our plans. Instead of a campaign aimed at GLAM staff, we decided to run a competition for the Wikidata community.

FindingGLAMs Challenge: editing together

Competitions can be a good tool for highlighting particular areas in need of improvement and engaging the community in focused work. We have previously run competitions such as the WikiGap Challenge and the UNESCO Challenge, so we know that gamification works. As our original plan to recruit new editors using a newly developed tool fell through, we decided to instead run an activity aimed at the international Wikimedia community: the FindingGLAMs Challenge.[7]

The goal of the Challenge was simply to add as much information to Wikidata items of GLAM institutions as possible. Our intention was to engage community members in editing data in this area, hoping to increase their awareness of how much data is missing and that every single contribution counts. We also wanted the Challenge to be a memorable finale of the whole FindingGLAMs project, emphasizing that without international collaboration, we would not have been able to achieve what we did.

Wikidata Tours: taking the first steps

Documentation is a crucial element of every platform for user participation. This is especially true in the case of Wikidata, which, as mentioned previously, can feel intimidating to newcomers. However, despite the complexity of the platform, it is possible to start making valuable contributions after learning the very basics.

The Wikidata Tours[8] distill the absolute basics of editing into a series of short, interactive tutorials aimed at empowering beginners to take their first steps. They were first created back in 2014, but were not very detailed, offering only a brief introduction to statements and items – the building blocks of Wikidata. Furthermore, the software they were built on contained bugs, making it impossible for volunteers to improve the tours or create new ones.

We collaborated with the developers to find those bugs and have them fixed, and then created six new tours, focusing on small concrete tasks that editors of GLAM items might want to do, such as adding the geographical coordinates and location of an institution, or linking to a relevant photo on Wikimedia Commons. The tours can be edited and improved by volunteers, and they can also be translated into other languages.

The tutorials are linked from the main page of Wikidata, making them easily accessible to newcomers. Thanks to our contribution, many people’s first experience of editing Wikidata has hopefully been made more positive.

Communication and building awareness

The FindingGLAMs idea cannot, by definition, be realized by a single working group or organization. It requires that people from many different countries collaborate, contributing with their local knowledge and language skills, helping data owners share their resources with the community and engaging volunteers. That is why we put a lot of effort into building awareness of our project, in hopes of sparking a fire that will burn long after the project has formally ended. We used our contact network to speak directly to representatives of other Wikimedia affiliates, as they have the resources to spread the FindingGLAMs message locally.

We participated with posters and talks in several conferences, of which Wikimania 2019 in Stockholm, Sweden, deserves a special mention, as it gave us a unique opportunity to address some of the world’s most enthusiastic and knowledgeable Wikimedians face to face. We were also given the chance to present our project to Ambassadors to UNESCO and other delegation staff. This gave us the opportunity to highlight the importance of the Wikimedia projects for the world’s cultural heritage institutions and encourage them to share open data from their own countries.

Outcome

Dataset indexing and uploads

In our dataset index, we collected information about 67 datasets from 44 countries. The project revealed significant discrepancies in access to GLAM datasets around the world; for example, we found only one dataset covering Africa, and it was created by researchers aiming specifically to improve the very poor situation on the continent in this respect.[9] Europe was the continent with the largest number of datasets, which aligns with our experience in working with and educating about open data. The vast majority of the datasets do not have a license compatible with Wikidata. That is why we could only process and upload a small number of datasets. Despite this limitation, we managed to add a significant amount of data to Wikidata by editing over 38,000 items, most of which were created from scratch. Possibly the most interesting of the datasets was the US Public Libraries Survey[10], comprising data about 9,000 public library systems and 17,000 individual library outlets – a good picture of the American library landscape. The majority of those had not existed on Wikidata previously, despite the US being privileged as a first-world country with many Wikimedians and data sources.

More GLAM data under an open license

One of the successful outcomes of the project was facilitating the release of a dataset of European archival institutions under an open license. The dataset, owned and developed by Archives Portal Europe[11], was under copyright when we included it in our dataset index. We found the data very interesting and valuable for our project, as many of the described institutions did not have Wikidata items. Since the goal of Archives Portal Europe is to provide information about archival collections under the CC0 license, we thought they might be willing to also release their institution directory as open data. We reached out to them directly, informing them about Wikidata and our work. The response was positive, and a couple of months later most of the data was made free – after the relevant data providers had expressed their agreement.

What was particularly interesting about this case is that Archives Portal Europe themselves did not have the power to release the data under an open license. The data about the institutions was provided by their regional partners, and it was up to each and every one of them to decide whether an open license could be used. That is why the process took a long time – about half a year from the first contact. Nevertheless, the positive outcome shows that active work towards making data open does pay off. Thanks to Archives Portal Europe serving as a hub for European archives, institutions from several countries have been informed about the value of Wikidata and Linked Open Data. Hopefully this will build a foundation for future collaboration and provide a model for coordinating organizations to act as internal champions for license change.

FindingGLAMs Challenge

The week-long FindingGLAMs Challenge was advertised in social media channels with an international audience, such as Wikimedia- and GLAM-focused Facebook groups and Twitter. We also contacted the representatives of other Wikimedia organizations directly so that they could pass on the information to their local communities using their preferred channels and languages. The interest in the Challenge exceeded our expectations; 90 participants signed up, of whom 53 made at least one edit to a GLAM item on Wikidata. 9 participants made at least 500 edits each; 21 participants, that is roughly 40% of the active participants, made at least 100 edits each. In total, a staggering 19,200 improvements were made in a single week of the Challenge.[12]
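The participation figures above can be put in proportion with a few lines of arithmetic. The numbers are taken from the paragraph above; the variable names are only illustrative.

```python
signed_up = 90       # participants who signed up
active = 53          # made at least one edit to a GLAM item
at_least_100 = 21    # made at least 100 edits each
at_least_500 = 9     # made at least 500 edits each

active_share = active / signed_up
share_100 = at_least_100 / active
share_500 = at_least_500 / active

print(f"Active among signed up:       {active_share:.1%}")
print(f"100+ edits among active:      {share_100:.1%}")
print(f"500+ edits among active:      {share_500:.1%}")
```

Read this way, a little under 60% of those who signed up went on to edit, and about one in six active participants made 500 edits or more.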

The structure developed for the Challenge can be easily reused in the coming years and the activity can hence be re-organized for a very low cost. Furthermore, our experience and expectation is that the engagement is only likely to grow when repeated.

Future

The FindingGLAMs project had a very limited timeline and resources considering its ambitious scope. We knew, obviously, that putting all of the world's millions of GLAMs on the map over a year and a half would be impossible. Even though the project has formally ended, it was our intention that it would become the first step to future work – done not only by us, but also by other Wikimedia organizations and volunteers around the world.

The dataset index can continue to be maintained by the community. More importantly, it can serve as a starting point for Wikimedians to find trustworthy information about cultural heritage institutions and for Wikimedia affiliates to reach out to data owners to share their resources under open licenses. The awareness we have built around the project – and, more generally, around the important role the Wikimedia platforms play in engaging, bringing together and educating about cultural heritage institutions – will hopefully lead to more Wikimedians looking out for free GLAM data and supporting each other in including it on Wikidata, Wikipedia and Wikimedia Commons.

The data we uploaded to Wikidata will continue to increase in value every time it is viewed, queried and edited. Just like every other item on Wikidata, it can be improved by anyone. Errors can be corrected, outdated information can be updated, labels and descriptions can be translated to additional languages, GLAMs can be photographed and Wikipedia articles written and linked. The FindingGLAMs Challenge showed that there's a lot of interest in improving data about GLAMs on Wikidata – people clearly care about their local cultural heritage institutions.

While it is regrettable that we did not get an opportunity to launch the full version of the Monumental software, the early tests and the discussion with the community members were very promising. They made it clear that a tool like Monumental is necessary if we want to recruit new editors. Wikidata has a steep learning curve; many of its most active users have a background in databases or computing, or at least long experience with other Wikimedia platforms. This is not something that can be expected from new editors if we want to increase their number – and we absolutely do if the goal of finding all of the world's GLAMs is to be achieved. We hope that our experience with Monumental will spur research into, and development of, editing tools not primarily aimed at existing Wikidata editors.

References