Wikimedians in Residence Exchange Network/GLAM manifesto 2023

Draft of a document to describe some of the challenges of GLAM wiki work by Wikimedians in Residence and other community members. Please feel free to edit or comment!

Overview

In the last decade, open access initiatives and Wikimedia communities have successfully engaged numerous cultural and heritage institutions around the world. Whether it is through free licensing of images or embracing Wikidata, these GLAM and other memory institutions are heeding the call in our movement strategy:

By 2030, Wikimedia will become the essential infrastructure of the ecosystem of free knowledge, and anyone who shares our vision will be able to join us.

They have joined us in the GLAM Wiki community in numbers greater than ever, which is to be celebrated. But in recent years, our technical toolset has struggled to meet these infrastructure needs.

Challenges

 
Example data and media partnerships workflow for Wikidata.

We have improved in recent years at documenting ways[1][2] for GLAM wiki partnerships to emerge. However, we are failing to follow through and support the needed phases of these new partnerships:

  • Contribution and ingestion of metadata and images
  • Enrichment and co-creative use of content
  • Measuring impact of contributions and reporting metrics

Unfortunately, the tools to engage our open knowledge partners are fragile. We have not updated our approach, our toolset, our support, or our planning for this new reality.

  • We have been reliant on the innovative tools developed by Magnus Manske that we appreciate greatly. However, for too long, these have been supported only in a "best effort" manner. That we have so many partnerships and efforts relying on this state of affairs creates an existential risk to the outreach and partnerships the GLAM wiki movement has worked hard to establish.
  • Of these phases, metrics and measurement are perhaps the most critical. For Wikimedia projects to meet the nearly universal expectations of communications professionals and management at these GLAM institutions, metrics must become accessible and reliable.

In 2023, this has reached crisis levels given the volume and extent of GLAM wiki work, both in traditional spaces such as Wikipedia, Commons, and Wikidata, and now also in Structured Data on Commons.

Issues

List of GLAM-oriented tools from the Wikidata page on Linked_open_data_workflow.

  • Contribution
    • Pattypan, a user-friendly mass image upload tool for Wikimedia Commons, suffered authentication problems that left it inoperable for months. Many GLAM institutions had to postpone their open access contributions. It was finally restored after a volunteer developer stepped in to fix the problem. Now, Pattypan has been marked as end of life, with no ready replacement. Fortunately, OpenRefine has been able to take over some of this work, but it is a much more complicated tool to use and requires considerably more training.
    • Pywikibot, as the main Python framework for Wikimedia community developers and script writers, had no support for Structured Data on Commons until late 2021. This meant that much more intricate use of the raw MediaWiki API was required to work with SDC, making for a steep learning curve. Because of the authentication requirements of the Wikimedia Commons Query Service (see below), toolmaking has been slow or prohibitively difficult.
  • Enrichment and use
    • QuickStatements is the Wikidata workhorse for batch contribution and enrichment of items and statements. However, its performance is often unreliable, and it is well known for failing silently without providing any useful error messages to help diagnose problems. It also has only very basic support for Structured Data on Commons, and in only one mode.
      (Image: QuickStatements with basic Commons support in batch mode.)
    • Listeria, the main tool for generating reports and worklists from Wikidata on MediaWiki pages, has worked only intermittently as its use has grown, and the system may not be scalable. Query timeouts and unsuccessful runs with no usable error messages have been common, and there is no way to reliably trigger updates to lists. (Magnus Manske's blog post about development possibilities for Listeria)
    • Wikimedia Commons Query Service (WCQS) requires authentication for SPARQL queries, making it difficult to use for GLAM wiki efforts such as visualization, reporting, measurement, and tool making. The promise of RDF databases to support a "federated query" across many linked open data sources cannot be realized with WCQS as an authenticated service. This undermines many of the benefits GLAM partners were expecting from their open access content donations.
  • Measuring impact
    • BaGLAMa2 - The metrics tool has failed intermittently, and has not been reporting accurate numbers since late 2022, leaving a significant hole in our ability to report metrics about GLAM collections on Commons. This has affected folks such as Khalili Collections, Naturalis Biodiversity Center, DPLA, Smithsonian Institution, Metropolitan Museum of Art, and more. It is also problematic in how it reports "views," even when working correctly, and may need a total rethink.
    • GLAMorgan - as a Commons file usage tool that examines a category tree, it has been hard to scale up for large categories, and seems to perform unreliably with the pageviews API. A fix has been found, but the general concern is that there is no collective support for these tools.
    • GLAM Wiki Dashboard - the tool provides a UX-friendly institutional page for GLAMs, from which it would be possible to keep track of key metrics and to generate periodic reports. The tool was developed by Wikimedia Israel and is in urgent need of maintenance and development. Major concerns include the reliability of data outputs and the need to add more initiatives that have requested to be added to the tool. This platform is central for GLAM partnerships in Brazil and other communities. Large GLAM institutions, like the Metropolitan Museum of Art, are too large to track economically and have been removed from the dashboard. As a result, it was recently communicated that the ongoing maintenance of the Dashboard was being assisted by the WMF, in an attempt to move the entire service from Amazon Web Services to Wikimedia Cloud VPS. GLAM Wiki Dashboard is a fork of Cassandra, a WMCH project and service.
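As a rough illustration of what a collectively supported metrics baseline could build on, the sketch below sums page views for a single article using the per-article route of the Wikimedia Pageviews REST API. It is only a minimal sketch and not how BaGLAMa2, GLAMorgan, or the GLAM Wiki Dashboard are implemented: the helper names pageviews_url and total_views are hypothetical, the sample response is made-up data, and a real report would also need to work out which articles use a collection's files.

```python
from urllib.parse import quote

# Per-article route of the Wikimedia Pageviews REST API.
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="user", granularity="daily"):
    """Build the request URL for one article (hypothetical helper).

    start and end are dates in YYYYMMDD form, e.g. "20230101".
    """
    # Titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{PAGEVIEWS_BASE}/{project}/{access}/{agent}/"
            f"{title}/{granularity}/{start}/{end}")


def total_views(response):
    """Sum the "views" field over all items of a decoded JSON response."""
    return sum(item["views"] for item in response.get("items", []))


# Made-up sample shaped like an API response, for illustration only.
sample_response = {"items": [{"views": 10}, {"views": 5}]}

url = pageviews_url("en.wikipedia.org", "Metropolitan Museum of Art",
                    "20230101", "20230131")
views = total_views(sample_response)
```

Fetching the URL and decoding the JSON is left out on purpose, so the sketch runs without network access; the agent="user" default excludes traffic flagged as automated.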

New approaches

How might we find new ways to address some of these shortcomings and future work in the GLAM wiki community? Brainstorming or bold proposals are most welcome.

  • Maintainability – Too much of the GLAM wiki ecosystem is lightly supported, or supported only on a best-effort basis. We use these tools in production contexts for which they are clearly not suited and for which they have never been quality checked. For example, a major tool like PetScan has this problem, described in February 2023 when it had to be restarted manually: "Unfortunately Magnus has not set that up to restart automatically. Someone has to ssh into the instance, become Magnus' user, start a `screen` process, and finally start his custom service to make it work." (Full description.) Possible solutions: Might there be a way to identify key Magnus Manske-created tools that merit moving into "supported" status, so that a dedicated team of either WMF staffers or volunteers could ensure that no single person constitutes a point of failure?
  • A past success that acts as a model – For years, page view statistics for Wikimedia projects were reported on a volunteer-run service (stats.grok.se) by Erik Zachte. It was extremely useful for outreach and research, but had many problems with reliability and limited functionality. From 2015 onwards, the Foundation technical team redesigned and recreated the service from scratch, building a set of use cases, then a Pageviews API, then an interface. The result not only works reliably; it is now an essential part of educating new people (whether new editors or management at partner institutions) about Wikimedia and its reach. (Blog post by Magnus Manske with some historical background)
    (Image: Erik Zachte, creator of stats.grok.se, at Wikimania 2005. His volunteer work eventually led to formalizing stats.wikimedia.org and future Pageviews API efforts.)
  • "The query is the content" – Wikimedia Commons Query Service (WCQS) was launched publicly as an authenticated service in Q1 2022. This came as a surprise to many people accustomed to the previous Wikidata Query Service (WDQS) as an open and freely accessible SPARQL access point. The rationale was explained in an online meeting (January 10, 2022, notes and followup discussion) between Wikimedia users and the Wikimedia Foundation search dev team. It emerged that, perhaps unknown even to our own community, internal staff saw the existing WDQS as a fragile system with serious scalability problems, which is why WCQS would require users to be logged in, to provide more reliably performing queries. While the Wikimedia community understands the need for a performant and scalable Blazegraph system in WCQS, an authenticated service is prohibitive for tool making, metrics reporting, and creating visual demos that can be shared with GLAM partners. It effectively becomes a registration "paywall" insulating content that was understood to be part of the "ecosystem of free knowledge." As part of a compromise, the Wikimedia community asked that, if WCQS was going to remain authenticated, it keep the "beta" label to indicate it was not the desired final experience. That "beta" label exists to this day. Understanding the new paradigm of "queries as content" may require a shift in thinking. Searching, querying, and discovering are not just support systems to get to an article, an image, a digital asset, or a Wikidata item. The query itself, and the visualizations created with it, are the main content. As such, requiring people to log in to Wikimedia services to read or experience our "free" content should give us great pause.
    (Image: Wikimedia Commons Query Service launched in Q1 2022, still using the "beta" label at the request of the Wikimedia community because authentication is still required to use the SPARQL endpoint.)
  • Long-term vision for the role of GLAM Wiki collaborations – GLAM and knowledge institutions are one of our major sources of quality multimedia content, probably alongside campaigns (such as Wiki Loves Folklore, Earth, Butterflies, and the like). According to recent reports by the Community Resources team, GLAM is a strategy considered by ~70% of affiliates/grantees. However, the infrastructure and improvements on platforms are currently very much focused on a theory of individual contributors that does not necessarily match reality. More attention needs to be paid to the large-scale efforts conducted inside knowledge institutions to create and contribute to the free knowledge ecosystem, from GLAMs to multilateral organizations like UNESCO, and to the needs that power users have for more reliable and stronger tools. There is a need to better understand how most of the contributions to Commons are happening, who the contributors are, and how to serve their needs better.
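To make the "queries as content" point above concrete, the following sketch assembles a federated SPARQL query of the kind GLAM partners might want to share: Commons files whose Structured Data "depicts" (P180) statement points to a Wikidata item that is an instance of (P31) a given class. It is an illustration under assumptions: the helper names build_depicts_query and wcqs_request_url are hypothetical, the two endpoint URLs are assumed to be the public WCQS and WDQS addresses, and the code only builds the request URL rather than sending it, since WCQS requires a logged-in session.

```python
import re
from urllib.parse import urlencode

# Assumed public SPARQL endpoints for Commons (WCQS) and Wikidata (WDQS).
WCQS_ENDPOINT = "https://commons-query.wikimedia.org/sparql"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_depicts_query(class_qid, limit=10):
    """Return a federated query: Commons files depicting instances of a class.

    class_qid is a Wikidata item ID such as "Q3305213" (painting).
    """
    # Validate the ID so arbitrary text cannot be spliced into the query.
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    limit = int(limit)
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?file ?item ?itemLabel WHERE {{
  ?file wdt:P180 ?item .
  SERVICE <{WDQS_ENDPOINT}> {{
    ?item wdt:P31 wd:{class_qid} .
    ?item rdfs:label ?itemLabel .
    FILTER(LANG(?itemLabel) = "en")
  }}
}}
LIMIT {limit}
""".strip()


def wcqs_request_url(query):
    """Build a GET URL for the query; actually sending it needs a login."""
    return WCQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_depicts_query("Q3305213", limit=5)
request_url = wcqs_request_url(query)
```

The SERVICE clause is the federation step: the Commons side supplies the files and their "depicts" values, while the Wikidata side supplies the class membership and labels. An anonymous dashboard or shared visualization cannot run this against WCQS while the service remains authenticated.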

Successes

In recent years there have been some examples of progress in addressing the needs of the GLAM wiki community, including:

References for Structured Data on Commons (2021-2022)

  • Summary: This is one of the best success stories so far of WMF and GLAM community collaboration, where a need was identified, the feature was specified in careful community consultation, and the function was developed. This was all finished in an atypically quick span of April 2021 to January 2022.
Details

Thumbor (2022-2023)

  • After a meeting between Maryana Iskander and GLAM wiki professionals in Washington, D.C. and New York City in January 2022, Thumbor was identified by the WREN community as a "shovel ready" project of immediate need for Wikimedia Commons.
  • https://phabricator.wikimedia.org/project/profile/1672/
  • In 2023, the Thumbor system was successfully upgraded to a more modern Python 3 infrastructure.

OpenRefine

  • The funding of OpenRefine's development as a data contribution tool and the hiring of dedicated staff have contributed greatly to GLAM wiki capabilities. We hope to see more of this going forward.

Links

Related writings

References

  1. "GLAM/Resources/Data and media partnerships workflow - Outreach Wiki". outreach.wikimedia.org. Retrieved 2023-02-22. 
  2. "Wikidata:Linked open data workflow - Wikidata". www.wikidata.org. Retrieved 2023-02-22.