Learning patterns/Uploading bibliographical or artwork metadata using OpenRefine

A learning pattern for GLAM

Uploading bibliographical or artwork metadata using OpenRefine

problem: How to successfully prepare a batch of metadata and integrate/export it to Wikidata using the tool OpenRefine.

solution.

creator: Wikilover90


created on 19:13, 8 March 2020 (UTC)

What problem does this solve?

Upload a large bibliographical dataset, such as literary works, author biodata or artwork metadata, and integrate it into Wikidata as part of a content donation from an institution or some other project. Advanced features such as reconciliation help avoid duplicate items. This learning pattern can be especially useful for Wikidata projects such as Sum of all Art and Sum of all Authors.

What is the solution?

Install OpenRefine

Before installing, make sure that the latest version of Java (or the JDK) is installed on your computer. Download Java here: https://www.java.com/download/
Download OpenRefine from https://openrefine.org/download.html and save the installer (the .dmg file on macOS).
OpenRefine runs on Windows, Linux and macOS, so your device needs one of these operating systems.

Create a spreadsheet to be filled in

Prepare the dataset to be uploaded in one of these formats: a CSV file, web addresses (URLs), or a Google Spreadsheet.

(Note: It is essential to correct any spelling errors in the names of the items, to avoid creating duplicate items.)
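The spelling check in the note above can be partly automated before the file is loaded into OpenRefine. The following is a minimal sketch, not part of OpenRefine itself: it assumes the dataset is a CSV with a column called `name` (a hypothetical column name), and the sample rows are made up for illustration. It groups values that differ only in case or spacing, which are typical sources of duplicate items.

```python
import csv
import io
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Return a comparison key: Unicode-normalized, single-spaced, case-folded."""
    text = unicodedata.normalize("NFKC", name)
    return " ".join(text.split()).casefold()


def find_spelling_variants(csv_text: str, column: str) -> dict:
    """Group distinct raw values that collapse to the same normalized key."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[column]
        groups[normalize_name(raw)].add(raw)
    # Keep only keys that have more than one distinct raw spelling.
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


# Made-up sample data; the column names are assumptions for this sketch.
SAMPLE = """name,birth_year
Jane Example,1900
jane  example,1900
John Sample,1910
"""

variants = find_spelling_variants(SAMPLE, "name")
for key, spellings in variants.items():
    print(f"{key}: {spellings}")
```

Each group printed by the sketch is a set of spellings to unify by hand in the spreadsheet. True misspellings (a wrong letter, for example) are not caught by this normalization and still need manual review.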

Getting Started


Open OpenRefine.
Click Choose file and navigate to a file on your computer, enter the link to a website or Google Drive file containing the dataset you would like to upload, or simply paste the dataset from the clipboard.
Click Begin.
At the top right, give the project a name and click Start.
Make sure you are logged in with your username.

Reconciliation

OpenRefine Reconcile Matching Wikidata Identifiers

In OpenRefine terminology, reconciliation is the process of linking free-text tabular cells to identifiers in knowledge bases. OpenRefine's built-in reconciliation capabilities make it a versatile tool to reconcile tabular data to a wide range of databases, including Wikidata. Use multiple columns in your dataset and match them against values of properties in Wikidata, which refines the reconciliation score and acts as a tiebreaker between namesakes.

Click the arrow next to the column name.
Click Reconcile.
Click Start Reconciling.
Select Wikidata (en). Alternatively, you can install a version of the Wikidata reconciliation service for your language. Open the reconciliation dialog and click Add Standard Service. The URL is https://tools.wmflabs.org/openrefine-wikidata/pa/api where "pa" is replaced by your language code. When reconciling using this interface, items and properties will be displayed in your language if a translation is available.
Select the type to reconcile against, so that each cell is matched to an entity of that type (for example: human, written work or location), and/or choose the appropriate option: Reconcile against type; Reconcile against no particular type; Auto-match candidates with high confidence.
The Wikidata reconciliation service processes about 3 rows per second, so the time needed depends on the amount of data. When the process finishes, the reconciliation data appears in the cells.
In the reconciled column, each cell shows one of two outcomes. If the cell was successfully matched, it displays a single dark blue link. Otherwise a few candidates are displayed with light blue links, together with their reconciliation scores, and you need to pick the correct one manually. For each matching decision you have two options: match this cell only, or use the same identifier for all other cells containing the same unreconciled value. You can also use Search for match to find the correct Wikidata identifier; if no value matches your case, click Create New Item.
Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata, creating other columns in your dataset. If there are multiple claims for a given property, the values will be grouped as records in OpenRefine: they are stored in additional rows where the original reconciled column is blank. OpenRefine's record mode might therefore be more suitable for the later transformations you want to carry out on your table.
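To make the reconciliation steps above more concrete, here is a rough sketch of what a reconciliation request and response can look like, following the general shape of the reconciliation service protocol that OpenRefine uses. It is an illustration under assumptions, not OpenRefine's internal code: the helper names, the column names and the sample response (including the placeholder identifiers) are invented, and nothing is sent over the network. `Q5` (human) and `P569` (date of birth) are real Wikidata identifiers used as an example of a type and a tiebreaker property.

```python
import json
from urllib.parse import urlencode

# Endpoint pattern quoted in the steps above, with "en" as the language code.
ENDPOINT = "https://tools.wmflabs.org/openrefine-wikidata/en/api"


def build_queries(rows, name_column, type_qid, property_columns):
    """Build a query batch keyed q0, q1, ... from table rows (hypothetical helper)."""
    queries = {}
    for index, row in enumerate(rows):
        query = {"query": row[name_column], "type": type_qid}
        # Extra columns act as tiebreakers between namesakes.
        props = [
            {"pid": pid, "v": row[column]}
            for column, pid in property_columns.items()
            if row.get(column)
        ]
        if props:
            query["properties"] = props
        queries[f"q{index}"] = query
    return queries


def encode_request_body(queries):
    """Form-encode the batch as the 'queries' parameter of a POST body."""
    return urlencode({"queries": json.dumps(queries)})


def pick_matches(response):
    """Split a response into confident matches and cells needing manual review."""
    matched, review = {}, {}
    for key, payload in response.items():
        candidates = payload.get("result", [])
        confident = [c for c in candidates if c.get("match")]
        if len(confident) == 1:
            matched[key] = confident[0]["id"]
        else:
            review[key] = [(c["id"], c["score"]) for c in candidates]
    return matched, review


def estimated_seconds(row_count, rows_per_second=3.0):
    """Rough duration estimate from the 'about 3 rows per second' figure above."""
    return row_count / rows_per_second


# Made-up rows and a made-up response; the identifiers are placeholders.
rows = [{"name": "Jane Example", "born": "1900"}, {"name": "John Sample", "born": ""}]
queries = build_queries(rows, "name", "Q5", {"born": "P569"})
body = encode_request_body(queries)

SAMPLE_RESPONSE = {
    "q0": {"result": [{"id": "Q_EXAMPLE_1", "name": "Jane Example", "score": 100.0, "match": True}]},
    "q1": {"result": [
        {"id": "Q_EXAMPLE_2", "name": "John Sample", "score": 60.0, "match": False},
        {"id": "Q_EXAMPLE_3", "name": "John Sample", "score": 55.0, "match": False},
    ]},
}
matched, review = pick_matches(SAMPLE_RESPONSE)
print(matched)
print(review)
print(f"900 rows take roughly {estimated_seconds(900):.0f} seconds")
```

In this sketch the first cell corresponds to a dark blue link (one confident match), while the second corresponds to light blue candidates that must be chosen manually. At about 3 rows per second, a 900-row column would take on the order of five minutes.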

Dataset Augmentation with Schema

Dataset Schema

New issues show the problems in the Schema or Records automatically

Preview of the Wikidata Items

Click Wikidata at the top right of the screen.
Click Export Schema.
The schema page opens; add the items relevant to your dataset.
New issues appear in the Issues tab. Check them for problems and make changes accordingly in the records and the schema.
Click Save Schema.
In the Preview tab, you can see how the dataset will appear as Wikidata edits and inspect them manually before uploading.
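Some problems that would surface in the Issues tab can be caught earlier with a simple check on the table. The sketch below is only an illustration of that idea and does not reproduce OpenRefine's own checks: the function name, the column names and the accepted date shapes (`YYYY`, `YYYY-MM` or `YYYY-MM-DD`) are assumptions made for this example, and the rows are made up.

```python
import re

# Assumed date shapes for this sketch: YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def find_issues(rows, required_columns, date_columns):
    """Return (row number, message) pairs for blank or malformed cells."""
    issues = []
    for number, row in enumerate(rows, start=1):
        for column in required_columns:
            if not (row.get(column) or "").strip():
                issues.append((number, f"empty value in '{column}'"))
        for column in date_columns:
            value = (row.get(column) or "").strip()
            if value and not DATE_PATTERN.fullmatch(value):
                issues.append((number, f"unrecognised date '{value}' in '{column}'"))
    return issues


# Made-up rows: the second has a blank title and a letter O typed instead of a zero.
rows = [
    {"title": "Example Book", "author": "Jane Example", "published": "1950"},
    {"title": "", "author": "Jane Example", "published": "195O"},
]

for number, message in find_issues(rows, ["title", "author"], ["published"]):
    print(f"row {number}: {message}")
```

Fixing such cells in the records before saving the schema leaves fewer issues to work through in OpenRefine.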

Uploading to Wikidata

Open Refine export to Wikidata

Click Wikidata at the top right of the screen.
Click Upload edits to Wikidata.
A dialogue box appears with your dataset; write a brief description of the edits in the edit summary.
Click Upload.

Things to consider

Installation: Make sure that the latest version of Java is installed on your computer. Download Java here: https://www.java.com/download/.
Data quality: If you create new properties, make sure to check the pre-existing items on Wikidata. Identifiers may differ in spelling or case, so existing items may not show up during the reconciliation process. You can use Search for match to find the correct Wikidata identifier, or proactively find those identifiers and map them to Wikidata items to prevent the creation of duplicate items.
After you finish working on the schema, analyze and fix any automatically raised issues before exporting to Wikidata. This avoids errors in the upload process and prevents important information about the items from going missing in the exported dataset.
This learning pattern can be useful when simple datasets need mass upload and integration into Wikidata, such as a bibliographical database or author biodata. For other types of datasets, you may wish to consult: https://github.com/OpenRefine/OpenRefine/wiki/User-Guide.
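The data-quality point above, about identifiers that differ only in spelling or case, can be illustrated with a small lookup. This is a sketch under assumptions rather than a feature of OpenRefine: it presumes you already hold a list of existing identifier-to-item pairs (for example, gathered by hand from Wikidata), and all identifiers and item IDs shown are placeholders.

```python
def normalize_identifier(identifier):
    """Comparison key that ignores letter case and any whitespace."""
    return "".join(identifier.split()).casefold()


def build_index(existing):
    """Index existing identifier -> item pairs by their normalized key."""
    return {normalize_identifier(identifier): item for identifier, item in existing.items()}


def map_identifiers(values, index):
    """Return (mapped values, values with no existing item) for a dataset column."""
    mapped, unmatched = {}, []
    for value in values:
        item = index.get(normalize_identifier(value))
        if item is None:
            unmatched.append(value)
        else:
            mapped[value] = item
    return mapped, unmatched


# Placeholder data: identifiers as recorded on existing items vs. in the dataset.
existing = {"ABC-123": "Q_EXAMPLE_1", "xyz 789": "Q_EXAMPLE_2"}
dataset_values = ["abc-123", "XYZ789", "new-001"]

mapped, unmatched = map_identifiers(dataset_values, build_index(existing))
print(mapped)
print(unmatched)
```

Only the values left in `unmatched` are candidates for new items; the others can be matched to their existing items during reconciliation.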

When to use

This tutorial can be useful when simple datasets need mass upload and integration into Wikidata, such as a bibliographical database or author biodata.

When uploading a batch of data to Wikidata (e.g. Artist, Photographer, Institution, License), and/or
When flexibility and control over descriptions are needed.
For GLAM projects: to create or mass import/export metadata to Wikidata.
The software can be downloaded for Windows, Linux and Mac.
