Learning patterns/Uploading bibliographical or artwork metadata using OpenRefine

A learning pattern for GLAM
Uploading bibliographical or artwork metadata using OpenRefine
Problem: How to prepare a batch of metadata and upload it to Wikidata using the tool OpenRefine.
Creator: Wikilover90
Created on: 19:13, 8 March 2020 (UTC)


What problem does this solve?

Upload a large bibliographical dataset, such as literary works, author biodata, or artwork metadata, to Wikidata as part of a content donation from an institution or another project, using advanced features such as reconciliation to avoid duplicate items. This learning pattern can be especially useful for Wikidata projects such as Sum of all Art and Sum of all Authors.

What is the solution?

Install OpenRefine

  1. Before installing, make sure that the latest version of Java (or a JDK) is installed on your computer. Download Java here: https://www.java.com/download/
  2. Download OpenRefine from https://openrefine.org/download.html and save the installer for your operating system (for example, the .dmg file on macOS).
  3. OpenRefine runs on Windows, Linux, and macOS, so your device needs one of these operating systems.

Create a spreadsheet to be filled in

Prepare the dataset to be uploaded in one of these formats: CSV file, web addresses (URLs), or Google Spreadsheet.

  • (Note: It is essential to correct any spelling errors in the names of the items to avoid creating duplicate items.)
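
The spelling check in the note above can be partly automated before the file is loaded into OpenRefine. The following is a minimal Python sketch and not part of OpenRefine itself; the sample names and the 0.85 similarity threshold are assumptions chosen for illustration.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Unify the Unicode form, collapse whitespace, and ignore case."""
    name = unicodedata.normalize("NFC", name)
    return " ".join(name.split()).casefold()


def likely_duplicates(names, threshold=0.85):
    """Return pairs of distinct spellings whose normalized forms are near-identical."""
    unique = sorted(set(names))
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            ratio = difflib.SequenceMatcher(
                None, normalize(first), normalize(second)
            ).ratio()
            if ratio >= threshold:
                pairs.append((first, second))
    return pairs


# Hypothetical sample values; replace with the name column of your dataset.
names = ["Jane Austen", "Jane  Austen", "Jane Austin", "Mark Twain"]
suspects = likely_duplicates(names)
for first, second in suspects:
    print(f"Check: {first!r} vs {second!r}")
```

Pairs flagged this way still need a human decision: two similar spellings may refer to the same person or to genuine namesakes.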
 
 
 

Getting Started

Pattypan - selection of description fields

  1. Open OpenRefine.
  2. Click Choose file and navigate to the file on your computer that contains the dataset you would like to upload. Alternatively, enter the link to a website or Google Drive file that contains the dataset, or paste the dataset from the clipboard.
  3. Click Begin.
  4. At the top right, enter a name for the project and click Start.
  5. Make sure you are logged in with your Wikidata username.

Reconciliation

 
OpenRefine Reconcile Matching Wikidata Identifiers

In OpenRefine terminology, reconciliation is the process of linking free-text tabular cells to identifiers in knowledge bases. OpenRefine's built-in reconciliation capabilities make it a versatile tool to reconcile tabular data to a wide range of databases, including Wikidata. Use multiple columns in your dataset and match them against values of properties in Wikidata, which refines the reconciliation score and acts as a tiebreaker between namesakes.
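
To illustrate how extra columns act as a tiebreaker, the sketch below builds the kind of query a reconciliation client sends to a service, following the general shape used by the Reconciliation Service API (a `query` string, an optional `type`, and a list of `properties`). It only constructs and prints the JSON and sends nothing over the network. The row values, column names, and the `build_query` helper are hypothetical; Q5 (human) and P569 (date of birth) are existing Wikidata identifiers used as examples.

```python
import json


def build_query(row, name_column, type_qid=None, property_columns=None):
    """Build one reconciliation query from a table row (a dict of column -> value)."""
    query = {"query": row[name_column]}
    if type_qid:
        query["type"] = type_qid
    properties = [
        {"pid": pid, "v": row[column]}
        for column, pid in (property_columns or {}).items()
        if row.get(column)  # skip empty cells
    ]
    if properties:
        query["properties"] = properties
    return query


# Hypothetical row from an author dataset.
row = {"author": "Example Author", "born": "1900-01-01"}

query = build_query(row, "author", type_qid="Q5", property_columns={"born": "P569"})
payload = {"q0": query}
print(json.dumps(payload, indent=2))
```

Two namesakes would both match the `query` string, but only the one whose date of birth agrees with the `born` column would be expected to receive the higher score.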

  1. Click the arrow next to the column name.
  2. Click Reconcile.
  3. Click Start Reconciling.
  4. Select Wikidata (en). Alternatively, you can install a version of the Wikidata reconciliation service for your language. Open the reconciliation dialog and click Add Standard Service. The URL is https://tools.wmflabs.org/openrefine-wikidata/pa/api where "pa" is replaced by your language code. When reconciling using this interface, items and properties will be displayed in your language if a translation is available.
  5. Select the type to reconcile against, so that each cell is matched to an entity of one of the listed types (for example: human, written work, or location), and/or choose the appropriate option: Reconcile against type; Reconcile against no particular type; Auto-match candidates with high confidence.
  6. Wait for reconciliation to finish. The time depends on the amount of data: the Wikidata reconciliation service processes about 3 rows per second. When it is done, the reconciliation data appears in the cells.
  7. In the reconciled column, each cell shows one of two results. If the cell was successfully matched, it displays a single dark blue link. Otherwise, a few candidates are displayed with light blue links, together with their reconciliation scores, and you need to pick the correct one manually. For each matching decision you have two options: match this cell only, or use the same identifier for all other cells containing the same unreconciled value. You can also use Search for match to find the correct Wikidata identifier; if no value matches your case, click Create New Item.
  8. Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata, creating other columns in your dataset. If there are multiple claims for a given property, the values will be grouped as records in OpenRefine: they are stored in additional rows where the original reconciled column is blank. OpenRefine's record mode might therefore be more suitable for the later transformations you want to carry out on your table.
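
The review in step 7 can be pictured as a simple triage over the candidates returned for each cell. The sketch below is an illustration only and does not reproduce OpenRefine's internal matching logic: the candidate lists, the score threshold of 95, and the function names are assumptions. It also estimates how long reconciliation may take, using the rough figure of 3 rows per second from step 6.

```python
def triage(candidates, auto_match_score=95.0):
    """Classify one cell as 'matched', 'review' (needs a manual pick) or 'new'."""
    if not candidates:
        return "new", None
    best = max(candidates, key=lambda c: c["score"])
    single_strong = len(candidates) == 1 and best["score"] >= auto_match_score
    if best.get("match") or single_strong:
        return "matched", best["id"]
    return "review", best["id"]


def estimated_minutes(row_count, rows_per_second=3.0):
    """Rough duration estimate based on the approximate service throughput."""
    return row_count / rows_per_second / 60


# Hypothetical service results; the IDs are placeholders, not real items.
cells = {
    "Example Author": [{"id": "Q_A", "score": 100.0, "match": True}],
    "J. Smith": [
        {"id": "Q_B", "score": 71.0, "match": False},
        {"id": "Q_C", "score": 68.5, "match": False},
    ],
    "Unknown Writer": [],
}

for value, candidates in cells.items():
    status, best_id = triage(candidates)
    print(f"{value}: {status} (best candidate: {best_id})")

print(f"9000 rows: about {estimated_minutes(9000):.0f} minutes")
```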

Dataset Augmentation with Schema
 
Dataset Schema
 
New issues show the problems in the Schema or Records automatically
 
Preview of the Wikidata Items

 

  1. Click Wikidata at the top right of the screen.
  2. Click Export Schema.
  3. The Schema page opens. Add the required items relevant to your dataset.
  4. Any problems found are listed in the Issues tab. Check the issues and change the records and the schema accordingly.
  5. Click Save Schema.
  6. Open the Preview tab before uploading to see how the dataset will appear as Wikidata edits, and inspect the items manually.
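
Conceptually, a schema is a template that turns each row of the table into statements about one item. The sketch below mimics that idea in plain Python to show the kind of problems the Issues tab helps catch. It is not OpenRefine's schema format, and the column names, rows, and the `check_row` helper are hypothetical. P31 (instance of), P50 (author), and P577 (publication date) are existing Wikidata properties used as examples.

```python
# Hypothetical schema: Wikidata property -> column that supplies its value.
SCHEMA = {
    "P31": "type_qid",     # instance of
    "P50": "author_qid",   # author (a reconciled column)
    "P577": "published",   # publication date
}


def row_to_statements(row):
    """Turn one table row into (property, value) pairs, skipping empty cells."""
    return [(pid, row[col]) for pid, col in SCHEMA.items() if row.get(col)]


def check_row(row):
    """Return a list of human-readable issues for one row."""
    issues = []
    if not row.get("label"):
        issues.append("missing label")
    for pid, col in SCHEMA.items():
        if not row.get(col):
            issues.append(f"no value for {pid} (column '{col}')")
    return issues


# Hypothetical rows; the Q-identifiers are placeholders.
rows = [
    {"label": "Example Novel", "type_qid": "Q_WORK", "author_qid": "Q_A", "published": "1950"},
    {"label": "Untitled Poem", "type_qid": "Q_WORK", "author_qid": "", "published": "1962"},
]

for row in rows:
    print(row["label"], row_to_statements(row), check_row(row))
```

In this sketch the second row would be flagged because its author cell is empty, which is the sort of gap worth fixing in the records before uploading.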

Uploading to Wikidata

 
OpenRefine export to Wikidata
  1. Click Wikidata at the top right of the screen.
  2. Click Upload edits to Wikidata.
  3. A dialogue box appears with your dataset. Write a brief description of the edits in the edit summary.
  4. Click Upload.

Things to consider

  1. Installation: Make sure that the latest version of Java is installed on your computer. Download Java here: https://www.java.com/download/.
  2. Data quality: If you create new items, first check for pre-existing items on Wikidata. Names or identifiers may differ in spelling or case, so an existing item may not show up during reconciliation. You can use Search for match to find the correct Wikidata identifier, or proactively find those identifiers and map them to Wikidata items, to prevent the creation of duplicate items.
  3. After finishing the schema, analyze and fix any issues raised automatically before exporting to Wikidata. This avoids errors during upload and prevents important information about the items from going missing in the exported dataset.
  4. This learning pattern is useful for simple datasets that need mass upload and integration into Wikidata, such as bibliographical databases and author biodata. For other types of datasets, you may wish to consult: https://github.com/OpenRefine/OpenRefine/wiki/User-Guide.

When to use

This tutorial is useful for simple datasets that need mass upload and integration into Wikidata, such as bibliographical databases and author biodata.

  • When uploading a batch of data to Wikidata (e.g. Artist, Photographer, Institution, License), and/or
  • When flexibility and control over descriptions is needed.
  • For GLAM projects: to create or mass import/export metadata to Wikidata.
  • OpenRefine can be downloaded for Windows, Linux, and macOS.

Endorsements

See also

Related patterns


References