Learning patterns/Uploading bibliographical or artwork metadata using OpenRefine
What problem does this solve?
Upload a large set of bibliographical or artwork metadata, such as literary works, author biodata or artwork records, to Wikidata as part of a content donation from an institution or another project, using advanced features such as reconciliation to avoid creating duplicate items. This learning pattern can be especially useful for Wikidata projects such as Sum of all Art and Sum of all Authors.
What is the solution?
Install OpenRefine
- Before installing, make sure that the latest version of Java (JRE or JDK) is installed on your computer. Download Java here: https://www.java.com/download/
- Download OpenRefine from https://openrefine.org/download.html and save the installation file (on macOS this is a .dmg file).
- OpenRefine runs on Windows, Linux and macOS; make sure your device uses one of these supported operating systems.
Prepare your dataset
Prepare the dataset to be uploaded in one of these formats: a CSV file, web addresses (URLs), or a Google Spreadsheet.
- (Note: it is essential to correct any spelling errors in the names of the items to avoid the creation of duplicate items; a minimal clean-up sketch follows below.)
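If your dataset lives in a CSV file, a short script can take care of routine clean-up before you import it into OpenRefine. The sketch below is only an illustration, not part of OpenRefine itself; the file name authors.csv and the name column are hypothetical placeholders for your own data, and it needs only the Python standard library.

```python
# Minimal pre-import clean-up sketch (assumes a hypothetical "authors.csv"
# with a "name" column). Trimming whitespace and collapsing double spaces
# makes it easier for reconciliation to find existing items instead of
# creating duplicates.
import csv

with open("authors.csv", newline="", encoding="utf-8") as src, \
        open("authors_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Normalise the name cell: strip leading/trailing spaces and
        # collapse any run of internal whitespace to a single space.
        row["name"] = " ".join(row["name"].split())
        writer.writerow(row)
```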
Getting Started
- Open OpenRefine.
- Click choose file and navigate to a file on your computer, or provide the web or Google Drive link to the dataset you would like to upload, or simply paste the dataset from the clipboard. Then click Begin.
- At the top right, give the project a name and click Start.
- Make sure you are logged in with your username.
Reconciliation
In OpenRefine terminology, reconciliation is the process of linking free-text tabular cells to identifiers in knowledge bases. OpenRefine's built-in reconciliation capabilities make it a versatile tool for reconciling tabular data against a wide range of databases, including Wikidata. You can match several columns of your dataset against values of Wikidata properties, which refines the reconciliation score and acts as a tiebreaker between namesakes (see the sketch after the steps below).
- Click the arrow next to the column name.
- Click Reconcile, then Start Reconciling.
- Select Wikidata (en). Alternatively, you can install a version of the Wikidata reconciliation service for your language: open the reconciliation dialog, click Add Standard Service and enter the URL https://tools.wmflabs.org/openrefine-wikidata/pa/api, where "pa" is replaced by your language code. When reconciling through this interface, items and properties are displayed in your language if a translation is available.
- Select the type of entity to reconcile each cell against (for example: human, written work or location), and/or choose the appropriate option from Reconcile against type, Reconcile against no particular type, or Auto-match candidates with high confidence.
- The Wikidata reconciliation service processes about 3 rows per second, so the waiting time depends on the amount of data (roughly 10,000 rows take a little under an hour); once it finishes, the reconciliation results appear in the cells.
- In the reconciled column, one of two things happens: either the cell was matched successfully and shows a single dark blue link, or a few candidates are displayed as light blue links together with their reconciliation scores, and you need to pick the correct one manually. For each matching decision you have two options: click match this cell only, or apply the same identifier to all other cells containing the same unreconciled value. You can also use Search for match to find the correct Wikidata identifier, and if no value matches your case, click Create New Item.
- Once a column of your table is reconciled to Wikidata, you can pull data from Wikidata into additional columns in your dataset. If there are multiple claims for a given property, the values are grouped as records in OpenRefine: they are stored in additional rows where the original reconciled column is blank. OpenRefine's record mode may therefore be more suitable for the later transformations you want to carry out on your table.
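For readers curious about what the reconciliation service does behind the interface, the sketch below sends a single query to the Wikidata reconciliation endpoint mentioned above and prints the candidates it returns. It follows the standard OpenRefine reconciliation API; the example values (Jane Austen, Q5, P569), the exact endpoint URL and the response fields are assumptions that may differ between service versions, so treat it as an illustration rather than a reference. It requires the requests library.

```python
# Rough sketch of one reconciliation query, mirroring what OpenRefine sends
# for each cell. Endpoint, example values and response fields are assumptions
# based on the standard reconciliation API.
import json
import requests

SERVICE_URL = "https://tools.wmflabs.org/openrefine-wikidata/en/api"  # "en" = language code

queries = {
    "q0": {
        "query": "Jane Austen",              # the cell value to reconcile
        "type": "Q5",                        # restrict candidates to humans
        "properties": [                      # extra column used as a tiebreaker
            {"pid": "P569", "v": "1775"}     # date of birth
        ],
    }
}

response = requests.get(SERVICE_URL, params={"queries": json.dumps(queries)})
response.raise_for_status()

for key, block in response.json().items():
    for candidate in block.get("result", []):
        # Each candidate carries a Wikidata ID, a label, a score and a
        # "match" flag set only for unambiguous, high-confidence matches.
        print(key, candidate["id"], candidate["name"],
              candidate["score"], candidate["match"])
```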
Dataset Augmentation with Schema
- Click Wikidata at the top right of the screen.
- Click Export Schema.
- The schema page opens; add the items and statements relevant to your dataset.
- New issues may appear in the Issues tab. Check them for problems and adjust the records and schema accordingly.
- Click Save Schema.
- You can view the items in the Preview tab before uploading to see how the dataset will appear as Wikidata edits and inspect them manually.
Uploading to Wikidata
- Click Wikidata at the top right of the screen.
- Click Upload edits to Wikidata.
- A dialogue box appears with your dataset; write a brief description of your changes in the edit summary.
- Click Upload.
Things to consider
- Installation: make sure that the latest version of Java is installed on your computer. Download Java here: https://www.java.com/download/.
- Data quality: before creating new items, check the pre-existing items on Wikidata; identifiers and names may differ in spelling or case, which can keep existing items from showing up during the reconciliation process. You can use Search for match to find the correct Wikidata identifier, or proactively look those identifiers up and map them to Wikidata items to prevent the creation of duplicate items (see the sketch after this list).
- After finishing the schema, analyse and fix any automatically raised issues before exporting to Wikidata; this avoids errors during the upload and prevents information relevant to the items from going missing in the exported dataset.
- This learning pattern is useful for simple datasets that need mass upload and integration into a Wikidata project, such as bibliographical databases, author biodata, etc. For other types of datasets, you may wish to consult https://github.com/OpenRefine/OpenRefine/wiki/User-Guide.
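One way to proactively find existing identifiers, as suggested in the data quality note above, is to query the public Wikidata search API for each name before uploading. The sketch below is only an illustration using the wbsearchentities module of the Wikidata API; the example label is a placeholder, and in practice you would loop over your dataset and review the hits manually. It requires the requests library.

```python
# Sketch: look up a label on Wikidata before upload to spot existing items
# and avoid creating duplicates. Uses the public wbsearchentities API;
# the example label is a placeholder.
import requests

def find_existing_items(label, language="en"):
    """Return candidate Wikidata items whose label or alias matches `label`."""
    response = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": label,
            "language": language,
            "format": "json",
        },
    )
    response.raise_for_status()
    return response.json().get("search", [])

for hit in find_existing_items("Pride and Prejudice"):
    # Each hit includes the item ID, its label and a short description,
    # which is usually enough to decide whether your row maps to it.
    print(hit["id"], hit["label"], hit.get("description", ""))
```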
When to use
This tutorial is useful when simple datasets need mass upload and integration into a Wikidata project, such as bibliographical databases, author biodata, etc.
- When uploading a batch of data to Wikidata (e.g. Artist, Photographer, Institution, License), and/or
- When flexibility and control over descriptions is needed.
- For GLAM projects, in order to create or mass import/export metadata to Wikidata.
- This software can be downloaded for Windows, Linux and Mac.
Endorsements
See also
- https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine
- https://www.wikidata.org/wiki/Category:Wikidata:Tools
Related patterns
- https://openrefine.org/
- https://github.com/OpenRefine/OpenRefine/wiki/User-Guide
- https://github.com/OpenRefine/OpenRefine/releases/