Archived notes from the Etherpad of the 2020-10-01 meeting of the Wikibase Community User Group.

Schedule

  • 16:00 UTC, 1 hour, Thursday 1 October 2020
  • Online meeting on Google Meet: https://meet.google.com/hch-trhy-trv
  • Join by phone: https://meet.google.com/tel/hch-trhy-trv?pin=4486380681555&hs=1

Agenda

  • Three people talk about their Wikibase data import workflow
  • Q&A after each presentation
  • Open discussion with all participants if there's more time left: lessons learned, challenges, success stories, etc.
  • Topic and schedule of the next meeting

Participants (who is here)

  1. Mohammed (WMDE)
  2. Benjamin Bober (ABES)
  3. Jens Ohlig (WMDE)
  4. Johannes Hentschel (DCML@EPFL)
  5. Daniel Nations (Nat'l Inst. for Materials Science, Japan)
  6. Lozana Rossenova (Rhizome)
  7. Okko Vainonen (National Library of Finland)
  8. Jim Hahn (University of Pennsylvania Library, USA)
  9. Paul Duchesne (NFSA/FIAF)
  10. Jeroen De Dauw (Professional.Wiki)
  11. Jarmo Saarikko (National Library of Finland)
  12. Magnus Sälgö (Stockholm, Sweden; Wikidata volunteer, salgo60)
  13. Michelle Pfeiffer (Centre National de Recherche Archéologique, Luxembourg)
  14. Jose Luis Ambite (University of Southern California)

Notes

  • introductions
  • 29 people
  • Lozana takes it away.
Presenting OpenRefine integration for custom Wikibase instances
  • Overview of what you have to do if you want to connect your custom Wikibase system to OpenRefine.
  • OpenRefine has not yet released an official version that supports the custom Wikibase extension; for those eager to try it, you can do so if you run OpenRefine from source (by cloning the master branch of the repository on GitHub and running it from your terminal).
  • First you need to set up a reconciliation endpoint for your Wikibase (link in the Google Doc below).
  • You also need to set up a manifest for your Wikibase (see also the link in the Google Doc below).
  • You also need to create a special tag in your Wikibase for OpenRefine v3.5 (see also the notes in the Google Doc below).
  • The presentation did not cover the workflow for uploading your own data into Wikibase via OpenRefine, because it's basically the same process as with the existing Wikidata extension.
  • See the link to the tutorial in the Google Doc below.
  • What the presentation demoed during the meeting is a workflow for downloading data from Wikidata and then uploading it back into your own Wikibase via OpenRefine.
  • This workflow can work the other way round, too, i.e. contribute data to Wikidata from your own Wikibase.
  • When selecting data, think of what is useful to your own Wikibase; a good rule of thumb is data that is valid in any domain, such as cities and countries, institutions, etc.
  • The steps for the workflow can also be followed in Lozana's Google Doc below.
  • First, run a query on Wikidata (capital cities and countries); see the sketch after this list.
  • Download a CSV from that query.
  • The data needs to be cleaned (historical empires and the like are included but probably not needed in your own Wikibase, so they can be deleted).
  • When you create a new project in OpenRefine, you need to reconcile the data against the correct endpoint, so select the service that you want (i.e. your own) from the "add service" option during reconciliation.
  • Reconcile the data and choose the "create new item" bulk action for anything that does not yet exist in your Wikibase.
  • Make a schema with new labels for capitals and countries (see the screenshot in the Google Doc below).
  • Finally upload it to your Wikibase!
  • Conciliator gets mentioned: https://github.com/codeforkjeff/conciliator
  • Q: Is it possible to import multiple languages at a time?
    • Lozana: It shouldn't be a problem. The label and description services in Wikidata offer multiple languages, so the ones needed can be selected during querying via the Wikidata label service.
  • Q: How do you deal with the speed of OpenRefine if we want to do it at scale? How many entities per cycle/minute?
    • Tip: Reconciling gets faster when you use several columns as added rules for reconciliation
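The query and CSV steps above can also be scripted outside the Wikidata Query Service UI. Below is a minimal sketch in Python, assuming the public Wikidata SPARQL endpoint and the standard Wikidata terms (P31 = instance of, P36 = capital, Q6256 = country); the resulting CSV is what you would then load into a new OpenRefine project and reconcile against your own service.

```python
# Minimal sketch: fetch capital cities and their countries from Wikidata
# and save them as a CSV that can be loaded into a new OpenRefine project.
# Uses the public Wikidata SPARQL endpoint; adjust the query to your needs.
import csv
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?country ?countryLabel ?capital ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;      # instance of: country
           wdt:P36 ?capital .      # capital
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikibase-ug-demo/0.1 (example)"},
)
response.raise_for_status()
rows = response.json()["results"]["bindings"]

with open("capitals.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["country", "countryLabel", "capital", "capitalLabel"])
    for row in rows:
        writer.writerow([
            row["country"]["value"],
            row["countryLabel"]["value"],
            row["capital"]["value"],
            row["capitalLabel"]["value"],
        ])
```

The same query can of course be pasted directly into query.wikidata.org and its result exported as CSV from there, which is what the presentation did.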
Paul tells us about his experiments
  • Works as a data analyst; works with a federation of film archives (FIAF) on Wikibase.
  • Takes a set of data, constructs an ontology using traditional Linked Open Data tools, and does some kind of federation.
  • The data set used was the "Greatest Films of All Time" poll by Sight and Sound.
  • https://github.com/paulduchesne/sight-and-sound
  • Uses Jupyter
  • The list gets pulled from the AFA website.
  • Pairing up films with IMDb and Wikidata.
  • Has been doing all of this in Python, similar to what is currently being shown here.
  • Ontology construction in Protégé (https://protege.stanford.edu/), built to match the data we have.
  • Visualize ontology with Karma https://usc-isi-i2.github.io/karma/
  • Export RDF from Karma
  • Resorts to leveraging the Wikibase API to ingest the RDF into Wikibase (see the sketch after this list).
  • About 40k things to write; took about 12 hours to run.
  • Statements include the individuals who have voted for a film.
  • built two other things on top of Wikibase:
  • A notebook that visualises the year and country of films; interface built with https://github.com/voila-dashboards/voila
  • Federate between the Wikibase and Wikidata by matching country labels.
  • Q: Did you run into trouble?
  • Would be interested to speed test it against OpenRefine or WikidataIntegrator.
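For readers who have not driven the Wikibase API directly, here is a rough sketch of the kind of ingest step Paul described, using the plain MediaWiki action API (wbeditentity) that wrappers such as WikidataIntegrator build on. The API URL, bot credentials, property P1 and target item Q42 are placeholders, not Paul's actual setup.

```python
# Rough sketch of writing one new item with a single statement to a Wikibase
# through the MediaWiki action API (the layer that tools like
# WikidataIntegrator wrap). URL, credentials, property P1 (item datatype) and
# target item Q42 are placeholders.
import json
import requests

API = "https://your-wikibase.example.org/w/api.php"
session = requests.Session()

# 1. Log in with a bot password (created via Special:BotPasswords).
login_token = session.get(API, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json",
}).json()["query"]["tokens"]["logintoken"]
session.post(API, data={
    "action": "login", "lgname": "DemoBot@import", "lgpassword": "botpassword",
    "lgtoken": login_token, "format": "json",
})

# 2. Fetch an edit (CSRF) token.
csrf_token = session.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# 3. Create a new item with an English label and one item-valued statement.
entity_data = {
    "labels": {"en": {"language": "en", "value": "Citizen Kane"}},
    "claims": [{
        "mainsnak": {
            "snaktype": "value",
            "property": "P1",  # placeholder, e.g. "voted for by" in your model
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": 42, "id": "Q42"},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }],
}
new_item = session.post(API, data={
    "action": "wbeditentity", "new": "item", "data": json.dumps(entity_data),
    "token": csrf_token, "format": "json",
}).json()
print(new_item.get("entity", {}).get("id"))
```

Each wbeditentity call is one HTTP round trip, which helps explain why a bulk load like Paul's 40k statements can take hours.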
Presentation from Olaf
  • https://database.factgrid.de/wiki/User:Olaf_Simons/sandbox
  • Has a Wikibase for historians (FactGrid).
  • our users are very different.
  • Still using Google Spreadsheets to input data, because some users have no experience with data and are happy to see that we're able to transform the data and put it into a database.
  • various kinds of data entry people, some very professional, some need a lot of handholding
  • Looking at the spreadsheet, we use vertical lookup (VLOOKUP).
  • A lot of the names are not unique and have to be looked up before entry (e.g. a father and son with the same name); we use vertical lookup in Google Spreadsheets for that.
  • To create objects, we use the CSV input as a fast way to put a table into the machine,
  • but text inputs with quotation marks are difficult to work with
  • Another thing: if you work with data retrieved from Wikidata and want to have geo-coordinates, you have to be careful.
  • The coordinates come out in the reverse format and have to be flipped when you put them in.
  • There is also the problem of "January first" dates: year-only dates come out looking like 1 January of that year.
  • What you'd need is a query to check whether a date is just a year or a month precision (see the sketch after this list).
  • Andra asking if people know about https://gsuite.google.com/marketplace/app/wikipedia_and_wikidata_tools/595109124715
  • you could pre-process or merge but that depends
  • Try to ensure that we don't run into merges at a later stage.
  • Q: What is a good search for data that we can immediately fill with the right information?
  • A Wikibase search option where the output from Wikibase would immediately give you an answer.
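On the "January first" problem: every Wikibase time value carries a numeric precision (9 = year, 10 = month, 11 = day), so a check like the one Olaf asks for can read that field instead of the rendered date. A minimal sketch, with the Wikibase URL, item ID and property ID as placeholders:

```python
# Minimal sketch of the precision check Olaf asks for. Wikibase stores every
# time value with a numeric precision (9 = year, 10 = month, 11 = day), so a
# "1 January" date with precision 9 only asserts the year.
# The Wikibase URL, item ID and property ID below are placeholders.
import requests

API = "https://your-wikibase.example.org/w/api.php"
ITEM = "Q1"    # placeholder item
PROP = "P1"    # placeholder date property

entity = requests.get(API, params={
    "action": "wbgetentities", "ids": ITEM, "props": "claims", "format": "json",
}).json()["entities"][ITEM]

PRECISION = {9: "year", 10: "month", 11: "day"}
for claim in entity.get("claims", {}).get(PROP, []):
    snak = claim["mainsnak"]
    if snak.get("snaktype") != "value":
        continue  # skip "no value" / "unknown value" snaks
    value = snak["datavalue"]["value"]  # e.g. {"time": "+1723-01-01T00:00:00Z", "precision": 9, ...}
    label = PRECISION.get(value["precision"], value["precision"])
    print(f'{value["time"]} is stored with {label} precision')
```

A year-precision value still stores a full timestamp string internally, which is why exported dates can look like 1 January.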
OpenRefine 3.5 will include the Wikibase reconciliation extension.

Lozana's Google Doc to follow: https://docs.google.com/document/d/1pn6adYjmgBrYqWDNVes0VfWSrxNPuMGX35_haBWGNIo/edit?usp=sharing

I made a video of the session for my colleague who was unable to get into the call.  Would it be a bad idea to share it? -Daniel N.

I'll just put it up for now and if someone tells me to take it down I will.

(link to video removed for now)

Thanks :-)

Paul sharing his dataset: https://github.com/paulduchesne/sight-and-sound

Olaf's presentation - things to look at: https://database.factgrid.de/wiki/User:Olaf_Simons/sandbox

Suggestions for next meeting