Archived notes from the Etherpad of the 2020-10-01 meeting of the Wikibase Community User Group.

Schedule

  • 16:00 UTC, 1 hour, Thursday 1 October 2020
  • Online meeting on Google Meet: https://meet.google.com/hch-trhy-trv
  • Join by phone: https://meet.google.com/tel/hch-trhy-trv?pin=4486380681555&hs=1

Agenda

  • Three people talk about their Wikibase data import workflow
  • Q&A after each presentation
  • Open discussion with all participants if there's more time left: lessons learned, challenges, success stories, etc.
  • Topic and schedule of the next meeting

Participants (who is here)

  1. Mohammed (WMDE)
  2. Benjamin Bober (ABES)
  3. Jens Ohlig (WMDE)
  4. Johannes Hentschel (DCML@EPFL)
  5. Daniel Nations (Nat'l Inst. for Materials Science, Japan)
  6. Lozana Rossenova (Rhizome)
  7. Okko Vainonen (National Library of Finland)
  8. Jim Hahn (University of Pennsylvania Library, USA)
  9. Paul Duchesne (NFSA/FIAF)
  10. Jeroen De Dauw (Professional.Wiki)
  11. Jarmo Saarikko (National Library of Finland)
  12. Magnus Sälgö (Stockholm, Sweden; Wikidata volunteer, salgo60)
  13. Michelle Pfeiffer (Centre National de Recherche Archéologique, Luxembourg)
  14. Jose Luis Ambite (University of Southern California)

Notes

  • introductions
  • 29 people
  • Lozana takes it away.
Presenting OpenRefine integration for custom Wikibase instances
  • Overview of what you have to do if you want to connect your custom Wikibase system to OpenRefine.
  • OpenRefine has not yet released an official version that supports the custom Wikibase extension; for those eager to try it, you can do so if you run OpenRefine from source (by cloning the master branch of the repository on GitHub and running it from your terminal).
  • First you need to set up a reconciliation endpoint for your Wikibase (link in the Google Doc below).
  • You also need to set up a manifest for your Wikibase (see also the link in the Google Doc below).
  • You also need to create a special tag in your Wikibase for OpenRefine v3.5 (see also the notes in the Google Doc below).
  • The presentation did not cover the workflow for uploading your own data into Wikibase via OpenRefine, because it's basically the same process as with the existing Wikidata extension.
  • See the link to the tutorial in the Google Doc below.
  • What the presentation demoed during the meeting is a workflow for downloading data from Wikidata and then uploading it back into your own Wikibase via OpenRefine.
  • This workflow can work the other way round, too, i.e. contribute data to Wikidata from your own Wikibase.
  • When selecting data, think of what is useful to your own Wikibase; a good rule of thumb is data that is valid in any domain, such as cities and countries, institutions, etc.
  • The steps for the workflow can also be followed in Lozana's Google Doc below.
  • First, run a query on Wikidata (capital cities and countries); see the sketch after this list.
  • Download a CSV from that query.
  • The data needs to be cleaned (historical empires and the like are included but probably not needed in your own Wikibase, so they can be deleted).
  • When you create a new project in OpenRefine, you need to reconcile the data against the correct endpoint, so select the service that you want (i.e. your own) from the "add service" option during reconciliation.
  • Reconcile the data and choose the "create new item" bulk action for anything that does not yet exist in your Wikibase.
  • Make a schema with new labels for capitals and countries (see the screenshot in the Google Doc below).
  • Finally upload it to your Wikibase!
  • Conciliator gets mentioned: https://github.com/codeforkjeff/conciliator
  • Q: Is it possible to import multiple languages at a time?
    • Lozana: It shouldn't be a problem. The label and description services in Wikidata offer multiple languages, so the ones needed can be selected during querying via the Wikidata label service.
  • Q: How do you deal with the speed of OpenRefine if we want to do it at scale? How many entities per cycle/minute?
    • Tip: Reconciling gets faster when you use several columns as added rules for reconciliation
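The query and CSV steps above can also be scripted outside the Wikidata Query Service UI. Below is a minimal sketch in Python, assuming the public Wikidata SPARQL endpoint and the standard Wikidata terms (P31 = instance of, P36 = capital, Q6256 = country); the resulting CSV is what you would then load into a new OpenRefine project and reconcile against your own service.

```python
# Minimal sketch: fetch capital cities and their countries from Wikidata
# and save them as a CSV that can be loaded into a new OpenRefine project.
# Uses the public Wikidata SPARQL endpoint; adjust the query to your needs.
import csv
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?country ?countryLabel ?capital ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;      # instance of: country
           wdt:P36 ?capital .      # capital
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikibase-ug-demo/0.1 (example)"},
)
response.raise_for_status()
rows = response.json()["results"]["bindings"]

with open("capitals.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["country", "countryLabel", "capital", "capitalLabel"])
    for row in rows:
        writer.writerow([
            row["country"]["value"],
            row["countryLabel"]["value"],
            row["capital"]["value"],
            row["capitalLabel"]["value"],
        ])
```

The same query can of course be pasted directly into query.wikidata.org and its result exported as CSV from there, which is what the presentation did.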
Paul tells us about his experiments
  • Works as a data analyst; works with a federation of film archives (FIAF) on Wikibase.
  • Takes a set of data, constructs an ontology using traditional Linked Open Data tools, and does some kind of federation.
  • The data set used was the "Greatest Films of All Time" poll by Sight and Sound.
  • https://github.com/paulduchesne/sight-and-sound
  • Uses Jupyter
  • The list gets pulled from the AFA website.
  • Pairing up films with IMDb and Wikidata.
  • Has been doing all of this in Python, similar to what is currently being shown here.
  • Ontology construction in Protégé (https://protege.stanford.edu/), built to match the data we have.
  • Visualize ontology with Karma https://usc-isi-i2.github.io/karma/
  • Export RDF from Karma
  • Resorts to leveraging the Wikibase API to ingest the RDF into Wikibase (see the sketch after this list).
  • About 40k things to write; took about 12 hours to run.
  • Statements include the individuals who have voted for a film.
  • built two other things on top of Wikibase:
  • A notebook that visualises the year and country of films; interface built with https://github.com/voila-dashboards/voila
  • Federate between the Wikibase and Wikidata by matching country labels.
  • Q: Did you run into trouble?
  • Would be interested to speed test it against OpenRefine or WikidataIntegrator.
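For readers who have not driven the Wikibase API directly, here is a rough sketch of the kind of ingest step Paul described, using the plain MediaWiki action API (wbeditentity) that wrappers such as WikidataIntegrator build on. The API URL, bot credentials, property P1 and target item Q42 are placeholders, not Paul's actual setup.

```python
# Rough sketch of writing one new item with a single statement to a Wikibase
# through the MediaWiki action API (the layer that tools like
# WikidataIntegrator wrap). URL, credentials, property P1 (item datatype) and
# target item Q42 are placeholders.
import json
import requests

API = "https://your-wikibase.example.org/w/api.php"
session = requests.Session()

# 1. Log in with a bot password (created via Special:BotPasswords).
login_token = session.get(API, params={
    "action": "query", "meta": "tokens", "type": "login", "format": "json",
}).json()["query"]["tokens"]["logintoken"]
session.post(API, data={
    "action": "login", "lgname": "DemoBot@import", "lgpassword": "botpassword",
    "lgtoken": login_token, "format": "json",
})

# 2. Fetch an edit (CSRF) token.
csrf_token = session.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# 3. Create a new item with an English label and one item-valued statement.
entity_data = {
    "labels": {"en": {"language": "en", "value": "Citizen Kane"}},
    "claims": [{
        "mainsnak": {
            "snaktype": "value",
            "property": "P1",  # placeholder, e.g. "voted for by" in your model
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": 42, "id": "Q42"},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }],
}
new_item = session.post(API, data={
    "action": "wbeditentity", "new": "item", "data": json.dumps(entity_data),
    "token": csrf_token, "format": "json",
}).json()
print(new_item.get("entity", {}).get("id"))
```

Each wbeditentity call is one HTTP round trip, which helps explain why a bulk load like Paul's 40k statements can take hours.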
Presentation from Olaf
  • https://database.factgrid.de/wiki/User:Olaf_Simons/sandbox
  • Has a Wikibase for historians (FactGrid).
  • our users are very different.
  • Still using Google Spreadsheets to input data, because some users have no experience with data and are happy to see that we're able to transform the data and put it into a database.
  • various kinds of data entry people, some very professional, some need a lot of handholding
  • Looking at the spreadsheet, we use vertical lookup (VLOOKUP).
  • A lot of the names are not unique and have to be looked up before entry (e.g. a father and son with the same name); we use vertical lookup in Google Spreadsheets for that.
  • To create objects, we use the CSV input as a fast way to put a table into the machine,
  • but text inputs with quotation marks are difficult to work with
  • Another thing: if you work with data retrieved from Wikidata and want to have geo-coordinates, you have to be careful.
  • The coordinates come out in the reverse format and have to be flipped when you put them in.
  • There is also the problem of "January first" dates: year-only dates come out looking like 1 January of that year.
  • What you'd need is a query to check whether a date is just a year or a month precision (see the sketch after this list).
  • Andra asking if people know about https://gsuite.google.com/marketplace/app/wikipedia_and_wikidata_tools/595109124715
  • you could pre-process or merge but that depends
  • Try to ensure that we don't run into merges at a later stage.
  • Q: What is a good search for data that we can immediately fill with the right information?
  • A Wikibase search option where the output from Wikibase would immediately give you an answer.
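On the "January first" problem: every Wikibase time value carries a numeric precision (9 = year, 10 = month, 11 = day), so a check like the one Olaf asks for can read that field instead of the rendered date. A minimal sketch, with the Wikibase URL, item ID and property ID as placeholders:

```python
# Minimal sketch of the precision check Olaf asks for. Wikibase stores every
# time value with a numeric precision (9 = year, 10 = month, 11 = day), so a
# "1 January" date with precision 9 only asserts the year.
# The Wikibase URL, item ID and property ID below are placeholders.
import requests

API = "https://your-wikibase.example.org/w/api.php"
ITEM = "Q1"    # placeholder item
PROP = "P1"    # placeholder date property

entity = requests.get(API, params={
    "action": "wbgetentities", "ids": ITEM, "props": "claims", "format": "json",
}).json()["entities"][ITEM]

PRECISION = {9: "year", 10: "month", 11: "day"}
for claim in entity.get("claims", {}).get(PROP, []):
    snak = claim["mainsnak"]
    if snak.get("snaktype") != "value":
        continue  # skip "no value" / "unknown value" snaks
    value = snak["datavalue"]["value"]  # e.g. {"time": "+1723-01-01T00:00:00Z", "precision": 9, ...}
    label = PRECISION.get(value["precision"], value["precision"])
    print(f'{value["time"]} is stored with {label} precision')
```

A year-precision value still stores a full timestamp string internally, which is why exported dates can look like 1 January.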
OpenRefine 3.5 will include the Wikibase reconciliation extension.

Lozana's Google Doc to follow: https://docs.google.com/document/d/1pn6adYjmgBrYqWDNVes0VfWSrxNPuMGX35_haBWGNIo/edit?usp=sharing

I made a video of the session for my colleague who was unable to get into the call.  Would it be a bad idea to share it? -Daniel N.

I'll just put it up for now and if someone tells me to take it down I will.

(link to video removed for now)

Thanks :-)

Paul sharing his dataset: https://github.com/paulduchesne/sight-and-sound

Olaf's presentation - things to look at: https://database.factgrid.de/wiki/User:Olaf_Simons/sandbox

Suggestions for next meeting