Archived notes from Etherpad of the 2021-03-25 meeting.

Schedule

  • 16:00 UTC, 1 hour, Thursday 25th March 2021
  • Online meeting on Google Meet: https://meet.google.com/nky-nwdx-tuf
  • Join by phone: https://meet.google.com/tel/nky-nwdx-tuf?pin=4267848269474&hs=1

Participants (who is here)

  1. Mohammed (WMDE)
  2. Georgina Burnett (WMDE)
  3. Georgios Mavropalias
  4. Giovanni Bergamin
  5. Dennis Diefenbach (The QA Company) (https://the-qa-company.com/products/Wikibase)
  6. Nils Casties (Germany, Hannover, TIB)
  7. Laurence 'GreenReaper' Parry (WikiFur, Flayrah)
  8. Seila Gonzalez Estrecha
  9. Jeffrey Goeke-Smith
  10. ...27 people joined in total

Agenda

  • Presentation by Enslaved.org team about how they are using Linked Open Data (LOD) with the help of Wikibase
    • Seila Gonzalez Estrecha
    • Jeffrey Goeke-Smith
  • Questions/Feedback

Notes

  • Presentation: https://drive.google.com/file/d/1XSK0iLB9sZw8M8Esgajm5XcZpWMRiW45/view?usp=sharing
  • Seila presents
    • Historians have been collecting all this data, but it is hidden in silos: a database here, a spreadsheet there. Bringing all of this data together presented a major difficulty
    • The team wanted to describe the lives of those enslaved; a triple store seemed like a good option for this. They also wanted a place to preserve and sustain the data for the longer term
    • The software is mostly done and could be further improved; the team is now importing data. They had to think very broadly about the kinds of data they needed to enter.
    • Provenance was extremely important; some of the different source types: auction notice, census or register, life history, etc
    • Modelled events as well, such as a specific voyage. Example: 46 slaves embarked but only 39 disembarked from the San Joaquin voyage
    • People are the core focus, with a set of core fields to help model them, e.g. status, race, name, age, etc
    • A lot of unnamed individuals, but this didn't mean there wasn't information about them, e.g. age, sex, and relations to others
    • Gathered user questions to help build the ontology
    • Built a controlled vocabulary, which helped to ensure consistency across the platform. All subclasses have origin records.
    • Built a specific ontology to suit their specific data set
  • Because place data is hard to represent, we use place controllers we call buckets
  • We use shape expressions (ShEx) and JavaScript to validate that the data conforms to our ontology, so when the data gets published it is integrated into the hub (see the conformance-check sketch below)
  • Explored a few different triple stores, but wanted to use Wikibase as they were able to attach qualifiers and references to each triple
  • Why Wikibase: because we can apply references to qualifiers, and this was key to us
  • We liked that we can have a name, attach the event at which that name was recorded for the person, and the source that tells us the person's name is this… this was also key to us
  • Effort went into aligning their own ontology with the Wikibase ontology; there is an article on it: "The Enslaved Dataset: A Real-World Complex Ontology Alignment Benchmark Using Wikibase"
    • https://daselab.cs.ksu.edu/publications/enslaved-dataset-real-world-complex-ontology-alignment-benchmark-using-wikibase
  • Didn't need the full SPARQL query service, but instead query the Blazegraph API directly (see the query sketch below)
  • QuickStatements was very slow at the beginning, so they used PHP to make some improvements:
    • changed the front-end so it doesn't rely on libraries
    • added a new edit button which is used to handle errors
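
A minimal sketch of the kind of pre-publication conformance check mentioned in the shape-expressions note above. The field names below (personId, recordType, sourceReference) are illustrative assumptions, not the project's actual ontology; the real pipeline validates records against ShEx shapes with JavaScript before they are integrated into the hub.

    // Simplified stand-in for a ShEx conformance check (assumed field names).
    interface PersonRecord {
      personId: string;          // identifier of the (possibly unnamed) person
      recordType?: string;       // e.g. "auction notice", "census or register"
      sourceReference?: string;  // provenance: where this statement comes from
    }

    function conformsToShape(record: PersonRecord): string[] {
      const problems: string[] = [];
      if (!record.personId) problems.push("missing personId");
      if (!record.recordType) problems.push("missing recordType (origin record)");
      if (!record.sourceReference) problems.push("missing sourceReference (provenance)");
      return problems; // an empty array means the record may be published to the hub
    }

    // Usage: only publish when no problems are reported.
    const issues = conformsToShape({ personId: "Q123" });
    if (issues.length > 0) {
      console.error("Rejected record:", issues.join("; "));
    }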
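
The remarks about qualifiers, references and querying Blazegraph directly can be illustrated with a small query sketch. Everything concrete below is an assumption: the endpoint URL, prefix base URIs and property IDs (P1 for "name", P2 for "recorded at event", P3 for "stated in") are placeholders, not the project's real configuration; only the general Wikibase RDF layout (p:/ps:/pq: plus prov:wasDerivedFrom for references) is the standard one.

    // Querying a Blazegraph SPARQL endpoint directly (placeholder URL and IDs).
    const endpoint = "https://wikibase.example.org/bigdata/namespace/wdq/sparql";

    const query = `
      PREFIX p:    <https://wikibase.example.org/prop/>
      PREFIX ps:   <https://wikibase.example.org/prop/statement/>
      PREFIX pq:   <https://wikibase.example.org/prop/qualifier/>
      PREFIX pr:   <https://wikibase.example.org/prop/reference/>
      PREFIX prov: <http://www.w3.org/ns/prov#>

      # For each "name" statement (P1), fetch the recorded name, the event at
      # which it was recorded (qualifier P2) and the source (reference P3).
      SELECT ?person ?name ?event ?source WHERE {
        ?person p:P1 ?stmt .
        ?stmt ps:P1 ?name .
        OPTIONAL { ?stmt pq:P2 ?event . }
        OPTIONAL { ?stmt prov:wasDerivedFrom/pr:P3 ?source . }
      } LIMIT 10`;

    async function run(): Promise<void> {
      const res = await fetch(endpoint + "?query=" + encodeURIComponent(query), {
        headers: { Accept: "application/sparql-results+json" },
      });
      const data = await res.json();
      console.log(JSON.stringify(data.results.bindings, null, 2));
    }

    run().catch(console.error);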
  • Questions
  • Q: Is it feasible to share some of those QuickStatements changes back with the community?
    • Keen to share it back. A lot of work was done to it, so they need to check that it's not tied to their specific setup. The use case at enslaved.org involves doing a lot of bulk imports.
    • I see a fair number of loading questions come up on the Wikibase channel, so maybe some would find it useful, though I would not myself.
  • Q: How many people are involved?
    • Somewhere around a dozen people, none working full time on enslaved.org. It comes out of the centre for digital humanities, and this is its latest project. 6 full-time developers; there is not a lot more development work to do, so the focus is now on data import.
  • Q: How much data is ingested by bots and how much by humans?
    • Roughly 800,000 items at the moment. ("797243 records from the historical slave trade") https://enslaved.org/
    • Very closed environment: researchers themselves do not add the data directly; it first needs to be peer reviewed and is then added
    • The software stack is driven by humans who are doing the automation to [missed part] the edits.
  • Q: Can you speak about how you loaded your ontology into your Wikibase? What was that process like?
    • Started developing an ontology, then started using Wikibase.
    • Follow-up Q: If you did it again, would you start in Wikibase?
    • Want to build templates that you could load into Wikibase (see the ontology-loading sketch at the end of these notes)
  • Q: What was your biggest challenge?
    • A lot of challenges to choose from :D
    • Challenges around Wikibase: the Wikibase Docker containers did not exist yet and there was little to no documentation, which made it very difficult to technically set up the instance
    • All historians record their data differently; as an engineer it is hard to help them integrate all this data
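
One way the "how did you load your ontology" answer could look in practice is a script that creates the ontology's properties through the Wikibase action API; this is a hedged sketch, not the team's actual process. The wiki URL, the example property label and the CSRF token handling are assumptions: a real script would first log in (for example with a bot password), fetch a token via action=query&meta=tokens, and loop over a template file rather than create a single property.

    // Creating an ontology property via the Wikibase action API (wbeditentity).
    const apiUrl = "https://wikibase.example.org/w/api.php"; // placeholder URL
    const csrfToken = "PLACEHOLDER_TOKEN";                   // obtain via meta=tokens

    async function createProperty(label: string, description: string): Promise<void> {
      const data = {
        labels: { en: { language: "en", value: label } },
        descriptions: { en: { language: "en", value: description } },
        datatype: "wikibase-item", // the property's value type
      };
      const body = new URLSearchParams({
        action: "wbeditentity",
        new: "property",
        data: JSON.stringify(data),
        token: csrfToken,
        format: "json",
      });
      const res = await fetch(apiUrl, { method: "POST", body });
      console.log(await res.json());
    }

    // Illustrative call; the label and description are made up for the example.
    createProperty("has origin record", "links a record to the source it came from")
      .catch(console.error);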