Wikibase Community User Group/Meetings/2021-08-26
Online meeting of the Wikibase Community User Group.
Schedule
- Thursday, August 26, 2021
- Google Meet (Join by phone)
- Notes on Etherpad
Agenda
- What are you doing around Wikibase?
- <Add your topics>:
- Joe Wass: We're interested in experimentally modelling the Crossref metadata in Wikibase, just to see whether the quantity of data and the rate of throughput would even fit. I have some questions which I'm interested in discussing with the group, but only if appropriate.
- Is anyone interested in high throughput (like millions of Statements per day)? What's the highest rate of throughput that people have got through the API?
- Have people made any performance tweaks to the Wikibase or Docker configuration?
- Laurence: Wikimania 2021 Community Village table discussions
- Federated properties not fully known by potential user community
- Wikimedia Deutschland working on maps, could be useful for pushing WBStack support
- Lozana: Small updates from TIB / NFDI4Culture
- Wikibase workshop at the end of July (NFDI)
- Set of requirements compiled at the workshop will be shared w/ WMDE + WBUG
- Laurence (if time): Possibility of Wikibase for the ARM64 architecture (Mac M1, new ARM servers)
- Blockers:
- The CirrusSearch extension, if used, needs to be able to use a newer ARM64-compatible Elasticsearch (7.8+). The Wikibase Docker distribution and WBStack are starting to use it for case-insensitive search and related features after the removal of the wb_terms table, and for statement value matching.
- Right now Cirrus is stuck on 6.x: https://phabricator.wikimedia.org/T263142
- May switch to OpenSearch, which is new enough (7.10+), due to the Elastic license change (task T280482)
- WMF is not building for ARM64 because they don't use it yet (issues: task T272500, task T274140, task T283073)
- Java 8: need to use a Docker build in a separate repo, but it is possible (someone built a Blazegraph Docker image: https://hub.docker.com/r/robcast/researchspace-blazegraph - possibly uses https://hub.docker.com/r/arm64v8/openjdk/ which goes all the way back to Java 8)
Participants
- Joe Wass, Crossref (user: Afandian)
- Panos Pandis, Crossref (user: cpp)
- Georgina Burnett (Wikimedia Deutschland)
- Laurence "GreenReaper" Parry (WikiFur, WBUG)
- Bayan Hilles (WMDE)
- Mohammed
- Lozana Rossenova (TIB / Rhizome)
- Giovanni Bergamin (AIB)
Notes
- Survey results - Wikibase/Wikibase Installation & Updating survey/2021
- Invite link to the Wikibase community on Telegram: https://t.me/joinchat/WBsf9-C9KPuMZCDT
- On the question of scale, it is worth looking into RaiseWikibase: https://github.com/UB-Mannheim/RaiseWikibase
- Also relevant: the recent mailing list discussion at https://lists.wikimedia.org/hyperkitty/list/wikibaseug@lists.wikimedia.org/thread/RBPTOYYMMLIFYRSHEPNEXUGLXTJDJTCI/
- Joe Wass: In terms of scalability, we would be looking at around a hundred million modify operations per day.
- Laurence: I would say WMDE tend to organise things for their use case. A lot of people edit with bots. Editing throughput is the big deal performance-wise [the read use case is cached reasonably well]. Don't add qualifiers one at a time. Consider data architecture: data is stored in JSON blobs, and the bigger the item and the more statements it has, the more work has to be done to regenerate it on each edit. There is work being done on that; I am not aware how far along it is. [Related bug reports on Phabricator: task T285987, task T275286]
- There is some stuff on the mailing list related to this:
- https://lists.wikimedia.org/hyperkitty/list/wikibaseug@lists.wikimedia.org/thread/RBPTOYYMMLIFYRSHEPNEXUGLXTJDJTCI/#IVX7GG57VU4XPKBMYF4SDH7Y7NBRQKSP
- https://lists.wikimedia.org/hyperkitty/list/wikibaseug@lists.wikimedia.org/thread/JVC6LOPLKEGUILYJT2GTSABUZ7R4VE3D/
- https://lists.wikimedia.org/hyperkitty/list/wikibaseug@lists.wikimedia.org/message/R5SSDANFD3URKUVWVEHCRB77V7BYWBPN/
- As Lozana said, there are efforts in the community to address this issue. See also https://wikitech.wikimedia.org/wiki/User:Addshore/Wikibase-Performance and https://addshore.com/2021/02/testing-wdqs-blazegraph-data-load-performance/
- Is this the ticket: task T287164? (Laurence: Yes, related to the bulk load aspect.)
- Lozana: We have another group called the Wikibase Stakeholder Group. It is a bit more institutional, and we try to organise every session around a specific topic: https://wbstakeholder.group/members
- Mohammed: Joe, do you mind letting us know what Crossref is about?
- Joe: Crossref is a DOI registration agency. We are a community association of scholarly publishers (15 thousand scholarly publishers) and help them cite their content. Every article or book has metadata registered through Crossref. We have interacted with the community over the past years. What we are interested in now is looking at how Wikibase can apply to our metadata model and whether the scale of data is suitable for Wikibase. We are interested in open-ended exploration of how those two domains work together. The Wikidata and Crossref community domains overlap a lot.
- Lozana: You might be interested in the Stakeholder Group. It consists of big institutions solving the big problems of trying to use Wikibase in some shape or form. We will have a kick-off meeting in October to start core development on what we need.
- Panos: We have a question on how time is represented.
- Lozana: EDTF (https://wikibase.consulting/wikibase-edtf/) is, as far as I know, the new extension for date and time. This is what the people in the Stakeholder Group are moving towards. It was developed by the Wikibase consultancy and the Luxembourg authority project. It represents time in a more sophisticated way than is currently possible in Wikidata. I don't think it comes installed in the Wikibase Docker distribution, unless I am wrong.
- Laurence: I think there is a ticket (https://phabricator.wikimedia.org/T280656) suggesting that it gets integrated into the Docker distribution. Right now the date type in Wikibase is not sophisticated: if you have a range of dates, uncertainty, or a particular time of day, you have to add a qualifier. The EDTF extension can do most of what the standard can represent. There have been some performance issues; I added a ticket on that and there has been some work on it ( https://github.com/ProfessionalWiki/EDTF/issues/69 https://github.com/ProfessionalWiki/WikibaseEdtf/issues/16 https://github.com/ProfessionalWiki/WikibaseEdtf/issues/11 ). This is basically the issue: some things are used for Wikidata, and some things aren't and don't get worked on. [This was: if Wikidata uses/needs something, it has development focus from WMDE. Wikibase Local Media extension was mentioned as an example of work done by WBSG.]
- Lozana: The problem is also the tools around Wikibase. Right now I use OpenRefine and it does not work with the new data type extensions.
- Laurence: It is the same thing for Cradle (https://github.com/wbstack/cradle / d:Wikidata:Cradle), which may be of interest to community users who lack institutional front-end development resources, or many users experienced in SPARQL. WBUG was initially set up for development and there is still interest in community members doing that. Stuff like EDTF might be beyond what this community does because it needs dev funding, and that is where the Stakeholder Group comes in.
- Lozana: We worked closely with WMDE on the EDTF extension. Members of the Stakeholder Group worked with the development team at WMDE, such as Adam, to make sure that the extension is compatible.
- Laurence: MediaWiki is very solid now and Wikibase is getting there as well, but is about a decade behind in terms of dev effort. It has come a long way. I would suggest trying out EDTF via WBStack, which has it.
- Laurence: Overview of Wikimania. Wikibase had a table in the Community Village near the Wikidata table where people could drop by for a chat. A few people did. It went pretty well: you could meet people not already involved in the Wikibase community and find out what their issues are, though it was problematic that presentations held in the same 'building' interrupted that. One person had tried Wikibase before but did not know about federated properties. There was a request for example Wikibase(s) that showcase new features. This suggested to me that this is maybe an area the Wikibase community can work on. I know Wikidata has a page to show people interesting queries they can do (d:Wikidata:SPARQL examples). Maybe WMDE want to help with this? I don't know if this is their area. They have done documentation but not examples. We need to say: it is there, here is an example of this feature, here is what you can do with it (Wikibase Registry was meant to be this in part, but perhaps WBStack would be better). I ran into a member of WMDE's Technical Wishes group (which asks the community what they want). Currently (for the next two years) they have mapping support as a focus (WMDE Technical Wishes/Geoinformation/Ideas). I raised the issue of maps in Kartographer, which aren't possible on WBStack now, as it's outside WMF: https://github.com/wbstack/mediawiki/issues/69 - maybe they can push it now, since WBStack is WMDE-supported. Is there a WikidataCon this year?
- Mohammed: Just to go back to Wikimania, there were not so many Wikibase sessions. It was mostly about Wikidata. This is something to work on with this community. Over the years Wikibase has been tagging along; this year we want to have a solid Wikibase representation at WikidataCon [at the end of October - WikidataCon 2021]. Much of that depends on the community. We are going to have a session called "What happened with Wikibase".
- Lozana: As far as I know WikidataCon is a closed event, not open to the public. I went to the last one with Dragan because we are with Rhizome. It would be good to have some information about how open it will be. [Suggestion about Laurence or other community members attending.]
- Mohammed: WikidataCon is usually two days. This time the first day will be formal sessions. The second day will be designed by the community for the community. There should be room for the community to come up with things they want to do.
- Lozana: When there is more clarity, you can keep us posted.
- George: This year it is completely online, so there are no limitations on who can attend.
- Mohammed: Yes, what I referred to earlier was the programme.
- Lozana: Some small updates from my end. What happened at the end of July is relevant to this group. We did a Wikibase workshop for the NFDI. What came up in this workshop was an extensive list of requirements that we are collecting with FIZ, which we will share with WMDE and don't mind sharing with this community: data upload, Wikibase deployment and ontology.
- Laurence: Being part of the conference can be useful for us. Maybe we can do something for WikidataCon.
- Lozana: [Mentioned this Wikibase job offer for TIB, previously posted on the mailing list and Telegram: https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/job-advertisement-no-62-2021 ]
- Laurence: As I said on Telegram, this is a good development for the community. We have a consulting section on Meta [ Wikibase/Consultants and Support Providers ]
- We should likewise have a place where people can post job offerings. (On meta? Elsewhere?)
- Lozana: It would be a great idea to have a place for job postings. I was looking around for places to post it.
- Laurence: I have been looking into the possibility of Wikibase on the ARM64 architecture.
- Blockers noted in the agenda above; see also Oracle Cloud's free tier Ampere A1 4-core 24 GB cloud server (task T272500#7241084) [+ invitation to apply for 12-month $30k credit]
- Probably less interesting for big institutions than for community groups and developing countries.
- Joe: It is possible to run Wikibase Docker on Mac [his colleague did it; possibly via architecture translation?]
- Laurence: I don't know about running the query service though. I saw a Blazegraph docker made by someone, but for a non-Wikibase (but similar) project using it. If you want to run a service you need the base architecture to install it. I don't think WMDE is working on this. This is something we can do in the community if there is interest, and look to WMF for funding if needed.
- Lozana: Why don't you write this up in a Phabricator ticket and post it to the mailing list and on Telegram, to build community support around it so that WMDE notice it and address it? They might not be able to do something about it for x reasons, and then we know and can start to ask who from the community can do something about it.
- Laurence: I was planning something along similar lines; I just wanted to see what this community thinks first.
- Lozana: The stakeholder group might be interested in this as well
- Laurence: I have also got in contact with MKaur at WMF presenting my introduction and waiting for follow-up from her. I am working on the report so feel free to contribute on that (not much done this month due to Wikimania and other work). There are also discussions related to its content in talk and on Telegram.
- Lozana: Thanks everyone! Thanks Laurence for your work on the community admin side, will try to contribute more next month.
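
Addendum to the throughput discussion above (the advice not to add qualifiers one at a time): a minimal sketch, in Python, of packing several statements and their qualifiers into one `wbeditentity` request so the item's JSON blob is regenerated once per edit rather than once per qualifier. The property IDs (P1, P2, P3), the item ID, the example values and the helper names are hypothetical, all values are assumed to be plain strings, and the code only builds the request parameters; it does not contact any wiki.

```python
import json


def string_snak(prop, value):
    """Build a snak for a string-valued property (assumed datatype: string)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": value, "type": "string"},
    }


def statement(prop, value, qualifiers=()):
    """Build one statement with all of its qualifiers attached up front."""
    claim = {
        "mainsnak": string_snak(prop, value),
        "type": "statement",
        "rank": "normal",
    }
    grouped = {}
    for q_prop, q_value in qualifiers:
        # Qualifiers are grouped by property ID, as in Wikibase's entity JSON.
        grouped.setdefault(q_prop, []).append(string_snak(q_prop, q_value))
    if grouped:
        claim["qualifiers"] = grouped
    return claim


def edit_params(entity_id, statements, token):
    """Parameters for a single wbeditentity call carrying every statement."""
    return {
        "action": "wbeditentity",
        "id": entity_id,
        "data": json.dumps({"claims": statements}),
        "token": token,
        "format": "json",
    }


# Hypothetical example: one edit carrying a statement plus two qualifiers.
params = edit_params(
    "Q1",
    [statement("P1", "10.5555/12345678", [("P2", "journal-article"), ("P3", "2021")])],
    "CSRF-TOKEN-PLACEHOLDER",
)
```

The resulting `params` dictionary would be sent as a POST to the wiki's `api.php` with a real CSRF token. Whether this is fast enough for rates like those Joe mentioned was not established in the meeting; tools such as RaiseWikibase, noted above, take a different approach for bulk loads.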