Talk:Wikimedia Enterprise


The following Wikimedia Foundation staff monitor this page:

In order to notify them, please link their username when posting a message.
This note was updated on 09/2024

Public access to on-demand API

So the on-demand API seems to be slowly evolving into a very useful API in its own right. As such, I see it as a great addition to the current MediaWiki API and REST API. It is modern and it combines multiple data sources, so you can make one request instead of the multiple requests you might have to make to the "traditional" API endpoints.

Moreover, besides combining existing data, it also processes the data a bit and normalizes it. That is great. Furthermore, it shares a schema with the public HTML dumps available to the community.

Based on all that, I would like to ask if the on-demand API could be provided to a wider community on the same terms as the MediaWiki API and REST API are? So without an SLA, as best-effort, with rate-limiting, but without pricing (and maybe even without having to register at all).

I know about the 4th bullet point under "free" at https://meta.wikimedia.org/wiki/Wikimedia_Enterprise#Access, but I still think that it would be great if the API were just provided as an open resource, like the other two APIs are. I think it is great that its development (and operation?) can be financed by enterprise use, but the API itself is useful beyond just enterprise use. It could also be used for bots, academics, research, and in general to make it easier to access and disseminate the knowledge accumulated in Wikipedia projects. That is the mission of the Wikimedia Foundation.

Mitar (talk) 23:20, 3 January 2024 (UTC)Reply

Hi Mitar, this is just a placeholder message to note that your question has been seen. I'll make sure you're notified when a reply is published. LWyatt (WMF) (talk) 00:09, 4 January 2024 (UTC)Reply
Dear @Mitar, I am glad you are finding value in the on-demand offering. With regard to your request for availability in line with the WMF public APIs: this API has always been intended to be differentiated from the existing APIs, as its purpose is specifically to serve a distinct customer base of “large scale users” (as defined in the movement strategy). Consequently, the design with authentication is meant to allow monitoring of the usage of our APIs, so we can charge based upon that high volume of usage (again, as defined in the movement strategy). That said, its focus is on usage at high volume/speed, and not only on commercial users. Although the two often overlap, there are also many commercial users of the existing APIs (and database dumps, and scraping…), just as there are also non-paying users of the Enterprise services. Apart from the 4th type of free access that you mentioned (on request for those whose needs are not met by the other methods), have you investigated use via the 3rd option listed? That is, via the WMCS. This is what I also suggested a couple of months ago on your Phabricator ticket about this topic: phabricator:T298437#9317244. The only authentication you need there is to access WMCS; once in the system, you can access the APIs just as you would be able to access the public APIs.
On a broader note, the WMF's proposed API Portal project could, eventually, become a one-stop-shop for all APIs - including this one. But it is not a current WMF engineering priority (see also the 2022-23 WMF Product & Technology annual plan). This is why we have a variety of free-access models available for different use cases (including the ‘catch all’ 4th bullet point option) in the interim.
Sincerely, HShaikh (WMF) (talk) 21:31, 12 January 2024 (UTC)Reply
Yes, we discussed this at Phabricator and I agree that there are ways for me personally to gain access to the API. But here I really wanted to open a more philosophical discussion on how putting an API behind restrictions aligns with the primary mission of the movement and foundation. As you noted yourself, the focus is on high-volume/speed use and that is what drives the development. But the API itself is also useful as-is for low-volume/best-effort use cases. If the current implementation requires authentication, OK, then why not just provide all registered Wikipedia users with access via an API key, with low rate limits and no SLA? While the use cases of enterprise users are driving the API design, we can probably all agree that the API is useful on its own for a broader set of users? I am really thinking here about an artifact (code) developed to serve this API and the data accumulated to back it up: why not open it up? I do not think it would impact commercialization of the API. It might even increase it, as it would allow me (for example) to add support for the on-demand API to my open source library and have it tested inside CI. Sure, I could set up some proxy through WMCS or request an exception via the 4th type of free access, but the point is that having easier access to the API can spawn an open ecosystem around it, which would then drive both the adoption and use of the API. My thinking here is that this could be done with minimal development work, that it is more of a decision question, "do we open this up or not". I have not really seen many arguments against it (except for "it is not planned/designed" and "maybe in the future"). So can we then agree that this is the goal and we just have to get there? Or are we not on the same page that this is the goal? Mitar (talk) 23:31, 13 January 2024 (UTC)Reply
Dear @Mitar,
I had misunderstood that we were talking about how to best and most practically support your use-case and technical needs, which is why I asked @HShaikh (WMF) to respond from a technical perspective, but I now understand that the issue you wish to address is philosophical. To that point, you might be interested to read our philosophical essay on the role and purpose of an API targeted at high-speed and commercial users within the ecosystem and culture of Wikimedia. It was published in March 2021. It sets forth our belief that the creation of such a product is in support of the mission/values of the movement, if done in the appropriate way. Relatedly, and later that same year, OpenFutures - an independent European advocacy organisation that lobbies for greater internet freedoms - published an extensive blogpost specifically about Wikimedia Enterprise’s philosophy, where they make the point that a specifically commercial service helps, not hinders, increasing the diversity of re-users of Wikimedia content. It ‘lowers the bar’ and ‘levels the playing field’ (or as OpenFutures phrases it in that article: ‘lowers the playing field’). As they state, the main restriction upon the free flow of knowledge online these days is a question of “code, not law”. That is, merely being freely licensed is not sufficient in the 2020s to enable re-use.
This is all by way of saying that YES, the Enterprise team has and does take the philosophical implications of the project seriously and attempts to deal with those issues while simultaneously trying to build a practical, useful, desirable product.
To your specific request to “provide to all registered Wikipedia users access with API key with low rate limits and no SLA” - this is what the WMCS access effectively already is. It would be an inefficient use of resources to duplicate the explicitly community-facing services that WMCS offers just for this specific API, so we are pleased to provide access via that system - as one among many tools it offers.
Moreover, it’s safe to say we agree with your principled position on the desired end-result. We too would like to see a future where there’s a consistent and low-barrier API access method, regardless of who you are and what you need to do. So, when the Wikimedia API portal is ready, the Wikimedia Enterprise API will be part of it. Consistent, easy login, and interoperable services. In that eventual future, if a user consistently requires A LOT of API calls, and needs an SLA, dedicated customer support, etc., then they can pay for it. If they don’t, then they don’t. That would be much easier from the end-user perspective and would also break down the arbitrary barrier between “commercial” and “non-commercial” users/uses. In the meantime though, until that portal is running, we’re trying to solve for - and preemptively provide for - pragmatic access options today. We don’t want to “let perfect be the enemy of good” by waiting. So, in the meantime, and for this API service, we’ve gone to efforts to ensure that there are diverse solutions that are viable for all use-cases that we’ve come across. LWyatt (WMF) (talk) 19:16, 15 January 2024 (UTC)Reply
Thank you for the answer. I agree with the points and references you made and I am glad to read that there is already an idea (the Wikimedia API portal) which is similar to what I am also hoping for. Then let's wait for that. Thank you. Mitar (talk) 11:56, 17 January 2024 (UTC)Reply

2022-23 financial report

Today the 2nd annual financial report for Wikimedia Enterprise was published on the diff blog, here:

https://diff.wikimedia.org/2024/01/10/wikimedia-enterprise-financial-report-fiscal-year-2022-2023/

The first, published in February 2023, covered the calendar year 2022 which was the first year of operations for the project. With this report we are now able to bring Wikimedia Enterprise reporting into sync with other Wikimedia Foundation financial reports – which cover the fiscal year of 1 July – 30 June. Consequently, for this time only, there is a 6 month overlap of the period covered by these two reports as this is the first full fiscal year that Wikimedia Enterprise reported revenue.

Summary:

For the fiscal year 2022-23, both monthly revenue and average monthly expenses increased slightly compared to the previous report. The increase in revenue was due to an increased customer base, notably Yep.com, while the increase in expenses was mainly related to expanding the team. However, due primarily to a one-time accounting expense in May for previously-capitalised development costs, this report shows a net loss of $756k. Projections for the fiscal year 2023-2024 are that profitability will more than offset this reported loss.

Further details are available in the report itself.
If you have specific questions relating to this report, please add them here. I'd be happy to schedule a public call/meeting as we have done in previous years, if there's sufficient interest. LWyatt (WMF) (talk) 21:08, 10 January 2024 (UTC)Reply

@LWyatt (WMF) Thank you for publishing the report. I am interested in a public call. Maybe other people are also interested in it. Something new to me in this report was the rewrite of the software for Wikimedia Enterprise. How much of the software was rewritten to make it possible to do a one-time depreciation? The report answered the other questions I had before, and it was great that you mentioned a customer of Wikimedia Enterprise. I am interested in stories where customers of Wikimedia Enterprise tell a bit about how they reuse the content. Hogü-456 (talk) 21:59, 10 January 2024 (UTC)Reply
Hi Hogu. Yes, if there is interest from a number of people for a call we can schedule it. To your specific questions:
- The causal relationship between the Version 1 software replacement/rewrite and the one-time accounting depreciation is the other way around. It was not that we wanted "to make it possible" to do the accounting this way, and rewrote some code in order to justify it. Instead, we needed to replace code after learning from our first year in operation and responding to customer needs (mainly, to make the system more able to work efficiently at a larger scale) and so, due to that technical change, this accounting depreciation was required. This is in accordance with the rules of accounting practices in America [specific rules listed in the footnotes of the report].
- We too are interested in being able to tell public stories about what our customers are doing with this service, and to learn from those stories ourselves. We have a "news" page on the website which will be the host of those stories. When our customers are ready to tell their stories, that link is where you'll see them published (and I'll put a note here on this talkpage to alert people about it). As independent organisations in a competitive market it is up to them when, what, and how-much, they are willing to share publicly about their operations.
LWyatt (WMF) (talk) 12:53, 11 January 2024 (UTC)Reply

Quarterly product update

For those interested in following technical updates of the Enterprise software, I’ve just published the 2024 - Q1 product update on our MediaWiki page. Sincerely, AMuller-WMF (talk) 20:07, 28 March 2024 (UTC)Reply

Hi again. I've just added our most recent quarterly 2024 - Q2 product roadmap update on MediaWiki. Sincerely, AMuller-WMF (talk) 21:02, 27 June 2024 (UTC)Reply

Enhanced Free API Accounts

Wikimedia Enterprise just launched upgrades to our free accounts. All accounts now come with 5,000 On-Demand API requests per month and twice-monthly HTML snapshots. Previously, accounts only had 10,000 lifetime requests as a ‘trial’ and HTML snapshots were updated monthly. This update ensures that many community use-cases of the Enterprise APIs can be built upon the service too. More details about these updates and SDK improvements can be found in this blogpost. CReynolds (WMF) (talk) 17:27, 23 September 2024 (UTC)Reply
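
For anyone planning a bot or tool around the free tier, the figures above translate into a rough per-day request budget. The following is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes the 5,000-request allowance resets each calendar month, which the announcement does not spell out.

```python
import calendar

# Figures taken from the announcement above.
MONTHLY_QUOTA = 5000        # free On-Demand API requests per month
OLD_LIFETIME_QUOTA = 10000  # previous one-off 'trial' allowance


def daily_budget(year: int, month: int, quota: int = MONTHLY_QUOTA) -> int:
    """Whole requests per day that stay within the monthly quota.

    Assumption (not confirmed by the announcement): the quota resets
    at the start of each calendar month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return quota // days_in_month


def months_of_use(lifetime_quota: int, requests_per_month: int) -> float:
    """How many months a fixed lifetime allowance lasts at a steady rate."""
    return lifetime_quota / requests_per_month


if __name__ == "__main__":
    print(daily_budget(2024, 9))                            # 30-day month
    print(months_of_use(OLD_LIFETIME_QUOTA, MONTHLY_QUOTA))
```

In other words, a client that spreads its calls evenly can make a little over 160 requests a day, and the old lifetime allowance would have been used up after two months at that rate.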

Hugging Face dataset release

The Wikimedia Enterprise Snapshot API (HTML dumps) has added beta Structured Contents endpoints (blog post on that here), and two beta datasets (English and French Wikipedia from Sept 16) from that endpoint have been released to Hugging Face for public use and feedback (blog post on that). CReynolds (WMF) (talk) 17:29, 23 September 2024 (UTC)Reply