Wikimedia Foundation Annual Plan/2023-2024/Product & Technology/OKRs

Note – This page's translations will be maintained by the WMF every 3 months, until mid-2024:
AR, FR, PT, RU, ES, JA, UK, ZH, SW, HI

This document represents "part 2" of the 2023-24 Annual Planning process for the Wikimedia Foundation's Product and Technology departments. It focuses on the departments' draft "objectives and key results" (OKRs) for the 2023-24 Annual Plan. "Part 1" was an explanation of the draft work portfolios (nominally called "buckets") and the theory and planning behind this document.

Although this document is complete, the Key Results and their underlying hypotheses will be updated incrementally throughout the 2023–24 Annual Plan year as lessons are learned.
Objectives v2 (v1) Key Result Explanation

WE1: Contributor experience

Support the growth of high quality and relevant content within the world’s most linguistically diverse, trusted and comprehensive free knowledge ecosystem by enabling and supporting high quality and accessible experiences.

Context: In order to focus on one thing we need to trade off another. We want to focus on supporting content and content moderators, mobile contributions, supporting online campaigns, and reducing IP blocks. In order to focus on those things we have to deprioritize in-person event support and new editor productivity (with the exception of the IP block KR).

1. Increase unreverted mobile contributions in the main article namespace on Wikipedias by 10%, averaged across a representative set of wikis. This KR provides broad encouragement to promote mobile content editing, both through activities that support the other KRs (e.g. moderation and content coverage) and through activities primarily geared toward mobile contribution. Over the last three years, mobile web content contribution has increased by about 20%. If we can come up with ways to increase it by 10% more in just one year, that will be an acceleration over its natural rate of increase.

This KR is inclusive of both the web and apps. It is "averaged across a representative set of wikis" to ensure that we make improvements that are valuable for multiple wikis, and not just our largest ones. We will choose which wikis later on.
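
The comparison with the "natural rate of increase" and the effect of averaging across wikis can be illustrated with a little arithmetic. This is only a sketch: it assumes the roughly 20% three-year increase compounded evenly each year (the text does not say how the growth was distributed), and the wiki names and edit counts are invented for illustration, not real data.

```python
from statistics import mean


def annualized_rate(total_growth: float, years: int) -> float:
    """Convert cumulative growth over `years` into an equivalent compound yearly rate."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` (0.10 means +10%)."""
    return (after - before) / before


# From the text: about 20% growth over three years, versus a 10% one-year target.
historical_yearly = annualized_rate(0.20, 3)
target_yearly = 0.10

# Hypothetical (before, after) counts of unreverted mobile edits; not real data.
wikis = {
    "large_wiki": (100_000, 104_000),
    "mid_wiki": (10_000, 11_200),
    "small_wiki": (1_000, 1_140),
}
per_wiki = {name: relative_change(b, a) for name, (b, a) in wikis.items()}

# Averaging the per-wiki changes weights every wiki equally.
averaged = mean(list(per_wiki.values()))

# Pooling the raw counts lets the largest wiki dominate the result.
pooled = relative_change(
    sum(b for b, _ in wikis.values()),
    sum(a for _, a in wikis.values()),
)

print(f"historical yearly rate: {historical_yearly:.1%}")
print(f"target yearly rate: {target_yearly:.1%}")
print(f"averaged change: {averaged:.1%}, pooled change: {pooled:.1%}")
```

Under the even-compounding assumption, the historical rate works out to a little over 6% per year, so a 10% one-year target is a clear acceleration. With the invented counts, the per-wiki average reaches the target while the pooled figure does not, which is why averaging rewards improvements that help smaller wikis too.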

2. Complete improvements to four workflows that improve the experience of editors with extended rights (admins, patrollers, functionaries, and moderators of all kinds); extend their creativity; impact at least four different wikis; and meet KRs for each improvement set collaboratively with the volunteers. We don't yet know whether a specific goal of, say, reducing backlogs is, in fact, what these editors doing moderation work need from us. Ultimately, we want to use our resources to increase their satisfaction and increase their ability to build and manage workflows -- that's what "extend their creativity" is about: these community members have built amazing things, and some of the best ways we can help are when we enable that creativity through platforms, endpoints, templates, and other tools. The number "four workflows" reflects how many teams we might imagine working on this KR, and the number "four different wikis" is to encourage us to generalize our impact across projects where possible. This KR will then require us to work with the affected volunteers to set actual KRs for each given improvement, so that we can agree with them on when we've had impact. Note that work on workflows used by users other than the editors doing moderation can still improve the moderators' experience. For instance, work that causes newcomers to create stronger first articles may ease the burden on those who patrol new articles.
3. One percentage point increase (YoY) in the portion of newly created or improved articles on high-impact topics with acceptable quality, per the “global quality score”, that are created or edited on Wikipedia, starting with underrepresented geographic regions and gender. As we learn more and establish baselines, this metric may be adjusted, for example to normalize it or to compensate for fluctuations. This metric focuses on the increase of acceptable quality content for high-impact topics. In particular, the target is for the share of acceptable quality articles on select topics (starting with underrepresented geographic regions and gender) on Wikipedia to rise by one percentage point, taking the previous year as a baseline. For example, of all newly created or improved articles in 2022/2023 that were on geographic regions and met the acceptable quality score, 28% focused on underrepresented regions. This means that, in 2023/2024, of all the newly created or improved articles on geographic regions that meet the acceptable quality score, 29% should focus on underrepresented geographic regions. Note that the goal has changed from a 10% year-over-year growth (as previously written) to a one percentage point year-over-year growth. The change was made because data analysts at the Wikimedia Foundation determined the new target to be more achievable and realistic, given recent trends in article growth on high-impact topics (with a specific look at content on underrepresented regions in Wikipedia).

This metric is intended to be useful for setting direction, but flexible enough for teams to define their own strategy and track progress. Initiatives to achieve this target could focus on improving the quality of existing articles on selected topics or encouraging the creation of new articles for those topics. Teams can evaluate their impact by comparing the results for the quarter or the whole year with the same period one year before. Aspects such as quality and relevance are always difficult to measure. Article quality is determined by the global quality score, which is based on multiple parameters of the article, such as number of sections, references, and links. Similarly, article topics will be identified in collaboration with the teams working on the knowledge gap analysis and will use their datasets, starting with underrepresented geographic regions and gender.

This work aligns with the Foundation-wide content metric, touching on both quantity and quality—this means we could impact it both by generating new articles or by improving existing ones. "High-impact topics" is a concept from Movement Strategy Recommendation #8: "Identify Topics for Impact." Gender and geography are both topic areas that our movement has highlighted as having important content gaps and that our Research team is equipped to measure.
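
The 28% to 29% example for KR 3 describes an absolute change in a share (a percentage point), which is easy to confuse with a relative change of one percent. The short sketch below contrasts the two readings using the baseline figure from the text; the function names are illustrative only.

```python
def target_share_pp(baseline_share: float, points: float = 1.0) -> float:
    """Target after an absolute increase of `points` percentage points."""
    return baseline_share + points / 100.0


def target_share_relative(baseline_share: float, percent: float = 1.0) -> float:
    """Target after a relative increase of `percent` percent, shown for contrast."""
    return baseline_share * (1.0 + percent / 100.0)


baseline = 0.28  # 2022/2023 share of underrepresented regions, from the example above

pp_target = target_share_pp(baseline)
relative_target = target_share_relative(baseline)

print(f"one percentage point: {pp_target:.2%}")
print(f"one percent, relative: {relative_target:.2%}")
```

A one percentage point increase moves the share from 28% to 29%, matching the example, whereas a relative increase of one percent would only reach about 28.28%.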

4. X% increase in the share of IP blocks that get appealed, with a static or decreasing share of appeals that result in an unblock. IP blocks are our movement's main tool for stopping abusers of our sites, but they have the unfortunate effect of blocking many users who are acting in good faith. This causes a particularly negative impact on new editors and on community programs. There is no reliable way to measure how many people are blocked erroneously, but we can approximate it through how many of them appeal the block (i.e. request an exemption). A barrier to doing this, though, is that our appeal process is difficult for users to find and complete. Therefore, this KR attempts to guide us to improve the IP block situation on two fronts. First, it calls on us to make the appeal process clear for users, such that we would expect to see more blocked people appealing. At the same time, it calls on us to reduce how many erroneous blocks happen in the first place, by looking at the share of appeals that result in an unblock. In other words, if we are able to block only exactly the right users, then we'll see very few of them getting unblocked. This KR may precipitate deep community and technical discussions about the nature of IP addresses and how we use them, and about the workloads and workflows of the functionaries who manage these processes. As we work with community members, we may discover that there are better ways to measure progress on the IP blocks issues, and we can refocus on other metrics. [See update note]
5. Enable a new Wikimedia community for building an open library of functions, Wikifunctions, that is capable of creating new forms of knowledge across Wikimedia sites. This KR captures the Foundation's strategic bet on creating Wikifunctions as a platform for communities to build, use, and maintain a library of functions. Wikifunctions will also form the technical foundation for Abstract Wikipedia, a project to enable the creation and maintenance of Wikipedia articles in a language-independent way. The ultimate objective of Abstract Wikipedia is to make knowledge more accessible and usable for everyone, regardless of their language or background.
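
The two-sided condition in KR 4 above (more appeals, but no larger share of them ending in an unblock) can be written down as a small check. This is a sketch under stated assumptions: the target "X%" is left as a parameter because the text does not fix it, the increase is read as a relative increase in the appeal share, and the `BlockStats` type and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockStats:
    """Hypothetical counts for one reporting period."""
    blocks: int     # IP blocks encountered by would-be editors
    appeals: int    # blocks that were appealed
    unblocked: int  # appeals that ended in an unblock

    @property
    def appeal_share(self) -> float:
        return self.appeals / self.blocks

    @property
    def unblock_share(self) -> float:
        return self.unblocked / self.appeals


def kr4_met(before: BlockStats, after: BlockStats, min_relative_increase: float) -> bool:
    """True if the appeal share grew enough and the unblock share did not grow."""
    appeal_growth = after.appeal_share / before.appeal_share - 1.0
    return (
        appeal_growth >= min_relative_increase
        and after.unblock_share <= before.unblock_share
    )


# Invented numbers for illustration only.
before = BlockStats(blocks=10_000, appeals=200, unblocked=80)
after_good = BlockStats(blocks=10_000, appeals=300, unblocked=105)
after_bad = BlockStats(blocks=10_000, appeals=300, unblocked=150)

print(kr4_met(before, after_good, min_relative_increase=0.25))
print(kr4_met(before, after_bad, min_relative_increase=0.25))
```

In the second case the appeal share rises just as much, but half of the appeals succeed, which would suggest that erroneous blocks have not been reduced.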

WE2: Reading and media experience

Produce a modern, relevant and accessible reading and media experience for our projects.

Context: We want to focus on increasing unique devices, increasing internal discovery, and non-editing engagement. In order to do that, we have to deprioritize engagement with images and audio and inbound issues with accessibility. The KRs below reflect that focus as well.

1. Ensure a quality reading experience for all users by adapting the default experience for 15% of pageviews, based on the individual needs and constraints of the user. This KR is focused on allowing the opportunity for our interface to adapt to individual needs when necessary. The theory here is that people will feel more engaged with a website and interface that can adapt based on their needs. This can include work such as dark mode, text and page density, and font size customizations. Some of this adaptation can be done automatically by the interface - for example, creating responsive versions of a feature or tool, or ensuring that dark mode turns on based on the browser or device settings of the user. In other cases, this adaptation can be done through intentional customization - allowing users to select non-default states in specific (but limited) cases. From an accessibility perspective, it will focus on the features that need to be built as standalone to allow for more accessibility, or to allow for setting defaults that are more accessibility friendly, while leaving the opportunity for customization to users who have different preferences. To set the specific number “15%”, we looked at how users adapt the default experience in the Wikipedia iOS app. 59% of users of the app are using a non-default theme (dark, black, or sepia). We used this number as a baseline, but factored in our assumption that it is more likely that habitual users of Wikipedia on the web take the time to adapt their reading experience, as opposed to sporadic users.
2. Interested readers will discover and browse more content, measured via a 10% increase in internally referred page interactions in representative wikis. This KR is focused on making it easier for interested readers to discover content by exploring different content discovery methods or entry points. The goal is to provide readers with these options in specific moments of their journey or after specific actions which indicate that they’re interested in learning more. "Page interactions", in this context, is inclusive of all the ways that a user can interact with content beyond just looking at a page (page previews are an example). "Internally referred" means that we'll only be counting those page interactions that happen after a user already starts their session on our property (i.e. excluding the first time they land on the site, which usually happens through a search engine referral).
3. Deepen reader engagement with Wikipedia via 0.05% of unique devices engaging in non-editing participation. This KR focuses on deepening reader engagement, while also exploring ways in which readers can contribute to our projects that are not editing pages. We hypothesize that there are people who are interested in getting involved with the wikis but for whom editing of any kind is too big of a leap. We want those people to have a way to get more deeply involved, perhaps becoming more committed readers, or eventually becoming comfortable enough to edit. "Non-editing participation" refers to any actions users can take on the wikis besides editing (we are also counting edits to discussions as 'editing'). While our websites don't have any of this, our apps do, in the form of reading lists or sharing content to social media. This work could include letting users configure their own personal reading experience, or could also focus on sharing content across the wiki, curating, and suggesting content to others. The KR is inclusive of work on the mobile and desktop websites and the apps. For mobile and desktop it may include the adoption of some non-editing participation functionality that exists on the apps. For the apps, it may include improving on existing functionality or building out new ideas. The number 0.05% is approximately the ratio of editors to unique devices -- so perhaps in the first year of this feature set, we see a similar ratio for non-editing participants, which would eventually increase to greater than the number of editors in the future. [See update note]
4. Improve website performance for users in South America, starting with a reduction of p50 latency for users in Brazil by at least 100 ms. This KR focuses on improving website performance in an under-served region. Research suggests that a significant and noticeable reduction in website response time improves user engagement. As part of this KR, we will improve website response times in South America, e.g. by deploying an additional cache site in the region. Although we anticipate the first significant impact to be measurable in Brazil by the end of the fiscal year, the entire region is expected to see significant performance benefits shortly after.
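
For the latency KR above, "p50" is the median: half of the measured requests are faster and half are slower. The sketch below shows how a reduction in p50 could be checked against the 100 ms threshold. The sample values are invented, and the function name is illustrative; real measurements would come from performance monitoring rather than a hard-coded list.

```python
from statistics import median


def p50_reduction_ms(before_ms: list[float], after_ms: list[float]) -> float:
    """Drop in median (p50) latency, in milliseconds, between two sample sets."""
    return median(before_ms) - median(after_ms)


# Hypothetical latency samples in milliseconds; each list contains one slow outlier.
before = [310, 420, 380, 350, 900, 330, 400]
after = [210, 260, 240, 230, 850, 220, 250]

reduction = p50_reduction_ms(before, after)
meets_target = reduction >= 100

print(f"p50 before: {median(before)} ms, after: {median(after)} ms")
print(f"reduction: {reduction} ms, meets 100 ms target: {meets_target}")
```

Because the median ignores how extreme the slowest requests are, the single slow outlier in each sample set barely affects the result. That makes p50 a measure of the typical experience rather than the worst case.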

WE3: Knowledge Platform

Increase collaboration and efficiency among software developers by improving the development process for MediaWiki

1. Reduce fragmentation in developer workflows, achieving 75% adoption of at least one officially supported developer tool in active use. The goal of this key result is to provide standard development tools that meet the needs of most Wikimedia developers. We also aim to be able to replicate production-like environments for a wider range of components at the development, testing and deployment stages. By accomplishing this, we will provide a better developer experience. This experience will allow engineers to onboard more quickly, assist each other when running into difficulties and deploy new features to production with greater confidence. This work is not intended to serve all developer workflows in the first year, but to make improvements in the areas that most impact developer productivity.
2. Increase by 20% the number of authors that have committed more than 5 patches across a specific set of MediaWiki repositories that are deployed to production. Increasing the number of people willing and able to contribute to the MediaWiki code base will make it less likely that a team gets blocked when changes to MediaWiki core are needed. It also makes it less likely that workarounds are created that add technical debt. In addition, this metric shows that the code base is becoming easier and safer to contribute to without unexpected effects.
3. Resolve and document 4 major points of technical strategic direction/policy/process. Product and Technology leadership has identified key areas where strategic direction is needed to increase the impact of technical work. Examples include defining an approach to support for MediaWiki outside Wikimedia and creating a policy for open-source software. Defining a strategic direction for these topics will mean increased efficiency and more cohesion in Wikimedia’s technical direction.
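
The author-count metric in KR 2 above can be sketched as a simple tally. The example assumes a flat list with one author entry per patch, taken from whichever set of repositories is chosen. The author names, the list itself, and both function names are hypothetical, and "more than 5" is read strictly, so an author with exactly five patches does not count.

```python
import math
from collections import Counter


def qualifying_authors(patch_authors: list[str], threshold: int = 5) -> set[str]:
    """Return the authors with strictly more than `threshold` patches."""
    counts = Counter(patch_authors)
    return {author for author, n in counts.items() if n > threshold}


def target_count(baseline: int, increase_percent: int = 20) -> int:
    """Smallest whole number of authors at least `increase_percent` above baseline."""
    return math.ceil(baseline * (100 + increase_percent) / 100)


# Hypothetical patch log: one entry per patch, naming its author.
patch_log = ["ana"] * 6 + ["bo"] * 5 + ["chen"] * 7

authors = qualifying_authors(patch_log)
print(sorted(authors))
print(target_count(10))
```

Rounding the target up means that a baseline of 10 qualifying authors requires 12 in the following year, and a baseline of 2 requires 3.
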
Objectives v2 (v1) Key Results Explanation

SDS1: Defining essential metrics

Each metric and dimension in our essential metric data set is scientifically or empirically supported, standardized, productionized, and shared across the Foundation.

Context: Effective use of metrics to make strategic decisions at the Foundation requires us to measure and assess the impact of work using a common, reliable, and well-understood set of metrics. Ensuring that different teams working on different projects are using the same metrics with the same definitions to understand the impact of their work will allow us to align efforts across the Foundation, with affiliates, and with communities. These metrics will allow Foundation staff and communities to evaluate proposals for programs and product features and to monitor and evaluate results. And they enable the engineers who support the tools used in data preparation and analysis to deliver a higher standard of service by more precisely defining the scope of their work, making the effort more tractable with our current resourcing. Data is only as useful as it is accessible to users. Our metrics must have maximum accessibility for us to maximize their utility to all audiences. We will gather, organize, and make available the necessary information to guide appropriate use and prevent misuse.

1. For three out of the four core metric areas, provide at least 1 metric with documented adherence to essential metric criteria. This work requires that we identify and clarify criteria for essential metrics, and document the extent to which our core metrics meet the criteria. By doing this work, we will identify gaps and opportunities for continuing to improve our essential metrics.

Our starting assumption is that the requirements for essential metrics include that the metrics are scientifically or empirically supported and have a clear definition, calculation, data provenance, versioning, and data steward. As we begin to operationalize these metrics, we may identify a revised set of criteria that is more helpful in guiding metric selection and definitions.

4. Five annual plan initiatives engage with a core metric as a point of inquiry, to measure and communicate progress, or to inform the direction of resources. We can detect that leaders and staff understand how our core annual plan metrics connect to their work by observing that these metrics are influencing annual plan initiatives across the Foundation. The influence may vary from team to team and initiative to initiative, so we look for indicators that span across three levels of engagement with these metrics. Some leaders may engage with metrics as an observational tool and open up investigations to understand the relationship of trends to their annual plan initiatives. Others may use these metrics as a tool to widely communicate progress or the baselines surrounding their work, such as by incorporating the metrics in quarterly reviews or in annual plan engagements with the movement. In the most ideal case, leaders of annual plan initiatives directly use these metrics to guide and evaluate resourcing decisions, as demonstrated in internal or external communications about these decisions. By encouraging this multi-leveled uptake of our core annual plan metrics across a variety of annual plan initiatives, we move the entire organization closer to the ideal of using a shared set of metrics to guide coordinated work across the Foundation.

SDS2: Making empirical decisions

Wikimedia staff and leadership make data-driven decisions by using essential metrics to evaluate program progress and assess impact

Context: By using essential metrics to evaluate program progress and assess impact, we can ensure that we are making informed decisions that are backed by evidence. This allows us to stay focused on our most important goals, make adjustments as needed, and track our progress over time. In order to achieve this data-driven culture, we must start by codifying essential metrics and related processes into tools and outputs that enable key audiences to understand, evaluate, and explore data that is of high quality. This will mean not just investing in the development of tools to review metrics, but also investing in data infrastructure and quality solutions that will enable us to improve the accuracy, coverage, and timeliness of our data products. We will focus on two key areas:

  • Enabling senior leadership to make data-driven decisions by providing shared tools and data to inform their perspective. This work will include making the 3 core metrics available as data products, providing tools that allow relevant audiences to analyze and evaluate these metrics, and necessary investments in data infrastructure and existing data products.
  • Streamlining data generation related to our products and features that enable us to compare and run experiments with products. Experimentation allows us to learn quickly and helps us develop the right kinds of experiences with the community. As our product portfolio grows, we need to develop a strategy to systematically and transparently validate that our decisions and investments are moving us in the right direction to address the movement's goals.

5. Four feature teams use shared tools to evaluate and improve user experiences based on empirical data from user interactions. Creating shared tools that feature teams can use to measure the impact of feature changes will improve our efficiency by reducing the effort required to create and capture measurements and make it easier to align those measurements to our core metrics.
6. Senior leadership can periodically use a shared tool to evaluate the Foundation’s progress against core metrics. As we align senior leadership around core metrics as part of Objective 1, we need to provide easy-to-use tools that enable them to evaluate and measure the Foundation’s progress against core metrics. This work explores how we can achieve this through static reports, data visualization tools, etc., as well as investments in data infrastructure and quality. For this year, our focus will be on serving core metrics.

SDS3: Using and distributing data

Users can reliably access and query Wikimedia content at scale

Context: Mechanisms that deliver data from our projects are critical to the successful development of on-wiki experiences, as well as for development tools, analysis of our projects, and other movement activities. We must be able to deliver data via data products that are reliable, sustainable, and scalable in order to meet the needs of free knowledge distribution, discovery, curation, and creation. A key area of focus for the Foundation this year will be exploring how we can deliver a knowledge graph solution that supports the continued acquisition and growth of knowledge content in a sustainable and performant way, while retaining access to existing content. For this year we will primarily focus on:

  • Exploring methods to improve WDQS query performance and reliability
  • Addressing potential risks of imminent failure associated with WDQS
  • Laying the groundwork for longer-term solutions to scaling a rapidly growing knowledge graph

1. Wikidata knowledge graph can be reloaded within 10 days for a graph of up to 20 billion triples. The underlying issue that we are trying to address is the medium-term scalability and stability of the Wikidata Query Service (WDQS), since problems with either can hinder the ability to query Wikidata. WDQS runs on top of Blazegraph, and its graph comprises 15 billion triples. The graph is currently growing at the rate of 1 billion triples per year. With the current size and growth of the graph, we are experiencing a number of scalability issues:
  • The reloading (rebuilding) of the graph from the Wikidata dumps takes more than 2 months. This is partly because the operation itself is long, but the time is extended further because the reload crashes unpredictably once the graph reaches a certain size, requiring the process to be restarted.
  • More frequent stability issues with WDQS
  • The queries are taking a longer time to run, with more frequent timeouts

The ability to reload the graph is a critical function in order to ensure data consistency and be able to recover from potential critical data issues. It is an indication of the stability and scalability of the system. Furthermore, the instability of the data reload process is directly linked to the size of the graph, in a similar way that the runtime stability of WDQS is linked to the size of the graph.
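
The figures in this KR imply a rough throughput requirement, which the following back-of-the-envelope sketch works out. It rests on simplifying assumptions that are not in the source: "more than 2 months" is taken as exactly 60 days (so the current rate shown is an upper bound), the load rate is treated as constant, and graph growth is treated as linear at 1 billion triples per year.

```python
SECONDS_PER_DAY = 86_400


def load_rate(triples: int, days: float) -> float:
    """Average triples per second needed to load `triples` in `days`."""
    return triples / (days * SECONDS_PER_DAY)


TARGET_TRIPLES = 20_000_000_000   # KR: graph of up to 20 billion triples
TARGET_DAYS = 10                  # KR: reload within 10 days
CURRENT_TRIPLES = 15_000_000_000  # current graph size from the text
CURRENT_DAYS = 60                 # assumption: "more than 2 months" taken as 60 days
GROWTH_PER_YEAR = 1_000_000_000   # growth rate from the text

target_rate = load_rate(TARGET_TRIPLES, TARGET_DAYS)
current_rate_upper_bound = load_rate(CURRENT_TRIPLES, CURRENT_DAYS)
min_speedup = target_rate / current_rate_upper_bound
years_until_target_size = (TARGET_TRIPLES - CURRENT_TRIPLES) / GROWTH_PER_YEAR

print(f"required rate: {target_rate:,.0f} triples/s")
print(f"current rate (at most): {current_rate_upper_bound:,.0f} triples/s")
print(f"speedup needed: at least {min_speedup:.1f}x")
print(f"years until the graph reaches the target size: {years_until_target_size:.0f}")
```

Under these assumptions the reload would need to sustain roughly 23,000 triples per second, at least eight times the current effective rate. At the stated growth rate, the graph would reach 20 billion triples in about five years.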

Objectives v2 (v1) Key Results Explanation Research

FA1: Describe multiple potential strategies

Through which Wikimedia could satisfy our goal of being the essential infrastructure of the ecosystem of free knowledge

1. Participants in Future Audiences work are equipped with at least three candidate strategies for how Wikimedia projects (especially Wikipedia and Wikimedia Commons) will remain the “essential infrastructure of free knowledge” in the future, including the audiences they would reach, the hypotheses they test, and approaches for testing them. Before the Future Audiences bucket digs in to investigate possible future work, we want to lay out the different strategies that we'll be investigating, and think through the questions that need to be answered to detect their viability.

Commons community members have explicitly asked us to think about the strategy for the future of Commons -- this KR ensures that we do, but that it also fits in with the larger product strategy thinking of the bucket.

The Wikimedia External Trends 2023 overview highlighted a number of changes to technology and user behavior in search and content creation that pose potential risks to our movement's sustainability. This track of work will be aimed at diving deeper into how our projects and communities can continue to thrive in the face of different potential future challenges.

Contact: User:MPinchuk (WMF)

FA2: Test hypotheses

To validate or invalidate potential strategies for the future, starting with a focus on third party content platforms

1. Test a hypothesis aimed at reaching global youth audiences where they are on leading third-party content platforms, to generate ideas for products we can build on or off our sites, which can help increase their engagement with Wikimedia content as consumers and contributors. One of the strategic directions we're sure we want to investigate is around the spreading of free knowledge on other platforms, like YouTube, Instagram, etc. A tremendous amount of knowledge is consumed in these places for free, and we don't yet do anything to facilitate that, nor do we yet have theories on how to gain participants and revenue from those places. The language of this KR was updated on October 11, 2023 to make it clearer that the purpose of this KR is to spread Wikimedia content through third-party platforms, as opposed to the Wikimedia brand, and to make it clearer that this is in service of developing product ideas. As the content spreads, it is still important that attribution and branding spreads with it, so that Wikimedia can be sustained with editors and donors -- but that is not the primary orientation of this work.
  • The 2022 Brand Health Survey looked at how Wikipedia is seen by different age groups. It noted especially low scores among 18-24 year olds in some markets (US, Germany, South Africa), who gave Wikipedia a negative Net Promoter Score. Per the survey: "This poses a high risk for the future of the project and the movement as a whole."
  • The New York Times reported on evidence that global youth are increasingly spending time on social apps and less time using traditional search engines (which typically bring the bulk of new audiences to our projects).

Contact: User:MPinchuk (WMF)

2. Test a hypothesis around conversational AI knowledge seeking, to explore how people can discover and engage with content from Wikimedia projects. Another strategic direction we're sure we want to investigate is around conversational AI, a technology that looks like it will be transformative in the free knowledge ecosystem. Not all work using large language models and chatbots would fall under this KR; rather, it covers only work that investigates conversational AI as a way to bring free knowledge to audiences that otherwise would not experience Wikimedia content.
  • Reuters reported that as of February 2023, 2 months after launching, ChatGPT had 100 million active users, indicating its large appeal and fast growth.
  • GPT-4 and other LLMs are now being used to power many new tools including search and content creation online. Many in our movement are interested in and concerned about how our work and projects can continue to thrive in a world of increasingly sophisticated AI tools.

Contact: User:MPinchuk (WMF)