Wikimedia Foundation Medium-term plan 2019/Platform evolution
To become the essential infrastructure of free knowledge, we need to evolve our platform for vast extensibility, broad content sharing, high performance, ease-of-use, and low barrier to entry. Our communities and projects need to be able to remain relevant and competitive in an ecosystem in which machines create content, and our platforms must provide tools that allow all people to be both the creators and curators of knowledge.
The Platform Evolution priority encompasses improving and modernizing Wikimedia’s technical ecosystem to respond to a landscape where Artificial Intelligence is creating content, rich media dominates learning, content is structured, and collaboration tools work across multiple devices and have minimal technical requirements. This priority also enables growth in new markets by making contribution, curation, and collaboration tools more equitable, by focusing on providing both small and new communities with the same abilities to create and moderate content as the larger established projects.
This requires embracing techniques like artificial intelligence (AI) and machine learning, along with deeper levels of automation, in order to respond to the needs of our contributors and the need to innovate quickly. The integration of AI services will enable us to quickly identify and close content gaps, protect content integrity, and empower smaller community projects and languages to build on more mature wikis. For machine learning to be effective, the data that composes our content must become more structured, and we need to empower our contributors with tools to help them be effective and consistent in contributing and working with data.
Addressing content gaps also includes making it easier to incorporate rich media, which requires more storage and server power, and better tooling for editing, uploading, and incorporating more types of media. On the engineering front, better automation of the software release process through continuous integration and a more intentional focus on code quality and testing will allow for faster and more innovative experimentation.
The ability to build modern experiences in a consistent manner requires updates to our server and network infrastructure and software development environments alongside the core software through which our readers and contributors interact with our projects. This includes the tooling and infrastructure that support the Wikimedia technical ecosystem, including MediaWiki and Wikibase, and the projects that provide the majority of content creation and consumption, like Wikipedia, Wikidata, Commons and Wikisource.
Investments in platform evolution will therefore target machine learning, structured data development, multimedia and interactive content capacity, server and network infrastructure, developer tooling and engineering productivity, and volunteer diversity.
Outcomes
1. Software platforms with integrated machine learning, rich media, and structured data components, and associated tooling for internal and external development and reuse of code and content.
Priorities supported: Platform Evolution, Worldwide Readership
- This outcome focuses on:
- the development of a robust AI infrastructure, consisting of APIs, tooling, data pipelines and other infrastructure to assist in surfacing knowledge gaps, automated language translation and other forms of knowledge creation, moderation, search and discovery. This infrastructure will also provide facilities to detect and correct algorithm biases and make Wikipedia, Wikidata and other project data available to train models created outside of our project spaces.
- expanded development of structured data tools to enhance data formatting and categorization capabilities for easier consumption by our machine learning pipelines behind the APIs, as well as by machine learning tools built by third parties.
- support for the integration and discoverability of rich content including video, audio, and interactive media, as well as the infrastructure to serve it with high performance, high redundancy, and low latency to all parts of the world.
2. Fully automated and continuous code health and deployment infrastructure.
Priorities supported: Platform Evolution
- With an early focus on improving engineering productivity for technical contributors, including staff engineers at the Foundation, this outcome encompasses:
- automating our code deployment pipeline and ensuring broad test coverage with tooling and practices that make it easy for volunteer and staff developers to deploy safe, healthy code;
- speeding up deployments and having greater confidence in the quality, performance, scalability and overall sustainability of our code base;
- making software quality measurable, so that code ships with testing, analytics, monitoring, security and privacy built in; and
- addressing architectural issues that will improve the modularity of our technology stacks, making it easier to maintain the health of our codebase as we continue to scale and maintain our projects.
3. Tooling for contributors is easy to use, well-documented, and accessible to users, increasing engagement and contribution.
Priorities supported: Platform Evolution, Thriving Movement
- This outcome is focused on the contributor experience and will provide:
- high-quality and accessible tooling for technical and content contributors, curators and collaborators;
- processes that remove hurdles and simplify code and content contributions to projects; and
- a lower barrier to entry for new technical contributors and aspiring editors.
Metrics
1. Machine learning, structured data and rich media integration.
- AI tools and workflows are used on 25% of content. This output consists of two main components: successful completion of AI platforms and the usage of these platforms in content creation and consumption activities. This metric is designed to capture both aspects. More specifically, the MTP’s growth in editors, readers, and content will not be possible without AI platforms, but the actual impact is difficult to estimate.
- 25% of content consumed or created uses structured data. This includes Wikidata but also extends to content from articles, templates, and other sources stored in formats that can be used programmatically for various contribution and consumption formats. Like AI, this metric incorporates both completion and usage and is critical for the completion of outputs in other priorities.
- A 25% increase in rich media content created and consumed across the projects. The Worldwide Readership outputs, specifically output 2 (Substantially extend our core product experiences), depend on having a robust rich media platform used across our projects, as illustrated by this metric.
2. Engineering productivity and technical community indicators.
- A 25% increase in code quality, as measured by automated measurement and profiling across our code base through industry-standard metrics.
- An increase in developer satisfaction, as measured by regular surveys, and a 20% decrease in the number of outstanding code reviews.
- In line with our metrics for our editing communities, a 5% increase in growth and retention of the technical communities, including from underrepresented geographies.