Research:Exploration on content propagation across Wikimedia projects

20:19, 4 February 2020 (UTC)
Duration:  2020-05 – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


  • Understand the cross-pollination of content across Wikimedia projects.

Problem definition

In this research we want to understand how content propagates across different languages in Wikipedia. As the unit of study we use the sitelinks on Wikidata Items. That is, we consider the subset of Wikidata Items that have at least one article associated with any Wikimedia project. We start by evaluating, once a sitelink is created in one language, which language the item is most likely to propagate to next.

Our hypothesis is that there is a relation between the creation of items in different languages. For example, an item that exists in just one language might not propagate to other projects, whereas an item that already exists in 5 languages is more likely to appear in a new project. Moreover, this probability may also depend on how closely related the languages (as a proxy for cultures) are. For example, if an item has sitelinks to Spanish, Catalan, and Portuguese, is it more likely to appear later in French than in Chinese?
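To make the two questions above concrete, here is a minimal Python sketch. It assumes that each item's history is already available as a list of wiki database names (e.g. `eswiki`) in the order their sitelinks were created. The input format, the function names `next_language_probabilities` and `propagation_rate_by_size`, and the first-order model (conditioning only on the previous language) are illustrative assumptions, not the project's actual method. The sketch estimates the probability of each next language and the probability that an item with k sitelinks gains another one.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Assumed input: one list per Wikidata Item, holding the wikis that gained a
# sitelink in creation order. Extracting this from Wikidata edit history or
# dumps is out of scope for this sketch.


def next_language_probabilities(histories: list[list[str]]) -> dict[str, dict[str, float]]:
    """Estimate P(next wiki = b | previous sitelink was created on wiki a).

    This is a first-order (Markov-style) simplification: it only looks at the
    most recent wiki, not at the full set the item already exists in.
    """
    transitions: dict[str, Counter] = defaultdict(Counter)
    for wikis in histories:
        # Each consecutive pair (a, b) is one observed propagation step.
        for a, b in zip(wikis, wikis[1:]):
            transitions[a][b] += 1

    probs: dict[str, dict[str, float]] = {}
    for a, counts in transitions.items():
        total = sum(counts.values())
        probs[a] = {b: n / total for b, n in counts.items()}
    return probs


def propagation_rate_by_size(histories: list[list[str]]) -> dict[int, float]:
    """Estimate P(item gains a (k+1)-th sitelink | it reached k sitelinks)."""
    reached = Counter()
    for wikis in histories:
        # An item that ends with n sitelinks passed through sizes 1..n.
        for k in range(1, len(wikis) + 1):
            reached[k] += 1
    # Counter returns 0 for sizes no item reached, so the largest k maps to 0.0.
    return {k: reached[k + 1] / reached[k] for k in sorted(reached)}


if __name__ == "__main__":
    # Toy, made-up histories for illustration only (not project data).
    histories = [
        ["eswiki", "cawiki", "ptwiki", "frwiki"],
        ["eswiki", "cawiki"],
        ["enwiki"],
        ["eswiki", "ptwiki", "frwiki"],
    ]
    print(next_language_probabilities(histories))
    print(propagation_rate_by_size(histories))
```

On the toy data, `eswiki` is followed by `cawiki` in two of three observed steps, and the propagation rate falls from 0.75 at one sitelink to 0.5 at three. Two caveats apply to real data: items in a snapshot may still gain sitelinks later (right-censoring), which biases rates for large k downward; and the Spanish/Catalan/Portuguese example above would need the model to condition on the item's full set of existing languages rather than only on the last one.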


  • Being able to model content propagation across wikis would help encourage the flow of high-quality content and prevent the cross-pollination of misinformation and disinformation.


Check the results for the first round of analysis.