Research:Contribution Inequality

Contact
Giovanni Luca Ciampaglia
This page documents a completed research project.


Topic


Has contributing to Wikipedia increasingly become an elite activity? Do contributions come only from a restricted circle of editors, or does everybody contribute more or less the same amount? These questions have already been explored to some extent by the research community[1]. The inequality of contributions was first studied by Ortega et al.[2]. Extending their work, we use the Gini coefficient to measure the inequality of contributions and examine how it changes over time and across namespaces. This lets us understand whether certain activities are more or less open to everybody.

Process


For each year, we count how many edits each user made in each namespace and rank users in descending order of contributions. We then measure inequality with the Gini coefficient, a measure widely used in economics and the social sciences. It ranges from 0%, when every user makes the same number of edits, towards 100%, when a single user makes nearly all of them.
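The counting and measurement step can be sketched as follows. This is a minimal illustration, not the code used for the project: the input format (an iterable of `(year, namespace, user)` tuples, one per edit) and the function names `gini` and `gini_by_year_namespace` are assumptions made for the example, and the Gini coefficient is computed with the standard rank-weighted formula over values sorted in ascending order.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini coefficient of a collection of non-negative edit counts.

    Returns 0.0 when everyone contributes equally and approaches 1.0
    when a single contributor accounts for almost all edits.
    """
    values = sorted(counts)  # ascending order, as the formula requires
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n, with i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def gini_by_year_namespace(revisions):
    """Group edits by (year, namespace), count edits per user, and
    return the Gini coefficient of each group's per-user counts.

    `revisions` is a hypothetical input: one (year, namespace, user)
    tuple per edit.
    """
    per_group = defaultdict(Counter)
    for year, namespace, user in revisions:
        per_group[(year, namespace)][user] += 1
    return {key: gini(users.values()) for key, users in per_group.items()}
```

For instance, `gini([1, 1, 1, 1])` is 0 (perfect equality), while `gini([0, 0, 0, 10])` is 0.75, the maximum attainable with four contributors. Multiplying by 100 gives percentages like the ones quoted on this page.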

Results and discussion


The plots below show the inequality in the distribution of editor contributions for each namespace over time. The distribution of contributions is known to be heavy-tailed, perhaps a power law[3], so we expect high values of the Gini coefficient. One interesting thing to note is that, even though the main namespace is more or less stable at a Gini coefficient of around 90%, contributions have become increasingly skewed in the two project namespaces (NS 4, Wikipedia, and NS 5, Wikipedia Talk), rising from values around 60% in 2001 to more than 90% (and thus higher than the main article namespace) around 2009. If we focus only on users with at least 10 edits (the minimum to be considered a Wikipedian by the community), or on users with higher total edit counts (at least 100 and at least 1000, respectively), we of course get lower values of the Gini coefficient, since we are only considering the tail of the distribution, but we still see the effect of increasing inequality in the project namespaces.

 
Gini coefficient of the distribution of contributions for contributors with >= 1 edit by year and namespace
 
Gini coefficient of the distribution of contributions for contributors with >= 10 edits by year and namespace
 
Gini coefficient of the distribution of contributions for contributors with >= 100 edits by year and namespace
 
Gini coefficient of the distribution of contributions for contributors with >= 1000 edits by year and namespace

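We can also ask whether the elite forms a stable group. To quantify the amount of churn among the top contributors, we compute the set similarity, or Jaccard coefficient, of the top 100 (and top 1000) contributors between one year and the next. The following plots show that the one-year similarity has increased from about 20% to about 45%. Because the Jaccard coefficient divides the number of shared contributors by the size of the union of the two groups, a value of 20% between two equal-sized groups means that about a third of one year's top contributors are still in the top group the following year, and a value of 45% means that about 62% are.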

 
Group similarity of top 100 contributors by year and namespace
 
Group similarity of top 1000 contributors by year and namespace
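The year-over-year similarity can be sketched as below. This is an illustrative example under stated assumptions rather than the project's actual code: `counts_by_year` is a hypothetical mapping from year to a `{user: edit_count}` dictionary for a single namespace, the function names are invented for the example, ties in edit counts are broken by user name (an arbitrary choice), and "next year" is taken to be the next year present in the data.

```python
def top_contributors(edit_counts, k):
    """Return the set of the k users with the most edits.

    Ties are broken alphabetically by user name, which is an
    arbitrary choice made for this sketch.
    """
    ranked = sorted(edit_counts.items(), key=lambda item: (-item[1], item[0]))
    return {user for user, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def yearly_similarity(counts_by_year, k=100):
    """Jaccard coefficient of the top-k groups of consecutive years.

    Returns a dict keyed by (year, following_year).
    """
    years = sorted(counts_by_year)
    return {
        (y0, y1): jaccard(
            top_contributors(counts_by_year[y0], k),
            top_contributors(counts_by_year[y1], k),
        )
        for y0, y1 in zip(years, years[1:])
    }
```

With `k=2`, the counts `{2001: {"a": 5, "b": 3, "c": 1}, 2002: {"a": 4, "c": 6, "d": 2}}` give top groups `{a, b}` and `{a, c}`, which share one user out of three distinct users, so the similarity is 1/3.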


References

  1. Kittur et al., Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie
  2. Ortega et al., On the Inequality of Contributions to Wikipedia
  3. J. Voss, Measuring Wikipedia