Research:Measuring overall contribution of editors
The goal of this sprint is to define a new metric for measuring which editors add the most content to Wikipedia. So far, the main ways of determining the contribution of an individual editor are edit count, number of articles created, and whether articles have passed through assessment processes such as GA and FA. In this sprint we aim to measure the overall contribution using the text added to pages (in kilobytes) by editors, in order to better identify and recognize those Wikipedians who are active authors of the encyclopedia.
The main challenge will likely be filtering out the noise in the revision data (template additions, bots, page moves, script-assisted editing). It would be great if we could successfully separate this noise, as the measure could then be used as an alternative way to objectively determine the contributions of editors.
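A minimal sketch of the measurement described above. It assumes each revision is available as a record with the hypothetical fields `user`, `bytes_before`, `bytes_after`, `is_bot`, and `is_page_move`. These names are illustrative and do not correspond to a real dump schema. Only positive size changes from non-bot, non-move revisions are counted. Template additions and script-assisted edits would need further heuristics that are not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Revision:
    # Hypothetical record layout; field names are assumptions for illustration.
    user: str
    bytes_before: int
    bytes_after: int
    is_bot: bool = False
    is_page_move: bool = False


def kilobytes_added(revisions):
    """Sum positive size deltas per editor, skipping bot edits and page moves."""
    totals = defaultdict(int)
    for rev in revisions:
        if rev.is_bot or rev.is_page_move:
            continue  # treated as noise in this sketch
        delta = rev.bytes_after - rev.bytes_before
        if delta > 0:  # removals do not count as added text
            totals[rev.user] += delta
    # Convert bytes to kilobytes (1 kB = 1024 bytes here, an arbitrary choice).
    return {user: total / 1024 for user, total in totals.items()}


sample = [
    Revision("Alice", 0, 2048),
    Revision("Alice", 2048, 1024),                 # removal, ignored
    Revision("ExampleBot", 0, 4096, is_bot=True),  # bot, ignored
    Revision("Bob", 100, 612),
    Revision("Bob", 612, 612, is_page_move=True),  # page move, ignored
]
print(kilobytes_added(sample))
```

Note that a raw size delta is only a proxy for authored text: a revert that restores previously removed content would still be counted as an addition in this sketch.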
First, we will create a list of top contributors on Wikipedia by year and month. Depending on how cleanly we can separate the noise, we can then proceed to investigate how the distribution of contributions has changed over time, for example:
- How does the life cycle of an editor look in terms of kilobytes contributed? Do editors contribute more at the beginning or towards the end?
- Has the group of editors that have contributed most of the content become smaller over the years?
- Have the dynamics of the top contributors changed over time?
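The ranking and the concentration question above could be explored along the following lines. This is a sketch only: it assumes the filtered contributions are already available as `(user, month, bytes_added)` tuples, and the function names `top_contributors_by_month` and `top_share` are hypothetical.

```python
from collections import defaultdict


def top_contributors_by_month(contributions, n=10):
    """Rank editors by bytes added within each 'YYYY-MM' month."""
    per_month = defaultdict(lambda: defaultdict(int))
    for user, month, added in contributions:
        per_month[month][user] += added
    ranking = {}
    for month, totals in per_month.items():
        # Sort by bytes added (descending), then by name for stable ties.
        ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
        ranking[month] = ranked[:n]
    return ranking


def top_share(contributions, k):
    """Fraction of all added bytes contributed by the k largest contributors."""
    totals = defaultdict(int)
    for user, _month, added in contributions:
        totals[user] += added
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    largest = sorted(totals.values(), reverse=True)[:k]
    return sum(largest) / grand_total


# Made-up sample data, for illustration only.
sample = [
    ("Alice", "2010-01", 3000),
    ("Bob", "2010-01", 1000),
    ("Alice", "2010-02", 500),
    ("Carol", "2010-02", 1500),
]
print(top_contributors_by_month(sample, n=1))
print(top_share(sample, k=1))
```

Computing `top_share` separately for each year's contributions, with a fixed `k`, would give one way to check whether the group of editors contributing most of the content has become smaller over time.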
Please add any interesting suggestions you might have.
Results and discussion
- Results for this research project can be seen at Research:Wikimedia Summer of Research 2011/WikiPride