Research:Outreach evaluation
This page documents a planned research project.
Information may be incomplete and change before the project starts.
Key Personnel
- Mani Pande
- Nimish Gautam
- Ayush Khanna
Project Summary
This project will evaluate the outcomes of various WMF outreach programs in terms of user contributions and participation in various projects.
Methods
We will ask for self-reported information at outreach events, collect it, and then compare the contribution activity of the accounts of users who attended those events to assess the potential effectiveness of each event.
Survival Metric
Given that an article has $n$ revisions (where $r_n$ is the most current revision of the article available at the time of performing the analysis) and the revision we're interested in is $r_i$:
- A byte is considered significant if it is non-whitespace
- A byte is considered to have survived if it was put in by the user in revision , and persisted to revision
- The set of survived significant bytes for a revision is then
Survival is calculated for this given revision as
Note on reordering text: There's a small, static "bonus" added to the number of significant bytes if any reordering of text was detected in that revision whatsoever (for instance, paragraphs being moved around). The reordered bytes aren't counted otherwise.
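The byte-level definitions above can be illustrated with a minimal sketch. It assumes revisions are available as plain strings and uses Python's `difflib.SequenceMatcher` to decide which characters a revision added and which of them are still present later; the function names are hypothetical and are not taken from the project's code. It only counts survived significant bytes between one pair of revisions, and does not implement the Survival calculation or the reordering bonus.

```python
from difflib import SequenceMatcher


def significant(text):
    """Count non-whitespace bytes in text, measured in UTF-8."""
    return sum(len(ch.encode("utf-8")) for ch in text if not ch.isspace())


def added_positions(parent, revision):
    """Indices in `revision` that were inserted or replaced relative to `parent`."""
    matcher = SequenceMatcher(None, parent, revision, autojunk=False)
    positions = set()
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            positions.update(range(j1, j2))
    return positions


def surviving_positions(revision, later):
    """Indices in `revision` whose characters are matched unchanged in `later`."""
    matcher = SequenceMatcher(None, revision, later, autojunk=False)
    positions = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            positions.update(range(i1, i2))
    return positions


def survived_significant_bytes(parent, revision, later):
    """Significant bytes added in `revision` that persisted to `later`.

    Assumption: a character-level diff is an acceptable way to attribute
    text to a revision; the project's code may attribute text differently.
    """
    kept = added_positions(parent, revision) & surviving_positions(revision, later)
    return sum(
        len(revision[k].encode("utf-8")) for k in kept if not revision[k].isspace()
    )


parent = "The cat sat."
revision = "The black cat sat."           # the user adds "black"
still_there = "The black cat slept."      # later edit keeps "black"
reverted = "The cat sat."                 # later edit removes "black"

kept_bytes = survived_significant_bytes(parent, revision, still_there)
lost_bytes = survived_significant_bytes(parent, revision, reverted)
```

Repeating the last call for each later revision up to the most current one would give the per-revision counts that a survival score could be built on.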
Rationale
We want to determine the number of bytes a user has added or changed in a given set of revision differences, and whether those changes persisted, as an approximation of the community's judgement that the added information is of high quality. Although persistence is not always an accurate measure of quality, an edit is more likely to be of high quality if it has survived 1000 revisions than if it has survived only 1.
Interpretation
- The ratio of survived significant bytes to edit count can aid in identifying users whose editing patterns consist of high-content, highly survivable edits
- The ratio of Survival to edit count can aid in identifying users with high-content, highly survivable edits with consistency over time.
- The ratio of survived significant bytes to bytes added can aid in identifying users who produce highly survivable edits in general.
- Ranges: still TBD
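The three ratios listed above could be computed per user along the following lines. This is a sketch only: the `UserStats` fields and the key names are hypothetical, the inputs are assumed to have been computed already, and returning 0.0 for a zero denominator is an arbitrary choice made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    # Hypothetical per-user aggregates; the names are illustrative only.
    edit_count: int
    bytes_added: int
    survived_significant_bytes: int
    survival: float


def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def interpretation_ratios(stats):
    """Return the three ratios described in the Interpretation list."""
    return {
        "survived_bytes_per_edit": safe_ratio(
            stats.survived_significant_bytes, stats.edit_count
        ),
        "survival_per_edit": safe_ratio(stats.survival, stats.edit_count),
        "survived_bytes_per_byte_added": safe_ratio(
            stats.survived_significant_bytes, stats.bytes_added
        ),
    }


example = interpretation_ratios(
    UserStats(edit_count=10, bytes_added=2000,
              survived_significant_bytes=500, survival=40.0)
)
```

Since the ranges are still to be determined, the sketch reports raw ratios and does not classify users.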
Known shortcomings
- Edits to articles, or sections of articles, whose content changes over time, such as a sports score. If a user enters a score of 40, and soon afterwards the team scores 15 more points so that the article now says 55, the bytes entered by the user will be counted as not having survived. This is a poor approximation of quality in this case, as the original edit was of high quality.
- Reversions of vandalism. These edits will be credited with an unfairly large number of surviving bytes.
- Note: there are numerous methods to detect vandalism reversion, and the code implementation leaves room for these heuristics if they are needed.
- Collaborative editing sessions. This can be remedied by looking at a group of collaborative editors as one unit.
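As one illustration of the reversion heuristics mentioned above, a revision can be flagged when its full text is identical to that of a recent earlier revision (an "identity revert"). The sketch below is an assumption about how such a check could look, not a description of the project's code; the function name and the `window` parameter are hypothetical.

```python
import hashlib


def find_identity_reverts(revision_texts, window=15):
    """Return indices of revisions that restore the exact text of an earlier one.

    A revision is flagged when its SHA-1 digest equals that of a revision
    at most `window` steps back, and the immediately preceding revision
    differs (otherwise it is a no-op edit, not a revert).
    """
    digests = [
        hashlib.sha1(text.encode("utf-8")).hexdigest() for text in revision_texts
    ]
    reverts = []
    for i in range(2, len(digests)):
        earlier = digests[max(0, i - window):i - 1]
        if digests[i] != digests[i - 1] and digests[i] in earlier:
            reverts.append(i)
    return reverts


history = ["good", "good plus", "VANDALISM", "good plus", "good plus more"]
flagged = find_identity_reverts(history)
```

Revisions flagged this way could be excluded from the byte counts, or have their surviving bytes credited to the original author, before survival is calculated.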
Code
Code that performs this analysis is available under the GPL in the Wikimedia SVN repository.
Dissemination
All findings will be publicly available on a WMF wiki.
Wikimedia Policies, Ethics, and Human Subjects Protection
Benefits for the Wikimedia community
The community and the Foundation will be able to better gauge and use effective outreach practices.
Timeline
(in progress)