Research:Talk Sentiment Analysis: Editor Retention and Editing Prediction

Contact
Sergio Martinez-Ortuno
Lars Roemheld
Deepak Menghani
This page documents a completed research project.


We perform sentiment analysis on messages exchanged between Wikipedia editors on user talk pages in order to predict future editing behavior. Using the gradient boosting machine (GBM) algorithm, we found a reasonably well-performing model for predicting each user's number of edits in the following week, and we discuss the relatively limited impact that our sentiment scores had on this model. Our findings could be used to better engage editors, potentially resulting in better article quality.

Methodology

We collected the following data for users who registered on Wikipedia during 2013:

  1. Their full edit history during the first 30 days after registration.
  2. The full edit history of their user talk page (UTP).
  3. Their aggregate number of edits up to November 12, 2014.
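
As a rough illustration of the 30-day window used above, the following sketch keeps only the events that fall within the first 30 days after a user's registration. The function name, the sample dates, and the choice of a half-open interval (registration time included, the instant exactly 30 days later excluded) are assumptions made for this example; the page does not specify how the boundary was handled.

```python
from datetime import datetime, timedelta

# Length of the observation window described in the data collection step.
WINDOW = timedelta(days=30)


def within_first_30_days(registered_at, timestamps):
    """Return the timestamps in [registered_at, registered_at + 30 days).

    The half-open boundary is an assumption of this sketch.
    """
    cutoff = registered_at + WINDOW
    return [t for t in timestamps if registered_at <= t < cutoff]


# Made-up registration date and edit timestamps, for illustration only.
registered = datetime(2013, 3, 1)
edits = [
    datetime(2013, 3, 2),
    datetime(2013, 3, 30),
    datetime(2013, 3, 31),  # exactly 30 days after registration: excluded
    datetime(2013, 4, 15),
]
kept = within_first_30_days(registered, edits)
```

The same filter would apply to both the user's own edits and the messages left on their UTP.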

Using this data we computed the following features:

  1. From the UTP messages received during the first 30 days after registration:
    • Total word count
    • Number of messages
    • Sentiment scores using Liu’s Sentiment Lexicon [5]
      • Positive
      • Negative
    • Sentiment scores using NRC Sentiment Lexicon [6]
      • Anger
      • Anticipation
      • Disgust
      • Fear
      • Joy
      • Negative
      • Positive
      • Sadness
      • Surprise
      • Trust
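
The feature list above can be sketched as a simple lexicon lookup. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the two tiny word lists are hypothetical stand-ins for the Liu and NRC lexicons (which are far larger and cover all the categories listed), the tokenizer is a plain regular expression, and each score is taken to be a raw count of matching words, which the page does not confirm.

```python
import re
from collections import Counter

# Hypothetical miniature lexicons used only for this example.
LIU = {
    "positive": {"thanks", "great", "welcome"},
    "negative": {"vandalism", "wrong"},
}
NRC = {
    "joy": {"welcome", "great"},
    "anger": {"vandalism"},
    "trust": {"thanks", "welcome"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def message_features(messages):
    """Compute word count, message count, and per-category lexicon counts."""
    tokens = [tok for message in messages for tok in tokenize(message)]
    counts = Counter(tokens)
    features = {"word_count": len(tokens), "message_count": len(messages)}
    for prefix, lexicon in (("liu", LIU), ("nrc", NRC)):
        for category, words in lexicon.items():
            # Score = number of tokens that appear in the category's word list.
            features[f"{prefix}_{category}"] = sum(counts[w] for w in words)
    return features


# Made-up talk page messages, for illustration only.
msgs = [
    "Welcome to Wikipedia! Thanks for your great edits.",
    "Please stop, this looks like vandalism.",
]
feats = message_features(msgs)
```

Note that a word may belong to several categories at once (here, "welcome" counts toward positive, joy, and trust), so the category scores are not mutually exclusive.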

Using these features we trained several models to predict editor retention and editing behavior.
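
To give a sense of how a GBM-style model could map such features to an edit count, here is a toy gradient boosting regressor built from one-split decision stumps and fitted to squared error. It is a sketch under stated assumptions, not the models trained in this project: the page names GBM but not its implementation or settings, and the feature rows, targets, number of rounds, and learning rate below are all made up for illustration.

```python
def fit_stump(X, residuals):
    """Find the single feature/threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            err = sum((r - left_mean) ** 2 for r in left)
            err += sum((r - right_mean) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, threshold, left_mean, right_mean)
    return best[1:] if best else None


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean of y, then repeatedly fit stumps to the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, left_mean, right_mean = stump
        stumps.append(stump)
        preds = [
            p + learning_rate * (left_mean if row[j] <= t else right_mean)
            for p, row in zip(preds, X)
        ]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(
        lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps
    )


# Toy rows: [messages received, positive-word count]; targets: edits next week.
X = [[0, 1], [2, 0], [5, 3], [9, 1]]
y = [0, 1, 4, 10]
model = fit_gbm(X, y)

mean_y = sum(y) / len(y)
mse_baseline = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_model = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)
```

On the toy data, the boosted model's training error falls below that of always predicting the mean; in practice a held-out set would be needed to judge predictive performance.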

Results

Project Paper: Sentiment as a Predictor of Wikipedia Editor Activity