Research:Trending articles and new editors

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


When a Wikipedia article, or more likely its subject, gets a lot of attention, more people than usual start editing Wikipedia. I will analyze how the behavior of editors who started contributing to Wikipedia by editing a trending article differs from that of editors who started with non-trending articles.

Topic

  • Is an editor less or more likely to be retained if they started contributing to Wikipedia by editing a trending article? (RQ3)

Process

  1. Gather bursts in page views using Wikistats' hourly page view counts [1], and store them as tuples of (article title, revision number, time).
  2. Extract a new editor table containing every editor's first revision from the Wikilytics datasets.
  3. Classify the new editors as 'trending' or 'non-trending' by checking whether their first revisions fall into the burst set.
  4. Analyze the differences between 'trending' and 'non-trending' editors in terms of retention, total number of edits, etc.
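Steps 2 and 3 can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names, the (editor, title, timestamp) input layout, and the simplification of the burst set to (title, hour) pairs are all assumptions made for the example.

```python
from datetime import datetime


def first_revisions(revisions):
    """Return {editor: (title, timestamp)} for each editor's earliest revision.

    `revisions` is an iterable of (editor, article_title, timestamp) tuples
    (a hypothetical layout, not the Wikilytics schema).
    """
    first = {}
    for editor, title, ts in revisions:
        if editor not in first or ts < first[editor][1]:
            first[editor] = (title, ts)
    return first


def classify_new_editors(first, burst_hours):
    """Label each editor by whether the first revision falls into a burst.

    `burst_hours` is a set of (title, hour) pairs, where hour is a datetime
    truncated to the hour (an assumed simplification of the burst tuples).
    """
    labels = {}
    for editor, (title, ts) in first.items():
        hour = ts.replace(minute=0, second=0, microsecond=0)
        in_burst = (title, hour) in burst_hours
        labels[editor] = "trending" if in_burst else "non-trending"
    return labels


# Made-up sample data for illustration only.
sample_revisions = [
    ("alice", "Foo", datetime(2011, 3, 1, 10, 15)),
    ("alice", "Bar", datetime(2011, 2, 1, 9, 0)),
    ("bob", "Foo", datetime(2011, 3, 1, 10, 45)),
]
sample_bursts = {("Foo", datetime(2011, 3, 1, 10, 0))}
sample_labels = classify_new_editors(first_revisions(sample_revisions), sample_bursts)
```

Matching on the hour keeps the lookup a simple set membership test, which seems a natural fit for hourly page view counts.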

Results and discussion

Preliminary results

I extracted the edits made in trending articles while they were trending, and calculated the percentages of new registered editors, new IP users, IP users with many contributions, and others. The table below shows how the distribution of edits differs between trending articles and all [2] articles.

The results indicate that trending articles did not attract many new registered editors. The number of edits made by new registered editors was actually lower than in the average case.

As an additional observation, I found that trending articles were edited more than 30 times as frequently as usual. Although the increase is most clearly seen in the number of 'new' IP users [3], the distribution of new/old and registered/non-registered users did not change much. Note that old registered editors account for a large percentage of edits in trending articles, even though the chart above excludes any edits made while the article was semiprotected (i.e., when the article could be edited only by registered users with a certain edit history length).

Definitions

  • Trending: An article is trending when its page view count in the last hour exceeds (3 * (linear-fitting prediction of the page view count based on the record of the previous 2 hours)).
  • New: An edit is counted as a new editor's edit if it was made within 30 days of the editor's first edit.
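The two definitions above can be expressed as a minimal sketch. It assumes that "linear fitting" over the previous 2 hours means extrapolating the straight line through those two hourly counts; the page does not spell this out, so the prediction function, the clamping at zero, and all names here are illustrative assumptions.

```python
from datetime import datetime, timedelta

TRENDING_FACTOR = 3  # multiplier taken from the definition
NEW_EDITOR_WINDOW = timedelta(days=30)  # window taken from the definition


def predict_next_hour(views_two_hours_ago, views_one_hour_ago):
    """Extrapolate the line through the two previous hourly counts.

    Assumed reading of 'linear fitting'; the result is clamped at zero
    (also an assumption) so that a falling trend never predicts negative views.
    """
    slope = views_one_hour_ago - views_two_hours_ago
    return max(views_one_hour_ago + slope, 0)


def is_trending(views_two_hours_ago, views_one_hour_ago, views_last_hour):
    """True when the last hour's count exceeds 3 times the prediction.

    With a clamped prediction of zero, any positive count qualifies.
    """
    predicted = predict_next_hour(views_two_hours_ago, views_one_hour_ago)
    return views_last_hour > TRENDING_FACTOR * predicted


def is_new_editor_edit(first_edit_time, edit_time):
    """True when the edit falls within 30 days of the editor's first edit."""
    elapsed = edit_time - first_edit_time
    return timedelta(0) <= elapsed <= NEW_EDITOR_WINDOW
```

For example, with 100 and 120 views in the two previous hours, the extrapolated prediction is 140, so the article counts as trending only if the last hour exceeds 420 views.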

Datasets

  • Trending table summarizes edits made in trending articles with the following columns:
    • title
    • page_id
    • redirect?
    • pageview timestamp (in date and hour)
    • predicted pageview
    • actual pageview
    • trending hours (the duration of the continued trending hours)
    • surprisedness (percentage of the increase from the prediction to the actual page view count)
    • revision
    • revision timestamp (in date, hour, min and seconds)
    • user type (registered user, bot, anonymous user)
    • username
    • editcount (editcount until the revision timestamp)
    • new user? (whether the user had less than 30 days of editing history as of the revision)
  • Trending-and-nontrending table
    • has the same columns as the Trending table above, except that 'predicted pageview', 'surprisedness' and 'trending hours' are filled with a dummy value.
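The two derived columns, 'surprisedness' and 'trending hours', could be computed along the following lines. This is a hedged sketch: the function names are hypothetical, and it assumes 'trending hours' is a running count of consecutive trending hours up to and including the current one, which is one possible reading of the column description.

```python
def surprisedness(predicted, actual):
    """Percentage increase from the predicted to the actual page view count.

    Returns None when the prediction is zero, since the percentage is
    undefined in that case (this handling is an assumption).
    """
    if predicted == 0:
        return None
    return (actual - predicted) / predicted * 100.0


def trending_hours(trending_flags):
    """Running length of the current trending streak for each hour.

    `trending_flags` is a list of booleans, one per hour, oldest first.
    The counter resets to 0 whenever an hour is not trending.
    """
    streak = 0
    result = []
    for flag in trending_flags:
        streak = streak + 1 if flag else 0
        result.append(streak)
    return result
```

Under these assumptions, a prediction of 100 views and an actual count of 450 gives a surprisedness of 350 percent.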

Future work

I will analyze whether new editors who joined Wikipedia by editing trending articles show distinctive behaviors, by looking at page view counts from before 2010 and contribution histories up to the present. Although the number of new editors does not differ depending on whether their first articles were trending, the style of their later contributions might be different.

Toolserver keeps old page view counts in tswiki:User-store. This dataset makes it possible to explore how editors who participated in trending articles differed after one year, in terms of activity (retention), contribution area, etc.

See also

References

  1. Wikistats aggregates the page views of Wikipedia articles on an hourly basis for recent months.
  2. For computational efficiency, I examined only the articles contained in Domas's page view counts, and discarded 99.99% at random.
  3. Note that some 'new' IP users may have long editing experience, but cannot be identified as 'old' editors because their IP addresses change.