Grants:IdeaLab/Automated good-faith newcomer detection

Automated good-faith newcomer detection
Build and deploy a machine learning model for flagging newcomers who are editing in good faith. This has the potential to mitigate some of the secondary, demotivating effects that arise when good-faith newcomers' work passes through curation and review processes.
idea creator
this project needs...
community organizer
created on 23:10, 29 February 2016 (UTC)

Project idea

What is the problem you're trying to solve?

Recent changes patrolling, review, and warning templates have become common in curation practice on large wikis. Research suggests that when good-faith newcomers interact with these quality control processes, they tend to be demotivated and frustrated by the lack of personal human interaction.[1] But efficiency is a serious concern. For example, English Wikipedia is edited 160k times per day. Reviewing edits at that scale requires that we minimize the amount of time that patrollers need to spend on each edit (and therefore on the editor who contributed it). This is why warning templates and high-speed automation have become so popular. Having a real human interaction with every new editor whose contribution was rejected would require an overwhelming additional investment of time and energy.

What is your solution?

We know how to predict which new editors are likely to be editing in good faith.[2] If the predictions of such a model were available as a service, automated quality control tools like Huggle could incorporate the prediction signal so that, when a Huggle user reverts a newcomer who is likely to be editing in good faith, the Huggle user is interrupted and encouraged to take a little extra time to have a personal interaction.

Further, such a model could feed into systems like HostBot that route good-faith newcomers to newcomer support spaces like the Teahouse.
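
As a rough illustration of the two uses described above, a tool could gate both behaviours on a single good-faith score. This is only a sketch under stated assumptions: the function names, the 0.8 threshold, and the idea that the score arrives as a probability between 0 and 1 are all hypothetical, and nothing here reflects the actual Huggle or HostBot code.

```python
# Hypothetical cut-off; a real deployment would tune this against labelled data.
GOOD_FAITH_THRESHOLD = 0.8


def should_prompt_personal_message(is_newcomer, good_faith_probability,
                                   threshold=GOOD_FAITH_THRESHOLD):
    """Return True when a patroller reverting this editor should be asked
    to leave a personal note instead of only a warning template."""
    return is_newcomer and good_faith_probability >= threshold


def should_invite_to_support_space(is_newcomer, good_faith_probability,
                                   threshold=GOOD_FAITH_THRESHOLD):
    """Return True when an invitation bot could route this editor to a
    newcomer support space such as the Teahouse."""
    return is_newcomer and good_faith_probability >= threshold


# Example: a newcomer scored as very likely good-faith triggers both behaviours.
print(should_prompt_personal_message(True, 0.93))
print(should_invite_to_support_space(True, 0.93))
```

Keeping the threshold as a parameter would let each tool choose its own trade-off between extra patroller effort and missed good-faith newcomers.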

There are many potential use-cases for such a prediction model. By releasing a prediction model for good-faith newcomers, we could empower a whole class of new tools directed towards supporting good-faith new editors and helping to target them for socialization & training (as opposed to rejection and warning).

Damage prediction scores. A histogram of STiki scores for main namespace edits by newcomers is plotted for undesirable (0) and desirable (1) newcomers.
Theoretical model. Two beta distributions are fit based on empirical observations of STiki scores for undesirable (0) and desirable (1) newcomers
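
The theoretical model in the caption above can be sketched in a few lines. The sketch below assumes a method-of-moments fit for each beta distribution and assumes that a newcomer's edit scores are combined with Bayes' rule as if they were independent; neither choice is confirmed by this page, and the sample scores are invented purely for illustration. The function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_beta(scores):
    """Method-of-moments estimate of (alpha, beta) for scores in (0, 1)."""
    m = mean(scores)
    v = pvariance(scores)
    # The moment equations only have a solution when 0 < v < m * (1 - m).
    if not 0 < v < m * (1 - m):
        raise ValueError("scores are not compatible with a beta distribution")
    k = m * (1 - m) / v - 1
    return m * k, (1 - m) * k


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def prob_desirable(edit_scores, good, bad, prior_good=0.5):
    """Posterior probability that a newcomer is desirable, treating each
    edit's damage score as an independent draw (an assumption)."""
    log_good = math.log(prior_good)
    log_bad = math.log(1 - prior_good)
    for s in edit_scores:
        s = min(max(s, 1e-6), 1 - 1e-6)  # keep log() away from 0 and 1
        log_good += beta_logpdf(s, *good)
        log_bad += beta_logpdf(s, *bad)
    top = max(log_good, log_bad)  # normalise in log space to avoid underflow
    g = math.exp(log_good - top)
    b = math.exp(log_bad - top)
    return g / (g + b)


# Invented damage scores (higher means more likely damaging), not real data.
desirable_scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.08]
undesirable_scores = [0.55, 0.70, 0.80, 0.90, 0.65, 0.85]

good_params = fit_beta(desirable_scores)
bad_params = fit_beta(undesirable_scores)
print(prob_desirable([0.10, 0.20], good_params, bad_params))
```

With more edits per newcomer the posterior moves further from the prior, which is why a score based on several edits is likely to be more decisive than one based on a single edit.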


Get Involved

About the idea creator

I'm a Senior Research Scientist at the Wikimedia Foundation. I've personally performed research studies around the retention of newcomers, the quality of new article creations, and the design of machine learning tools for supporting curation practices.



  • endorse we need to move away from automatic reversion based on source, and move to assessment and rejection based on quality of edit. there are some semi-automatic welcomes, but automating this process will free up the humans to do other tasks. Slowking4 (talk) 15:30, 1 March 2016 (UTC)
  • Critical to editor uptake. Any tools which allow identification of wheat from chaff a good idea Casliber (talk) 02:51, 3 March 2016 (UTC)
  • Endorse; I think the idea could even be expanded to guiding detected newbies, but this is a good start. {{Nihiltres |talk |edits}} 16:54, 8 March 2016 (UTC)
  • Not entirely clear on how the detection will work in practice, but if it works, it will be quite valuable Sphilbrick (talk) 14:52, 15 March 2016 (UTC)
  • I really like the idea as it causes more newcomers to stay on the site. Muchotreeo (talk) 15:19, 16 March 2016 (UTC)

  1. m:R:The Rise and Decline
  2. Halfaker, A., Geiger, R. S., & Terveen, L. G. (2014, April). Snuggle: Designing for efficient socialization and ideological critique. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 311-320). ACM.