Research:Prioritization of Wikipedia Articles/Recommendation/Newcomer Tasks

Newcomer Tasks is a module rolled out to various Wikipedia language editions with the aim of providing structure/support for new editors. As part of the module, editors are shown various cards that each represent a given Wikipedia article and action they could take to improve the article -- e.g., add a link, add references, copyedit.[1] These recommendations are not prioritized but are subject to various filters as described below.

The nature of these filters makes simulating the pipeline difficult. Fortunately, the Newcomer Tasks pipeline is well-logged, so it is possible to gather high-quality data about the relevant filters, which recommendations are seen (and by whom), which recommendations are clicked on (and by whom), and which edits are actually made. Full analysis can be found here: https://github.com/geohci/wiki-prioritization/blob/master/recommendation_evaluation/newcomer_tasks/NewcomerTasks.ipynb

Newcomer Task Filters

The first set of filters is task-related. Each article recommendation is associated with a task. Only certain articles on a given wiki will be associated with tasks -- either because they have a certain template or because they were identified by a machine learning model. This subset of articles with tasks is not a representative sample of that wiki, but it would be complicated to ascertain exactly how it is biased, so we rely on logs of task impressions to identify the impact of these task-related filters. Editors can select which tasks they are interested in -- e.g., just copyediting and link recommendation -- but any correlation between task filter and content equity is largely happenstance. That said, these task filters generally remove higher-quality articles and focus attention on stubs, which on many wikis can display different biases than the overall wiki.

The second set of filters is user-selected (and of more interest to this project). Each editor has the opportunity to identify specific topic areas of interest. These topic areas are very directly connected to content equity. Some are explicit (geographic topics or the women topic), but most others are still implicitly connected to content equity, ranging from sports (which greatly overrepresents men) to music (which has a much more balanced representation of gender identities).

Not all editors use these topic filters so it's possible to divide the data into topic-filtered tasks and tasks that were not topic-filtered. This shows the general impact of making topic filters available. Individual topics can then be inspected to determine which ones increased bias and which ones supported more equitable content.
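The split described above can be sketched as follows. This is a minimal illustration, not the code from the linked notebook: the records and the field names (`topic_filter`, `gender`) are hypothetical and do not reflect the real log schema.

```python
from collections import Counter

# Hypothetical impression records; field names and values are illustrative only.
impressions = [
    {"article": "A", "topic_filter": ["sports"], "gender": "man"},
    {"article": "B", "topic_filter": [], "gender": "woman"},
    {"article": "C", "topic_filter": ["women"], "gender": "woman"},
    {"article": "D", "topic_filter": [], "gender": "man"},
    {"article": "E", "topic_filter": [], "gender": "man"},
]

def gender_share(records):
    """Return each gender's proportion among the given records."""
    counts = Counter(r["gender"] for r in records)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}

# An empty topic_filter list means the editor did not select any topics.
filtered = [r for r in impressions if r["topic_filter"]]
unfiltered = [r for r in impressions if not r["topic_filter"]]

share_filtered = gender_share(filtered)
share_unfiltered = gender_share(unfiltered)
```

Comparing `share_filtered` with `share_unfiltered` gives the overall effect of offering topic filters; grouping the filtered records by individual topic would give the per-topic breakdown.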

Newcomer Task Results

For gender, the topic filters led to a slightly more equitable distribution of content (generally a few percentage points). There were no clear correlations between stages of the funnel (impression -> click -> edit) and gender equity, suggesting minimal selection bias by the users with regard to gender. This mirrors findings from SuggestedEdits. Of the topic filters available, the women and arts topics had the largest positive impact on gender equity, the culture and history topics largely supported the status quo (though some sub-topics of those have different impacts), the sports and STEM topics skewed content even more towards men, and geographic topics had mixed effects. Notably, there were no topics that directly supported transgender and non-binary identities, and thus we did not see major numbers of edits to these biographies.

For geography, we saw slightly different results. Topic filtering substantially biased interactions towards content about the United States, United Kingdom, and Japan (generally this would be viewed as decreasing the diversity of content, though in some languages these are not the dominant regions). This bias is a strong artifact of the current approach to topics: ORES topics are only calculated for English-language articles and then propagated to other languages, which restricts any topic-filtered content to articles that also exist on English Wikipedia. Language-agnostic models would presumably not have this effect.
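A toy sketch of why this propagation restricts topic-filtered content is given below. It assumes topics are copied from the English article to its counterpart in another language; the mapping, titles, and topic labels are all hypothetical, and the real pipeline may work differently.

```python
# Hypothetical topic labels predicted for English Wikipedia articles.
english_topics = {
    "Tokyo": ["asia"],
    "Serena Williams": ["sports", "women"],
}

# Hypothetical mapping from local-wiki titles to their English counterpart
# (None when no English article exists).
local_to_english = {
    "Tokio": "Tokyo",
    "Serena Williams": "Serena Williams",
    "Local-only village": None,
}

def propagate_topics(local_to_english, english_topics):
    """Copy topics from the English article; no counterpart means no topics."""
    return {
        local: list(english_topics.get(en, [])) if en else []
        for local, en in local_to_english.items()
    }

def matches_filter(article_topics, selected_topics):
    """True if the article has at least one of the editor's selected topics."""
    return bool(set(article_topics) & set(selected_topics))

local_topics = propagate_topics(local_to_english, english_topics)

# An article with no English counterpart can never match a topic filter.
asia_candidates = [t for t, topics in local_topics.items()
                   if matches_filter(topics, ["asia"])]
```

Under these assumptions, "Local-only village" has an empty topic list and is therefore excluded from every topic-filtered recommendation, whatever its actual subject.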

Geographic biases also seemed to grow larger with each step of the funnel (impression -> click -> edit). It's interesting that this effect shows up for geography but not gender. This suggests that while some editors are selective about which regions/cultures they edit about, editors are generally less selective about the gender of the subject. Put another way, interventions to improve content about women or non-binary gender identities have a higher chance of broad adoption than interventions to diversify the geographic representation of content. It also suggests that the (lack of) diversity of the editor population may be a larger barrier to the geographic diversity of content than it is to gender equity.
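The funnel comparison can be illustrated with a small sketch that computes one region's share of content at each stage. The events and the field names (`region`, `clicked`, `edited`) are made up for illustration and are not drawn from the actual logs or results.

```python
# Hypothetical funnel events: every record is an impression, which may
# also have been clicked and then edited.
events = [
    {"region": "United States", "clicked": True, "edited": True},
    {"region": "United States", "clicked": True, "edited": False},
    {"region": "Nigeria", "clicked": True, "edited": False},
    {"region": "Nigeria", "clicked": False, "edited": False},
    {"region": "Japan", "clicked": False, "edited": False},
]

# Predicate deciding whether an event reached a given funnel stage.
STAGES = {
    "impression": lambda e: True,
    "click": lambda e: e["clicked"],
    "edit": lambda e: e["edited"],
}

def region_share_by_stage(events, region):
    """Share of events about `region` among all events reaching each stage."""
    shares = {}
    for stage, reached in STAGES.items():
        at_stage = [e for e in events if reached(e)]
        hits = sum(1 for e in at_stage if e["region"] == region)
        shares[stage] = hits / len(at_stage) if at_stage else None
    return shares

us_shares = region_share_by_stage(events, "United States")
```

A share that rises from impression to click to edit, as in this toy data, is the pattern described above for geography; a roughly flat share across stages is the pattern described for gender.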

Notes/References

  1. Some of these tasks are more structured while others are based on templates for the page but less guided in the specific changes to be made. See e.g., en:MediaWiki:NewcomerTasks.json for an example of which templates are used for which tasks.