Research:Recommending links to increase visibility of articles

Tracked in Phabricator:
task T293030
Duration:  2021-August – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


In order to support newcomers in their first edits, the Growth Team has been developing the Structured Tasks framework. Structured tasks break down the editing process into smaller steps that are easily understood, easy to use on mobile devices, and can be guided by algorithms. The first structured task to be implemented was add-a-link, which has been deployed to 4 wikis (arwiki, bnwiki, cswiki, and viwiki). Results from those wikis have been encouraging (T277355): only 6.2% of edits from recommended links were reverted. Therefore, we would like to implement other types of tasks that are part of editors’ workflows.

One idea is to further develop the structured task on adding links. The current add-a-link framework is simple (it suggests the text and the link), and priority is given to the action of adding links rather than to the value of the added link. Here, we want to add new incoming links to articles in order to increase their visibility. For example, there still exist many orphan articles, i.e. articles without any incoming links, which cannot be reached from any other Wikipedia page. This is a much more difficult editing task, since we have to add the link to our target article to the text of a different article, the source article.

The aims of the project:

  • Understand better which articles require new links to address structural biases in Wikipedia
  • Develop an algorithm for a structured task to suggest links to orphan (or similar) articles to increase their visibility

Background

(Zhu et al., 2020) [1] show that improving articles as part of campaigns can lead to significant, substantial, and long-term increases in both content consumption and subsequent contributions. More importantly in this context, they find significant spillover effects: attention also increases for downstream hyperlinked articles.

(Wagner et al. 2015; Wagner et al. 2016) [2][3] investigated the gender gap in the content of Wikipedia articles. They showed that, in addition to an underrepresentation of women in the number of articles, there are also substantial structural biases in the way articles on women are connected in the hyperlink network. For example, women's biographies are less central in the network, as quantified by their consistently lower PageRank values. This results in lower visibility. (Langrock & González-Bailón 2020) [4] systematically investigate how campaigns such as Art+Feminism are able to address these biases. They find that the campaigns are generally successful at improving the content of a target page, but fail to improve its visibility (number of inlinks).
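
To make the notion of centrality concrete, the following is a minimal sketch of PageRank computed by power iteration on a toy link graph. It is not the setup used in the cited studies; the function name `pagerank`, the damping factor of 0.85, and the toy pages are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its PageRank score.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its rank uniformly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: page "D" links out, but no page links to "D".
toy_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph, the page without incoming links ends up with the lowest score, because it only receives the uniform random-jump share. This illustrates why missing inlinks translate into lower visibility.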

There is an {{Orphan}} maintenance template that tracks articles that are not linked from any other article (i.e. articles without incoming links). The category Category:Orphaned_articles lists these articles. As of 2021-08-20, there are about 90k articles listed in this category. The template mentions the Find link tool, though for the few examples I tried, it did not yield any suggestions.
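
As an illustration of the definition above, here is a minimal sketch that finds articles without incoming links in a small edge list. The function name `find_orphans` and the toy data are hypothetical. The criteria used on a given wiki may differ, for example in how redirects or disambiguation pages are treated.

```python
def find_orphans(articles, links):
    """Return the articles that have no incoming link from another article.

    articles: iterable of article titles.
    links: iterable of (source_title, target_title) pairs.
    """
    # A self-link does not make an article reachable from elsewhere.
    linked = {target for source, target in links if source != target}
    return sorted(title for title in articles if title not in linked)


# Hypothetical toy data.
articles = ["Alpha", "Beta", "Gamma", "Delta"]
links = [
    ("Alpha", "Beta"),
    ("Beta", "Alpha"),
    ("Gamma", "Gamma"),  # self-link only
    ("Delta", "Alpha"),
]
orphans = find_orphans(articles, links)
print(orphans)
```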

Takeaways:

  • Incoming links are important for the visibility of articles
  • There are many articles that lack incoming links, either as part of a structural bias or because they are simply orphans
  • In contrast to other biases, these biases are not addressed as successfully by existing campaigns
  • Machine learning algorithms can help empower editors to address these issues by generating good recommendations

Methods

Recommending links to increase the visibility of articles can be broken down into 3 steps:

  • Identify articles that are lacking incoming links (these are the target pages of the new links, for example orphan articles)
  • Identify candidate articles from which to link to these articles (these are the source pages for the new links)
  • Identify potential locations in the text of the source page at which to insert the link to the target page. These might be specific words, sentences, or sections where we assume the link should be added. This is most likely very challenging, as suitable anchor text for the link might not yet exist. Thus, adding the links will probably also involve adding some text.
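
The third step can be sketched in its simplest form as a search for sentences in the source page that already mention the title of the target page. This is only a naive baseline under the assumption that the anchor text equals the target title. The function name `find_anchor_candidates` and the sample text are hypothetical, and the sketch does not cover the harder case noted above, in which no suitable anchor text exists yet.

```python
import re


def find_anchor_candidates(source_text, target_title):
    """Return (sentence_index, sentence) pairs that mention target_title.

    Sentences are split naively on ., ! or ? followed by whitespace.
    Matching is case-insensitive and only accepts whole-word matches.
    """
    sentences = re.split(r"(?<=[.!?])\s+", source_text.strip())
    pattern = re.compile(
        r"(?<!\w)" + re.escape(target_title) + r"(?!\w)",
        re.IGNORECASE,
    )
    return [
        (index, sentence)
        for index, sentence in enumerate(sentences)
        if pattern.search(sentence)
    ]


# Hypothetical plain text of a source page.
source_text = (
    "Marie Curie was a physicist. "
    "She worked with Pierre Curie in Paris. "
    "Radium was discovered in 1898."
)
candidates = find_anchor_candidates(source_text, "Pierre Curie")
print(candidates)
```

An empty result indicates that the title does not appear verbatim, in which case adding the link would also involve adding some text.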

Timeline

  • Exploratory analysis
  • Developing a prototype model and evaluation
  • Potential refinement

Results

Approach 1: Link-translation for orphan articles

As a first prototype, we consider a simple approach to this problem:

  • We restrict ourselves to orphan articles as target pages. Without any incoming links, those articles are not visible from within Wikipedia; thus adding any incoming link will increase their visibility.
  • We generate candidate links by inspecting all other language versions of Wikipedia. Specifically, we check whether there is an existing link to the target page in any of the other Wikipedias. If yes, we identify the matching source article in the language of interest and recommend that link. This corresponds to "translating" an existing link from one language version to another.
  • (optional) Recommend the translated section. Since we recommend existing links from other languages, we can recommend a suitable location for that link in the text. For example, we first identify the section title where the already existing link is located. Using, e.g., the section alignment tool, we can then identify a suitable section in the language of interest.
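
The first two steps above can be sketched as follows. This is an illustrative sketch, not the project's implementation. It assumes that articles in different languages are matched through a shared item identifier (in the style of Wikidata sitelinks), which is one possible way to find "the matching article". The function name `translate_links`, the identifiers `Q1` to `Q3`, and all titles are hypothetical.

```python
def translate_links(target_item, target_wiki, links_by_wiki, sitelinks):
    """Suggest source articles in target_wiki that could link to target_item.

    links_by_wiki: dict wiki -> set of (source_title, target_title) links.
    sitelinks: dict item_id -> dict wiki -> article title (assumed mapping).
    Returns a sorted list of (source_title_in_target_wiki, wiki_of_origin).
    """
    # Reverse lookup: (wiki, title) -> item identifier.
    item_of = {
        (wiki, title): item
        for item, titles in sitelinks.items()
        for wiki, title in titles.items()
    }
    local_target = sitelinks[target_item].get(target_wiki)
    # Sources in target_wiki that already link to the target page.
    existing = {
        source
        for source, target in links_by_wiki.get(target_wiki, set())
        if target == local_target
    }

    suggestions = set()
    for wiki, links in links_by_wiki.items():
        if wiki == target_wiki:
            continue
        foreign_target = sitelinks[target_item].get(wiki)
        if foreign_target is None:
            continue  # the target page has no article in this wiki
        for source, target in links:
            if target != foreign_target:
                continue
            source_item = item_of.get((wiki, source))
            if source_item is None or source_item == target_item:
                continue
            local_source = sitelinks[source_item].get(target_wiki)
            # Skip sources without a local article or with an existing link.
            if local_source is not None and local_source not in existing:
                suggestions.add((local_source, wiki))
    return sorted(suggestions)


# Hypothetical toy data: "Orphan Topic" has no incoming links in enwiki.
sitelinks = {
    "Q1": {"enwiki": "Orphan Topic", "dewiki": "Waisenthema"},
    "Q2": {"enwiki": "Parent Topic", "dewiki": "Elternthema"},
    "Q3": {"dewiki": "Nur Deutsch"},  # no enwiki article
}
links_by_wiki = {
    "enwiki": set(),
    "dewiki": {
        ("Elternthema", "Waisenthema"),
        ("Nur Deutsch", "Waisenthema"),
    },
}
suggestions = translate_links("Q1", "enwiki", links_by_wiki, sitelinks)
print(suggestions)
```

Only the link from "Elternthema" can be translated here, because the other source article has no counterpart in the target wiki.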

Detailed progress is captured in the Link-translation subpage.

References

  1. Zhu, K., Walker, D., & Muchnik, L. (2020). Content Growth and Attention Contagion in Information Networks: Addressing Information Poverty on Wikipedia. Information Systems Research, 31(2), 491–509. https://doi.org/10.1287/isre.2019.0899
  2. Wagner, C., Garcia, D., Jadidi, M., & Strohmaier, M. (2015). It's a man's Wikipedia? Assessing gender inequality in an online encyclopedia. Ninth International AAAI Conference on Web and Social Media. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewPaper/10585
  3. Wagner, C., Graells-Garrido, E., Garcia, D., & Menczer, F. (2016). Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science, 5(1), 1–24. https://doi.org/10.1140/epjds/s13688-016-0066-4
  4. Langrock, I., & González-Bailón, S. (2020). The Gender Divide in Wikipedia: A Computational Approach to Assessing the Impact of Two Feminist Interventions. https://doi.org/10.2139/ssrn.3739176