Research:Peer Production and the Urban-Rural Divide

Created

13:53, 27 January 2017 (UTC)

Contact

Isaac Johnson

Northwestern University

Collaborators

Yilun Lin

Northwestern University

Toby Li

Carnegie Mellon University

Andrew Hall

University of Minnesota

Aaron Halfaker

Wikimedia Foundation

Johannes Schöning

University of Bremen

Brent Hecht

Northwestern University

Duration: 2015-March – ??

Open access
via www-users.cs.umn.edu ^[1]

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. We explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on the local area, and is more likely to have been generated by automated software agents (i.e. “bots”). We continue to explore and codify the systemic challenges inherent to characterizing rural phenomena through peer production as well as discuss potential solutions.

Methods

We undertake a quantitative examination of trends in the quantity, process of creation, and quality of geographic Wikipedia articles. We focus our analyses on geographic articles (i.e. articles tagged with latitude and longitude coordinates, such as the article on Chicago) in English Wikipedia that are located in the contiguous United States, but we also examine geographic articles in Chinese Wikipedia that are located in China. We aggregate all of our metrics to the county level in the United States and the precinct level in China. We use spatial regressions to explore the relationship between these metrics and the urban-rural divide while controlling for several other important sociodemographic indicators such as income and race. We examine the following metrics (see the paper for descriptions of each metric):

Quantity: # of articles per capita
Quantity: article length (bytes) per capita
Quantity: outlinks per capita
Process: % of tokens that are produced by editors with a "local focus"
Process: # of tokens that are produced by editors with a "local focus" per capita
Process: % of tokens that are produced by human editors (i.e. not bots or scripts)
Process: # of tokens that are produced by human editors per capita
Quality: % of articles that are C-class or higher
Quality: # of articles that are C-class or higher per capita
Quality: outlink entropy
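
As a rough illustration of how article-level data could be rolled up into the county-level metrics listed above, the following is a minimal sketch and not the project's actual pipeline. The record fields (county, bytes, quality, tokens_total, tokens_human, tokens_local, outlinks), the sample values, and the function names are all hypothetical. In particular, outlink entropy is computed here as the Shannon entropy of the distribution of outlink targets, which is only one plausible reading; see the paper for the definition actually used.

```python
import math
from collections import Counter, defaultdict

# English Wikipedia assessment classes, ordered from lowest to highest.
CLASS_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]


def is_c_or_higher(quality):
    """Return True if the assessment class is C-class or better."""
    return CLASS_ORDER.index(quality) >= CLASS_ORDER.index("C")


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def county_metrics(articles, population):
    """Aggregate hypothetical article records into per-county metrics."""
    by_county = defaultdict(list)
    for article in articles:
        by_county[article["county"]].append(article)

    results = {}
    for county, arts in by_county.items():
        pop = population[county]
        total_tokens = sum(a["tokens_total"] for a in arts)
        human_tokens = sum(a["tokens_human"] for a in arts)
        local_tokens = sum(a["tokens_local"] for a in arts)
        c_plus = sum(1 for a in arts if is_c_or_higher(a["quality"]))
        outlinks = [target for a in arts for target in a["outlinks"]]
        results[county] = {
            # Quantity
            "articles_per_capita": len(arts) / pop,
            "bytes_per_capita": sum(a["bytes"] for a in arts) / pop,
            "outlinks_per_capita": len(outlinks) / pop,
            # Process
            "pct_local_tokens": local_tokens / total_tokens if total_tokens else 0.0,
            "local_tokens_per_capita": local_tokens / pop,
            "pct_human_tokens": human_tokens / total_tokens if total_tokens else 0.0,
            "human_tokens_per_capita": human_tokens / pop,
            # Quality
            "pct_c_or_higher": c_plus / len(arts),
            "c_or_higher_per_capita": c_plus / pop,
            "outlink_entropy": shannon_entropy(outlinks),
        }
    return results


# Made-up sample data for two hypothetical counties.
ARTICLES = [
    {"county": "urban", "bytes": 4000, "quality": "B", "tokens_total": 1000,
     "tokens_human": 900, "tokens_local": 500, "outlinks": ["X", "Y", "X", "Z"]},
    {"county": "urban", "bytes": 1000, "quality": "Stub", "tokens_total": 200,
     "tokens_human": 100, "tokens_local": 0, "outlinks": ["Y"]},
    {"county": "rural", "bytes": 500, "quality": "Stub", "tokens_total": 100,
     "tokens_human": 10, "tokens_local": 0, "outlinks": ["X", "X"]},
]
POPULATION = {"urban": 1000, "rural": 100}

metrics = county_metrics(ARTICLES, POPULATION)
for county, values in sorted(metrics.items()):
    print(county, values)
```

The resulting per-county values would then serve as the dependent variables in the spatial regressions, with the urban-rural measure and the control variables as predictors.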

Timeline

Fall 2015: examination of high-level trends in quantity, process, and quality metrics across the urban-rural divide.
Spring 2017: further exploration.

Policy, Ethics and Human Subjects Research

Our work is a quantitative analysis of Wikipedia dump files. Because we have not worked directly with any Wikipedians or examined any private data, we have not needed IRB approval.

Results

The high-level results from our first exploration are as follows:

Peer-produced content about urban areas is of higher quality than content about rural areas. This difference persists across two countries with very different human geographies and across two peer production communities (we also studied analogous metrics in OpenStreetMap in both the United States and China).
Bots and batch-editing tools play a critical role in ensuring that there is any content at all about some rural areas.
Our multivariable spatial regressions indicate that these biases in quantity, process, and quality also appear to be related to political leanings, education, and profession.

References

Johnson, I., Lin, Y., Li, T., Hall, A., Halfaker, A., Schöning, J., and Hecht, B. Not at Home on the Range: Peer Production and the Urban/Rural Divide. Proceedings of CHI 2016. New York: ACM Press.

[1] Johnson, I. L., Lin, Y., Li, T. J. J., Hall, A., Halfaker, A., Schöning, J., & Hecht, B. (2016, May). Not at home on the range: Peer production and the urban/rural divide. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 13-25). ACM.
