Research:Peer Production and the Urban-Rural Divide
Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. We explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on the local area, and is more likely to have been generated by automated software agents (i.e. “bots”). We continue to explore and codify the systemic challenges inherent to characterizing rural phenomena through peer production as well as discuss potential solutions.
We undertake a quantitative examination of trends in the quantity, process of creation, and quality of geographic Wikipedia articles. We focus our analyses on geographic articles in English Wikipedia (i.e. articles tagged with latitude and longitude coordinates, such as Chicago) that are located in the contiguous United States, but we also examine geographic articles in Chinese Wikipedia that are located in China. We aggregate all of our metrics to the county level in the United States and the precinct level in China. We use spatial regressions to explore the relationship between these metrics and the urban-rural divide while controlling for several other important socioeconomic status indicators such as income and race. We examine the following metrics (see the paper for descriptions of each metric):
- Quantity: # of articles per capita
- Quantity: article length (bytes) per capita
- Quantity: outlinks per capita
- Process: % of tokens that are produced by editors with a "local focus"
- Process: # of tokens that are produced by editors with a "local focus" per capita
- Process: % of tokens that are produced by human editors (i.e. not bots or scripts)
- Process: # of tokens that are produced by human editors per capita
- Quality: % of articles that are C-class or higher
- Quality: # of articles that are C-class or higher per capita
- Quality: outlink entropy
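As a rough illustration of how metrics like those listed above could be computed for a single county, here is a minimal Python sketch. It is not the code used in the study: the helper names, the input values, and in particular the definition of outlink entropy (Shannon entropy over the distribution of outlink targets) are assumptions made for illustration. The paper gives the authoritative definitions.

```python
import math
from collections import Counter


def per_capita(total, population):
    """Normalize a county-level total by population (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total / population


def percent(part, whole):
    """Share of `part` in `whole` as a percentage; 0.0 if `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`.

    Assumption: used here as a stand-in for "outlink entropy", with
    `items` being the targets of a county's article outlinks.
    """
    counts = Counter(items)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical county record; every value is made up for illustration.
county = {
    "population": 20000,
    "num_articles": 12,
    "tokens_total": 9000,
    "tokens_human": 4500,
    "outlinks": ["Illinois", "Chicago", "Illinois", "United States"],
}

articles_per_capita = per_capita(county["num_articles"], county["population"])
pct_human_tokens = percent(county["tokens_human"], county["tokens_total"])
outlink_entropy = shannon_entropy(county["outlinks"])
```

Under this assumed definition, a county whose articles all link to the same target has an entropy of zero, while links spread evenly across many targets yield a higher value.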
Timeline
- Fall 2015: examination of high-level trends in quantity, process, and quality metrics across the urban-rural divide.
- Spring 2017: further exploration.
Policy, Ethics and Human Subjects Research
Our work is a quantitative analysis of Wikipedia dump files. Because we have not worked directly with any Wikipedians or examined any private data, we have not needed IRB approval.
The high-level results from our first exploration are as follows:
- Peer-produced content about urban areas is of higher quality than content about rural areas. This difference persists across two countries with very different human geographies and across two peer production communities (we also studied analogous metrics in OpenStreetMap in both the United States and China).
- We find that bot and batch-editing tools play a critical role in ensuring that there is any content at all about some rural areas.
- Through our multivariable spatial regressions, we find that these biases in quantity, process, and quality also appear to be related to political leanings, education, and profession.
- Johnson, I. L., Lin, Y., Li, T. J. J., Hall, A., Halfaker, A., Schöning, J., & Hecht, B. (2016, May). Not at home on the range: Peer production and the urban/rural divide. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 13-25). ACM.