Research:Why readers trust Wikipedia

Tracked in Phabricator: Task T198659

Created: 23:07, 31 May 2018 (UTC)

Contact: Jonathan Morgan (Wikimedia Foundation)

Collaborators: Andrea Forte (Drexel University), Houda Elmimouni (Drexel University), Isaac Johnson (Wikimedia Foundation)

Duration: 2018-06 – 2019-10

This page documents a completed research project.

As part of an ongoing effort to assess and improve the integrity of the knowledge captured in Wikimedia projects, this project involves research into the role that citations (references) play in helping Wikipedia readers achieve their learning goals—i.e. the ostensible reason that they have chosen to read a Wikipedia article in the first place.

Because trust in the credibility and correctness of information is necessary for learning, a major question for this research is what characteristics of Wikipedia articles—including, but not limited to, citations—factor into readers' credibility judgements.

Background

Previous research suggests that citations can help readers achieve their learning goals in at least two ways:

Increase trust by providing them with evidence that the information about a particular topic on Wikipedia is credible, and
Support exploration by providing them with signposts to additional information resources about that topic

However, it is not clear whether citations are a major factor in readers' credibility judgements (their decisions to believe, and act on, the information they read in Wikipedia), or whether other content, context, or structural factors are as important as, or more important than, extensive citation of reliable sources.

Learning more about the role that citations play in readers' experience of Wikipedia can help Wikimedia contributors focus and prioritize their sourcing efforts and help Wikimedia Foundation product teams build features and functionality to support readers’ learning goals and their digital literacy.

Methods

Literature review: This study will begin with a review of existing academic literature related to reader perceptions of the trustworthiness of web content (including Wikipedia content) and the ways that readers use Wikipedia content to learn in both formal and informal learning contexts.

Surveys: Subsequent to that analysis, the researchers will develop a survey instrument, based on the one developed for Research:Characterizing Wikipedia Reader Behavior, supplemented with additional questions focused on gaining insight into the factors that mediate readers' perceptions of a Wikipedia article's quality and credibility. A link to the survey will be deployed to a sample of readers on English Wikipedia, when they view articles, using the QuickSurveys extension.

Interviews: Ethnographic interviews will be conducted with survey respondents who indicate their willingness to be contacted for follow-up discussions. These interviews will help the researchers gain a deeper understanding of the factors that mediate a reader's trust of Wikipedia content, including but not limited to citations (e.g. when, how, and why they use citations). These interviews will also help tease out which characteristics of a Wikipedia reader or their context of use (information needs, available technology, profession, education level, sociocultural background) are associated with likelihood to engage with citations, and what role other article features play when readers evaluate the accuracy, neutrality, or usefulness of Wikipedia content. See also: Study information for interview participants.

Timeline

July-September 2018: background research, study plan
October 2018 - January 2019: develop, deploy and analyze first survey on English Wikipedia; develop taxonomy of reasons for trusting Wikipedia content.
February - May 2019: deploy second-round survey on English Wikipedia and analyze data
July - December 2019: recruit for and begin conducting follow-up interviews
January - March 2020: analyze interview results; report and publish project findings

Results

First round survey

The survey ran on English Wikipedia from 2019/1/7 to 2019/1/9. We received 297 complete survey responses (425 responses total). Of the 359 responses for which we have platform data, 133 (37%) came from the desktop site and 226 (63%) came from the mobile site (en.m.wikipedia.org). Completion rate was slightly higher for respondents on desktop (74%) than mobile (69%).

Descriptive statistics

Q A.1 Why are you reading this article today?
to get an overview of the topic	47%
to look up a specific fact or to get a quick answer	25%
to get an in-depth understanding of the topic	27%

Q A.2 Prior to visiting this article today, how familiar were you with the topic of this article?
Extremely familiar	8%
Very familiar	11%
Moderately familiar	29%
Slightly familiar	28%
Not familiar at all	22%

Q A.3 In general, how much do you trust the information you read on Wikipedia?
a great deal	40%
a lot	48%
a moderate amount	10%
a little	2%
not at all	<1%

Q A.5 How much do you trust the information in the article you are reading right now?
a great deal	33%
a lot	61%
a moderate amount	5%
a little	1%
not at all	<1%

Trust taxonomy

The responses to the free-text "why do you trust [Wikipedia|this article|this specific fact]?" questions (Q's A.4, A.6, and A.8) were combined and analyzed to develop a taxonomy of factors that contribute to respondents' assessments of the trustworthiness of Wikipedia content. The taxonomy, and information about the process for developing these categories, can be found at Research:The role of citations in how readers evaluate Wikipedia articles/Trust taxonomy.

These categories were used to develop the trust component questions in the second survey (sections B-D).

Taxonomy of reasons for (dis)trusting Wikipedia articles

Prior experience: Assessments based on the reader's personal experience with the content of Wikipedia

Direct familiarity: Degree that info the reader was looking for, or other info in the article, matches their prior knowledge of the subject
Wikipedia familiarity: Degree that info on Wikipedia in general matches reader's prior knowledge

Citations and external links: Assessments based on the prevalence or characteristics of cited sources or other external links

Presence of sources: Whether the article contains (any) sources
Number of sources: How many sources the article has
Perceived authoritativeness of sources: The reputation or ethos of cited sources
Accessibility of sources: The degree to which the information in the article may be independently verified by checking the cited sources

Prose style: Assessments based on the textual characteristics or writing style of the article

Authoritative tone: Degree to which the tone of the article is professional or suggests expertise
Neutral tone: Degree to which the article contains biased or opinion-based language

Risk of incorrectness: Assessments based on the reader's judgement of the likelihood that this information could be wrong or misleading

Topic coverage: Perceived availability of reliable information on the topic in external sources
Atomic information: Degree to which the information sought is simple or unambiguous
Motivation for bias: Perceived likelihood that an unknown author would want to present wrong or misleading information on the topic
Potential for bias: Degree to which information on this topic could be presented in a wrong or misleading way

Article structure: Assessments based on the overall size, coverage, or structure of the article

Perceived comprehensiveness: Degree to which the article presents all relevant information on the topic
Raw size: The length of the article
Structural features: Visual or organizational elements of the article content or user interface

Wikipedia process: Assessments based on reader's prior knowledge (or beliefs) about how Wikipedia content is created

Open collaboration: Perceptions around the impact of low technical barriers to contribution and voluntary participation
Evidence of gatekeeping: Observations of specific indications that the article is actively monitored and moderated by people with decision-making authority
Transparency: Degree to which the reader believes they can inspect the article development history

Reputational measures: Assessments based on reputation or perceived popularity of Wikipedia

Popularity: Perceptions about how many people consume or contribute to this article
PageRank: Observation of the ranking of this article in search engine results pages
Hearsay: General perceptions about how much other people trust this article, articles on this topic, or Wikipedia as a whole
Specific incident: Indirect knowledge of specific incident(s) that influence credibility judgements

Other measures: Assessments based on no clear or specific criteria

Faith: Unquestioned belief in the trustworthiness of the content without supporting rationale
Common knowledge: Perception that the information in this article is widely or universally known and accepted

Second round survey

The survey ran on English Wikipedia from 2019/3/19 to 2019/3/22. 1419 people took the survey (answered at least 1 question), 807 people completed the survey (made it to the final page), and 522 answered all of the questions. 308 of those who completed the survey indicated that they would be willing to be contacted for further research, and left an email address.

The order of the trust component question sections (B, C, and D) was randomized to ensure a consistent response rate across the three sections.

Because of data logging errors, we were only able to record which platform the respondent used (mobile vs desktop site) for 1092 respondents. Of these, 380 (35%) took the survey on desktop, and 712 (65%) took the survey from the mobile site.^[1]

Descriptive statistics

Figures: response distributions for the three questions tabulated below (reason for reading, prior topic familiarity, and trust in the current article).

Q A.1 Why are you reading this article today?
I am reading the article to get an overview of the topic	42%
I am reading the article to look up a specific fact or to get a quick answer	29%
I am reading the article to get an in-depth understanding of the topic	29%

Q A.2 Prior to visiting this article today, how familiar were you with the topic of this article?
Extremely familiar	8%
Very familiar	13%
Moderately familiar	29%
Slightly familiar	30%
Not familiar at all	21%

Q A.3 How much do you trust the information you are reading in this article?
A great deal	31%
A lot	42%
A moderate amount	22%
A little	3%
Not at all	<1%

Trust component questions - mean response score (higher score --> higher agreement)
I believe that this article...
article: contains accurate information	4.34
article: contains detailed and comprehensive information	4.1
article: contains an adequate number of references to external sources	4.06
article: contains references to high quality external sources	3.97
article: is written in a professional way	4.31
article: is written in a way that is clear and easy to understand	4.43
article: is well structured	4.37
article: has been written by many people	3.73
article: has been reviewed and corrected by many people	3.74
article: has been read by many people	4.1
article: is often a 'top hit' in search results related to the article topic	3.86
I believe that the people who write this article...
people: know a lot about the article topic	4.13
people: try to keep incorrect information from being added to the article	4.1
people: try to fix incorrect information when they see it	4.21
people: want the article to be neutral and unbiased	4.23
people: want to help readers understand how much to trust the information in the article	4.16
I believe that the topic of this article...
topic: has been written about in many other information sources (not just Wikipedia)	3.97
topic: is written about in a neutral and unbiased way in other sources I have read	3.93
topic: is a controversial topic^[2]	2.53

The high mean scores for the trust component questions indicate that, overall, respondents agree that Wikipedia articles are trustworthy for these reasons. Future work could examine which, if any, of the trust components predict high overall trust assessments (Q A.3 above).

Top countries (20+ responses)
country	responses
United States	405
India	394
United Kingdom	76
Canada	51
Australia	38
Germany	31

The survey was taken by respondents from 105 different countries. The countries with over 20 responses were the US, India, UK, Canada, Australia, and Germany.

Analysis

Figure: Information need by country (top countries only). Segments represent percentage of total responses.

Figure: Trust in current article by country (top countries only). Segments represent percentage of total responses.

Figure: Trust in current article by predicted article quality (all respondents).

Information need by country
	quick answer	overview	in-depth
Australia	15	16	6
Canada	17	22	10
Germany	11	10	8
India	85	141	143
United Kingdom	21	37	16
United States	125	168	100

Respondents from different countries reported different patterns of information need (chi-square test of independence: χ²=29.504, df=10, p=0.001).
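The chi-square result reported above can be checked directly against the counts in the table. The following is a minimal sketch in plain Python, not the researchers' own analysis code: the function names are illustrative, and the p-value helper relies on the closed-form upper tail of the chi-square distribution that holds only when the degrees of freedom are even (as they are here).

```python
import math

# Counts from the "Information need by country" table above.
# Columns: quick answer, overview, in-depth.
observed = {
    "Australia": [15, 16, 6],
    "Canada": [17, 22, 10],
    "Germany": [11, 10, 8],
    "India": [85, 141, 143],
    "United Kingdom": [21, 37, 16],
    "United States": [125, 168, 100],
}


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for Pearson's chi-square test."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            # Expected count if country and information need were independent.
            expected = row_total * col_total / grand_total
            stat += (count - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires an even df")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


stat, df = chi_square_independence(observed)
p_value = chi2_sf_even_df(stat, df)
print(f"chi2={stat:.3f}, df={df}, p={p_value:.3f}")
```

Run on the table as published, this should give a statistic and p-value close to the reported ones; small differences in the last digit can come from rounding.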

Trust in current Wikipedia article by country
	not at all	a little	a moderate amount	a lot	a great deal
Australia	1	2	14	11	10
Canada	2	3	11	19	15
Germany	0	1	7	18	5
India	1	11	70	159	140
United Kingdom	1	1	19	36	19
United States	6	15	93	159	128

Respondents from different countries reported different degrees of trust in the article they were reading (Kruskal-Wallis H=12.827, df=5, n=977, p=0.025).
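The Kruskal-Wallis statistic reported above can likewise be recomputed from the counts in the table. This is a sketch under stated assumptions rather than the original analysis: the five response options are coded as ordinal levels 1 to 5, tied responses share their average rank, the usual tie correction is applied, and the p-value uses the chi-square approximation with a closed form valid for odd degrees of freedom. All function names are illustrative.

```python
import math

# Counts from the "Trust in current Wikipedia article by country" table above.
# Columns: not at all, a little, a moderate amount, a lot, a great deal
# (treated as ordinal levels 1-5; this coding is an assumption).
counts = {
    "Australia": [1, 2, 14, 11, 10],
    "Canada": [2, 3, 11, 19, 15],
    "Germany": [0, 1, 7, 18, 5],
    "India": [1, 11, 70, 159, 140],
    "United Kingdom": [1, 1, 19, 36, 19],
    "United States": [6, 15, 93, 159, 128],
}


def kruskal_wallis_from_counts(table):
    """Return (H, df, n, mean_ranks) for grouped ordinal counts, tie-corrected."""
    groups = list(table.values())
    level_totals = [sum(col) for col in zip(*groups)]
    n = sum(level_totals)

    # Every response at the same level shares the average (mid) rank.
    mid_ranks, start = [], 0
    for t in level_totals:
        mid_ranks.append(start + (t + 1) / 2)
        start += t

    h = 0.0
    mean_ranks = {}
    for name, row in table.items():
        size = sum(row)
        rank_sum = sum(c * r for c, r in zip(row, mid_ranks))
        mean_ranks[name] = rank_sum / size
        h += rank_sum ** 2 / size
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)

    # Correct for the heavy ties that a five-point scale produces.
    ties = sum(t ** 3 - t for t in level_totals)
    h /= 1 - ties / (n ** 3 - n)
    return h, len(groups) - 1, n, mean_ranks


def chi2_sf_odd_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an odd df."""
    if df % 2 != 1:
        raise ValueError("closed form used here requires an odd df")
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


h, df, n, mean_ranks = kruskal_wallis_from_counts(counts)
p_value = chi2_sf_odd_df(h, df)
print(f"H={h:.3f}, df={df}, n={n}, p={p_value:.3f}")
for country, rank in sorted(mean_ranks.items(), key=lambda kv: kv[1]):
    print(f"{country}: mean rank {rank:.1f}")
```

The mean ranks printed at the end give a rough ordering of the countries by reported trust, which is one way to see which groups drive the overall difference.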

Trust and article quality
Respondents' self-professed trust in the information presented in the article they are currently reading (Q A.3) is significantly correlated (weak, positive) with the predicted ORES quality class of that article (Spearman's Rho 0.067, n=1312, p = 0.014).

Trust and familiarity
Respondents' self-professed trust in the information presented in the article they are currently reading (Q A.3) is not significantly correlated with their level of familiarity with the topic of the article (Q A.2) (Spearman's Rho 0.049, n=1381, p = 0.07).

Trust and information need
Respondents' trust in the current article is significantly related to their information need (Q A.1) (Kruskal-Wallis H=10.511, n=1350, p=0.005). If "information need" is treated as an ordinal scale, there is a weak positive correlation between trust and the extent of the information need (Spearman's Rho 0.078, n=1350, p = 0.003).

Conclusions

Overall, respondents reported a very high level of trust in Wikipedia. 88% of respondents to the first survey reported that they trusted Wikipedia a lot or a great deal. 73% of respondents to the second survey reported that they trusted the information in the article they were currently reading a lot or a great deal (94% in the first survey^[3]). In contrast, less than 4% of respondents in the second survey reported distrusting the information in the current article to any degree. This reflects the findings from a reader survey commissioned by the Wikimedia Foundation in 2011.
Information need and topic familiarity responses are consistent. The distributions of responses to these questions were strikingly consistent across both rounds of the survey. They also match the response patterns in the 2016 Why we read Wikipedia survey.
There is no apparent relationship between overall reported trust and responses to the trust component questions. Although these questions were developed based on the reasons for (dis)trusting Wikipedia elicited from respondents to the first survey, their overall (mean) scores are remarkably consistent. The relatively high mean values (3.7-4.4) could be interpreted as reflecting the high overall trust people place in Wikipedia, but further analysis is needed to learn whether any of these features of an article, its topic, or the people who write it are particularly salient to the credibility judgements of readers.
There is a (weak) positive relationship between amount of trust and article quality. This suggests that, despite the ambivalent results in the individual trust component ratings, people do have a sense of the general quality of the article they're reading, and it factors into their credibility assessments.
Respondents from different countries have different information needs and levels of trust in Wikipedia. Among the top six countries, we found significant differences in responses to these two questions. Among these countries, Indian respondents indicated the most trust in Wikipedia, and Australians the least. These two countries also stood out from one another in terms of relative information need: about 39% of Indian respondents (143 of 369) indicated they were reading the current article to gain an "in-depth" understanding of the topic; by contrast, less than 20% of Australians reported an "in-depth" information need, and 40% reported reading the article to get a "quick answer". These country-mediated relationships to Wikipedia deserve further investigation.

References

[1] Because the logging bug affected a substantial number of responses and has an unknown cause, we will not be able to perform comparative analysis to explore relationships between platform and country, trust, information need, or prior knowledge.
[2] The mean agree score for "I believe that this topic... is a controversial topic" is substantially lower than that for the other questions because the way this question is phrased flips the valence of the question: in this case, a high mean score would have suggested that the article is likely less trustworthy because more controversial articles are more likely to be biased or unreliable.
[3] The 21-percentage-point 'drop' in current-article-trust between the first and second surveys looks startling, but it is probably due to framing effects. In the first survey, the per-article trust question was directly preceded by the question "how much do you trust Wikipedia?" and a question about WHY they trusted Wikipedia as highly as they did. This likely prompted people to evaluate the current article more positively than they would have if they hadn't been asked about overall trust first (as was the case in the second survey). The responses to the second survey are probably more reflective of readers' actual credibility judgements.
