Research:Studying Wikimedia Commons
Introduction and Background
Wikimedia Commons (or Commons) has been an important part of the Wikimedia Foundation’s (WMF) projects since its launch in 2004. It serves as a free online repository for WMF projects to store and manage diverse multimedia content. Commons, while a separate project, both benefits from and contributes to other WMF projects. Most prior research has focused on Wikipedia: its organization, articles, editors, readers, and policies. Few studies have focused on the ways Commons benefits and supports activities in other WMF projects. Commons is a community with its own goals, participants, and organization, so it is a worthy candidate for focused research rather than simply an adjunct of Wikipedia.
Research Questions & Scope
We are broadly interested in understanding the intersection between Wikimedia Commons and the needs and goals of other Wikimedia projects. We have four specific questions that we hope to answer through this research:
- What are the goals of the Commons?
- What are the goals of Commons curators (editors), and how do they participate to improve Commons?
- How do the goals of Commons curators (editors) align with the needs and goals of other Wikimedia projects?
- How do editors of other Wikimedia projects leverage the content of Commons to improve the quality of their projects?
We recognize that people come to Commons for a variety of reasons. At this time, we want to study the following two groups:
Curators are editors who work extensively on Commons to upload, edit and organize contents. In this study, we will invite curators to talk about their experience with Commons. We plan to recruit 15-25 Commons curators for this study.
We want to interview editors who use multimedia content from Commons to illustrate their projects. We are interested in their needs for Commons, and how Commons can be improved to better support their work. We plan to recruit 15-25 Wikipedia editors for this study.
Participation in our study is voluntary. However, in recognition of the participant's time we will make a donation to a 'like minded' organization. Specifically, for each participant who completes an interview with us, we will acknowledge the contribution of their time, effort, and expertise to our research project with a donation of $15 to one of three charitable organizations. Participants completing an interview can choose whether a contribution will be made to: Wikimedia Foundation, or Creative Commons, or Internet Archive. If not otherwise specified the default choice will be Wikimedia Foundation.
This is qualitative research.
This work has been reviewed by an Institutional Review Board (IRB) at the University of Washington. In late February 2020, the University of Washington Human Subjects Division (HSD) determined that this study is human subjects research and that it qualifies for exempt status. This exempt determination is valid for the duration of the study.
Under the IRB proposal we requested the ability to gain verbal consent from the interview participants. We will verbally inform individuals of the basics of the study, similar to the information provided in this research project description, and ask them whether they agree to participate, prior to asking any of our interview questions. During the interview individuals may decline to answer questions, or they can stop their participation and ask to have the interview deleted.
It is our practice to make a good faith effort to maintain the confidentiality of the interviews. We will not tell outsiders which individuals were part of this research. We will do our best to anonymize the data. However, it is important for participants to know that Wikipedians are skilled at research and can sometimes figure out who and what is being described, even though a best effort has been made to anonymize the data. Wikipedia and Commons are public platforms where all edits are visible which makes a promise of anonymity very difficult to ensure. We feel we cannot make that promise. But we do promise to make our best effort.
We will recruit participants through two channels: a community board announcement and individual, direct email invitations. The community board announcement will target curators who have contributed to Commons and editors who have used multimedia content from Commons to illustrate at least one Wikipedia article in the last 24 months. After potential participants contact us to express their interest, we will screen their eligibility by examining their Wikipedia User Page, User Talk Page, and Wikipedia Edit History. For direct invitations, we will identify individuals to contact by reviewing their publicly available edit history. We will review editors' and curators' editing histories to make sure each participant meets our contribution threshold prior to individual contact. A member of our study team will send a personalized, direct email invitation to editors who have inserted a picture from Commons to illustrate an article.
We plan to conduct semi-structured interviews with eligible Commons curators and Wikipedia editors who are 18 years old or older. We hope to record and transcribe the interviews for further analysis.
Impact of the study
The primary impacts are a better understanding of Commons and the wikiwork that is performed by Commons curators (editors). We will also gain a better understanding of the relationship between Commons and other WMF projects (focused through the lens of EN Wikipedia).
There are possible indirect impacts through an understanding of potential improvements to tools and processes that would allow editors from other WMF projects to more effectively leverage the content in Commons.
Executive Summary of Findings
Editors work across or between many WMF platforms including Wikimedia Commons, the many different language editions of Wikipedia, Wikidata, Wikiversity, and many others. In Commons, editors upload media and curate metadata such as descriptions and categories to make media available to many WMF projects. The cross-platform work to include media is an example of “stitching”. Our academic analysis focused on the ways that stitching is supported or hindered across WMF platforms. Through our analysis we found a number of key challenges. Below we highlight six challenges and provide suggestions that may address those challenges to improve cross-platform collaboration and resource utilization.
- Lack of Communication Across Networks
“I am a member of [WMF <country>], and I have the occasional discussion there. And I also have had worthwhile discussions on particular topics in Wikipedia, but not on Wikimedia Commons. Although currently, I’m trying to engage in some discussion about <country> copyright and copyright of government documents that I’ve found [and it is] extremely difficult, in fact impossible so far, to attract anyone who’s interested in that topic. . . . But it would be good to have an area where you can go and say, hey, we’ve got some discussions going on soon. So would you like to join?”
We found that editors formed sub-networks while working on diverse projects. For example, we found networks of photographers focused on producing images of different subjects, a network of admins who handle copyright issues, a network of categorizers who deal with meta categories, and networks of Wikipedia editors who write articles in specific subject areas. These sub-networks established their own ways of communicating and organizing activities, but there was an absence of communication between these distributed sub-networks, both across platforms and within a platform. The lack of mutual awareness of project needs inhibits collaboration between existing micro-networks.
- Differing Perspectives
“Well, there are two things really. The first one is to support the Wikipedia and other Wikimedia projects by being the image repository for those projects. But the other thing and to some people this is more important. It is a repository of free media for anybody to use. If you want a picture to put on a T shirt or your own blog, or your academic paper or whatever it might be, just to print out and put on the wall to look nice. You can use Wikimedia Commons. And it documents things. So if you want to know what style of works artists produced, you can go and look at their images on Commons. And that has an educational value in its own right. So it’s, it’s part of the, the free culture movement that makes knowledge available freely to people, which is what Wikimedia’s mission is.”
Participants have different motivations for contributing to Wikipedia and Wikimedia Commons, and they have different perspectives of the goals of each of these different platforms. Wikipedia focused editors believe that the main goal of Commons is to support other Wikimedia projects. In contrast, editors who worked mostly on Commons state that the goal of Commons is broader than being a media library for Wikimedia projects. Commons contributes to open-knowledge. These differing perspectives created conflicts around what should be included or deleted from Commons and conflicts around what makes a curatorial modification legitimate.
- Multilingual Resources
“Um, well, again, the linguistic issue, I mean, if people are not English speakers, it’s much harder. Commons is in theory a multilingual project, probably over 50% of the work done there is done in one language. There’s a few other languages that are pretty serious contenders. And I would say conversely, if you’re looking for material about China, you do a lot better to have Chinese to do your search in English, because an awful lot of the material about China is documented only in Chinese. And, you know, that’s less so for some other countries. That’s probably the extreme of it in my experience.”
Media resources on Commons are technically multilingual, but in practice this is a challenge. The names and descriptions of Commons resources such as image files and categories are often written in one language only. With Commons' keyword-based search engine, results must contain the exact keywords in the language entered, so these resources cannot be found if an editor searches for keywords in another language. This issue severely impacted participants from other language versions of Wikipedia who have limited or no English proficiency, because the majority of Commons content was produced and curated by English speakers. The issue impacts English-speaking contributors as well, but possibly not as severely. Many native English speakers feel that there are valuable resources about non-English-speaking countries that are not available through English metadata search.
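The search limitation described above can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption made for illustration: the file names, the descriptions, the `TRANSLATIONS` table, and both functions. It does not reflect how Commons search is actually implemented. It only shows why a file described in one language is invisible to a keyword query in another, and how expanding the query with known translations could surface it.

```python
# Hypothetical data: each file has descriptions keyed by language code.
FILES = {
    "Great_Wall_01.jpg": {"zh": "长城 八达岭"},
    "Great_Wall_02.jpg": {"en": "Great Wall at Badaling"},
}

# Hypothetical translation table; a real system would need a proper source.
TRANSLATIONS = {"great wall": ["长城"]}


def keyword_search(files, query):
    """Match files whose description text contains the query as written."""
    q = query.lower()
    return sorted(
        name
        for name, descriptions in files.items()
        if any(q in text.lower() for text in descriptions.values())
    )


def expanded_search(files, query, translations):
    """Match the query itself plus any known translation of it."""
    terms = [query] + translations.get(query.lower(), [])
    hits = set()
    for term in terms:
        hits.update(keyword_search(files, term))
    return sorted(hits)


english_only = keyword_search(FILES, "Great Wall")
with_translations = expanded_search(FILES, "Great Wall", TRANSLATIONS)
```

In this toy data, the plain keyword search misses the file described only in Chinese, while the expanded search returns both files.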
- Cross-Platform Vandalism
“. . . there was an incident of a celebrity ... porn outing on their Wikipedia page. So there were nude photographs of this actress. I can’t I don’t remember who it was. . . . And for some period of time for like five minutes. Maybe it was longer. This unfortunate woman had her nude photographs on her Wikipedia article. I don’t think the press picked up. The concern for Wikipedia, English Wikipedia, was that it was a type of vandalism. It was a form of inter project vandalism that was undetectable on Wikipedia. So recent change editors on Wikipedia wouldn’t see anything changing if the image on Commons was overwritten. And there’s nothing to stop somebody even even a new editor from overwriting images for the vast majority of images, including portraits of celebrities.”
Commons enables sharing and reuse of the same media resources for many WMF projects without having to upload and store those resources on their local servers redundantly. But this also creates an opportunity for cross-platform vandalism. Because anyone can overwrite an image on Commons, that feature can be used to change what is displayed on (potentially many) other platforms for malicious purposes. Changes in Commons do not generate a clear event or change in articles that depend on an image or media resource. This type of vandalism is often serendipitously detected by editors as they review articles for other reasons.
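One way to picture the missing signal is a check that joins overwrite events with cross-project usage. The sketch below is illustrative only: the `Upload` record, the `usage` mapping, and the `flag_overwrites` function are hypothetical names and data invented for this example, not an existing WMF tool or API. It treats any upload after the first for the same file name as an overwrite and reports it when the file is embedded in at least one article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upload:
    """One entry in a hypothetical upload log for a Commons file."""
    file: str
    uploader: str
    timestamp: int  # seconds since an arbitrary starting point


def flag_overwrites(uploads, usage):
    """Return (file, uploader, articles) for each overwrite of a file in use.

    An overwrite is any upload after the first one for the same file name.
    `usage` maps a file name to the set of articles that embed it.
    """
    seen = set()
    flagged = []
    for upload in sorted(uploads, key=lambda u: u.timestamp):
        if upload.file in seen and usage.get(upload.file):
            articles = sorted(usage[upload.file])
            flagged.append((upload.file, upload.uploader, articles))
        seen.add(upload.file)
    return flagged


uploads = [
    Upload("Portrait.jpg", "OriginalPhotographer", 100),
    Upload("Map.png", "Cartographer", 150),
    Upload("Portrait.jpg", "NewAccount", 200),
]
usage = {"Portrait.jpg": {"en:Some actress", "de:Some actress"}}
alerts = flag_overwrites(uploads, usage)
```

Here the second upload of `Portrait.jpg` is flagged along with the two articles that display it. Such a list could, under these assumptions, be surfaced to patrollers on the affected projects.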
- Differing Policies
“Well, largely because one of the issues with maps on Wikipedia, or on Wikimedia Commons, for that matter, is that a lot of them have been made by people. And they might have been based on some reliable source. But often the source is not actually provided in the file description. So in order for … the particular map file to be acceptable at a featured article class, what’s required is that the map has to be properly sourced, you have to be able to say, Well, I copied this from a map on this page of this book, or whatever. . . That’s a common problem with maps. And people often, yeah, upload maps that they’ve made themselves within Skype or whatever. But they’re not. But they’re not able to be used, particularly in featured articles, because they’re not, it’s not clear where the information that’s on the map has come from.”
While Commons and the different language editions of Wikipedia share the same underlying philosophical stance toward open and free knowledge, policies and guidelines still reflect the individual characteristics of the differing online communities. Participants found policy and guideline misalignments between Commons and Wikipedia. One example is how the two projects treat issues of copyright [Wikipedia: Assume Good Faith; Commons: Precautionary Principle]. Another example is the importance of sources, reliability, and verifiability on Wikipedia (with four fully elaborated policies) and Commons' distinctly different approach [Wikipedia: Citing Sources, Reliable Sources, Verifiability, Image Use; Commons: Verifiability]. These misalignments potentially make resources less usable and reliable and can introduce editorial conflict among editors from different WMF projects.
- Differing Practices
“… I tend to do very little categorization unless the categorization is very robust. And I tend to make up my own categorization. … [I’ve uploaded lots of] photographs of [artifact]. I've uploaded a lot of them but I created a whole category structure based on their categorization, not based on common's existing categorization ... because I've had so much headaches in the past trying to do that.”
Commons content is curated by editors with diverse perspectives, values, and expectations. Participants complained that there is a lack of “best practices” for creating file names, descriptions, and categories. This makes it difficult to search for, understand, and reuse content curated by different editors. For example, when faced with difficult decisions about where an item might fit in the existing Commons category system, many editors noted that they would simply create a new category and put the item into it.
Suggestions that Potentially Address these Challenges
- Create Places for Inter-Network Discussion
Participants did not know of a clear location for discussions of policies and practices that impact cross-platform work. There is a need to create and maintain a place (maybe on Commons or another WMF platform) to foster cross-project discussions. Cultivating social practices that point cross-platform discussions to a sanctioned location would improve visibility and foster these discussions.
- Invest in Inter-Platform Vandalism Detection Technologies
The Wikimedia Foundation should consider investing in technologies that facilitate collaboration between Commons and Wikipedia editors in detecting and/or preventing cross-platform vandalism.
- Improve Multilingual Search to Make Existing Resources More Available
Wikimedia Commons should investigate changes to search to help make resources with metadata in only one language available through search in another language.
- Bennett, Lance; Segerberg, Alexandra; Walker, Shawn (2014). "Organization in the crowd: peer production in large-scale networked protests". Information, Communication & Society. pp. 232–260.