Notes on good practices for Wikipedia research

A brainstorm of suggestions on how to conduct research in a way that respects Wikimedia community principles.

Active vs. Passive Research


Passive research, for the purposes of this page, is defined as a research method that requires no interaction, or only minimal interaction, with Wikimedia or Wikimedians (Examples: ?????). Passive research is done on publicly available datasets and should require no authorization or approval from Wikimedians or the Foundation. Active research, on the other hand, requires the participation of Wikimedians or interaction with them, and will require some type of consent or authorization (Examples: ????). The rest of this page refers only to active research.

Passive Research Datasets


Keep in mind that the Wikimedia Foundation already collects a significant amount of behavioral data (based on digital traces). Consider whether some of this data could serve as an indicator that helps answer your research question.

Anonymised vs. pseudonymised


Some research identifies Wikimedians by their usernames, which may be a real-life identity, an anonymous pseudonym, or something in between.

Passive analysis producing anonymised datasets


This is a common and uncontroversial practice: people may question your conclusions and methodology, but they are unlikely to have any concerns about how you handle their personal data.

For example, Wikimedia's statistics on the English-language Wikipedia don't name individual editors, nor is it possible to reverse-engineer those statistics to identify the editors concerned.

Passive research producing data on identifiable Wikimedians


Generally, Wikimedians expect that those who take on extra responsibilities within the community will be subject to more scrutiny than those who don't.

For example, the List of Wikimedians by number of edits has an opt-out for Wikimedians who don't want to be listed,
but the list of administrators on the English Wikipedia by number of actions does not.

Community Engagement


Talk to Wikimedians before taking action.

  • Post to the Village Pumps, Foundation-l, wiki(language)-l, wiki-research-l, etc. (see mailing lists). Describe your research project and ask for feedback.



Wikimedians should know who you are and have a general idea of your research goals.

  • Add your username to the list of researchers (which list?) and note your researcher status on your user page
  • Add your project to the list of research projects: Research/Research_Projects
  • Produce a page summarizing the research project (include credentials, Institutional Review Board (IRB) approval, etc.)



Wikimedia and/or the community should benefit in some way from participating in your work.

  • Share the (properly anonymized) datasets used in your analysis
  • Publish a freely available summary of results for Wikimedians. For example:
    • Consider presenting at Wikimania (the next Wikimania is in August 2012 in Washington, D.C.)
    • Send a short document explaining the main research results to the communications channels (Village Pumps, Foundation-l, wiki(language)-l, wiki-research-l, etc).



Wikimedia and its editors should not be negatively affected by your work.

  • Become familiar with policies/guidelines relevant to your project. (TODO: List of common/important policies to consider.)


  • Recruitment via channels
    • Mailing lists: Foundation-l, Wikipedia-l and wiki-research-l.
    • Many Wikimedia projects maintain village pumps or similar discussion forums; see Distribution list.
  • Recruitment via direct message
    • Special:EmailUser?
    • Post on talk pages?
  • Keep in mind that the most active participants may be the most likely to respond to your call.
    • If you want to reach the different participation profiles along the power-law spectrum (90% audience / 9% occasional contributors / 1% active participants), consider adapting your call to each.

Wikimedia projects generally use the Creative Commons CC-BY-SA 3.0 license. Many Wikimedians are very conscious of copyright; for some, their principal hobby is resolving copyright issues with material that others post here. Be aware of the copyright status of the information you use and the information you collect, and of the copyright consequences for your data. Remember that Wikimedians not only expect people working with their data to use a compatible license; omitting a copyright statement from a research questionnaire, or using a "closed" one, could also skew your results by discouraging certain groups of Wikimedians from completing it.



Other materials