Experiments

This is a guiding document on the use of experiments on Wikimedia wikis.

Just brainstorming. More on the talk page.

Current practices in Web analytics reflect their commercial origins. For better or worse, the greatest motor behind the use of Web analytics has been the profit interests of online retailers and social networks, for whom the user is a commodity. These profit interests have profoundly shaped the discourse of Web analytics, setting both the tenor and the tone of debate. Consider, for example, the values implicit in "funnels," a term of art.

A thoughtless application of Web analytics to Wikimedia wikis would import a moral outlook that is incompatible with (and, indeed, rightfully offensive to) its community. It also wouldn't work well, because neither Wikimedia wikis nor their editing communities are for sale. It is therefore crucial that technical efforts be accompanied by a process of reflection, the goal of which should be to articulate criteria for Web analytics that express and promote the broader ambitions of the Wikimedia movement and the moral commitments that underlie it.

Background

Experiments on Wikimedia wikis, typically conducted by the Wikimedia Foundation, have become increasingly common in recent years.

200?–present: Ongoing experiments with the site banners during the annual fundraiser.
2009: The Usability team conducted a number of experiments (no public results).
2012–2014: The Editor engagement experiments (E3, later Growth) team conducted experiments on users via the E3Experiments MediaWiki extension and others.
2021: Example proposal of WMF experiment on the English Wikipedia.

Of note:

Experiments number in the dozens and are usually documented in the Meta-Wiki Research namespace.
Experiment outcomes are often not used for any concrete deliverable, such as a merged change to MediaWiki core PHP code or a peer-reviewed paper.
Sometimes potentially harmful changes that would have difficulty passing standard code review are deployed as "experiments" to bypass tougher scrutiny.
- This applies to fundraising banners, where poor translations since 2011 have often actively damaged public opinion and understanding of Wikimedia projects, according to some dozens of users surveyed.

These experiments have varying goals and motivations, but we've likely reached a point where we need clearer guidelines about what is and is not appropriate on Wikimedia wikis. Clear communication (like [3]?) is particularly important, to avoid a backlash like the one Facebook faced in June–July 2014 over an experiment that had nonetheless been approved by an ethical board of the researchers' university.

Principles

Dignity and collegiality of all

Wikimedians must be treated as colleagues, not as customers. Experiments are often an attempt to optimize human behavior and workflow. There's nothing inherently wrong with such goals, but Wikimedians are not customers in the way that users of Facebook are customers of its site.^[1] They should instead be viewed as colleagues. Would you go into the office of someone you work with and start messing with them to optimize their behavior? Surely not. But this is exactly the type of behavior the Wikimedia Foundation is now engaging in: disrupting the work of long-time editors in the name of questionable experimentation.

Implementation

Mitigation

Experimentation comes with high costs and high risks. Including people in a test group who ought not to be tested adds considerable cost and risk without any benefit.

Smarter code should ensure that only editors who meet specified criteria load extra code (JavaScript). A number of factors can be taken into account when determining whether to load extra JavaScript for a particular user, including:

user's logged-in status;
user's edit count;
user's registration date; and
whether the user is using the default skin.

Choosing and using smarter metrics is important. If a user is using a non-default skin, it's fairly safe to assume that they don't want to be messed with. Experiments on these users should undergo the most scrutiny and require the most consideration.
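
As a rough illustration of the factors listed above, here is a minimal sketch of an eligibility check that decides whether extra JavaScript should be loaded for a user. Every name in it (`UserContext`, `ExperimentCriteria`, `isEligible`) is hypothetical and not taken from MediaWiki or any extension. The choice to exclude logged-out users entirely is an assumption of this sketch, not a rule stated here.

```typescript
// All names below are hypothetical; none are taken from MediaWiki or any extension.
interface UserContext {
  loggedIn: boolean;
  editCount: number;
  registeredAt: Date | null; // null when unknown, e.g. for logged-out users
  skin: string;
}

interface ExperimentCriteria {
  maxEditCount: number;      // only users at or below this edit count
  registeredOnOrAfter: Date; // only accounts created on or after this date
  defaultSkin: string;       // name of the wiki's default skin
}

// Returns true only if every criterion is met, so that extra JavaScript
// is loaded for as few users as possible.
function isEligible(user: UserContext, criteria: ExperimentCriteria): boolean {
  // Assumption: this sketch never enrolls logged-out users.
  if (!user.loggedIn || user.registeredAt === null) {
    return false;
  }
  if (user.editCount > criteria.maxEditCount) {
    return false;
  }
  if (user.registeredAt.getTime() < criteria.registeredOnOrAfter.getTime()) {
    return false;
  }
  // A non-default skin is treated as a signal to leave the user alone.
  return user.skin === criteria.defaultSkin;
}
```

Checking the cheapest conditions first and returning early keeps the default outcome "not eligible", so a missing or unexpected value excludes a user rather than enrolling them.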

Opt-out

Any and all experiments should have an opt-out feature. However, an opt-out feature is not a license to be more obnoxious simply because people can opt out of your experiment. Several years ago, the Usability Initiative provided an opt-out preference ("⧼vector-noexperiments-preference⧽") for development of Vector. The current experiments team also respects this opt-out where it is relevant. Further reading: "How can I opt-out of experiments?".
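
A minimal sketch of how experiment code might honor such an opt-out before doing anything else. The preference key `experiments-opt-out` and both function names are hypothetical, and the idea that a stored value of `""` or `"0"` means "not set" is an assumption of this sketch.

```typescript
// Hypothetical preference key, used for illustration only.
const OPT_OUT_PREFERENCE = "experiments-opt-out";

// True when the user has switched the opt-out preference on.
// Assumption: an absent, empty, or "0" value means the user has not opted out.
function hasOptedOut(preferences: ReadonlyMap<string, string>): boolean {
  const value = preferences.get(OPT_OUT_PREFERENCE);
  return value !== undefined && value !== "" && value !== "0";
}

// Runs `load` only for users who have not opted out, and reports whether it ran.
function loadExperimentUnlessOptedOut(
  preferences: ReadonlyMap<string, string>,
  load: () => void,
): boolean {
  if (hasOptedOut(preferences)) {
    return false;
  }
  load();
  return true;
}
```

Routing every experiment through one gate like this means the opt-out is checked in a single place rather than reimplemented, and possibly forgotten, in each experiment.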

Restricting to new users

When the WMF is solely "experimenting with the newbies", the nice thing is that there is no negative feedback. The bad thing is that the experiment can mess things up without anyone noticing. Maybe you will notice, if you have chosen the relevant metrics beforehand, and if you do the analysis, and if you interpret the data correctly, and if you don't dismiss the inopportune results of the analysis. For example, if user feedback says they feel patronized by the software, how do you track that?

As an example, fundraising banners are no longer shown to logged-in users and were allowed to become more and more obtrusive. When in 2014 they were shown to logged-in users for a few hours by mistake, a unanimous vote of almost a hundred fr.wiki users identified a number of issues and asked to restore the "traditional banners".^[2]

References

↑ «PlanOut was developed at Facebook for running experiments involving hundreds of millions of people.» [1]
↑ w:fr:Wikipédia:Le Bistro/25 novembre 2014#Mettre en place une bannière classique; [2]; phabricator:T75812.

External links

https://goomics.net/270/
