Learning and Evaluation/Archive/Grantmaking and Programs/Learning & Evaluation portal/Using bots

Overview

In the past year or so, Research Strategist Jonathan Morgan has programmed several simple bots that perform common tasks around wikis, starting with the Teahouse pilot project and more recently the Individual Engagement Grants program and the Brazil pilot program. Bots are useful tools for getting information onto and off of wiki pages, for tracking and reporting certain on-wiki activity metrics, and for promoting community involvement in projects.

Bots also have the potential to make several aspects of our work in Grantmaking & Programs a little easier* by streamlining workflows and automating repetitive tasks. Below, I provide some basic info to help you decide whether using a bot makes sense for your project or program.

* As long as they follow Asimov's Laws.

What is a bot?

In wiki-speak, a bot is a simple program or collection of scripts (written in a programming language like Python, Perl or PHP) that makes edits to wiki pages from a single, dedicated user account. On Wikipedia, bots are used to sign people's posts on talk pages, to revert vandalism, to send messages en masse, and to perform hundreds of other important but labor-intensive tasks.

For example, here are some of the things that one WMF-run bot, HostBot, does around the Teahouse:

  • invites around 100 new editors to visit the Teahouse every day
  • moves the profiles of Teahouse hosts and guests so that when people visit those pages, they see the profiles of the people who were active most recently.
  • updates rotating galleries of recent questions and featured hosts and guests that are displayed on the Teahouse landing page.
  • publishes daily reports of the new users who have been invited to the Teahouse, as well as monthly metrics reports of host and guest activity on the Teahouse.

Wow! That's one busy bot! But how is this relevant to grantmaking?

Why use a bot?

Bots aren't the only way to do things on wikis, but they're often the cheapest and most efficient way to edit many pages at once or perform the same kinds of edits on a regular basis. Here are some of the reasons you might want to use a bot:

Bots can perform batch actions and regular updates

Manually maintaining category lists, sending messages to users, and keeping even a modest number of pages up to date can be very labor intensive. Bots can be set up to perform these and other 'maintenance' tasks automatically at regular intervals, or in response to changes made by other users.

Bots are relatively easy to implement

In many cases, you can write a bot for a task in much less time than you can implement a new gadget or create a new MediaWiki extension. In fact, using a bot is sometimes the only way to automate a task on wiki if you don't have access to substantial developer resources or admin privileges on the wiki. And code written for one bot is often relatively easy to re-use for another bot, or on another wiki.

Bots play nicely with databases and APIs

Bots can be used to track on-wiki activity through MediaWiki databases and through the MediaWiki API. That means you have full access to both the text of any wiki page at any point in time, and a variety of metadata about that page.
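
As a rough illustration of the API route, the sketch below builds a MediaWiki API request for a page's recent revision metadata and flattens the JSON that comes back. The `action=query` and `prop=revisions` parameters are part of the standard MediaWiki API; the helper names, the example page title and the sample response are illustrative assumptions, not code from any of the bots described on this page.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_revisions_url(title, limit=5, endpoint=API_ENDPOINT):
    """Return a URL asking the MediaWiki API for revision metadata of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",  # metadata only, no page text
        "rvlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def extract_revisions(response_text):
    """Flatten the API's JSON response into a list of revision dicts."""
    data = json.loads(response_text)
    revisions = []
    for page in data.get("query", {}).get("pages", {}).values():
        for rev in page.get("revisions", []):
            revisions.append(
                {
                    "page": page.get("title"),
                    "revid": rev.get("revid"),
                    "user": rev.get("user"),
                    "timestamp": rev.get("timestamp"),
                    "comment": rev.get("comment", ""),
                }
            )
    return revisions


# Made-up response in the shape the API returns; the values are not real data.
SAMPLE_RESPONSE = json.dumps(
    {
        "query": {
            "pages": {
                "12345": {
                    "pageid": 12345,
                    "ns": 4,
                    "title": "Wikipedia:Teahouse",
                    "revisions": [
                        {
                            "revid": 2,
                            "user": "ExampleUser",
                            "timestamp": "2013-05-02T10:00:00Z",
                            "comment": "/* Hosts */ added my profile",
                        },
                        {
                            "revid": 1,
                            "user": "AnotherUser",
                            "timestamp": "2013-05-01T09:00:00Z",
                            "comment": "fixed a typo",
                        },
                    ],
                }
            }
        }
    }
)
```

A bot would fetch the URL from `build_revisions_url("Wikipedia:Teahouse")` with an HTTP library of its choice and pass the response body to `extract_revisions`. The network call is left out here so the sketch runs offline.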

Bots are transparent and open

Since bot code is generally simpler and more public than gadget or feature code, it is easier for someone who is not the bot's owner to see how the bot works and understand its actions. If their code is hosted in an open online repository like github.com, bots can even be collaboratively developed and run by multiple users. This feature of bots helps ensure that they can be fixed if they break, and maintained even if their original creator is no longer around.

And because bots have the same rights as other registered users, people other than the bot's owner can track and control the bot's behavior: for instance, if the bot makes a mistake, any other editor can revert it. If a user does not want to receive talk page messages from a bot, they can add a bot-readable template to their userpage indicating their preference.
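
To make the opt-out idea concrete, here is a minimal sketch of how a bot might check a user page before posting. It assumes the English Wikipedia convention of `{{nobots}}` and `{{bots|deny=...}}` templates. Other wikis may use different templates, and the real templates accept more options than this simplified check handles. The function name and the bot names in the usage note are illustrative.

```python
import re

# {{nobots}} with no parameters asks all bots to stay away.
NOBOTS_RE = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)
# {{bots|deny=Name1,Name2}} lists the bots to exclude ("all" excludes every bot).
DENY_RE = re.compile(r"\{\{\s*bots\s*\|\s*deny\s*=\s*([^}|]*)", re.IGNORECASE)


def may_message(wikitext, bot_name):
    """Return False if the page's wikitext opts out of messages from bot_name."""
    if NOBOTS_RE.search(wikitext):
        return False
    match = DENY_RE.search(wikitext)
    if match:
        denied = {name.strip().lower() for name in match.group(1).split(",")}
        if "all" in denied or bot_name.lower() in denied:
            return False
    return True
```

For example, `may_message("{{bots|deny=HostBot}}", "HostBot")` returns `False`, while a page with no such template returns `True`.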

What can bots do for you?

Many of the features of a bot like HostBot or EdwardsBot may be useful for running pilot projects or streamlining G&P program workflows.

Bots for pilot projects

Bots can be useful for targeting cohorts of editors and inviting them to participate in pilot projects. One current project for which a bot is under development is the Brazil Education program. The leaders of this project are attempting to increase editor retention and improve article quality on the Portuguese Wikipedia by inviting promising new editors and subject matter experts to participate in WikiProject Medicina. The bot currently being developed will:

  • Send talk page messages to editors who have recently edited medicine-related articles, and also brand new editors who are very active, inviting them to get involved with the project.
  • Track which editors are invited, which ones get involved, and their editing activities related to the project. A good example of the kind of metrics the bot might track is available here. A more detailed description is provided in the 'metrics' section below.

Bots for programs

Bots can also be helpful colleagues in the running of established programs. User:GrantsBot, for example, is a relatively new bot that supports the individual engagement grants submission and review process:

  • sends messages to people who have started a grant proposal, reminding them of the application deadline and the steps they need to complete in advance of that deadline.
  • removes instruction text from grant proposals once the proposer is finished with a particular step in the proposal process.
  • publishes information about the current status of submitted grant proposals (such as who submitted them, whether they're "under review" or "withdrawn", etc.) to Google Docs (using Evan Rosen's gcat module).

What can't bots do?

Bots can't create web forms, sliders, pop-ups, real-time status indicators or many other shiny, social-media style features common on non-wiki websites. For that, you need gadgets (like the Teahouse "ask a question" gadget) or other more sophisticated software features. Basically, bots have the same rights on Wikimedia wikis as human users: they can theoretically perform any task a human editor can, but do it much faster. In practice, they're best at performing simple, well-structured tasks: but that still leaves a lot of potential!

Things to consider when considering a bot

Community rules and bot review processes

Community-run WMF wikis like en.wikipedia, pt.wikipedia, meta and commons have policies around what kinds of tasks bots are allowed to perform. On these wikis, bots must be proposed and undergo a review process before they are allowed to make edits (although it's sometimes okay to perform a very few test edits in out-of-the-way places on the wiki before the bot is proposed).

The rules for bots and the proposal process are different for every wiki, so it's important to get community buy-in before setting your bot loose in a public place: for example, a bot that welcomes all newly-registered users may be fine on the Portuguese Wikipedia, but there are long-standing rules against this kind of bot on English Wikipedia.

Links to bot-related policies and review processes on various wikis are provided at the end of this page under 'Bot rules'.

Bot hosting

To get the most out of your bot, consider hosting it on an external server (such as WMF Labs or the Toolserver). This allows you to take advantage of tools installed on that server--such as cron, various programming languages and modules, and production or replicated databases of various Wikimedia wikis which contain up-to-date metadata on pages, revisions and users.

Building your bot

There are lots of resources available for bots already, so it's useful to ask around or check out existing resources before you decide to reinvent the wheel and write all your code yourself. PywikipediaBot, for example, is an extensive volunteer-maintained framework for coding bots which contains scripts that can be used to perform many common bot tasks. See the list of resources at the bottom of this page to get started, or ask me or Evan Rosen to learn more.

G&P bots

Name | Description | Home wiki | Source | Wranglers | Status
HostBot | Invites new editors, tracks user metrics and performs housekeeping tasks on the Wikipedia Teahouse | en.wikipedia.org | github repository | | active
GrantsBot | Reminds grant proposal submitters of deadlines and action items, reports metrics of grant proposal status and performs housekeeping tasks at the Individual Engagement Grants portal | meta.wikimedia.org | github repository | | active
WikiProjectBot | Invites new editors and editors who have recently edited articles in the scope of WikiProjeto Medicina | pt.wikipedia.org | github repository | | under development

Bot resources

Bots in pilot projects

Examples

Using bots to track participation

There are many interesting datapoints around project participation that you could track. But it is a good practice to know what you want to measure before you start collecting data. This not only saves you time, it also helps you avoid the common research pitfall of data dredging, where you hunt through a bunch of datapoints looking for significant correlations without a firm idea of which correlations are most important, or what the correlations you find actually mean.

When you invite someone to participate in a project on a community wiki, you usually want to be able to answer specific questions about how participating in that project might affect the editing behavior of the users you invite. You probably also want to know how many of the people you invited actually saw your invitation, how many acted on it by visiting your project, and what they did when they got there (e.g. posting on the talk page, adding their name to the member list). You may also want to compare the activity of your visitors with a control group.

Basic metrics

Below, I provide a few datapoints that you should consider tracking about the users you invite to your project, and describe why they are useful. Many of these datapoints, such as who was invited and whether they visited your project, can be stored in a single table (such as your invited_yourprojectname table) and updated automatically. Others, such as how many edits your new project members made to project articles, you may want to wait and only calculate after your project pilot has concluded, to feature in your project report (for an example, see en:Research:Teahouse/Phase_2_report/Metrics).

  • users you invited: Which users did you invite to participate in your project? This data is stored by default in the invited_yourprojectname table.
  • users who visited: Of those who were invited, which ones ended up visiting your project? One way to count someone as a 'visitor' is to check whether they made at least one edit to the project page, its talk page, or its sub-pages. If you have access to clickthrough or pageview data, you might also check to see how many of these people clicked on the link in your invitation message, even if they did not edit.
  • users who declined: It can be hard to tell if someone chose not to visit your project because they just weren't interested, or if they actually did not see the message you posted. Fortunately, most(?) Wikimedia wikis use a little yellow notification banner to let you know if someone has posted a message on your user talk page since your last login. So if you have access to user session data that allows you to see whether someone has logged in since you delivered their invite, you can be somewhat confident that they at least saw your message. If you don't have access to session logs, you can still get an estimate of users who declined your invitation by checking to see whether they have made at least one edit somewhere on the wiki after receiving the invitation.
  • project page edits by visitors: How many times did these users edit your project pages? Which pages did they edit? You can even get a sense of what kinds of edits they made, without actually viewing the text of the revision. For example, if someone edits the project talk page, you can be reasonably certain that they posted a message there. Using edit comment metadata available on Wikimedia wikis, you can often tell which sections of a page they edited. For example, if they edited a section called /*project members*/, you can reasonably assume that they added their name to the project member list.
  • visitor edits by namespace: You can look at which namespaces the visitors edited most: for example, are they editing more article pages, talk pages or user pages?
  • articles edited/created: It is also useful to know which articles they edited, or what categories of articles. Do your visitors mostly edit articles that are related to your WikiProject? Do they create new articles in the same topic area?
  • edits over time: It is useful to know how long your project members continue to edit. On many Wikimedia wikis, most new editors stop editing within a few days or weeks (because they get bored, get frustrated, who knows?). But research has shown that joining a project with other editors can make new editors continue participating longer. So consider analyzing how many of your visitors make at least x edits per week for y weeks/months. You can set the edit threshold at different levels, but if you want to be able to compare your study to other studies of editor retention, it is probably best to use a common measure of activity, such as "active editors" (5+ edits per month) or "very active editors" (100+ edits per month), which are used by stats.wikimedia.org.
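
A few of the datapoints above can be computed from revision metadata alone. The sketch below is one possible way to do it: it pulls the section name out of an edit comment, tallies edits by namespace, and labels a monthly edit count using the 5+ and 100+ thresholds mentioned above. The function names, the "below threshold" label and the dictionary layout of a revision are assumptions made for illustration. The namespace numbers are the standard MediaWiki defaults.

```python
import re
from collections import Counter

# Section edits leave an automatic "/* section name */" prefix in the edit comment.
SECTION_RE = re.compile(r"/\*\s*(.*?)\s*\*/")

# Default MediaWiki namespace numbers; anything else is lumped into "other".
NAMESPACE_NAMES = {
    0: "article",
    1: "talk",
    2: "user",
    3: "user talk",
    4: "project",
    5: "project talk",
}


def edited_section(edit_comment):
    """Return the section named in an edit comment, or None if there is none."""
    match = SECTION_RE.search(edit_comment or "")
    return match.group(1) if match else None


def edits_by_namespace(revisions):
    """Count revisions per namespace; each revision is a dict with an 'ns' key."""
    return Counter(NAMESPACE_NAMES.get(rev["ns"], "other") for rev in revisions)


def activity_level(edits_in_month):
    """Label a monthly edit count using the 5+ / 100+ thresholds."""
    if edits_in_month >= 100:
        return "very active"
    if edits_in_month >= 5:
        return "active"
    return "below threshold"
```

For instance, `edited_section("/* project members */ adding my name")` returns `"project members"`, which suggests (but does not prove) that the editor signed up as a member.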

Control groups

A control group is a sample of people (in this case, wiki editors) who did not receive a treatment. Establishing some sort of control group is necessary if you want to determine whether joining your project made a difference in what users did on Wikipedia (such as how many edits they made, how long they continued to edit, whether they vandalized articles, etc.). There are many ways of establishing a control group, but one fairly straightforward way is to only invite some of the editors who meet your criteria for invitation. For example, if every day on your wiki there are about 150 new users who register an account and make at least 5 edits, you could automatically invite 100 of these editors to your project, but not invite the other 50.

You can keep track of who's who by creating a sample_type field in your invited_yourprojectname table, and populating it with "exp" or "con" (or 1 and 0), based on whether an editor was sent an invitation or not. This way, you can later compute the metrics listed above for both groups, and compare their averages. It may also be informative to use editors who saw your message but didn't participate in your project as a control group (see "users who declined" above). If visitors to your project remain active longer, create more articles, and/or have their edits reverted less, you may be able to infer that your project has a positive impact on new editor behavior. Just remember: correlation does not equal causation, and the way you set up your control group determines the claims you can make about your data!
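
The 100/50 split described above could be sketched as follows, using an in-memory SQLite database as a stand-in for wherever your invited_yourprojectname table actually lives. Choosing the invited editors at random is an assumption on my part (it is one common way to keep the two groups comparable). The user_id and edits_after columns and the function names are likewise hypothetical.

```python
import random
import sqlite3


def assign_groups(user_ids, n_invited, seed=None):
    """Randomly mark n_invited users as "exp" (invited) and the rest as "con"."""
    rng = random.Random(seed)
    users = list(user_ids)
    shuffled = users[:]
    rng.shuffle(shuffled)
    invited = set(shuffled[:n_invited])
    return {uid: ("exp" if uid in invited else "con") for uid in users}


def store_assignments(conn, assignments):
    """Create the table if needed and record each user's sample_type."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invited_yourprojectname ("
        "user_id INTEGER PRIMARY KEY, "
        "sample_type TEXT NOT NULL, "
        "edits_after INTEGER DEFAULT 0)"  # hypothetical metric column
    )
    conn.executemany(
        "INSERT INTO invited_yourprojectname (user_id, sample_type) VALUES (?, ?)",
        list(assignments.items()),
    )
    conn.commit()


def mean_edits_by_group(conn):
    """Return {sample_type: average edits_after} for the two groups."""
    rows = conn.execute(
        "SELECT sample_type, AVG(edits_after) "
        "FROM invited_yourprojectname GROUP BY sample_type"
    )
    return dict(rows.fetchall())
```

With 150 eligible editors, `assign_groups(range(1, 151), n_invited=100)` yields 100 "exp" and 50 "con" entries. Once a metric such as edits_after has been filled in, `mean_edits_by_group` gives the per-group averages to compare.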