User:Shouston (WMF)/Sandbox/Global Metrics/Final report

Background

This project was undertaken jointly by the Community Resources and Learning & Evaluation teams at the Wikimedia Foundation, as both of these teams have played a central role in the creation, implementation and support of Global Metrics over the last two years.

The goal of updating Global Metrics was to create something that was responsive to the copious feedback and suggestions we received through the Global Metrics retrospective and the June consultation.

Given the spectrum of feedback, we identified a set of upfront principles that would enable us to generate ideas and make decisions that were both responsive and consistent. In the proposed set of changes, these design principles were used to assess the strengths and weaknesses of each idea. In deciding the final updates to Global Metrics, we used the same principles to guide our decision making as we synthesized and incorporated feedback:

We would not introduce new metrics if there were no easy-to-use tools available. Agreed-upon new metrics would only be introduced once tools were available.
We would identify places where we could iterate, understanding that not everything can be done now, but it’s better to do it well even if it’s done slowly.
We would think holistically about metrics, acknowledging that grant metrics, organizational metrics, and program metrics might overlap, but aren’t necessarily the same thing. All of these metrics need to complement, not supersede, each other.

Executive Summary

The Metrics Library received strong support from the grantees and grant committee members who gave feedback. The creation of this centralized resource will become one of the core projects for Amanda Bittaker from Learning & Evaluation in the 2016-2017 fiscal year.
There was no consensus from respondents on whether Wikimetrics should become the single tool to calculate Global Metrics. Given that Wikimetrics has significant front-end and back-end problems, a deeper project will be undertaken by Sati Houston from Community Resources in the 2016-2017 fiscal year to investigate whether building a new tool or improving an existing tool is the best solution to the issues around data collection.
There was strong support from respondents to simplify the requirement of Global Metrics, particularly to change the structure to “Proposal 2: 3 shared measures + 2 grantee-selected measures”.
For the four metrics proposed, respondents provided significant feedback on the importance, usefulness, and definition of each metric, as well as how easy or difficult it would be to collect the information. This feedback was integrated into the final set of metrics, with potential mitigations offered for the risks and concerns raised. For additional information see the section on Participation, Content, and Community Building.
- All respondents indicated Community Building was an important and useful outcome to capture, but had hugely divergent responses to the question, “What is community building?” Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear if surveys, storytelling, or some other solution was the right fit for capturing this information. As such, Community Building will not be included as a shared measure at this time.

Grant metrics instead of Global Metrics

Global Metrics will be replaced with a simplified requirement around grant metrics, with the following structure and metrics. This grant metric requirement will only apply to the Project Grant and Annual Plan Grant programs. Rapid Grants and Conference & Travel Grants will retain their current program-specific requirements.

Three shared metrics (bold)

Total participants. This is the number of people who attend events or activities, either in person or virtually. It does not include many others who might be “involved” in events or activities, such as organizers or channels of outreach (e.g. social media followers). While these groups remain important, each grantee can choose to include them in their grantee-selected metrics, if relevant.
Number of newly registered users. This is the current global metric, which is collected through the Magic Button and Wikimetrics. The definition remains the same: the number of newly registered users as a result of the project, using a two-week window for specific events if needed.
Content pages created or improved, aggregated by Wikimedia project. This is the current global metric, which is collected through the Magic Button only. The definition has been slightly changed to make it clear that all Wikimedia projects are included.

Grantees and grant committee members interviewed actually preferred two metrics different from (2) and (3):

Newly registered users, as well as their retention over 1, 3, 6, or 12 months (called “New User Retention”)
Content pages created or improved, disaggregated by Wikimedia project

However, neither the New User Retention metric nor the disaggregated content metric has easy-to-use tools readily available today. When these tool issues are resolved, these metrics will replace the current (2) and (3) metrics.

+ Two grantee-selected measures (bold)

These measures (whether quantitative or qualitative) are a space for the grantee to highlight achievements that go beyond those captured by the shared metrics. These could be the outcome of a single program, or a set of programs; there is no preset list to choose from when selecting these measures. These two measures are not intended to replace the program-specific metrics that many Annual Plan Grantees identify. Rather, they are intended to show a few outcomes at the overall grant level, replacing the Global Metrics tables in the “Overview-Global Metrics” section of the proposal and report template. Two measures are required, but grantees may include as many as they feel necessary or relevant.

This new structure of 3 shared metrics and 2 grantee-selected measures intends to highlight that grant metrics, organizational metrics, and program metrics might overlap, but aren't necessarily the same; all of these metrics are intended to complement, not supersede, each other. Storytelling remains important to capture the breadth and diversity of outcomes that happen in the context of a grant, especially those things that cannot be captured in a number. These five metrics are not intended to be comprehensive.

Timeline

These new grant metrics will become a part of the upcoming Project and Annual Plan Grant rounds, specifically:

Project Grants - September open call
Simple Annual Plan Grant - November
Full Annual Plan Grant - October

Current grants will not be required to change. They will remain with the metrics, objectives and targets identified in their proposals. ….

Details on community feedback and rationale behind changes

Structure

Overall, most grantees and grant committee members interviewed preferred “Proposal 2: 3 shared measures + 2 grantee-selected measures”. This structure integrates Global Metrics and other metrics into a single set of grant metrics. Those who favored this structure indicated that it adequately balanced the need for consistency by having a set of shared metrics, and also the need for flexibility by allowing grantees to highlight their specific outcomes and achievements. However, the overall success of this proposed new structure depends heavily on the Metrics Library, as the library will be the main mechanism for grantees to discover new metrics, online resources, and potential tools.

There were three primary concerns raised about shifting to a structure that integrated Global Metrics and other grant metrics. In an effort to address these concerns, we have identified a few potential mitigations where applicable:

This new structure will diminish the distinction between those metrics required by WMF, and those identified by the grantee.
- It is true that this structure will integrate the metrics required by WMF with those identified by the grantee. However, the respondents who supported Proposal 2 highlighted this as a strength of the structure, as all of these metrics together represent the outcomes of a grant. The need to distinguish and separate metrics into two groups, especially when they might overlap, is not useful to understanding outcomes holistically.
Requiring at least two grantee-selected metrics might increase the burden on smaller grants and grantees.
- Mitigation: The Rapid Grant program is designed to reduce the grant approval time and the overall burden of reporting, and will not follow this same structure. It will instead maintain its current lighter-weight requirements.
Given these grantee-selected metrics will vary by grant, there may not be adequate support (e.g. tools, online resources) available to the grantee.
- Mitigation: The Learning & Evaluation team has spent the last three years documenting common program metrics through program toolkits and learning patterns, which includes some documentation on available tools. These resources will be integrated into the Metrics Library before its launch, to help with identifying, calculating, and reporting “other grant metrics”. However, a longer-term solution is still needed to ensure that other grantees can contribute their own metrics to the Metrics Library (as well as the associated tools and resources), and to ensure that common metrics are easily identified.

Based on this decision to bring Global Metrics and other metrics into a single set of “grant metrics”, updates will be made to the proposal and reporting templates for Project and Annual Plan Grants to facilitate the integration. Template changes will also aim to make clear that these three shared metrics should only be required when relevant to the goals of the grant.

Measures

Participation

Of the two participation metrics proposed (Individuals Involved and Editors Retained), respondents indicated that both metrics were useful and important, but covered different outcomes and were applicable in different situations. While there was support for having only Individuals Involved as the primary participation metric (link to voting data?), there was equal support for allowing grantees to choose between Individuals Involved and Retention, as applicable to their grant (link to voting data?).

Respondents also indicated that the definition of each proposed metric needed to be refined, with the following being the primary changes suggested to the definition:

For Individuals Involved, the definition requires more specificity. While there are many groups of people “involved” in a grant, there are three main groups that are typically included in this metric:

Participants - these are people who show up to events, either in person or virtually. For instance, these are the people whose usernames are collected.
Organizers - these are the volunteers or staff members who are responsible for organizing the event.
Large audiences that are the target of outreach activities - these are the large groups of people who are reached (primarily) through mass communication, where the goal is to raise awareness about the Wikimedia movement, a specific project or event, etc. For example these are newsletter recipients, social media followers, mailing list members, those who visit a Wikimedia booth at a fair, etc.

While respondents indicated that the distinction between these groups isn’t perfect or comprehensive (e.g., Where would donors or partner organizations be captured? What about participants who become organizers?), the disaggregation of these three groups would improve the specificity of the definition, and be more useful overall. Two respondents went further to add that the number of volunteer organizers is really a measure of “volunteer engagement” and the size of those large outreach audiences is really a measure of “raising awareness”; while both are related to participation generally, they actually measure different outcomes than “total participants”.

For Editors Retained, most respondents indicated the metric should instead focus on New Users and New User Retention. Respondents supported expanding the definition of retention to 1 edit in any namespace in any Wikimedia project, but indicated that:

To measure retention, it’s important to know the total number of new users.
To measure retention well, it must be tracked over multiple time periods (e.g. 1, 3, 6, or 12 months).
To know what qualifies as “good” retention, there need to be baselines available.
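The three points above can be illustrated with a minimal sketch. It is not an existing tool: the function name, the sample usernames and dates, and above all the retention window are assumptions made for illustration. Here a new user counts as retained at N months if they made at least one edit (any namespace, any Wikimedia project) in the 30-day window beginning N × 30 days after registration; respondents did not specify this level of detail.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30  # assumption: one "month" is approximated as 30 days


def retention_rate(registrations, edits, months):
    """Share of new users with at least one edit in the 30-day window that
    starts `months` * 30 days after registration (an assumed definition)."""
    if not registrations:
        return 0.0
    retained = 0
    for user, registered in registrations.items():
        start = registered + timedelta(days=months * WINDOW_DAYS)
        end = start + timedelta(days=WINDOW_DAYS)
        if any(start <= d < end for d in edits.get(user, [])):
            retained += 1
    # The denominator is the total number of new users.
    return retained / len(registrations)


# Hypothetical data: registration dates and edit dates per new user.
registrations = {
    "UserA": date(2016, 1, 1),
    "UserB": date(2016, 1, 1),
    "UserC": date(2016, 1, 1),
    "UserD": date(2016, 1, 1),
}
edits = {
    "UserA": [date(2016, 2, 5), date(2016, 4, 10)],
    "UserB": [date(2016, 2, 20)],
    "UserC": [date(2016, 1, 2)],
}

# Retention tracked over multiple time periods.
rates = {m: retention_rate(registrations, edits, m) for m in (1, 3, 6, 12)}
print(rates)
```

With this sample data, two of the four new users are retained at 1 month and one at 3 months. Whether such rates are “good” still depends on the baselines that respondents asked for.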

Respondents indicated that while the engagement of the existing contributors is critical for many activities, the outcomes of those activities are better captured qualitatively - focusing on feelings of motivation, connectedness to their community, etc. - rather than through a numeric retention metric.

However, while all of these participation metrics have issues around data collection, New User Retention has significant issues. To our knowledge, no tool exists with sufficient functionality to capture retention over multiple time periods, with the suggested definition. Moreover, Single User Login was introduced after the original Global Metrics were defined, and it will likely be necessary to change the way “New User” is technically defined in each tool (e.g. distinguishing between “those who are creating a username for the first time” and “those who are existing contributors, but new to a specific language project”), as well as how “New User Retention” is technically defined. While there are many retention-focused tools - both created by WMF and by volunteers - each would need small to large improvements. (This is based on tools that we know of; if we have missed a tool, please let us know!) As such, the current “Newly registered user” metric should remain until tools can be updated to support new definitions and functionality.

(box)

Given all of this feedback, the new participation metrics will be as follows:

Total participants. This is the number of people who attend events, either in person or virtually. As suggested, we have included a set of examples of who should and shouldn’t be included.
- This definition of participation returns to the metric that the majority of grantees reported before the start of Global Metrics.
- Engaging volunteers and raising awareness remain important goals and outcomes to capture. The “grantee-selected” metrics represent an opportunity for grantees to highlight these achievements in ways that are most relevant to them, rather than be forced to report them in the ill-fitting “participants” metric.
Number of newly registered users. This is the current global metric, and the definition remains the same: the number of newly registered users as a result of the project, using a two-week window for specific events if needed.
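As a rough illustration of the two-week window in the second metric, the sketch below counts participants whose accounts were registered inside a 14-day window. The function name and sample data are hypothetical, and where the window sits relative to the event is left to the caller, since the definition above does not say.

```python
from datetime import date, timedelta


def count_newly_registered(registration_dates, window_start, window_days=14):
    """Count accounts registered in [window_start, window_start + window_days)."""
    window_end = window_start + timedelta(days=window_days)
    return sum(
        1 for registered in registration_dates.values()
        if window_start <= registered < window_end
    )


# Hypothetical participant list with account registration dates.
participants = {
    "UserA": date(2016, 9, 1),
    "UserB": date(2016, 9, 10),
    "UserC": date(2015, 3, 4),   # existing contributor, not newly registered
    "UserD": date(2016, 9, 20),  # registered after the window closed
}

new_users = count_newly_registered(participants, window_start=date(2016, 9, 1))
print(new_users)
```

With this sample data, only UserA and UserB fall inside the window.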

Content

Most respondents strongly endorsed the proposed content metric: Content pages created or improved, disaggregated by Wikimedia project.

This new definition resolves some of the issues identified in the retrospective, particularly that the current “pages new/improved” metric was interpreted to be “Wikipedia articles new/improved”, or that the metric aggregated pages across Wikimedia projects (e.g. Wikidata items, Wiktionary entries and Wikisource pages). Respondents did not have many suggested improvements to the definition but they did have three concerns. In an effort to address these concerns, we have again identified a few potential mitigations where applicable:

The quality of content is not addressed
- Given quality is currently primarily assessed through community processes (e.g. Featured Article process on English Wikipedia, Featured Image process on Commons), or specific processes & rubrics of an event or contest, it is highly contextualized and would be difficult to define in a centralized way. While the adoption of automated tools might be an option in the future, for now it isn’t feasible to include as a shared metric.
This metric might be easily “gamed” or manipulated, e.g. through creating stub articles or making small edits to as many pages as possible
- While this is a valid concern, it is unlikely that the grant system will ever have sufficient oversight to ensure gaming doesn't happen.
The definition of an “improvement” differs by Wikimedia project
- Potential mitigation: More detailed examples of improvements could be added to the metric documentation in the Metrics Library.

While this metric does provide more specificity in content added or improved across the various Wikimedia projects, there are currently no available tools to collect this detailed information (without having some technical expertise in languages such as SQL). Wikimetrics can report “content pages added, disaggregated by Wikimedia project” and “aggregated content pages created and improved”, but not “content pages improved, disaggregated by Wikimedia project”. (Again, the assessment that no tool currently exists is based on tools that we know of; if we have missed a tool, please let us know!)

Given this tool limitation, this metric cannot become a shared grant metric yet. As such, the current “pages created or improved” metric will remain, with slight improvements to the definition. Once the tool issues are addressed, the updated metric (with the disaggregation by Wikimedia project) will replace the current one.

(box) Content pages created or improved, aggregated by Wikimedia project. This is the current Global Metric, calculated by the Magic Button, where a “content page” is defined as an article on Wikipedia, an entry on Wiktionary, a file on Commons, an item on Wikidata, a page on Wikisource, or similar units of content on other Wikimedia projects.

Grantees may disaggregate these content pages by Wikimedia project if desired, though it is not required. The time and effort to include these disaggregated numbers is likely too much for most grantees.
For example, participants at an editathon create 4 Wikipedia articles, improve 12 Wikipedia articles, upload 4 images to Commons, and add properties to 18 items on Wikidata. The grantee would report 38 content pages new or improved across Wikimedia projects. They may also report 16 created or improved articles on Wikipedia, 4 media files uploaded or improved on Commons, and 18 items created or improved on Wikidata.
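The arithmetic in the editathon example can be sketched as follows. This is only an illustration of the aggregated and disaggregated counts, not how the Magic Button or Wikimetrics compute them; the variable names are hypothetical, and it assumes each page is counted once (as either created or improved).

```python
from collections import Counter

# Tallies from the editathon example, split by created vs. improved.
created = Counter({"Wikipedia": 4, "Commons": 4})
improved = Counter({"Wikipedia": 12, "Wikidata": 18})

# Optional disaggregated view: created or improved pages per Wikimedia project.
by_project = created + improved

# Required shared metric: a single total across all Wikimedia projects.
total = sum(by_project.values())

print(total)
print(dict(by_project))
```

The single total is what the shared metric requires; the per-project breakdown is the optional detail a grantee may add.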

Community Building

All respondents indicated Community Building was an important and useful outcome to capture, but had hugely divergent responses to the question, “What is community building?” Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear if surveys, storytelling, or some other solution was the right fit for capturing this information.

As such, Community Building will not be included as a shared measure. While important, it needs more investigation and experimentation to clarify its definition, how the information would be collected, and how the information would be used.

However, we will continue to investigate the work that has already been done or is currently being done around Community Building, including currently used definitions. Depending on the interest of conference organizers, we could present these findings at a movement conference in 2017.

Data

‘Who responded’

Former and current grantees, WMF staff, grant committee members, and a WMF board member. The summarized feedback below is based on interviews and survey data collected from:

5 WMF staff from the Community Resources team
34 grantees
- IEG: 4
- PEG: 14
- APG: 10
- Simple APG: 4
- Unknown: 2
8 grant committee members
- FDC members: 3
- GAC members: 3
- SAPG committee members: 2

‘Holistic approach’

‘Structure’

‘Metrics’

‘Resources’