Grants:IdeaLab/Redesigning Global Metrics & its support/Outcome

Background

This project was undertaken jointly by the Community Resources and Learning & Evaluation teams at the Wikimedia Foundation, as both of these teams have played a central role in the creation, implementation and support of Global Metrics over the last two years.

The goal of updating Global Metrics was to create something responsive to the copious feedback and suggestions we received through the Global Metrics retrospective and this consultation.

Given the wide spectrum of feedback, we identified a set of upfront principles that would let us generate ideas and make decisions that were both responsive and consistent. These design principles were used to assess the strengths and weaknesses of each proposed idea, and to guide our decisions when synthesizing and incorporating feedback into the final updates to Global Metrics:

We would not introduce new metrics if no easy-to-use tools were available to collect them. Agreed-upon new metrics would only be introduced once such tools existed.
We would identify places where we could iterate, understanding that not everything can be done now, and that it’s better to do something well even if it’s done slowly.
We would think holistically about metrics, acknowledging that grant metrics, organizational metrics, and program metrics might overlap, but aren’t necessarily the same thing. All of these metrics need to complement, not supersede, each other.

Executive Summary

The Metrics Library received strong support from the grantees and grant committee members who gave feedback. The creation of this centralized resource will become one of the core projects for Amanda Bittaker from Learning & Evaluation in the 2016–2017 fiscal year.
There was no consensus among respondents on whether Wikimetrics should become the single tool for calculating Global Metrics. Given that Wikimetrics has significant front-end and back-end problems, Sati Houston from Community Resources will undertake a deeper project in the 2016–17 fiscal year to investigate whether building a new tool or improving an existing one is the best solution to the data collection issues.
There was strong support from respondents for simplifying the Global Metrics requirement, particularly by changing the structure to “Proposal 2: 3 shared measures + 2 grantee-defined measures”.
For the four metrics proposed, respondents provided significant feedback on the importance, usefulness, and definition of each metric, as well as on how easy or difficult it would be to collect the information. This feedback was integrated into the final set of metrics, and potential mitigations were identified for the risks and concerns raised. For additional information, see the sections on #Participation, #Content, and #Community Building.
- All respondents indicated that Community Building was an important and useful outcome to capture, but gave widely divergent answers to the question “What is community building?” Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear whether surveys, storytelling, or some other approach was the right fit for capturing it. As such, Community Building will not be included as a shared measure at this time, but we hope to learn more about successes and challenges in this area throughout the upcoming year.

Grant metrics instead of Global Metrics

Global Metrics will be replaced with a simplified requirement for identifying, collecting and reporting grant metrics, as follows:

Three shared metrics

Total participants. This is the number of people who attend events or activities, either in person or virtually. It does not include the many others who might be “involved” in events or activities, such as organizers or audiences reached through outreach channels (e.g. social media followers).
- If outcomes related to volunteer engagement (e.g. the number of volunteer organizers) or raising awareness (e.g. social media followers) remain important to your grant activities, they can be included as grantee-defined metrics.
Number of newly registered users. This is an existing Global Metric, collected through the Magic Button and Wikimetrics. The definition remains the same: the number of newly registered users as a result of the project, using a two-week window for specific events if needed.
Content pages created or improved, aggregated across Wikimedia projects. This is the current global metric, which is collected through the Magic Button. The definition has been slightly re-worded to make it clear that all Wikimedia projects are included.

The grantees and grant committee members interviewed actually preferred two different metrics in place of the second and third shared metrics (newly registered users and content pages created or improved):

Newly registered users, as well as their retention over 1, 3, 6, or 12 months (called “New User Retention”)
Content pages created or improved, disaggregated by Wikimedia project

However, neither the New User Retention metric nor the disaggregated content metric currently has easy-to-use tools readily available. When these tool issues are resolved, the shared metrics will be updated.

+ Two grantee-defined measures

These measures (whether quantitative or qualitative) are a space for the grantee to highlight outcomes that go beyond those captured by the shared metrics. These could be the outcome of a single program, or a set of programs; there is no preset list to choose from when selecting these measures.

Timeline

This grant metric requirement will only apply to the Project Grant and Annual Plan Grant programs. Rapid Grants and Conference & Travel Grants will retain their current program-specific requirements. No ongoing grants will be required to change to the new structure; they will keep the metrics, objectives, and targets identified in their proposals. These new grant metrics will become a requirement for Project and Annual Plan Grants on the following timelines:

Project Grants – Grants approved in or after September 2016
Simple Annual Plan Grants – Grants submitted for funding starting January 1, 2017 (i.e. grant proposals submitted by November 1)
Full Annual Plan Grants – Grants submitted in Round 1 in October 2016

How does this new requirement work in Project Grants?

Currently, each Project Grant has two tables of metrics: Global Metrics and Project Metrics. Under this new grant metrics requirement, there will be one table, with three rows for the three shared metrics and two rows for the grantee-defined metrics. Grantees can remove any of the three shared metrics that aren’t relevant to their goals. They may also add more grantee-defined metrics if they have more than two.
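As a rough illustration of the table rules just described (any shared metric can be dropped when it isn’t relevant; two grantee-defined rows are expected, and more are allowed), here is a minimal Python sketch that checks a proposed table. The function name `validate_metrics_table` and the exact metric labels are hypothetical; this is not an existing WMF tool or template check.

```python
# Labels for the three shared metrics (wording is illustrative, not official).
SHARED_METRICS = (
    "Total participants",
    "Number of newly registered users",
    "Content pages created or improved",
)
MIN_GRANTEE_DEFINED = 2  # two grantee-defined rows; more may be added


def validate_metrics_table(shared: list[str], grantee_defined: list[str]) -> list[str]:
    """Return a list of problems with a proposed metrics table (empty if it looks fine).

    Any subset of the shared metrics may be kept, since grantees can remove
    those that aren't relevant to their goals.
    """
    problems = []
    unknown = [m for m in shared if m not in SHARED_METRICS]
    if unknown:
        problems.append("not a shared metric: " + ", ".join(unknown))
    if len(grantee_defined) < MIN_GRANTEE_DEFINED:
        problems.append(f"expected at least {MIN_GRANTEE_DEFINED} grantee-defined metrics")
    return problems


# An outreach-focused grant drops the content metric and adds three of its own.
print(validate_metrics_table(
    ["Total participants", "Number of newly registered users"],
    ["Volunteer organizers", "Social media followers", "Press mentions"],
))  # prints []
```

The same check would apply to the single table planned for Full Annual Plan Grants, since it uses the same three-plus-two layout.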

How does this new requirement work in Simple Annual Plan Grants?

Currently, grantees enter their Global Metrics targets and achievements into a common spreadsheet. For new Simple Annual Plan Grants, the metrics in this spreadsheet will be updated to match the new shared metrics, and each grantee will identify two other metrics that are relevant to their grant.

How does this new requirement work in Full Annual Plan Grants?

Currently both the proposal and report forms have a table to capture Global Metrics targets and achievements (e.g. the “Global metrics overview – all programs” section in the progress report); program metrics are reported in the relevant program section. Under this new grant metrics requirement, there will be one table at the beginning of the proposal and report forms, with three rows for the three shared metrics, and two rows for the grantee-defined metrics. Grantees can remove any of the three shared metrics that aren’t relevant to their goals. They may also add more grantee-defined metrics, if they have more than two. Other program-specific metrics remain in the relevant program section.

Next steps

Update Collecting Global Metrics learning pattern – August 12th
Update grant proposal and reporting templates – Will vary by grant program, according to the dates the grant rounds start.
Begin scoping the Metrics Library project – August 15th
Begin scoping the tool for shared metrics project – September 5th

Details on community feedback and rationale behind changes

Structure

Overall, most grantees and grant committee members interviewed preferred “Proposal 2: 3 shared measures + 2 grantee-defined measures”. This structure integrates Global Metrics and other metrics into a single set of grant metrics. Those who favored this structure indicated that it adequately balanced the need for consistency, through a set of shared metrics, with the need for flexibility, by allowing grantees to highlight their specific outcomes and achievements. However, the overall success of this new structure depends heavily on the Metrics Library, as the library will be the main mechanism for grantees to discover new metrics, online resources, and potential tools.

Three primary concerns were raised about shifting to a structure that integrates Global Metrics and other grant metrics. To address these concerns, we have identified potential mitigations where applicable:

This new structure will diminish the distinction between those metrics required by WMF, and those identified by the grantee.
- It is true that this structure will merge the metrics required by WMF with the other metrics identified by the grantee. However, the respondents who supported Proposal 2 highlighted this as a strength, as all of these metrics together represent the outcomes of a grant.
Requiring at least two grantee-defined metrics might increase the burden on smaller grants and grantees.
- Mitigation: The Rapid Grant program is designed to reduce the grant approval time and the overall burden of reporting, and will not follow this same structure. It will instead keep its current, lighter-weight requirements.
Given these grantee-defined metrics will vary by grant, there may not be adequate support (e.g. tools, online resources) available to the grantee.
- Mitigation: The Learning & Evaluation team has spent the last three years documenting common program metrics through program toolkits and learning patterns, which includes some documentation on available tools. These resources will be integrated into the Metrics Library before its launch, to help with identifying, calculating, and reporting “other grant metrics”. However, a longer-term solution is still needed to ensure that other grantees can contribute their own metrics to the Metrics Library (as well as the associated tools and resources), and to ensure that common metrics are easily identified.

Based on this decision to bring Global Metrics and other metrics into a single set of “grant metrics”, updates will be made to the proposal and reporting templates for Project and Annual Plan Grants to facilitate the integration. Template changes will also aim to make clear that these three shared metrics should only be required when relevant to the goals of the grant.

Measures

Participation

Of the two participation metrics proposed (Individuals Involved and Editors Retained), respondents indicated that both were useful and important, but covered different outcomes and applied in different situations. While there was support for having Individuals Involved as the single participation metric, there was equal support for allowing grantees to choose between Individuals Involved and Retention, as applicable to their grant (voting data).

Respondents also indicated that the definition of each proposed metric needed to be refined, with the following being the primary changes suggested to the definition:

For Individuals Involved, the definition requires more specificity. While there are many groups of people “involved” in a grant, there are three main groups that are typically included in this metric:

Participants – these are people who attend events, either in person or virtually. For instance, people whose usernames are collected during the program are participants.
Organizers – these are the volunteers or staff members who are responsible for organizing the event.
Large audiences that are the target of outreach activities – these are the large groups of people reached (primarily) through mass communication, where the goal is to raise awareness about the Wikimedia movement, a specific project or event, etc. Examples include newsletter recipients, social media followers, mailing list members, and visitors to a Wikimedia booth at a fair.

While respondents indicated that the distinction between these groups isn’t perfect or comprehensive (e.g. where would donors or partner organizations be captured? What about participants who become organizers?), disaggregating these three groups would make the definition more specific and more useful overall. Two respondents went further, adding that the number of volunteer organizers is really a measure of “volunteer engagement” and the size of large outreach audiences is really a measure of “raising awareness”; while both relate to participation generally, they measure different outcomes than “total participants”.

For Editors Retained, more respondents indicated the metric should instead focus on New Users and New User Retention. Respondents supported expanding the definition of retention to 1 edit in any namespace in any Wikimedia project, but indicated that:

To measure retention, it’s important to know the total number of new users.
To measure retention well, it must be tracked over multiple time periods (e.g. 1, 3, 6, or 12 months).
To know what qualifies as “good” retention, there need to be baselines available.
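The retention definition respondents supported can be made concrete with a short Python sketch. It assumes a hypothetical export of each new user’s registration date and edit dates (any namespace, any Wikimedia project), approximates a month as 30 days, and treats a user as retained at N months if they made at least 1 edit in the 30 days starting N months after registration. That window choice is an assumption for illustration, not an agreed definition. As the list above notes, the total number of new users is the denominator, and a baseline would still be needed to judge whether a given rate is “good”.

```python
from datetime import date, timedelta

# Hypothetical input: username -> (registration date, dates of all edits
# in any namespace on any Wikimedia project).
NewUsers = dict[str, tuple[date, list[date]]]

PERIODS_MONTHS = (1, 3, 6, 12)  # the time periods respondents suggested
DAYS_PER_MONTH = 30             # assumption: approximate a month as 30 days


def retained(registered: date, edits: list[date], months: int) -> bool:
    """True if the user made at least 1 edit in the 30-day window that
    starts `months` months after registration (assumed definition)."""
    start = registered + timedelta(days=months * DAYS_PER_MONTH)
    end = start + timedelta(days=DAYS_PER_MONTH)
    return any(start <= d < end for d in edits)


def retention_rates(users: NewUsers) -> dict[int, float]:
    """Share of all new users retained at each period.

    The total number of new users is the denominator, so it must be known.
    """
    total = len(users)
    if total == 0:
        return {m: 0.0 for m in PERIODS_MONTHS}
    return {
        m: sum(retained(reg, edits, m) for reg, edits in users.values()) / total
        for m in PERIODS_MONTHS
    }


users = {
    "UserA": (date(2016, 1, 1), [date(2016, 2, 5)]),  # edits once, about 5 weeks in
    "UserB": (date(2016, 1, 1), []),                  # never edits after registering
}
print(retention_rates(users))  # {1: 0.5, 3: 0.0, 6: 0.0, 12: 0.0}
```

Under Single User Login, a real tool would also have to decide whether a “new user” means a brand-new account or an existing contributor new to a specific project; this sketch simply takes whatever list of users it is given.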

Respondents indicated that while the engagement of existing contributors is critical for many activities, the outcomes of those activities are better captured qualitatively (focusing on feelings of motivation, connectedness to their community, etc.) rather than through a numeric retention metric.

However, while all of these participation metrics have data collection issues, the issues for New User Retention are especially significant. To our knowledge, no tool exists with sufficient functionality to capture retention over multiple time periods using the suggested definition. Moreover, Single User Login was introduced after the original Global Metrics were defined, so it will likely be necessary to change how “New User” is technically defined in each tool (e.g. distinguishing between “those who are creating a username for the first time” and “existing contributors who are new to a specific language project”), as well as how “New User Retention” is technically defined. While there are many retention-focused tools, created both by WMF and by volunteers, each would need small to large improvements. (This is based on the tools we know of; if we have missed a tool, please let us know!) As such, the current “Newly registered users” metric should remain until tools can be updated to support the new definitions and functionality.

Based on all of this feedback, the new participation metrics will be as follows:

Total participants. This is the number of people who attend events, either in person or virtually. As suggested, the updated learning pattern will include examples of who should and shouldn’t be included.
- Engaging volunteers and raising awareness remain important goals and outcomes to capture. The grantee-defined metrics give grantees an opportunity to highlight these achievements in the ways most relevant to them, rather than being forced to report them in the ill-fitting “participants” metric.
Number of Newly Registered Users. This is the current global metric, and the definition remains the same: the number of newly registered users as a result of the project, using a two week window for specific events if needed.
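To illustrate how the two-week window might be applied to a specific event, the Python sketch below counts participants (identified by the usernames collected at the event) whose accounts were registered in a window running from 14 days before the event start through the event end. How the window is anchored, and the input format, are assumptions for illustration; the Magic Button and Wikimetrics may apply the window differently.

```python
from datetime import date, timedelta

# Assumption: the two-week window opens 14 days before the event starts
# and closes when the event ends (both ends inclusive).
WINDOW = timedelta(days=14)


def newly_registered_users(participants: dict[str, date],
                           event_start: date, event_end: date) -> set[str]:
    """Return the usernames whose registration date falls inside the window.

    `participants` maps each username collected at the event to the date
    that account was registered (a hypothetical input format).
    """
    earliest = event_start - WINDOW
    return {name for name, registered in participants.items()
            if earliest <= registered <= event_end}


participants = {
    "Alice": date(2016, 3, 8),   # registered on the day of the event
    "Bob": date(2016, 2, 1),     # long-standing account
    "Cleo": date(2016, 2, 25),   # registered shortly before the event
}
new = newly_registered_users(participants, date(2016, 3, 8), date(2016, 3, 8))
print(len(new))  # 2
```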

Content

Most respondents strongly endorsed the proposed content metric: Content pages created or improved, disaggregated by Wikimedia project.

This new definition resolves some of the issues identified in the retrospective, particularly that the current “pages new or improved” metric was interpreted either as “Wikipedia articles new or improved” or as pages aggregated across Wikimedia projects (e.g. Wikidata items, Wiktionary entries and Wikisource pages). Respondents did not suggest many improvements to the definition, but they did raise three concerns. To address these concerns, we have again identified potential mitigations where applicable:

The quality of content is not addressed.
- Given that quality is currently assessed primarily through community processes (e.g. the Featured Article process on English Wikipedia, the Featured Image process on Commons) or through the specific processes and rubrics of an event or contest, it is highly contextual and would be difficult to define centrally. While automated tools might become an option in the future, for now quality cannot feasibly be included as a shared metric.
This metric might be easily “gamed” or manipulated, e.g. through creating stub articles or making small edits to as many pages as possible.
- While this is a valid concern, it is unlikely that the grant system will ever have sufficient oversight to ensure gaming doesn't happen.
The definition of an “improvement” differs by Wikimedia project.
- Potential mitigation: More detailed examples of what counts as an improvement on each project could be added to the metric documentation in the Metrics Library.

While this disaggregated content metric provides more specificity about content added or improved across the various Wikimedia projects, there are currently no tools available to collect this detailed information without some technical expertise in languages such as SQL. Wikimetrics can report “content pages added, disaggregated by Wikimedia project” and “aggregated content pages created and improved”, but not “content pages improved, disaggregated by Wikimedia project”. (Again, the assessment that no tools currently exist is based on the tools we know of; if we have missed a tool, please let us know!)

Given this tool limitation, this metric cannot become a shared grant metric yet. As such, the current “pages created or improved” metric will remain, with slight improvements to the definition. Once the tool issues are addressed, the updated metric (with the disaggregation by Wikimedia project) will replace the current one.

Content pages created or improved, aggregated across Wikimedia projects

This is the current Global Metric, calculated by the Magic Button, where a “content page” is defined as an article on Wikipedia, an entry on Wiktionary, a file on Commons, an item on Wikidata, a page on Wikisource, or similar units of content on other Wikimedia projects.

Grantees may break down the aggregated number by Wikimedia project if desired, but this is not required; the time and effort needed to produce these disaggregated numbers is likely too much for most grantees.
Example: A grantee holds an editathon with the goal of creating or improving content across Wikimedia projects. Participants at this editathon create 4 Wikipedia articles, improve 12 Wikipedia articles, upload 4 images to Commons, and add properties to 18 items on Wikidata.
- The grantee would report 38 content pages created or improved across Wikimedia projects.
- They could also, but are not required to, report 16 articles created or improved on Wikipedia, 4 media files uploaded or improved on Commons, and 18 items created or improved on Wikidata.
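The arithmetic in this example can be checked with a small Python sketch that derives both the required aggregated number and the optional per-project breakdown from one list of records. The record format and the function name `content_pages` are hypothetical, and counting a page once even if it was touched several times is an assumption about how duplicates should be handled.

```python
from collections import Counter

# Each record is (wikimedia_project, page_title, action); the format is hypothetical.
Record = tuple[str, str, str]


def content_pages(records: list[Record]) -> tuple[int, Counter[str]]:
    """Return the aggregated count of distinct pages created or improved across
    all Wikimedia projects, plus the optional per-project breakdown.

    A page touched more than once is counted once (an assumption).
    """
    distinct = {(project, title) for project, title, _action in records}
    per_project = Counter(project for project, _title in distinct)
    return len(distinct), per_project


# The editathon from the example above.
records = (
    [("Wikipedia", f"New article {i}", "created") for i in range(4)]
    + [("Wikipedia", f"Existing article {i}", "improved") for i in range(12)]
    + [("Commons", f"Image {i}", "uploaded") for i in range(4)]
    + [("Wikidata", f"Item {i}", "improved") for i in range(18)]
)
total, by_project = content_pages(records)
print(total)  # 38
# by_project["Wikipedia"] == 16, by_project["Commons"] == 4, by_project["Wikidata"] == 18
```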

Community Building

Respondents strongly indicated that Community Building was an important and useful outcome to capture, but gave widely divergent answers to the question “What is community building?” Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear whether surveys, storytelling, or some other approach was the right fit for capturing it.

As such, Community Building will not be included as a shared measure. While important, it needs more investigation and experimentation to clarify its definition, how the information would be collected, and how it would be used.

However, we will continue to investigate the work that has been done or is currently being done around Community Building, including the definitions currently in use. Depending on the interest of conference organizers, we could present these findings at a movement conference in 2017.

Appendix

Demographics of respondents

The summarized feedback below is based on interviews and survey data collected from former and current grantees, WMF staff, grant committee members, and a WMF board member.

5 WMF staff from the Community Resources team
34 grantees
- IEG: 4
- PEG: 14
- APG: 10
- Simple APG: 4
- Unknown: 2
8 Grant committee members
- FDC members: 3
- GAC members: 3
- SAPG committee members: 2

Data

Topic	Comment
Reporting	Need to update the forms to make sure that the relevancy question is clearly answered – i.e. when are these relevant and when not
Reporting	The grant framework is not conducive to capturing longer-term outcomes; it is more conducive to capturing shorter-term outputs. Given the clear feedback about the need to simplify reporting, grant reports are not the right medium for these longer-term outcomes.
Simplification	This update simplifies collecting Global Metrics
Simplification	Will make comparisons more difficult
Simplification	Less information about understanding impact
Simplification	There is a loss of nuance in the proposed solutions
Simplification	Quantitative metrics create quantitative bias
General	Need to communicate how WMF is using this information more broadly

Structure

Topic	Comment	Frequency
Proposal 1	Endorsements for Proposal 1	6
Proposal 1	Other metrics will be included anyways; other metrics shouldn't be required	3
Proposal 1	Keeps the distinction between Global Metrics and other metrics	2
Proposal 1	Proposal 2 may be too much for smaller communities, unless the metrics are clearly defined and have identified tools	1
Proposal 2	Endorsements for Proposal 2	16
Proposal 2	Gives organizations flexibility/freedom to pick and choose which outcomes are most relevant to their work	7
Proposal 2	Need to start looking at those metrics beyond Global Metrics, given diversity of programs	1
Proposal 2	Good combination of cross-cutting metrics, uniform data-gathering, and other/local metrics that are diverse	2
Proposal 2	Good balance between consistency and flexibility	1
Proposal 2	Gives a chance to acknowledge things/outcomes a grantee thinks brought value	1
Proposal 2	The opportunity to see what others are measuring (i.e. sharing) could lead to the opportunity to collaborate / build together new measures	2
Proposal 2	Provides space for those measures of success that the grantee has already identified	2
Proposal 2	Provides space for those locally relevant challenges and achievements	3
Proposal 2	Going to collect other metrics whether it is required by WMF or not	1
Proposal 2	Would be difficult for the "extra" metrics to be cross program ones; these would be difficult to identify and collect	2
Proposal 2	Will not be able to aggregate these "extra" metrics	1
Proposal 2	"Other" metrics need to be consistent year to year	1
Proposal 2	Dependent on the breadth and rollout of the Metrics Library	1
General	Would be difficult (and additional burden) to report metrics by programme	1
General	Need to be clear in the update that a low number won't be held against a grantee	1
General	Need to emphasize the importance of other metrics and that “Global Metrics + Other metrics” is what tells the full story	1
General	Defining a broader set of metrics, with good definitions and tools would also be an alternate solution	1
General	Outreach events have very different metrics – press mentions, social media, people actively reaching out to partners	1
General	Having only 3 standard metrics is good as long as the combination of shared and other metrics is sufficient to "measure impact and enable people to learn from success/mistakes"	1
General	"Proposal 1 allows for homogeneity, which is a good for mapping outcomes, but Proposal 2 allows us to gain insights into challenges and conditions of grantees that will have a huge impact on the movement."	1
General	Need to test these metrics & structure and incrementally develop further over the years	1

Metrics

Topic	Category	Comment	Frequency
Individuals Involved	General	Endorsements for Individuals Involved (over Retention)	9
Individuals Involved	Positive	Metric reflects grantee goals, better than other metrics	1
Individuals Involved	Positive	Main metric for activity	3
Individuals Involved	Positive	Important metric for those events that only happen once a year (e.g. Art+Feminism, WLM)	2
Individuals Involved	Positive	Even though the definition of this metric is open and can easily change or be interpreted differently, it gives a good sense of reach	1
Individuals Involved	Positive	Able to share this externally, beyond the movement	1
Individuals Involved	Positive	Only way currently to capture offline activity	1
Individuals Involved	Positive	Easier to collect than retention	1
Individuals Involved	Positive	Individuals Involved is a useful metric	8
Individuals Involved	Concerns	Grantee anxiety when the number is small	1
Individuals Involved	Concerns	Unclear why WMF is interested in this metric / definition overall	1
Individuals Involved	Concerns	Easy to maximize this number	1
Individuals Involved	Concerns	Individuals Involved is not a useful metric	3
Individuals Involved	Concerns	Privacy concerns make tracking difficult	1
Individuals Involved	Concerns	Manual tracking is the only option	2
Individuals Involved	Concerns	Redundancy between different sign-in, sign-up mechanisms	1
Retention	General	Endorsements for Retention (over Individuals Involved)	4
Retention	Positive	1 edit threshold is good	1
Retention	Positive	Good to include any project	1
Retention	Positive	Good to include any namespace	1
Retention	Positive	"The definition is fine"	2
Retention	Positive	Good metric for longer-term outcomes	2
Retention	Positive	Retention is a useful metric	2
Retention	Concerns	Retention is not a goal for every activity	1
Retention	Concerns	Very online, editing focused	3
Retention	Concerns	Not a good fit for those one time contest participants, outreach	1
Retention	Concerns	Much more limited metric	1
Retention	Concerns	Is 1 edit meaningful? At what point does the number of edits become meaningful?	1
Retention	Concerns	A new editor retention metric will not capture activities focused on existing editor community	1
Retention	Concerns	Capturing both new and existing retention is important	1
Retention	Concerns	Selecting 30/90/12 months will not fit time periods like semesters well	1
Retention	Concerns	Retention is not sufficient to be the one metric	1
Retention	Definition	Existing editor retention is not useful	2
Retention	Definition	Should focus on new editor retention	2
Retention	Definition	Existing editor retention might be addressed through community building question	1
Retention	Definition	Need flexibility in the retention period	4
Retention	Definition	Grantee needs to define their retention period beforehand	1
Retention	Definition	Needs a great baseline, to contextualize "good" retention	3
Retention	Definition	Time periods might be set to fit the chapters reporting	1
Retention	Concerns	Short term retention isn't useful to track	1
Retention	Concerns	30 day retention isn't going to be useful to all grantees	1
Retention	Collection	Time intensive to collect and track	1
Retention	Collection	Need an automated system that tracks the retention of users	1
Retention	Collection	No system can capture the "retention" of volunteer organizers	1
Participation	General	Both Individuals Involved and Retention are necessary to see the full picture	1
Participation	General	Captures both output and outcomes	1
Participation	General	Allows for flexibility, given the different types of activities run	1
Content	Positive	Endorsements for the Content metric	13
Content	Positive	Disaggregation by Wikimedia project is good	7
Content	Positive	Information on content by project is useful	9
Content	Concerns	Doesn't capture everything related to content	1
Content	Concerns	Could be manipulated, e.g. by stubs	2
Content	Concerns	Number of "Pages improved" is not a good measure of quality	2
Content	Concerns	Need a measure of quality	1
Content	General	Make clear "project" is a Wikimedia project and not something else (e.g. WikiProject, or a funded project)	1
Content	General	Definition should have more detail – Wikipedia articles translated, Wikisource books proofread twice, Wikidata statements created	1
Content	General	Need to be clear what an "improvement" means – would vary by wiki project	1
Community Building	Collection	Survey community being served	3
Community Building	Collection	Measure engagement after an event	1
Community Building	Collection	Measure active editors	1
Community Building	Collection	Measure number of participants in community	1
Community Building	Collection	Answer a set of questions, to assess various dimensions	1
Community Building	Collection	Look for indicators, not direct causality	1
Community Building	Collection	Measure before and after the program, not a specific event, but the entire program	1
Community Building	Concerns	Cannot yet be systematically captured	1
Community Building	Concerns	Really hard to capture, particularly the qualitative side	5
Community Building	Concerns	Making the question specific would make it less applicable	1
Community Building	Concerns	Should not make it too resource intensive to collect and evaluate	1
Community Building	Concerns	Community Building might not be an outcome, but a prerequisite of the program	1
Community Building	Concerns	Proving causality will be difficult	2
Community Building	Concerns	Will be difficult to automate this data collection	1
Community Building	Concerns	Ambiguous definition	1
Community Building	General	Community Building is useful information	8
Community Building	General	Community Building is important information	15
General	General	Need to think about how the information will be collected in the field	1
General	General	Metrics are harder for things that are not timebound	1
General	General	Grantee needs to have a clear definition and be clear about why this metric is important to them	1
General	General	Grantee needs to demonstrate consistency between the goal and what they are measuring	1
General	General	WMF needs to address anxiety about low numbers	1
General	General	Global metrics show how programs are working around the globe; program-specific metrics are different	1
General	General	Should have metrics that are applicable from the smallest to largest grant; the comparison is still interesting	1

Resources

Topic	Comment	Frequency
Metrics Library	Endorsement for Metrics Library	13
Metrics Library	Suggestions for features for the Metrics Library	16
1:1 Support	Create a position to help program managers do program evaluation	1
Tutorials	More online training about using and understanding Wikimetrics could be something to think about.	1
Tutorials	An online tutorial/masterclass one per month would be a nice way to resolve doubts and questions.	1
Current resources	Learning patterns and Idea Lab insufficient to inspire experimentation and new program design. So the metrics library seems like a good addition.	1
Current resources	Maybe there is already sufficient guidance but it is difficult to find. Not all grantees (even "experienced" grantees) knew about it.	1