Grants:IdeaLab/Redesigning Global Metrics & its support/Outcome/da

The new page for grant metrics is now up on Meta.

Background

This project was undertaken jointly by the Community Resources and Learning & Evaluation teams at the Wikimedia Foundation, as both of these teams have played a central role in the creation, implementation and support of Global Metrics over the last two years.

The goal of updating Global Metrics was to create something that was responsive to the copious feedback and suggestions we received through the Global Metrics retrospective and this consultation.

Given the spectrum of feedback, we identified a set of upfront principles that would enable us to create ideas and make decisions that were both responsive and consistent. In the proposed set of changes, these were the set of design principles that were used to assess the strengths and weaknesses of each idea proposed. In deciding the final updates to Global Metrics by synthesizing and incorporating feedback, these were principles that we used to guide our decision making:

  • We would not introduce new metrics if there were no easy-to-use tools available. Agreed upon new metrics would only be introduced once tools were available.
  • We would identify places where we could iterate, understanding that not everything can be done now, but it’s better to do it well even if it’s done slowly.
  • We would think holistically about metrics, acknowledging that grant metrics, organizational metrics, and program metrics might overlap, but aren’t necessarily the same thing. All of these metrics need to complement, not supersede, each other.

Executive Summary

  • The Metrics Library received strong support from the grantees and grant committee members who gave feedback. The creation of this centralized resource will become one of the core projects for Amanda Bittaker from Learning & Evaluation in the 2016–2017 fiscal year.
  • There was no consensus from respondents on whether Wikimetrics should become the single tool to calculate Global Metrics. Given Wikimetrics has significant front-end and back-end problems, a deeper project will be undertaken by Sati Houston from Community Resources in the 2016–17 fiscal year to investigate whether a new tool or improving an existing tool is the best solution to the issues around data collection.
  • There was strong support from respondents to simplify the requirement of Global Metrics, particularly to change the structure to “Proposal 2: 3 shared measures + 2 grantee-defined measures”.
  • For the four metrics proposed, respondents provided significant feedback on the importance, usefulness, and definition of each metric, as well as how easy or difficult it would be to collect the information. This feedback was integrated into the final set of metrics, and potential mitigations were identified for the risks and concerns raised. For additional information see the sections on #Participation, #Content, and #Community Building.
    • All respondents indicated Community Building was an important and useful outcome to capture, but had hugely divergent responses to the question, “What is community building?” Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear if surveys, storytelling, or some other solution was the right fit for capturing this information. As such, Community Building will not be included as a shared measure at this time, but we hope to learn more about successes and challenges in this area throughout the upcoming year.


Grant metrics instead of Global Metrics

Global Metrics will be replaced with a simplified requirement for identifying, collecting and reporting grant metrics, as follows:

Three shared metrics

  1. Total participants. This is the number of people who attend events or activities, either in person or virtually. It does not include many others who might be “involved” in events or activities, such as organizers or outreach channels (e.g. social media followers).
    • If outcomes related to volunteer engagement (e.g. the number of volunteer organizers) or raising awareness (e.g. social media followers) remain important to your grant activities, they can be included as grantee-defined metrics.
  2. Number of newly registered users. This is the current global metric, which is collected through the Magic Button and Wikimetrics. The definition remains the same: the number of newly registered users as a result of the project, using a two-week window for specific events if needed.
  3. Content pages created or improved, aggregated across Wikimedia projects. This is the current global metric, which is collected through the Magic Button. The definition has been slightly re-worded to make it clear that all Wikimedia projects are included.
Grantees and grant committee members interviewed actually preferred two metrics different from the second and third shared metrics above:
  • Newly registered users, as well as their retention over 1, 3, 6, or 12 months (called “New User Retention”)
  • Content pages created or improved, disaggregated by Wikimedia project
However, neither the New User Retention metric nor the disaggregated content metric has easy-to-use tools readily available today. When these tool issues are resolved, the shared metrics will be updated.

+ Two grantee-defined measures

These measures (whether quantitative or qualitative) are a space for the grantee to highlight outcomes that go beyond those captured by the shared metrics. These could be the outcome of a single program, or a set of programs; there is no preset list to choose from when selecting these measures.

Timeline

This grant metric requirement will only apply to the Project Grant and Annual Plan Grant programs. Rapid Grants and Conference & Travel Grants will retain their current program-specific requirements. No ongoing grants will be required to change to the new structure; they will remain with the metrics, objectives and targets identified in their proposals. These new grant metrics will become a requirement for Project and Annual Plan Grants on the following timelines:

  • Project Grants – Grants approved in or after September 2016
  • Simple Annual Plan Grants – Grants submitted for funding starting January 1, 2017 (i.e. grant proposals submitted by November 1)
  • Full Annual Plan Grants – Grants submitted in Round 1 in October 2016

How does this new requirement work in Project Grants?

Currently, each Project Grant has two tables of metrics: Global Metrics and Project Metrics. Under this new grant metrics requirement, there will be one table, with three rows for the three shared metrics, and two rows for the grantee-defined metrics. Grantees can remove any of the three shared metrics that aren’t relevant to their goals. They may also add more grantee-defined metrics if they have more than two.
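As a rough illustration only, the single table described above could be modelled like this. The function name, field names and example grantee-defined metrics are hypothetical (the latter borrowed from the examples given under "Total participants"); the actual proposal and report templates may look quite different.

```python
# The three shared metrics named in this document.
SHARED_METRICS = (
    "Total participants",
    "Number of newly registered users",
    "Content pages created or improved",
)


def build_metrics_table(grantee_defined, not_relevant=()):
    """Build one table: relevant shared metrics first, then grantee-defined ones.

    Hypothetical helper. It assumes at least two grantee-defined metrics
    are expected, and that shared metrics may be dropped when not relevant.
    """
    if len(grantee_defined) < 2:
        raise ValueError("at least two grantee-defined metrics are expected")
    rows = [
        {"metric": name, "type": "shared", "target": None}
        for name in SHARED_METRICS
        if name not in not_relevant
    ]
    rows += [
        {"metric": name, "type": "grantee-defined", "target": None}
        for name in grantee_defined
    ]
    return rows


# Example: a grant where new-user registration is not a goal.
table = build_metrics_table(
    grantee_defined=["Number of volunteer organizers", "Social media followers"],
    not_relevant=["Number of newly registered users"],
)
for row in table:
    print(f'{row["type"]:<16} {row["metric"]}')
```

Here the grantee drops one shared metric that is not relevant to their goals and adds two of their own, so the resulting table has four rows instead of the default five.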

How does this new requirement work in Simple Annual Plan Grants?

Currently, grantees enter their Global Metrics targets and achievements into a common spreadsheet. For new Simple Annual Plan Grants, the metrics in this spreadsheet will be updated to match the new shared metrics, and each grantee will identify two others that are relevant to their grant.

How does this new requirement work in Full Annual Plan Grants?

Currently both the proposal and report forms have a table to capture Global Metrics targets and achievements (e.g. the “Global metrics overview – all programs” section in the progress report); program metrics are reported in the relevant program section. Under this new grant metrics requirement, there will be one table at the beginning of the proposal and report forms, with three rows for the three shared metrics, and two rows for the grantee-defined metrics. Grantees can remove any of the three shared metrics that aren’t relevant to their goals. They may also add more grantee-defined metrics, if they have more than two. Other program-specific metrics remain in the relevant program section.

Next steps

  • Update Collecting Global Metrics learning pattern – August 12th
  • Update grant proposal and reporting templates – Will vary by grant program, according to the dates the grant rounds start.
  • Begin scoping the Metrics Library project – August 15th
  • Begin scoping the tool for shared metrics project – September 5th

Details on community feedback and rationale behind changes

Structure

Overall, most grantees and grant committee members interviewed preferred “Proposal 2: 3 shared measures + 2 grantee-defined measures”. This structure integrates Global Metrics and other metrics into a single set of grant metrics. Those who favored this structure indicated that it adequately balanced the need for consistency by having a set of shared metrics, and also the need for flexibility by allowing grantees to highlight their specific outcomes and achievements. However, the overall success of this proposed new structure depends heavily on the Metrics Library, as the library will be the main mechanism for grantees to discover new metrics, online resources, and potential tools.

There were three primary concerns raised about shifting to a structure that integrates Global Metrics and other grant metrics. In an effort to address these concerns, we have identified a few potential mitigations where applicable:

  1. This new structure will diminish the distinction between those metrics required by WMF, and those identified by the grantee.
    • It is true that this structure will integrate the metrics required by WMF and those other metrics identified by the grantee. However, the respondents who supported Proposal 2 highlighted this as a strength of the structure, as all of these metrics together represent the outcomes of a grant.
  2. Requiring at least two grantee-defined metrics might increase the burden on smaller grants and grantees.
    • Mitigation: The Rapid Grant program is designed to reduce the grant approval time and the overall burden of reporting, and will not follow this same structure. It will instead maintain its current lighter-weight requirements.
  3. Given these grantee-defined metrics will vary by grant, there may not be adequate support (e.g. tools, online resources) available to the grantee.
    • Mitigation: The Learning & Evaluation team has spent the last three years documenting common program metrics through program toolkits and learning patterns, which include some documentation on available tools. These resources will be integrated into the Metrics Library before its launch, to help with identifying, calculating, and reporting “other grant metrics”. However, a longer-term solution is still needed to ensure that other grantees can contribute their own metrics to the Metrics Library (as well as the associated tools and resources), and to ensure that common metrics are easily identified.

Based on this decision to bring Global Metrics and other metrics into a single set of “grant metrics”, updates will be made to the proposal and reporting templates for Project and Annual Plan Grants to facilitate the integration. Template changes will also aim to make clear that these three shared metrics should only be required when relevant to the goals of the grant.

Measures

Participation

Of the two participation metrics proposed (Individuals Involved and Editors Retained), respondents indicated that both metrics were useful and important, but covered different outcomes and were applicable in different situations. While there was support for having only Individuals Involved as the primary participation metric, there was equal support for allowing grantees to choose between Individuals Involved and Retention, as applicable to their grant (voting data).

Respondents also indicated that the definition of each proposed metric needed to be refined, with the following being the primary changes suggested to the definition:

For Individuals Involved, the definition requires more specificity. While there are many groups of people “involved” in a grant, there are three main groups that are typically included in this metric:
  • Participants – these are people who attend events, either in person or virtually. For instance, people whose usernames are collected during the program are participants.
  • Organizers – these are the volunteers or staff members who are responsible for organizing the event.
  • Large audiences that are the target of outreach activities – these are the large groups of people who are reached (primarily) through mass communication, where the goal is to raise awareness about the Wikimedia movement, a specific project or event, etc. For example these are newsletter recipients, social media followers, mailing list members, those who visit a Wikimedia booth at a fair, etc.
While respondents indicated that the distinction between these groups isn’t perfect or comprehensive (e.g., Where would donors or partner organizations be captured? What about participants who become organizers?), the disaggregation of these three groups would improve the specificity of the definition, and be more useful overall. Two respondents went further to add that the number of volunteer organizers is really a measure of “volunteer engagement” and the size of those large outreach audiences is really a measure of “raising awareness”; while both are related to participation generally, they actually measure different outcomes than “total participants”.
For Editors Retained, more respondents indicated the metric should instead focus on New Users and New User Retention. Respondents supported expanding the definition of retention to 1 edit in any namespace in any Wikimedia project, but indicated that:
  1. To measure retention, it’s important to know the total number of new users.
  2. To measure retention well, it must be tracked over multiple time periods (e.g. 1, 3, 6, or 12 months).
  3. To know what qualifies as “good” retention, there need to be baselines available.
Respondents indicated that while the engagement of the existing contributors is critical for many activities, the outcomes of those activities are better captured qualitatively – focusing on feelings of motivation, connectedness to their community, etc. – rather than through a numeric retention metric.

However, while all of these participation metrics have issues around data collection, New User Retention has significant issues. To our knowledge, no tool exists with sufficient functionality to capture retention over multiple time periods, with the suggested definition. Moreover, Single User Login was introduced after the original Global Metrics were defined, and it will likely be necessary to change the way “New User” is technically defined in each tool (e.g. distinguishing between “those who are creating a username for the first time” and “those who are existing contributors, but new to a specific language project”), as well as how “New User Retention” is technically defined. While there are many retention-focused tools – both created by WMF and by volunteers – each would need small to large improvements. (This is based on tools that we know of; if we have missed a tool, please let us know!) As such, the current “Newly registered user” metric should remain until tools can be updated to support new definitions and functionality.
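To make the suggested New User Retention definition more concrete, here is a minimal sketch of how it might be computed. It is not an existing tool. The input data, the 30-day approximation of a month, and the rule that a user counts as retained at N months if they make at least 1 edit N months or more after registering are all assumptions made for illustration; the source only specifies "1 edit in any namespace in any Wikimedia project" and the 1, 3, 6 and 12 month periods.

```python
from datetime import datetime, timedelta

# Hypothetical input: registration time and edit timestamps per new user.
# Edits from any namespace and any Wikimedia project are pooled together,
# following the expanded definition respondents supported.
new_users = {
    "UserA": {
        "registered": datetime(2016, 9, 1),
        "edits": [datetime(2016, 9, 1), datetime(2016, 10, 15), datetime(2017, 4, 2)],
    },
    "UserB": {
        "registered": datetime(2016, 9, 1),
        "edits": [datetime(2016, 9, 2)],
    },
    "UserC": {
        "registered": datetime(2016, 9, 3),
        "edits": [datetime(2016, 12, 20)],
    },
}

PERIODS_IN_MONTHS = (1, 3, 6, 12)
DAYS_PER_MONTH = 30  # assumption: a "month" is approximated as 30 days


def is_retained(user, months):
    """Assumed rule: at least 1 edit made `months` or more after registration."""
    cutoff = user["registered"] + timedelta(days=months * DAYS_PER_MONTH)
    return any(edit >= cutoff for edit in user["edits"])


def retention_by_period(users):
    """Return {months: (retained_count, total_new_users)} for each period."""
    total = len(users)
    return {
        months: (sum(is_retained(u, months) for u in users.values()), total)
        for months in PERIODS_IN_MONTHS
    }


retention = retention_by_period(new_users)
for months, (retained, total) in retention.items():
    print(f"{months:>2} months: {retained} of {total} new users retained")
```

The total number of new users is reported alongside each retained count, since respondents noted that retention cannot be interpreted without it. Baselines for what counts as "good" retention are outside the scope of this sketch.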

Based on all of this feedback, the new participation metrics will be as follows:

  1. Total participants. This is the number of people who attend events, either in person or virtually. As suggested, the updated learning pattern will include examples of who should and shouldn’t be included.
    • Engaging volunteers and raising awareness remain important goals and outcomes to capture. The “grantee-defined” metrics represent an opportunity for grantees to highlight these achievements in ways that are most relevant to them, rather than being forced to report them in the ill-fitting “participants” metric.
  2. Number of newly registered users. This is the current global metric, and the definition remains the same: the number of newly registered users as a result of the project, using a two-week window for specific events if needed.

Content

Most respondents strongly endorsed the proposed content metric: Content pages created or improved, disaggregated by Wikimedia project.

This new definition resolves some of the issues identified in the retrospective, particularly that the current “pages new or improved” metric was interpreted as “Wikipedia articles new or improved”, or as aggregated pages across Wikimedia projects (e.g. Wikidata items, Wiktionary entries and Wikisource pages). Respondents did not have many suggested improvements to the definition but they did have three concerns. In an effort to address these concerns, we have again identified a few potential mitigations where applicable:

  1. The quality of content is not addressed.
    • Given quality is currently primarily assessed through community processes (e.g. Featured Article process on English Wikipedia, Featured Image process on Commons), or specific processes & rubrics of an event or contest, it is highly contextualized and would be difficult to define in a centralized way. While the adoption of automated tools might be an option in the future, for now it isn’t feasible to include as a shared metric.
  2. This metric might be easily “gamed” or manipulated, e.g. through creating stub articles or making small edits to as many pages as possible.
    • While this is a valid concern, it is unlikely that the grant system will ever have sufficient oversight to ensure gaming doesn't happen.
  3. The definition of an “improvement” differs by Wikimedia project.
    • Potential mitigation: More detailed examples of improvements could be added to the metric documentation in the Metrics Library.

While this disaggregated content metric provides more specificity in content added or improved across the various Wikimedia projects, there are currently no tools available to collect this detailed information (without having some technical expertise in languages such as SQL). Wikimetrics can report “content pages added, disaggregated by Wikimedia project” and “aggregated content pages created and improved”, but not “content pages improved, disaggregated by Wikimedia project”. (Again, the assessment that no tools currently exist is based on tools that we know of; if we have missed a tool, please let us know!)

Given this tool limitation, this metric cannot become a shared grant metric yet. As such, the current “pages created or improved” metric will remain, with slight improvements to the definition. Once the tool issues are addressed, the updated metric (with the disaggregation by Wikimedia project) will replace the current one.

Content pages created or improved, aggregated across Wikimedia projects

This is the current Global Metric, calculated by the Magic Button, where a “content page” is defined as an article on Wikipedia, an entry on Wiktionary, a file on Commons, an item on Wikidata, a page on Wikisource, or similar units of content on other Wikimedia projects.

  • Grantees may disaggregate the aggregated number by Wikimedia project if desired, but it is not required. The time and effort to include these disaggregated numbers is likely too much for most grantees.
  • Example: A grantee holds an editathon with the goal of creating or improving content across Wikimedia projects. Participants at this editathon create 4 Wikipedia articles, improve 12 Wikipedia articles, upload 4 images to Commons, and add properties to 18 items on Wikidata.
    • The grantee would report 38 content pages new or improved across Wikimedia projects.
    • They could also, but are not required to, report 16 articles created or improved on Wikipedia, 4 media files uploaded or improved on Commons, and 18 items created or improved on Wikidata.
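The arithmetic in this example can be written out as a short sketch. The variable names are illustrative only and do not reflect how the Magic Button or Wikimetrics calculates the metric.

```python
# Counts taken from the editathon example above.
created_or_improved = {
    "Wikipedia": 4 + 12,  # 4 articles created + 12 articles improved
    "Commons": 4,         # 4 images uploaded
    "Wikidata": 18,       # 18 items with properties added
}

# Required shared metric: one aggregate number across Wikimedia projects.
total_content_pages = sum(created_or_improved.values())
print(f"Content pages created or improved: {total_content_pages}")

# Optional: the same number disaggregated by Wikimedia project.
for project, count in created_or_improved.items():
    print(f"  {project}: {count}")
```

The aggregate is simply the sum of the per-project counts (16 + 4 + 18 = 38), so a grantee who already tracks counts per project gets the shared metric at no extra cost.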

Community Building

Respondents strongly indicated Community Building was an important and useful outcome to capture, but had hugely divergent responses to the question, “What is community building?” Moreover, respondents indicated that collecting information on Community Building would be difficult; it was unclear if surveys, storytelling, or some other solution was the right fit for capturing this information.

As such, Community Building will not be included as a shared measure. While important, it needs more investigation and experimentation to clarify its definition, how the information would be collected, and how the information would be used.

However, we will continue to investigate the work that has already been done or is currently being done around Community Building, including currently used definitions. Depending on interest of conference organizers, we could present these findings at a movement conference in 2017.

Appendix

Demographics of respondents

The summarized feedback below is based on interviews and survey data collected from former and current grantees, WMF staff, grant committee members, and a WMF board member.

  • 5 WMF staff from the Community Resources team
  • 34 grantees
    • IEG: 4
    • PEG: 14
    • APG: 10
    • Simple APG: 4
    • Unknown: 2
  • 8 Grant committee members
    • FDC members: 3
    • GAC members: 3
    • SAPG committee members: 2

Data

Topic | Comment
Reporting | Need to update the forms to make sure that the relevancy question is clearly answered – i.e. when are these relevant and when not
Reporting | The grant framework is not conducive to capturing longer-term outcomes; it's more conducive to capturing the shorter-term outputs; given the clear feedback Grant got in the need to simplify reporting, grant reports are not the right medium to report these longer term outcomes.
Simplification | This update simplifies collecting Global Metrics
Simplification | Will make comparisons more difficult
Simplification | Less information about understanding impact
Simplification | There is a loss of nuance in the proposed solutions
Simplification | Quantitative metrics create quantitative bias
General | Need to communicate how WMF is using this information more broadly

Structure

Topic | Comment | Frequency
Proposal 1 | Endorsements for Proposal 1 | 6
Proposal 1 | Other metrics will be included anyways; other metrics shouldn't be required | 3
Proposal 1 | Keeps the distinction between Global Metrics and other metrics | 2
Proposal 1 | Proposal 2 may be too much for smaller communities, unless the metrics are clearly defined and have identified tools | 1
Proposal 2 | Endorsements for Proposal 2 | 16
Proposal 2 | Gives organizations flexibility/freedom to pick and choose which outcomes are most relevant to their work | 7
Proposal 2 | Need to start looking at those metrics beyond Global Metrics, given diversity of programs | 1
Proposal 2 | Good combination of cross-cutting metrics, uniform data-gathering, and other/local metrics that are diverse | 2
Proposal 2 | Good balance between consistency and flexibility | 1
Proposal 2 | Gives a chance to acknowledge things/outcomes a grantee thinks brought value | 1
Proposal 2 | The opportunity to see what others are measuring (i.e. sharing) could lead to the opportunity to collaborate / build together new measures | 2
Proposal 2 | Provides space for those measures of success that the grantee has already identified | 2
Proposal 2 | Provides space for those locally relevant challenges and achievements | 3
Proposal 2 | Going to collect other metrics whether it is required by WMF or not | 1
Proposal 2 | Would be difficult for the "extra" metrics to be cross program ones; these would be difficult to identify and collect | 2
Proposal 2 | Will not be able to aggregate these "extra" metrics | 1
Proposal 2 | "Other" metrics need to be consistent year to year | 1
Proposal 2 | Dependent on the breadth and rollout of the Metrics Library | 1
General | Would be difficult (and additional burden) to report metrics by programme | 1
General | Need to be clear in the update that a low number won't be held against a grantee | 1
General | Need to emphasize the importance of other metrics and that “Global Metrics + Other metrics” is what tells the full story | 1
General | Defining a broader set of metrics, with good definitions and tools would also be an alternate solution | 1
General | Outreach events have very different metrics – press mentions, social media, people actively reaching out to partners | 1
General | Having only 3 standard metrics is good as long as the combination of shared and other metrics is sufficient to "measure impact and enable people to learn from success/mistakes" | 1
General | "Proposal 1 allows for homogeneity, which is a good for mapping outcomes, but Proposal 2 allows us to gain insights into challenges and conditions of grantees that will have a huge impact on the movement." | 1
General | Need to test these metrics & structure and incrementally develop further over the years | 1

Metrics

Topic | Affect | Comment | Frequency
Individuals Involved | General | Endorsements for Individuals Involved (over Retention) | 9
Individuals Involved | Positive | Metric reflects grantee goals, better than other metrics | 1
Individuals Involved | Positive | Main metric for activity | 3
Individuals Involved | Positive | Important metric for those events that only happen once a year (e.g. Art+Feminism, WLM) | 2
Individuals Involved | Positive | Even though the definition of this metric is open and can easily change or be interpreted differently, it gives a good sense of reach | 1
Individuals Involved | Positive | Able to share this externally, beyond the movement | 1
Individuals Involved | Positive | Only way currently to capture offline activity | 1
Individuals Involved | Positive | Easier to collect than retention | 1
Individuals Involved | Positive | Individuals Involved is a useful metric | 8
Individuals Involved | Concerns | Grantee anxiety when the number is small | 1
Individuals Involved | Concerns | Unclear why WMF is interested in this metric / definition overall | 1
Individuals Involved | Concerns | Easy to maximize this number | 1
Individuals Involved | Concerns | Individuals Involved is not a useful metric | 3
Individuals Involved | Concerns | Privacy concerns make tracking difficult | 1
Individuals Involved | Concerns | Manual tracking is the only option | 2
Individuals Involved | Concerns | Redundancy between different sign-in, sign-up mechanisms | 1
Retention | General | Endorsements for Retention (over Individuals Involved) | 4
Retention | Positive | 1 edit threshold is good | 1
Retention | Positive | Good to include any project | 1
Retention | Positive | Good to include any namespace | 1
Retention | Positive | “The definition is fine” | 2
Retention | Positive | Good metric for longer-term outcomes | 2
Retention | Positive | Retention is a useful metric | 2
Retention | Concerns | Retention is not a goal for every activity | 1
Retention | Concerns | Very online, editing focused | 3
Retention | Concerns | Not a good fit for those one time contest participants, outreach | 1
Retention | Concerns | Much more limited metric | 1
Retention | Concerns | Is 1 edit meaningful? At what point does the number of edits become meaningful? | 1
Retention | Concerns | A new editor retention metric will not capture activities focused on existing editor community | 1
Retention | Concerns | Capturing both new and existing retention is important | 1
Retention | Concerns | Selecting 30/90/12 months will not fit time periods like semesters well | 1
Retention | Concerns | Retention is not sufficient to be the one metric | 1
Retention | Definition | Existing editor retention is not useful | 2
Retention | Definition | Should focus on new editor retention | 2
Retention | Definition | Existing editor retention might be addressed through community building question | 1
Retention | Definition | Need flexibility in the retention period | 4
Retention | Definition | Grantee needs to define their retention period beforehand | 1
Retention | Definition | Needs a great baseline, to contextualize "good" retention | 3
Retention | Definition | Time periods might be set to fit the chapters reporting | 1
Retention | Concerns | Short term retention isn't useful to track | 1
Retention | Concerns | 30 day retention isn't going to be useful to all grantees | 1
Retention | Collection | Time intensive to collect and track | 1
Retention | Collection | Need an automated system that tracks the retention of users | 1
Retention | Collection | No system can capture the "retention" of volunteer organizers | 1
Participation | General | Both Individuals Involved and Retention are necessary to see the full picture | 1
Participation | General | Captures both output and outcomes | 1
Participation | General | Allows for flexibility, given the different types of activities run | 1
Content | Positive | Endorsements for the Content metric | 13
Content | Positive | Disaggregation by Wikimedia project is good | 7
Content | Positive | Information on content by project is useful | 9
Content | Concerns | Doesn't capture everything related to content | 1
Content | Concerns | Could be manipulated, e.g. by stubs | 2
Content | Concerns | Number of "Pages improved" is not a good measure of quality | 2
Content | Concerns | Need a measure of quality | 1
Content | General | Make clear "project" is a Wikimedia project and not something else (e.g. WikiProject, or a funded project) | 1
Content | General | Definition should have more detail – Wikipedia articles translated, Wikisource books proofread twice, Wikidata statements created | 1
Content | General | Need to be clear what an "improvement" means – would vary by wiki project | 1
Community Building | Collection | Survey community being served | 3
Community Building | Collection | Measure engagement after an event | 1
Community Building | Collection | Measure active editors | 1
Community Building | Collection | Measure number of participants in community | 1
Community Building | Collection | Answer a set of questions, to assess various dimensions | 1
Community Building | Collection | Look for indicators, not direct causality | 1
Community Building | Collection | Measure before and after the program, not a specific event, but the entire program | 1
Community Building | Concerns | Cannot yet be systematically captured | 1
Community Building | Concerns | Really hard to capture, particularly the qualitative side | 5
Community Building | Concerns | Making the question specific would make it less applicable | 1
Community Building | Concerns | Should not make it too resource intensive to collect and evaluate | 1
Community Building | Concerns | Community Building might not be an outcome, but a prerequisite of the program | 1
Community Building | Concerns | Proving causality will be difficult | 2
Community Building | Concerns | Will be difficult to automate this data collection | 1
Community Building | Concerns | Ambiguous definition | 1
Community Building | General | Community Building is useful information | 8
Community Building | General | Community Building is important information | 15
General | General | Need to think about how the information will be collected in the field | 1
General | General | Metrics are harder for things that are not timebound | 1
General | General | Grantee needs to have a clear definition and be clear about why this metric is important to them | 1
General | General | Grantee needs to demonstrate consistency between the goal and what they are measuring | 1
General | General | WMF needs to address anxiety about low numbers | 1
General | General | Global metrics show how programs are working around the globe; program specific metrics are different | 1
General | General | Should have metrics that are applicable from the smallest to largest grant; the comparison is still interesting | 1

Resources

Topic | Comment | Frequency
Metrics Library | Endorsement for Metrics Library | 13
Metrics Library | Suggestions for features for the Metrics Library | 16
1:1 Support | Create a position to help program managers do program evaluation | 1
Tutorials | More online training about using and understanding Wikimetrics could be something to think about. | 1
Tutorials | An online tutorial/masterclass one per month would be a nice way to resolve doubts and questions. | 1
Current resources | Learning patterns and Idea Lab insufficient to inspire experimentation and new program design. So the metrics library seems like a good addition. | 1
Current resources | Maybe there is already sufficient guidance but it is difficult to find. Not all grantees (even "experienced" grantees) knew about it. | 1