User:NaRay (WMF)/Sandbox/EquityLandscape/Pilot/Design Considerations/


Here we go a little deeper into the design considerations. Please respond to those that are relevant to you as a potential data partner or user.

Consideration 1. The Nomenclature

Do any of the proposed metric domains or facet labels present potential challenges to you or your group’s planned projects and partnerships?

Voting on the title of the equity data landscape is open in All Our Ideas
You can review the metrics schema table for metric names and comment with any challenges

Answer on the talk page

Consideration 2. Metadata frame

To what extent do the metadata classification systems enable or inhibit potential cross-use with datasets you or your team use to understand Wikimedia communities?

We plan to catalog the following metadata for cross-referencing across datasets:

UN Country labels and codes
UN Continent & Subcontinent classifications
IBAN 2- and 3-letter country codes (ISO 3166-1 alpha-2 and alpha-3)
MaxMind country label
Wikimedia organizing hub alignments (i.e., CEE, ESEAP, Iberocoop, WikiIndaba, North America (& US Coalition), Northern Europe, WikiArabia, WikiFranca, South Asia, Chairperson's Group, Chapter EDs)
Official language listings and ISO 639-1, 639-2, and 639-3 language codes as available
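
As a rough illustration of how such a catalog could support cross-referencing, the sketch below stores one record per country and builds lookup tables keyed by different codes. The class name `GeoRecord`, its field names, and the helper `build_index` are hypothetical and not part of the proposed schema; the hub assignment shown is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoRecord:
    """One row of a hypothetical country-level metadata crosswalk."""
    un_label: str         # UN country label
    un_m49: str           # UN numeric country code
    continent: str        # UN continent classification
    subcontinent: str     # UN subcontinent classification
    alpha2: str           # 2-letter country code
    alpha3: str           # 3-letter country code
    maxmind_label: str    # country label as used by MaxMind
    hubs: tuple           # organizing hub alignments (illustrative)
    language_codes: tuple # ISO 639-3 codes of official languages


def build_index(records, key):
    """Map the chosen code field to its record for fast joins."""
    return {getattr(record, key): record for record in records}


records = [
    GeoRecord("France", "250", "Europe", "Western Europe",
              "FR", "FRA", "France", ("WikiFranca",), ("fra",)),
]

by_alpha2 = build_index(records, "alpha2")
by_m49 = build_index(records, "un_m49")
print(by_alpha2["FR"].alpha3, by_m49["250"].subcontinent)
```

A dataset keyed by any one of these codes can then be joined to another dataset keyed by a different code through the shared record.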

Answer on the talk page

Consideration 3. Referring to (In)Equity

Does referring to equity or inequity resonate more with recent patterns of use you have seen in research and evaluation of inequity in your operating space?

Referring to equity vs. inequity: some commenters have suggested that we focus on improving equity rather than reducing inequity, and that we label our aims and all coefficients positively in this way.

Answer on the talk page

Consideration 4. Measuring Diversity & Inequity

Kolm categorized measurements of inequality as “rightist,” “centrist,” and “leftist,” depending on whether they treat inequality as a relative or an absolute concept. Leftist measures respond only to absolute differences: they do not change when all incomes go up by the same absolute amount. An example of this type of measure is the absolute Gini, which is the standard Gini coefficient multiplied by the mean of the distribution. Centrist measures, as Kolm defines them, show increased inequality when average incomes rise while the relative distribution stays the same, and decreased inequality when all incomes rise by the same absolute amount. Rightist measures are purely relative: they are unchanged when all incomes go up in the same proportion.
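
The difference between the relative and absolute views can be checked numerically. The following is a minimal sketch, not the project's implementation: the function names and the sample incomes are made up for illustration, and the Gini is computed from the mean absolute difference between all pairs.

```python
def gini(values):
    """Relative (rightist) Gini: mean absolute difference / (2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    mean_abs_diff = sum(abs(a - b) for a in values for b in values) / (n * n)
    return mean_abs_diff / (2 * mean)


def absolute_gini(values):
    """Absolute (leftist) Gini: the relative Gini multiplied by the mean."""
    return gini(values) * (sum(values) / len(values))


incomes = [10.0, 20.0, 30.0, 40.0]      # made-up sample
doubled = [2 * x for x in incomes]      # same proportional increase for all
shifted = [x + 10 for x in incomes]     # same absolute increase for all

for label, data in [("base", incomes), ("doubled", doubled), ("shifted", shifted)]:
    print(label, round(gini(data), 4), round(absolute_gini(data), 4))
```

Doubling every income leaves the relative Gini unchanged but raises the absolute Gini, while adding the same amount to every income leaves the absolute Gini unchanged and lowers the relative Gini.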

Percentile ranking relies on a relative view of each group as opposed to an absolute view of capacity. When it comes to measuring change over time, we will rely on measuring the distributions of the input measures. For this we have several options:

For (In)Equity:

Gini coefficient (Rightist - Relative measure)
Absolute Gini (Leftist - Absolute measure)
Intermediate Gini (Centrist measure - Combination of Relative and Absolute measure)
[Inequality Index] (Leftist - Absolute measure)
Hoover index (Leftist - Absolute measure)
[[1]] & [Entropy] (Rightist - Relative measure)
20/20 ratio (Centrist - Relative measure)
Palma ratio (Centrist - Relative measure)
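
For readers less familiar with the share-based options, here is a small sketch of the Hoover index, the 20/20 ratio, and the Palma ratio using their common textbook definitions. The function names and sample values are hypothetical, and group sizes are obtained by simple rounding, which is an assumption that only works cleanly when the sample size divides evenly.

```python
def hoover(values):
    """Share of the total that would need to be redistributed for equality."""
    total = sum(values)
    mean = total / len(values)
    return sum(abs(v - mean) for v in values) / (2 * total)


def share_ratio(values, top_frac, bottom_frac):
    """Total held by the top fraction divided by total held by the bottom fraction."""
    ordered = sorted(values)
    n = len(ordered)
    k_top = round(n * top_frac)
    k_bottom = round(n * bottom_frac)
    if k_top == 0 or k_bottom == 0:
        raise ValueError("sample too small for the requested fractions")
    return sum(ordered[n - k_top:]) / sum(ordered[:k_bottom])


def ratio_20_20(values):
    """Top 20% versus bottom 20%."""
    return share_ratio(values, 0.2, 0.2)


def palma(values):
    """Top 10% versus bottom 40%."""
    return share_ratio(values, 0.1, 0.4)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 55]  # made-up values
print(hoover(sample), ratio_20_20(sample), palma(sample))
```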

For Diversity & Dissimilarity (see also):

Simpson Index
Shannon's H
Richness R
Pielou’s Evenness E
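
The diversity measures listed above can be sketched from a list of category counts as follows. This uses one common form of each index (Simpson as the sum of squared proportions, Shannon with natural logarithms); other variants exist, and the function name and sample counts are illustrative assumptions.

```python
import math


def diversity(counts):
    """Compute richness, Simpson, Shannon's H, and Pielou's evenness."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    props = [c / total for c in present]
    richness = len(present)                        # R: number of categories present
    simpson = sum(p * p for p in props)            # chance two random draws match
    shannon = -sum(p * math.log(p) for p in props) # H, natural log
    # Evenness is undefined when only one category is present.
    evenness = shannon / math.log(richness) if richness > 1 else None
    return {"R": richness, "simpson": simpson, "H": shannon, "E": evenness}


even = diversity([25, 25, 25, 25])   # made-up counts, perfectly even
skewed = diversity([97, 1, 1, 1])    # made-up counts, one dominant category
print(even)
print(skewed)
```

With the same richness, the skewed sample has a higher Simpson value and lower H and E than the even sample.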

If you are familiar enough with any of the above, please share your thoughts on these options via direct comment.

Answer on the talk page

Consideration 5. Scaling and Weighting

For scaling and weighting we have identified some key options and are leaning toward the options noted below. Please share what, if any, concerns or alternative suggestions you may have.

Scaling: For relative comparisons and roll-up ranking we need to triangulate across more than one input data point to estimate a more generalized global ranking. This means scaling for relative comparison and combining for relative presence and growth.

We considered raw ordered ranks, ordered ranks/N, Z-scores, and ratios. With the exception of some background calculations that apply ratios to determine limits to room for growth, we are currently applying percentile ranks (0 to 100) across all inputs for triangulation and output metrics.
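
A minimal sketch of this kind of scaling follows. The exact percentile-rank formula and the way inputs are combined are not specified here, so the tie handling (ties share the average rank), the 0-to-100 endpoints, and the simple mean across inputs are all assumptions, as are the function names.

```python
def percentile_ranks(values):
    """Scale values to 0-100 by rank; tied values share the average rank."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to rank")
    ranks = []
    for v in values:
        below = sum(1 for w in values if w < v)
        ties = sum(1 for w in values if w == v) - 1  # other values equal to v
        ranks.append(100.0 * (below + 0.5 * ties) / (n - 1))
    return ranks


def combined_score(*inputs):
    """Average the percentile ranks of several inputs, position by position."""
    ranked = [percentile_ranks(column) for column in inputs]
    return [sum(vals) / len(vals) for vals in zip(*ranked)]


editors = [120, 15, 3400, 15]        # made-up input measure
readers = [9000, 400, 250000, 1200]  # made-up input measure
print(percentile_ranks(editors))
print(combined_score(editors, readers))
```

Because each input is reduced to its rank position, inputs measured in very different units can be combined without one dominating the other.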

Weights: Grant dollars must be weighted by the local economy and all input measures must be weighted by some population factor in calculating coefficients of inequity.

For grants, we considered median income, per capita GDP, median equivalised income, and per capita GDP (PPP). We currently plan to use per capita GDP, PPP (see also).

For GDP (PPP) weighting, there are two available weights for annual per capita GDP, PPP: current international $ and constant international $. We currently plan to apply constant international $ for the weighting of historical grants and current international $ for the corresponding calendar year of grants.
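
One way to picture the weighting is to express each grant in units of local per capita GDP, PPP. The sketch below assumes the weight is applied by simple division, which is an assumption and not a confirmed method; the lookup table, country labels, and all figures are hypothetical.

```python
# Hypothetical per capita GDP, PPP figures in international $ (not real data).
GDP_PER_CAPITA_PPP = {
    ("Country A", 2020): {"current": 5000.0, "constant": 4500.0},
    ("Country B", 2020): {"current": 50000.0, "constant": 46000.0},
}


def weighted_grant(amount, country, year, series="current"):
    """Express a grant as a multiple of local per capita GDP, PPP.

    `series` selects the "current" or "constant" international $ figure.
    """
    return amount / GDP_PER_CAPITA_PPP[(country, year)][series]


# The same nominal grant weighs very differently in the two economies.
print(weighted_grant(10000.0, "Country A", 2020))
print(weighted_grant(10000.0, "Country B", 2020))
print(weighted_grant(10000.0, "Country A", 2020, series="constant"))
```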

For all other input measures, we are considering weighting relevant resourcing data either by general population, readers, or editors.

We currently propose to apply these weights as follows:

Once we can derive unique editor counts by geography, resources could be weighted by editor and/or active editor population to consider underserved editor populations in the world, or by general population to consider underrepresented populations in our ecosystem.
Editor data could also be weighted by general population to understand a concept such as “activation rate” for growth potential or by readers to understand a concept like “under-activated” consumers.
Similarly, reader statistics could be weighted by editor and/or active editor population, and/or by general population, to address slightly different reach and inclusion questions and arrive at different answers regarding participation gaps.
Lastly, regional roll-ups could be weighted by population.
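
The weighting choices above can be sketched as simple rates plus a population-weighted roll-up. This is an illustration under stated assumptions: the helper names, the per-million scale, and every figure in the sample data are hypothetical.

```python
def per_unit(numerator, denominator, scale=1.0):
    """Return the numerator per `scale` units of the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return scale * numerator / denominator


def weighted_mean(scores, weights):
    """Roll up country-level scores using weights such as population."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical country-level figures (not real data).
countries = {
    "A": {"population": 10_000_000, "readers": 2_000_000, "editors": 500, "grants": 20000.0},
    "B": {"population": 30_000_000, "readers": 1_500_000, "editors": 300, "grants": 3000.0},
}

activation = {}
for name, c in countries.items():
    # Editors per million residents: a possible "activation rate".
    activation[name] = per_unit(c["editors"], c["population"], 1e6)
    # Editors per million readers: a view of "under-activated" consumers.
    editors_per_reader = per_unit(c["editors"], c["readers"], 1e6)
    # Grant dollars per editor: resources relative to the editor population.
    grants_per_editor = per_unit(c["grants"], c["editors"])
    print(name, activation[name], editors_per_reader, grants_per_editor)

# Regional roll-up of the activation rate, weighted by population.
regional = weighted_mean(
    [activation[n] for n in countries],
    [countries[n]["population"] for n in countries],
)
print("region", regional)
```

Changing the denominator changes which gap is highlighted, which is why the same editor counts can tell different stories when weighted by population versus readers.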

What engagement and resourcing gaps are most important to the strategy work you or your group engage in? Please share any excitement, concerns, or curiosity you have about the above weighting options.

Answer on the talk page