User:Leaderboard/StewardMark

This is an essay. It expresses the opinions and ideas of some Wikimedians but may not have wide support. This is not policy on Meta, but it may be a policy or guideline on other Wikimedia projects. Feel free to update this page as needed, or use the discussion page to propose major changes.

If it isn't obvious, StewardMark is not an official Meta-Wiki policy (or indeed that of any wiki, as far as I am aware).

StewardMark is an experimental scoring system that ranks the performance of each steward candidate. Its model is nearly the same as the support percentage currently used to determine whether a candidate passes, and it scales across multiple steward election years. The model only considers steward elections from 2009 onwards, as the voting population of prior years is harder to compare.

Calculation

Let x be the number of supports received by a candidate.
Let y be the number of opposes received.
Let z be the number of neutral votes received.

Then the StewardMark S_m of a candidate is defined by

$S_{m}={\frac {100(x+{\frac {z}{2}})}{x+y+z}}$

The key difference is that some weight is given to neutral votes, because I believe that their opinions should also count. For most candidates this will mean that S_m is lower than the support percentage; for the rest it will be the other way round.
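The comparison with the support percentage can be checked with a short derivation. As an assumption not spelled out on this page, take the support percentage to be P = 100x/(x+y), the usual ratio of supports to supports plus opposes (P is a symbol introduced here purely for illustration). Then

$S_{m}-P={\frac {100(x+{\frac {z}{2}})}{x+y+z}}-{\frac {100x}{x+y}}={\frac {50z(y-x)}{(x+y)(x+y+z)}}$

So, with at least one neutral vote, S_m is below the support percentage exactly when supports outnumber opposes (x > y), above it when opposes outnumber supports (y > x), and equal to it when x = y or z = 0.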

StewardMark only applies to users that have not withdrawn or been disqualified.
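As a minimal sketch, the formula above can be written as a small Python function. The function names and the vote tally in the usage lines are hypothetical and chosen only for illustration; the support percentage is assumed to be supports divided by supports plus opposes.

```python
def steward_mark(supports: int, opposes: int, neutrals: int) -> float:
    """StewardMark out of 100: each neutral vote counts as half a support."""
    total = supports + opposes + neutrals
    if total == 0:
        # The formula is undefined when nobody voted.
        raise ValueError("a candidate with no votes has no StewardMark")
    return 100 * (supports + neutrals / 2) / total


def support_percentage(supports: int, opposes: int) -> float:
    """Assumed pass/fail metric: supports over supports plus opposes."""
    if supports + opposes == 0:
        raise ValueError("no support or oppose votes")
    return 100 * supports / (supports + opposes)


# Hypothetical tally: 90 supports, 10 opposes, 20 neutrals.
example_mark = steward_mark(90, 10, 20)
example_support = support_percentage(90, 10)
print(round(example_mark, 2), round(example_support, 2))
```

In this made-up tally the neutral votes pull the StewardMark below the support percentage, which is the common case described above.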

Standardisation

Standardisation allows StewardMark to be compared with scores from other contexts (say RfA scores from Wikipedia). A conversion table should be defined in any case. The "standardised" scale is a real number from 0 to 20, rounded to two decimal places.

The US grade system equivalent is meant to answer this question: If stewardship was a course and the election determined your grade, what would it be? Just like a real college course, C is a bad grade and such students often have to retake - and passed candidates usually have a B or higher, again reflecting the real-world scenario.

Conversion scale
Standardised scale (0 - 20)	StewardMark cutoff (/100)	US grade system equivalent
20	99.5	A+
19	96.5	A+
18	93	A
17	90	A
16	86	A-
15	81	B+
14	77	B
13	73	B-
12	67	C+
11	60	C
10	54	C-
9	45	D+
8	40	D
7	35	D-
6	29	F
5	22	F
4	16	F
3	11	F
2	7	F
1	2	F
0	0	F
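The page does not state how marks that fall between two cutoffs are converted. The sketch below assumes linear interpolation between adjacent cutoffs, which reproduces the mark and standardised-score pairs quoted in the statistics on this page (99.39 and 19.96, 79.45 and 14.61, 2.76 and 1.15). The names `CUTOFFS`, `GRADES`, `standardise` and `grade` are illustrative. Two further assumptions: the letter grade is taken from the highest cutoff reached, and scores below 6 are treated as F.

```python
from bisect import bisect_right

# StewardMark cutoffs from the conversion table; list index = standardised score.
CUTOFFS = [0, 2, 7, 11, 16, 22, 29, 35, 40, 45, 54,
           60, 67, 73, 77, 81, 86, 90, 93, 96.5, 99.5]

# Grade per standardised score; indices 0 to 6 are assumed to be F.
GRADES = ["F"] * 7 + ["D-", "D", "D+", "C-", "C", "C+",
                      "B-", "B", "B+", "A-", "A", "A", "A+", "A+"]


def _band(mark: float) -> int:
    """Index of the highest cutoff that does not exceed the mark."""
    if not 0 <= mark <= 100:
        raise ValueError("StewardMark must lie between 0 and 100")
    return bisect_right(CUTOFFS, mark) - 1


def standardise(mark: float) -> float:
    """Convert a StewardMark (/100) to the 0-20 scale (assumed interpolation)."""
    i = _band(mark)
    if i == len(CUTOFFS) - 1:
        return 20.0  # at or above the top cutoff
    low, high = CUTOFFS[i], CUTOFFS[i + 1]
    return round(i + (mark - low) / (high - low), 2)


def grade(mark: float) -> str:
    """US grade equivalent, taken from the highest cutoff reached (assumption)."""
    return GRADES[_band(mark)]


print(standardise(99.39), grade(99.39))
print(standardise(79.45), grade(79.45))
```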

Statistics

The dataset includes all steward candidates from 2009 and later. Data correct as of the 2024 steward elections.

Comparison of statistical parameters
Statistical parameter	StewardMark (/100)	StandardScale (/20)
Mean	68.4	13.08
Median	79.45	14.61
Maximum	99.39	19.96
Minimum	2.76	1.15
Standard deviation	27.7	5.15

Steward election stats over the years
Year	StewardMark mean (/100)	StandardScale mean (/20)	Number of candidates	StandardScale standard deviation
2009	67.27	12.69	22	4.86
2010	48.32	9.49	25	6.72
2011	71.10	13.75	20	5.39
2012	83.24	15.52	9	1.80
2013	68.81	13.51	10	6.38
2014	82.29	15.67	10	3.70
2015	73.87	13.94	14	4.07
2016	62.74	11.61	10	2.43
2017	69.76	13.09	7	4.07
2018	70.04	13.15	10	4.27
2019	74.87	14.19	7	4.64
2020	66.16	12.81	14	5.36
2021	61.23	11.92	10	5.89
2022	79.7	15.45	7	5.08
2023	83.3	15.83	5	3.43
2024	73.3	14.09	11	5
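As a quick consistency check, the overall StewardMark mean should equal the average of the yearly means weighted by the number of candidates. The sketch below uses only figures from the two tables above. The names `YEARLY`, `total_candidates` and `overall_mean` are illustrative, and because the yearly means are already rounded, agreement can only be expected to about one decimal place.

```python
# (year, StewardMark mean, number of candidates), copied from the yearly table.
YEARLY = [
    (2009, 67.27, 22), (2010, 48.32, 25), (2011, 71.10, 20), (2012, 83.24, 9),
    (2013, 68.81, 10), (2014, 82.29, 10), (2015, 73.87, 14), (2016, 62.74, 10),
    (2017, 69.76, 7), (2018, 70.04, 10), (2019, 74.87, 7), (2020, 66.16, 14),
    (2021, 61.23, 10), (2022, 79.7, 7), (2023, 83.3, 5), (2024, 73.3, 11),
]

total_candidates = sum(count for _, _, count in YEARLY)

# Weight each yearly mean by how many candidates stood in that year.
overall_mean = sum(mean * count for _, mean, count in YEARLY) / total_candidates

print(total_candidates, round(overall_mean, 1))  # 191 68.4
```

The result matches the mean of 68.4 reported in the comparison of statistical parameters.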

Raw data

See Raw data.

Takeaways

There are some steward candidates that have done really well, with two candidates in the same year getting a StewardMark of over 99. When setting the conversion scale, one objective was to design it in such a way that it would be extremely, but not impossibly, difficult to get a perfect standardised score of 20. MF-Warburg came incredibly close to that with a StewardMark of 99.39/100.
The distribution is skewed towards high scores (the median of 79.45 is well above the mean of 68.4), which implies that most steward candidates do pretty well; about 50% of the candidates in the dataset passed.
There are a couple of cases where someone with a higher StewardMark (for example, 2009's Putnik with 77.73/100) failed while someone with a lower StewardMark passed. The reason is that the former had fewer neutrals: the latter might have just crossed the 80% support ratio but garnered more neutrals, which drag down the score. Such cases are rare though.
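The last point can be illustrated with two made-up tallies. Both candidates and their vote counts are hypothetical and are not taken from the raw data. The only pass criterion modelled is the 80% support ratio mentioned above, assumed to be supports over supports plus opposes.

```python
PASS_THRESHOLD = 80.0  # support ratio needed to pass, as mentioned in the text


def steward_mark(supports: int, opposes: int, neutrals: int) -> float:
    """StewardMark out of 100: each neutral vote counts as half a support."""
    return 100 * (supports + neutrals / 2) / (supports + opposes + neutrals)


def passes(supports: int, opposes: int) -> bool:
    """Assumed rule: pass when supports / (supports + opposes) reaches 80%."""
    return 100 * supports / (supports + opposes) >= PASS_THRESHOLD


# Hypothetical candidate A: just under 80% support, almost no neutrals.
a_mark, a_passed = steward_mark(78, 22, 1), passes(78, 22)

# Hypothetical candidate B: just over 80% support, many neutrals.
b_mark, b_passed = steward_mark(81, 19, 30), passes(81, 19)

print(round(a_mark, 2), a_passed)
print(round(b_mark, 2), b_passed)
```

Candidate A fails with the higher StewardMark, while candidate B passes with the lower one, because B's 30 neutral votes each count as only half a support.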

StewardMark from an en.wp perspective

A natural question is how suitable StewardMark is for analysing en.wp adminship, given the large number of candidates that have attempted adminship. There are some important differences, however:

We must include withdrawn and SNOW cases, as they comprise a significant number of candidates.
The results are different. For instance, about 3.4% of all candidates score a 100/100 StewardMark, and hence get a 20. On the other hand, mainly as a result of SNOW, one-eighth of all candidates get a zero. These extremes should be taken into account, and even then, en.wp adminship proposals score very well on the high end as compared to stewards.

The raw data for en.wp is available at User:Leaderboard/StewardMark/en.wp RFA raw data. Data last updated: March 2024.

en.wp StewardMark statistics
Statistical parameter	StewardMark (/100)	StandardScale (/20)
Mean	52.86	10.27
Median	54.05	10.01
Maximum	100	20
Minimum	0	0
Standard deviation	36.16	6.89

en.wp stats over the years
Year	StewardMark mean (/100)	StandardScale mean (/20)	Number of candidates	StandardScale standard deviation
2008	50.66	9.9	591	6.92
2009	50.32	9.8	354	6.75
2010	47.76	9.42	231	6.69
2011	51.94	10.13	139	6.88
2012	46.64	9.05	95	6.73
2013	59.90	11.62	74	6.28
2014	51.35	10.02	62	7.65
2015	52.63	10.18	58	6.48
2016	57.22	11.08	36	7.39
2017	65.48	12.82	41	6.72
2018	66.80	12.92	18	6.84
2019	76.89	14.82	31	5.02
2020	74.28	14.28	24	5.77
2021	83.08	16.41	11	5.48
2022	77.41	15.12	20	6.38
2023	72.23	14.11	19	7.08
2024	78.97	15.54	5	6.27