Research:Language-Agnostic Topic Classification/Outlink model performance/All wikis

This is a data dump of how the model performs when evaluated on every Wikipedia language edition (groundtruth for other languages comes from applying English labels to them via Wikidata). In particular, "Had Groundtruth" counts the articles in that language that have an English equivalent with groundtruth topic labels; only these articles are included in the micro/macro precision/recall/f1 statistics. "No Groundtruth" generally means that the article had no English equivalent per Wikidata (though in a very small number of cases it has an English equivalent with no groundtruth topic labels). I would normally include average precision statistics as well, but that is much more memory-intensive and so has not been done yet for all wikis. The results are based on a model with the following characteristics:

  • Code: https://github.com/geohci/wikipedia-language-agnostic-topic-classification
  • Model architecture: multi-label fastText supervised model
  • Number of articles by language in training data (% of training data). This was an attempt to balance convenience and coverage. The languages were included in rough proportion to their number of articles with labels (with some dampening of English Wikipedia):
    • Arabic (ar): 662,021 (10.5%)
    • Spanish (es): 854,992 (13.6%)
    • English (en): 2,363,824 (37.5%)
    • French (fr): 1,138,743 (18.1%)
    • Hindi (hi): 62,720 (1.0%)
    • Russian (ru): 707,924 (11.2%)
    • Chinese (zh): 513,292 (8.1%)
  • Epochs: 5
  • Learning rate: 0.1
  • Window size: 20
  • Min count (under which QID is not retained in vocab): 20
  • No pre-trained embeddings used
  • Embeddings dimension: 50
  • Total number of model params: 3200 (50 x 64)
  • Vocab size: 2,829,260
  • Total number of embeddings params: 141,463,000 (vocab size * embeddings dimension)
  • Model size on disk: 588.0 MB
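
The arithmetic behind the list above can be double-checked with a short sketch. The article counts, embedding dimension, and vocab size are copied from the list; reading the "64" in "50 x 64" as the number of topic labels is my assumption, and the function names are purely illustrative (they are not from the linked repository).

```python
# Article counts per language, copied from the list above.
TRAINING_ARTICLES = {
    "ar": 662_021,
    "es": 854_992,
    "en": 2_363_824,
    "fr": 1_138_743,
    "hi": 62_720,
    "ru": 707_924,
    "zh": 513_292,
}

EMBEDDING_DIM = 50
VOCAB_SIZE = 2_829_260
NUM_LABELS = 64  # assumption: the "64" in "50 x 64" is the number of topic labels


def language_shares(counts):
    """Return each language's percentage share of the training data."""
    total = sum(counts.values())
    return {lang: 100 * n / total for lang, n in counts.items()}


def parameter_counts(vocab_size, dim, num_labels):
    """Return (embedding parameters, output-layer parameters)."""
    return vocab_size * dim, dim * num_labels


if __name__ == "__main__":
    for lang, pct in language_shares(TRAINING_ARTICLES).items():
        print(f"{lang}: {pct:.1f}%")
    emb, out = parameter_counts(VOCAB_SIZE, EMBEDDING_DIM, NUM_LABELS)
    print(f"embedding params: {emb:,}; output-layer params: {out:,}")
```

Almost all of the parameters sit in the embedding table, which is why the vocab size (and therefore the min count threshold) dominates the model size.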

Outliers and Anecdotal Observations

These observations are based on making topic predictions for every Wikipedia article and then looking at languages that are outliers in their distribution of topics, e.g., no biographies or a lot of history articles. Some of these outliers are real: small wikis can have strong skews introduced by highly active editors, and Lsjbot and other such bots can skew a wiki very strongly towards science, geography, or other topics for which it is easy to generate large numbers of articles from standard templates. Other outliers, though, are a combination of bot skew and data bias: a wiki is heavily skewed towards a certain type of article (such as articles about dates), but these articles share a distribution of outlinks that differs from what the model was trained on, so the model consistently predicts the wrong topic for all of them. To verify trends, I checked specific predictions for a language and repeatedly used the simple process of visiting https://<lang-code>.wikipedia.org/wiki/Special:Random and then checking the Wikidata item to see what the article was about. Some examples:

  • Skew towards articles about settlements and rivers that are predicted to be about History_and_Society.Society. Many of these articles seem to originate with EmausBot (see contributions by wiki here)
  • Skew towards biographies that partly consists of articles that are actually about dates (e.g., Q2065)
    • Western Armenian (hyw)
  • Skew towards Culture.Linguistics and History_and_Society.Business_and_economics that actually consists of asteroid articles with very few outlinks (e.g., Q209053)
    • Yoruba (yo)
  • All languages, regardless of content, have zero recall for Central Africa as a result of my not implementing the topic labeling changes described here: https://github.com/wikimedia/drafttopic/blob/master/drafttopic/utilities/add_central_africa.py
  • Common outliers that are appropriate predictions:
    • Articles for dates that are predicted to be related to History: Turkmen (tk), Nahuatl (nah), Greenlandic (kl)

Performance by Wiki

wiki_db Had Groundtruth No Groundtruth micro precision macro precision micro recall macro recall micro f1 macro f1
amwiki 8031 6572 0.652 0.615 0.348 0.287 0.434 0.367
sqwiki 52302 24798 0.778 0.656 0.608 0.379 0.666 0.456
mlwiki 52489 15146 0.807 0.717 0.600 0.423 0.675 0.513
emlwiki 8624 3814 0.623 0.405 0.427 0.232 0.473 0.267
kiwiki 1206 100 0.747 0.361 0.545 0.160 0.596 0.199
cywiki 102396 27825 0.806 0.732 0.654 0.407 0.692 0.493
pihwiki 737 26 0.702 0.448 0.304 0.189 0.395 0.243
hsbwiki 10095 3414 0.793 0.525 0.554 0.251 0.624 0.308
bxrwiki 1948 164 0.723 0.557 0.314 0.255 0.407 0.326
dsbwiki 2763 424 0.712 0.468 0.439 0.227 0.508 0.282
kowiki 303220 184175 0.818 0.772 0.667 0.533 0.725 0.614
ilowiki 14930 213 0.828 0.619 0.717 0.295 0.739 0.355
crhwiki 2387 2945 0.770 0.385 0.512 0.181 0.578 0.222
lnwiki 2124 852 0.634 0.558 0.333 0.247 0.417 0.321
ltgwiki 644 256 0.656 0.315 0.425 0.125 0.464 0.155
mswiki 243883 95045 0.907 0.817 0.807 0.587 0.847 0.662
hywiki 147969 122191 0.837 0.785 0.603 0.466 0.679 0.560
abwiki 4203 1693 0.563 0.142 0.407 0.133 0.432 0.104
nywiki 423 47 0.656 0.399 0.204 0.148 0.288 0.199
chrwiki 548 39 0.604 0.331 0.192 0.103 0.261 0.139
vowiki 120989 3565 0.890 0.424 0.751 0.262 0.800 0.295
nrmwiki 3981 447 0.589 0.398 0.387 0.172 0.429 0.202
eewiki 303 9 0.532 0.300 0.305 0.170 0.362 0.199
thwiki 100687 36381 0.822 0.760 0.669 0.540 0.730 0.615
bawiki 19864 32130 0.809 0.682 0.337 0.305 0.431 0.386
glkwiki 1193 4557 0.670 0.221 0.381 0.126 0.479 0.149
elwiki 126848 50989 0.805 0.766 0.669 0.529 0.720 0.607
ltwiki 113098 83717 0.826 0.766 0.670 0.540 0.727 0.615
maiwiki 10081 3199 0.787 0.561 0.604 0.256 0.653 0.322
minwiki 33712 188086 0.932 0.370 0.908 0.227 0.913 0.252
pdcwiki 1213 550 0.615 0.379 0.356 0.201 0.427 0.243
urwiki 127581 25176 0.852 0.735 0.748 0.498 0.785 0.567
cuwiki 671 31 0.705 0.368 0.318 0.161 0.399 0.197
mtwiki 3092 326 0.756 0.543 0.530 0.293 0.598 0.348
ugwiki 2528 1471 0.741 0.555 0.408 0.243 0.489 0.300
astwiki 86124 20724 0.858 0.792 0.648 0.461 0.718 0.557
scnwiki 20145 3939 0.791 0.615 0.589 0.302 0.657 0.379
be_x_oldwiki 39412 30822 0.807 0.743 0.589 0.443 0.662 0.531
srnwiki 841 181 0.541 0.311 0.329 0.149 0.361 0.176
afwiki 72733 18495 0.794 0.763 0.509 0.399 0.581 0.486
etwiki 121204 85314 0.797 0.740 0.615 0.494 0.680 0.575
angwiki 2889 111 0.714 0.585 0.386 0.267 0.478 0.340
nnwiki 120378 32000 0.844 0.737 0.733 0.509 0.773 0.580
mrwiki 36219 16129 0.824 0.695 0.603 0.392 0.682 0.478
rowiki 256232 149010 0.862 0.796 0.747 0.546 0.790 0.627
newiki 20311 11265 0.767 0.627 0.593 0.333 0.651 0.412
zh_yuewiki 57260 27158 0.802 0.728 0.623 0.452 0.690 0.537
azwiki 101834 53357 0.827 0.767 0.620 0.456 0.690 0.551
papwiki 1665 238 0.657 0.499 0.271 0.200 0.348 0.258
stqwiki 3500 462 0.684 0.492 0.484 0.282 0.543 0.325
barwiki 19842 10734 0.726 0.593 0.604 0.343 0.644 0.408
liwiki 9950 3000 0.733 0.603 0.554 0.332 0.606 0.403
sahwiki 7071 4709 0.707 0.558 0.310 0.232 0.397 0.301
jbowiki 1110 46 0.658 0.459 0.340 0.194 0.415 0.248
tyvwiki 981 1081 0.603 0.323 0.198 0.119 0.265 0.152
lezwiki 2343 1627 0.611 0.435 0.350 0.223 0.388 0.264
euwiki 283636 74972 0.908 0.807 0.781 0.499 0.828 0.593
aswiki 5266 1252 0.729 0.537 0.525 0.229 0.580 0.291
suwiki 10993 44595 0.741 0.590 0.571 0.310 0.615 0.375
ruwiki 852420 764561 0.891 0.861 0.741 0.674 0.794 0.736
azbwiki 228938 5454 0.825 0.621 0.587 0.347 0.655 0.395
ladwiki 3135 391 0.633 0.448 0.394 0.242 0.454 0.284
miwiki 2954 4192 0.588 0.276 0.416 0.132 0.430 0.158
biwiki 1112 71 0.563 0.387 0.170 0.115 0.227 0.159
bjnwiki 1309 1261 0.628 0.464 0.316 0.178 0.379 0.225
fawiki 609314 116590 0.872 0.788 0.721 0.570 0.778 0.638
kbpwiki 1494 99 0.565 0.428 0.177 0.136 0.242 0.184
gomwiki 794 2162 0.482 0.170 0.117 0.027 0.174 0.043
newwiki 22800 49051 0.877 0.588 0.618 0.265 0.658 0.308
pswiki 7010 3288 0.731 0.558 0.466 0.255 0.539 0.313
nds_nlwiki 5004 1921 0.646 0.430 0.470 0.223 0.512 0.266
vlswiki 5523 1597 0.638 0.458 0.539 0.257 0.549 0.307
simplewiki 156742 5559 0.821 0.757 0.655 0.547 0.722 0.620
orwiki 12196 3178 0.741 0.437 0.610 0.262 0.643 0.297
lowiki 2171 771 0.683 0.537 0.371 0.204 0.450 0.264
fiwiki 330250 153521 0.822 0.770 0.684 0.571 0.738 0.641
lgwiki 633 208 0.611 0.304 0.249 0.113 0.310 0.142
kuwiki 12405 14302 0.717 0.618 0.493 0.308 0.563 0.384
glwiki 102666 61259 0.828 0.803 0.645 0.482 0.707 0.577
tumwiki 548 9 0.861 0.357 0.418 0.125 0.462 0.163
lmowiki 34801 4774 0.844 0.595 0.708 0.302 0.746 0.362
smwiki 681 110 0.691 0.412 0.370 0.218 0.454 0.268
zuwiki 1977 150 0.748 0.406 0.496 0.145 0.559 0.193
zeawiki 4275 417 0.777 0.413 0.631 0.225 0.665 0.261
pagwiki 2284 144 0.849 0.256 0.752 0.117 0.781 0.118
dtywiki 2259 836 0.756 0.498 0.413 0.150 0.499 0.212
pamwiki 7901 566 0.784 0.438 0.635 0.260 0.685 0.300
huwiki 324058 144786 0.862 0.788 0.760 0.550 0.799 0.627
arcwiki 1428 248 0.727 0.498 0.428 0.266 0.515 0.321
oswiki 7795 3911 0.692 0.483 0.452 0.274 0.513 0.307
siwiki 9544 4406 0.736 0.593 0.505 0.297 0.579 0.375
zh_classicalwiki 7704 2335 0.715 0.590 0.516 0.286 0.570 0.354
csbwiki 4377 607 0.615 0.350 0.389 0.159 0.435 0.188
fowiki 11315 1853 0.757 0.637 0.538 0.286 0.592 0.356
furwiki 2730 529 0.634 0.427 0.414 0.217 0.476 0.262
xhwiki 749 112 0.643 0.352 0.443 0.169 0.497 0.210
alswiki 20743 6382 0.769 0.635 0.659 0.373 0.693 0.448
lijwiki 3140 592 0.691 0.483 0.359 0.203 0.440 0.262
tnwiki 540 77 0.692 0.180 0.319 0.098 0.431 0.121
enwiki 5764873 297117 0.868 0.826 0.794 0.685 0.825 0.735
arzwiki 432317 146835 0.871 0.693 0.578 0.273 0.667 0.349
snwiki 1670 2742 0.662 0.492 0.255 0.211 0.347 0.274
omwiki 503 223 0.622 0.344 0.322 0.144 0.387 0.178
diqwiki 11771 5821 0.757 0.607 0.560 0.320 0.616 0.381
udmwiki 3744 1112 0.700 0.380 0.300 0.168 0.398 0.214
mwlwiki 2931 445 0.716 0.532 0.385 0.224 0.471 0.288
brwiki 56010 11856 0.745 0.654 0.600 0.400 0.647 0.468
lawiki 115712 17117 0.853 0.728 0.681 0.436 0.747 0.521
cebwiki 562536 4193784 0.879 0.547 0.802 0.331 0.822 0.362
guwiki 8568 20068 0.739 0.512 0.571 0.226 0.615 0.285
scwiki 5687 789 0.737 0.626 0.414 0.276 0.505 0.363
quwiki 18212 3905 0.796 0.670 0.588 0.362 0.655 0.439
ndswiki 27322 15962 0.727 0.606 0.621 0.390 0.654 0.443
tkwiki 4860 482 0.749 0.502 0.466 0.182 0.528 0.239
rnwiki 251 202 0.434 0.255 0.128 0.130 0.181 0.155
iewiki 4574 238 0.716 0.467 0.457 0.267 0.532 0.310
gagwiki 1787 864 0.701 0.388 0.455 0.219 0.518 0.251
mywiki 16595 27724 0.747 0.625 0.531 0.320 0.587 0.392
dvwiki 1994 837 0.701 0.400 0.410 0.174 0.476 0.211
bewiki 92051 99095 0.836 0.787 0.657 0.516 0.720 0.601
stwiki 579 52 0.727 0.324 0.500 0.151 0.559 0.186
nsowiki 5213 2945 0.903 0.349 0.863 0.224 0.872 0.246
pawiki 24216 7494 0.747 0.665 0.470 0.306 0.555 0.398
bhwiki 6843 189 0.775 0.616 0.595 0.307 0.648 0.365
anwiki 27892 8435 0.855 0.763 0.653 0.371 0.713 0.466
bowiki 2491 3045 0.736 0.523 0.358 0.260 0.447 0.309
kbdwiki 1189 374 0.746 0.377 0.389 0.145 0.456 0.188
lbwiki 38341 19681 0.778 0.670 0.629 0.376 0.678 0.450
novwiki 1612 36 0.691 0.400 0.461 0.232 0.519 0.273
kawiki 97184 36816 0.822 0.778 0.654 0.515 0.713 0.596
itwiki 1079796 527130 0.857 0.806 0.759 0.612 0.798 0.677
fiu_vrowiki 3286 2225 0.684 0.606 0.328 0.232 0.409 0.306
olowiki 2298 851 0.548 0.457 0.238 0.188 0.294 0.235
wawiki 5353 7420 0.643 0.542 0.447 0.300 0.511 0.359
mgwiki 77707 14584 0.686 0.487 0.195 0.158 0.255 0.186
cawiki 431728 215389 0.856 0.795 0.742 0.584 0.786 0.657
cowiki 4541 1150 0.660 0.410 0.404 0.232 0.476 0.273
szywiki 575 16 0.650 0.449 0.225 0.115 0.308 0.168
zh_min_nanwiki 339214 63327 0.936 0.747 0.816 0.487 0.865 0.555
eowiki 191856 88212 0.819 0.748 0.692 0.511 0.739 0.588
frrwiki 9092 1510 0.742 0.543 0.539 0.321 0.585 0.369
sswiki 420 40 0.570 0.332 0.202 0.081 0.268 0.117
ruewiki 6478 965 0.744 0.521 0.540 0.246 0.593 0.308
pcdwiki 3640 1006 0.708 0.345 0.564 0.201 0.602 0.218
krcwiki 1877 128 0.565 0.340 0.296 0.145 0.351 0.181
shwiki 220895 230251 0.849 0.766 0.739 0.523 0.778 0.599
wuuwiki 28790 2181 0.760 0.696 0.357 0.262 0.467 0.356
scowiki 54561 2066 0.840 0.794 0.604 0.474 0.688 0.568
hewiki 200222 65956 0.825 0.796 0.653 0.536 0.719 0.624
vewiki 233 105 0.820 0.374 0.723 0.239 0.749 0.273
kvwiki 3784 1512 0.686 0.381 0.325 0.210 0.404 0.245
vepwiki 5491 943 0.772 0.617 0.318 0.316 0.423 0.391
hrwiki 127557 69825 0.811 0.754 0.661 0.518 0.719 0.598
roa_rupwiki 1031 118 0.647 0.381 0.346 0.186 0.414 0.227
cswiki 291075 164120 0.828 0.784 0.697 0.557 0.748 0.633
bat_smgwiki 6078 10610 0.689 0.598 0.412 0.239 0.483 0.311
fjwiki 811 17 0.652 0.399 0.259 0.197 0.324 0.226
bnwiki 71238 14609 0.818 0.732 0.655 0.445 0.715 0.534
extwiki 2793 324 0.642 0.538 0.347 0.240 0.420 0.299
rwwiki 1576 167 0.764 0.445 0.353 0.201 0.435 0.252
iowiki 28385 823 0.762 0.691 0.502 0.334 0.582 0.419
pnbwiki 41271 11055 0.794 0.667 0.577 0.349 0.645 0.426
szlwiki 11406 40270 0.753 0.550 0.673 0.293 0.683 0.344
map_bmswiki 3337 9913 0.768 0.458 0.691 0.191 0.693 0.233
mdfwiki 836 298 0.718 0.391 0.157 0.126 0.221 0.166
bgwiki 186520 76175 0.822 0.762 0.696 0.534 0.744 0.609
trwiki 253790 97492 0.825 0.768 0.683 0.569 0.741 0.639
bclwiki 8717 996 0.748 0.496 0.582 0.222 0.625 0.273
cvwiki 19213 23460 0.781 0.630 0.368 0.324 0.459 0.396
kabwiki 4052 539 0.753 0.525 0.487 0.257 0.558 0.312
crwiki 81 5 0.546 0.141 0.299 0.083 0.368 0.096
bugwiki 13956 67 0.994 0.305 0.982 0.226 0.985 0.239
hifwiki 5472 4049 0.728 0.562 0.486 0.232 0.540 0.284
mhrwiki 5579 4470 0.726 0.435 0.346 0.217 0.423 0.257
bmwiki 532 54 0.675 0.330 0.266 0.152 0.323 0.189
ganwiki 5122 1249 0.720 0.474 0.538 0.260 0.589 0.303
mrjwiki 9123 1012 0.693 0.279 0.422 0.190 0.496 0.204
gawiki 49403 3230 0.824 0.746 0.571 0.337 0.639 0.426
srwiki 266936 365258 0.841 0.768 0.729 0.519 0.768 0.597
gvwiki 4693 288 0.717 0.514 0.534 0.306 0.579 0.352
dzwiki 172 6 0.527 0.257 0.328 0.154 0.345 0.168
akwiki 669 19 0.803 0.380 0.241 0.131 0.310 0.172
myvwiki 4374 1695 0.724 0.372 0.265 0.173 0.337 0.209
yowiki 18282 13459 0.711 0.592 0.499 0.353 0.555 0.405
ttwiki 30684 58295 0.793 0.665 0.347 0.360 0.434 0.435
nlwiki 882260 1129948 0.873 0.791 0.785 0.599 0.821 0.665
gorwiki 398 2180 0.741 0.182 0.604 0.141 0.639 0.141
tetwiki 912 546 0.576 0.324 0.387 0.172 0.427 0.204
ptwiki 735185 294143 0.861 0.800 0.749 0.603 0.795 0.672
hiwiki 75110 60232 0.872 0.825 0.724 0.558 0.782 0.650
ocwiki 77503 8050 0.855 0.703 0.740 0.398 0.777 0.483
koiwiki 2308 1043 0.740 0.252 0.320 0.156 0.405 0.159
warwiki 309820 946155 0.965 0.583 0.887 0.305 0.909 0.351
jamwiki 1379 14 0.742 0.541 0.326 0.195 0.418 0.258
nowiki 362132 170168 0.830 0.767 0.715 0.556 0.760 0.630
tawiki 78575 48093 0.823 0.711 0.638 0.415 0.706 0.506
knwiki 16050 8460 0.729 0.595 0.479 0.256 0.559 0.334
tpiwiki 1514 32 0.693 0.399 0.322 0.165 0.383 0.196
napwiki 13422 1045 0.784 0.328 0.752 0.215 0.749 0.224
cdowiki 10707 4443 0.826 0.624 0.539 0.292 0.612 0.354
plwiki 867196 538486 0.861 0.803 0.755 0.596 0.797 0.667
avwiki 1709 475 0.643 0.340 0.343 0.170 0.396 0.204
lvwiki 67662 33783 0.828 0.756 0.658 0.509 0.719 0.588
gnwiki 3509 234 0.701 0.603 0.402 0.286 0.482 0.346
tcywiki 569 542 0.552 0.282 0.312 0.094 0.363 0.126
igwiki 1032 97 0.611 0.378 0.349 0.158 0.422 0.203
sewiki 6548 961 0.730 0.533 0.592 0.273 0.612 0.320
svwiki 842490 2888049 0.864 0.777 0.769 0.570 0.806 0.640
hakwiki 8088 959 0.829 0.653 0.597 0.335 0.671 0.407
klwiki 713 8 0.642 0.321 0.405 0.135 0.454 0.166
chywiki 562 26 0.598 0.358 0.274 0.178 0.352 0.210
mznwiki 10576 2406 0.799 0.568 0.588 0.293 0.657 0.353
tewiki 21806 46667 0.802 0.595 0.621 0.285 0.681 0.356
roa_tarawiki 8733 511 0.927 0.356 0.868 0.142 0.880 0.179
tiwiki 101 18 0.482 0.237 0.110 0.140 0.158 0.152
bpywiki 19448 5554 0.899 0.401 0.823 0.216 0.833 0.225
rmywiki 603 41 0.674 0.326 0.460 0.176 0.514 0.207
tlwiki 58722 12949 0.764 0.689 0.486 0.385 0.551 0.453
piwiki 316 96 0.632 0.259 0.581 0.239 0.603 0.245
ikwiki 223 25 0.627 0.216 0.173 0.084 0.247 0.109
chwiki 421 8 0.526 0.240 0.180 0.116 0.250 0.150
tywiki 797 148 0.773 0.276 0.245 0.109 0.347 0.144
hawiki 4680 225 0.756 0.526 0.504 0.251 0.579 0.317
swwiki 36588 21867 0.767 0.664 0.533 0.412 0.608 0.482
zhwiki 606632 505069 0.907 0.875 0.810 0.691 0.850 0.755
sawiki 7904 3181 0.687 0.444 0.537 0.194 0.558 0.233
nawiki 1300 50 0.695 0.393 0.307 0.173 0.392 0.221
frpwiki 3471 364 0.667 0.386 0.504 0.182 0.531 0.207
inhwiki 978 184 0.699 0.492 0.295 0.168 0.380 0.228
sowiki 3719 1493 0.699 0.573 0.402 0.279 0.489 0.358
iswiki 33927 15283 0.788 0.744 0.562 0.431 0.643 0.527
dawiki 175503 83775 0.820 0.775 0.693 0.522 0.740 0.605
kgwiki 1042 79 0.717 0.428 0.354 0.165 0.437 0.210
fywiki 26924 16810 0.740 0.682 0.538 0.381 0.606 0.465
lbewiki 914 236 0.718 0.264 0.361 0.107 0.436 0.123
xalwiki 1825 42 0.565 0.300 0.162 0.094 0.184 0.109
pflwiki 2351 251 0.858 0.292 0.779 0.111 0.797 0.147
viwiki 520274 723339 0.911 0.793 0.817 0.584 0.855 0.655
gdwiki 13512 1485 0.773 0.620 0.587 0.338 0.642 0.408
twwiki 618 31 0.590 0.365 0.259 0.192 0.337 0.233
kshwiki 2200 609 0.506 0.380 0.277 0.151 0.327 0.197
slwiki 111852 56157 0.844 0.758 0.723 0.512 0.769 0.591
towiki 694 942 0.725 0.297 0.498 0.097 0.532 0.129
bswiki 53571 29053 0.823 0.726 0.686 0.442 0.734 0.526
aywiki 4426 231 0.768 0.441 0.474 0.163 0.551 0.214
pmswiki 57898 6823 0.838 0.578 0.790 0.268 0.791 0.313
tgwiki 61288 37891 0.838 0.644 0.554 0.321 0.626 0.392
iuwiki 275 20 0.614 0.337 0.236 0.143 0.314 0.182
dewiki 1178972 1243500 0.822 0.776 0.730 0.610 0.767 0.668
frwiki 1379793 836635 0.893 0.864 0.805 0.696 0.841 0.755
ckbwiki 20315 5392 0.758 0.672 0.507 0.372 0.592 0.453
yiwiki 10528 4365 0.727 0.672 0.494 0.392 0.572 0.477
rmwiki 3141 459 0.756 0.392 0.619 0.199 0.661 0.245
kaawiki 1417 160 0.660 0.454 0.278 0.174 0.364 0.231
ukwiki 597011 418505 0.854 0.803 0.742 0.599 0.783 0.666
acewiki 2484 7744 0.686 0.432 0.324 0.174 0.405 0.221
kkwiki 103091 113705 0.852 0.685 0.686 0.391 0.739 0.467
jvwiki 25358 31377 0.790 0.687 0.571 0.392 0.650 0.479
shnwiki 3583 3302 0.925 0.161 0.803 0.145 0.851 0.131
tswiki 491 134 0.567 0.339 0.328 0.153 0.390 0.186
arwiki 762928 281507 0.891 0.844 0.752 0.596 0.804 0.679
kywiki 38817 40221 0.771 0.623 0.476 0.289 0.553 0.367
dinwiki 99 1 0.345 0.128 0.130 0.055 0.170 0.067
lfnwiki 3409 209 0.694 0.527 0.372 0.247 0.472 0.318
sdwiki 7703 4745 0.593 0.388 0.380 0.197 0.431 0.223
cbk_zamwiki 2645 56 0.823 0.333 0.651 0.181 0.677 0.207
xmfwiki 12402 1410 0.780 0.686 0.519 0.369 0.594 0.445
htwiki 36776 21028 0.801 0.541 0.517 0.224 0.601 0.272
mkwiki 71377 33508 0.826 0.751 0.659 0.492 0.720 0.572
sgwiki 207 12 0.449 0.241 0.263 0.109 0.302 0.138
pntwiki 422 21 0.640 0.259 0.223 0.122 0.290 0.149
nvwiki 14291 839 0.868 0.362 0.770 0.191 0.798 0.219
hywwiki 4674 3010 0.589 0.446 0.355 0.165 0.392 0.213
kswiki 182 26 0.604 0.208 0.331 0.069 0.400 0.094
mnwiki 14857 3750 0.783 0.689 0.531 0.356 0.609 0.441
jawiki 527883 673018 0.835 0.792 0.716 0.575 0.762 0.648
nahwiki 5874 931 0.726 0.559 0.502 0.260 0.564 0.325
adywiki 338 52 0.770 0.453 0.281 0.209 0.365 0.268
atjwiki 649 452 0.562 0.128 0.319 0.038 0.349 0.046
kwwiki 3848 136 0.747 0.487 0.475 0.247 0.539 0.300
iawiki 18193 4011 0.804 0.648 0.629 0.346 0.682 0.417
kmwiki 2646 4405 0.735 0.536 0.466 0.213 0.537 0.285
zawiki 1597 154 0.637 0.361 0.325 0.148 0.389 0.182
vecwiki 55953 1503 0.919 0.507 0.887 0.301 0.896 0.353
lrcwiki 1096 3239 0.743 0.366 0.353 0.139 0.446 0.181
idwiki 288752 235923 0.853 0.794 0.708 0.571 0.767 0.649
skwiki 154422 77274 0.851 0.728 0.738 0.503 0.782 0.578
cewiki 67020 186091 0.900 0.551 0.434 0.262 0.495 0.311
ffwiki 213 7 0.689 0.273 0.239 0.097 0.301 0.125
wowiki 1095 139 0.634 0.427 0.307 0.186 0.383 0.237
eswiki 1015429 531111 0.900 0.868 0.801 0.682 0.842 0.748
hawwiki 2504 777 0.794 0.438 0.515 0.179 0.577 0.233
uzwiki 97534 36344 0.847 0.674 0.597 0.322 0.660 0.393
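
Several wikis in the table above pair a very high micro score with a low macro score (e.g., bugwiki). The sketch below shows how that can happen under the standard definitions of micro and macro averaging for multi-label predictions. It is illustrative only: the exact averaging used to produce the table is not spelled out on this page, the label names and toy data are invented, and treating an undefined precision or recall as 0 is a convention I chose for the example.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (undefined ratios become 0)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_macro(y_true, y_pred, labels):
    """y_true and y_pred are lists of label sets, one set per article."""
    counts = {label: [0, 0, 0] for label in labels}  # tp, fp, fn per label
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1
    # Micro: pool the counts over all labels, then compute the metrics once.
    pooled = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = precision_recall_f1(*pooled)
    # Macro: compute the metrics per label, then take the unweighted mean.
    per_label = [precision_recall_f1(*c) for c in counts.values()]
    macro = tuple(sum(m[i] for m in per_label) / len(per_label) for i in range(3))
    return micro, macro


if __name__ == "__main__":
    # Toy wiki dominated by one topic: 8 of 10 articles are about "Geo".
    labels = ["Geo", "Bio", "Hist"]
    y_true = [{"Geo"}] * 8 + [{"Bio"}, {"Hist"}]
    y_pred = [{"Geo"}] * 10  # the model predicts the dominant topic everywhere
    micro, macro = micro_macro(y_true, y_pred, labels)
    print("micro P/R/F1:", [round(x, 3) for x in micro])
    print("macro P/R/F1:", [round(x, 3) for x in macro])
```

In the toy wiki the dominant topic is predicted well, so the micro scores are high, while the two rare topics are never predicted and pull the macro scores down. This is the same pattern as in bot-heavy wikis where one type of article makes up most of the content.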

Performance by Number of Outlinks

To be precise, the counts below cover only outlinks that could be mapped to Wikidata IDs. Those IDs may or may not have been in the model's vocabulary.

# outlinks Had Groundtruth No Groundtruth micro precision macro precision micro recall macro recall micro f1 macro f1
1 229821 222672 0.587 0.611 0.217 0.182 0.299 0.262
2 407078 499090 0.569 0.565 0.388 0.308 0.443 0.379
3 500734 434597 0.635 0.610 0.487 0.401 0.539 0.467
4 602630 490467 0.705 0.646 0.591 0.473 0.629 0.527
5 589201 542381 0.717 0.658 0.602 0.496 0.644 0.549
6 675404 679716 0.757 0.693 0.643 0.529 0.687 0.584
7 804155 907277 0.784 0.713 0.652 0.542 0.704 0.601
8 765228 1127214 0.800 0.729 0.679 0.562 0.726 0.620
9 749086 1166286 0.811 0.743 0.700 0.576 0.745 0.635
10-19 6751252 7412010 0.852 0.786 0.749 0.611 0.791 0.673
20-29 3988634 3230575 0.869 0.816 0.761 0.623 0.804 0.692
30-39 2557281 1751152 0.874 0.830 0.760 0.629 0.806 0.701
40-49 1740596 687020 0.877 0.844 0.759 0.635 0.806 0.708
50+ 9606934 3373909 0.903 0.874 0.759 0.635 0.816 0.717
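
For reference, the row labels in the table above correspond to a bucketing along these lines. This is a sketch rather than the code that produced the table: the function name is made up, and because the page does not say how articles with zero mappable outlinks were handled, the sketch simply rejects them.

```python
def outlink_bucket(n_outlinks):
    """Map a count of Wikidata-mappable outlinks to its row label in the table."""
    if n_outlinks < 1:
        # Assumption: the table starts at 1, so zero-outlink articles are out of scope.
        raise ValueError("the table starts at 1 outlink")
    if n_outlinks < 10:
        return str(n_outlinks)  # 1 through 9 each get their own row
    if n_outlinks >= 50:
        return "50+"
    low = (n_outlinks // 10) * 10
    return f"{low}-{low + 9}"  # 10-19, 20-29, 30-39, 40-49
```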