Research:Copyediting as a structured task/LanguageTool
A detailed analysis of LanguageTool for the copyedit structured task.
Advantages of LanguageTool
There are many reasons why LanguageTool seems like a good starting point:
- It is open-source software
- It works out of the box and has been (and continues to be) actively developed for many years
- It supports 30+ languages
- Each detected error comes with an explanation. This is important for the machine-in-the-loop approach, in which editors keep full control over whether to adopt or reject the model’s suggestions (https://meta.wikimedia.org/wiki/Research:Knowledge_Gaps_3_Years_On#Principles_Guiding_Knowledge_Gaps_Research)
- Most of the detected errors come with a suggestion for improvement.
- The community can define custom rules for the model (https://community.languagetool.org/)
Challenges for LanguageTool
LanguageTool provides a browser interface (https://languagetool.org/) into which text can be pasted for copyediting.
When using LanguageTool for Wikipedia articles, we face several challenges:
- A Wikipedia article contains not only plain text but also other elements such as tables, infoboxes, and references, which we probably don't want to spell-check.
- A Wikipedia article contains content (text, links, etc.) that is transcluded from, e.g., templates. Fixing potential copyedits in this case is not recommended, as i) the fix would have to be made in the template and not in the article itself; and ii) it would also affect the content of other articles.
- A Wikipedia article contains many text elements that might appear to be copyedits but are in fact correct, such as quotes and uncommon entity names; these should thus not be highlighted as copyedits.
As an example, when manually pasting the text of the lead section of the article on Roman Catholic Diocese of Bisceglie, LanguageTool yields 7 copyedits (marked in bold), all of which are false positives:
The Diocese of Bisceglie (Latin: Dioecesis Vigiliensis) was a Roman Catholic diocese located in the town of Bisceglie on the Adriatic Sea in the province of Barletta-Andria-Trani, Apulia in southern Italy. It is five miles south of Trani. In 1818, it was united with the Archdiocese of Trani to form the Archdiocese of Trani-Bisceglie.[1][2]
The main challenge is then to ensure that applying LanguageTool to find copyedits in Wikipedia articles yields genuine errors and not too many false positives (highlighted errors that are in fact correct).
API for LanguageTool
In order to investigate LanguageTool in more detail, we set up our own instance to be used via an API.
Endpoint on cloud-vps
We set up a remote server running our own instance of LanguageTool on cloud-vps.
We can then query LanguageTool in the following way:
- Directly: https://copyedits.wmcloud.org/v2/check?language=en&text=my+text
- In a Jupyter notebook: https://gitlab.wikimedia.org/repos/research/copyedit/-/blob/main/example_LanguageTool.ipynb
More documentation is available at: https://github.com/wikimedia/research-api-endpoint-template/tree/language-tool
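The endpoint speaks the standard LanguageTool v2 check protocol, so it can also be queried directly from Python. The following is a minimal sketch; the helper names are ours, the response parsing follows LanguageTool's documented JSON format, and the sample response is hardcoded so the snippet runs without network access:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://copyedits.wmcloud.org/v2/check"

def check_text(text, language="en", api_url=API_URL):
    """POST a text to a LanguageTool v2 endpoint and return the parsed JSON."""
    data = urllib.parse.urlencode({"language": language, "text": text}).encode()
    with urllib.request.urlopen(urllib.request.Request(api_url, data=data)) as resp:
        return json.load(resp)

def summarize_matches(response):
    """Reduce a LanguageTool response to (offset, length, message, first suggestion)."""
    out = []
    for m in response.get("matches", []):
        suggestion = m["replacements"][0]["value"] if m["replacements"] else None
        out.append((m["offset"], m["length"], m["message"], suggestion))
    return out

# Example response in the LanguageTool v2 JSON format (hardcoded stand-in
# for the output of check_text):
sample = {"matches": [{"offset": 5, "length": 4,
                       "message": "Possible spelling mistake found.",
                       "replacements": [{"value": "text"}]}]}
print(summarize_matches(sample))
```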
Frontend on toolforge
We also built an experimental API to run LanguageTool on Wikipedia articles. The tool automates some of the pre- and post-processing:
- it extracts the plain text of an article. Using the HTML version, we keep track of the HTML tags encoding whether a piece of text corresponds to, e.g., a link, a quote, or a reference
- it runs LanguageTool on the extracted plain text using the endpoint on cloud-vps
- it allows for filtering the copyedits based on heuristics. For example, we filter errors related to the anchor text of links.
The tool can be queried by specifying the language (e.g. "en") and the title of an article in the corresponding Wikipedia. Some example queries in different languages:
- https://copyedit.toolforge.org/api/v1/lt?lang=simple&title=Sodium
- https://copyedit.toolforge.org/api/v1/lt?lang=de&title=Panama_Hotel_(Seattle)
The supported languages are: ar, ast, be, br, ca, da, de, el, en, eo, es, fa, fr, ga, gl, it, ja, km, nl, pl, pt, ro, ru, simple, sk, sl, sv, ta, tl, uk, zh. These correspond to the Wikipedia projects for which there is a supported language in LanguageTool. We always use the generic language code without specifying a variant (e.g. “en” instead of “en-US”). For Simple English Wikipedia (simplewiki) we use LanguageTool with “en”.
More documentation is available at: https://gitlab.wikimedia.org/repos/research/copyedit-api
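The language handling described above can be sketched as a small helper (the function name is illustrative, not part of the actual tool):

```python
# Wikipedia projects with a supported LanguageTool language (the list above).
SUPPORTED = {"ar", "ast", "be", "br", "ca", "da", "de", "el", "en", "eo", "es",
             "fa", "fr", "ga", "gl", "it", "ja", "km", "nl", "pl", "pt", "ro",
             "ru", "simple", "sk", "sl", "sv", "ta", "tl", "uk", "zh"}

def lt_language(wiki_lang):
    """Map a Wikipedia language code to the LanguageTool language to query.

    Simple English Wikipedia is checked as generic English; variants such
    as "en-US" are never used.
    """
    if wiki_lang not in SUPPORTED:
        raise ValueError(f"no LanguageTool support for {wiki_lang!r}")
    return "en" if wiki_lang == "simple" else wiki_lang

print(lt_language("simple"))  # -> en
print(lt_language("de"))      # -> de
```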
Evaluation of LanguageTool
In order to evaluate the performance of LanguageTool in detecting errors, we need an annotated dataset with ground-truth errors. Comparing the predicted errors with the true errors, we can calculate performance metrics such as precision and recall, based on true positives (predicted errors that are genuine), false positives (predicted errors that are not genuine), and false negatives (genuine errors that were not predicted).
The main limitation is that such ground-truth datasets are extremely rare, even more so beyond English or for Wikipedia articles.
Benchmark corpus
One starting point is the NLP task of grammatical error correction, i.e. “the task of correcting different kinds of errors in text such as spelling, punctuation, grammatical, and word choice errors.” In the past, different benchmark datasets with ground-truth errors have been compiled to systematically compare approaches to grammatical error correction. However, most of these resources are only available for English.
We evaluate LanguageTool on the W&I benchmark data of the BEA-2019 shared task using ERRANT. W&I (Write & Improve) is an online platform that assists non-native English students with their writing. Specifically, students from around the world submit letters, stories, articles and essays in response to various prompts, and the W&I system provides instant feedback. Since W&I went live in 2014, W&I annotators have manually annotated some of these submissions and assigned them a CEFR level. Thus, we have annotated errors at three different levels: A (beginner), B (intermediate), C (advanced). My interpretation is that these levels contain errors of increasing complexity.
We then compare the errors found by LanguageTool with the ground-truth errors in the benchmark data. For LanguageTool, we use the language variants “en” and “en-US”. We evaluate both error detection (detection only) and error correction (detection + suggested improvement).
Error detection:

data | #sents | LT-lang | #TP | #FP | #FN | Prec. | Rec. | F0.5 |
---|---|---|---|---|---|---|---|---|
A.train | 10,880 | en | 2,338 | 2,045 | 26,734 | 0.5334 | 0.0804 | 0.2508 |
A.train | 10,880 | en-US | 4,108 | 3,200 | 24,964 | 0.5621 | 0.1413 | 0.3523 |
B.train | 13,202 | en | 1,363 | 1,954 | 22,854 | 0.4109 | 0.0563 | 0.1818 |
B.train | 13,202 | en-US | 2,586 | 3,335 | 21,631 | 0.4368 | 0.1068 | 0.2699 |
C.train | 10,667 | en | 516 | 1,362 | 9,140 | 0.2748 | 0.0534 | 0.1503 |
C.train | 10,667 | en-US | 924 | 2,436 | 8,732 | 0.2750 | 0.0957 | 0.2000 |
Error correction:

data | #sents | LT-lang | #TP | #FP | #FN | Prec. | Rec. | F0.5 |
---|---|---|---|---|---|---|---|---|
A.train | 10,880 | en | 1,898 | 2,481 | 26,264 | 0.4334 | 0.0674 | 0.2078 |
A.train | 10,880 | en-US | 2,873 | 4,431 | 25,289 | 0.3933 | 0.1020 | 0.2504 |
B.train | 13,202 | en | 1,175 | 2,136 | 22,490 | 0.3549 | 0.0497 | 0.1592 |
B.train | 13,202 | en-US | 1,911 | 4,004 | 21,754 | 0.3231 | 0.0808 | 0.2019 |
C.train | 10,667 | en | 461 | 1,415 | 9,017 | 0.2457 | 0.0486 | 0.1357 |
C.train | 10,667 | en-US | 739 | 2,619 | 8,739 | 0.2201 | 0.0780 | 0.1613 |
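The Prec., Rec. and F0.5 columns follow the standard definitions used in grammatical error correction evaluation. As a sanity check, recomputing them from the raw counts of the A.train/en detection row reproduces the tabulated values:

```python
def gec_metrics(tp, fp, fn, beta=0.5):
    """Precision, recall and F_beta from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f_beta

# A.train, LT-lang "en", error detection (first table above).
p, r, f = gec_metrics(tp=2338, fp=2045, fn=26734)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.5334 0.0804 0.2508
```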
Summary:
- Error detection yields a precision of around 55% on the easy corpus (A.train). The difference between language variants is small (53% for en and 56% for en-US)
- Error detection yields a recall between 8% (en) and 14% (en-US). The en-US variant is more sensitive, capturing more errors. This means that LanguageTool misses a large fraction of the errors; in absolute numbers, though, it still detects thousands of errors.
- The number of correctly detected errors decreases for medium (B.train) and hard (C.train) corpora.
- Error correction is a much harder problem; however, precision is still around 40% for the easy corpus (A.train)
Wikipedia (English)
We would like to understand how the results from the benchmark corpora generalize when applied to Wikipedia. However, evaluating LanguageTool on Wikipedia articles is more challenging: we don't have a ground-truth dataset of even a few articles with a complete annotation of all grammatical errors, so we cannot simply repeat the analysis from above.
Therefore, we use an approximation based on annotations at the article level (instead of at the level of each single error):
- Featured articles are considered to be some of the best articles Wikipedia has to offer. As a rough approximation, we assume that these articles are free of errors (as judged by Wikipedia’s editors); thus, we consider any error found here a false positive. For enwiki, we find 6,090 featured articles with 1,192,369 sentences.
- Articles with a copyedit-template. Wikipedia’s editors add this template at the top of an article to indicate that it “may require copy editing for grammar, style, cohesion, tone, or spelling.” We assume that these articles are more likely to contain errors. For enwiki, we find 1,024 articles with the copyedit-template, comprising 104,403 sentences.
Running LanguageTool, we get the following statistics:
- featured_en: 0.06 errors per sentence
- featured_en-US: 0.792 errors per sentence
- copyedit-template_en: 0.126 errors per sentence
Summary:
- How many false positives are there?
- Using LanguageTool with the language variant “en-US” on featured articles produces an extremely high number of false positives in Wikipedia articles. On average, almost every sentence yields a false positive. This is consistent with qualitative observations when using LanguageTool’s browser interface; in fact, the default language variant in the browser version is “en-US”.
- Using the language variant “en” on featured articles reduces the occurrence of false positives more than 10-fold, to only about 1 false positive every 15 sentences.
- How precise are errors highlighted by LanguageTool?
- Using the language variant “en” on copyedit-template articles, we find a higher error rate (0.126 per sentence) than for featured articles (0.06 per sentence). Assuming that the errors in featured articles reflect a baseline rate of false positives present in all articles, we can approximate the precision by subtracting this baseline: 0.126 - 0.06 = 0.066 genuine errors per sentence, corresponding to a precision of 0.066/0.126 = 0.524.
- This value for the precision is consistent with the findings on the benchmark corpora (52% vs. 53%)
- This precision estimate is likely a lower bound (in reality it is higher), since we assume that all errors found in featured articles are false positives when, in fact, some of them might be genuine.
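The baseline-subtraction estimate used above can be written out as a small helper (numbers from the enwiki measurements above; the function name is ours):

```python
def approx_precision(featured_rate, template_rate):
    """Approximate precision by treating the error rate in featured articles as
    the baseline false-positive rate shared by all articles: the genuine errors
    in copyedit-template articles are whatever exceeds that baseline."""
    genuine_rate = template_rate - featured_rate
    return genuine_rate / template_rate

# enwiki with language variant "en": 0.06 errors/sentence in featured articles,
# 0.126 in copyedit-template articles.
print(round(approx_precision(0.06, 0.126), 3))  # 0.524
```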
Error types
We can also look at the types of errors LanguageTool detects in Wikipedia articles.
- What sticks out is the large fraction of “misspelling” errors in featured articles when using “en-US”. One interpretation is that the “misspelling” rule is a main driver of false positives in Wikipedia articles.
Wikipedia (non-English)
We compare the error rate from LanguageTool in articles with the featured article badge (Q17437796) against articles containing the copyedit-template (Q6292692) in the corresponding language.
wiki_db | language-code | featured_n-art | featured_n-sent | featured_n-err | template_n-art | template_n-sent | template_n-err | featured_err-per-sent | template_err-per-sent | prec |
---|---|---|---|---|---|---|---|---|---|---|
enwiki | en | 6090 | 1192321 | 71574 | 1024 | 104403 | 13197 | 0.06 | 0.126 | 0.525 |
simplewiki | en | 30 | 4926 | 286 | 15 | 415 | 66 | 0.058 | 0.159 | 0.635 |
arwiki | ar | 692 | 154990 | 310459 | 512 | 22594 | 58990 | 2.003 | 2.611 | 0.233 |
astwiki | ast | 325 | 71918 | 324711 | 868 | 38430 | 169848 | 4.515 | 4.42 | 0 |
bewiki | be | 88 | 38043 | 65063 | 675 | 36571 | 61446 | 1.71 | 1.68 | 0 |
brwiki | br | 2 | 223 | 448 | 0 | 0 | 0 | 2.009 | - | - |
cawiki | ca | 764 | 145185 | 155363 | 8 | 397 | 538 | 1.07 | 1.355 | 0.21 |
dawiki | da | 17 | 5967 | 5077 | 14 | 819 | 676 | 0.851 | 0.825 | 0 |
dewiki | de | 2730 | 935452 | 102807 | 0 | 0 | 0 | 0.11 | - | - |
elwiki | el | 129 | 30611 | 44224 | 0 | 0 | 0 | 1.445 | - | - |
eowiki | eo | 311 | 70371 | 136043 | 0 | 0 | 0 | 1.933 | - | - |
eswiki | es | 1235 | 350673 | 425496 | 1547 | 99005 | 159589 | 1.213 | 1.612 | 0.247 |
fawiki | fa | 198 | 53013 | 3358 | 16 | 1024 | 141 | 0.063 | 0.138 | 0.54 |
frwiki | fr | 2019 | 679560 | 749826 | 0 | 0 | 0 | 1.103 | - | - |
gawiki | ga | 2 | 509 | 1433 | 0 | 0 | 0 | 2.815 | - | - |
glwiki | gl | 218 | 59451 | 112419 | 209 | 11371 | 21333 | 1.891 | 1.876 | 0 |
itwiki | it | 536 | 124571 | 207444 | 720 | 53209 | 106962 | 1.665 | 2.01 | 0.172 |
jawiki | ja | 92 | 30542 | 399 | 0 | 0 | 0 | 0.013 | - | - |
kmwiki | km | 21 | 930 | 28741 | 6 | 53 | 1506 | 30.904 | 28.415 | 0 |
nlwiki | nl | 365 | 115060 | 87518 | 0 | 0 | 0 | 0.761 | - | - |
plwiki | pl | 944 | 268900 | 220568 | 1 | 22 | 27 | 0.82 | 1.227 | 0.332 |
ptwiki | pt | 1315 | 328326 | 190550 | 1346 | 43865 | 32652 | 0.58 | 0.744 | 0.22 |
rowiki | ro | 196 | 58467 | 70636 | 256 | 15335 | 28486 | 1.208 | 1.858 | 0.35 |
ruwiki | ru | 1627 | 651035 | 480016 | 11 | 1157 | 304 | 0.737 | 0.263 | 0 |
skwiki | sk | 73 | 20324 | 24104 | 0 | 0 | 0 | 1.186 | - | - |
slwiki | sl | 381 | 86241 | 123624 | 212 | 11198 | 16979 | 1.433 | 1.516 | 0.055 |
svwiki | sv | 354 | 79217 | 81458 | 278 | 11898 | 16231 | 1.028 | 1.364 | 0.246 |
tawiki | ta | 14 | 3160 | 787 | 2 | 179 | 161 | 0.249 | 0.899 | 0.723 |
tlwiki | tl | 29 | 3274 | 13667 | 3 | 155 | 930 | 4.174 | 6 | 0.304 |
ukwiki | uk | 233 | 68208 | 40630 | 883 | 64783 | 42342 | 0.596 | 0.654 | 0.089 |
zhwiki | zh | 929 | 154844 | 15396 | 1761 | 57344 | 7259 | 0.099 | 0.127 | 0.215 |
Summary:
- The error rates in enwiki and simplewiki are consistent.
- In most languages, the error rate in articles with the copyedit-template is indeed higher than in featured articles.
- However, for most languages the precision is below 0.5, and for some languages the error rate in featured articles is similar to that in articles with the copyedit-template.
Takeaway:
- These results suggest adding a post-processing step in which we filter some errors, e.g. of certain types (such as spelling) or in certain text regions (such as the anchor text of links).
Filtering errors
We can now use additional post-processing to filter certain errors, and use the evaluation protocol above to assess whether this strategy improves precision. The goal is a filter that removes errors that are false positives (those in featured articles) while keeping as many genuine ones as possible (those in articles with copyedit-templates).
As a first naive attempt, we use the annotations of the text contained in the HTML of the article:
- we keep track of all substrings that carry any annotation (such as bold, italics, or a link)
- we filter an error from LanguageTool if: i) the position of the error overlaps with any of the annotated substrings; or ii) the string of the error matches any of the annotated substrings (even one appearing elsewhere in the article).
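A sketch of such an overlap-based filter; the representation of errors and annotated spans here is illustrative, not the tool's actual data structure:

```python
def should_filter(error_offset, error_length, error_text, annotated_spans):
    """Filter a LanguageTool error if it overlaps an annotated (bold, italic,
    link, ...) region of the plain text, or if its surface string equals one
    of the annotated substrings elsewhere in the article.

    `annotated_spans` is a list of (start, end, substring) tuples.
    """
    err_start, err_end = error_offset, error_offset + error_length
    for start, end, substring in annotated_spans:
        if err_start < end and start < err_end:  # position overlap
            return True
        if error_text == substring:              # surface-string match
            return True
    return False

# Two annotated spans: an italicized Latin name and a linked place name.
spans = [(0, 11, "Vigiliensis"), (30, 39, "Bisceglie")]
print(should_filter(2, 5, "iliens", spans))      # overlaps first span -> True
print(should_filter(100, 8, "Barletta", spans))  # no overlap, no match -> False
```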
wiki_db | language-code | featured_err-per-sent | featured_err-per-sent-filter | template_err-per-sent | template_err-per-sent-filter | prec | prec-filter | prec-change-ppt |
---|---|---|---|---|---|---|---|---|
enwiki | en | 0.06 | 0.037 | 0.126 | 0.072 | 0.525 | 0.489 | -0.036 |
simplewiki | en | 0.058 | 0.037 | 0.159 | 0.133 | 0.635 | 0.724 | 0.089 |
arwiki | ar | 2.003 | 1.01 | 2.611 | 1.785 | 0.233 | 0.434 | 0.201 |
astwiki | ast | 4.515 | 1.572 | 4.42 | 1.794 | 0 | 0.124 | 0.124 |
bewiki | be | 1.71 | 0.557 | 1.68 | 1.039 | 0 | 0.464 | 0.464 |
brwiki | br | 2.009 | 0.377 | - | - | - | - | - |
cawiki | ca | 1.07 | 0.291 | 1.355 | 0.554 | 0.21 | 0.475 | 0.265 |
dawiki | da | 0.851 | 0.242 | 0.825 | 0.336 | 0 | 0.28 | 0.28 |
dewiki | de | 0.11 | 0.042 | - | - | - | - | - |
elwiki | el | 1.445 | 0.564 | - | - | - | - | - |
eowiki | eo | 1.933 | 0.647 | - | - | - | - | - |
eswiki | es | 1.213 | 0.243 | 1.612 | 0.624 | 0.247 | 0.611 | 0.363 |
fawiki | fa | 0.063 | 0.014 | 0.138 | 0.057 | 0.54 | 0.757 | 0.217 |
frwiki | fr | 1.103 | 0.196 | - | - | - | - | - |
gawiki | ga | 2.815 | 1.525 | - | - | - | - | - |
glwiki | gl | 1.891 | 0.43 | 1.876 | 0.865 | 0 | 0.503 | 0.503 |
itwiki | it | 1.665 | 0.421 | 2.01 | 0.916 | 0.172 | 0.541 | 0.369 |
jawiki | ja | 0.013 | 0.012 | - | - | - | - | - |
kmwiki | km | 30.904 | 16.081 | 28.415 | 18.528 | 0 | 0.132 | 0.132 |
nlwiki | nl | 0.761 | 0.257 | - | - | - | - | - |
plwiki | pl | 0.82 | 0.284 | 1.227 | 0.591 | 0.332 | 0.519 | 0.187 |
ptwiki | pt | 0.58 | 0.314 | 0.744 | 0.461 | 0.22 | 0.318 | 0.098 |
rowiki | ro | 1.208 | 0.287 | 1.858 | 1.127 | 0.35 | 0.745 | 0.396 |
ruwiki | ru | 0.737 | 0.31 | 0.263 | 0.156 | 0 | 0 | 0 |
skwiki | sk | 1.186 | 0.511 | - | - | - | - | - |
slwiki | sl | 1.433 | 0.536 | 1.516 | 0.97 | 0.055 | 0.447 | 0.393 |
svwiki | sv | 1.028 | 0.293 | 1.364 | 0.487 | 0.246 | 0.398 | 0.152 |
tawiki | ta | 0.249 | 0.168 | 0.899 | 0.782 | 0.723 | 0.785 | 0.062 |
tlwiki | tl | 4.174 | 2.009 | 6 | 2.142 | 0.304 | 0.062 | -0.242 |
ukwiki | uk | 0.596 | 0.248 | 0.654 | 0.388 | 0.089 | 0.362 | 0.274 |
zhwiki | zh | 0.099 | 0.062 | 0.127 | 0.097 | 0.215 | 0.355 | 0.141 |
Summary:
- The post-processing step of filtering errors substantially improves the precision in almost all wikis (i.e. it filters relatively more errors in the featured articles than in the copyedit-template articles)
- Other more nuanced filters could lead to further improvements in the precision of LanguageTool.
Takeaways from the evaluation
- We can apply LanguageTool to at least 30 Wikipedias by running our own instance. Checking the text of Wikipedia articles requires some pre-processing of the text (e.g. to identify only the raw text and avoid content transcluded from templates) and post-processing to filter some errors (e.g. to avoid correcting the anchor text of links)
- LanguageTool can detect a high volume of copyedit errors beyond simple misspellings based on a dictionary-lookup.
- We estimate the precision of the errors surfaced by LanguageTool for English to be around 50% (or higher)
- The concern about a large number of false positives can be mitigated by using the generic language-variant (e.g. “en” instead of “en-US”).
- Applying filters to the detected errors can substantially improve the precision of LanguageTool in almost all wikis.
Comparison to spell-checker
In this section, we look at how spellcheckers perform on the same tasks as above. This gives us a sense of how LanguageTool compares to much simpler spellchecking tools. Specifically, I used the Enchant spell-checking library, which provides uniform access to spellcheckers for different languages via Python. Of the projects considered in the evaluation of LanguageTool, I readily found spellcheckers for the following subset via `enchant.list_languages()` (though many more languages can be installed): enwiki (en_US), simplewiki (en_US), arwiki (ar), cawiki (ca), dewiki (de_DE), elwiki (el), eswiki (es), fawiki (fa), frwiki (fr), glwiki (gl_ES), itwiki (it_IT), nlwiki (nl), plwiki (pl), ptwiki (pt_BR), rowiki (ro), ruwiki (ru_RU), svwiki (sv), ukwiki (uk).
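The error rates reported below are simply the number of flagged tokens divided by the number of sentences. A sketch of that computation, using a toy word set in place of a real dictionary (the actual evaluation calls pyenchant's `Dict.check(word)` instead):

```python
import re

# Toy stand-in for enchant.Dict("en_US"); the real evaluation uses pyenchant's
# Dict.check(word) instead of this word set.
KNOWN_WORDS = {"the", "diocese", "of", "was", "a", "roman", "catholic",
               "located", "in", "town", "on", "sea"}

def check(word):
    return word.lower() in KNOWN_WORDS

def errors_per_sentence(sentences):
    """Flagged tokens per sentence, the statistic reported in the tables below."""
    n_errors = sum(1 for sent in sentences
                   for token in re.findall(r"[^\W\d_]+", sent)
                   if not check(token))
    return n_errors / len(sentences)

sents = ["The diocese was located in the town.",
         "The dioces was locatd on the sea."]  # second sentence: 2 misspellings
print(errors_per_sentence(sents))  # 1.0
```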
Benchmark corpus
Error detection:

data | #sents | lang | #TP | #FP | #FN | Prec. | Rec. | F0.5 |
---|---|---|---|---|---|---|---|---|
A.train | 10,880 | en_GB | 1,878 | 1,471 | 27,194 | 0.5608 | 0.0646 | 0.2211 |
A.train | 10,880 | en_US | 1,925 | 1,720 | 27,147 | 0.5281 | 0.0662 | 0.2205 |
B.train | 13,202 | en_GB | 1,249 | 1,764 | 22,968 | 0.4145 | 0.0516 | 0.1722 |
B.train | 13,202 | en_US | 1,312 | 2,119 | 22,905 | 0.3824 | 0.0542 | 0.1729 |
C.train | 10,667 | en_GB | 423 | 1,288 | 9,233 | 0.2472 | 0.0438 | 0.1282 |
C.train | 10,667 | en_US | 460 | 1,692 | 9,196 | 0.2138 | 0.0476 | 0.1259 |
Error correction:

data | #sents | lang | #TP | #FP | #FN | Prec. | Rec. | F0.5 |
---|---|---|---|---|---|---|---|---|
A.train | 10,880 | en_GB | 872 | 2,477 | 27,290 | 0.2604 | 0.0310 | 0.1049 |
A.train | 10,880 | en_US | 897 | 2,748 | 27,265 | 0.2461 | 0.0319 | 0.1049 |
B.train | 13,202 | en_GB | 683 | 2,330 | 22,982 | 0.2267 | 0.0289 | 0.0956 |
B.train | 13,202 | en_US | 685 | 2,746 | 22,980 | 0.1997 | 0.0289 | 0.0916 |
C.train | 10,667 | en_GB | 271 | 1,440 | 9,207 | 0.1584 | 0.0286 | 0.0830 |
C.train | 10,667 | en_US | 276 | 1,876 | 9,202 | 0.1283 | 0.0291 | 0.0763 |
Summary:
- There is little difference between using the en_US and the en_GB spellchecker
- The performance in error detection is comparable to that of LanguageTool
- The performance in error correction is only about half as good as that of LanguageTool (both in terms of precision and recall)
Wikipedia
wiki_db | language-code | featured_n-art | featured_n-sent | featured_n-err | template_n-art | template_n-sent | template_n-err | featured_err-per-sent | template_err-per-sent | prec |
---|---|---|---|---|---|---|---|---|---|---|
enwiki | en_US | 6090 | 1235144 | 1221727 | 1024 | 108060 | 148391 | 0.989 | 1.373 | 0.280 |
simplewiki | en_US | 30 | 5045 | 1714 | 15 | 435 | 675 | 0.340 | 1.552 | 0.781 |
arwiki | ar | 692 | 173033 | 1593977 | 512 | 22594 | 136618 | 9.212 | 6.047 | 0.000 |
cawiki | ca | 764 | 145185 | 182387 | 8 | 397 | 535 | 1.256 | 1.348 | 0.068 |
dewiki | de_DE | 2730 | 935452 | 1390710 | 0 | 0 | 0 | 1.487 | - | - |
elwiki | el | 129 | 30611 | 45876 | 0 | 0 | 0 | 1.499 | - | - |
eswiki | es | 1235 | 350673 | 721562 | 1547 | 99005 | 238857 | 2.058 | 2.413 | 0.147 |
fawiki | fa | 198 | 53013 | 205747 | 16 | 1024 | 3370 | 3.881 | 3.291 | 0.000 |
frwiki | fr | 2019 | 679560 | 2332701 | 0 | 0 | 0 | 3.433 | - | - |
glwiki | gl_ES | 218 | 59451 | 57637 | 209 | 11371 | 11756 | 0.969 | 1.034 | 0.062 |
itwiki | it_IT | 536 | 124571 | 220426 | 720 | 53209 | 117756 | 1.769 | 2.213 | 0.200 |
nlwiki | nl | 365 | 115060 | 109633 | 0 | 0 | 0 | 0.953 | - | - |
plwiki | pl | 944 | 268900 | 198679 | 1 | 22 | 29 | 0.739 | 1.318 | 0.439 |
ptwiki | pt_BR | 1315 | 328326 | 433376 | 1346 | 43865 | 85232 | 1.320 | 1.943 | 0.321 |
rowiki | ro | 196 | 58467 | 70200 | 256 | 15335 | 27174 | 1.201 | 1.772 | 0.322 |
ruwiki | ru_RU | 1627 | 651035 | 1069748 | 11 | 1157 | 845 | 1.643 | 0.730 | 0.000 |
svwiki | sv | 354 | 79217 | 146754 | 278 | 11898 | 26267 | 1.853 | 2.208 | 0.161 |
ukwiki | uk | 233 | 68208 | 75141 | 883 | 64783 | 72344 | 1.102 | 1.117 | 0.013 |
Summary:
- For featured articles, the error rate from the spellcheckers is much higher than that from LanguageTool (e.g. in enwiki we find about 1 error per sentence, compared to 0.06 errors per sentence when using LanguageTool). In general, this translates into lower precision for spellcheckers.
Filtering errors
wiki_db | language-code | featured_err-per-sent | featured_err-per-sent-filter | template_err-per-sent | template_err-per-sent-filter | prec | prec-filter | prec-change-ppt |
---|---|---|---|---|---|---|---|---|
enwiki | en_US | 0.989 | 0.253 | 1.373 | 0.521 | 0.280 | 0.514 | 0.235 |
simplewiki | en_US | 0.340 | 0.079 | 1.552 | 0.563 | 0.781 | 0.859 | 0.078 |
arwiki | ar | 9.212 | 6.703 | 6.047 | 5.007 | 0.000 | 0.000 | 0.000 |
cawiki | ca | 1.256 | 0.268 | 1.348 | 0.411 | 0.068 | 0.348 | 0.280 |
dewiki | de_DE | 1.487 | 0.428 | - | - | - | - | - |
elwiki | el | 1.499 | 0.600 | - | - | - | - | - |
eswiki | es | 2.058 | 0.303 | 2.413 | 0.779 | 0.147 | 0.611 | 0.464 |
fawiki | fa | 3.881 | 1.735 | 3.291 | 2.102 | 0.000 | 0.174 | 0.174 |
frwiki | fr | 3.433 | 0.680 | - | - | - | - | - |
glwiki | gl_ES | 0.969 | 0.182 | 1.034 | 0.487 | 0.062 | 0.627 | 0.565 |
itwiki | it_IT | 1.769 | 0.378 | 2.213 | 0.929 | 0.200 | 0.593 | 0.393 |
nlwiki | nl | 0.953 | 0.201 | - | - | - | - | - |
plwiki | pl | 0.739 | 0.222 | 1.318 | 0.591 | 0.439 | 0.624 | 0.185 |
ptwiki | pt_BR | 1.320 | 0.243 | 1.943 | 0.618 | 0.321 | 0.608 | 0.287 |
rowiki | ro | 1.201 | 0.257 | 1.772 | 1.059 | 0.322 | 0.757 | 0.435 |
ruwiki | ru_RU | 1.643 | 0.465 | 0.730 | 0.303 | 0.000 | 0.000 | 0.000 |
svwiki | sv | 1.853 | 0.582 | 2.208 | 0.801 | 0.161 | 0.273 | 0.112 |
ukwiki | uk | 1.102 | 0.363 | 1.117 | 0.620 | 0.013 | 0.415 | 0.402 |
Summary:
- The filtering of errors substantially increases the precision of spellcheckers. The resulting approximate precision after filtering is comparable to that of LanguageTool but still remains systematically lower.
Takeaways
- Spellcheckers can also detect and surface many meaningful errors for copyediting
- Spellcheckers seem to suffer from a much higher rate of false positives than LanguageTool. This can be partially addressed by imposing aggressive post-processing filters on the surfaced errors.
- LanguageTool has a clear advantage in suggesting the correct improvement (spellcheckers perform substantially worse in error correction)
- Since spellcheckers are available in many languages, they can serve as a backup solution for languages which are not supported by LanguageTool.
- Given the higher rate of false positives when using spellcheckers, it would be desirable to develop a model that can assign a confidence score to the surfaced errors so that the structured task could prioritize surfacing those errors which were assigned a high confidence score.