Learning patterns/Proofreading large amounts of text
A learning pattern for wiki design
Proofreading large amounts of text
Problem: Proofreading large amounts of text is a very demanding task. The process should be planned intelligently and operate in an "always learning" mode.
Solution: As early as possible, classify the errors you want to correct into different types according to the kind of supervision they need.
Created on: 10:50, 5 June 2016 (UTC)
What problem does this solve?
The Catalan Wikipedia needed a thorough linguistic review (spelling and grammar), which was a daunting task. With the help of the proofreading software LanguageTool, we have made some progress. The most important lesson we learned along the way may seem trivial, but the more strictly you follow it, the better the results.
What is the solution?
The errors should be classified according to the kind of supervision they need:
- Errors that can always be corrected automatically. You must be absolutely sure that applying the correction is always safe. This means leaving untouched any words in other languages (for Catalan, we have to take special care with words in Spanish, Portuguese, French and Italian) and any non-standard language (old or dialectal).
- Errors that need supervision. Looking at a few words around the error is enough to know whether the correction is appropriate.
- Errors that need very careful supervision. You probably need to read the whole paragraph, or even the whole article. An example in Catalan is "hivernar/hibernar".
Moreover, some simple errors can be found directly in the online Wikipedia, but errors that need a full morphosyntactic analysis have to be found in the Wikipedia dump.
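The three supervision levels above can be sketched in a few lines of Python. This is only an illustration of the triage idea, not the scripts actually used for the Catalan Wikipedia and not the LanguageTool API: the names `Supervision`, `Rule`, `RULES` and `triage` are hypothetical, and the first two rules are language-neutral placeholders. Only the "hibernar" rule is based on the example given above.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Supervision(Enum):
    AUTOMATIC = 1  # always safe to apply without review
    CONTEXT = 2    # a few surrounding words are enough to decide
    CAREFUL = 3    # the whole paragraph or article must be read


@dataclass
class Rule:
    pattern: str       # regular expression that detects a candidate error
    replacement: str   # suggested correction
    level: Supervision


# Hypothetical rules, for illustration only.
RULES = [
    Rule(r" {2,}", " ", Supervision.AUTOMATIC),         # collapse repeated spaces
    Rule(r"\b(\w+) \1\b", r"\1", Supervision.CONTEXT),  # repeated word
    Rule(r"\bhibernar\b", "hivernar", Supervision.CAREFUL),
]


def triage(text, rules=RULES, window=20):
    """Apply automatic rules and queue every other match for human review."""
    # Automatic fixes go first, so that reviewers see already-cleaned text.
    for rule in rules:
        if rule.level is Supervision.AUTOMATIC:
            text = re.sub(rule.pattern, rule.replacement, text)

    review_queue = []
    for rule in rules:
        if rule.level is Supervision.AUTOMATIC:
            continue
        for match in re.finditer(rule.pattern, text):
            if rule.level is Supervision.CONTEXT:
                # A short window of characters around the match is enough.
                start = max(0, match.start() - window)
                context = text[start:match.end() + window]
            else:
                # Careful supervision: hand over the whole text, which here
                # stands in for the paragraph or article.
                context = text
            suggestion = match.expand(rule.replacement)
            review_queue.append((rule.level, match.group(0), suggestion, context))
    return text, review_queue
```

Calling `triage(paragraph)` returns the text with only the automatic corrections applied, plus a queue of `(level, found, suggestion, context)` tuples for a human to accept or reject. Keeping the supervision level on each rule makes it cheap to move a rule to a stricter level as soon as a false positive shows up, which fits the "always learning" mode described above.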
Things to consider
When to use
- IEG grant: Proofreading semiautomatically the Catalan Wikipedia with LanguageTool
- Some scripts (with documentation) used for proofreading the Catalan Wikipedia.