Meta:Requests for comment/Temporarily stop MinT translation
Statement of the issue
Recently, the newly introduced tool "MinT" has caused a lot of problems in the translation of en-ja. The quality of its translations is extremely poor, even compared with common machine translation tools. What's worse is that this tool is available to any registered user and can be used to spam translations that are difficult to remove. After preliminary testing, a Japanese native speaker @Omotecho: and I both discovered this problem. We hope to initiate this discussion and request relevant engineers to pay attention to this problem and suspend the use of this tool as soon as possible.--Lemonaka (talk) 07:13, 6 October 2023 (UTC)
- Allow me to put the RfC into Japanese:
- コメント募集:
- 最近、ツール「MinT」が新しく導入され、英日翻訳に多くの問題が発生しています。その翻訳の品質は一般的な機械翻訳ツールと比較しても非常に悪いです。さらに悪いことに、このツールは登録ユーザーであれば誰でも利用でき、削除が難しい訳文のスパム送信に使用される可能性があります。予備テストの後、日本語ネイティブ話者 Omotecho と私はこの問題を発見しました。私たちはこの議論を開始し、関係する技術者各位にこの問題に注意を払い、このツールの使用をできるだけ早急に中止するよう要請したいと考えます。(訳文文責 --Omotecho (talk) 13:50, 7 October 2023 (UTC))
- FYI, the system composing MinT is noted here per Diff, dated 13 June 2023 by Pau Giner. Omotecho (talk) 13:47, 13 October 2023 (UTC)
Comments
- Support To better understand the relevance of this problem, I think it is preferable to give some figures and show how deep the impact is on English to Japanese translation. Your request seems based on best translation practices and common sense; still, a more detailed report might be of use for further decisions about this matter. Thank you. --M/ (talk) 18:56, 6 October 2023 (UTC)
- Special:Diff/25705731/25704876 an example for that. Lemonaka (talk) 23:42, 6 October 2023 (UTC)
- Support Original translation segment reads (en)[1]:
- This page has been moved to "$1" on [[$2|Wikimedia Foundation Governance Wiki]] where you can provide translations and feedback on it.
- MinT suggests:
- このページは,翻訳と反省を提供できる [[$2のウィキメディア財団統治ウィキ]]の"$1"に移動されました. (ja)
- ja language doesn't normally apply punctuation marks as "," or ".", but either "、" or "。";
- The translation engine picks up the part "where you can provide translations and feedback on it" (en) and places it before the subject [[$2|Wikimedia Foundation Governance Wiki]] in ja language, which destroys the original meaning;
- in this sentence, "can" is not talking about ability but a soft nudge to do something, a weak imperative form (命令形);
- anyhow, I admit a human translator often renders "can" as "you are able to" in this context, too, which is not correct either;
- nor does sample (a) (translate.google.com) output it correctly (see bottom of this comment);
- MinT takes the term "feedback" (en) and outputs "反省" (ja), which overrides the translation database that would give you the transliteration of the en term, "フィードバック" (ja), an established wiki term.[2]
- FYI,
- sample (a) (translate.google.com):
- このページは [[$2|Wikimedia Foundation Governance Wiki]] の "$1" に移動されており、ここで翻訳やフィードバックを提供できます。[3].
- sample (b) as my liberal translation:
- このページは[[$2のウィキメディア財団統治ウィキ]]の"$1"に移動されましたので、翻訳とフィードバックをお寄せください。
- +1 support --Omotecho (talk) 17:33, 7 October 2023 (UTC) / Omotecho (talk) 17:29, 7 October 2023 (UTC)
- Support I've had some pretty weird MinT suggestions in translating into French. Edward-Woodrow (talk) 19:11, 7 October 2023 (UTC)
- Support This translation tool doesn't even use Japanese punctuation, and still uses English half-width punctuation. Additionally, the tool sometimes does not retain the equal signs in the original text, preventing it from becoming a title. I think these problems that have been solved by other translation tools are very obvious. ——Rinna (Talk) 04:00, 8 October 2023 (UTC)
- While I share the frustration (and got MinT disabled for Cantonese/yue), this doesn't seem like an appropriate venue to garner support since this isn't an exclusively Meta-Wiki issue. h78c67c (talk) 07:04, 8 October 2023 (UTC)
- @H78c67c, hello, be assured that the issue is tracked in Phabricator: Task T348361, which is an open issue. Cheers, -- Omotecho (talk) 13:30, 8 October 2023 (UTC)
- @Omotecho: I sincerely apologise for the misunderstanding. Given your announcement on translators-l I was under the impression that the RFC seeks to disable MinT globally, since its low quality output has wider impact than just on Meta-Wiki. But yes, I do Support disabling it here. I'll also take the opportunity to express my disappointment towards the fact that MinT is being rolled out more broadly with little concern for its quality. h78c67c (talk) 05:54, 9 October 2023 (UTC)
- @H78c67c, thank you so much for your fairness in coming back and replying to me. As double-byte languages might share similar concerns, your input is highly valued: for example, half-width punctuation such as the period and comma, or, beyond grammatical mismatches, the way ja language tends to omit the repeated subject of a compound sentence, which would otherwise have to be supplied for the convenience of machine translation. Again, I appreciate your hard work on i18n efforts. Arigatow, -- Omotecho (talk) 06:25, 9 October 2023 (UTC)
- @Omotecho One of the more obvious issues that I have observed across Japanese and Chinese output from MinT is the punctuation, as you and others have mentioned. I haven't had the time to dig deeper into how the NLLB model was trained, but it seems like they might have "normalized" punctuation into half-width counterparts and irreversibly destroyed the information and nuances they carry. A quick look at their human evaluation dataset suggests that this is expected behaviour, which would be absurd if true. I see that @UOzurumba (WMF) has addressed punctuation issues with other languages below, but I don't see any straightforward fix for Japanese and Chinese, unlike the fix for Devanagari Danda produced by the indictrans2 model they linked to. h78c67c (talk) 07:11, 13 October 2023 (UTC)
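As an illustration of the punctuation issue discussed in this thread, here is a minimal Python sketch of a naive post-processing pass that swaps a half-width "," or "." for "、" or "。" when it directly follows a Japanese character. The function name `to_japanese_punctuation` and the character ranges are assumptions made for this example; it is not how MinT or the NLLB model handles punctuation, and it cannot restore nuances that were discarded before translation.

```python
import re

# Hiragana, katakana, and common CJK ideographs (a deliberately narrow,
# illustrative range; real text may need more blocks).
_JA_CHARS = r"\u3040-\u30ff\u4e00-\u9fff"

# Half-width mark -> Japanese counterpart.
_PUNCT = {",": "、", ".": "。"}

# A comma or period that directly follows a Japanese character,
# together with any ASCII spaces after it.
_AFTER_JA = re.compile(rf"(?<=[{_JA_CHARS}])([,.]) *")


def to_japanese_punctuation(text: str) -> str:
    """Replace half-width , and . that follow Japanese characters."""
    return _AFTER_JA.sub(lambda m: _PUNCT[m.group(1)], text)


if __name__ == "__main__":
    mint_output = (
        'このページは,翻訳と反省を提供できる '
        '[[$2のウィキメディア財団統治ウィキ]]の"$1"に移動されました.'
    )
    print(to_japanese_punctuation(mint_output))
```

Applied to the MinT suggestion quoted earlier, this turns "は," into "は、" and the final "." into "。". A period that follows a Latin-script word, a digit, or a placeholder is left untouched, which is one reason a rule like this is not a complete fix.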
- Support MinT's accuracy is very low and it has never helped me. --Tmv (talk) 06:54, 9 October 2023 (UTC)
- Support I consider MinT to be of poor quality for translations from English to Polish (this is what I've done). It frequently comes up with words that do not exist or skips some final letters. The general grammatical coherence is not good either. The issues are particularly visible when translating wiki-specific terminology but can be seen in general texts as well. The translation engine often breaks wiki markup like wikilinks or dollar-replacements or even skips these completely. Msz2001 (talk) 08:09, 9 October 2023 (UTC)
- @Msz2001, hello, would you care to give a sample in which you find MinT problematic, please? Maybe from metawiki, or any wiki project where you were annoyed with MinT output in Polish? I know no Polish, and your insight is highly valued. Cheers, Omotecho (talk) 05:22, 10 October 2023 (UTC)
- This is a sample extract from Interface editors:
- English source:
This global group, named $int ($group, also known as '''global interface admins'''), is enabled on every Wikimedia wiki that shares access via CentralAuth and SUL (<small>including [[$closed|locked (closed) wikis]] and excluding [[$private|private]] and [[$fishbowl|fishbowl]] wikis</small>). Projects may not opt out. However, interface editors '''must not''' make any controversial edits to the interface on projects where they are not in a local user group that grants this user right.
- Polish MinT output:
Ta globalna grupa, zwana $int ($group, znana również jako "administratorzy globalnego interfejsu"), jest włączona na każdym wiki Wikimedia, które udostępnia dostęp za pośrednictwem CentralAuth i SUL (w tym zamkniętych wiki wiki w wiki w wiki wiki wideo wiki wikis wiki wki wiki wpisujące w wiki w wiki wki w wiki w wiki) Projekty nie mogą się wycofać. Jednak redaktorzy interfejsu "nie powinni" dokonywać kontrowersyjnych edycji interfejsu w projektach, w których nie są w lokalnej grupie użytkowników, która przyznaje temu użytkownikowi prawo. włączając wiki zamknięte i z wyłączeniem wiki prywatne i rybne
- Google translation for comparison:
Ta globalna grupa, nazwana $int (grupa $, znana również jako „administratorzy interfejsu globalnego”), jest włączona na każdej wiki Wikimedia, która ma dostęp poprzez CentralAuth i SUL (<small>w tym [[$closed|locked ( zamknięte) wiki]] i z wyłączeniem wiki [[$private|private]] i [[$fishbowl|fishbowl]] wiki</small>). Projekty nie mogą zrezygnować. Jednakże redaktorom interfejsu „nie wolno” dokonywać żadnych kontrowersyjnych zmian w interfejsie w projektach, w których nie należą do lokalnej grupy użytkowników, która przyznaje to prawo użytkownikowi.
- Issues:
- Strange wiki wiki... part instead of the <small> text
- The parenthesised text about closed wikis etc. got moved to the very end and without any trace of wikilinks originally present
- "must not" got translated as "nie powinni", which means "should not"
- Messing up with wiki syntax – acceptable as these are general-purpose services not tuned for wikitext markup in texts
- Another example: "By default, appointment will be temporary, lasting any time up to a year." was translated into "W przypadku, gdy osoba jest zatrudniona w ramach programu, to jest ona niepełnosprawna." I don't even know how it came up with this translation, because it means "If the person is employed as a part of a program, he/she is disabled then."
- Msz2001 (talk) 09:34, 10 October 2023 (UTC)
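The markup problems listed above (dropped "$" placeholders and lost wikilinks) can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the function name `markup_problems` and both regular expressions are hypothetical and written for this example only; they are not part of the Translate extension or of MinT.

```python
import re

# "$1", "$group", ... (ASCII only, so "$2の..." still yields "$2").
PLACEHOLDER = re.compile(r"\$\w+", re.ASCII)
# A non-nested wikilink such as [[$closed|locked (closed) wikis]].
WIKILINK = re.compile(r"\[\[[^\[\]]*\]\]")


def markup_problems(source: str, translation: str) -> list[str]:
    """Describe markup present in the source but lost in the translation."""
    problems = []

    # Placeholders that appear in the source but not in the translation.
    missing = sorted(
        set(PLACEHOLDER.findall(source)) - set(PLACEHOLDER.findall(translation))
    )
    if missing:
        problems.append("missing placeholders: " + ", ".join(missing))

    # The number of wikilinks should normally be unchanged.
    n_src = len(WIKILINK.findall(source))
    n_out = len(WIKILINK.findall(translation))
    if n_src != n_out:
        problems.append(f"wikilinks: {n_src} in source, {n_out} in translation")

    return problems


if __name__ == "__main__":
    src = ("including [[$closed|locked (closed) wikis]] "
           "and excluding [[$private|private]] wikis")
    out = "w tym zamkniętych wiki"
    for line in markup_problems(src, out):
        print(line)
```

A check like this only compares surface markup; it says nothing about whether the translated words are correct, as the "must not" example shows.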
- "wiki wiki w wiki w wiki wiki wideo wiki wikis wiki wki wiki wpisujące w wiki w wiki wki w wiki w wiki": this might be something buggy. Lemonaka (talk) 21:26, 10 October 2023 (UTC)
- Oppose MinT translations are useful as a "base" translation; it is not perfect and probably will always (for the time being, while it is improving) carry some inaccuracies, but you can always take the translation and improve it directly while on the Translate extension. Turning it off now on Meta would hamper MinT's improvement and learning process. dwadieff ✉ 14:54, 13 October 2023 (UTC)
- Support So far, I haven't found a single translation for which MinT would help me to translate the page, and I found dozens of cases in which the MinT translation actually says something different from the English original. I'd understand this if the mistakes in the automated translations were related to wiki-terms or the usage of wikitext (which is often malformed). However, it appears to me that it has trouble understanding the rules of English grammar, which is a more fundamental issue. It also mistranslates common English words that are not at all wiki-specific. Personally, I've learned to ignore the suggestions in the right panel during my translation efforts and I hardly ever use them. As such, for myself, it doesn't matter much whether MinT is there or not. However, I've witnessed users who "translate" by picking the most suitable translation from the panel, which is super difficult to fix (one usually ends up deleting everything by that user, since it is too labor-intensive to separate acceptable translations from nonsense ones) and I'm afraid MinT would make that worse. Sincerely, --Martin Urbanec (talk) 07:36, 15 October 2023 (UTC)
- Comment This doesn't solve the general issue (general availability of a – at least sometimes – poor service) but can make it easier not to click on MinT suggestions thinking these are local messages from other pages: I've created a script that marks MinT suggestions visually and renders them with reduced opacity and at the very end of suggestion list. MinT output is still visible and clickable (in case it's useful) so that users can assure the quality hasn't yet improved. The script is available at User:Msz2001/common.js and can be enabled by typing
importScript('User:Msz2001/discourage-mint.js');
in one's common.js. Example screenshot. Msz2001 (talk) 10:52, 20 October 2023 (UTC)
Hello all,
The Language team has the following action plans to address most of the concerns raised in this thread:
- A mitigation approach to hide MinT suggestions for content with markup that can break the translation; this will eliminate the wikitext mismatch that destroys the original meaning of translation, as highlighted by @Omotecho, @Lemonaka, Haytham abulela and others.
- Improve MinT punctuation support for Japanese; the team will analyze the current situation and explore options to improve the punctuation support as captured in this ticket.
- Unexpected sequence of repeated words as reported by @Msz2001; you can find more details in the ticket.
You can track the progress of the above action plan from the Phabricator tickets.
We appreciate your patience as we work to address these concerns. Best regards, UOzurumba (WMF) (talk) 06:46, 26 October 2023 (UTC)
- Hello @Omotecho, @Lemonaka, Haytham abulela and others. The above mitigation approach to hide MinT suggestions for content with markup that can break the translation has been resolved. Thank you for your patience. UOzurumba (WMF) (talk) 02:59, 8 January 2024 (UTC)
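For the "unexpected sequence of repeated words" item in the action plan above, a simple heuristic can flag output like the "wiki wiki w wiki ..." run reported earlier. This is only a sketch under stated assumptions: the function name `looks_degenerate`, the minimum length, and the 40% threshold are invented for illustration and are not taken from the Phabricator ticket or from MinT.

```python
from collections import Counter


def looks_degenerate(text: str, min_tokens: int = 8, threshold: float = 0.4) -> bool:
    """Return True when a single token dominates a reasonably long text."""
    tokens = text.lower().split()
    if len(tokens) < min_tokens:
        # Too short to judge; short strings repeat words legitimately.
        return False
    _, count = Counter(tokens).most_common(1)[0]
    return count / len(tokens) >= threshold


if __name__ == "__main__":
    reported = (
        "wiki wiki w wiki w wiki wiki wideo wiki wikis wiki wki wiki "
        "wpisujące w wiki w wiki wki w wiki w wiki"
    )
    print(looks_degenerate(reported))
    print(looks_degenerate("Projects may not opt out of this global group at any time."))
```

Whitespace tokenization is a simplification that would not work for Japanese or Chinese, where words are not separated by spaces.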
Issues found in Arabic that might be affecting other languages
Since implementation, the suggestions offered by MinT for Arabic have generally been of poor quality. Not only were terms common in the movement translated into equivalents unsuitable for the movement context, but common placeholders such as "$1" were translated as dollar amounts, and links in single or double square brackets were at times garbled or mistranslated. Another issue was that MinT suggestions were placed above previously translated matches of 80% or more, and at times MinT suggestions were the only suggestions provided even though the memory contained matches of at least 75%.
It would be beneficial for the movement to train translation services on movement data rather than rely on data from the web, since the quality of web data is questionable.
Haytham Abulela talk 16:03, 8 October 2023 (UTC)
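The ordering concern raised in this section can be made concrete with a small Python sketch in which strong translation memory matches are listed before any machine translation suggestion. Everything here is hypothetical: the `Suggestion` type, the "tm"/"mt" labels, and the 0.75 threshold (borrowed from the 75% figure mentioned above) are assumptions for illustration and do not describe how the Translate extension actually ranks suggestions.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    text: str
    source: str          # "tm" = translation memory, "mt" = machine translation
    match: float = 0.0   # similarity to the source segment in [0, 1]; unused for "mt"


def order_suggestions(suggestions, tm_threshold=0.75):
    """Put strong translation memory matches first, best match on top."""
    strong_tm = sorted(
        (s for s in suggestions if s.source == "tm" and s.match >= tm_threshold),
        key=lambda s: s.match,
        reverse=True,
    )
    # Everything else (machine translation, weak matches) keeps its order.
    rest = [s for s in suggestions if s not in strong_tm]
    return strong_tm + rest


if __name__ == "__main__":
    items = [
        Suggestion("machine output", "mt"),
        Suggestion("memory match A", "tm", 0.80),
        Suggestion("memory match B", "tm", 0.50),
        Suggestion("memory match C", "tm", 0.90),
    ]
    for s in order_suggestions(items):
        print(s.source, s.match, s.text)
```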
Reply from the WMF Language team
日本語訳は下記※をご参照ください。Translation in ja follows※:--Omotecho (talk) 06:39, 27 December 2023 (UTC)
The WMF Language team acknowledges that we have seen these comments and are working to address some of them, as has been captured in the Phabricator ticket.
However, we would like to clarify that this request (statement of issue) is about translation suggestions from the MinT service when using the Translate extension to translate pages marked for translation. So, the request is specifically focused on the translation of marked pages from English to Japanese, and the Translate extension is the only place the MinT service is provided for the Japanese language. The above is to emphasize that this page's heading could be clarified to avoid misunderstandings.
Furthermore, we want to state that the quality of the MinT translation should improve over time as it is being used and edited by translators. Currently, translating Wikipedia articles with Content Translation or contributing to Tatoeba are two easy ways to generate more quality data to improve the models. We also plan to integrate localization data from the Translate extension (more details in this ticket). In addition, contributing more Wikipedia-specific data will result in translations that align better with the community's expectations.
The quality of MinT varies from language to language and also depends on the content and context; people from different communities such as Kashmiri, Igbo or Icelandic have found MinT to be useful for translating into their languages (see links for more context). The Language team has incorporated corrections for recurring issues, such as problems with punctuation in some non-Latin languages, and is working to improve the translation of structured content such as wikitext and HTML.
The above is to say that MinT is in active development and has room for improvement. Still, for polishing a system that supports over 200 languages, it is very useful to expose it to the communities in ways that they can help make it better. Drawing from our experience with publishing limits on Content Translation, we will also explore ways to address the concerns raised by Lemonaka about the tool being available to any registered user, and hence being susceptible to spam. Thank you!
On behalf of the WMF Language team UOzurumba (WMF) (talk) 14:56, 12 October 2023 (UTC)
- @UOzurumba (WMF), you are so nice to keep us updated, thank you so much.
- Allow me to note: under Comments section, it is introduced that there are in-progress and planned works at the Tech team, dated 06:46, 26 October 2023 (UTC). -- Omotecho (talk) 12:22, 28 October 2023 (UTC)
- ※ 日本語訳
== WMF 言語チームからの返信 ==
WMF 言語チームは、これらのコメントを確認し、その一部に対処するために取り組んでいることを認めます。その内容は Phabricator チケットに記録しています。
しかしながら、この申請 (問題提起) とは、翻訳対象としてマークされたページを翻訳拡張機能を使って翻訳する場合に MinT サービスが発する翻訳提案に関するものと明確にしておきたいと思います。したがってこの申請は、翻訳対象としてマークされたページの英語から日本語に翻訳することに特化しており、MinT サービスが日本語に対して提供されるのは翻訳拡張機能をおいて他にありません。上記で強調しようとする点は、このページの見出しは(文言を)明確化し、誤解を避けるべきと示すためです。
さらに、MinT翻訳の品質は、時間の経過とともに翻訳者によって使用および編集されるにつれ向上するはずであると述べておきたいと思います。現在、ウィキペディアの記事翻訳においてモデル改善に高品質のデータを生成するには、コンテンツ翻訳拡張機能の使用もしくはTatoeba に反映する方法が、簡便な2つの方法として採用できます。当チームはまた、翻訳拡張機能から得たローカライズ用データを統合する予定です(詳細はこのチケットご参照)。さらに、ウィキペディア固有のデータをより多く提供することで、コミュニティの期待に沿った翻訳が得られるはずです。
MinTの品質は言語によって異なり、内容や文脈によっても変わり、さまざまなコミュニティの人々としてカシミール語、イボ語(English)、アイスランド語(Icelandic)ではMinT がそれぞれの言語への翻訳に役立つと述べています(詳細は左記のそれぞれのリンクを参照してください)。 言語チームは繰り返し発生する一部の非ラテン言語の句読点に関する問題などの問題修正に組み込み、かつまたウィキ文、html文など構造化コンテンツの翻訳改善に取り組んでいます。
上記の趣旨は、MinTは現在開発中である点、改善の余地がある点を示すことにあります。そうであっても、200超の言語に対応するシステムを磨き上げるには、コミュニティに対するシステム公開を介し、システムの改善に貢献してもらうことが非常に重要です。当チームが経験したコンテンツ翻訳の公開制限という経験を活かし、このツールについて提起された懸念として、登録利用者なら誰でも利用でき、スパムの影響を受けやすいというLemonakaさんのご指摘への対処方法も検討する予定です。よろしくお願いします!
ウィキメディア財団言語チーム一同を代表して UOzurumba (WMF) (talk) 14:56, 12 October 2023 (UTC)
- @UOzurumba (WMF)さん、いつも最新情報を教えてくれて、本当にありがとうございます。
- 注記しておきます:コメント欄配下に技術チームにて、2023年10月26日06:46 (UTC) 時点で進行中および計画中の作業があると紹介されています。 --Omotecho (talk) 12:22, 28 October 2023 (UTC)
- To keep it short: please understand that the reality is not as you stated above. The jawp community negotiated, and your team agreed, that jawp would not apply CX2, or Translation Extension, well before MinT came into the limelight. I advise you to check the history, please.
- I wish we had somehow shared the understanding that the kind of technical benchmarking involving MinT should be done outside the final product, that is, outside the wiki projects.
- Why do we have to narrow the scope of the issue to a single language pair, en-ja, and on what data?
- jawp has NOT used CX2, or the translation extension, for two years;
- MinT is NOT limited to Wikipedia; it is also applied to meta wiki translation.
- We are looking at different pictures, so how can we discuss anything under such conditions?
- 短くまとめますが、現実はあなたが上記に述べられたとおりではない点をご理解ください。MinT が脚光を浴びるずっと前に、日本語版ウィキペディア(jawp)のコミュニティは交渉を行い、あなたのチームは、jawp が CX2 (翻訳拡張機能) を適用しないことに同意しました。履歴を確認されることをお勧めします。
- MinT に関係する技術的なベンチマークは、最終製品または Wiki プロジェクトの外で行うべきであるという理解を何らかの形で共有できればよかったと思いました。
- なぜ、英語日本語の1つの言語ペアに最小限化する必要があるのでしょうか? 根拠となるデータは?
- jawp は CX2、つまり翻訳拡張機能を 2年間、採用していません。
- MinT はウィキペディアのみならず、メタウィキの翻訳にも導入されています。
- 私たちは別々の絵柄を見ており、そのような状態でどうやって議論が成立するのでしょう。
--added translation to User:UOzurumba (WMF)'s reply, pardon me for the time-lag. Omotecho (talk) 06:39, 27 December 2023 (UTC) /訳文の追加と返信。
Proposal:RFC closed
I'm proposing to close this RFC: since we have received a reply from the Wikimedia Foundation Language team, nothing more can be done here.--Lemonaka (talk) 04:34, 27 December 2023 (UTC)
- @Omotecho Lemonaka (talk) 04:35, 27 December 2023 (UTC)
- Tell me, why do we keep this RfC open any longer?
- The picture the Language team drew for us is a bad dream, as facts are bent. As they have narrowed the issue to the en-ja language pair, and as jawp does not use CX2, there is no way writers of Japanese could contribute any usage data to MinT through CX2. Am I wrong?
- Linguistically, any language very distant from ja language in grammar and structure will no doubt enjoy MinT, and I applaud the Language team for contributing to sharing Human Knowledge. As for Japanese, my hunch tells me that we need more linguists to identify the patterns for which MinT is prone to produce good translations.
- If the language or tech team at large will consider a testing ground and leave at least fr, ar and ja wikipedias alone, count me in to input data for them. But as I have mentioned above, jawp is not connected to MinT, so there is no way for me to support the project.
- the subject of Tatoeba is out of the scope: sorry but my hands are tied to try out that one.
- I am busier on metawiki: the Language team does not point out that MinT is also extended to metawiki translation.
- AFAIK, we risk misleading admins and other people on bigger issues such as Movement Strategy 2030, or AutoModerator tools, not to mention Board member elections.
- Wish to prevent consuming right holders' time by having them double check against /en page... Omotecho (talk) 07:01, 27 December 2023 (UTC)