Research talk:Expanding Wikipedia articles across languages/Work log/2017-11-17

Friday, November 17, 2017

Can automatic translation replace section alignment?

In our recent meeting, we discussed whether, given the advances in automatic translation, we could drop the section alignment task altogether and simply have the sections translated automatically, or at the very least use automatic translation as a reasonable baseline for the section alignment algorithms we're developing. We decided to give this a try: Diego used Google Translate to translate the top ~9K sections used in English Wikipedia (sections with more than 100 occurrences) into de, es, and fa. (Note that these 9274 sections cover 81% of all section occurrences in enwiki.) Some hand-labeling of the results for fawiki shows that while in many cases a human can see that the translation has the same meaning as the section used in fawiki, the translation does not exactly match that section. Of the top 20 sections, we could label 17, and 5 of those are exact matches, which puts us at roughly 29% accuracy on that set. In Spanish, we labeled 18 of the top 20, with 16 exact matches, or 89% accuracy.
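As a quick check of the arithmetic above, here is a minimal sketch that recomputes the exact-match accuracy from the hand-labeled counts. The function name `exact_match_accuracy` and the `labeled_counts` dictionary are illustrative and not part of any existing pipeline; the counts are the ones reported in this entry.

```python
def exact_match_accuracy(exact_matches: int, labeled: int) -> float:
    """Fraction of hand-labeled sections whose translation exactly matches
    the section used in the target wiki."""
    if labeled <= 0:
        raise ValueError("need at least one labeled section")
    if not 0 <= exact_matches <= labeled:
        raise ValueError("exact_matches must be between 0 and labeled")
    return exact_matches / labeled


# (exact matches, labeled sections) among the top 20 enwiki sections,
# as reported in this log entry.
labeled_counts = {
    "fa": (5, 17),
    "es": (16, 18),
}

for lang, (matches, labeled) in labeled_counts.items():
    accuracy = exact_match_accuracy(matches, labeled)
    print(f"{lang}wiki: {matches}/{labeled} = {accuracy:.1%}")
```

Unlabeled sections (3 for fawiki, 2 for eswiki) are excluded from the denominator, so these figures describe only the labeled subset of the top 20.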

