Language Engineering


Time: 17:00-18:00 UTC
Channel: #wikimedia-office
Timestamps are in UTC.
17:00:09 <arrbee> #startmeeting Language Engineering monthly office hour - October 2014
17:00:26 <arrbee> Hello everyone
17:00:32 <arrbee> Welcome to the monthly office hour of the Wikimedia Language Engineering team
17:01:01 * arrbee is Runa, the Outreach co-ordinator for our team
17:01:21 <arrbee> Our office hours are held every 2nd Wednesday of the month
17:01:43 <arrbee> We delayed by a week this month due to the holiday season in India
17:02:07 <arrbee> Our last office hour was held on September 10th, 2014. The logs are at:
17:02:15 <arrbee> #link
17:02:49 <Niharika> o/
17:02:59 <arrbee> Before we begin, please be aware that the chat will be logged and posted on a publicly accessible wiki
17:03:08 <arrbee> Hey Niharika .. good to see you
17:03:23 <Niharika> Good to see you too, arrbee! :)
17:03:25 <arrbee> A quick introduction of the team
17:04:12 <arrbee> We are the Wikimedia Language Engineering team. Besides me, present today are aharoni kart_ jsahleen Nikerabbit divec santhosh pginer
17:04:53 <arrbee> We build and maintain language features and tools for the wikis in more than 300 languages and support the wiki communities around the world
17:05:34 <arrbee> Our team page on has details about the projects we work on and how you can participate in them:
17:05:37 <arrbee> #link
17:06:38 <arrbee> Since our last office hour, we have continued work on Content Translation and also made updates to CLDR and Translate
17:07:13 <arrbee> For Content Translation, we completed the 2nd release last month
17:07:39 <arrbee> September 30th to be precise
17:07:46 <arrbee> The first was the MVP release earlier in July
17:08:04 <arrbee> The announcement for the new version can be found at:
17:08:22 <arrbee> #link
17:09:16 <arrbee> However, due to technical difficulties with the setup on beta-labs the updated code is not yet available for wider testing and use
17:09:42 <arrbee> We hope to sort out the problems very soon
17:10:10 <arrbee> kart_ has been actively working with the Ops team on this
17:10:46 <arrbee> particularly with akosiaris. Thanks a lot for the help.
17:11:11 <kart_> \0
17:11:36 <arrbee> Meanwhile we have already started working on the features for the next version, which is scheduled for release on November 18
17:12:10 <arrbee> In the latest version we have introduced a basic formatting toolbar for Google Chrome
17:12:28 <arrbee> We have also activated bidirectional machine translation for Spanish and Portuguese
17:13:11 <arrbee> However, due to the blocker in the labs setup, users will have to wait for some more time to use these 2 languages reliably
17:13:53 <arrbee> santhosh: would you like to talk a bit about the other features and the new features planned for the next release?
17:14:38 <santhosh> We were working on improving the performance and polishing the existing features
17:14:46 <santhosh> We worked on several infrastructure components.
17:14:59 <santhosh> Overall it should improve the responsiveness of the tools.
17:15:10 <santhosh> We are adding an automatic category adaptation feature now.
17:15:27 <santhosh> We will soon start work on a translation dashboard
17:16:18 <Romaine> is there any plan yet to add other languages?
17:16:48 <arrbee> Romaine: yes. Unfortunately, we are blocked on the lab setup.
17:16:55 <santhosh> And some early preparation for allowing a translator to save and resume later - probably by another translator
17:17:10 <arrbee> Romaine: we will begin testing more language pairs for machine translation support
17:17:44 <Romaine> I hope Dutch can be such language
17:18:16 <Romaine> I have had the opportunity to extensively test another translating tool earlier and I am very much looking forward to this tool
17:18:34 <santhosh> Romaine: once we prepare our infrastructure - that is what we are slightly blocked on - we should be able to focus on increasing language coverage
17:18:59 <santhosh> we will evaluate more language pairs
17:19:17 <Romaine> in what time frame do you expect that to happen?
17:20:15 <santhosh> by the end of this month we are expecting our infrastructure issues to be resolved. We have to deploy the current language pair, Spanish-Catalan, to production then
17:20:30 <santhosh> Also open up Catalan->Spanish.
17:21:10 <santhosh> The next candidate language pair we have is Portuguese-Spanish
17:21:24 <santhosh> Mainly because we use Apertium MT.
17:22:04 <santhosh> We will evaluate all language pairs we can support with Apertium first.
17:22:39 <arrbee> Romaine: we hope to have reliable testing infrastructure (not beta) to be ready for preliminary testing of more language pairs before the end of this month
17:23:04 <santhosh> Romaine: You are interested in English-Dutch?
17:23:20 <Romaine> yes
17:23:33 <Romaine> German-Dutch or French-Dutch is fine as well
17:23:47 <arrbee> all 3 seem to be in Apertium incubator
17:23:56 <arrbee> Apertium supports Afrikaans and Dutch as a pair.
17:24:14 <arrbee> Romaine: have you ever tried this pair?
17:24:40 <Romaine> I was trying to but maybe I did not understand the tool
17:25:20 <arrbee> on
17:26:18 <Romaine> not there, apparently I should have
17:26:49 <Romaine> (in earlier years I have tested the translate tool , but that is a totally different project I think)
17:27:49 <Romaine> in I can only select a language and translate to Portuguese, Catalan or Spanish
17:28:41 <arrbee> Romaine: I can select Dutch-Afrikaans at
17:28:54 <arrbee> and the other way around
17:29:10 <Romaine> yes, that is the only option together with Dutch
17:29:14 <aharoni|mobile> CoSyne is a pretty cool project, but AFAIK it's more about comparing current language versions of articles (but it's possible that I don't remember correctly)
17:29:31 <arrbee> Romaine: are you familiar with Afrikaans?
17:29:39 <Romaine> I can read Afrikaans
17:29:47 <arrbee> oh nice
17:29:51 <Romaine> people in the Netherlands can pretty well understand it
17:30:03 <aharoni|mobile> Apertium are very keen about releasing pairs that they are sure about.
17:30:18 <arrbee> Romaine: if you do get the time to check out the pair, we would be very interested to hear what you think of the translation
17:30:31 <Romaine> sure I can test
17:30:33 <aharoni|mobile> So it may work only in one direction for some languages.
17:30:54 <arrbee> Romaine: Thanks a lot.
17:31:24 <Romaine> is there a place where I can report/discuss the things I notice?
17:31:33 <aharoni|mobile> And yes, please test Afrikaans and tell us what you think about its quality. Such feedback is very important to us.
17:31:54 <Romaine> where should I post/send/etc the feedback?
17:32:00 <arrbee> Romaine: ContentTranslation project talk page, bugzilla, #mediawiki-i18n, direct emails to any of us
17:32:07 <arrbee> all work
17:32:16 <Romaine> I will find one :)
17:32:32 <arrbee> #link
17:32:35 <arrbee> Romaine: ^^
17:32:51 <Romaine> personally I would have expected German-Dutch as well, both languages are very similar to each other
17:33:10 <arrbee> It's in the incubator
17:33:43 <aharoni|mobile> Or Dutch-Frisian :)
17:33:51 <arrbee> It would have been desirable to have that pair, and some others ready in production :)
17:34:10 <Romaine> or Norwegian Bokmål-Dutch
17:35:30 <Romaine> (CoSyne is a tool to compare current language versions of articles but also about translating articles)
17:35:58 <Romaine> if more of these languages become available I am happy to test them
17:36:09 <arrbee> Thanks a lot :)
17:37:00 <arrbee> pginer has been testing the Spanish-Portuguese & Portuguese-Spanish translations with users
17:37:14 <Romaine> I can organise such for Dutch
17:37:48 <arrbee> Romaine: I will make sure we get in touch with you once we have some clarity on the infrastructure setup
17:37:56 <Romaine> :)
17:38:52 <arrbee> Meanwhile, you may have noticed the message on the Portuguese Village Pump for Content Translation testing
17:39:02 <arrbee> #link
17:39:32 <arrbee> In other news
17:40:10 <arrbee> santhosh has also been working on updating the plural forms in mediawiki i18n as per the latest CLDR 26 release
17:40:28 <arrbee> This primarily affects a few languages including Russian
17:40:59 <arrbee> We are working on determining a timeline for implementing this change and on reaching out to the affected language communities to verify the change in content
17:41:29 <arrbee> Nikerabbit: would you like to add something about this?
17:42:34 <Scott_WUaS> Greetings
17:42:37 <Nikerabbit> yep
17:42:56 <Nikerabbit> it is quite complicated since we need to update the rules and translations simultaneously
17:43:09 * Romaine filled in the form in the pt: In other news section
17:43:16 <arrbee> Scott_WUaS: Hello.. I was hoping you would drop by
17:43:36 <Nikerabbit> we will be coordinating with various people to get these changes out in a way that causes minimal breakage
17:44:19 <Nikerabbit> as a user you do not need to do anything. if you are a translator, we will need your help to verify and update translations as needed
17:44:56 <arrbee> Thanks Nikerabbit
17:45:24 <Scott_WUaS> Hi Arrbee ... how are you ... looking forward to participating in the remaining 15 minutes or so.
17:45:48 <arrbee> We will be announcing the when we have more clarity on the timeline for the CLDR work
17:45:57 <arrbee> Scott_WUaS: Unfortunately, I could not get any more information about the interwiki projects and wikidata, that you had mentioned.
17:46:25 <arrbee> Scott_WUaS: I was hoping someone from Wikidata may be able to help with it
17:47:04 <arrbee> Timecheck: just under 15 minutes
17:47:10 <arrbee> remain
17:48:08 <Scott_WUaS> Thank you for checking, arrbee ... I've begun contributing to Wikidata weekly summaries ... and there do appear to be a number of external sister wiki projects that also engage Wikicommons - I think - possibly wikidata listing all genes, for example. Is someone here from Wikidata who might be able to add to this?
17:48:47 <Scott_WUaS> * external sister wikidata projects
17:48:48 * arrbee spots dennyvrandecic
17:49:03 <arrbee> and Lydia_WMDE of course
17:50:17 <arrbee> Scott_WUaS: we can possibly contact them over email. I am interested to know more about this :)
17:50:21 <arrbee> you got me curious
17:50:30 <Scott_WUaS> Yes ... sounds great
17:50:38 <arrbee> Alright
17:50:49 <arrbee> Moving on
17:51:06 <arrbee> There are quite a few language engineering projects which are open for this round of OPW
17:51:17 <arrbee> #link
17:51:25 <Scott_WUaS> (arrbee: it's the multi- and inter-lingual aspects of this potential that are fascinating :)
17:51:53 <arrbee> Please feel free to spread the word around to young women who may be interested in this field of work for OPW projects
17:52:01 <arrbee> Niharika: sucheta ^^
17:52:39 <arrbee> Scott_WUaS: very true. I hope we can get more information on this soon.
17:52:43 <Niharika> arrbee: I'm on it. :)
17:53:04 <arrbee> Niharika: Alright. :)
17:53:25 <Scott_WUaS> :)
17:54:39 <arrbee> aharoni|mobile: Nikerabbit, kart_ are listed mentors
17:54:59 <arrbee> possibly jsahleen will be joining in too
17:55:18 <arrbee> Please feel free to contact them
17:55:42 <arrbee> We have just under 5 minutes
17:57:06 <arrbee> Looks like there are no more questions :)
17:57:50 <arrbee> If nothing changes, our next office hour will be on November 12, 2014, but do look out for the announcements for the exact date
17:58:10 <arrbee> Our mailing list is and IRC channel is #mediawiki-i18n
17:58:33 <arrbee> I will post the log from this meeting shortly on metawiki
17:59:27 <arrbee> Thanks everyone for joining in. See you next month.
17:59:29 <Scott_WUaS> Thank you, arrbee!
17:59:47 <arrbee> Do feel free to write directly to me if you have any questions about our projects
17:59:52 <arrbee> Thanks Scott_WUaS :)
18:00:11 <Niharika> Thanks arrbee. :)
18:00:14 <arrbee> Thanks Romaine. I will get back to you as soon as our infra setup issues are sorted.
18:00:16 <arrbee> Thanks Niharika
18:00:21 <Romaine> great
18:00:37 <arrbee> Thanks again!
18:00:38 <arrbee> #endmeeting

