CIS-A2K/IRC meeting 2018-04-06


[19:16] == Ananth has joined #cis-a2k
[19:16] == mode/#cis-a2k [+ns] by herbert.freenode.net
[19:16] == mode/#cis-a2k [-o Ananth] by services.
[19:16] == mode/#cis-a2k [+ct-s] by services.
[19:16] == ChanServ [ChanServ@services.] has joined #cis-a2k
[19:16] == mode/#cis-a2k [+o ChanServ] by services.
[19:27] == Ravi has joined #cis-a2k
[19:27] == KCVelaga has joined #cis-a2k
[19:27] == ChanServ [ChanServ@services.] has left #cis-a2k []
[19:27] == KCVelaga has quit [Client Quit]
[19:37] == Drmundkur has joined #cis-a2k
[19:38] == barber1987 has joined #cis-a2k
[19:39] == axagastya [uid243544@wikinews/acagastya] has joined #cis-a2k
[19:39] == shrini [uid38773@gateway/web/irccloud.com/x-sbzsvjgiognmxzwg] has joined #cis-a2k
[19:39] == Titodutta has joined #cis-a2k
[19:40] == mode/#cis-a2k [+o Titodutta] by ChanServ
[19:40] == shrini has changed nick to shrini-afk
[19:44] == teleircbot has joined #cis-a2k
[19:44] <teleircbot> No chat_id set! Add me to a Telegram group and say hi so I can find your group's chat_id!
[19:46] == teleircbot has quit [Remote host closed the connection]
[19:47] == Minato826 [~androirc@wikia/vstf/Minato826] has joined #cis-a2k
[19:47] == Minato826 has changed nick to Anoop-Rai
[19:48] == Anoop-Rai has changed nick to Anoop-Rao
[19:48] == Bodhisattwa has joined #cis-a2k
[19:49] == Tan has joined #cis-a2k
[19:49] == AmritasyaPutra [~smuxi@wikipedia/AmritasyaPutra] has joined #cis-a2k
[19:49] == Tan has changed nick to Guest10225
[19:50] == teleircbot has joined #cis-a2k
[19:50] <teleircbot> No chat_id set! Add me to a Telegram group and say hi so I can find your group's chat_id!
[19:50] == teleircbot has quit [Remote host closed the connection]
[19:51] <Anoop-Rao> Hello
[19:51] <Guest10225> @Anoop Rao:Hello
[19:51] <Anoop-Rao> Hello tanveer
[19:51] == KCVelaga has joined #cis-a2k
[19:52] == KCVelaga has quit [Client Quit]
[19:54] == KCVel has joined #cis-a2k
[19:56] <@Titodutta> Hello, we'll start in 4 mins
[19:56] <Anoop-Rao> Ok
[19:57] == Gopala has joined #cis-a2k
[19:57] <Guest10225> \nick Lahariyaniyathi
[19:57] == Titodutta changed the topic of #cis-a2k to: India IRC
[19:58] == Guest10225 has changed nick to Lahariyaniyathi
[20:00] == sangappa has joined #cis-a2k
[20:00] <@Titodutta> Hi, we are waiting for a few friends, and will start in the next 4 mins
[20:01] == HoloIRCUser has joined #cis-a2k
[20:01] == Suyash has joined #cis-a2k
[20:01] <Suyash> Hello room
[20:01] == HoloIRCUser has changed nick to pavan
[20:03] <@Titodutta> Hello guys,
[20:03] <@Titodutta> Hello friends,
[20:03] <@Titodutta> Should we start now?
[20:03] <pavan> Hello all
[20:03] <Suyash> Hello Tito n pawan
[20:04] <AmritasyaPutra> Yo Tito.
[20:04] == gaot has joined #cis-a2k
[20:04] <gaot> Hello !
[20:04] <@Titodutta> Our today's topics:
[20:04] <@Titodutta> Wikisource OCR issue, General Wikipedia discussion, Focused Project Area
[20:04] <Ananth> hello all,
[20:04] == Satdeep has joined #cis-a2k
[20:05] <@Titodutta> We'll start with the Wikisource OCR issue and general questions and answers
[20:05] <@Titodutta> related to Wikisource
[20:06] <Satdeep> OCR4Wikisource has poor performance for Punjabi (Gurmukhi) text
[20:06] <Ananth> Last time we discussed a few issues related to OCR in Indic languages. In this IRC we can talk more about those
[20:07] <@Titodutta> Last OCR issue (blank page) was it fixed?
[20:07] == Silent has joined #cis-a2k
[20:07] <@Titodutta> @Bodhisattwa
[20:08] <Bodhisattwa> No
[20:08] == Pavanaja [~androirc@2405:204:5009:ba61:7e16:862f:85af:ace7] has joined #cis-a2k
[20:08] <Bodhisattwa> Jayanta Nath and Shrini were working on it
[20:08] <@Titodutta> Alright,
[20:09] <@Titodutta> Was it a Google side issue, or something?
[20:09] <@Titodutta> Please throw some light on the nature of the problem and finding so far?
[20:09] == Pranayraj has joined #cis-a2k
[20:10] <Pranayraj> hi all
[20:10] <Bodhisattwa> Yes, issue from Google Drive OCR
[20:10] <@Titodutta> @Shrini-afk
[20:10] <Bodhisattwa> They have stopped Drive OCR for PDF/DJVU for Bengali and Devanagari
[20:10] <Bodhisattwa> but jpg/png is working
[20:10] <@Titodutta> Confirmed?
[20:11] <Bodhisattwa> yeah, AFAIK
[20:11] <gaot> What was the result of today's meeting in Bangalore? Please share the minutes
[20:11] <Bodhisattwa> @Satdeep, it's not the fault of OCR4Wikisource; the Google Drive OCR for Punjabi is itself of poor quality
[20:11] <@Titodutta> @gaot, let's finish the Wikisource thing first
[20:11] == Manav has joined #cis-a2k
[20:12] <Manav> Hi
[20:12] <Silent> Yes Bodhi, I totally understand that. The question is what to do about it?
[20:12] <pavan> Is this because of Google's policy/strategy shift
[20:12] <Bodhisattwa> for what?
[20:13] <Bodhisattwa> @Pavan, cant say, I have no contact with anyone from Google
[20:14] <@Titodutta> Yes Bodhi and I discussed once, perhaps we can try to get a confirmation from CIS-A2K
[20:14] <Bodhisattwa> May be orgs like CIS or WMF can ask them whats happening
[20:14] <Lahariyaniyathi> Is it a good idea to officially write to them and find out?
[20:14] <Bodhisattwa> yes
[20:14] <Bodhisattwa> yes, @Tanveer
[20:14] <Satdeep> I agree!
[20:14] <Lahariyaniyathi> @Bodhi:Yes, thanks
[20:14] <Satdeep> Let's try reaching out to them and figure out about possible solutions.
[20:15] <@Titodutta> Good idea
[20:15] == VP_ has joined #cis-a2k
[20:16] <@Titodutta> It would be good if someone leads the contact process with Google
[20:16] <pavan> Can I?
[20:16] <@Titodutta> Yes, CIS can lead, we'll try to check.
[20:17] <@Titodutta> And we'll mark it as a task from our side
[20:17] <pavan> Okay
[20:17] <Bodhisattwa> Great, looking forward
[20:17] <Lahariyaniyathi> @Bodhi: I will request Ravi to speak to Google team as he is already engaged with them in another project. In the meanwhile we will also try
[20:17] <Pavanaja> What is the status of IISc OCR?
[20:18] <Bodhisattwa> I told Ravi previously
[20:18] == AmritasyaPutra [~smuxi@wikipedia/AmritasyaPutra] has quit [Ping timeout: 264 seconds]
[20:18] <Bodhisattwa> @Pavanaja, IISc OCR is a secret project LOL
[20:19] <Lahariyaniyathi> @Pavanaja: We have run it only once and it turned out to be clunky with less accuracy...
[20:19] <Lahariyaniyathi> @Bodhi; :) :)
[20:19] == satdeep_ [uid265017@gateway/web/irccloud.com/x-dmrrmpudkargwqaj] has joined #cis-a2k
[20:19] <Bodhisattwa> We dont need it, Google OCR is the most accurate for Indic projects till date
[20:20] <Ananth> I agree with Bodhi
[20:20] <Pavanaja> What will happen if tomorrow Google decides to charge for using it?
[20:21] <Bodhisattwa> @Pavanaja, yes, there is a chance for that
[20:22] <Bodhisattwa> The only solution for this is to develop Tesseract OCR for the future, but it needs technical expertise
[20:23] <@Titodutta> An update from today's conference: the Wikisource OCR issue was discussed, and all support structures have agreed to give it high priority.
[20:23] <Pavanaja> I was told Malayalam works well with Tesseract
[20:23] <Bodhisattwa> I would be highly interested if someone gives us training to develop Tesseract OCR
[20:23] <@Titodutta> Us? You mean Indian Wikisource editors?
[20:24] <@Titodutta> Or community in west bengal?
[20:24] <Bodhisattwa> us, whoever
[20:24] <Ananth> we can talk to the Wikimedia Reading Engineering Director and try to get an OCR tool or something
[20:24] <@Titodutta> I personally don't know how much time and expertise is needed to develop OCR. I think it is quite difficult and time-consuming?
[20:25] <Bodhisattwa> yes, it is
[20:25] <Bodhisattwa> but it is a good thing to invest time
[20:25] == Gapu [67e4df60@gateway/web/freenode] has joined #cis-a2k
[20:26] <VP_> I tried some with Tesseract in 2012 or so. It was getting better. (Needs tremendous time to generate character profiles). But then gave it up halfway as reportedly Google was already seriously working with that. There wasn't much interest from peers.
[20:26] <Gapu> Sorry, there was an electricity issue here, so I joined late.
[20:26] <@Titodutta> One point discussed today in conference was to try to involve Wikimedia tasks in GSoC, Mozilla fellowship
[20:27] <Bodhisattwa> Sooner or later, Google will stop or charge for their OCR, so Tesseract will remain the only option for us
[20:27] <VP_> Tesseract can still be worked upon, only we need super-extra time to generate learning data for it.
[20:27] == Manav has quit [Ping timeout: 260 seconds]
[20:27] <pavan> +1 @Bodhi
[20:27] <Lahariyaniyathi> @Bodhi: Do you think it is a good idea if we organise something similar to Asaf's Wiki Yatra in the coming year exclusively for Wikisource?
[20:28] <Bodhisattwa> At least give us the training to develop Tesseract,
[20:28] <Bodhisattwa> @Tanveer, yes, that would be a good idea
[20:28] <Satdeep> +1 @Tanveer
[20:29] <@Titodutta> VP, what basic skills are needed to learn Tesseract?
[20:29] <Lahariyaniyathi> @Bodhi and @VP: Or organise intensive long term training for right Wikimedians with a super experienced Wikisource or Tesseract or OCR genius?
[20:29] <@Titodutta> programming language or other things?
[20:29] <VP_> Not much. Just some common sense and lots of patience.
[20:30] <VP_> There are already many utilities in the Tesseract collection folders. We can make use of them to ease the process.
[20:31] <VP_> Basically, we need to train each and every character as it appears in the text.
[20:31] <Ananth> If we get training on it we all can start working on it
[20:31] <@Titodutta> Let's start planning for such a workshop and see how it goes? (@bodhi, @vp)
[20:31] <Ananth> and build it as early as possible.
[20:31] <@Titodutta> What do you think
[20:31] <VP_> yes, we may.
[20:31] <Satdeep> +1 Tito
[20:31] <Bodhisattwa> @VP, is there any documentation? Please share here or on-wiki
[20:32] == Gopala has quit [Ping timeout: 260 seconds]
[20:32] <VP_> I just need to brush up my Tesseract grip.
[20:33] <prashasti> hi
[20:33] <VP_> https://opensource.google.com/projects/tesseract
[20:33] <Lahariyaniyathi> @VP: Chetta, can we count on you to conduct one specialised training on Tesseract by June 2018?
[20:34] <VP_> https://github.com/tesseract-ocr/tesseract
[20:35] <VP_> I think, yes you can. We can learn together. :-)
[20:35] <Bodhisattwa> the github documentation was not helpful previously for a novice like me.
[20:35] == Gopala has joined #cis-a2k
[20:36] <Lahariyaniyathi> @VP: Sooper, looking forward, I hope that Ananth from our team will co-ordinate and put the event together
[20:36] <Ananth> For sure
[20:36] <Ananth> Will do it
[20:36] == Titodutta changed the topic of #cis-a2k to: India IRC:Focused Project Area
[20:36] == Satdeep has quit [Ping timeout: 260 seconds]
[20:36] == satdeep_ has changed nick to satdeep
[20:37] <@Titodutta> Okay, let's discuss about Focused Project Tiger
[20:37] <@Titodutta> Earlier we talked about Focused Project Area
[20:37] <@Titodutta> **
[20:37] <VP_> I have created some live statistics on PT2018 progress.
[20:37] == Sat has joined #cis-a2k
[20:37] <@Titodutta> Where CIS-A2K would like to work with focused projects
[20:38] <@Titodutta> Any question on FPA in general?
[20:38] <pavan> @VP_ topic is FPA, Tito's mention about PT might be an auto correct
[20:39] <@Titodutta> Yes
[20:39] <KCVel> @Tito, Are you only having one FPA per year?
[20:40] <Lahariyaniyathi> @KCVel: As of now (2018-19) Yes.
[20:41] == Suyash has quit [Ping timeout: 260 seconds]
[20:41] == shrini-afk has changed nick to shrini
[20:42] <@Titodutta> Yes based on responses and follow-up discussion as of now there is one FPA
[20:42] <shrini> Hello all. Now only I came
[20:42] == Silent has quit [Quit: Page closed]
[20:42] <VP_> You can see it here: https://docs.google.com/spreadsheets/d/1WMkbCh2ZjSVu3oFVragRrNPz2rvgcCAzdI2X2AaZ1bk/edit?usp=sharing
[20:43] <VP_> OK.
[20:43] <@Titodutta> Looks very nice VP_
[20:44] <KCVel> That's wonderful
[20:44] == Drmundkur has quit [Ping timeout: 260 seconds]
[20:45] == Anoop-Rao [~androirc@wikia/vstf/Minato826] has quit [Remote host closed the connection]
[20:46] <KCVel> @Lahariyaniyathi Apart from the general support, what kind of support is given to other projects of same language of FPA?
[20:46] == barber1987 has quit [Ping timeout: 260 seconds]
[20:46] <Lahariyaniyathi> @Chat room: As the agenda points have been discussed, this IRC is open for other queries and discussions
[20:46] == Minato826 [~anoop@wikia/vstf/Minato826] has joined #cis-a2k
[20:47] == Minato826 has changed nick to Anoop-Rao
[20:47] <shrini> I am working on converting PDF to jpg for OCR4wikisource
[20:47] == Gapu has quit [Ping timeout: 260 seconds]
[20:47] <VP_> There are a few limitations on that (basically, the query would not update without manual intervention).
[20:47] <shrini> need to find a smooth way
[20:47] <KCVel> As Wiki projects are inter-dependent (just for an example, Wikisource might be in need of Wikimedia Commons etc.)
[20:47] <Lahariyaniyathi> @KC:Depends on Needs Assessment and Prog Design, open to all modes of collaboration. Which means design, implement and fund
[20:47] <shrini> the current methods with gs and ImageMagick kill the laptop
[20:47] <shrini> gs seems impressive
[20:48] <shrini> will give an update by next week
[20:48] <shrini> https://printalert.wordpress.com/2014/04/28/training-tesseract-ocr-for-tamil/
[20:48] <shrini> here is a document on training Tesseract
[20:48] == Pavanaja has quit [Remote host closed the connection]
[20:48] <Ananth> FYI - https://meta.m.wikimedia.org/wiki/2017_Community_Wishlist_Survey/Wikisource/Improve_workflow_for_uploading_books_to_Wikisource
[20:48] <shrini> my friend bala wrote it few years ago
[20:49] == Pavanaja has joined #cis-a2k
[20:49] <KCVel> Thanks @Ananth
[20:50] == ravidreams has joined #cis-a2k
[20:50] <KCVel> @Lahariyaniyathi Any update on Wiki Technical Training?
[20:50] <Bodhisattwa> @Ananth, people are already working on that proposal
[20:51] <@Titodutta> A clear plan was not made, and it is postponed
[20:51] <Bodhisattwa> Only thing which needs to be developed here is BUB
[20:51] <@Titodutta> We apologise for the delay.
[20:51] <Ananth> @Bodhi i just wanted to inform community about it.
[20:52] <@Titodutta> We'll start a separate discussion on it by the end of the month and inform you
[20:53] <@Titodutta> (WTT, I mean)
[20:53] <Bodhisattwa> @Tito, eager to know what happened in the Wikimedia meetup in Bangalore as we couldnt send our representation there
[20:54] <shrini> http://freetamilebooks.com/authors/nithyaduraisamy/
[20:54] <shrini> here we published a few books under the CC-BY-SA license
[20:54] <Sat> Unedited version of today's discussion: https://etherpad.wikimedia.org/p/wikiconindia2018
[20:54] <shrini> the books have lot of images
[20:55] <shrini> how to send these to wikisource?
[20:56] == Minato826 has joined #cis-a2k
[20:56] <shrini> is there any tool that converts a ODT file to wikisource?
[20:56] == Minato826 has quit [Changing host]
[20:56] == Minato826 has joined #cis-a2k
[20:56] == Minato826 has changed nick to Anoop
[20:56] == Anoop has changed nick to anooprao
[20:58] <@Titodutta> I don't know Shrini, sorry :(
[20:58] <shrini> okey
[20:58] <shrini> Please check any of the books
[20:58] == Pooja has joined #cis-a2k
[20:58] == Drmundkur has joined #cis-a2k
[20:58] <shrini> and tell, is it okay to store all the images in Commons?
[20:58] <shrini> are screenshots accepted in Commons?
[20:59] <shrini> if they are allowed, then I can work on writing a tool for epub to wikisource
[20:59] <shrini> we have many books being added there under the CC-BY-SA license
[21:00] == Anoop-Rao [~anoop@wikia/vstf/Minato826] has quit [Read error: Connection reset by peer]
[21:00] <KCVel> @Shrini, if there is no copyright or it has expired, it can be uploaded to Commons?
[21:01] <shrini> those are computer science books
[21:01] <Ananth> @Shrini it can be uploaded to Commons if we don't have any copyright issues.
[21:01] <shrini> okey
[21:01] <shrini> will do then
[21:01] <KCVel> Apart from that, I would like to know whether information obtained through RTI can be uploaded to Commons, and then Wikisource, and eventually used on Wikipedia as a reference? @Ananth @Bodhi?
[21:02] <KCVel> Is there any procedure regarding this?
[21:02] <@Titodutta> Not all the information from RTI is in the public domain
[21:03] == has quit [Ping timeout: 260 seconds]
[21:03] == has joined #cis-a2k
[21:03] <Ananth> @Tito i agree
[21:03] <enwnbot> No chat_id set! Add me to a Telegram group and say hi so I can find your group's chat_id!
[21:03] <Bodhisattwa> @Shrini, there is a tool to match text files to scanned images, if that is what you are asking for
[21:03] <enwnbot> No chat_id set! Add me to a Telegram group and say hi so I can find your group's chat_id!
[21:03] <Bodhisattwa> https://wikisource.org/wiki/MediaWiki:MatchSplit.js
[21:03] <enwnbot> No chat_id set! Add me to a Telegram group and say hi so I can find your group's chat_id!
[21:03] <enwnbot> <acagastya> Test.
[21:03] <shrini> will check this tool
[21:04] <enwnbot> <acagastya> (Sorry for that, people) technical fix.
[21:04] <Bodhisattwa> somehow it didnt work for us in Bengali
[21:05] == Pavanaja has quit [Quit: AndroIRC - Android IRC Client ( http://www.androirc.com )]
[21:05] <pavan> @Bodhi Thanks for the info. It will also help us, in Tewiki, if I understand the function of the tool correctly
[21:05] <pavan> Will try it out
[21:06] <Bodhisattwa> let me know, if it works for te and ta
[21:06] <pavan> Okay
[21:07] == anooprao has changed nick to Anoop-Rao
[21:08] <Lahariyaniyathi> @everyone: apologies for self promotion: this is our APG proposal link https://meta.wikimedia.org/wiki/Grants:APG/Proposals/2017-2018_round_2/The_Centre_for_Internet_and_Society/Proposal_form please provide feedback
[21:09] <@Titodutta> Any other question or comment?
[21:09] <@Titodutta> on any Wiki-topic?
[21:09] == gaot has quit [Ping timeout: 260 seconds]
[21:10] == Gopala has quit [Ping timeout: 260 seconds]
[21:10] <@Titodutta> Alright, I think these are the takeaways
[21:10] <@Titodutta> Takeaways from this IRC: a) CIS-A2K will try to check with Ravi and Google independently on the OCR status and issue. b) Tesseract workshop: planning and discussion about the workshop starts; inform communities. c) Wiki Technical Training: WTT discussion starts at the end of the month. Will inform about the discussion and invite suggestions on the procedure. d) Next IRC will be on a Saturday/Sunday evening (based on feedback).
[21:11] <@Titodutta> Thanks everyone for joining the IRC
[21:11] <Anoop-Rao> 👋👌
[21:12] <prashasti> thanks
[21:12] <KCVel> Thanks
[21:12] <pavan> Thanks
[21:12] == Bodhisattwa has quit [Quit: Page closed]
[21:12] <Lahariyaniyathi> @everyone: Thank you, have a nice weekend:)
[21:13] <Sat> Thank you everyone!
[21:13] == pavan has quit [Quit: pavan]
[21:13] == Sat has quit [Quit: Page closed]
[21:13] == Lahariyaniyathi has quit [Quit: Page closed]
[21:13] == Titodutta has quit [Quit: Page closed]
[21:14] <shrini> thanks all
[21:14] == ravidreams has quit [Quit: Page closed]