Requests for comment/Resolve massive copyright infringement on Wiktionary in Esperanto

This is a subpage; for more information, see the Requests for comments page.


Announcements

Rationale

Robin van der Vliet reported in a discussion in the Kompetuko channel on Telegram that the Esperanto version of Wiktionary is full of material illegally imported from the Plena Ilustrita Vortaro de Esperanto (PIV) and Reta Vortaro (ReVo).

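Here is the original statement (in Esperanto), followed by an English translation:

Sed mi persone ne tre rekomendas, la kvalito de tiu projekto estas vere malbona, kaj en ĝi nun ankaŭ troviĝas multaj artikoloj kontraŭleĝe kopiitaj de PIV/ReVo. Mi mem jam plene rezignis pri ĝi, mi pensas, ke estas pli bone fokusiĝi je la plibonigado de ReVo

But I personally do not really recommend it: the quality of that project is really bad, and it now also contains many articles illegally copied from PIV/ReVo. I myself have already completely given up on it; I think it is better to focus on improving ReVo.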

Additional information was also provided, confirming that unlawful contributions are made on a regular basis.

Some action needs to be taken, such as:

  • removing the corresponding material
  • evaluating the possibility of regularizing the situation by requesting that the original works be licensed under a compatible license, and updating the pages involved to give proper credit

In any case, first steps can be:

  • list the material suspected of infringement
  • involve the Esperanto Wiktionary community to
    • see what can be done together
    • assess the actual state of the situation
    • see what we should do, if any action indeed needs to be taken

Indeed, so far this issue rests only on the suspicion of copyright infringement raised by Robin's input. It should also be taken into account that some of the material ReVo used in its initial release (1997) is explicitly taken from Plena Vortaro (PV), which was already in the public domain by then. ReVo itself is under the GPL.

PIV also derives from PV, with a first release in 1970. For now it is released under full copyright.

Following is a list of some examples of copyright infringement (at the time of the report), as opposed to material that was legally taken from PV and also appears in PIV and ReVo because they share this common ancestor. Thanks to Taylor, who reported these examples and has already taken care of many other similar issues.

That said, even if there were no copyright infringement and all copied material came from the public domain PV, the work should be properly credited on each page that uses it. This is partly a matter of complying with the law, as PV was published in France, where the droit d'auteur includes a right of attribution with no time limit, even for public domain works. But the wish to give credit to authors, and to let our audience trace information sources, should be reason enough: it all comes down to respect for human dignity, which is hopefully still a core value of our community.

Discussion

Making contributors more aware of Wikimedia requirements and respect for copyright

Taylor reported in the Phabricator ticket:

Unfortunately, we have several contributors (many of them sysops or bureaucrats) involved in copying large amounts of content from copyrighted online dictionaries to wiktionaries. Most likely they are not aware that this is copyright infringement. Some of them argue with "free for educational use".

This is clearly something which should be fixed. Whatever each individual's opinion of copyright, our line of conduct within the scope of Wikimedia projects is not to infringe it.

A first step we could take is to list all the people who were involved in unlawful contributions and send them a clear message that their future contributions must never again include bare copy/paste of copyrighted material (except under the conditions permitted by law, of course). It would probably be good to invite them all to contribute to the present RFC.

Detecting copyright infringement and getting rid of it in the most economical manner regarding human resources

Here are some avenues for reflection on this point.

A first question is when the check can be done. Ideally, the sooner the better, of course. At publishing time, some hook could trigger a check that the content is not most likely a copy/paste from PIV/ReVo (minus PV), and refuse to save the contribution if the check is positive. That seems "not impossible" but probably somewhat more complex to put in place. It also makes false positives a more critical concern, since they would make legitimate contributions impossible.

Another way to deal with that in a mostly automatic way would be to use bots to perform the bulk of the work.

In any case, it seems difficult to prevent this regular copyright infringement without using the copyrighted material itself. For ReVo this is less problematic: as it is GPL, no legal issue prevents keeping a local copy to check the Vikivortaro material against. For PIV it is trickier: I'm not aware of any way to legally acquire a full digital copy, even to keep in a non-shared space in order to perform the control steps discussed here.
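To make the comparison step above more concrete, here is a minimal sketch in Python of how a hook or a bot might score a candidate definition. It is only an illustration under stated assumptions: it supposes that plain-text entries of the protected source (for example a local ReVo copy) and of PV are available as lists of strings, and the function names, the 4-word window and the 0.5 threshold are all hypothetical choices, not an existing tool. Word sequences that also occur in PV are discarded first, so that material legitimately inherited from PV does not count as a match.

```python
import re
from typing import Iterable, Set, Tuple

Shingle = Tuple[str, ...]

# Hypothetical threshold: share of matching word sequences above which
# a page would be flagged for human review (an assumption, to be tuned).
SUSPICION_THRESHOLD = 0.5


def shingles(text: str, n: int = 4) -> Set[Shingle]:
    """Return the set of lowercase word n-grams found in `text`."""
    # \w matches Unicode letters, so Esperanto letters like ĉ or ĝ are kept.
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def all_shingles(texts: Iterable[str], n: int = 4) -> Set[Shingle]:
    """Collect the n-grams of every text in a corpus into one set."""
    result: Set[Shingle] = set()
    for text in texts:
        result |= shingles(text, n)
    return result


def copied_fraction(candidate: str,
                    protected: Iterable[str],
                    public_domain: Iterable[str],
                    n: int = 4) -> float:
    """Fraction of the candidate's n-grams found in the protected corpus,
    after discarding n-grams that also occur in the public domain corpus."""
    remaining = shingles(candidate, n) - all_shingles(public_domain, n)
    if not remaining:
        # Too short, or entirely covered by public domain material.
        return 0.0
    return len(remaining & all_shingles(protected, n)) / len(remaining)


def is_suspicious(candidate: str,
                  protected: Iterable[str],
                  public_domain: Iterable[str]) -> bool:
    """Flag the candidate when enough of it matches the protected corpus."""
    score = copied_fraction(candidate, protected, public_domain)
    return score >= SUSPICION_THRESHOLD
```

A bot would more plausibly label flagged pages for human review than delete anything, which limits the cost of false positives raised above. In a real tool the corpora would be indexed once rather than rescanned for every page.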

I know the Wikimedia community already has many tools to help with patrolling, but I am insufficiently knowledgeable on this topic, and feedback from people with more expertise on this point would be highly valuable.

Possible useful resources:

Psychoslave (talk) 18:31, 21 May 2021 (UTC)

Roadmap

  • Deal with contributors who already committed copyright infringement
    • list them
    • prepare and send a message informing them that copyright infringement is unwelcome in Wikimedia spaces, providing recommendations on what contributions are welcome, and recalling the possible consequences if they fail to adopt a behavior that matches Wikimedia policies
    • invite them to participate in this RFC
  • Deal with existing copyright infringement
  • Deal with future attempts at copyright infringement
  • Establish the current state of affairs, especially regarding the administrators and bureaucrats involved (e.g. is Pablo still admin and abusing his privileges to enforce unlawful contributions?)
    • Possibly, find a committee to approach and file a request to remove technical privileges from people who abuse them and who ban those striving to act according to Wikimedia principles (i.e. acting legally)
  • Transcribe Plena Vortaro on Wikisource
    • upload a PDF on Commons
    • create the corresponding work on Wikisource
    • evaluate how well the available OCR performs on the PDF at hand; if that proves unsatisfactory, possibly try to find another digital copy of the work
    • proofread
    • integrate PV material that is relevant and not yet present into Vikivortaro
    • extract PV data in a way usable for dealing with false positive cases of copyright infringement (see next point)
  • Create an automation tool to deal with copyright infringement
    • state of the art: what are other projects doing to deal with this kind of issue?
    • specify more precisely requirements for an automated solution
    • implement something that meets the previous requirements (bot/hook)

Misc

I do not oppose hanging of -eo- wiktionary if convicted of piracy (which is very likely to happen given that I know the case details better than anyone else, and they are not pleasing at all), but other wiktionaries with the same problems must get the same punishment. WMF has been reluctant to address this problem for too long. It should not have taken so long. The involved sysops and bureaucrats should have been warned, and if needed desysopped, debureaucratized or banned many years ago. I would welcome creating a bot detecting and labelling suspicious pages (on all problematic wiktionaries), but I lack resources to create it myself. As for the "PV" dictionary (specific to -eo- wiktionary), the problem is to get plain text. So far I have seen only a bloated pseudo-PDF serving as a container for raster images, not readable by any bot. Note that almost the same issue was discussed 2½ years ago (Requests for comment/Administrator abuse on the EO Wiktionary) with very little interest from the community and no result. Taylor 49 (talk) 17:16, 21 May 2021 (UTC)

Thank you Taylor for all the effort you have already put into this issue.
If that proves the only practical way to deal with this issue, I won't be against a "let's reset it to a blank slate", like Wikiquote went through at some point. But if that can be avoided, it would certainly be better. I see that Jimbo_Wales indeed never replied to your post. So we should possibly look at some of the numerous international committees we have by now; surely there is one whose description includes dealing with that kind of situation.
Regarding PV, we might try to pass it through some OCR. Actually, publishing it on Wikisource would be a good action to perform per se anyway.
Can you give quick feedback on the current state: is Pablo still admin on the wiki?
Also as a side note, I wouldn't use "punishment" for any legitimate action in our community: I don't adhere to this kind of "teaching by hurting" philosophy. Some actions might hurt some people, but to my mind that should never be a deliberate aim, just a regrettable side effect of not being able to deal with the situation in a more staid way. Psychoslave (talk) 18:03, 21 May 2021 (UTC)
Yes, technically Pablo is still, and permanently, a sysop, whereas my sysop right is limited to one year. If the wiki is to be nuked, then I would like to know it in advance in order to save the work I have put into it. Taylor 49 (talk) 18:28, 21 May 2021 (UTC)
To be clear, nuking the wiki is far from a solution I would favor. I'm sure there is also plenty of legitimate content mixed into the wiki, isn't there? We still have plenty of room to find other, less extreme solutions. So in any case it is not something that should be put into action tomorrow, or in the coming weeks. Psychoslave (talk) 18:40, 21 May 2021 (UTC)
Do we know if Pablo Escobar speaks English?
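Pablo, ĉi tiu paĝo parolas pri vi kaj viaj agoj en la vikivortaro rilate al respekto de kopirajto. Ĉu vi interesas partopreni en la diskuto? Se jes, ĉu vi bezonas helpon por la traduko? (In English: Pablo, this page talks about you and your actions on Vikivortaro with regard to respect for copyright. Are you interested in taking part in the discussion? If so, do you need help with translation?) Psychoslave (talk) 18:54, 21 May 2021 (UTC)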
@Taylor 49 can you provide us a link to the PDF or upload it to Commons so we can start the Wikisource transcription? Psychoslave (talk) 15:39, 22 May 2021 (UTC)
PV can be downloaded here: " https://app.box.com/v/PlenaVortaro ". The file contains not only the original main part from 1930, but also the supplement from 1954. I do not know whether the supplement is public domain too. Taylor 49 (talk) 17:10, 22 May 2021 (UTC)
Thanks for this very informative feedback. Gaston Waringhien wrote the supplement, from what I read in the book, and he died in 1991. More importantly, he wrote the initial PV, from what I understand. So I am rather surprised that any part of the PV might be in public domain. Psychoslave (talk) 18:41, 22 May 2021 (UTC)
Now, reading eo:Plena Vortaro it seems that Émile Boirac wrote the original edition, not Gaston. There seems to be some attribution confusion going on. Psychoslave (talk) 18:45, 22 May 2021 (UTC)
The PDF file mentions 4 authors of the core part on page 3, and G.W. is one of them (not the boss). On page 3 and page 511 G.W. is claimed to be the only author of the supplement, and the year is 1953. Maybe we should contact SAT and ask. Taylor 49 (talk) 22:04, 22 May 2021 (UTC)
Do you have a contact entry? Psychoslave (talk) 21:49, 25 May 2021 (UTC)
Hi. If the same rules as for the French Wikisource are applied, we need to wait 70 years after the death of the last author. So if it’s G. W., we need to wait until 2062. I think it’s better not to expect anything from this. I’m not admin anymore, so I suppose that Taylor is left alone with Pablo. But I know that Pablo is not really cooperative.
I think our two big problems are the massive copyright infringement and the lack of a community. I don’t know a way to identify which pages are not good, so I think the best solution is nuking them all. At least, it would let us start from a blank page. Whatever the choice will be, if we want to become one of the reference dictionaries in Esperanto, we have a lot of work. Lepticed7 (talk) 21:02, 26 May 2021 (UTC)
https://satesperanto.org/ scroll down for contact information. Taylor 49 (talk) 22:16, 29 May 2021 (UTC)

Proposals

Notes and references