Grants talk:IdeaLab/Copyvio Detection

There is a subscription service called "Turnitin" which is being tested to scan Wikipedia articles. See en:Wikipedia:Turnitin. Blue Rasberry (talk) 12:22, 17 December 2014 (UTC)

The original copyvio detection done at nowiki is older than most attempts, and it can be adapted to the local language. Part of the problem with plagiarism detectors is that they rely on a black-box idea whereby nobody should know how they work, which makes it impossible to refute a claim made by such software. I think that is a flawed assumption. The method they use should be open, the underlying model should be open, and the business logic should be open.
There is one basic premise that must hold for copyvio detection, and that is "who made the claim first". Comparing text is not enough; you must know when each claim was made. If you wait too long to check the text (that is, you don't track the time for the texts), your claim will be flawed.
Then there is a secondary premise, and that is who made the information available. It is not only about form, which is an ordinary copyvio; it is also about the information contained in the text. When we express some information, we try to shape it according to rules given implicitly and explicitly by the context, and that can make two texts appear to be copyvios even if they are not.
Those two premises, given the black-box nature of the present copyvio detectors, make me think that we need something we can inspect ourselves. — Jeblad 11:00, 2 April 2015 (UTC)
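
To make the first premise above concrete, here is a minimal Python sketch of an inspectable check that combines text overlap with "who made the claim first". It is not the nowiki detector or any existing tool; the function names (`shingles`, `overlap`, `assess`), the word-shingle Jaccard measure, and the 0.5 threshold are all assumptions chosen for illustration. It only addresses the first premise (timing), not the second one about shared information forcing similar wording.

```python
from datetime import datetime, timezone


def shingles(text, n=3):
    """Return the set of n-word shingles in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=3):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def assess(wiki_text, wiki_time, other_text, other_time, threshold=0.5):
    """Classify a pair of texts using overlap and first-seen times.

    other_time may be None when no timestamp was recorded for the
    external text; in that case priority cannot be decided.
    The threshold is an arbitrary illustrative value.
    """
    if overlap(wiki_text, other_text) < threshold:
        return "no significant overlap"
    if other_time is None:
        return "overlap, but priority unknown"
    if other_time < wiki_time:
        return "possible copyvio: external text was seen first"
    return "possible reverse copy: wiki text was not seen later"


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    seen_2014 = datetime(2014, 1, 1, tzinfo=timezone.utc)
    seen_2015 = datetime(2015, 1, 1, tzinfo=timezone.utc)
    print(assess(text, seen_2015, text, seen_2014))
    print(assess(text, seen_2014, text, None))
```

The point of the sketch is that every step (the similarity measure, the threshold, and the timestamp comparison) is visible and can be disputed, and that a missing timestamp leads to an explicit "priority unknown" result instead of a copyvio claim.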

unavailable API

Regarding "the API it used is not available anymore", which API is that? John Vandenberg (talk) 14:40, 15 January 2015 (UTC)

It was the open Yahoo! API for searching the index. — Jeblad 10:45, 2 April 2015 (UTC)
