Community Wishlist Survey 2021/Archive/pre-check for plagiarism

pre-check for plagiarism

 N Proposes existing solution

  • Problem: sometimes a good faith edit turns out to be plagiarism (e.g. not properly cited); it would be nice to know this before publishing...
  • Who would benefit: good faith editors
  • Proposed solution: a tool that allows for this check (possibly even automatic)
  • More comments:
  • Phabricator tickets:
  • Proposer: Fintor (talk) 07:14, 17 November 2020 (UTC)

Discussion

  • This would be great, but I am pretty sure it is hard to do (and may not be technically viable for many reasons). We do have a post-edit mechanism available today in CopyPatrol, though maybe it's not well advertised? (See also phab:T141379 and the CopyPatrol project.) --Izno (talk) 04:57, 18 November 2020 (UTC)
  • @Fintor: Try the Copyvios tool. This has to be used after publishing, but seemingly any solution would (also I imagine you as the editor would know ahead of time if you're plagiarizing?). As Izno points out there is also CopyPatrol for patrollers to review. Do any of these solutions work for you? MusikAnimal (WMF) (talk) 22:45, 19 November 2020 (UTC)
    I think we could probably investigate deeper into the en.wiki proposal of natively supporting TurnItIn in Wikipedia by adding a report to talk pages that spots plagiarism instead of using the Copyvio tool, or maybe even let users run the page through the TurnItIn software before publishing. See w:en:WP:Turnitin; it was a proposal on en.wiki back in 2012. WikiAviator (talk) 13:21, 20 November 2020 (UTC)
  • Wouldn't an editor already know if their edit is plagiarized or not? And how would you expect this to function? Would it be an optional check that the editor opts into, or a check that is run every time an edit is submitted? Unfortunately, doing a check for plagiarism isn't very fast, so I don't imagine we would want to make every editor wait for it to complete before saving the edit. Kaldari (talk) 18:02, 20 November 2020 (UTC)
    The problem is not so much for the editor themselves, but for the next one. Recently I translated a page into another language; only later did I discover by accident that the previous editor had indeed committed plagiarism, so I implicitly inherited the previous editor's problem. Copyright checking is indeed CPU, I/O, and network intensive. Couldn't this be done offline by a bot that places a warning? Geert Van Pamel (WMBE) (talk) 18:43, 23 November 2020 (UTC)
    • Agree with Geert Van Pamel; undetected plagiarism in articles is a problem. Second-order, there's also plagiarism in sources. I once accidentally cited a source which turned out to be plagiarized from another publication, and the resulting bad-faith editing, apparently by the plagiarist, led to several domains being placed on the global blocklist (details). This faff wasted the time of half-a-dozen editors over the course of a couple of years. And a new editor might not know that copying text from another website is not OK; an automated message warning them and explaining how to insert the information without copyvio would be useful. I agree that this is likely to be too time-consuming as an edit filter, but perhaps it could be implemented as a bot targeting inexperienced editors making long edits? (suggestion independent of Geert Van Pamel, due to edit conflict) HLHJ (talk) 20:13, 23 November 2020 (UTC)
  • If we are talking about uncited text, not copyvio, then the appropriate response is a citation-needed tag; see Community Wishlist Survey 2021/Editing/Easier flagging. With the sole exception of BLP (biographies-of-living-people) material, uncited content is in fact allowed on Wikipedia; only if someone challenges it does it need to be cited or removed. And frivolous challenges (challenging things you know to be accurate) are heavily discouraged. There are good reasons for this policy; on non-controversial subjects, new-editor subject experts may be able to write reams of useful, encyclopedic content on a subject, but have no clue about citations or even sources at first. This can be a symbiotic relationship if experienced editors tag and gradually cite the text rather than just reverting the lot. The common misconception that anything uncited must be reverted is contributing to falling numbers of active editors; inline-tagging an edit engages the new editor and makes them more likely to stay, while reverting them makes them give up (my citations are here). HLHJ (talk) 20:13, 23 November 2020 (UTC)
  • @Fintor: We still haven't heard back from you. Do the existing solutions (Copyvios and CopyPatrol) work for you? I know you say "it would be nice to know this before publishing", but I'm afraid this isn't going to happen. Copyvio checks are very slow. My guess is the next best step is to write a bot to flag copyvios on-wiki after the fact. That's already on the CopyPatrol board at phab:T165951. Would you like to revise your proposal to be about this? MusikAnimal (WMF) (talk) 19:22, 24 November 2020 (UTC)