Community Wishlist Survey 2022/Citations/Automatic duplicate citation finder

Automatic duplicate citation finder

  • Problem: I'm a fairly heavy editor, but I mainly do smaller copy edits (capitalization, metric units, en dashes), and fairly often I hoist content into a lead that fails to summarize. Only a small percentage of my edits add new material with citations, so I haven't gone hardcore on citation tools. But what surprises me in the raw, out-of-the-box edit window is that you can add a citation with a known URL, and when you press "submit" for your partially completed citation, it never says, "hey, someone else on this or another page entered a citation with the same title or URL; would you like to crib some of those fields?" Citation is supposed to be a default activity on Wikipedia, like breathing. So it strikes me that I shouldn't have to install something or activate a special/fancy/cozy/streamlined edit mode (fie to all of them) to get basic assistance in not duplicating prior work.
  • Proposed solution: When an edit is previewed, the URL and/or title of incomplete citation templates are automatically checked for duplicate citations on the same page or on other pages. (There could also be a dedicated button to preview only the citations.)
  • Who would benefit: Anyone who wants to add cited material but isn't already an expert in the citation system.
  • More comments: I don't want to create another ticket for this, but how annoying it is to reuse an existing citation (from the same article) while changing only the quotation or page number fields is clearly a barrier to entry and a self-evident paper cut. In my own editing, 90% of the time I learn about resources by noticing that others have used them, and that's how my bag of tricks expands over time; only rarely do I deliberately dive into the documentation pages. If an easy way to reuse a citation while changing only the page number exists, I sure haven't seen much evidence of other editors using it in the thousands of pages I visit in a typical year. This is another aspect of citation that should be as painless as breathing. Also, when editing a section in which some named citations won't resolve (because they are defined outside the section), would it be crazy to offer a button to *really* preview the edited section in the context of the whole page (as found when clicking "edit" on the current section heading)? (The section edit URL would somehow need to capture the source page ID to make this work.)
  • Phabricator tickets:
  • Proposer: MaxEnt (talk) 04:30, 11 January 2022 (UTC)
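A rough idea of the single-page part of the proposed check: collect the url= and title= values from every <ref>, normalize them lightly, and report any value that appears in more than one reference. This is a minimal Python sketch, not how MediaWiki or any existing tool works. It assumes citations are written as <ref>{{cite ... |url=... |title=...}}</ref>, it uses regular expressions rather than a real wikitext parser (so nested templates or pipes inside values are not handled), and find_duplicate_citations and normalize are hypothetical names. Checking other pages as well would additionally need some kind of search index.

```python
import re
from collections import defaultdict

# Body of each <ref ...>...</ref> pair. Reuse tags such as <ref name="x" />
# have no body and are skipped. (Regex-based: an assumption, not a real parser.)
REF_RE = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
# |url= and |title= parameters of a cite template (values stop at the next | or }).
PARAM_RE = re.compile(r"\|\s*(url|title)\s*=\s*([^|}]*)", re.IGNORECASE)


def normalize(value: str) -> str:
    """Trim whitespace and a trailing slash, and lowercase, so trivial variants collide."""
    return value.strip().rstrip("/").lower()


def find_duplicate_citations(wikitext: str) -> dict:
    """Map each URL or title found in two or more <ref>s to those refs' 0-based positions."""
    seen = defaultdict(list)
    for index, match in enumerate(REF_RE.finditer(wikitext)):
        keys = set()  # a ref counts once per key even if a value repeats inside it
        for name, value in PARAM_RE.findall(match.group(1)):
            value = normalize(value)
            if value:
                keys.add(f"{name.lower()}:{value}")
        for key in keys:
            seen[key].append(index)
    # Sorted so the report order is stable.
    return {key: refs for key, refs in sorted(seen.items()) if len(refs) > 1}


page = """Fact one.<ref>{{cite web |url=https://example.org/report/ |title=Annual Report}}</ref>
Fact two.<ref>{{cite web |url=https://example.org/report |title=Annual report |page=12}}</ref>
Fact three.<ref name="other">{{cite book |title=Another Book}}</ref>"""
print(find_duplicate_citations(page))
# {'title:annual report': [0, 1], 'url:https://example.org/report': [0, 1]}
```

Both references to the report are flagged even though the second one adds |page=12, which is the "reuse with a different page number" case from the comments above; a preview could offer to turn the second one into a reuse of the first instead of silently accepting a duplicate.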


  • I also get irritated when I see that there are several independent references to the same work. If the references use the same URL, ISBN or DOI, they seem easy to recognize as duplicates. Otherwise they can be hard to identify.
    I think there is a way to produce references to the same work that differ only in the page number, but it is obscure enough that I don't remember how to find it. --Error (talk) 18:22, 11 January 2022 (UTC)
    @Error See WMDE Technical Wishes/Book referencing. This has been one of the top items on wishlist surveys for the past 10 years, but was abandoned by the Technical Wishes team in July 2021. See also phab:T100645. -- Ahecht (TALK) 23:48, 11 January 2022 (UTC)
  • Hack to find duplicates: open the wikitext in a text editor, insert a newline before every "http", then sort the lines, so that identical URLs end up next to each other. 0mtwb9gd5wx (talk) 10:13, 12 January 2022 (UTC)
  • Yes, please! I have been using reFill to fix this for ages, and it works pretty well. I think AWB might do it automatically too, but I'm not as familiar with that. It might be possible for VisualEditor to fix this automatically on publish; currently it can be done manually by copying and pasting the same citation. It would also be nice to see a citation tool that can automate bibliography-style citations. Asukite (talk) 16:03, 12 January 2022 (UTC)
  • It "had" worked well, until the developer forced everyone to use only an unstable version, then stopped development and abandoned the project after some interactions with users with higher privileges. 0mtwb9gd5wx (talk) 16:40, 12 January 2022 (UTC)
  • w:User:Kaniivel/Reference Organizer might be partly helpful. User:1234qwer1234qwer4 (talk) 10:24, 14 January 2022 (UTC)
  • Note that there may be nearly identical entries that differ only in page, section, etc. These should be handled too.—Hfst (talk) 06:54, 20 January 2022 (UTC)
    Yep, that's something that makes this a little trickier than it might at first seem. But it's still worthwhile, I think. Sdkb (talk) 19:13, 28 January 2022 (UTC)
  • Why is this a problem that needs to be addressed? Say a page has:
    Here is some text ... here is some more <ref>ReferenceA</ref> ... maybe a few paragraphs of text ... some more text <ref>That SAME ReferenceA</ref>...more text...more text.
  • I'm missing why having a "duplicate" full reference is a problem. Specifically because of this use-case: I edit that page and just delete the first referenced text along with the reference attached to it. If the second instance was just some sort of pointer to the first, now the second statement that I, the editor, didn't even see has a broken reference. @Ahecht: can you explain a bit more? — xaosflux Talk 16:22, 2 February 2022 (UTC)
    @Xaosflux Not sure why you pinged me on this, as I neither created nor supported this proposal (yet). That said, the issue is that it clogs up the references section, making it longer than necessary and making it difficult to "browse" the references or assess notability. As for your use case, at least on enwiki, en:User:AnomieBOT/source/tasks/ will automatically search for and fix those broken references. -- Ahecht (TALK) 16:38, 2 February 2022 (UTC)
    @Ahecht: oops, bad copy and paste from above, that was meant for @MaxEnt: - but thanks for the input, feel free to stop replying to this. I don't think it would be a good idea to implement a software feature that may lead to a situation dependent on other editors (even if via bots) to clean up orphaned reference labels (where the immediate effect in that use case is that readers will have no reference provided). — xaosflux Talk 16:50, 2 February 2022 (UTC)
  • The logic to do this already seems to exist in AWB (for the same page), as AWB correlates duplicates. Neils51 (talk) 10:57, 3 February 2022 (UTC)
  • Maybe this could be added to VisualEditor's business logic. --Valerio Bozzolan (talk) 14:45, 11 February 2022 (UTC)
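Several comments above describe how duplicates could be recognized (a shared URL, ISBN or DOI), with the complication that entries may differ only in page, section, etc. The following is a minimal Python sketch of that grouping, not the logic used by AWB, reFill or any other existing tool. It assumes {{cite ...}} templates parsed with regular expressions rather than a real wikitext parser; the identifier priority (DOI, then ISBN, then URL), the set of "locator" parameters and the function names parse_params, work_key and classify are all illustrative assumptions.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical list of parameters that only locate a passage inside a work.
LOCATOR_PARAMS = {"page", "pages", "at", "loc", "quote"}
# Named parameters of a cite template (values stop at the next | or }).
PARAM_RE = re.compile(r"\|\s*([\w-]+)\s*=\s*([^|}]*)")


def parse_params(citation: str) -> dict:
    """Return the named parameters of one {{cite ...}} template, with lowercased names."""
    return {name.lower(): value.strip() for name, value in PARAM_RE.findall(citation)}


def work_key(params: dict) -> Optional[str]:
    """Pick the strongest identifier present: DOI, then ISBN, then URL."""
    if params.get("doi"):
        return "doi:" + params["doi"].lower()
    if params.get("isbn"):
        # Drop hyphens and spaces so 978-0-... and 9780... compare equal.
        return "isbn:" + re.sub(r"[^0-9X]", "", params["isbn"].upper())
    if params.get("url"):
        return "url:" + params["url"].rstrip("/").lower()
    return None


def classify(citations: list) -> dict:
    """Label every normalized identifier shared by two or more citations."""
    groups = defaultdict(list)
    for citation in citations:
        params = parse_params(citation)
        key = work_key(params)
        if key:
            groups[key].append(params)

    report = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue
        id_field = key.split(":", 1)[0]  # "doi", "isbn" or "url"
        # The identifier already matched after normalization, so compare the other fields.
        rest = {tuple(sorted((k, v) for k, v in m.items() if k != id_field)) for m in members}
        core = {tuple(sorted((k, v) for k, v in m.items()
                             if k != id_field and k not in LOCATOR_PARAMS)) for m in members}
        if len(rest) == 1:
            report[key] = "duplicate"
        elif len(core) == 1:
            report[key] = "same work, different locator"
        else:
            report[key] = "same identifier, other fields differ"
    return report


refs = [
    "{{cite book |title=Example Book |isbn=978-0-306-40615-7 |page=12}}",
    "{{cite book |title=Example Book |isbn=9780306406157 |page=40}}",
    "{{cite journal |title=A Paper |doi=10.1000/XYZ123}}",
    "{{cite journal |title=A Paper |doi=10.1000/xyz123}}",
]
print(classify(refs))
# {'isbn:9780306406157': 'same work, different locator', 'doi:10.1000/xyz123': 'duplicate'}
```

Separating "duplicate" from "same work, different locator" is what lets a tool treat Hfst's case differently: exact duplicates could simply be merged, while page-only variants would need a way to reuse one citation with a different page number rather than being collapsed.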