Community Wishlist Survey 2022/Citations/Automatic duplicate citation finder

Automatic duplicate citation finder

  • Problem: I'm a fairly heavy editor, but I mainly do smaller copy edits (capitalization, metric units, en dashes), and fairly often I hoist content into a lead that fails to summarize. Only a small percentage of my edits add new material with citations, so I haven't gone hardcore on citation tools. But what surprises me in the raw, out-of-the-box edit window is that you can add a citation with a known URL, and when you press "submit" for your partially completed citation, it never says, "Hey, someone else on this or another page entered a citation with the same title or URL; would you like to crib some of those fields?" Citation is supposed to be a default activity on Wikipedia, like breathing. So it strikes me that I shouldn't have to install something or activate a special/fancy/cozy/streamlined edit mode (fie to all of them) to get basic assistance in not duplicating prior work.
  • Proposed solution: When an edit is previewed, automatically check the URL and/or title of incomplete citation templates for duplicate citations on the same or other pages. (There could also be a dedicated button to preview citations only.)
  • Who would benefit: Anyone who wants to add cited material but isn't already an expert in the citation system.
  • More comments: I don't want to create another ticket for this, but reusing an existing citation from the same article while amending only the quotation or page-number fields is so annoying that it is clearly a barrier to entry and a self-evident paper cut. In my own editing, 90% of the time I learn about resources by noticing that others have used them, and that's how my bag of tricks expands over time; only rarely do I take a deliberate deep dive into the documentation pages. If an easy way to reuse a citation while amending only the page number exists, I sure haven't seen much evidence of other editors using it in the thousands of pages I visit in a typical year. This is another aspect of citation that should be as painless as breathing. Also, when editing a section in which some named citations won't resolve (because they are defined outside the section), would it be crazy to offer a button to *really* preview the edited section in the context of the whole page (as found when clicking "edit" on the current section heading)? (The section edit URL would somehow need to capture the source page ID to make this work.)
  • Phabricator tickets:
  • Proposer: MaxEnt (talk) 04:30, 11 January 2022 (UTC)
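A minimal Python sketch of the proposed check, for illustration only. It assumes each citation is a flat {{cite ...|key=value|...}} template with no nested templates and no "|" inside values; the function names (parse_citations, normalize_url, suggest_fields) and the matching rule (same normalized URL, or same title ignoring case) are hypothetical and not part of MediaWiki or any existing citation tool. Given a page's wikitext and a partially filled citation, it returns the fields of the first matching existing citation that the new one lacks, roughly the "would you like to crib some of those fields?" prompt. Checking other pages as well would need a wiki-wide search, which this sketch does not attempt.

```python
import re
from typing import Dict, List

# Hypothetical simplified parser: assumes flat {{cite ...}} templates, with no
# nested templates and no "|" inside parameter values.
CITE_RE = re.compile(r"\{\{\s*cite\s[^|}]*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def parse_citations(wikitext: str) -> List[Dict[str, str]]:
    """Return the named, non-empty parameters of each cite template, in page order."""
    citations = []
    for match in CITE_RE.finditer(wikitext):
        fields: Dict[str, str] = {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep and value.strip():
                fields[key.strip().lower()] = value.strip()
        citations.append(fields)
    return citations


def normalize_url(url: str) -> str:
    """Crude normalization: lowercase, drop the scheme, a leading 'www.' and a trailing '/'."""
    url = re.sub(r"^https?://", "", url.strip().lower())
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def suggest_fields(new_cite: Dict[str, str], wikitext: str) -> Dict[str, str]:
    """Fields of the first existing citation with the same URL or title that new_cite lacks."""
    for existing in parse_citations(wikitext):
        same_url = (
            "url" in new_cite
            and "url" in existing
            and normalize_url(new_cite["url"]) == normalize_url(existing["url"])
        )
        same_title = (
            "title" in new_cite
            and "title" in existing
            and new_cite["title"].casefold() == existing["title"].casefold()
        )
        if same_url or same_title:
            return {k: v for k, v in existing.items() if k not in new_cite}
    return {}


if __name__ == "__main__":
    page = (
        "Text.<ref>{{cite web |url=https://example.org/report/ "
        "|title=Annual Report |publisher=Example Org |access-date=2022-01-11}}</ref>"
    )
    print(suggest_fields({"url": "http://www.example.org/report"}, page))
    # prints {'title': 'Annual Report', 'publisher': 'Example Org', 'access-date': '2022-01-11'}
```

Matching on an exact (case-insensitive) title is deliberately strict; looser title matching would catch more duplicates but would also suggest fields from unrelated works.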

Discussion

  • I also get irritated when I see several independent references to the same work. If the references use the same URL, ISBN, or DOI, they seem easy to recognize as duplicates; otherwise they can be hard to identify.
    I think that there is a way to produce references to the same work that differ only in the page number, but it is difficult enough that I don't remember how to find it. --Error (talk) 18:22, 11 January 2022 (UTC)
    @Error See WMDE Technical Wishes/Book referencing. This has been a top item on wishlist surveys for the past 10 years, but was abandoned by the Technical Wishes team in July 2021. See also phab:T100645. -- Ahecht (TALK PAGE) 23:48, 11 January 2022 (UTC)
  • Hack to find duplicates: open the wikitext in a text editor, insert a newline before every "http", then sort the file so that identical URLs end up on adjacent lines. 0mtwb9gd5wx (talk) 10:13, 12 January 2022 (UTC)
  • Yes, please! I have been using reFill to fix this for ages, and it works pretty well. I think that AWB might do it automatically too, but I'm not as familiar with that. It might be possible for VisualEditor to fix this automatically on publish; for now it can be done manually by copying and pasting the same citation. It would be nice to see a citation tool that can automate bibliography-style citations as well. Asukite (talk) 16:03, 12 January 2022 (UTC)
  • https://tools.wmflabs.org/refill/ "had" worked well, until the developer forced en.wikipedia.org alone onto an unstable version, then stopped developing it and abandoned his work on wikipedia.org after some interaction with users with higher privileges. 0mtwb9gd5wx (talk) 16:40, 12 January 2022 (UTC)
  • w:User:Kaniivel/Reference Organizer might be partly helpful. User:1234qwer1234qwer4 (talk) 10:24, 14 January 2022 (UTC)
  • Take into account that there might be nearly identical entries that differ only in page, section, etc. These should be handled too.—Hfst (talk) 06:54, 20 January 2022 (UTC)
    Yep, that's something that makes this a little trickier than it might at first seem. But it's still worthwhile, I think. Sdkb (talk) 19:13, 28 January 2022 (UTC)
  • Why is this a problem that needs to be addressed? Say a page has:
    Here is some text ... here is some more <ref>ReferenceA</ref> ... maybe a few paragraphs of text ... some more text <ref>That SAME ReferenceA</ref>...more text...more text.
    I'm missing why having a "duplicate" full reference is a problem, specifically because of this use case: I edit that page and just delete the first referenced text along with the reference attached to it. If the second instance were just some sort of pointer to the first, the second statement, which I as the editor didn't even see, would now have a broken reference. @Ahecht: can you explain a bit more? — xaosflux Talk 16:22, 2 February 2022 (UTC)
    @Xaosflux Not sure why you pinged me on this, as I neither created nor supported this proposal (yet). That said, the issue is that it clogs up the references section, making it longer than necessary and making it difficult to "browse" the references or assess notability. Regarding your use case, at least on enwiki, en:User:AnomieBOT/source/tasks/OrphanReferenceFixer.pm will automatically search for and fix those broken references. -- Ahecht (TALK PAGE) 16:38, 2 February 2022 (UTC)
    @Ahecht: oops, bad copy and paste from above; that was meant for @MaxEnt:. But thanks for the input, and feel free to stop replying to this. I don't think it would be a good idea to implement a software feature that may lead to a situation that depends on other editors (even if via bots) to clean up orphaned reference labels (where the immediate effect in that use case is that readers get no reference at all). — xaosflux Talk 16:50, 2 February 2022 (UTC)
  • The logic to do this already appears to exist in AWB (for the same page), as AWB correlates duplicates. Neils51 (talk) 10:57, 3 February 2022 (UTC)
  • Maybe this could be added to the VisualEditor business logic. --Valerio Bozzolan (talk) 14:45, 11 February 2022 (UTC)
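The discussion above suggests two matching rules: references that share a URL, ISBN, or DOI are easy to recognize as the same work (the sort-by-"http" hack finds the URL case by hand), and entries that differ only in page or section should still count as the same work. The following is a minimal Python sketch of both, assuming each <ref>...</ref> body holds a single flat cite template. The regular expressions, the identifier order (DOI, then ISBN, then URL), the set of location parameters (page, pages, at, loc, quote) and all function names are hypothetical choices for illustration, not taken from AWB, reFill, or any other existing tool.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical patterns: a non-self-closing <ref ...>body</ref>, and "|name=value"
# parameters inside a flat template (no nested templates, no "|" in values).
REF_RE = re.compile(r"<ref\b[^>]*?(?<!/)>(.*?)</ref>", re.IGNORECASE | re.DOTALL)
PARAM_RE = re.compile(r"\|\s*([\w-]+)\s*=\s*([^|}]*)")

# Fields that locate a passage inside a work rather than identify the work.
LOCATION_PARAMS = {"page", "pages", "at", "loc", "quote"}


def parse_params(body: str) -> Dict[str, str]:
    """Named, non-empty template parameters in one ref body."""
    return {k.lower(): v.strip() for k, v in PARAM_RE.findall(body) if v.strip()}


def work_key(params: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Identify the cited work by DOI, ISBN, or URL, whichever is found first."""
    for name in ("doi", "isbn", "url"):
        value = params.get(name)
        if value:
            if name == "isbn":
                value = re.sub(r"[\s-]", "", value)  # 978-0-... and 9780... match
            return name, value.lower().rstrip("/")
    return None  # no identifier: too risky to match automatically


def group_duplicates(wikitext: str) -> Dict[Tuple[str, str], List[Dict[str, str]]]:
    """Map each work cited by more than one <ref> to the parameter sets citing it."""
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for body in REF_RE.findall(wikitext):
        params = parse_params(body)
        key = work_key(params)
        if key is not None:
            groups[key].append(params)
    return {key: cites for key, cites in groups.items() if len(cites) > 1}


def differs_only_in_location(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """True if two citations are equal once page/section-style fields are dropped."""
    def without_location(params: Dict[str, str]) -> Dict[str, str]:
        return {k: v for k, v in params.items() if k not in LOCATION_PARAMS}
    return without_location(a) == without_location(b)


if __name__ == "__main__":
    page = (
        "A.<ref>{{cite book |title=Example |isbn=978-0-306-40615-7 |page=12}}</ref> "
        "B.<ref>{{cite book |title=Example |isbn=978-0-306-40615-7 |page=40}}</ref> "
        "C.<ref>{{cite web |url=https://example.org/a/ |title=A}}</ref> "
        'D.<ref name="a" />'
    )
    for key, cites in group_duplicates(page).items():
        print(key, len(cites), differs_only_in_location(cites[0], cites[1]))
    # prints ('isbn', '9780306406157') 2 True
```

References without a DOI, ISBN, or URL are deliberately skipped, since matching them on title alone risks false positives; and, as noted in the voting below, two editions of a book with different ISBNs would still not be matched.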

Voting

  •   Support Bristledidiot (talk) 18:53, 28 January 2022 (UTC)
  •   Support One day, we'll centralize citations at Wikidata, rather than copying info for a work every time it's used. But until then, finding duplicates within the same article is at least a good start. Sdkb (talk) 19:14, 28 January 2022 (UTC)
  •   Support completely agree with the comment above, RobbieIanMorrison (talk) 19:42, 28 January 2022 (UTC)
  •   Support Miroslav Ličko (talk) 20:54, 28 January 2022 (UTC)
  •   Support Qwerfjkl (talk) 22:01, 28 January 2022 (UTC)
  •   Support SamuelInzunza (talk) 22:25, 28 January 2022 (UTC)
  •   Support EpicPupper (talk) 22:44, 28 January 2022 (UTC)
  •   Support — Draceane talkcontrib. 23:09, 28 January 2022 (UTC)
  •   Support Would love this Tr3ndyBEAR (talk) 00:14, 29 January 2022 (UTC)
  •   Support 5225C (talkcontributions) 00:58, 29 January 2022 (UTC)
  •   Support --𝑇𝑚𝑣 (𝑡𝑎𝑙𝑘) 01:19, 29 January 2022 (UTC)
  •   Support Betseg (talk) 02:03, 29 January 2022 (UTC)
  •   Support Shizhao (talk) 03:44, 29 January 2022 (UTC)
  •   Support SigTif (talk) 08:36, 29 January 2022 (UTC)
  •   Support - I can see that this would be tricky to implement, but it would be very nice to have. —Bruce1eetalk 08:43, 29 January 2022 (UTC)
  •   Support //Lollipoplollipoplollipop::talk 10:11, 29 January 2022 (UTC)
  •   Support Lion-hearted85 (talk) 10:57, 29 January 2022 (UTC)
  •   Support THainaut (talk) 10:57, 29 January 2022 (UTC)
  •   Support Terber (talk) 11:36, 29 January 2022 (UTC)
  •   Support Hemantha (talk) 12:17, 29 January 2022 (UTC)
  •   Support aokomoriuta (talk) 12:21, 29 January 2022 (UTC)
  •   Support SHEIKH (Talk) 12:43, 29 January 2022 (UTC)
  •   Support Aca (talk) 12:46, 29 January 2022 (UTC)
  •   Support ACortellari (talk) 14:08, 29 January 2022 (UTC)
  •   Support Mbrickn (talk) 15:36, 29 January 2022 (UTC)
  •   Support User-duck (talk) 18:04, 29 January 2022 (UTC)
  •   Support — Jules* Talk 18:21, 29 January 2022 (UTC)
  •   Support Wostr (talk) 19:33, 29 January 2022 (UTC)
  •   Support Femke (talk) 20:38, 29 January 2022 (UTC)
  •   Support Douglasfugazi (talk) 21:13, 29 January 2022 (UTC)
  •   Support Goombiis (talk) 22:19, 29 January 2022 (UTC)
  •   Support Tgr (talk) 23:34, 29 January 2022 (UTC)
  •   Support Nw520 (talk) 23:48, 29 January 2022 (UTC)
  •   Support Gusfriend (talk) 00:26, 30 January 2022 (UTC)
  •   Support Agus Damanik (talk) 01:59, 30 January 2022 (UTC)
  •   Support Ali Imran Awan (talk) 07:11, 30 January 2022 (UTC)
  •   Support TheInternetGnome (talk) 07:19, 30 January 2022 (UTC)
  •   Support Lectrician1 (talk) 07:23, 30 January 2022 (UTC)
  •   Support Thingofme (talk) 13:55, 30 January 2022 (UTC)
  •   Support Geraki TL 14:45, 30 January 2022 (UTC)
  •   Support Andrewredk (talk) 16:40, 30 January 2022 (UTC)
  •   Support Rusalkii (talk) 23:33, 30 January 2022 (UTC)
  •   Support BugWarp (talk) 02:31, 31 January 2022 (UTC)
  •   Support Lfstevens (talk) 06:06, 31 January 2022 (UTC)
  •   Support Qazwsx777 (talk) 09:37, 31 January 2022 (UTC)
  •   Support Nosebagbear (talk) 10:10, 31 January 2022 (UTC)
  •   Support β16 - (talk) 10:52, 31 January 2022 (UTC)
  •   Support FenyMufyd (talk) 11:46, 31 January 2022 (UTC)
  •   Support Hb2007 (talk) 14:08, 31 January 2022 (UTC)
  •   Support Havang(nl) (talk) 15:40, 31 January 2022 (UTC)
  •   Support Matma Rex (talk) 16:32, 31 January 2022 (UTC)
  •   Support Bencemac (talk) 18:02, 31 January 2022 (UTC)
  •   Support Daniel Case (talk) 18:07, 31 January 2022 (UTC)
  •   Support JAn Dudík (talk) 18:53, 31 January 2022 (UTC)
  •   Support IOIOI (talk) 20:41, 31 January 2022 (UTC)
  •   Support Dave Braunschweig (talk) 22:24, 31 January 2022 (UTC)
  •   Support Shooterwalker (talk) 22:30, 31 January 2022 (UTC)
  •   Support Normal Name (talk) 22:47, 31 January 2022 (UTC)
  •   Support DRiveraP (talk) 00:08, 1 February 2022 (UTC)
  •   Support Horza (talk) 10:34, 1 February 2022 (UTC)
  •   Support Duplicates happen to the best of us Diriector Doc (talk) 18:32, 1 February 2022 (UTC)
  •   Support It seems like this could be done relatively easily with structured data citations (i.e. hosted on Wikidata), and I support efforts to get us there. Silver hr (talk) 19:33, 1 February 2022 (UTC)
  •   Support MaxBE (talk) 22:05, 1 February 2022 (UTC)
  •   Support KingAntenor (talk) 06:04, 2 February 2022 (UTC)
  •   Support Max Semenik (talk) 07:58, 2 February 2022 (UTC)
  •   Support Kpjas (talk) 10:21, 2 February 2022 (UTC)
  •   Oppose per my note above: having a reference in the text multiple times can be a feature. Some options, like putting references in a shared repository, seem useful, but that isn't what this proposes as a solution. — xaosflux Talk 16:52, 2 February 2022 (UTC)
    Also, the proposed solution seems incomplete. It asks for software to "find" something, but then what? I think it is very important that we never discourage contributors from making an edit with a reference; even a notice like "Hey you, your reference is a duplicate!" could lead them to just abandon their edit. — xaosflux Talk 16:55, 2 February 2022 (UTC)
    @Xaosflux: Cite web already displays a message on preview when there is an error, and I don't think that is scaring anyone off. Jochem van Hees (talk) 17:17, 3 February 2022 (UTC)
  •   Support Rdrozd (talk) 17:57, 2 February 2022 (UTC)
  •   Oppose I support an "Automatic duplicate citation finder", as in a tool that can be run to automatically detect and consolidate references, but this proposal just seems to be for displaying an error message when previewing an edit with duplicate citations. -- Ahecht (TALK PAGE) 18:00, 2 February 2022 (UTC)
    AutoWikiBrowser currently does that. But a tool requires another edit to be made; a warning allows the editor to fix it while still making the edit. Jochem van Hees (talk) 17:19, 3 February 2022 (UTC)
    AFAIK AWB only does it for already labeled references. User:1234qwer1234qwer4 (talk) 17:47, 8 February 2022 (UTC)
    But manually finding and consolidating years of built-up duplicate refs in an article is a major chore, so we shouldn't be warning every editor who makes a minor spelling correction that they need to do it before saving their changes. -- Ahecht (TALK PAGE) 18:00, 8 February 2022 (UTC)
  •   Support ~ Amory (utc) 20:29, 2 February 2022 (UTC)
  •   Support WikiAviator (talk) 10:00, 3 February 2022 (UTC)
  •   Support Jochem van Hees (talk) 17:17, 3 February 2022 (UTC)
  •   Support Vega (talk) 18:02, 3 February 2022 (UTC)
  •   Support Ed [talk] [en] 21:49, 3 February 2022 (UTC)
  •   Support Sabjan Badio (talk) 03:47, 4 February 2022 (UTC)
  •   Support Kenraiz (talk) 16:30, 4 February 2022 (UTC)
  •   Support Yeeno (talk) 20:19, 4 February 2022 (UTC)
  •   Support —— Eric LiuTalk 05:10, 5 February 2022 (UTC)
  •   Support 公車迷阿暄 (talk) 08:16, 5 February 2022 (UTC)
  •   Support SD hehua (talk) 15:10, 5 February 2022 (UTC)
  •   Support Ealdgyth (talk) 15:56, 5 February 2022 (UTC)
  •   Support Exilexi (talk) 17:31, 5 February 2022 (UTC)
  •   Support Oliveleaf4 (talk) 17:58, 5 February 2022 (UTC)
  •   Support USI2020 (talk) 20:49, 5 February 2022 (UTC)
  •   Support Thanks for the fish! talkcontrib (he/him) 21:25, 5 February 2022 (UTC)
  •   Oppose --Ciao • Bestoernesto 02:37, 6 February 2022 (UTC)
  •   Support Nkon21 (talk) 03:27, 6 February 2022 (UTC)
  •   Support--Vulp❯❯❯here! 03:58, 6 February 2022 (UTC)
  •   Support Ayumu Ozaki (talk) 05:26, 6 February 2022 (UTC)
  •   Support Michael Barera (talk) 06:08, 6 February 2022 (UTC)
  •   Support Toadspike (talk) 01:39, 7 February 2022 (UTC)
  •   Support Ryse93 (talk) 12:28, 7 February 2022 (UTC)
  •   Support Tom Ja (talk) 17:54, 7 February 2022 (UTC)
  •   Support That's a great idea Bli231957 (talk) 19:04, 7 February 2022 (UTC)
  •   Support DGG (talk) 19:53, 7 February 2022 (UTC)
  •   Support ~Cybularny Speak? 20:18, 7 February 2022 (UTC)
  •   Support Wow, I love this idea, because I always have to look down at the reference list to confirm that the new reference I am adding doesn't already exist on the same page. Thumbs up Uncle Bash007 (talk) 21:59, 7 February 2022 (UTC)
  •   Support Throast (talk) 16:00, 8 February 2022 (UTC)
  •   Support Suonii180 (talk) 17:29, 8 February 2022 (UTC)
  •   Support KnowledgeablePersona (talk) 23:28, 8 February 2022 (UTC)
  •   Support Asukite (talk) 20:14, 9 February 2022 (UTC)
  •   Oppose It's against some manuals of style, such as The Chicago Manual of Style, which require duplicating references. 4nn1l2 (talk) 16:19, 10 February 2022 (UTC)
  •   Support Wikiusuarios (talk) 20:20, 10 February 2022 (UTC)
  •   Support Barkeep49 (talk) 21:21, 10 February 2022 (UTC)
  •   Support There are many, many citations that we don't manage as a group. Carn (talk) 14:56, 11 February 2022 (UTC)
  •   Support Forrestkirby (talk) 15:29, 11 February 2022 (UTC)
  •   Support --evrifaessa (talk) 15:57, 11 February 2022 (UTC)
  •   Support DSparrow14 (talk) 16:54, 11 February 2022 (UTC)
  •   Support overall, though I wonder how feasible it is when it comes to stealth duplicates – for example, citing two separate editions of a book with different ISBN numbers but otherwise identical text (though I imagine "say where you saw it" applies here...). -BRAINULATOR9 (TALK) 17:19, 11 February 2022 (UTC)