Community Wishlist Survey 2015/Bots and gadgets

Article assessor gadget/extension

Tracked in Phabricator:
Task T116092

Create an easy to use interface for adding WikiProject assessment templates to articles. Would probably require some sort of JSON config page for listing the available templates. Would work similar to the WikiLove extension (which adds barnstars to talk pages). Kaldari (talk) 17:30, 19 May 2015 (UTC)

Earlier discussion and endorsements
Gadget for en.wiki proposed here: en:Wikipedia:Gadget/proposals#AssessmentHelper. Kaldari (talk) 17:58, 7 July 2015 (UTC)
See also: Grants:IEG/Revision scoring as a service/Renewal#Scope. Helder 13:42, 9 July 2015 (UTC)
See also #Make quality/reliability of an article more clear to the reader I proposed above. --Piotrus (talk) 04:33, 12 November 2015 (UTC)
Comment: This is usually done with a bot that checks if all articles in a category contain a certain template and adds it if necessary (and in some cases the bot can prefill certain arguments of the template). That is far more efficient than using something similar to WikiLove. The Quixotic Potato (talk) 14:41, 12 November 2015 (UTC)
  Endorsed I use en:User:Kephir/gadgets/rater.js regularly, and it makes a world of difference for Assessment. I think having something like this baked in as a default gadget on WikiProjects would make a world of difference for maintaining Assessment. Sadads (talk) 15:05, 12 November 2015 (UTC)
  Oppose. Wikipedia specific and well within the capability of volunteer editors. There is no need for this to be an extension, the gadgets above should be sufficient. This proposal would also be made even more unnecessary by having a global repository of gadgets. MER-C (talk) 16:22, 14 November 2015 (UTC)
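The category-sweep approach described in The Quixotic Potato's comment (check each page, add the template only where it is missing, optionally prefilling arguments) can be sketched as a pure function. This is a minimal sketch under stated assumptions: the banner name "WikiProject Example" and the prefill parameters are hypothetical, pages are given as a title-to-wikitext mapping rather than fetched from the API, and redirects to the banner or other spellings of it are ignored.

```python
import re
from typing import Dict, Optional

# Hypothetical banner name; a real bot would read this from its configuration.
BANNER = "WikiProject Example"


def has_banner(wikitext: str, banner: str = BANNER) -> bool:
    """True if the page already transcludes the banner, with or without parameters."""
    pattern = r"\{\{\s*" + re.escape(banner) + r"\s*(\||\}\})"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is not None


def add_missing_banners(talk_pages: Dict[str, str],
                        banner: str = BANNER,
                        prefill: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return only the talk pages that need an edit, with the banner prepended.

    `prefill` holds template arguments the bot can fill in automatically
    (a hypothetical example would be a default class rating).
    """
    params = "".join(f"|{key}={value}" for key, value in (prefill or {}).items())
    tag = "{{" + banner + params + "}}\n"
    return {title: tag + text
            for title, text in talk_pages.items()
            if not has_banner(text, banner)}
```

Returning only the pages that change keeps the bot's edit count down: pages that already carry the banner are never touched.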

Votes

  1.   Oppose WMF specific and well within the capability of volunteer editors. There is no need for this to be an extension, the gadgets above should be sufficient. This proposal would also be made even more unnecessary by having a global repository of gadgets. MER-C (talk) 09:52, 30 November 2015 (UTC)
  2.   Support anything that makes adding assessment templates easier is a Good Thing. Allows us to look at overviews of the development of subjects. Casliber (talk) 05:04, 1 December 2015 (UTC)
  3.   Support--Shizhao (talk) 09:32, 1 December 2015 (UTC)
  4.   Support although I wouldn't use the Rater script as it currently exists, as it holds up the loading of some pages. Something more advanced without bugs, yes. Stevie is the man! TalkWork 13:52, 1 December 2015 (UTC)
  5.   Support Sadads (talk) 15:41, 1 December 2015 (UTC)
  6.   Support Goombiis (talk) 16:17, 1 December 2015 (UTC)
  7.   Support Sethtalk 16:51, 1 December 2015 (UTC)
  8.   Support – I've just discovered Rater on en.wp and am finding it very useful. Smaller projects like cy.wp miss out on scripts like this as there are fewer editors with the coding expertise, which is why I'm supporting this proposal. Ham II (sgwrs / talk) 18:45, 2 December 2015 (UTC)
  9.   Neutral I believe it would be easy to implement this if each wiki had a local Wikibase repository --AS (talk) 09:23, 3 December 2015 (UTC)
  10.   Support as Casliber said, it would make adding assessment templates easier and, as a second, more general point, it would be pretty useful. - SantiLak (talk) 10:24, 4 December 2015 (UTC)
  11.   Support - ƬheStrikeΣagle 16:11, 6 December 2015 (UTC)
  12.   Support - WikiProjects aren't so active these days, but assessments are still valuable, so I would support attempts to make engaging in such activity easier/accessible — Rhododendrites talk \\ 17:16, 6 December 2015 (UTC)
  13.   Oppose - while I'm not opposed to the concept, it seems that this is sufficiently covered by other efforts, so isn't the kind of thing I would want to tie up limited paid staff resources to work on. Wbm1058 (talk) 15:13, 7 December 2015 (UTC)
  14.   Support. This could be useful, and I can see myself using it. NinjaRobotPirate (talk) 10:49, 14 December 2015 (UTC)
  15.   Support --Rahmanuddin (talk) 14:54, 14 December 2015 (UTC)

Improve the "copy and paste detection" bot

Currently we have a bot that analyses "all" new edits to en.wp for copyright concerns. The output is here. There is also the potential for it to work in a number of other languages.

The problem is that it is not up as reliably as it should be. The presentation of the concerns could also be improved. I would love to see the output turned into an extension and formatted similarly to en:Special:NewPagesFeed.

Currently the output is sortable by WikiProject. It would be nice to create WikiProject specific modules to go on individual project pages. Doc James (talk · contribs · email) 03:45, 4 November 2015 (UTC)

Earlier discussion and endorsements

Votes

  1.   Support 4nn1l2 (talk) 03:00, 30 November 2015 (UTC)
  2.   Support --Tobias1984 (talk) 11:17, 30 November 2015 (UTC)
  3.   Support Lugnuts (talk) 12:00, 30 November 2015 (UTC)
  4.   Support This is one of the most amazing and Wikipedia-changing automated tools to come to editor attention in some years. Having automated copyright detection should be a priority because of the time that it saves experienced editors and the credibility that it gives to Wikimedia projects. Blue Rasberry (talk) 16:34, 30 November 2015 (UTC)
  5.   Support Great idea which is going to save a lot of time. Bharatiya29 (talk) 17:37, 30 November 2015 (UTC)
  6.   Support This is very important. Tryptofish (talk) 18:15, 30 November 2015 (UTC)
  7.   Support Armbrust (talk) 22:29, 30 November 2015 (UTC)
  8.   Support --Isacdaavid (talk) 02:06, 1 December 2015 (UTC)
  9.   Support Risker (talk) 04:21, 1 December 2015 (UTC)
  10.   Support Casliber (talk) 05:03, 1 December 2015 (UTC)
  11.   Support Doc James (talk · contribs · email) 09:23, 1 December 2015 (UTC)
  12.   Support other languages--Shizhao (talk) 09:33, 1 December 2015 (UTC)
  13.   Support, especially "WikiProject specific modules to go on individual project pages". Perhaps this could also coordinate with the bot that creates cleanup listings for WikiProjects. Stevie is the man! TalkWork 14:05, 1 December 2015 (UTC)
  14.   Support --Arnd (talk) 14:41, 1 December 2015 (UTC)
  15.   Support Mbch331 (talk) 14:48, 1 December 2015 (UTC)
  16.   Support --Natkeeran (talk) 14:50, 1 December 2015 (UTC)
  17.   Support as an extension that can be easily enabled on other wikis/projects. -- Dave Braunschweig (talk) 15:09, 1 December 2015 (UTC)
  18.   Support it would be a great save of time! --Nastoshka (talk) 15:34, 1 December 2015 (UTC)
  19.   Support Cavamos (talk) 15:34, 1 December 2015 (UTC)
  20.   Support - SantiLak (talk) 10:25, 4 December 2015 (UTC)
  21.   Support Goombiis (talk) 16:18, 1 December 2015 (UTC)
  22.   Support --Jarekt (talk) 17:11, 1 December 2015 (UTC)
  23.   Support This is an important issue. --Frmorrison (talk) 17:13, 1 December 2015 (UTC)
  24.   Support --SucreRouge (talk) 17:40, 1 December 2015 (UTC)
  25.   Support --Wesalius (talk) 18:49, 1 December 2015 (UTC)
  26.   Support StevenJ81 (talk) 21:49, 1 December 2015 (UTC)
  27.   Support--Jey (talk) 22:03, 1 December 2015 (UTC)
  28.   Support, it would be extremely useful, and it is not just a Wikipedia tool, as it should cover all languages and all wikis (copyvio is also a big problem for Wikibooks, Wikinews, Wikiversity) — NickK (talk) 23:37, 1 December 2015 (UTC)
  29.   Support Spencer (talk) 01:05, 2 December 2015 (UTC)
  30.   Support Good idea. Beyond My Ken (talk) 02:09, 2 December 2015 (UTC)
  31.   Support --Chaoborus (talk) 02:18, 2 December 2015 (UTC)
  32.   Support --Rosiestep (talk) 02:34, 2 December 2015 (UTC)
  33.   Support --Shubha (talk) 04:41, 2 December 2015 (UTC)
  34.   Support --Jasonzhuocn (talk) 06:58, 2 December 2015 (UTC)
  35.   Support Litlok (talk) 08:10, 2 December 2015 (UTC)
  36.   Support Anything that might help reduce the rampant copy-pasting on India/Pakistan-related articles has to be A Good Thing. - Sitush (talk) 08:44, 2 December 2015 (UTC)
  37.   Support Amen Sitush, amen. Bgwhite (talk) 09:40, 2 December 2015 (UTC)
  38.   Support Especially "the potential for it to work in a number of other languages" bit. ...Aurora... (talk) 10:29, 2 December 2015 (UTC)
  39.   Support --β16 - (talk) 11:37, 2 December 2015 (UTC)
  40.   Support It's surprising this hasn't been done yet. DiscantX 12:01, 2 December 2015 (UTC)
  41.   Support Matěj Suchánek (talk) 15:26, 2 December 2015 (UTC)
  42.   Support Fluffernutter (talk) 16:33, 2 December 2015 (UTC)
  43.   Support Gap9551 (talk) 20:11, 2 December 2015 (UTC)
  44.   Support Absolutely. Logical Fuzz (talk) 20:47, 2 December 2015 (UTC)
  45.   Support - tucoxn\talk 14:02, 3 December 2015 (UTC)
  46.   Support Tremendously important. --Dweller (talk) 15:25, 3 December 2015 (UTC)
  47.   Neutral I don't understand the idea. If it's about detecting copy-paste moving,   Support, otherwise   Neutral Krett12 (talk) 16:20, 3 December 2015 (UTC)
  48.   Support - Sarahj2107 (talk) 21:34, 3 December 2015 (UTC)
  49.   Support Nikkimaria (talk) 00:49, 4 December 2015 (UTC)
  50.   Neutral What about quotes? Human review after bot detection is essential! Lionel Scheepmans Contact French native speaker, sorry for my dysorthographia 23:00, 4 December 2015 (UTC)
  51.   Support --Yeza (talk) 16:28, 5 December 2015 (UTC)
  52.   Support לסטר (talk) 18:13, 5 December 2015 (UTC)
  53.   Support J36miles (talk) 00:33, 6 December 2015 (UTC)
  54.   Support - ƬheStrikeΣagle 16:11, 6 December 2015 (UTC)
  55.   Support Jim Carter (talk) 07:50, 7 December 2015 (UTC)
  56.   Support Wbm1058 (talk) 15:19, 7 December 2015 (UTC)
  57.   Support Mpn (talk) 18:13, 7 December 2015 (UTC)
  58.   Support Daniel Case (talk) 19:14, 8 December 2015 (UTC)
  59.   Support AlbinoFerret (talk) 18:24, 10 December 2015 (UTC)
  60.   Support As more people around the world gain internet access, we'll see a lot more editors unfamiliar with Wikipedia policy, and who don't see a problem with copy/pasting large amounts of text. This is evident in India articles, and the problem will only grow. Nocowardsoulismine (talk) 16:08, 12 December 2015 (UTC)
  61.   Support Besides improving the GUI-and-features of the EranBot, additionally I would also like to see the GUI-and-features of the tool on which EranBot depends improved, the CopyViosTool at toolserver. In particular, the en:WP:DIFF-like functionality leaves something to be desired.[1][2] See also, Editing#Improved_diff_compare_screen proposal, which is also DIFF-related technology. 75.108.94.227 16:57, 13 December 2015 (UTC)
  62.   Support I had no contact with this bot before, but it appears really really useful.--MisterSanderson (talk) 03:00, 14 December 2015 (UTC)
  63.   Support. This should be a priority. NinjaRobotPirate (talk) 10:51, 14 December 2015 (UTC)
  64.   Support --Davidpar (talk) 14:22, 14 December 2015 (UTC)
  65.   Support --Rahmanuddin (talk) 14:56, 14 December 2015 (UTC)

Machine-learning tool to reduce toxic talk page interactions

Proposal

Build an AI tool to identify occurrences of apparent talk page abuse in the English Wikipedia in real time, building on existing en:WP functions such as tags and edit filters.

Envisaged benefits
  1. An edit filter could warn users before posting that their comment may need to be refactored to be considered appropriate:
    1. Cutting down on the number of abusive talk page messages actually posted.
  2. Editors could check recent changes for tagged edits:
    1. Bringing much-needed third eyes to talk pages where an editor may be facing sexual harassment or other types of abuse.
    2. Improving response times and relieving victims of the burden of having to ask an admin for help.
  3. Prevention of talk page escalation.
  4. Improvement of talk page culture.
  5. Enhanced editor retention.

Some prior discussion of this idea can be found at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)#Proposed:_Tag_.2F_edit_filter_for_talk_page_abuse

As User:Denny pointed out on the Wikimedia-l mailing list yesterday, a similar project has reportedly been run in the League of Legends online gaming community to improve the quality of social interactions, with considerable success: occurrences of verbal abuse in that community are reported to have dropped by more than 40 percent.

Another interesting finding from that project was: "87 percent of online toxicity came from the neutral and positive citizens just having a bad day here or there. [...] We had to change how people thought about online society and change their expectations of what was acceptable." That is something that seems to apply to Wikipedia as well.

The game designers and scientists working on this project started out by compiling a large dataset of interactions community members deemed counterproductive (toxic behaviour, harassment, abuse) and then applied machine learning to this dataset to be able to provide near real-time feedback to participants on the quality of their interaction. (They're also looking at identifying positive, collaborative behaviours.)

I would love to see the Foundation explore if this approach could be adapted to address the very similar problems in the Wikipedia community. The totality of revision-deleted and oversighted talk page posts in the English Wikipedia could provide an initial dataset, for example; like the League of Legends community, the Foundation could invite outside labs and academic institutes to help analyse this dataset.

There are considerable difficulties involved in building a system sophisticated enough to avoid unacceptable numbers of false positives, but this is a challenge familiar from ClueBot programming, and one the League of Legends team seems to have mastered: "Just classifying words was easy, but what about more advanced linguistics such as whether something was sarcastic or passive-aggressive? What about more positive concepts, like phrases that supported conflict resolution? To tackle the more challenging problems, we wanted to collaborate with world-class labs. We offered the chance to work on these datasets and solve these problems with us. Scientists leapt at the chance to make a difference and the breakthroughs followed. We began to better understand collaboration between strangers, how language evolves over time and the relationship between age and toxicity; surprisingly, there was no link between age and toxicity in online societies."

A successful project of this type could subsequently be offered to other Wikimedia projects as well. It would address a long-standing and much-discussed problem in the English Wikipedia, and put the Foundation at the leading edge of internet culture. Andreas JN466 20:03, 14 November 2015 (UTC)

Earlier discussion and endorsements
  • Neither an endorsement nor a rejection at this point (I could see this being workable and helpful. I could also see it going down in flames. It would all depend on the implementation, legalities, and subsequent use of the tool), but here are some initial and somewhat rambling thoughts that might help you expand on this, Jayen466:
    • The dataset of "revdeleted and oversighted edits" contains within it a large quantity of personally identifying information about editors, article subjects, etc (as well as things like allegations of criminal wrongdoing). Even within the WMF, I don't think the majority of staff are under an NDA that covers this type of personal information (and appropriately so - that kind of stuff needs to be as compartmentalized and protected as possible), and certainly the majority of community edit filter managers have not been vetted by the community for handling this type of information. A tool running on the OS dataset would thus need some kind of...is "backstopping" the right word? It would need to be either developed and tested only by people who are under an NDA covering this type of information, or the dataset would need to be pre-sanitized of that kind of information (replacing it with placeholder text along the lines of "[home address removed]"? or something?) before being turned over to whoever will develop the tool. That's a lot of work and a lot of person-hours from NDA'd folk; would this machine learning work be valuable enough to prioritize over other tasks?
    • A tool of this type could only ever catch the lowest-hanging fruit. Like, certainly it could catch "you're an asshole" or "[username] is a shit editor", but the kind of machine learning that would catch, say, "as for my esteemed opponent, I do question whether he is entirely qualified to make such judgments, given his affinity for barnyard animals"...afaik that doesn't exist and maybe never will. That's not to say a tool that only catches the low-hanging fruit is a tool that's useless, either, though - the League of Legends data you cite about the bulk of the abuse coming from non-repeat-offenders makes me think that the bulk of the abuse was probably not of the exquisitely-phrased-to-evade-censors type, either. If a filter could catch even 25% of the abuse that flies, particularly from the people who are acting in the heat of the moment and could benefit from a tap on the shoulder and a "hey, think about that", I think I'd consider it a success.
    • You're not wrong to call the revdel + OS dataset a corpus of "unacceptable speech", but I think we need to keep something in mind about that data compared to LoL's crowdsourced voting: the vast, vast majority of OS/revdel decisions-to-hide-text were made by a single person on the basis of a) a much more general policy bullet point and b) that person's own judgment about whether that edit fit within that policy point. For things like the "big" racial slurs, that's probably a judgment that can be generalized to the community at large, but for a lot of things, it's more debatable than that, both because people's personal opinions vary and because the culture's understanding of what is and isn't grievously unacceptable changes over time (for the former, think of something like the British-American divide on the c-word; for the latter, think of how internet culture would have reacted to someone saying "that's so gay" or the gay-slur f-word ten years ago compared to today). When the eventual tool is developed to filter/flag these unacceptable-on-wikimedia utterances, whose standard will we use? A possible solution that occurs to me would be to merge LoL-style voting with machine learning and use an iterative strategy: develop machine learning tool, run on test data to generate a set of potential flags, and then have the community vote/RfC on the machine learning tool's flags' accuracy (in the sense of "yes this thing that was identified is indeed a problem we would handle as humans if we saw it"). That's another huge time investment, both for developers and community, though. Fluffernutter (talk) 16:26, 15 November 2015 (UTC)
See also Research:Revision scoring as a service and ORES. Helder 20:22, 15 November 2015 (UTC)
  Endorsed This is a good idea. While there are always challenges in anything this complex, I suspect that toxic interactions are a major contributor to burnout of editors. Reducing the potential for such interactions will have long-term benefits that greatly outweigh the short-term costs of development of this tool. Etamni (talk) 17:19, 16 November 2015 (UTC)
  Support Also a good tool to fight vandals. Yosri (talk) 14:53, 19 November 2015 (UTC)
  Endorsed This is likely to take a long time and a lot of experimentation to get right, but the possible benefits are enough to make it well worth trying. JohnCD (talk) 21:57, 20 November 2015 (UTC)

Votes

  1. Comment: This is something I mentioned to an OS during Wikimania actually. Basically, imagine a system that automatically hides suspicious edits pending OS review. Think of it like FlaggedRevs but with a stricter set of rules. Certain OSable content is fairly trivial to spot. Emails and phone numbers, social security numbers etc. follow a certain pattern detectable with a simple regex, for example. AI would learn such rules. We often see the same content reposted as well. I would propose an OSed content campaign where clustered content would be hand-labelled by humans (perhaps oversight users, to avoid privacy concerns). Humans would go through the unsupervised clusters and label them (this one looks like a phone number so it should be hidden, that one looks OK so it doesn't need to be hidden), and we can then use this to train AI (clustering or multi-class classification). AI would develop a level of confidence of "bad content" such that if the post is above a certain threshold it would be hidden until reviewed by a human oversighter. -- とある白い猫 chi? 19:47, 30 November 2015 (UTC)
  2.   Support Anthonyhcole (talk) 09:18, 1 December 2015 (UTC)
  3.   Support · · · Peter (Southwood) (talk): 13:50, 1 December 2015 (UTC)
  4.   Neutral I wouldn't mind seeing a pilot done on particular parts of the English Wikipedia where negative talk page interactions tend to be a big issue, but I will withhold support pending the results of said pilot. This idea sounds like "easier said than done", but I'm willing to be open-minded and see what can be done. At any rate, I would oppose any full rollout until any serious kinks are worked out. Stevie is the man! TalkWork 14:14, 1 December 2015 (UTC)
  5.   Oppose ORES is a better way to do this. After all, edit filters on enwiki (the database name for en.wikipedia) have for a long time pretty much used up the limits they have.--Snaevar (talk) 16:30, 1 December 2015 (UTC)
  6.   Oppose Difficult in Chinese.--Temp3600 (talk) 16:38, 1 December 2015 (UTC)
  7.   Neutral I did not notice this to be a big problem. --Jarekt (talk) 17:12, 1 December 2015 (UTC)
  8.   Support on an experimental basis. We'll never know if this will work unless we try it. Eman235/talk 21:01, 1 December 2015 (UTC)
  9.   Oppose - We've got too much of the Friendly Space baloney going on already without adding a highly fallible bot layer. These things don't work elsewhere and won't work here. Scunthorpe, anyone? - Sitush (talk) 08:41, 2 December 2015 (UTC)
  10.   Oppose any project for English Wikipedia only. A Community Tech project must be a global one, but making a reasonable machine learning tool for detecting problematic talk page comments is a huge challenge even for one language (as our words may be offensive or not depending on the context), and would take an unreasonable amount of resources for all wikis — NickK (talk) 10:03, 2 December 2015 (UTC)
  11.   Oppose Seems that it would be too difficult to do effectively and with enough precision for it to be worth the man hours. DiscantX 12:03, 2 December 2015 (UTC)
  12.   Support Keep its purpose to a manageable task (fending off gross abuse) and this bot could make WP a much more attractive place. Currently, Talkpages are only lightly guarded, and a surprising amount of unhelpful remarks (i.e. obvious vandalism in the form of profane non sequiturs and other gratuitous scribbling) is kept forever, much to the detriment of WP's public image. SteveStrummer (talk) 05:16, 3 December 2015 (UTC)
  13.   Oppose As even humans misunderstand each other all the time, I have no confidence a machine will cope with this task. --Dweller (talk) 15:26, 3 December 2015 (UTC)
  14.   Oppose per Dweller. ƬheStrikeΣagle 16:11, 6 December 2015 (UTC)
  15.   Neutral Searching for strings that could be phone numbers, social security numbers, and other sensitive information sounds like a great project, but this seems too nebulous/open-ended, with too many areas that can go wrong beyond those numeric strings. Searching for the numeric strings alone seems like a much more doable project that enwp could do on its own, since it's less applicable elsewhere. — Rhododendrites talk \\ 17:13, 6 December 2015 (UTC)
  16.   Neutral I like the idea, but I'm not sure we have sufficient actual-human resources to positively analyze and act upon the talk-abuse occurrences such a bot would detect, nor do I have confidence in the Foundation staff's technical abilities to develop such a bot. Unless maybe such tools developed elsewhere can successfully be leveraged by the Foundation? Wbm1058 (talk) 15:29, 7 December 2015 (UTC)
  17.   Oppose Mpn (talk) 18:14, 7 December 2015 (UTC)
  18.   Oppose While I endorse any effort to improve civility on Wikipedia, I'm not comfortable with using bots to accomplish this. However, if a pilot project such as described by Stevietheman were performed first & that showed promise, I'll reconsider my opposition. -- Llywrch (talk) 18:59, 11 December 2015 (UTC)
  19.   Support --Tgr (talk) 22:29, 13 December 2015 (UTC)
  20.   Support. This is worth exploring in a limited trial. I agree that a full rollout is potentially a bad idea. NinjaRobotPirate (talk) 11:05, 14 December 2015 (UTC)
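The first comment above suggests that some oversightable content (emails, phone numbers, social security numbers) can be caught with simple regexes, and that a post whose "bad content" confidence exceeds a threshold would be hidden pending review. A minimal sketch of that pattern-matching and thresholding step follows. It assumes US-style number formats, and the per-pattern weights and the 0.8 threshold are hand-set placeholders standing in for the learned confidence the comment describes; the clustering and hand-labelling stages are not shown.

```python
import re

# Illustrative detectors only (assumptions, not oversight policy). Each weight
# is a hand-set stand-in for a learned confidence score.
DETECTORS = [
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), 0.9),
    ("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.95),
    ("phone", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), 0.7),
]


def score_post(text):
    """Return (confidence, names of matching detectors) for one post."""
    hits = [(name, weight) for name, pattern, weight in DETECTORS
            if pattern.search(text)]
    confidence = max((weight for _, weight in hits), default=0.0)
    return confidence, [name for name, _ in hits]


def should_hold(text, threshold=0.8):
    """True if the post should be hidden until a human oversighter reviews it."""
    confidence, _ = score_post(text)
    return confidence >= threshold
```

With these placeholder weights a lone phone-like number (0.7) is scored but not held, while an email address or SSN-shaped string is held; tuning such trade-offs is exactly where the labelled data proposed above would come in.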

Migrate dead links to the Wayback Machine

See also: mw:Archived Pages.

External links have an average lifespan of about 7 years before they go dead. As Wikipedia ages, the dead external links problem grows exponentially. Internet Archive has partnered with Wikipedia to ensure all new external links have a Wayback cache. However, there has been no formal process for adding the Wayback links to Wikipedia (for example, via the |archiveurl= parameter of cite web). There have been attempts to automate this with various bots (see en:WP:Link rot), but the coding is non-trivial and multiple volunteer efforts have stalled. What will likely be required is a team of programmers working full-time, something that is beyond the scope of a few volunteers working in their spare time. It's the sort of coding work that the Wikimedia Foundation could sponsor, and it would make a big difference in the quality of content, impacting every article. -- Green Cardamom (talk) 19:27, 7 November 2015 (UTC)
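The proposal mentions adding Wayback links through the |archiveurl= parameter of cite web. A minimal sketch of that edit on a single citation is shown here, assuming the citation arrives as one well-formed template string and using the |archiveurl= and |archivedate= parameter names. A real bot would need a proper wikitext parser (nested templates, comments, other citation templates) and would only act on links confirmed to be dead.

```python
import re


def add_archive_params(citation: str, archive_url: str, archive_date: str) -> str:
    """Append |archiveurl= and |archivedate= to one {{cite web ...}} string.

    Citations that already have an archive URL are returned unchanged.
    """
    if re.search(r"\|\s*archive-?url\s*=", citation, flags=re.IGNORECASE):
        return citation  # already archived; leave untouched
    stripped = citation.rstrip()
    if not stripped.endswith("}}"):
        raise ValueError("expected a complete {{cite web ...}} template")
    body = stripped[:-2].rstrip()  # drop the closing braces
    return body + " |archiveurl=" + archive_url + " |archivedate=" + archive_date + "}}"
```

Checking for an existing archive URL first makes the function idempotent, so a bot can safely re-run it over pages it has already processed.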

Earlier discussion and endorsements
  Endorsed Much needed and certainly very important for smaller Wikipedias as well. Jopparn (talk) 09:12, 8 November 2015 (UTC)
  Endorsed We have made significant progress with en:User:Cyberbot II adding links to archiveurls, but there needs to be a good technical way to store them. Talked with @Jdforrester (WMF): about building it into citoid at WikiCon USA. Internet Archive was there, and expressed an interest in pushing their APIs to the limit, to fix the 404 and other errors on Wikipedia. Sadads (talk) 10:11, 8 November 2015 (UTC)
  Endorsed Agree that this is very much needed. ONUnicorn (talk) 13:47, 8 November 2015 (UTC)
  Endorsed Much globally needed, volunteer efforts shouldn't be the only way for an important feature like this. --AlessioMela (talk) 21:06, 8 November 2015 (UTC)
  Endorsed and also migrate to WebCite--Shizhao (talk) 01:57, 10 November 2015 (UTC)
  Endorsed Useful. --Piotrus (talk) 04:58, 10 November 2015 (UTC)
  Endorsed Useful. Aphaia (talk) 05:05, 10 November 2015 (UTC)
  Endorsed Very good idea, it could be very useful! Restu20 07:33, 10 November 2015 (UTC)
  Endorsed Esquilo (talk) 08:07, 10 November 2015 (UTC)
  Endorsed For all Wikipedias including RTL wikis. 4nn1l2 (talk) 09:09, 10 November 2015 (UTC)
  Endorsed Danmichaelo (talk) 21:26, 10 November 2015 (UTC)
  Endorsed Kropotkine 113 (talk) 21:32, 10 November 2015 (UTC)
  Endorsed I do this all the time by hand, it would be great if there was an automated procedure to take care of it. --Waldir (talk) 13:40, 14 November 2015 (UTC)
  Endorsed Useful. Afernand74 (talk) 17:28, 14 November 2015 (UTC)
  Endorsed particularly with newly-added links. If there is an API-equivalent to the "save page now" button at https://archive.org/web/ , it should be useful for this task. Davidwr/talk 05:41, 16 November 2015 (UTC)
  Endorsed Libcub (talk) 07:00, 20 November 2015 (UTC)
As a programmer, I'll note that this shouldn't be that difficult since the release of MW 1.22. The key point is to use el_id from the database to make incremental dumps of all external links, feed those to archive.org to archive, and have a bot add those links. In reality the hardest part is figuring out which snapshot should be used; the rest is easy to do via bot. https://tools.wmflabs.org/betacommand-dev/cgi-bin/sandbox?page=Redeemer_Presbyterian_Church_(New_York_City) is an example of what my tools have been doing for years. Δ (talk) 19:32, 20 November 2015 (UTC)
  Endorsed I did it by hand, it is a terrible waste of time...--Alexmar983 (talk) 22:54, 21 November 2015 (UTC)
Comment: I was pointed here as someone already involved in this project. I am indeed already working on a bot, approved on the English Wikipedia, that does just that. It's pretty far along in development, and I am working with the WMF and IA to coordinate on getting a fully functional bot. Our aim is to get this running on the top 30 Wikipedias. —cyberpower ChatHello! 05:12, 10 December 2015 (UTC)
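Δ notes above that the hardest part is choosing which snapshot to use. One possible heuristic, offered here only as an assumption and not as what any existing bot does, is to take the latest snapshot captured on or before the citation's access date (the version the editor most likely saw), falling back to the earliest later one. The sketch assumes snapshot times in the Wayback Machine's 14-digit YYYYMMDDhhmmss form and an access date given as YYYYMMDD; fetching the list of snapshots is not shown.

```python
from bisect import bisect_right
from typing import List, Optional


def pick_snapshot(timestamps: List[str], access_date: str) -> Optional[str]:
    """Choose a 14-digit snapshot timestamp for a citation accessed on access_date.

    Prefers the latest snapshot on or before the end of the access date;
    otherwise returns the earliest snapshot after it. Fixed-width digit
    strings sort chronologically, so plain string comparison works.
    """
    if not timestamps:
        return None
    ordered = sorted(timestamps)
    cutoff = access_date + "235959"  # end of the access day
    i = bisect_right(ordered, cutoff)
    return ordered[i - 1] if i > 0 else ordered[0]


def wayback_url(timestamp: str, original_url: str) -> str:
    """Build a web.archive.org link for a given snapshot timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"
```

Preferring a pre-access snapshot reduces the risk of citing a later version of the page whose content no longer supports the article text.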

Votes

  1.   Support 4nn1l2 (talk) 03:01, 30 November 2015 (UTC)[reply]
  2.   Support Jenks24 (talk) 10:16, 30 November 2015 (UTC)[reply]
  3.   Support Lugnuts (talk) 12:01, 30 November 2015 (UTC)[reply]
  4.   Support Debresser (talk) 12:55, 30 November 2015 (UTC)[reply]
  5.   Support Wildthing61476 (talk) 13:22, 30 November 2015 (UTC)[reply]
  6.   Support MrX (talk) 15:08, 30 November 2015 (UTC)[reply]
  7.   Support TeriEmbrey (talk) 15:55, 30 November 2015 (UTC)[reply]
  8.   Support Internet Archive should be Wikimedia's best friend! Blue Rasberry (talk) 16:35, 30 November 2015 (UTC)[reply]
  9.   Support Daniel Case (talk) 17:17, 30 November 2015 (UTC)[reply]
  10.   Support Bharatiya29 (talk) 17:37, 30 November 2015 (UTC)[reply]
  11.   Support BethNaught (talk) 17:43, 30 November 2015 (UTC)[reply]
  12.   Support PresN (talk) 17:48, 30 November 2015 (UTC)[reply]
  13.   Oppose Bots notifying/helping with a migration to archive.org links were appropriate are already available. A fully automated process, which is imho likely to automated cause errors as well is imho a rather bad idea. Money for professional development is better spend elsewhere on many other pressing issues. For the link management a half automated approach is the way to go imho.--Kmhkmh (talk) 19:38, 30 November 2015 (UTC)[reply]
  14.   Oppose Isn't it ironic that a community that uses the most flexible content management system you could possibly think of longs for becoming a museum of sorts, offering its readers the state that used to be seven or eight years ago? Wikipedia was successful against its competitors because it provided up to date content in any way you want. Now, it's becoming a museum. At best, an interface to the internet archive's wayback machine. But that's already available next door. Go and get moving, look for what's up now, instead. The future does not lie in the world we had yesterday.--Aschmidt (talk) 20:10, 30 November 2015 (UTC)[reply]
    Nothing stops editors from adding fresher references. Stevie is the man! TalkWork 14:38, 1 December 2015 (UTC)[reply]
  15.   Support --YodinT 02:00, 1 December 2015 (UTC)[reply]
  16.   Support --Isacdaavid (talk) 02:06, 1 December 2015 (UTC)[reply]
  17.   Support. This is a basic function for keeping the links usable. DGG (talk) 02:07, 1 December 2015 (UTC)[reply]
  18.   Support - Not all links last forever, and I've already been converting dead links to archived versions, so it'd be better if a bot could do it. –Davey2010Talk 02:43, 1 December 2015 (UTC)[reply]
  19.   Support some good stuff gets archived and disappears every day. This is a good idea. Casliber (talk) 05:05, 1 December 2015 (UTC)[reply]
  20.   Support--Kippelboy (talk) 05:30, 1 December 2015 (UTC)[reply]
  21.   Support--Gbeckmann (talk) 09:14, 1 December 2015 (UTC)[reply]
  22.   Neutral-- Has been done. Is a great idea. Should be running in the next few weeks from what I understand. Doc James (talk · contribs · email) 09:24, 1 December 2015 (UTC)[reply]
  23.   Support--Shizhao (talk) 09:34, 1 December 2015 (UTC)[reply]
  24.   Support--Purodha Blissenbach (talk) 10:19, 1 December 2015 (UTC)[reply]
  25.   Support · · · Peter (Southwood) (talk): 13:52, 1 December 2015 (UTC)[reply]
  26.   Support. When sources are cited and then vanish, credibility suffers. LLarson (talk) 14:00, 1 December 2015 (UTC)[reply]
  27.   Support as long as this is well-tested enough to not cause new headaches for editors. Also, I don't want to see perfectly good original links replaced with archive links from the reader's point of view -- this replacement should only occur if the original link is dead. Another concern is that sometimes webpages migrate without proper redirects instead of truly going dead -- might we also have a tool that hunts around for where the webpage moved to, so we can maintain a fresh original link? Stevie is the man! TalkWork 14:35, 1 December 2015 (UTC)[reply]
  28.   Support --Arnd (talk) 14:42, 1 December 2015 (UTC)[reply]
  29.   Support --Natkeeran (talk) 14:50, 1 December 2015 (UTC)[reply]
  30.   Neutral It's a little hard to support this as long as IA maintains their policy of retroactively applying robots.txt. In the worst case, some domain is archived for years and the archive links are used, then the registration expires, the domain is scooped up by a squatter or some other company, they put up a robots.txt denying access, and boom, the existing archives for the former site under that domain are inaccessible. Or a site is bought out by some other company, and the new company redirects every URL from the old site to their existing homepage and throws up a robots.txt denying access to everything on the old site so as to prevent Google potentially penalizing it as an SEO trick, and boom, the existing archives for the old site are inaccessible. Or a site just reorganizes everything and puts up a robots.txt blocking access to the old URLs to get them out of Google searches, and boom, the archives for the old pages are inaccessible. Anomie (talk) 14:53, 1 December 2015 (UTC)[reply]
    The situations you describe happen rather rarely. More commonly seen are: 1) rare cases where robots.txt is changed for "censorship" purposes (this usually means the matter is controversial and there are probably secondary sources); 2) widespread cases of domain parkers which buy hundreds of thousands of domains and block everything in their robots.txt (these are usually smaller websites, but not always; it would be interesting to see how many are used as sources on Wikimedia wikis). Nemo 07:17, 7 December 2015 (UTC)[reply]
  31.   Support--KRLS (talk) 15:12, 1 December 2015 (UTC)[reply]
  32.   Support --Andyrom75 (talk) 15:12, 1 December 2015 (UTC)[reply]
  33.   Support For me this is a very important issue. People should always be able to check the source and learn more about the topics mentioned. DanGong (talk) 15:14, 1 December 2015 (UTC)[reply]
  34.   Support While the Internet Archive is an excellent 'catch all' source, I'd also like to see this solution address the fact that many national libraries already perform web archiving of their national domain. For example, the Pandora service of the National Library of Australia has a far more professional and consistent archive of Australian content than IA does, but it is done on a 'permission' basis due to local law. It would be good if any tool built to solve this request could also search other notable web archives. Wittylama (talk) 15:15, 1 December 2015 (UTC)[reply]
    Like the Internet Archive, Pandora is a member of the IIPC (International Internet Preservation Consortium), so they're supposedly already working together. Perhaps the Wikimedia Foundation should join the consortium? If I understand correctly, your point is that we should look for more sources: that should be rather easy as long as they are IIPC members and use compatible tooling. In the long term, if the WMF joined the IIPC, it could push for some sort of federated OpenWayback (or just send patches upstream?).
    For now, however, the main goal is probably to store the archived URLs on the MediaWiki side so that some sort of parsing can fix links without running bots on hundreds of wikis. Otherwise, bot owners are in a better position to deal with the issue. Nemo 07:34, 7 December 2015 (UTC)[reply]
  35.   Support--Bramfab (talk) 15:26, 1 December 2015 (UTC)[reply]
  36.   Support -- but work is already happening with The Wikipedia Library, the Internet Archive, and the Citoid team to support the en:w:User:Cyberbot II implementation of archiveurls. If this is implemented, we need to build on the existing work and conversations with these teams. Sadads (talk) 15:43, 1 December 2015 (UTC)[reply]
  37.   Support Goombiis (talk) 16:19, 1 December 2015 (UTC)[reply]
  38.   Support JohanahoJ (talk) 16:42, 1 December 2015 (UTC)[reply]
  39.   Support This is a serious issue which definitely needs to be worked on, and in the end it is going to need a much more permanent and inventive fix. I don't know if a full-on editing team is required -- maybe just an improved bot, or a section on some kind of common page (like the Wikipedia Community Portal) listing pages tagged with an "improve dead links" template (which would need to be created, I believe). I also definitely agree with user Sadads above that any work done should build on editors' current efforts rather than starting completely from scratch. -- 2ReinreB2 (talk) 17:11, 1 December 2015 (UTC)[reply]
  40.   Support This is a growing issue that needs to be worked on. --Frmorrison (talk) 17:12, 1 December 2015 (UTC)[reply]
  41.   Support --Jarekt (talk) 17:14, 1 December 2015 (UTC)[reply]
  42.   Support I think that automatically migrating dead links to the Wayback Machine will improve external links in Wikipedia. --Urbanecm (talk) 17:34, 1 December 2015 (UTC)[reply]
  43.   Support --SucreRouge (talk) 17:38, 1 December 2015 (UTC)[reply]
  44.   Support --Coentor (talk) 18:18, 1 December 2015 (UTC)[reply]
  45.   Support --Wesalius (talk) 18:50, 1 December 2015 (UTC)[reply]
  46.   Support --Usien6 (talk) 18:56, 1 December 2015 (UTC)[reply]
  47.   Support --Hkoala (talk) 20:23, 1 December 2015 (UTC)[reply]
  48.   Support --Akela (talk) 20:56, 1 December 2015 (UTC)[reply]
  49.   Support Something really, really needs to be done about the linkrot problem. Eman235/talk 21:03, 1 December 2015 (UTC)[reply]
  50.   Support It is very important for verifiability, now and in the future. Regards, Kertraon (talk) 21:33, 1 December 2015 (UTC)[reply]
  51.   Support StevenJ81 (talk) 21:51, 1 December 2015 (UTC)[reply]
  52.   Support I have been thinking about that for a long time. Emptywords (talk) 00:01, 2 December 2015 (UTC)[reply]
  53.   Support Hondo77 (talk)
  54.   Comment Often, there is a better ("live") replacement for a dead link than the Wayback Machine's archived version. An automated process could discourage people from actively looking for such a replacement. Using an archived version should always be seen as a last-resort option; I'm not convinced that a blindly acting bot is what is needed here. Gestumblindi (talk) 01:11, 2 December 2015 (UTC)[reply]
  55.   Support --Rosiestep (talk) 02:36, 2 December 2015 (UTC)[reply]
  56.   Support but not without hesitation. Basically I agree with the arguments used by Gestumblindi (discouraging people from actively looking for "live" replacements). On the other hand, however, an old link is better than none. Pawel Niemczuk (talk) 02:48, 2 December 2015 (UTC)[reply]
  57.   Support RoodyAlien (talk) 02:51, 2 December 2015 (UTC)[reply]
  58.   Support Syced (talk) 03:52, 2 December 2015 (UTC)[reply]
  59.   Support - Shubha (talk) 04:44, 2 December 2015 (UTC)[reply]
  60.   Support - WillemienH (talk) 05:15, 2 December 2015 (UTC)[reply]
  61.   Support --Moroboshi (talk) 06:57, 2 December 2015 (UTC)[reply]
  62.   Support --Jasonzhuocn (talk) 07:00, 2 December 2015 (UTC)[reply]
  63.   Support Litlok (talk) 08:10, 2 December 2015 (UTC)[reply]
  64.   Support - Sitush (talk) 08:38, 2 December 2015 (UTC)[reply]
  65.   Support, or use any other archive solution if Internet Archive is inappropriate, such as Wikiwix used by French Wikipedia — NickK (talk) 10:04, 2 December 2015 (UTC)[reply]
  66.   Support Graham87 (talk) 10:20, 2 December 2015 (UTC)[reply]
  67.   Support --β16 - (talk) 11:39, 2 December 2015 (UTC)[reply]
  68.   Support--Barcelona (talk) 11:49, 2 December 2015 (UTC)[reply]
  69.   Support Addition to multiple archives would be preferable though.  DiscantX 12:09, 2 December 2015 (UTC)[reply]
  70.   Support Bazj (talk) 12:13, 2 December 2015 (UTC)[reply]
  71.   Support--Manlleus (talk) 15:01, 2 December 2015 (UTC)[reply]
  72.   Support --Nux (talk) 19:42, 2 December 2015 (UTC)[reply]
  73.   Support WeeJeeVee (talk) 20:56, 2 December 2015 (UTC)[reply]
  74.   Support As the encyclopedia ages we will have more and more problems with dead links - this proposal sounds admirable. PamD (talk) 21:28, 2 December 2015 (UTC)[reply]
  75.   Support Thémistocle (talk) 21:55, 2 December 2015 (UTC)[reply]
  76.   Support Gap9551 (talk) 00:32, 3 December 2015 (UTC)[reply]
  77.   Support – This solution is better than nothing. (And I agree with Stevie is the man's point – I myself update links from Wayback Machine links relatively often...) IJBall (talk) 03:56, 3 December 2015 (UTC)[reply]
  78.   Support: Too many dead links. There are not even any such bots running on the Chinese Wikipedia now.- Earth Saver(talk)Peace, strive, save the Earth! at 05:54, 3 December 2015 (UTC)[reply]
  79.   Support Pbm (talk) 12:13, 3 December 2015 (UTC)[reply]
  80.   Support: It's necessary.--Bowleerin (talk) 13:20, 3 December 2015 (UTC)[reply]
  81.   Support - tucoxn\talk 14:02, 3 December 2015 (UTC)[reply]
  82.   Support Yes, yes, yes, yes please. --Dweller (talk) 15:27, 3 December 2015 (UTC)[reply]
  83.   Support Theredmonkey (talk) 19:35, 3 December 2015 (UTC)[reply]
  84.   Support - Sarahj2107 (talk) 21:35, 3 December 2015 (UTC)[reply]
  85.   Support Nikkimaria (talk) 00:49, 4 December 2015 (UTC)[reply]
  86.   Support - SantiLak (talk) 10:30, 4 December 2015 (UTC)[reply]
  87.   Support --Jane023 (talk) 16:19, 4 December 2015 (UTC)[reply]
  88.   Support - Wieralee (talk) 17:07, 4 December 2015 (UTC)[reply]
  89.   Support --The Polish (talk) 17:33, 4 December 2015 (UTC)[reply]
  90.   Support Bináris tell me 18:22, 4 December 2015 (UTC)[reply]
  91.   Support Lionel Scheepmans Contact French native speaker, sorry for my poor spelling 22:59, 4 December 2015 (UTC)[reply]
  92.   Support - Shiftchange (talk) 03:24, 5 December 2015 (UTC)[reply]
  93.   Support --Yeza (talk) 16:29, 5 December 2015 (UTC)[reply]
  94.   Support J36miles (talk) 00:36, 6 December 2015 (UTC)[reply]
  95.   Support -- Gts-tg (talk) 01:54, 6 December 2015 (UTC)[reply]
  96.   Support -- Sir Gawain (talk) 14:18, 6 December 2015 (UTC)[reply]
  97.   Support - ƬheStrikeΣagle 16:11, 6 December 2015 (UTC)[reply]
  98.   Support - valuable for article sourcing, but also takes steps against a common form of spam (looking for deadlinks, copying content from an archive to a personal ad-filled website, and linking to it) — Rhododendrites talk \\ 17:15, 6 December 2015 (UTC)[reply]
  99.   Support --Waldir (talk) 12:55, 7 December 2015 (UTC)[reply]
  100.   Support --100   Wbm1058 (talk) 15:44, 7 December 2015 (UTC)[reply]
  101.   Support --Bender235 (talk) 01:23, 8 December 2015 (UTC)[reply]
  102.   Support Anyone working on an old, half-finished article has had the experience of spending entirely too much time chasing down references that have rotted. Courcelles 08:15, 8 December 2015 (UTC)[reply]
  103.   Support - Bcharles (talk) 23:14, 8 December 2015 (UTC)[reply]
  104.   Support - Valuable proposal. The possibility of extending it to WebCite should also be investigated. - Pointillist (talk) 14:00, 9 December 2015 (UTC)[reply]
  105.   Neutral - WebCite and other archives must be added too, as relying on a single engine is not reliable. Zezen (talk) 08:58, 10 December 2015 (UTC)[reply]
  106.   Support Therud (talk) 09:16, 10 December 2015 (UTC)[reply]
  107.   Support AlbinoFerret 18:26, 10 December 2015 (UTC)[reply]
  108.   Support --João Carvalho (talk) 16:43, 11 December 2015 (UTC)[reply]
  109.   Support --Edgars2007 (talk) 08:54, 12 December 2015 (UTC)[reply]
  110.   Support Beagel (talk) 15:15, 12 December 2015 (UTC)[reply]
  111.   Support --R. S. Shaw (talk) 03:01, 13 December 2015 (UTC)[reply]
  112.   Support --Piramidion 13:10, 13 December 2015 (UTC)[reply]
  113.   Support --ESM (talk) 16:01, 13 December 2015 (UTC)[reply]
  114.   Support GREAT! Much needed!! --MisterSanderson (talk) 03:00, 14 December 2015 (UTC)[reply]
  115.   Support --Davidpar (talk) 14:22, 14 December 2015 (UTC)[reply]
  116.   Support We are doing something similar at tewiki. How do we deal with caching pages that are already dead? --Rahmanuddin (talk) 15:00, 14 December 2015 (UTC)[reply]
  117.   Support -- AshLin (talk) 18:43, 14 December 2015 (UTC)[reply]
  118.   Support Armbrust (talk) 22:46, 10 January 2016 (UTC)[reply]
  119.   Neutral-- I am Pascal Martin, creator of Wikiwix, the French Wikipedia link archiver since 2008. We host 80 million links from fr.wikipedia.org in our archive. We also archive the link sources of the English (not sure all of them), Hungarian and Romanian Wikipedias, although we do not link to these. I think that a better solution than having all links archived by IA would be to let users choose which archive version to use. This already works very well for dead links on fr.wikipedia.org, as for example in references 4 and 5 in this article: https://fr.wikipedia.org/wiki/Front_de_gauche_(France). And if the Wikimedia Foundation is thinking of sponsoring someone for archives, Wikiwix would like to be considered for the job! Partner of the WMF since 2008 [3]

Pmartin (talk) 19:33, 12 February 2016 (UTC)[reply]