Talk:Spam blacklist

The associated page is used by the MediaWiki Spam Blacklist extension, and lists regular expressions which cannot be used in URLs in any page in Wikimedia Foundation projects (as well as many external wikis). Any meta administrator can edit the spam blacklist, either manually or with SBHandler. For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.
Proposed additions
Please provide evidence of spamming on several wikis and prior blacklisting on at least one wiki. Spam that only affects a single project should go to that project's local blacklist. Exceptions include malicious domains and URL redirector/shortener services. Please follow this format. Please check back after submitting your report, as there may be questions regarding your request.
Proposed removals
Please check our list of requests which repeatedly get declined. Typically, we do not remove domains from the spam blacklist in response to site-owners' requests. Instead, we de-blacklist sites when trusted, high-volume editors request the use of blacklisted links because of their value in support of our projects. Please consider whether requesting whitelisting on a specific wiki for a specific use is more appropriate - that is very often the case.
Other discussion
Troubleshooting and problems - If there is an error in the blacklist (i.e. a regex error) which is causing problems, please raise the issue here.
Discussion - Meta-discussion concerning the operation of the blacklist and related pages, and communication among the spam blacklist team.
#wikimedia-external-links - Real-time IRC chat for co-ordination of activities related to maintenance of the blacklist.
Whitelists
There is no global whitelist. If you are seeking whitelisting of a URL on a particular wiki, please raise the matter on that wiki's MediaWiki talk:Spam-whitelist page, and consider using the template {{edit protected}} or its local equivalent to draw attention to your request.

Please sign your posts with ~~~~ after your comment. This leaves a signature and timestamp so conversations are easier to follow.


Completed requests are marked as {{added}}/{{removed}} or {{declined}}, and are generally archived quickly. Additions and removals are logged · current log 2020/07.

snippet for logging
{{sbl-log|20279799#{{subst:anchorencode:SectionNameHere}}}}

Proposed additions

  This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users on multiple wikis. Completed requests will be marked as {{added}} or {{declined}} and archived.

IP-address spammer

links

  Already done
  Already done
  Already done

spammers

Noticed the IP spammer adding the IP-link, which leads to the spammers of calenderdayo.com. I have therefore also included the IP of calenderdayo.com in this request. --Dirk Beetstra T C (en: U, T) 09:02, 2 July 2020 (UTC)

youtube.com/redirect



urls such as

https://www.youtube.com/redirect?q=de.wikipedia.org

or

 https://www.youtube.com/redirect?q=%64%65%2e%77%69%6b%69%70%65%64%69%61%2e%6f%72%67

can be used to circumvent the SBL. so i propose to blacklist

Regex requested to be blacklisted: youtube\.[a-z]+\/redirect

-- seth (talk) 21:32, 13 July 2020 (UTC)

@Lustiger seth:   Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 07:52, 14 July 2020 (UTC)
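The circumvention described above can be sanity-checked with a short script (a sketch in Python; the SpamBlacklist extension itself matches blacklist regexes server-side against links added to pages, and the URLs here are the examples from this request):

```python
import re
from urllib.parse import unquote

# The percent-encoded form decodes to the same target, so a filter that
# only looks for the literal "de.wikipedia.org" would miss it.
plain = "https://www.youtube.com/redirect?q=de.wikipedia.org"
encoded = "https://www.youtube.com/redirect?q=%64%65%2e%77%69%6b%69%70%65%64%69%61%2e%6f%72%67"
assert unquote(encoded) == plain

# The proposed rule matches the redirector itself, so the encoding of the
# target parameter no longer matters.
rule = re.compile(r"youtube\.[a-z]+/redirect")
assert rule.search(plain)
assert rule.search(encoded)

# Ordinary video links are unaffected.
assert not rule.search("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
```

Note that `[a-z]+` also covers country-code hosts such as youtube.de, which appears to be the intent of the proposal.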

Proposed additions (Bot reported)

  This section is for domains which have been added to multiple wikis as observed by a bot.

These are automated reports; please check the records and the link thoroughly, as they may report good links! For some more info, see Spam blacklist/Help#COIBot_reports. Reports will automatically be archived by the bot when they go stale (fewer than 5 links reported, none edited in the last 7 days, and the last editor is COIBot).

Sysops
  • If the report contains links to fewer than 5 wikis, then only add it when it is really spam
  • Otherwise just revert the link additions and close the report; closed reports will be reopened when spamming continues
  • To close a report, change the LinkStatus template to closed ({{LinkStatus|closed}})
  • Please place any notes in the discussion section below the HTML comment

COIBot

The LinkWatchers report domains meeting the following criteria:

  • When a user mainly adds this link, and the link has not been used too much, and this user adds the link to more than 2 wikis
  • When a user mainly adds links on one server, and links on the server have not been used too much, and this user adds the links to more than 2 wikis
  • If ALL links are added by IPs, and the link is added to more than 1 wiki
  • If a small range of IPs have a preference for this link (but it may also have been added by other users), and the link is added to more than 1 wiki.
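As a rough illustration, the four criteria above could be sketched as follows (all field names and the 0.9/50 thresholds are invented for illustration; COIBot's actual logic and tuning live in the LinkWatcher bots, not in this snippet):

```python
def should_report(user_link_ratio, link_total_uses, user_wiki_count,
                  user_server_ratio, server_total_uses,
                  all_added_by_ips, link_wiki_count, ip_range_prefers_link):
    """Return True if a link matches one of the four reporting criteria."""
    # 1. A user mainly adds this link, the link is not widely used,
    #    and the user adds it to more than 2 wikis.
    if user_link_ratio > 0.9 and link_total_uses < 50 and user_wiki_count > 2:
        return True
    # 2. The same, aggregated per server instead of per single link.
    if user_server_ratio > 0.9 and server_total_uses < 50 and user_wiki_count > 2:
        return True
    # 3. ALL additions come from IPs and the link is on more than 1 wiki.
    if all_added_by_ips and link_wiki_count > 1:
        return True
    # 4. A small IP range prefers this link and it is on more than 1 wiki.
    if ip_range_prefers_link and link_wiki_count > 1:
        return True
    return False
```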
COIBot's currently open XWiki reports
List | Last update | By | Site IP | Last user(s) | Last link addition | User/Link counts
epsteinsblackbook.com | 2020-07-14 10:28:21 | COIBot | 172.67.163.98 | 46.30.172.4, 5.88.25.97 | 2070-01-01 05:00:00 | 6 2
mathworks.com.au | 2020-07-14 10:59:32 | COIBot | 144.212.244.17 (R) | Eragon1001, Hrisiana, Jan Spousta, Ming mm | 2070-01-01 05:00:00 | 14 8
pyramidtracks.com | 2020-07-14 11:04:11 | COIBot | 104.31.92.152 | 176.61.146.23, 193.138.63.157, 213.89.234.18, 37.120.154.214, 62.182.99.69 | 2070-01-01 05:00:00 | 11 6
thewisy.com | 2020-07-14 11:10:29 | COIBot | 172.67.146.202 | 103.255.5.51, 42.201.177.169, 58.65.221.217 | 2070-01-01 05:00:00 | 5 2
zila.com.vn | 2020-07-14 08:50:32 | COIBot | 112.78.2.125 | Thienvankcvgk | 2070-01-01 05:00:00 | 10 10 0 0 2

Proposed removals

  This section is for proposing that a website be unlisted; please add new entries at the bottom of the section.

Remember to provide the specific domain blacklisted, links to the articles they are used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as {{removed}} or {{declined}} and archived.

See also recurring requests for repeatedly proposed (and refused) removals.

Notes:

  • The addition or removal of a domain from the blacklist is not a vote; please do not bold the first words in statements.
  • This page is for the removal of domains from the global blacklist, not for removal of domains from the blacklists of individual wikis. For those requests, please take your discussion to the pertinent wiki, where such requests would be made at MediaWiki talk:Spam-blacklist on that wiki. Search spam lists (remember to enter any relevant language code).

rheingoenheim-info.de



i don't understand the reason for blacklisting. at User:COIBot/XWiki/rheingoenheim-info.de there are only 2 selected additions. -- seth (talk) 10:22, 11 July 2020 (UTC)

@Lustiger seth: hijacked; it now lands at dolabuy.ru. It will have been spambot hits.  — billinghurst sDrewth 12:26, 11 July 2020 (UTC)
the problem is that now archived links to the old content are blacklisted, too.
the question is: has there been spamming with that url? the old link additions should not count as spam. -- seth (talk) 15:06, 11 July 2020 (UTC)
@Lustiger seth: why do you not use the whitelist for that, whitelisting the whole link with an appropriate intermediate .*? in between. --Dirk Beetstra T C (en: U, T) 16:07, 11 July 2020 (UTC)
hi!
i use the whitelist, if it is necessary. but in this case i still don't see the necessity for the blacklisting of the domain, because i don't see evidence for spamming.
imho blacklisting this domain is counterproductive, because it prevents people from fixing links that don't work any longer. -- seth (talk) 16:31, 11 July 2020 (UTC)
@seth This link is only in an article on deWiki; this spam blacklist is for all projects. I have looked at it, and the entry is correct. Please use the whitelist on deWiki.--𝐖𝐢𝐤𝐢𝐁𝐚𝐲𝐞𝐫 👤💬 17:05, 11 July 2020 (UTC)
i know where i am. :-)
links should be added to the global blacklist, if there is spamming across several wikis. would somebody please show me the evidence of spamming with this link in any wiki? -- seth (talk) 17:21, 11 July 2020 (UTC)
@Lustiger seth: redirect sites are also added. One case of abuse is enough, and regular redirect sites are even added preemptively. It may hamper archived links, but that is easily solved with a proper whitelisting using a lookahead.
In this case, one edit introduced several of these redirect sites, all going to the same target. The page where that was added was deleted as spam; the user who added it was globally locked. It is more than reasonable to blacklist the links used in that blatant abuse. So yes, there was spam. --Dirk Beetstra T C (en: U, T) 17:49, 11 July 2020 (UTC)
i'm not sure, whether i understood it correctly. so please confirm (or correct).
there was exactly one occurrence of spamming on a single page (which has already been deleted), and as the new content of that website is obviously worthless, the domain was blacklisted globally? -- seth (talk) 18:03, 11 July 2020 (UTC)
i just made now a short search across the largest wikipedias for all blocked link additions since 2013.
zhwiki: 0
svwiki: 0
dewiki: 13
 20200520055842, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Fritz_Seitz
 20200520055859, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Fritz_Seitz
 20200520055955, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Hamburgum; Fritz_Seitz
 20200520060035, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Hamburgum; Fritz_Seitz
 20200520060100, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Fritz_Seitz
 20200520063237, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Fritz_Seitz
 20200520063242, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Fritz_Seitz
 20200520063252, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Fritz_Seitz
 20200630133613, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Wilhelm_Caroli
 20200630133731, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Hamburgum; Wilhelm_Caroli
 20200630140834, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Wilhelm_Caroli
 20200630140839, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Wilhelm_Caroli
 20200630140850, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Wilhelm_Caroli
cebwiki: 0
viwiki: 0
warwiki: 0
enwiki: 0
nlwiki: 0
ukwiki: 0
ruwiki: 0
ptwiki: 0
plwiki: 0
frwiki: 0
jawiki: 0
itwiki: 0
eswiki: 0
this verifies what i thought and said already: there is no prevention of spam by this blacklist entry, but several users were blocked from updating old links. i think blacklisting in such cases is counterproductive because it annoys the good guys.
i whitelisted the complete domain rheingoenheim-info.de at dewiki now, because there is no evidence of spamming anywhere. -- seth (talk) 18:45, 11 July 2020 (UTC)
@Lustiger seth: the spam was an edit that was performed. Seeing what was done there, and seeing that both the page was deleted as spam, and that the editor was blocked for spam IS clear evidence of spam, and that was why it was rightfully globally blacklisted. If you want to take the effort of cleaning up any future spam on de then that is your right, but I would, again, suggest to only whitelist the, likely, very few archive links. I really do not see why you have to allow a link that was already spammed and is not what you are ever going to link to, but whatever. —Dirk Beetstra T C (en: U, T) 19:07, 11 July 2020 (UTC)
I see I missed an answer from you (my apologies): yes, but spammers generally do not stop at one attempt. That that happened here is not a reason to suggest that blacklisting such links should not be done at first observation. We are not here to play whack-a-mole. —Dirk Beetstra T C (en: U, T) 19:10, 11 July 2020 (UTC)
Or, what that basically suggests: you think that we should first remove a link that is totally and obviously crap 10 times, then wait until it reappears and clean up again before we should blacklist the crap? And then, because local admins override the blacklist completely, we have to monitor and clean up that crap again? Please, just whitelist the exact links you need, even if you are right on this occasion. —Dirk Beetstra T C (en: U, T) 19:18, 11 July 2020 (UTC)
  • if one wants to use {{Internetquelle |url=https://example.org |titel=foo |abruf=2020-07-12|archiv-url=https://example.org/archive}}, the result would be foo. Archiviert vom Original; abgerufen am 12. Juli 2020.
    this means: if just the archived url is whitelisted, the edit would still be blocked. -> users get annoyed -> solution: don't block the domain.
  • in my opinion the SBL-procedure (in cases such as this) is inconsistent. sometimes we delete tons of old entries from the SBL for several reasons (less false positives, better performance, better overview, no visible benefit in blocking, ...) if nobody tried to add those links since they were blacklisted. in our case it's even worse: no spammer tried to add links to rheingoenheim-info.de since it was blacklisted, but 3 different normal users tried to fix links and their edits were blocked several times. -> users get annoyed -> solution: don't block the domain.
  • of course we shall prevent spamming in wikipedia, but we should also prevent false positives. in the case of rheingoenheim-info.de we have: 13 false positive blocks, and 0 correct positive blocks. -> users get annoyed -> solution: don't block the domain.
there was spamming in a single case (before the domain was blacklisted). the user got blocked, the article got deleted. that should be enough. maybe a local blacklisting could make sense, if there are no other links to the domain in the same wiki. but it is not reasonable to globally blacklist the domain in such cases. -- seth (talk) 23:18, 11 July 2020 (UTC)
  •   Comment @Lustiger seth: The only reason that it will have come to my attention will have been due to spambots trying to add it. (I did intimate that in my previous answer.) I saw it, tested the link, and got redirected. As it is not on the report, it would seem that they have been caught by other aspects of a filter, though that usually means it is a matter of time.  — billinghurst sDrewth 00:55, 12 July 2020 (UTC)
    The expectation that I can guess how often a hijacked domain name is going to be abused into the future by spambots is not reasonable. I have better things to do than whack-a-mole. If it was a general user abusing, sure; but spambots? I don't find it reasonable that a hijacked and abused domain should be protected forever to maintain a dead link. There needs to be a middle road, not leaving ourselves open to abuse.

    I agree that it is the use of a big hammer, but the issue is better spam management and stopping the bots coming in, so we don't have to use the blacklist so readily and regularly.

    Action: where I identify a hijacked domain that is still in use, I will try to remember to put a note on the local mediawiki:spam-whitelist page for that community to work out how to handle it. — billinghurst sDrewth 01:11, 12 July 2020 (UTC)

(edit conflict)
... in this case. Generally you see other editors trying it on other pages. Spam continues. People have to clean it up -> users get annoyed -> blacklist it on first sight.
... in this case. Generally there are no genuine additions afterwards. Whitelist only the archived link; the original does not need to be linked -> readers of Wikipedia will click it (I do, I always go to the original first!), the spammers will still profit and the user will be annoyed (or worse if there is now something malicious) -> readers get annoyed -> blacklist it on first sight.
So, now you have enabled it for this domain. That Russian website gets the IPs of hundreds of annoyed readers of the German Wikipedia. It can now happily serve popups that these readers accidentally install (yeah, you can swap 'ok' and 'cancel' to get your success). It can install phishing scripts. You may already have those phishing scripts running on your computer, since you tried today to follow the original to check the website. And even if this Russian website does not do something malicious, you are annoying hundreds of innocent readers every week/month/year who first click 'the original' and do not get what they expected. Instead of just two (unless a bot can be 'annoyed').
So no, blocking an editor and/or reverting/deleting the page is hardly ever enough, and local blacklisting is not a solution either. Global blacklisting is needed and all malicious links should be removed (or better, replaced with an archive link), making sure that no innocent readers get ‘harmed’ by following the original. Believe me, it is worth annoying 2 or 3 genuine editors over (and generally a handful of spam fighters). Please solve that situation properly. —Dirk Beetstra T C (en: U, T) 01:16, 12 July 2020 (UTC)
hi!
i whitelisted the domain at dewiki, so that i could easily fix all remaining occurrences in dewiki. if the domain hadn't been blacklisted, then other wikipedians would probably have fixed the links months ago already (and would have reduced the probability that somebody clicked on the misleading links).
now i keep it whitelisted, because the empirical probability that a spammer will use the domain is about 0 (see above). it is more likely that somebody will add a completely new spam domain that is not and was not blacklisted. as i said already, it's similar to the zero-hit entries that we sometimes delete. it's inconsistent to keep entries such as this.
nevertheless, i can't fix the removal at ptwiki with the text https://web.archive.org/web/20130409232839/http://www.rheingoenheim-info.de/index.php/weg-durch-die-zeiten2/die-roemer Rufiniana], Die Römer, Antigo sítio de Rheingönheim (alemão), because i would have to ask for local whitelisting there. no, i won't. that's just too much stupid work that's simply not necessary. and i won't do that for other wikis either.
"Please solve that situation properly": a proper solution would be the removal of the global blacklist entry, for there has not been a single spamming hit (but 13 blocked useful hits) since it was added almost one year ago.
well, ok, i guess we won't get a step forward this way (just repeating our positions).
let's try to be constructive.
you said you don't want to play whack-a-mole. so in my opinion it would be good if the bot-generated reports contained information on deleted spam.
at the start of this thread we had:
fortunately now User:COIBot/XWiki/rheingoenheim-info.de and User:COIBot/LinkReports/rheingoenheim-info.de are better. but still it is not easy to see how many of the listed edits should be considered spamming. the only possible spamming edit could be "2019-08-28 11:07:12: wikt:chr:User:CesarMcinnis". but it takes several minutes to see/guess this.
so could the output be further improved such that it is easier for everyone to see, what was spam and what was useful?
is it possible to manually add users to the bot's whitelist? user:Boshomi fixes a lot of links and is the opposite of a spammer. same with User:ⵓ and user:Dr Lol.
in cases where there are more useful edits than spamming edits, there should be more than just one spamming edit before globally blacklisting the whole website. it would be nice if there were something such as an alert when two or more different spamming users add links to the same (spamming) website. this could be a help for everyone (less work for meta admins, fewer annoying blocked edits for users, fewer annoying discussions with me). what do you think? -- seth (talk) 22:03, 12 July 2020 (UTC)
@Lustiger seth: There is a phabricator ticket that proposes a right to edit around the spam blacklist; you may wish to comment there. There is no existing right or ability. I would still argue for a better ability to stop the spammers getting through the front door, rather than having all the defences for once they are inside.  — billinghurst sDrewth 23:31, 12 July 2020 (UTC)
@Lustiger seth: my point in repairing was that the links to the original material should be removed, and ONLY the archive links should be in the documents. People will follow original links (as I said, I want to see original documents; only if they are not available will I use the archive). I will agree that the spamming has now empirically stopped, and that we could remove it globally, but as it is currently redirecting, I would still prohibit all new additions, bad faith and good faith, to protect the readers. You do not know where you are sending people; people do not know where they are going when clicking the link in good faith. The whitelisting should hence not be \brheingoenheim-info\.de\b, it should be \barchive\.org\/.*?rheingoenheim-info\.de\b
Regarding making the reports clearer on what is spam and what is not: that is a purely human evaluation. The reports of the bot are just reports for analysis. The bot cannot distinguish that.
I should work on making whitelisted users a list on-wiki (e.g. user:LiWa3/UserWhitelist), the only way at the moment is that they have 'given rights' on a wiki, or manually on IRC (and the latter functionality is not optimal in itself).
I have for long been fighting (and will again at the next opportunity) to have a complete, total overhaul of the spam blacklist. The functionality is completely wrong, it is too black-and-white, it is a complete sledgehammer approach. That was already recognised 13 years ago, but the WMF is utterly oblivious and prefers to enforce otherwise. A proper implementation would allow for a global whitelist, where we could override the blacklisting for specific cases. Then you could also whitelist specific official domains which are globally blacklisted. Even better, you could whitelist for use on specific pages or specific wiki flavours (wikiversity and wikitravel have other requirements), you could set levels of blocking (only new editors, or only allow admins to circumvent), etc. etc. But WMF ... ignores proper requests. --Dirk Beetstra T C (en: U, T) 06:38, 13 July 2020 (UTC)
hi!
thanks for your answers and your patience (and thanks for continuously pinging me; if i shall do the same, please give me a note; i thought it's not necessary, because you are on this page several times a day anyway). sorry for beginning with some beating about the bush now, but i hope this will make things clearer:
here, i distinguish between several types of editors: A) users, some of whom might not even know what the SBL is (e.g. newbies or technically less interested users), B) users with enough technical skill to understand the SBL sufficiently.
if somebody of group A is confronted with an SBL block, they might understand the message and adapt their edit. but many members of group A might be so confused that they immediately leave the page (and the wikipedia), or they try different variations of their edit before they give up. we can see this via the sbl log and even better at the edit filter log. it's alarming that sometimes very good large edits were blocked just because the user could not cope with a filter or the sbl.
if a user of group A tries to fix a link using the above-mentioned template:Internetquelle and gets (again) a warning, this user might get overtaxed. the link change would be an improvement (although the original link to the new content is still reachable with this template), but could not be saved.
users who fix/unlink disallowed links are normally members of group B. for them it does not make a difference whether a disallowed url is a normal link or part of template:Internetquelle (or template:webarchive or template:webcite ...). (almost) all disallowed links can be found via linksearch.
@billinghurst: the right to work around the blacklist would probably be given to members of group B. but that would not solve the problems of group A.
@Dirk Beetstra: yes, template:webarchive is better than template:Internetquelle in this case, because the original url is not shown in template:webarchive. but for a user of group A this might be too complicated. however, a group B guy who fixes disallowed links could replace the template afterwards.
\barchive\.org\/.*?rheingoenheim-info\.de\b: this would exclude other archive websites such as archive.today, so it might better be archive\.[a-z]+\/.*rheingoenheim-info\.de. but that would still exclude archive websites such as webcitation.org, so it might better be (?:webcitation\.org|archive\.[a-z]+)\/.*rheingoenheim-info\.de. ...
apart from being a bit complicated, this solution still does not cope with problems of group A.
what is spam and what not: of course, humans have to decide. but it should be comfortable for them. if a link is added mostly by whitelisted guys, this is a strong indicator for not being all-time spam. thus such cases (as we have one here) need a special treatment.
user:LiWa3/UserWhitelist: yes, that would be great!
too black-and-white: i agree. in dewiki, we switch to the edit filter with warnings in some cases (although that tool has an overview problem). and in some cases we don't even block a url at all, but let a bot ask the link adder to remove the link. and if they don't remove the link, the article and link are written to a maintenance list, where experienced users cope with the entries. reason: wikipedia has a problem that many newbies leave the wikipedia because it's too technical/complicated for them. we want to avoid that. -- seth (talk) 09:01, 13 July 2020 (UTC)
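For what it's worth, the whitelist regex alternatives discussed above can be compared directly (a Python sketch; the example URLs are hypothetical, and the real matching happens inside the SpamBlacklist extension's whitelist handling):

```python
import re

# The three candidate whitelist patterns from the discussion.
archive_org = re.compile(r"archive\.org\/.*?rheingoenheim-info\.de")
archive_any = re.compile(r"archive\.[a-z]+\/.*rheingoenheim-info\.de")
combined = re.compile(r"(?:webcitation\.org|archive\.[a-z]+)\/.*rheingoenheim-info\.de")

# Hypothetical example URLs for each archive service.
wayback = "https://web.archive.org/web/20130409232839/http://www.rheingoenheim-info.de/index.php"
today = "https://archive.today/newest/http://www.rheingoenheim-info.de/"
webcite = "https://www.webcitation.org/query?url=http://www.rheingoenheim-info.de/"
direct = "http://www.rheingoenheim-info.de/index.php"

assert archive_org.search(wayback) and not archive_org.search(today)  # misses archive.today
assert archive_any.search(today) and not archive_any.search(webcite)  # misses webcitation.org
assert all(combined.search(u) for u in (wayback, today, webcite))     # covers all three
assert not combined.search(direct)  # the direct (blacklisted) link stays unmatched
```

None of the three patterns match the direct link itself, so the underlying blacklist entry would still block it.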
@Lustiger seth: Thanks. No need to ping me, unless you want a quick answer.
I understand that for group A the problem is often too complex. That is often the case; you see that especially with youtu.be vs. youtube.com and the google.com/cse, -amp and -url, where people don't understand why we blacklist the link and what to do to avoid it. It, unavoidably, blocks a lot of good-faith material.
Edit filters for spam fighting are often too intensive on the server, though they give much more flexibility. It can be done, but in the end you either run into a massive number of regexes in one rule, or a massive number of rules (with the latter likely preferred, so you can also tailor the material to the warning). On en.wikipedia we have XLinkBot as a soft approach to blacklisting.
I have suggested a form of edit filter, but not with the code; instead it would have a plain regex (similar to the blacklist) that is tested against the added external links. See User:Beetstra/Overhaul_spam-blacklist. That would have given flexibility in what to do when a link gets added (hard block, warning and block, just warning), which namespaces (allowing petitions on talkpages is not an issue, we just keep them out of mainspace), or even (for wiki-specific implementations) which pages to allow the link on (no pornhub.com anywhere except on en:Pornhub), allowing admins to just add links (except for e.g. copyright violation stuff), etc. But well .. we wait and wait.
Porting the whitelists and similar to on-wiki is on my list: User:COIBot/Wishlist. Unfortunately no time. --Dirk Beetstra T C (en: U, T) 12:44, 13 July 2020 (UTC)
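The per-rule structure sketched at User:Beetstra/Overhaul_spam-blacklist could look roughly like this (a hypothetical sketch, not the proposal's actual code; the field names, actions, and example entries are all invented for illustration):

```python
import re

# Each rule pairs a plain regex (as in the current blacklist) with an action
# and scoping information; all field names and entries here are illustrative.
RULES = [
    {"regex": r"youtube\.[a-z]+/redirect", "action": "block",
     "namespaces": {"main", "draft"}, "allowed_pages": set()},
    {"regex": r"\bpornhub\.com\b", "action": "block",
     "namespaces": {"main"}, "allowed_pages": {"Pornhub"}},
    {"regex": r"\bexample-promo\.com\b", "action": "warn",
     "namespaces": {"main"}, "allowed_pages": set()},
]

def check_addition(url, page, namespace):
    """Return the action for a link addition: 'block', 'warn', or 'allow'."""
    for rule in RULES:
        if (re.search(rule["regex"], url)
                and namespace in rule["namespaces"]
                and page not in rule["allowed_pages"]):
            return rule["action"]
    return "allow"
```

Under such a scheme a pornhub.com link would be blocked everywhere in mainspace except on the Pornhub article itself, and a "warn" rule could let the edit through after a warning instead of hard-blocking it.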

Discussion

  This section is for discussion of Spam blacklist issues among other users.