Talk:Spam blacklist/Archives/2013-10


Proposed additions

  This section is for completed requests that a website be blacklisted

Spambot urls 20131005





billinghurst sDrewth 04:04, 5 October 2013 (UTC)
  Added -- billinghurst sDrewth 06:05, 5 October 2013 (UTC)

Spambot urls 20131006













billinghurst sDrewth 14:17, 6 October 2013 (UTC)

  Added -- billinghurst sDrewth 14:18, 6 October 2013 (UTC)

achimthepooh.de



  Added -- seth (talk) 21:33, 14 October 2013 (UTC)

Spambot link



Cross-wiki spam added by several accounts. --Glaisher [talk] 08:05, 14 October 2013 (UTC)

  Added -- billinghurst sDrewth 15:02, 26 October 2013 (UTC)


Proposed removals

  This section is for archiving proposals that a website be unlisted.

Bet-at-home.com



Bet-at-home.com is the official website of bet-at-home.com, which has articles on multiple Wikipedias. Armbrust (talk) 18:23, 2 October 2013 (UTC)

They were blocked due to their having spammed the wikis with this url and variations. They are paying the consequence for spamming — billinghurst sDrewth 03:21, 5 October 2013 (UTC)
I only made this request, because the local white-listing for the article was closed as "Defer to Meta blacklist". Armbrust (talk) 10:40, 5 October 2013 (UTC)
That is a bit of a conundrum. The global blacklist at Meta has no whitelist functionality, and little of the nuance that can be applied to local whitelists and blacklists, e.g. one can whitelist a base url while blacklisting any extension, and vice versa. It is utilised in response to global issues of abuse and misuse of urls, and the local wikis are encouraged to make allowance for urls that need to be used locally. When sufficient whitelisting is undertaken, and no abuse has occurred, then we take the advice from local wikis that it is safe to unblock a url. — billinghurst sDrewth 01:31, 6 October 2013 (UTC)

It is whitelisted at some wikis .. I would suggest that it gets unlisted, the site is notable enough for articles on some wikis which have stayed now for quite some time (after initial deletions and deletion suggestions). We're not here to punish the spammers, but to stop the abuse, and I presume that the owners understand that that should not continue/start again (reblacklisting is always an option, which will then be more difficult to remove). Request is done by someone without a COI, so I think that all gives enough merit. --Dirk Beetstra T C (en: U, T) 08:49, 7 October 2013 (UTC)

  Removed -- billinghurst sDrewth 09:30, 7 October 2013 (UTC)

hhvm.com



hhvm.com was previously owned by a domain squatter, but it is now owned by Facebook for promotion of HipHop for PHP.

It is a blog, so I am unsure why it would be needed. I would suggest that you try to progress this through a local whitelist request at the wiki of your interest. — billinghurst sDrewth 10:30, 28 September 2013 (UTC)
  Closed no further comment made — billinghurst sDrewth 13:02, 19 October 2013 (UTC)

xist.org



This site is on the global blacklist for unknown reasons. It was added according to a COIBot report, which says "This site appears to be a redirect site."

xist.org redirects to geohive.com, which provides a LOT of data and references to many articles throughout Wikipedia. It is not a malicious website. Now, even high-traffic articles like World population have a "blacklisted link" template above them. If xist.org isn't removed, which it should be, then perhaps an automated relinking of the hundreds of thousands of links to xist.org to geohive.com?

Ithinkicahn (talk) 20:55, 28 September 2013 (UTC)

Well, indeed. xist.org should then be renamed to geohive.com .. such a thing has been done before with a bot on en.wikipedia, I guess that is the way forward. --Dirk Beetstra T C (en: U, T) 09:28, 29 September 2013 (UTC)
Well, I have no idea how to do that. Any thoughts? Ithinkicahn (talk) 11:11, 29 September 2013 (UTC)
There are people running such bots, probably through a task-for-bot request via en:WP:BOTS? --Dirk Beetstra T C (en: U, T) 13:44, 29 September 2013 (UTC)
en:Wikipedia:Bot_requests <- that's it! --Dirk Beetstra T C (en: U, T) 13:45, 29 September 2013 (UTC)
  Closed direction given for a resolution of the issue — billinghurst sDrewth 13:03, 19 October 2013 (UTC)

hockeyfights.com



Is it appropriate to have URLs on this blacklist for reasons unrelated to spam?

This site was added solely due to concerns that it contains embedded YouTube videos that may be copyright violations (although that presumption seems to be incorrect because of a licensing agreement between YouTube and the NHL. See Talk:Spam_blacklist/Archives/2010-10#hockeyfights.com).--SaskatchewanSenator (talk) 06:49, 29 September 2013 (UTC)

Yes, that can be appropriate. However, it appears that the licensing agreements have changed since it was added .. which may have changed the situation (at the time of blacklisting, several editors claimed that these were violations).
Still not sure whether these should be linked, you can link directly to the original site anyway, but that is a different question. Maybe now a removal should be considered? --Dirk Beetstra T C (en: U, T) 09:33, 29 September 2013 (UTC)
So this blacklist is also for copyright violations?--SaskatchewanSenator (talk) 10:19, 29 September 2013 (UTC)
That is not what I said, and not what you originally asked either. --Dirk Beetstra T C (en: U, T) 11:11, 29 September 2013 (UTC)
Let me expand: if it is found that a site is hosting a relatively high number of copyright violations, and people are, knowingly and unknowingly, still linking to them, and the use results in a significant amount of work to remove the links, and they are not necessary anyway (link to the original is a proper and suitable alternative, if it is notable enough there are also other (reliable) sources, and sources do not need to be linked anyway but (the original) source can be described) then blacklisting is certainly a consideration that one could make. Note that linking to copyright violations is just a complete no-no (not just discouraged, it should simply never be done; editors have been banned/blocked for posting copyright violations). I don't remember the details anymore of the time I added the link, but I will assume the good faith of the people that reported this problem and that that issue was grave enough. I will also assume that that issue has now been resolved (but I'd like to hear some more info from other editors about that), and suggest that maybe this should now be delisted (especially since the spam aspect was negligible). --Dirk Beetstra T C (en: U, T) 11:26, 29 September 2013 (UTC)
  Closed no further communications 14:19, 19 October 2013 (UTC)

chanelreplica.com



  • Previously when this was caught by the Cyberpower67 bot, I argued for this to have an exception as follows:
"I closely examined this site recently flagged as on the blacklist for en:Chanel and it seems to be legit - ChanelReplica.com is used by the Chanel company to give advice and warning about counterfeit goods, and the links from that page go to examples of sites where Chanel's lawsuits successfully shut down their operations and the Chanel company was granted ownership of those domains (such as [1]). Although it DOES look at first glance like mega-spam, it's not at all." It's an appropriate site for the Chanel page and appropriately used one time to support a statement about counterfeit goods on the page Mabalu (talk) 16:12, 8 October 2013 (UTC)
Moving this over here as per advice from Amalthea over on en:MediaWiki talk:Spam-whitelist who explained that it had been lumped in with other sites for xxxreplica, but a Whois check showed it was actually a legit site owned by Chanel for the purpose of informing about replicas. Mabalu (talk) 16:12, 8 October 2013 (UTC)
Original blacklist request, whois seems to confirm it's registered by Chanel Inc, from what I can tell it has never been abused and was blacklisted by mistake. Amalthea (talk) 21:54, 8 October 2013 (UTC)
It was lumped in with other sites which were spam, probably an oversight.   Removed. --Dirk Beetstra T C (en: U, T) 19:13, 9 October 2013 (UTC)

www.cosmoetica.com



Cosmoetica is the poetry and criticism website of poet Dan Schneider. While the website's design isn't much to write home about, its content has been reviewed and praised by such media outlets as the New York Times (see link), assorted literary journals (see link), and even well-known film critic Roger Ebert (see link). However, Schneider is also controversial and about five years ago someone placed a number of links to his website on Wikipedia. I don't know if Schneider did this or if one of his fans or haters did it.

At the time I questioned the site being blacklisted but didn't raise an issue because it didn't seem like that big a deal. However, since then the blacklist has caused problems with Schneider's own Wikipedia entry being tagged with the blacklist tag. I've also been contacted by other editors who become irritated when they can't link to one of Schneider's essays on poets and films.

As additional proof that this site is not a spam site, I refer editors to the recent New York Times obituary of poet James Emanuel, which linked directly to an interview on Cosmoetica with the poet. If the New York Times can link to Cosmoetica it seems silly that Wikipedia can't.

I request this blacklist be lifted. If the bulk insertion of links reoccurs, we can always re-blacklist the site. But I'd be shocked if this happens again. And just FYI, I'm a long-term English Wikipedia editor and admin--over eight years experience--who specializes in literary articles. And no, I'm not Dan Schneider and I don't work for this website. Thanks. --SouthernNights (talk) 01:59, 15 October 2013 (UTC)

You can locally whitelist it with ease, so no actions here are needed. --Vituzzu (talk) 13:16, 15 October 2013 (UTC)
I understand I can do that, but that then raises the question of why the site's even on the global blacklist. The issues we had five years ago were only on the English Wikipedia; there was never any issue with this site on any other Wikipedia language site. Instead of only removing this from the English Wikipedia blacklist we should take the cleaner path and remove it from the global list. This is a site which is not a spammer and does not show up on any spam list put out by any of the world's anti-spam organizations. In my opinion the global blacklist of this site five years ago was an inappropriate response to a local Wikipedia issue and we should now correct that overresponse. Thanks. --SouthernNights (talk) 15:10, 15 October 2013 (UTC)
Actually the original request tells a different story. --Vituzzu (talk) 16:56, 15 October 2013 (UTC)
Thanks for finding that--I wasn't involved in original blacklisting and couldn't find that entry. Still, is there a reason for the global blacklist to remain in force? Five years have passed and the sock puppetry the link insertion was related to is no longer an issue. --SouthernNights (talk) 20:09, 15 October 2013 (UTC)
Since there was such a strong spamming I'd be more cautious, so I'll wait to see some legitimate use on different major wikis before removal. --Vituzzu (talk) 23:12, 15 October 2013 (UTC)

Makes sense to me. I'll go the local whitelist route. I've never requested to lift a blacklist before so I was merely following instructions, which said if a site was on the global blacklist to go here. I wouldn't have bothered with all this except the blacklist began putting those tags on a few articles I edit and monitor.

BTW, correct me if I'm wrong, but as an admin I can simply add this site to the local English Wikipedia whitelist. Correct? Is there an issue doing this? As I've said, I've never done this before. --SouthernNights (talk) 12:47, 16 October 2013 (UTC)

No issues at all, it's one of your tools and you can use it at any time ;)
I'd suggest to closely monitor link usage before whitelisting. You can do it by adding the same regexp you can see in global bl (\bcosmoetica\.com\b) to w:Mediawiki:Spam-whitelist, otherwise you can whitelist single links by adding them to the same page (pre-pending \ to each .?=;[]{} and by enclosing the whole entry with \b). --Vituzzu (talk) 12:15, 17 October 2013 (UTC)
  Closed being managed at a site level — billinghurst sDrewth 14:25, 19 October 2013 (UTC)
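
For anyone following the whitelisting advice above, here is a minimal Python sketch of the escaping rule as described (prefix each of `.?=;[]{}` with a backslash and enclose the entry in `\b`). The helper name `whitelist_entry` and the single-link path are hypothetical, made up for illustration; only the domain-wide regex is quoted from the thread.

```python
import re

# Characters the comment above says to prefix with a backslash.
SPECIAL_CHARS = set(".?=;[]{}")


def whitelist_entry(link: str) -> str:
    """Escape a single link and enclose it in word-boundary markers."""
    escaped = "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in link)
    return r"\b" + escaped + r"\b"


# Domain-wide rule, as quoted from the global blacklist.
DOMAIN_RULE = r"\bcosmoetica\.com\b"

# Hypothetical single link; the path is invented for this example.
SINGLE_LINK = "cosmoetica.com/example.htm?id=1"
single_rule = whitelist_entry(SINGLE_LINK)
print(single_rule)
```

The domain-wide rule would allow every page on the site, while an escaped single-link entry only allows that one address, which is the more cautious option when a domain has been spammed before.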

goo.gl



I think we should allow url shorteners, as shorter urls look neater than endless url text on the discussion pages. --105.236.37.38 15:11, 14 October 2013 (UTC)

  Not done. Allowing shorteners would make blacklisting itself useless. --Vituzzu (talk) 13:17, 15 October 2013 (UTC)

Troubleshooting and problems

  This section is for archiving Troubleshooting and problems.

Discussion

  This section is for archiving Discussions.

Now in MediaWiki default

As previously announced here, mw:MediaWiki 1.21, released yesterday, includes in the default installer the spam blacklist extension, which defaults to loading this blacklist. --Nemo 08:22, 26 May 2013 (UTC)

Unusable EBSCOHOST links



Should we block them here? It was suggested to continue the discussion here that started here *now here.--Elvey (talk) 07:30, 21 June 2013 (UTC)

Here's a specific suggestion:

ebscohost\.com(\.|.*(pdfviewer|EbscoContent))     #Block 3 kinds of unusable EBSCOHOST links but allow permalinks: Match proxies: there's a literal "." after "com", and temporary session links, which contain pdfviewer or EbscoContent

(This is a consolidation of these two simpler regexes:

ebscohost\.com.*pdfviewer          #Block unusable [[wp:EBSCOHOST]] links but allow permalinks
ebscohost\.com\.                   #Match proxies, which is where it's not the end of the hostname - there's a literal "." after "com".

)

A third kind of link that doesn't work: http://content.ebscohost.com/pdf23_24/pdf/2004/GLK/01Aug04/44734958.pdf?T=P&P=AN&K=44734958&S=R&D=jss&EbscoContent=dGJyMMTo50Seqa84v%2BvlOLCmr0qep7BSrqa4SLWWxWXS&ContentCustomer=dGJyMPGot02wrbVKuePfgeyx44Dt6fIA I guess we can match EbscoContent. Some more:
GOOD: http://connection.ebscohost.com/c/articles/957192/preparing-fedco-next-50-years works nicely. These 'articles' links are the best.
BAD: http://connection.ebscohost.com/library-search?s=1&an=9403090626 (same article as above; different kind of link)
GOOD? http://ehis.ebscohost.com/ehost/detail?sid=447666d5-c86b-4e3b-8b7f-ef15f147c2d5%40sessionmgr115&vid=1&hid=109&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#db=a9h&AN=5020846 (note: "sessionmgr"!)

--Elvey (talk) 22:19, 26 June 2013 (UTC)
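
A quick way to check the proposed consolidated regex against the example links is sketched below in Python. It assumes a plain case-sensitive substring search, which simplifies how the blacklist extension actually applies its rules; the proxy URL is hypothetical (no proxy example appears in this thread), and the other URLs are copied from the comment above.

```python
import re

# Consolidated regex exactly as proposed above (without the trailing comment).
CONSOLIDATED = re.compile(r"ebscohost\.com(\.|.*(pdfviewer|EbscoContent))")

SAMPLES = {
    # Temporary session link containing "EbscoContent".
    "session_pdf": (
        "http://content.ebscohost.com/pdf23_24/pdf/2004/GLK/01Aug04/44734958.pdf"
        "?T=P&P=AN&K=44734958&S=R&D=jss"
        "&EbscoContent=dGJyMMTo50Seqa84v%2BvlOLCmr0qep7BSrqa4SLWWxWXS"
        "&ContentCustomer=dGJyMPGot02wrbVKuePfgeyx44Dt6fIA"
    ),
    # 'articles' permalink, labelled GOOD.
    "articles": (
        "http://connection.ebscohost.com/c/articles/957192/"
        "preparing-fedco-next-50-years"
    ),
    # 'library-search' link, labelled BAD.
    "library_search": "http://connection.ebscohost.com/library-search?s=1&an=9403090626",
    # 'detail' link, labelled GOOD?.
    "detail": (
        "http://ehis.ebscohost.com/ehost/detail"
        "?sid=447666d5-c86b-4e3b-8b7f-ef15f147c2d5%40sessionmgr115&vid=1&hid=109"
        "&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#db=a9h&AN=5020846"
    ),
    # Hypothetical proxied hostname: a literal "." follows "com".
    "proxy": "http://ehis.ebscohost.com.proxy.example.edu/ehost/detail",
}

# True means the regex would catch (block) the link.
results = {name: bool(CONSOLIDATED.search(url)) for name, url in SAMPLES.items()}
for name, blocked in results.items():
    print(f"{name}: {'blocked' if blocked else 'allowed'}")
```

Under these assumptions the regex catches the session PDF link and the proxied hostname and lets the 'articles' and 'detail' links through. It also lets the 'library-search' link through, even though that one is labelled BAD above, so a further pattern would be needed if that kind of link should be blocked too.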

{{declined}} at this point of time. There are between 3k and 4k links to the url, and it isn't blocked at any sites. How and why would we override the sites on this matter?
(above decline is by Billinghurst. -Elvey)

Why are you declining this? Why would we want to override the sites on this? Who says we do? What are you talking about? URLs have to be on https://someother.wikipedia.org/wiki/MediaWiki_talk:Spam-blacklist first? Says who/why? No response, to above so I nowiki'd the decline for now. --Elvey (talk) 20:16, 12 August 2013 (UTC)

{{declined}}Not one of the wikis has a rule in place to impose restrictions on EBSCOHOST, so it would seem inappropriate for meta to impose a rule from meta's global spam blacklist when the sites have not bothered. Further there are over 3000 links already in use, and we would be blocking every article that has such a link, and that is not an acceptable approach. To put this into place, I would want to see the links cleaned up first, paired with education, and then we can block the url. — billinghurst sDrewth 12:33, 15 August 2013 (UTC)
AGAIN, URLs have to be on https://someother.wikipedia.org/wiki/MediaWiki_talk:Spam-blacklist first? Says who/why? Why shouldn't meta go first, when the URLs are used all over? We wouldn't be 'blocking articles'; as I originally noted ( here), we already have examiner.com in here, with many links already in use, and they're at least not dead links. Why is EBSCOHOST any less eligible? --Elvey (talk) 20:55, 20 August 2013 (UTC)
  Declined It is not the job of meta to second guess the wikis. The wikis are using the link, so it would be horribly rude for us to blacklist a link that they are using. I cannot explain what was done for examiner.com, however, the practice is not to blacklist broadly existing urls. Deal with it at the local wiki levels. — billinghurst sDrewth 12:29, 26 August 2013 (UTC)
Can you think of any links that were "cleaned up first, paired with education, and then [blocked]?" It would be helpful to have examples to look at that demonstrate how to do so. (You don't have any reasons to think these dead-on-arrival links are valid references, right? I believe there are none.) --Elvey (talk) 21:46, 9 September 2013 (UTC)

examiner.com, EBSCOHOST and a similarly functioning parallel system with its own error messages!

ALSO, I noted (there), " (Arguably it would be better to have a similarly functioning parallel system with its own error messages handle sites like examiner.com and this ebsco problem, but in the meantime, let's move toward (cautiously!) putting in regexes to handle them.) " - can you please comment on that? Let's have more discussion, eh? --Elvey (talk) 21:39, 9 September 2013 (UTC)

Question













Why have the following sites been blocked?

They are in 4 it:voy articles and apparently are regular sites. If there is no particular reason for the block, I'd like to re-add them where they were.

Let me know, --Andyrom75 (talk) 21:53, 24 June 2013 (UTC)

Just quickly, the following rules exist:
  • 33southbackpackers.com is caught by blacklists: [w:en] \b33southbackpackers\.com\b, [w:ja] .*t[^\/]?back.*\.(?:\w{2,4})(?:\/.*)?$, [w:gl] \b33southbackpackers\.com\b, [w:ne] \b33southbackpackers\.com\b, [w:sq] \b33southbackpackers\.com\b, [w:ta] \b33southbackpackers\.com\b
    • I don't think this is globally blocked, but a lot of local rules on a handful of wikis. They may have been pushing their envelope ..
  • grandjersey.com is caught by blacklists: [global] (fancy|open|reebok|wholesale|two|whole|china|ebuy|you|and|b2b|20|affordable|shopping|fans|wonderful|c2c|footballworldcup|superbowl)-?jerseys?\.(com|us|org|net)\b
    • Not sure, this may be a 'collateral damage' on a rule, need to look into this one further
  • lapiazzetta.lecce.it is caught by blacklists: [global] lapiazzetta\.lecce\.it
    • Globally blocked, specifically this link - I would guess that someone spammed this cross-wiki
  • backpackers-refuge.biz.ly is caught by blacklists: [global] \bbiz\.ly\b
    • Needs further looking into. If I recall correctly, biz.ly can be abused as a redirect service, which are just blanket blacklisted. Would suggest local whitelisting.
Just as a general note, much stuff which is inappropriate on Wikipedia was in the past blacklisted quickly, especially if it was spammed. There are now sites which are blacklisted globally which are still available on wikivoyage. There is possibly a 'conflict' there - what is useless and spam(my) on Wikipedia may be quite appropriate on WikiVoyage (an external link to a specific hotel in Paris is promotional on w:en:Paris, and if that keeps getting pushed (spammed), it will end up on en.wikipedia blacklist. If that spamming goes cross-wiki (:w:fr, :w:de, :w:nl, :w:es, ..), it will get blacklisted globally - it is simply spam). If something gets spammed however to multiple Wikipedias, those sites do run the risk that they end up on a global blacklist, as it is the only means of stopping the abuse. The local wikis that then really do need the link should consider whitelisting them.
Can't find much more on most of them. Some items seem to be quite old. However, my suggestion would be to consider local whitelisting if they are of interest on the local wikis. --Dirk Beetstra T C (en: U, T) 09:19, 25 June 2013 (UTC)
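
The matches listed above can be reproduced with a short Python sketch. The regexes are copied from the list as quoted (the jerseys rule was modified later in this thread); applying each rule as a case-insensitive search over the whole URL is an assumption that simplifies how the extension wraps its rules, and the `RULES` and `caught_by` names are made up for illustration.

```python
import re

# Rules as quoted in the list above, keyed by where they are said to live.
RULES = {
    "global:jerseys": (
        r"(fancy|open|reebok|wholesale|two|whole|china|ebuy|you|and|b2b|20|"
        r"affordable|shopping|fans|wonderful|c2c|footballworldcup|superbowl)"
        r"-?jerseys?\.(com|us|org|net)\b"
    ),
    "global:lapiazzetta": r"lapiazzetta\.lecce\.it",
    "global:biz.ly": r"\bbiz\.ly\b",
    "w:ja": r".*t[^\/]?back.*\.(?:\w{2,4})(?:\/.*)?$",
}


def caught_by(url: str) -> list[str]:
    """Return the names of all rules whose regex matches the URL."""
    return [name for name, rx in RULES.items() if re.search(rx, url, re.IGNORECASE)]


for url in (
    "http://www.grandjersey.com",
    "http://www.jersey.com",
    "http://backpackers-refuge.biz.ly",
    "http://33southbackpackers.com",
):
    print(url, "->", caught_by(url))
```

This shows why grandjersey.com looks like collateral damage: the alternative `and` in the jerseys rule matches the tail of "grand", so "andjersey.com" satisfies the pattern even though the site has nothing to do with jersey shops, whereas jersey.com on its own is not matched by the rule as quoted.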
Dirk, thanks a lot for your analysis. I've posted the question here because I prefer to clarify the situation of each site before adding them to the whitelist. In the past, the official site of Jersey (jersey.com) was wrongly impacted by a too wide rule, and maybe it's the same case for grandjersey.com (although it's a hotel). Once it has been decided what will be removed from the global BL and what will remain, I'll update the local WL accordingly. --Andyrom75 (talk) 10:23, 25 June 2013 (UTC)
Any news? --Andyrom75 (talk) 19:19, 3 July 2013 (UTC)
I have modified the ...jersey.com regex as that is collateral damage, otherwise why would there be any news? They all seem to be specifically added, and you can see that by following the links. Beetstra suggested that you whitelist them locally, and it seems the route to take. — billinghurst sDrewth 10:56, 6 July 2013 (UTC)
BIZ.LY is a redirect and has zero hope of being removed globally, and I would strongly encourage you to not whitelist that domain, as you will open yourself up to all types of spam.
  •   Closed no further action required as there has been no follow-up. — billinghurst sDrewth 10:06, 30 July 2013 (UTC)

Curious edits

Why or what is causing these edits to the logs:

?? --Dirk Beetstra T C (en: U, T) 09:23, 12 August 2013 (UTC)

Perhaps cascading protection via http://meta.wikimedia.org/wiki/Spam_blacklist/Log is in order? (Just a WAG!) --Elvey (talk) 20:21, 12 August 2013 (UTC)
But they do not remove data, it almost looks like a bot removing extra spaces and newlines. I at first thought it was a spammer trying, wrongly, to remove their links from the blacklist ..
Protection is not a bad idea, there is hardly any need for non-admins to edit these pages. --Dirk Beetstra T C (en: U, T) 04:55, 13 August 2013 (UTC)
  Done -- billinghurst sDrewth 11:55, 15 August 2013 (UTC)

Wikivoyage whitelist?

Please see Wikivoyage/Lounge#Global_spam_blacklist. PiRSquared17 (talk) 15:13, 26 September 2013 (UTC)

Only matches host?

@Billinghurst: If this blacklist really only matches the host (i.e., domain or IP address) of an URL, then please undo this edit. PiRSquared17 (talk) 00:57, 1 October 2013 (UTC)

Never mind, seems to work. The documentation is either misleading or wrong. PiRSquared17 (talk) 01:00, 1 October 2013 (UTC)
Otherwise I wouldn't have done it ;)
--Vituzzu (talk) 12:44, 15 October 2013 (UTC)

Whitelist question

Is there any way to simply block everything ending in .cn then whitelist as necessary? I've had problems on my own site with hundreds or even thousands of .cn websites being spammed there. I've found that very rarely do .cn websites being posted to an English language site have anything useful, and with the possible exception of "reputable" sites like Baidu or Google or a small number of others such as newspapers, most .cn sites tend to be nothing but spam sites.

Mostly I'm just curious whether there's a way to do this, as I might want to use it on my own site; I'm not proposing such a drastic ban for the Chinese Wikipedia. Paul Robinson (Rfc1394) (talk) 06:12, 26 September 2013 (UTC)

This isn't the place for the technical question; it would be more appropriate at MediaWiki in one of their forums. I wouldn't think that you would be using the global blacklist, which does not have whitelist functionality. If it is just one wiki, you can utilise MediaWiki:Spam-blacklist and MediaWiki:Spam-whitelist, which would have more capacity for such. — billinghurst sDrewth 04:07, 5 October 2013 (UTC)
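
As a rough illustration of the "block broadly, then whitelist" idea on a single wiki, here is a Python sketch. It assumes a simplified model in which a link is allowed if any whitelist entry matches and is otherwise blocked if any blacklist entry matches; the regexes, hostnames and the `is_blocked` helper are all hypothetical and are not taken from any real blacklist or whitelist page.

```python
import re

# Hypothetical local blacklist entry: ".cn" followed by a URL delimiter or the end.
BLACKLIST = [r"\.cn(?:[/:?#]|$)"]

# Hypothetical local whitelist entry for one site that should stay linkable.
WHITELIST = [r"\bnews\.example\.cn\b"]


def is_blocked(url: str) -> bool:
    """Simplified model: the whitelist wins, otherwise any blacklist match blocks."""
    if any(re.search(rx, url, re.IGNORECASE) for rx in WHITELIST):
        return False
    return any(re.search(rx, url, re.IGNORECASE) for rx in BLACKLIST)


for url in (
    "http://spam.example.cn/page",
    "http://news.example.cn/story",
    "http://example.com/about",
):
    print(url, "->", "blocked" if is_blocked(url) else "allowed")
```

A rule this broad would need careful testing before use, since a pattern keyed on ".cn" can also match text later in a URL, not just the hostname.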