User talk:Beetstra/Archives 2020

Archive This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

I found a hole in the blacklist

Normally 4Chan's /b/ imageboard would be blocked by the blacklist (\bbags4chanel\.com\b and www\.4chan\.org/b/) but you can bypass it by using boards.4channel.org/b/ instead, which will redirect you to the imageboard in question. I suppose it's up to you whether or not you want to update the blacklist. --Trade (talk) 23:00, 12 February 2020 (UTC)

@Trade: basically a redirect to a blacklisted site: no-brainer, blacklist. —Dirk Beetstra T C (en: U, T) 03:59, 13 February 2020 (UTC)

Unmatched in regex

In an XWiki report today I am seeing special:permalink/19893755

Broken regex \bsci-hub\.[a-z]+(?!\/ (perl-corrected \bsci-hub\.[a-z]+(?!\/) on meta's global blacklist, error: Unmatched ( in regex; marked by <-- HERE in m/\bsci-hub\.[a-z]+( <-- HERE ?!\// at LinkSaver.pl line 5425.

which is saying that the regex is broken, which would match your report that it is not acting. I have removed the escape character from that regex in the spam blacklist, guessing that it is the issue, as such characters are naturally escaped by the coding and it is just confusing things. I am regenerating that report to see if the error disappears. <shrug>  — billinghurst sDrewth 14:13, 14 March 2020 (UTC)

<facepalm> It is the crosshatch being seen as a comment. Can we escape that so it is not seen as a comment? Or can we unicode the character? Not my area of strength.  — billinghurst sDrewth 00:08, 15 March 2020 (UTC)
@Billinghurst: And that is also why my perl script 'fails' (the # and anything after it is cut off, breaking the regex - heh, that is why I have that coded, so we notice, as the spam-blacklist will ignore broken regexes without comment). We'll have to monitor your solution; it allows for the top domain only, which can be abused (though sci-hub is not blacklisted for spamming per se, so maybe of lower risk. Still, pushers who demand that we use sci-hub 'because it is freely accessible' will be there). --Dirk Beetstra T C (en: U, T) 04:51, 15 March 2020 (UTC)
Yep, it is still slightly problematic, and only done as we cannot easily do anything else with no ability to escape the hash. We can watch the one and see how it goes. I only did as it is their "about" page. <shrug>  — billinghurst sDrewth 10:34, 15 March 2020 (UTC)
@Billinghurst: If we whitelist the top domain for sci-hub, one edit on en.wikipedia and I 1) piss off hundreds or thousands of editors in en.wikipedia, 2) link to 10s or 100s of thousands of copyright violations, 3) I earn a very hard to remove indef block .. but a POV warrior would not care about any of those. Grmpf, we NEED that global whitelist .. now. --Dirk Beetstra T C (en: U, T) 07:01, 16 March 2020 (UTC)
I am not particularly advocating that the line is the right way to go, I am more stating that this is the only way that we can get to the #about uri component accurately for the root page without tricks. I have no skin in the game and will go with whatever is considered best collectively.  — billinghurst sDrewth 10:52, 16 March 2020 (UTC)
@Billinghurst: The point is, this is the first one we try, and exclusion does not work. It was a good idea, but the blacklist-extension does not allow for exclusion, and we do not have a global whitelist ... --Dirk Beetstra T C (en: U, T) 11:57, 16 March 2020 (UTC)
You added it, if it fails, I am happy for it to be removed. As I said, I don't have skin in the game beyond trying to help on this one.  — billinghurst sDrewth 12:00, 16 March 2020 (UTC)
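The failure mode discussed above can be demonstrated in a few lines. The blacklist treats "#" as the start of a comment, so everything from the "#" onward is stripped before the line is compiled as a regex. A minimal Python sketch follows; the exact original blacklist entry is not quoted in the thread, so the entry below is a reconstruction for illustration:

```python
import re

# Hypothetical reconstruction of a blacklist entry that tries to match
# the "#..." fragment of a URL. Comment-stripping at the "#" removes the
# closing ")", leaving an unbalanced group, as in the LinkSaver.pl error.
entry = r"\bsci-hub\.[a-z]+(?!\/#)"

stripped = entry.split("#", 1)[0]  # what remains after comment stripping

re.compile(entry)                  # the full entry compiles fine
try:
    re.compile(stripped)           # the stripped entry does not
except re.error as e:
    print("broken regex:", e)
```

This is why the thread concludes that the hash cannot be used safely in an entry: there is no way to escape it past the comment-stripping step, which runs before regex escaping is considered.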

Poke request

Hi, could you allow me to poke COIBot? I occasionally investigate COI and spam and I've found the bot quite useful. Thanks. Nardog (talk) 13:56, 20 May 2020 (UTC)

@Nardog:   Done. It will take some time before it takes effect; the core bot does not re-read the settings very often. --Dirk Beetstra T C (en: U, T) 14:04, 20 May 2020 (UTC)
Thanks! Nardog (talk) 14:14, 20 May 2020 (UTC)

Issue with linksummary template

Hi. At User:COIBot/XWiki/ceoblognation.com there seems to be a problem with {{LinkSummary|*.ceoblognation.com}}. Just wanted to let you know if you have some time to look into it. Thanks, --DannyS712 (talk) 03:51, 3 July 2020 (UTC)

@DannyS712: I’ll have a look, there are some things like that. —Dirk Beetstra T C (en: U, T) 05:29, 3 July 2020 (UTC)

Report won't generate

report xwiki fucklooks.blogspot.com on irc repeatedly results in the bot saying that it saved a report, but no report was actually saved (wanted to see if there were any uses other than w:User:Fu*k looks before blacklisting). Any ideas? --DannyS712 (talk) 16:07, 23 July 2020 (UTC)

@DannyS712: Probably caused by .*fuck.* being in the title blacklist?, but it's <newaccountonly>. Given that the Special:Log/titleblacklist is disabled for Wikimedia wikis I can't fully confirm. Special:Log/spamblacklist/COIBot & AbuseFilter seems not the cause either. —MarcoAurelio (talk) 16:18, 23 July 2020 (UTC)
It's probably because .*\bfuck.* is only on the local title blacklist (went to create the page in case the bot could edit but not create, and saw the warning). Added to the local title whitelist at MediaWiki:Titlewhitelist and the warning went away. --DannyS712 (talk) 16:27, 23 July 2020 (UTC)
@DannyS712 and MarcoAurelio: I don't see any reason in the log of COIBot either. It now suddenly created ... --Dirk Beetstra T C (en: U, T) 16:34, 23 July 2020 (UTC)
I added it to the whitelist, so now it could go through :) DannyS712 (talk) 16:34, 23 July 2020 (UTC)
@DannyS712: but there are no hits in the title blacklist log of COIBot? --Dirk Beetstra T C (en: U, T) 16:36, 23 July 2020 (UTC)
Apparently the title blacklist log is disabled on wikimedia wikis (per @MarcoAurelio above) DannyS712 (talk) 16:36, 23 July 2020 (UTC)
See task T68450 and task T155967. Enabling the TBL would, in some cases, potentially infringe upon Wikimedia's Privacy Policy. —MarcoAurelio (talk) 16:40, 23 July 2020 (UTC)

Bot is down

COIBot appears to be down - last edit was 7 hours ago, claims to have saved xwiki reports but no edits were made --DannyS712 (talk) 19:05, 22 June 2020 (UTC)

@DannyS712: that’s the second time. I’ll check in a bit. —Dirk Beetstra T C (en: U, T) 03:44, 23 June 2020 (UTC)
COIBot seems to be running, responding and logs are recent, though queues are zero. I am not seeing reports written, so wonder whether it is logged out. I am trying to remember how to check. I can just restart everything if you think that would help.  — billinghurst sDrewth 04:14, 23 June 2020 (UTC)
@DannyS712 and Billinghurst: I have just logged the bot back in. That is a strange thing, I did the same yesterday. There is NO reason for the bot to log out, except if one logs in on another machine (which makes my version of PerlWikipedia apparently lose the cookie on the machine). I'll need to check. --Dirk Beetstra T C (en: U, T) 04:56, 23 June 2020 (UTC)
@Beetstra it happened again - bot responds on irc, but last edit onwiki was 02:46, 26 June (it's now 04:56, 27 June) DannyS712 (talk) 04:55, 27 June 2020 (UTC)
@DannyS712: and logged in again. Strange, this does not normally happen. --Dirk Beetstra T C (en: U, T) 06:33, 27 June 2020 (UTC)
@Beetstra is there a better way to be letting you know / rebooting it? Since the bot is still active on irc, maybe add a command (probably limiting access to a few users to be on the safe side) to tell it to try and log in again? DannyS712 (talk) 06:37, 27 June 2020 (UTC)
It is a console command, so not really. Just ping me on-wiki. Note, before this issue I did not log in for something like 6 months, and XLinkBot has no issue whatsoever over a longer time. Curious. —Dirk Beetstra T C (en: U, T) 07:17, 27 June 2020 (UTC)
(ec) Beyond guessing that it is not reporting, I am able to know that it is logged out. I can reboot it, though never certain what we may lose, so hesitate on that regard. As I see it, if it is still analysing, and not able to write a report, then it is less of a concern. We never catch all the rubbish, and the worse rubbishers will be back.  — billinghurst sDrewth 07:18, 27 June 2020 (UTC)
Dirk, how do you push the login only process? I have never worked that bit out.  — billinghurst sDrewth 07:19, 27 June 2020 (UTC)
@SDrewth: running coibot.pl with the password, which is <redacted>  :-). Rebooting does not help if COIBot’s login cookies got eaten. —Dirk Beetstra T C (en: U, T) 07:43, 27 June 2020 (UTC)
Appears to be down again, bot responds on irc but no edits on-wiki for a while --DannyS712 (talk) 22:23, 9 July 2020 (UTC)
@DannyS712: It appears we had a major logout-event, I got logged out last night as well. I have refreshed the login. --Dirk Beetstra T C (en: U, T) 06:46, 10 July 2020 (UTC)
Yeah, everyone's sessions were invalidated. Thanks DannyS712 (talk) 07:27, 10 July 2020 (UTC)
Bot hasn't edited in 40 minutes, despite multiple xwiki reports being queued via irc DannyS712 (talk) 04:35, 11 July 2020 (UTC)
Bot is logged in.  — billinghurst sDrewth 05:07, 11 July 2020 (UTC)
Think the bot was logged out again - no edits for over an hour, said it generated a report via IRC (checked that it wasn't the title blacklist this time) --DannyS712 (talk) 05:31, 26 July 2020 (UTC)
@DannyS712: It was, logged back in. (in which channel are you lurking?). --Dirk Beetstra T C (en: U, T) 06:18, 26 July 2020 (UTC)
wikimedia-external-links DannyS712 (talk) 06:18, 26 July 2020 (UTC)

Edit conflicts

COIBot should avoid them, rather than overwriting - latest one occurred at Special:Diff/20258233. Thanks, --DannyS712 (talk) 05:21, 8 July 2020 (UTC)

@DannyS712: I agree, but this is extremely rare I think. I will see if I can do a check as last thing before it saves. --Dirk Beetstra T C (en: U, T) 05:47, 8 July 2020 (UTC)
Happened again - Special:Diff/20316862 DannyS712 (talk) 08:33, 28 July 2020 (UTC)
And Special:Diff/20329247 DannyS712 (talk) 08:36, 1 August 2020 (UTC)

Bot thinks it's June

At least in #wikimedia-external-links, where it is alerting about edits from a month ago --DannyS712 (talk) 15:10, 27 July 2020 (UTC)

Backlog? Strange. —Dirk Beetstra T C (en: U, T) 16:58, 27 July 2020 (UTC)

Another bot bug

At User:COIBot/XWiki/talesbymales.com the links to commons should be to files, not to mainspace. Any idea why? --DannyS712 (talk) 11:53, 14 August 2020 (UTC)

@DannyS712: not the bot, it's the template. Parameter namespace is set to File. I’ll have a look one of these days. —Dirk Beetstra T C (en: U, T) 13:19, 14 August 2020 (UTC)
I think I fixed it, with Special:Diff/20364878, but please revert if it messed something else up DannyS712 (talk) 22:06, 14 August 2020 (UTC)
It was commented out by @Billinghurst in Special:Diff/20375939 DannyS712 (talk) 22:21, 18 August 2020 (UTC)
Everything with a namespace was being killed. Don't know where the problem lies, and it was too hard to deal with late at night and I needed fixes. Apologies for any inconvenience.  — billinghurst sDrewth 22:31, 18 August 2020 (UTC)
user:DannyS712 & user:billinghurst - it needs more than that. If {{NAMESPACE|<pagename>}} returns something, then you need to use <pagename>; otherwise <namespace>:<pagename>. It is reasonably solved on en.wikipedia (though there is still a template bug there where a newline appears without reason in some cases).
The underlying bug in COIBot is that on some pages (probably old ones) the pagename also contains the namespace, in addition to the namespace being stored in <namespace>, while on others it doesn't. I will try to add some lines to the LinkSaver to clean it up at some point (it is point 11 in User:COIBot/Wishlist, a rather simple hack though), but my work has pulled me back in for a high-priority project, so I will be busy full time for the coming weeks. --Dirk Beetstra T C (en: U, T) 06:03, 19 August 2020 (UTC)
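The normalization described above (use the pagename as-is if it already carries a namespace prefix, otherwise prepend the namespace parameter) can be sketched roughly. The helper below is hypothetical: the real fix lives in the wikitext of the report templates, and the colon check is a simplification (a real check would compare the prefix against the wiki's known namespaces):

```python
def full_title(pagename: str, namespace: str = "") -> str:
    # Hypothetical helper mirroring the template logic described above.
    # Simplification: any colon is treated as a namespace prefix; titles
    # with literal colons would need a check against known namespaces.
    if ":" in pagename:
        return pagename                   # "Draft:Foo" stays "Draft:Foo"
    if namespace:
        return f"{namespace}:{pagename}"  # ("Foo", "Draft") -> "Draft:Foo"
    return pagename
```

Under this logic, a record that stores "Draft:Foo" in pagename and "Draft" in namespace no longer produces "Draft:Draft:Foo".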

COIBot doing Draft:Draft: in reports

Hi. I have noticed that the COIBot reports have recently started doubling up on the word Draft: in reports, e.g. User:COIBot/XWiki/filmifeed.com. It is not a user issue, they are definitely not created with that naming. Nothing urgent.  — billinghurst sDrewth 01:11, 17 August 2020 (UTC)

@Billinghurst: that is probably due to the fix in the thread above. It is in the template in the reports. It looks however that there is an underlying bot-bug. --Dirk Beetstra T C (en: U, T) 04:49, 17 August 2020 (UTC)
It is only a recent phenomenon, and doesn't seem to be related to any template change in User:COIBot/EditSummary, so it seems more related to whatever is being handed to the template. <shrug> Let me see if I can faff around with the template a little more.  — billinghurst sDrewth 05:18, 17 August 2020 (UTC)
Bah humbug, I was going to do {{PAGENAME:{{{pagename|}}}}} but that doesn't work as that requires the namespace to exist to kill/exclude it.  — billinghurst sDrewth 05:36, 17 August 2020 (UTC)
@Billinghurst: I think the problem is that some pagename= are filled with namespace:pagename, others are just filled with pagename, and then namespace is sometimes filled and sometimes it isn't. On en.wikipedia it is solved with some nasty nesting functions: if the parameter pagename carries a namespace, ignore the parameter namespace, otherwise add namespace. (and I should just fix the bug). --Dirk Beetstra T C (en: U, T) 06:01, 17 August 2020 (UTC)

wmnyc spam links not detected

Since deleted, but wmnyc:Brief Guide Ꭲo Purchasing A Water Softener - Purchasing was a spam page created by DenishaGibbs97 with three links: [https://www.franzia.com fantastic deal], [https://www.websitebuilderexpert.com quality], and [https://amolife.com/reviews/what-you-need-to-know-about-under-sink-water-filters.html relevant site]. The xwiki reports at User:COIBot/XWiki/franzia.com, User:COIBot/XWiki/websitebuilderexpert.com, and User:COIBot/XWiki/amolife.com didn't include these additions. Is the wiki not monitored? Thanks, --DannyS712 (talk) 00:04, 26 August 2020 (UTC)

@DannyS712: no, nyc.wikimedia.org is not parsed by user:LiWa3. If those wikis are really prone to spam and the additions are informative then I have to add them to the parameter special in User:LiWa3/Settings (I presume they have a feed on IRC?). --Dirk Beetstra T C (en: U, T) 05:24, 26 August 2020 (UTC)
Not sure. I know outreach has an irc feed (outreach.wikipedia) and doesn't appear to be detected either (latest example at User:COIBot/XWiki/apricotlakecream.net) DannyS712 (talk) 20:19, 26 August 2020 (UTC)
Please do add outreachwiki to the settings (latest was User:COIBot/XWiki/enhancedvigorxr.org) DannyS712 (talk) 17:00, 28 August 2020 (UTC)
It has an irc feed at nyc.wikimedia DannyS712 (talk) 20:24, 26 August 2020 (UTC)
@DannyS712: added, but I will need console access to get it to work without violence. I will try to do that today or tomorrow. --Dirk Beetstra T C (en: U, T) 07:47, 29 August 2020 (UTC)
Also, at User:COIBot/XWiki/gefest-sv.kiev.ua an addition on svwikisource was missed. Is that wiki not tracked? Also User:COIBot/XWiki/mfasantafe.org missed dawiki --DannyS712 (talk) 01:55, 10 September 2020 (UTC)
@DannyS712: Both channels are monitored. In the da.wikipedia case the page was deleted 3 minutes after it was created, possibly the bot was just too late to parse it. For sv.wikisource that should not be the case, but I can't see what was added there. Note that the bot is sometimes overloaded by flooders in WikiData, and it will then offload its backlog for later parsing for wikis and namespaces where there is no hurry (basically, non-mainspace links go into a backlog file, and possibly all wikis that do not have XLinkBot enabled (which currently is only en.wikipedia)). Maybe that also resulted in a large lag-time on userpages at that time.
Another reason may be that if the specific 'DiffReader' on the Wikimedia channels are getting overloaded they sometimes crash and restart, which means that for a short time (several minutes) no DiffReader is in the channel. If the edit falls in that period, the bot will also miss it. I will see if I can do some further optimizations in the code at some point, I am thinking to put the high-volume wikis on their own DiffReader, and put wikidata in a low-priority queue by default. --Dirk Beetstra T C (en: U, T) 05:42, 13 September 2020 (UTC)
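The offloading described above could be sketched roughly as a two-queue scheme. All names below are hypothetical and the priority rule is paraphrased from the thread (non-mainspace links and wikis without XLinkBot, currently only en.wikipedia having it, are the low-priority sources):

```python
from collections import deque

# Rough sketch of the backlog offloading described above; not the actual
# LiWa3 implementation. When the live queue is overloaded, low-priority
# edits are parked in a backlog instead of being parsed immediately.
LIVE_LIMIT = 1000
live: deque = deque()
backlog: deque = deque()

def enqueue(edit: dict) -> None:
    low_priority = edit["namespace"] != 0 or edit["wiki"] != "en.wikipedia"
    if low_priority and len(live) >= LIVE_LIMIT:
        backlog.append(edit)   # written to the backlog file, parsed later
    else:
        live.append(edit)
```

An edit that falls into the backlog during a Wikidata flood would then only be parsed once the backlog file is replayed, which matches the lag observed on the sv.wikisource and userpage additions.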

Bot is logged out

The bot keeps posting in #wikimedia-external-links that it got logged out, but can't seem to log back in --DannyS712 (talk) 01:56, 6 October 2020 (UTC)

  logged in, should be done  — billinghurst sDrewth 03:36, 6 October 2020 (UTC)
@Billinghurst and DannyS712: ... that's why I did not see it logged out :-). --Dirk Beetstra T C (en: U, T) 05:42, 6 October 2020 (UTC)
Thanks both DannyS712 (talk) 05:43, 6 October 2020 (UTC)
It happened again DannyS712 (talk) 18:02, 6 October 2020 (UTC)

Backlog files

Is liwa3 just confused and labelling October backlog files as 09 /202009nn-..., or are these files coming from somewhere else? I had checked the backlog directory yesterday and it had been empty.  — billinghurst sDrewth 23:33, 17 October 2020 (UTC)

@Billinghurst: Bit confused - in Perl, January is month 0 and I am too lazy to convert.
Likely a flooder on Wikidata. They find a new property and it is then added in a massive flood. I would really prefer that they had a couple of dedicated flood-bots that people can program through an on-wiki settings page. Then I could more easily filter out that data; there is no use in LiWa3 reporting thousands and thousands of the same additions. If only the users were whitelisted we would not have this issue (but by the time you notice the flood and whitelist the user or domain, the queue is already there). --Dirk Beetstra T C (en: U, T) 05:09, 18 October 2020 (UTC)
Okay, so 09 is October, rather than being 10. I thought that linkwatchers had possibly found a secret backlog of files from September. My confusion is addressed.  — billinghurst sDrewth 05:27, 18 October 2020 (UTC)
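The off-by-one in the filenames above comes from Perl's localtime(), which reports the month 0-based (January == 0). A small Python sketch of the effect (Python's tm_mon is 1-based, so 1 is subtracted to reproduce the Perl value; the real backlog filenames carry a further suffix not shown here):

```python
import time

# Perl's localtime() reports the month 0-based, so a filename stamp built
# directly from it is one behind the calendar: an October run produces a
# "202009nn" prefix. We mimic that here by subtracting 1 from tm_mon.
t = time.strptime("2020-10-17", "%Y-%m-%d")
perl_month = t.tm_mon - 1
stamp = f"{t.tm_year:04d}{perl_month:02d}{t.tm_mday:02d}"
print(stamp)  # -> 20200917, i.e. the "202009nn" prefix seen in October
```

So the files labelled 09 in October are current, not a leftover from September.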

Community Wishlist Survey 2021: Invitation

SGrabarczuk (WMF)

18:25, 20 November 2020 (UTC)

Porn sites at Wikivoyage

When you have a few minutes, would you please take a look at voy:en:Wikivoyage:Travellers' pub#Abuse filter for porn and decide whether these sites should be handled locally or globally? Thanks, WhatamIdoing (talk) 17:50, 10 December 2020 (UTC)

@WhatamIdoing: I reported all of them in Talk:Spam blacklist, let's get reports and assess abuse. This is the typical way porn sites are abused; there may be a case to just ban this stuff and whitelist what is needed. --Dirk Beetstra T C (en: U, T) 05:59, 13 December 2020 (UTC)
Thanks. I knew that whatever the situation was, you'd know how to deal with it. WhatamIdoing (talk) 00:26, 14 December 2020 (UTC)

quit permission

please could I have permission to get coibot to quit. "Only Dirk Beetstra and Versageek can tell me to quit". I would prefer to do it from IRC rather than have to kick harder from command line. Thanks.  — billinghurst sDrewth 11:23, 11 December 2020 (UTC)

@Billinghurst: I have adapted the code. As this is in the core bot (coibot.pl) this requires a restart of the whole bot before it takes effect. --Dirk Beetstra T C (en: U, T) 05:41, 13 December 2020 (UTC)
Thanks for your trust, and noted.  — billinghurst sDrewth 06:01, 13 December 2020 (UTC)

Community Wishlist Survey 2021

SGrabarczuk (WMF)

16:09, 11 December 2020 (UTC)

coibot not connecting to mysql

It had disappeared, and after the standard push I killed the processes and ran the .sh, which reported success. Running a plain nohup with the perl file, it says that it cannot connect to the mysql server. Haven't got the time to check that out, so just noting the issue for the moment.  — billinghurst sDrewth 02:11, 17 December 2020 (UTC)

Bah. Finish the msg, kick it again, and voila it works. <grr> COMPUTERS!!!  — billinghurst sDrewth 02:13, 17 December 2020 (UTC)
@Billinghurst: it is the database replication bug on phab that bstorms is taking care of. I thought I added you to that bug? —Dirk Beetstra T C (en: U, T) 03:47, 17 December 2020 (UTC)
Thanks. Yes, you added me to the bug, though association of that bug and this issue was probably a disconnect.  — billinghurst sDrewth 04:16, 17 December 2020 (UTC)
@Billinghurst: It also took me a long time, 'why is COIBot repeating to save the exact same report', 'why do poked reports not come through', until I manually tried to login to mysql, and manually tried to delete a record ('maybe it has a forbidden character in it?') and was disallowed to alter the db because it was locked.... then to figure out why it was locked was another challenge. --Dirk Beetstra T C (en: U, T) 05:42, 17 December 2020 (UTC)