User talk:とある白い猫/Archive/2015

とある白い猫
A Certain White Cat
Bilinen Bir Beyaz Kedi


Hello this is an Archive. Please do not edit. You are welcome to post comments regarding material here at my user talk page.

Always believe in yourserf and your dreams, you have a wing!
Archive 2015

January

Request

Please work at Sindhi Wikipedia. محمد مجیب (talk) 07:45, 12 January 2015 (UTC)

August

Synchbot request

Hello とある白い猫. I deleted user:Cool Cat and user:Cool Cat/* on all wikis as you requested via Synchbot; you can see the list of changes on your archive page. Note that ja:User:とある白い猫/Sidebar has a broken link to one of the deleted pages, but I can't fix it due to a local abuse filter that prevents me from editing others' pages. —Pathoschild 17:03, 02 August 2015 (UTC)

Comment about Estonian Wikipedia

This proposal hasn't gotten a lot of interest on et wiki. Compared to the potential benefits, it may require too much input to get it working reasonably well. The reason is both the number of daily changes, which is still relatively low (people check them manually, and bot edits would likely be checked just as well), and the fact that vandalism is still rather rare in Estonian Wikipedia. So it seems like a minor improvement, as people don't perceive the vandalism problem to be big. It is also rather challenging to do, as 1) we don't have a well-fixed set of rules and people tend to review articles in very different ways, and 2) if there is some vandalism, it is wildly different (it is easy to group it into different categories, but the details of the edits are so different that it can't be easy to teach a bot). The end of the discussion goes in the direction that people are overly critical and some users are never satisfied with the presented text. Kruusamägi (talk) 18:44, 11 August 2015 (UTC)

BTW, there are also questions on 1) whether that AI tool could recommend articles for review based on what a user likes to review; 2) whether that tool could help to find new (potentially) good articles. Kruusamägi (talk) 23:14, 11 August 2015 (UTC)
@Kruusamägi:, hi, so let me answer your questions. We intend to provide an AI framework to deal with all sorts of backlogs that wikis need assistance with. We merely started with vandalism detection and article quality assessment. :)
  1. We do not yet provide recommendations. In the future such a system could very well rely on the AI infrastructure we provide. This is something we are looking to implement in the not too distant future.
  2. The tool uses existing assessments - for example a random sample of other good articles - to learn what an Estonian good article looks like. We can provide scores with which a tool can identify "good articles" as well as "featured articles" or whatever other assessments Estonian wiki has. The system would merely recommend that the community look at the articles and wouldn't modify the wiki. So if an article has remained assessed as a stub for 5 years (because nobody bothered to reassess it) while the actual article was improved, our system would flag that the stub assessment is wrong and the article should be re-assessed. It would suggest whether it should be a good article, C-class or whatever, but that is merely a suggestion. So the tool would take community consensus as input and base predictions on it.
-- とある白い猫 chi? 21:14, 12 August 2015 (UTC)
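The re-assessment suggestion described in point 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the function name `suggest_reassessment`, the class labels and the `min_confidence` threshold are all hypothetical, and the probabilities stand in for whatever scores a trained model would return.

```python
def suggest_reassessment(current_class, predicted_probs, min_confidence=0.6):
    """Return a suggested class if the model confidently disagrees, else None.

    current_class   -- the assessment currently recorded on the wiki
    predicted_probs -- dict mapping class label -> predicted probability
    min_confidence  -- hypothetical threshold below which nothing is suggested
    """
    # Pick the class the model considers most likely.
    predicted = max(predicted_probs, key=predicted_probs.get)
    # Only suggest a change when the prediction differs and is confident enough.
    if predicted != current_class and predicted_probs[predicted] >= min_confidence:
        return predicted
    return None


# An article still labelled "stub" that the model scores as a likely good article.
suggestion = suggest_reassessment("stub", {"stub": 0.1, "c": 0.2, "good": 0.7})
print(suggestion)
```

The function only returns a suggestion; nothing is written back to the wiki, which matches the point that the community makes the final call.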

Message

Ping :) --Lucas (talk) 15:25, 17 August 2015 (UTC)

How can we improve Wikimedia grants to support you better?

Hi! The Wikimedia Foundation would like your input on how we can reimagine Wikimedia Foundation grants to better support people and ideas in your Wikimedia project.

After reading the Reimagining WMF grants idea, we ask you to complete this survey to help us improve the idea and learn more about your experience. When you complete the survey, you can enter to win one of five Wikimedia globe sweatshirts!

In addition to taking the survey, you are welcome to participate in these ways:

This survey is in English, but feedback on the discussion page is welcome in any language.

With thanks,

I JethroBT (WMF), Community Resources, Wikimedia Foundation.

(Opt-out Instructions) This message was sent by I JethroBT (WMF) (talk · contribs) through MediaWiki message delivery. 01:23, 19 August 2015 (UTC)

Your request on SRGP

Your request is still open, awaiting a reply from you. Otherwise, the request will likely be closed shortly. Savhñ 14:00, 29 August 2015 (UTC)

September

Last call for WMF grants feedback!

Hi, this is a reminder that the consultation about Reimagining WMF grants is closing on 8 September (0:00 UTC). We encourage you to complete the survey now, if you haven't yet done so, so that we can include your ideas.

With thanks,

I JethroBT (WMF), Community Resources, Wikimedia Foundation.

(Opt-out Instructions) This message was sent by I JethroBT (WMF) (talk · contribs) through MediaWiki message delivery. 19:08, 4 September 2015 (UTC)

Wiki labels in Estonian Wikipedia

Hello, I'm letting you know that I made changes to the word lists on the page Research:Revision scoring as a service/Word_lists/et. The lists are more or less complete now. If we want to add words at a later stage, is that possible? Cumbril (talk) 17:12, 8 September 2015 (UTC)

Hello Cumbril, we have not yet built the model for Estonian so no harm done. :) It is always possible to add/remove words, but getting the lists as accurate as possible in the earlier stages is preferred. -- とある白い猫 chi? 07:22, 13 September 2015 (UTC)


November

Wiki Labels for jawiki

とある白い猫さん、こんにちは!(Hello, とある白い猫!)

I read your message on jawiki. I'm very interested in the Wiki Labels project and I'd like to help introduce Wiki Labels into Japanese Wikipedia. (A while ago I made a similar proposal on 2015 Community Wishlist Survey#Suggesting AbuseFilter by machine learning.)

I have done the 3 tasks below today.

But the 2 below are not done.

  • Listing the trusted usergroups: Where should I report this? The groups below are considered trusted on jawiki:
    • abusefilter
    • bureaucrat
    • checkuser
    • eliminator
    • interface-editor
    • oversight
    • rollbacker
    • sysop
  • Reviewing the auto-generated bad-words list: How can I report the review result? It seems that the auto-generated list is completely useless because its entries are only 1 character (Kanji) each. It's hard to avoid such a problem because Japanese is said to be a difficult language for computers to split into words...

Could you help me to introduce Wiki Labels into jawiki? Sorry if this page is not suited for this message.

よろしくお願いします。(Thanks.)--aokomoriuta (talk) 19:43, 11 November 2015 (UTC)

Hello / こんにちは
So I processed the information and work you have done for us thus far. I am happy to report that, as a consequence, we are very close to starting the Wiki Labels campaign on Japanese Wikipedia. :)
You already posted badwords and informal words in the correct location. You are welcome to add more and even add regexes.
In English we delimit words by spaces, which isn't a good strategy for Japanese as far as I can tell. For Japanese our strategy is to treat each character as a word. If you have a different suggestion we will do our best to implement it. Indeed, with my very limited understanding of Japanese, I am aware it is more customary to have pairs or triples of Kanji. The generated list consists of Kanji that statistically appear in reverted edits but not in regular edits. For this we use a TF-IDF approach. Some English curse words are made up of two or more words: "God Damn", "Fuck You" and "Fuck Off" would be three examples. The words "God", "You" and "Off" would not normally be considered curse words, so our statistical approach would not treat them as such, whereas it would treat "Damn" and "Fuck" as curse words. Likewise, we are trying to identify the Kanji that appear commonly in Japanese curse words even if they are not exclusively used in curses.
There are also words that are reverted in articles but not on talk pages. In English this would include words like "hello" or "hahaha". Which Kanji would be informal like this?
The idea here is to let the machine learning algorithm decide what to do with these words. Our approach relies on more features than just these word lists.
-- とある白い猫 chi? 11:45, 14 November 2015 (UTC)
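To make the per-character TF-IDF idea above concrete, here is a minimal sketch. It is an assumption about how such scoring could look, not the code the project actually runs: the function name `score_characters`, the smoothed IDF formula and the toy Latin-letter data are all hypothetical. Each character is treated as a token; its term frequency is counted over reverted edits and its document frequency over regular edits, so a character scores high when it is common in reverted text but rare in regular text.

```python
import math
from collections import Counter


def score_characters(reverted_edits, regular_edits):
    """Score characters: frequent in reverted edits, rare in regular edits."""
    # Term frequency: how often each non-space character occurs in reverted text.
    tf = Counter(ch for text in reverted_edits for ch in text if not ch.isspace())
    total = sum(tf.values())
    # Document frequency: in how many regular edits each character occurs.
    df = Counter(ch for text in regular_edits for ch in set(text))
    n_docs = len(regular_edits)
    scores = {}
    for ch, count in tf.items():
        # Smoothed IDF (an assumed variant): zero when the character
        # occurs in every regular edit, larger the rarer it is there.
        idf = math.log((1 + n_docs) / (1 + df[ch]))
        scores[ch] = (count / total) * idf
    return scores


# Toy data: "x" appears only in reverted edits, "a" in every regular edit.
scores = score_characters(["xxa", "xb"], ["ab", "ac", "a"])
for ch, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(ch, round(value, 3))
```

The ranked output would then be reviewed by a native speaker, as with the generated lists discussed here.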
Hi, I also came from the thread on jawiki. Re word delimiter: Can character-based N-gram tokenization be used for CJK languages, at least as a starting point? There is an open source implementation in Java: NGramTokenizer of Lucene. This approach won't need a word-segmented corpus to learn from and it should be simple enough to re-implement in Python, if necessary.
A caveat is that the "generated list" would be less easy to read than one generated with more intelligent and complex approaches. Still, character N-grams should be more informative than the one-character tokens currently shown on Meta: many of the character 2-grams, 3-grams and 4-grams of Japanese coincide with words and morphemes, while few characters stand as words by themselves (and thus it can be hard for people to say whether a character is "bad" or not). I believe the same can be said of Korean and Chinese to some extent.
Perhaps this particular issue on tokenization should go to phab:T111179? whym (talk) 12:31, 14 November 2015 (UTC)
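As a rough illustration of the character N-gram tokenization suggested above, a minimal Python sketch could look like this. It is not a port of Lucene's NGramTokenizer; the function name `char_ngrams` and its `min_n`/`max_n` parameters are hypothetical, and no normalization or whitespace handling is attempted.

```python
def char_ngrams(text, min_n=2, max_n=4):
    """Return all character n-grams of `text` for n from min_n to max_n."""
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the string, one character at a time.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


print(char_ngrams("百科事典", min_n=2, max_n=3))
# → ['百科', '科事', '事典', '百科事', '科事典']
```

No word-segmented corpus is needed for this; the resulting tokens could be fed into the same statistical scoring used for the one-character lists.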

December

Word list for Urdu

Can you create this list for Urdu Wikipedia please? --Muhammad Shuaib (talk) 08:26, 2 December 2015 (UTC)

Absolutely! :)
Could you please translate a line for Urdu at Wiki labels/Interface translation and Wiki labels/Interface translation/Edit quality?
We need to know which usergroups are "trusted". The usergroups I see are abusefilter, bot, bureaucrat, confirmed, flow-bot, import, ipblock-exempt, rollbacker, sysop. We normally trust usergroups such as sysop and bot because they are unlikely to vandalize Wikipedia.
We also need a localization of en:Wikipedia:Labels on Urdu Wikipedia to serve as our landing page.
-- とある白い猫 chi? 06:31, 3 December 2015 (UTC)
We also need similar help with Arabic if you are up for it. :) -- とある白い猫 chi? 06:33, 3 December 2015 (UTC)
Translations are done and the landing page has been created here. And, as you stated, we also trust sysops, bots and rollbackers. :) --Muhammad Shuaib (talk) 16:20, 3 December 2015 (UTC)