Research talk:Classifying Actors on Talk Pages

Latest comment: 4 years ago by Bluerasberry in topic Possible queries

Present case

There are many such cases of misconduct, but right now the English Wikipedia community is organizing a call against an organization with a public reputation for behavior which Wikipedia calls "long-term abuse".

Lists of many such accounts engaged in misconduct are available. Misconduct for pay is one cluster of misconduct with its own characteristics.

Blue Rasberry (talk) 15:52, 29 January 2020 (UTC)

Notes from a chat

I am undecided about where on Meta to put notes like this. This discussion came from this project, but these notes are relevant more broadly. I would like to move them to a notes subpage of some ongoing long-term project on Meta. There is currently no central place on Meta for coordinating the various projects about classifying actors, edits, and other factors related to conduct. Blue Rasberry (talk) 16:26, 15 February 2020 (UTC)

Possible queries

Suppose that all English Wikipedia talk pages were in a database and that we could run queries against them. One type of query would use only the structured data which is native to the dataset, such as the user, page, and timestamp of each edit. Another type of query would first require creating new data by analyzing the text of each talk page.
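
As a rough illustration of the second type of query, here is a minimal Python sketch that derives new data from the raw wikitext of one talk page. Everything in it is an assumption made for illustration: the function names are hypothetical, the regular expressions are simplifications of real wikitext, and the rule "posts start at the first level-2 heading" is only the guess stated in the list below, not a tested method.

```python
import re

# Assumption: user posts begin at the first level-2 heading ("== Title ==").
HEADING = re.compile(r"^==[^=].*?==\s*$", re.MULTILINE)
# Target of a wikilink, stopping at "|" (label), "#" (section) or "]".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Bare or bracketed external URLs.
EXTERNAL = re.compile(r"https?://[^\s\]|<}]+")
# A citation is counted as an opening <ref> tag or a {{cite ...}} template.
CITATION = re.compile(r"<ref[\s>/]|\{\{\s*cite\b", re.IGNORECASE)


def split_header_and_posts(wikitext):
    """Split a talk page into (template header, posts) at the first heading.

    Assumption: a page with no level-2 heading is treated as all header.
    """
    match = HEADING.search(wikitext)
    if match is None:
        return wikitext, ""
    return wikitext[:match.start()], wikitext[match.start():]


def derive_features(wikitext):
    """Count links and citations in the post portion of one talk page."""
    header, posts = split_header_and_posts(wikitext)
    targets = [t.strip() for t in WIKILINK.findall(posts)]
    policy = [t for t in targets if t.startswith(("WP:", "Wikipedia:"))]
    return {
        "header_chars": len(header),
        "post_chars": len(posts),
        "wikilinks": len(targets),
        "policy_links": len(policy),
        "external_links": len(EXTERNAL.findall(posts)),
        "citations": len(CITATION.findall(posts)),
    }
```

Running `derive_features` over every talk page would produce one row of derived data per page, which could then be averaged or joined against the structured data. A real implementation would probably want a proper wikitext parser instead of regular expressions.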

  1. Talk page overview
    1. number of posts per talk page
    2. number of unique posters
    3. all of the above, but divided between bots and humans
    4. content
      1. what is the average size of the content on a talk page?
      2. how much of that is human-readable prose?
      3. average number of wikilinks on a talk page
      4. average number of external links
      5. average number of citations, meaning a "ref" tag or a {{cite (whatever)}} template
      6. how does one separate the template portion of the talk page from the posts? (typically the WikiProject templates are at the top and the user posts below; posts probably start at the first heading marked with double "=")
  2. Who posts on talk pages
    1. by activity
      1. Sort all users who post on talk pages by any definition of Research:Active editor which divides them among new/not previously active, more active, highly active, or whatever
      2. Now match talk page posts to users by their category for how active they are
      3. What is the active editor demographic of users who post to talk pages?
    2. by special account status
      1. What percentage of talk pages have any post from a user with some userright / account status?
      2. basic - confirmed / autoconfirmed
      3. semi-advanced, like extended confirmed, rollback, page patroller
      4. advanced, like admin, bureaucrat, or steward
      5. blocked now
      6. blocked ever
    3. by non-userright characteristic
      1. ever had Wikimedia Foundation affiliation
        1. requires associating WMF accounts to user accounts, as WMF people often have two
      2. identified as paid editors
      3. bot
      4. other list?
  3. coming to life
    1. define some amount of time as demonstrating that a talk page is inactive
      1. looking for largest number of consecutive days without a post from a human which applies to the largest number of talk pages
      2. for example, "80% of talk pages have not had a human post to them within the last 9 months"
    2. when a user posts on a talk page, and that talk page has had no activity in the past (7-90?) days, in what percentage of instances does a response post follow within 7 days?
    3. among those inactive talk pages, when someone does post, in what percentage of instances does each of these outcomes occur?
      1. no response within some amount of time (7-90 days?, or perhaps the term of defining inactivity)
      2. a response within 7 days
        1. when there is a response on an inactive talk page, what is the average number of posts which this creates
        2. for the top 10% of posts on inactive talk pages which receive a response, how many posts do they receive?
  4. WikiProject / category related queries
    1. Sort each talk page as being part of a WikiProject based on the presence of a WikiProject template on that talk page
    2. average number of WikiProjects per article
    3. relationship between WikiProjects and activity
      1. rank WikiProjects by average number of talk page posts for talk pages in a given WikiProject
      2. rank WikiProjects by count of users who ever posted to talk pages in that WikiProject, and who are
  5. relationship between article (mainspace) edits and talk page edits
    1. ratio of article edits to talk page edits for various classes of users
    2. frequency of talk page edits for various classes of users
  6. repeat engagement
    1. at a single talk page
      1. average amount of time passing between talk page edits
      2. when someone who has not recently posted to a talk page does post, then how likely are they to post again very soon?
    2. at multiple talk pages
      1. when someone posts at one talk page, how often do they post to another talk page soon after?
  7. mass messaging
    1. when a user posts to one talk page
      1. how likely are they to very soon post to another talk page
      2. how likely are they to make an identical post at another talk page
    2. what is the count of users who mass message, defined as reposting a message to 5+ pages? 50+?
    3. what is the average character count of a mass message?
  8. citing policy
    1. disregard the top template portion of talk pages and consider the post content
    2. among all the posts, with what frequency does anyone wikilink to any WP: or Wikipedia: page?
    3. among all posts with at least one link to a WP: page, what characteristics are there
      1. average number of links
      2. association of WP: links with WikiProject, then ranked
      3. association of WP: links with users by activity
      4. association of WP: links with presence of users by account status, such as admins or blocked users
    4. characteristics of WP: linked pages
      1. rank them by popularity
      2. categorize them, then percentage of categories
        1. WikiProjects
        2. articles for deletion discussions
        3. noticeboards
        4. anything labeled as a fundamental Wikipedia policy
  9. user network behavior
    1. two users doing things in the same space
      1. define collaboration somehow, such as two people each posting on the same two talk pages at least once
      2. in cases where users "collaborate"
        1. on how many talk pages do they both appear?
        2. what is the length of time from the first time they posted on the same talk page to the last time, with no more than 90 days passing between two meetings?
    2. large group collaboration
      1. define a large group as a network of people 5+ who each meet that "collaboration" definition with two others in the network, or whatever is meaningful
      2. how many large collaboration groups exist
      3. in what WikiProjects' articles do they exist
      4. how long do they stay active
      5. what is the user turnover in a stable network of 5+ people
        1. how often do new users join or users drop out
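
To make a few of the queries above concrete, here is a minimal Python sketch over an in-memory list of posts. It is only an illustration under stated assumptions: the `Post` record and all function names are hypothetical, the 90-day and 7-day thresholds are picked from the ranges suggested in the list, a "response" is assumed to mean a post by a different human user, and "collaboration" uses the example definition of two people each posting on the same two talk pages.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations
from typing import NamedTuple


class Post(NamedTuple):
    """One talk page post (hypothetical record layout)."""
    page: str
    user: str
    when: datetime
    is_bot: bool = False


def page_overview(posts):
    """Return {page: (human post count, unique human posters)}."""
    counts, users = defaultdict(int), defaultdict(set)
    for p in posts:
        if not p.is_bot:
            counts[p.page] += 1
            users[p.page].add(p.user)
    return {page: (counts[page], len(users[page])) for page in counts}


def revival_response_rate(posts, inactive_days=90, response_days=7):
    """Share of posts on inactive pages that get a reply in time.

    A post counts as a revival when the previous human post on the same
    page is at least `inactive_days` older. The first post ever made on
    a page is not counted. Returns None when there are no revivals.
    """
    by_page = defaultdict(list)
    for p in posts:
        if not p.is_bot:
            by_page[p.page].append(p)
    revivals = answered = 0
    for page_posts in by_page.values():
        page_posts.sort(key=lambda p: p.when)
        for i in range(1, len(page_posts)):
            prev, cur = page_posts[i - 1], page_posts[i]
            if cur.when - prev.when < timedelta(days=inactive_days):
                continue
            revivals += 1
            deadline = cur.when + timedelta(days=response_days)
            later = page_posts[i + 1:]
            if any(q.user != cur.user and q.when <= deadline for q in later):
                answered += 1
    return answered / revivals if revivals else None


def collaborating_pairs(posts, min_shared_pages=2):
    """Return {(user_a, user_b): shared page count} for human user pairs
    who have both posted on at least `min_shared_pages` talk pages."""
    pages_by_user = defaultdict(set)
    for p in posts:
        if not p.is_bot:
            pages_by_user[p.user].add(p.page)
    pairs = {}
    for a, b in combinations(sorted(pages_by_user), 2):
        shared = pages_by_user[a] & pages_by_user[b]
        if len(shared) >= min_shared_pages:
            pairs[(a, b)] = len(shared)
    return pairs
```

Comparing every pair of users will not scale to all of English Wikipedia, so at that size the pair counting would more likely be done per page inside the database. The sketch is only meant to pin down what each definition would count.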

Blue Rasberry (talk) 17:24, 29 March 2020 (UTC)
