WSoR datasets/policy counts
This dataset contains edit counts of users who contributed to a selected set of relevant policy pages.
Location
editdb42:giovanni.policy_counts
Fields
editgiovanni@internproxy:~$ mysql -h db42 -e "EXPLAIN policy_counts; SELECT * FROM policy_counts LIMIT 3" giovanni +-----------+-----------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+-----------------+------+-----+---------+-------+ | title | varbinary(255) | NO | | | | | namespace | int(11) | NO | | 0 | | | user_id | int(5) unsigned | NO | MUL | 0 | | | user_name | varbinary(255) | NO | | | | | year | int(4) | YES | | NULL | | | editcount | bigint(21) | NO | | 0 | | +-----------+-----------------+------+-----+---------+-------+ +---------------------------+-----------+---------+-----------+------+-----------+ | title | namespace | user_id | user_name | year | editcount | +---------------------------+-----------+---------+-----------+------+-----------+ | Arbitration/Requests | 4 | 126234 | !! | 2007 | 2 | | Arbitration/Requests | 5 | 126234 | !! | 2007 | 1 | | Reference_desk/Humanities | 4 | 126234 | !! | 2007 | 77 | +---------------------------+-----------+---------+-----------+------+-----------+
Each row represents the edit count (field: editcount
) of a user (fields: user_id, user_name
) in a specific year (field... year
!) for a specific page (fields title, namespace
). The table is populated only with edit counts to pages from the following list of important policy pages (see here for more information; if you don't have access, you should ask Steven Walling to share it with you):
+----------------------------------------------------------+ | title | +----------------------------------------------------------+ | Administrator_intervention_against_vandalism | | Administrators | | Arbitration/Policy | | Arbitration/Requests | | Arbitration/Requests/Enforcement | | Arbitration_Committee/Noticeboard | | Article_titles | | Autobiography | | Banning_policy | | Be_bold | | Biographies_of_living_persons | | Blocking_policy | | Bot_policy | | Categorization | | Categorization/Ethnicity,_gender,_religion_and_sexuality | | Categorization_of_people | | Citing_sources | | Civility | | Conflict_of_interest | | Consensus | | Content_forking | | Contributor_copyright_investigations | | Copyright_problems | | Copyright_violations | | Criteria_for_speedy_deletion | | Deletion_policy | | Disambiguation | | Dispute_resolution_noticeboard | | Do_not_disrupt_Wikipedia_to_illustrate_a_point | | Edit_warring | | Editing_policy | | Editor_review | | Etiquette | | External_links | | External_links/Noticeboard | | Fiction/Noticeboard | | Fringe_theories | | Fringe_theories/Noticeboard | | Gaming_the_system | | Help_desk | | Ignore_all_rules | | Image_use_policy | | Manual_of_Style | | Manual_of_Style_(abbreviations) | | Manual_of_Style_(accessibility) | | Manual_of_Style_(article_message_boxes) | | Manual_of_Style_(biographies) | | Manual_of_Style_(capital_letters) | | Manual_of_Style_(captions) | | Manual_of_Style_(dates_and_numbers) | | Manual_of_Style_(disambiguation_pages) | | Manual_of_Style_(embedded_lists) | | Manual_of_Style_(footnotes) | | Manual_of_Style_(icons) | | Manual_of_Style_(infoboxes) | | Manual_of_Style_(layout) | | Manual_of_Style_(lead_section) | | Manual_of_Style_(linking) | | Manual_of_Style_(lists) | | Manual_of_Style_(lists_of_works) | | Manual_of_Style_(pronunciation) | | Manual_of_Style_(proper_names) | | Manual_of_Style_(self-references_to_avoid) | | Manual_of_Style_(spelling) | | Manual_of_Style_(summary_style) | | Manual_of_Style_(tables) | | Manual_of_Style_(text_formatting) | | Manual_of_Style_(titles) | | Manual_of_Style_(trademarks) | | Manual_of_Style_(trivia_sections) | | Manual_of_Style_(words_to_watch) | | Neutral_point_of_view | | Neutral_point_of_view/Noticeboard | | No_disclaimers_in_articles | | No_legal_threats | | No_original_research | | No_original_research/Noticeboard | | No_personal_attacks | | Non-free_content_criteria | | Non-free_content_review | | Notability | | Notability_(academics) | | Notability_(books) | | Notability_(events) | | Notability_(films) | | Notability_(music) | | Notability_(numbers) | | Notability_(organizations_and_companies) | | Notability_(people) | | Notability_(sports) | | Notability_(web) | | Offensive_material | | Ownership_of_articles | | Patent_nonsense | | Plagiarism | | Please_do_not_bite_the_newcomers | | Proposed_deletion | | Reference_desk/Computing | | Reference_desk/Entertainment | | Reference_desk/Humanities | | Reference_desk/Language | | Reference_desk/Mathematics | | Reference_desk/Miscellaneous | | Reference_desk/Science | | Reliable_sources/Noticeboard | | Requests_for_mediation | | Sexual_content | | Sock_puppetry | | Spam | | Vandalism | | Verifiability | | What_Wikipedia_is_not | | Wikipedia_is_not_a_dictionary | | Wikipedia_is_not_for_things_made_up_one_day | | Wikiquette_assistance | +----------------------------------------------------------+
Reproduction
editRun this SELECT
inside a CREATE TABLE
:
select
page_title as title,
page_namespace as namespace,
rev_user as user_id,
rev_user_text as user_name,
sum(rev_user) as editcount
from page p join revision r
on p.page_id = r.rev_page
where page_title in ("Administrator_intervention_against_vandalism", "Administrators") -- page titles here
group by rev_user, page_id, year(rev_timestamp)
The query will create a temporary table and sort data on disk, so it will take some time, depending on how many pages you want to populate the table with.