Research talk:Ideas/Requests for adminship and the retention of long term editors
DONE Grabbed source html for the pages that list the admin nominees. I've developed code that will take usernames and fetch: 1. A list of all of the contributions that user has ever made 2. Metadata about that user
IN PROGRESS Working on parsing the source to extract usernames, pages on which nominees were discussed during the nomination process, and the outcomes of the voting.
TO-DO Grab the pages from the parsing, combine everything into a data structure, and then post all of this code to a repository. Run the code, generate the data set, and either zip it up and put it online as a dump, or upload it to GitHub and link here. Start thinking about features in the data we might want to analyze! Profit (publish?)!
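As a rough illustration of the username-extraction step, here is a minimal sketch using only the Python standard library. It assumes (this is not confirmed anywhere above) that each nominee appears in the saved HTML as a link whose `href` starts with `/wiki/Wikipedia:Requests_for_adminship/`; the names `RFA_PREFIX`, `NomineeLinkParser`, and `extract_nominees` are hypothetical and are not taken from the actual code.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

# Assumed link shape for nominee subpages; adjust to match the saved HTML.
RFA_PREFIX = "/wiki/Wikipedia:Requests_for_adminship/"


class NomineeLinkParser(HTMLParser):
    """Collects nominee names from <a href="..."> links to RfA subpages."""

    def __init__(self):
        super().__init__()
        self.nominees = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith(RFA_PREFIX):
            return
        # Drop any #fragment, decode %-escapes, and turn underscores into spaces.
        raw = href[len(RFA_PREFIX):].split("#")[0]
        name = unquote(raw).replace("_", " ")
        if name and name not in self.nominees:
            self.nominees.append(name)


def extract_nominees(html):
    """Return nominee names in first-seen order, without duplicates."""
    parser = NomineeLinkParser()
    parser.feed(html)
    return parser.nominees
```

Note that a repeat nomination filed under a subpage such as "Username 2" would come out as a separate name under this assumption, so some post-processing may be needed.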
- Cool! If you can get me a list of usernames by tomorrow morning for RfA candidates, I'll generate some behavioral R:Metrics to share. --EpochFail (talk) 01:04, 9 November 2013 (UTC)
- Alright. I didn't get as much done as I'd like, but you can now see it all on bitbucket at https://bitbucket.org/milara/wikipedia-adminship-data-grabber/overview.
- Source HTML for all of the successful nominees saved and uploaded to repository
- Python code that:
- extracts from those files the nominee usernames, date of nomination, username of person who closed the voting, and the tally of votes
- passes the list of nominee usernames to the API's users method
- cycles through the list of nominees and gets all of their contributions to Wikipedia from the API's allcontribs method
- combines all of these into a standard json data structure and saves it to a file
- Turns out we didn't get the source HTML for the unsuccessful nominees, so someone has to do that.
- I put in code to deal with lag errors, but did not manage to test it. So, that part may need some debugging; hopefully not, but you never know. Everything else is, as far as I can tell, debugged.
- Someone still needs to grab the user page, user talk page, and election page for each nominee, and add them to the data dumps
- Once all that is working, the script can actually be run, of course, and then the datalicious fun can start.
- Unfortunately, I'm out for the rest of the weekend; I have 8 hours of research ethics training tomorrow and a problem set due at midnight Sunday. So, I just hope that what I have gotten done is useful.
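For anyone picking up the untested lag handling, the following is a minimal sketch of one way to retry on lag errors while paging through a user's contributions. It is not the code in the repository. It assumes the MediaWiki `action=query` endpoint with `list=usercontribs` (which may differ from the method named above), the `maxlag` parameter, an error code of `maxlag` in the JSON response, and the `continue` object for paging. The helper names (`http_get_json`, `query_with_lag_retry`, `all_contributions`) and the retry settings are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def http_get_json(params):
    """Send one GET request to the API and decode the JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(
        url, headers={"User-Agent": "rfa-retention-sketch/0.1"}  # placeholder agent
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def query_with_lag_retry(params, fetch=http_get_json, max_retries=5,
                         wait=5, sleep=time.sleep):
    """Repeat a request while the server answers with a 'maxlag' error."""
    params = dict(params, format="json", maxlag=5)  # copy; caller's dict is untouched
    for _ in range(max_retries):
        data = fetch(params)
        if data.get("error", {}).get("code") != "maxlag":
            return data
        sleep(wait)  # assumed fixed back-off between attempts
    raise RuntimeError("server still lagged after %d attempts" % max_retries)


def all_contributions(username, fetch=http_get_json, sleep=time.sleep):
    """Collect every contribution record for one user, following continuation."""
    params = {"action": "query", "list": "usercontribs",
              "ucuser": username, "uclimit": "max"}
    contribs = []
    while True:
        data = query_with_lag_retry(params, fetch=fetch, sleep=sleep)
        if "error" in data:
            raise RuntimeError(data["error"])
        contribs.extend(data.get("query", {}).get("usercontribs", []))
        if "continue" not in data:
            return contribs
        params.update(data["continue"])  # carry the paging tokens forward
```

Passing `fetch` and `sleep` as arguments is a deliberate choice here: it lets the retry path be exercised with canned responses, which would help with the part that has not been tested yet.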
Thanks for sharing your progress, Milara. Sadly, I wasn't able to push on this project during the hackathon since there were other projects that people expressed interest in. I'll be trying to fit in progress on this between other projects, so it'll be a bit sporadic, but you'll hear from me here. --EpochFail (talk) 16:00, 12 November 2013 (UTC)
Seems as if everyone is ignoring the vast amount of research, tables, and stats that were created at en:Wikipedia:RfA reform (continued), and the problems with RfA that were clearly identified. I don't see why this needs to be duplicated. IMO, all the stats need is updating. I would be happy to identify the appropriate sub pages of that project so people know where to look. It nevertheless has an excellent navigation system already. Kudpung (talk) 09:11, 3 December 2013 (UTC)
- Hi Kudpung. We're not ignoring anything. We simply didn't know about that work. Thanks for pointing it out! I'd appreciate it if you could link directly to any results demonstrating an effect (or lack thereof) of successful/unsuccessful RfAs on retention of an editor. I'm also interested in any other behavioral changes that admins exhibit after a successful RfA. In the meantime, I'll start digging through the previous analysis myself. It looks like there are some useful data tables, but I haven't yet found a link to the underlying dataset containing a record per vote on an RfA. Starting with that would be great. --EpochFail (talk) 15:54, 3 December 2013 (UTC)
- Hi Aaron. Over at en:Wikipedia:RfA reform (continued)/Unsuccessful RfAs there is a table under the heading "Failed candidates 2011. What they are doing now" that kept track of candidate editing after a failed RfA. It's out-of-date by a couple of years, but it has some interesting data and could easily be updated. This was the data I found most interesting at the time. Best. 126.96.36.199 05:01, 5 December 2013 (UTC)