User:EpochFail/Journal/2011-07-05

Tuesday, July 5th


Started today with cleaning up some analysis and adding conclusions to last week's sprint. That went until lunch. During lunch I had a good chat about new user experiences with Staeiou and Melanie. After lunch, those of us interested in vandals, blocking and conversion had a meeting to flesh out what we wanted to do this week.

I hope to try to detect vandals that converted. The first thing I'll do is explore what can be done with the checkuser table to identify editors that have done work under different usernames. Next I'll be looking for editors who look like reformed vandals. --Ahalfaker 23:02, 5 July 2011 (UTC)

Oh yeah. I'm also building a table of huggle users. Debugging the regular expression for finding old-style huggle edits took about an hour :(. There is definitely a bug in MySQL's regular expressions for escaping "]". --Ahalfaker 23:02, 5 July 2011 (UTC)

Taking a break to write some python for Fabian that wraps standard input with some extra content so that it mimics real XML. --Ahalfaker 23:02, 5 July 2011 (UTC)

Just finished writing the python for Fabian. I committed it to http://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/wsor/scripts/classes/file_wrapper.py. --00:36, 6 July 2011 (UTC)
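For reference, here is a minimal sketch of the general idea of wrapping a stream with extra content so that a fragment reads like a complete XML document. It is not the committed file_wrapper.py; the class name `FileWrapper`, its arguments and the `<mediawiki>` header and footer are assumptions made for illustration.

```python
import io


class FileWrapper:
    """File-like object that reads a header, then a stream, then a footer.

    Hypothetical sketch: the names and behaviour here are assumptions,
    not the contents of the committed script.
    """

    def __init__(self, stream, header, footer):
        # Sources are consumed in order; exhausted ones are dropped.
        self._sources = [io.StringIO(header), stream, io.StringIO(footer)]

    def read(self, size=-1):
        if size is None or size < 0:
            # Read everything that is left in every source.
            return "".join(src.read() for src in self._sources)
        out = []
        while size > 0 and self._sources:
            chunk = self._sources[0].read(size)
            if chunk:
                out.append(chunk)
                size -= len(chunk)
            else:
                # Current source is exhausted; move on to the next one.
                self._sources.pop(0)
        return "".join(out)


if __name__ == "__main__":
    fragment = io.StringIO("<page><title>Example</title></page>")
    wrapped = FileWrapper(fragment, "<mediawiki>", "</mediawiki>")
    print(wrapped.read())
```

In real use the stream would be `sys.stdin` rather than a `StringIO`, and the header and footer would be whatever the consuming parser expects.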

Just got done helping Fabian plug it in and it works. I am all that is man! --Ahalfaker 00:46, 6 July 2011 (UTC)

Wednesday, July 6th


OK... today I have one major interruption--talking with Howie about my revert effects paper. Otherwise, I plan to be generating datasets.

So I want to do this in a few steps that will eventually build a user table with a few columns. ("fes" stands for "first edit session", "les" stands for "last edit session")

  • user_id
  • user_name
  • fes_start
  • fes_end
  • fes_edits
  • fes_reverted
  • fes_vandalism
  • les_start
  • les_end
  • les_edits
  • les_reverted
  • les_vandalism
  • edits

From here, I'll be trying to find converted editors. --Ahalfaker 16:39, 6 July 2011 (UTC)
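The columns above could be represented in python roughly like this. This is only a sketch: the `UserRow` name and the `empty_row` helper are assumptions for illustration, and only the field names come from the list.

```python
from collections import namedtuple

# One row per user; "fes" = first edit session, "les" = last edit session.
UserRow = namedtuple("UserRow", [
    "user_id", "user_name",
    "fes_start", "fes_end", "fes_edits", "fes_reverted", "fes_vandalism",
    "les_start", "les_end", "les_edits", "les_reverted", "les_vandalism",
    "edits",
])


def empty_row(user_id, user_name):
    """Build a row with only the identity columns filled in (hypothetical helper)."""
    blanks = [None] * (len(UserRow._fields) - 2)
    return UserRow(user_id, user_name, *blanks)


if __name__ == "__main__":
    # Print a tab-separated header matching the column order.
    print("\t".join(UserRow._fields))
```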

Now that I've been hacking for a bit, I want to gather information about an editor's last n edits rather than their last session since their last session could be only one edit and that wouldn't be especially helpful for understanding whether they converted or not. I'm thinking the last 10 edits should be sufficient. --Ahalfaker 18:06, 6 July 2011 (UTC)
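A quick sketch of the "last n edits" idea, assuming revisions arrive oldest-first as dicts with a boolean `reverted` key. Both function names and the dict layout are made up for illustration.

```python
from collections import deque


def last_n_edits(revisions, n=10):
    """Return the final n items of an oldest-first iterable of revisions."""
    # A bounded deque discards older items as newer ones arrive.
    return list(deque(revisions, maxlen=n))


def summarize_last_n(revisions, n=10):
    """Count edits and reverted edits among the last n revisions."""
    recent = last_n_edits(revisions, n)
    return {
        "edits": len(recent),
        "reverted": sum(1 for rev in recent if rev["reverted"]),
    }
```

An editor with fewer than n edits simply yields a shorter window, so the counts stay meaningful for low-activity users.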

I was constantly interrupted (by important but interrupty things) so I barely got the data aggregation script started by the end of the day. --Ahalfaker 16:08, 7 July 2011 (UTC)

Thursday, July 7th


First thing I needed to do today was restart the data aggregation script. For some reason, the MySQL library that I am using to get the data out of db42 is disconnecting after processing about 10k users. I just added a quick feature to help the script pick up where it left off and started it again.
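The "pick up where it left off" feature can be sketched as a checkpoint file that records the last user processed. This is an assumed design, not the actual script: it presumes users are processed in ascending user_id order, and `process_users`, `load_checkpoint` and the checkpoint path are hypothetical names.

```python
import os


def load_checkpoint(path):
    """Return the last processed user_id, or None if there is no checkpoint."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        text = f.read().strip()
    return int(text) if text else None


def process_users(user_ids, handle, checkpoint_path):
    """Call handle(user_id) for each id not yet covered by the checkpoint.

    Assumes user_ids is in ascending order, so everything at or below the
    checkpointed id has already been handled by an earlier run.
    """
    last_done = load_checkpoint(checkpoint_path)
    for user_id in user_ids:
        if last_done is not None and user_id <= last_done:
            continue
        handle(user_id)
        # Record progress only after the user was handled successfully.
        with open(checkpoint_path, "w") as f:
            f.write(str(user_id))
```

Writing the checkpoint after every user is slow for large runs; writing it every few hundred users would trade a little repeated work for less disk traffic.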

Now I have to get a presentation together for Dario and the metrics meeting. --Ahalfaker 16:08, 7 July 2011 (UTC)

While the metrics meeting was happening, I coded a fix for MySQL disconnecting me that uses a ton of memory, but works. Working on history fellows datasets while that runs. --Ahalfaker 18:20, 7 July 2011 (UTC)

I finally got my dataset, but it only has 280k rows. I have to sanity check whether only ~280k users have made more than 20 edits. --Ahalfaker 22:19, 7 July 2011 (UTC)

OK... so that is definitely wrong. I should be getting ~530k users. I'm working on re-running. --Ahalfaker 23:27, 7 July 2011 (UTC)

Just had a bunch of my time eaten by Hadoop. It is a really messy black box. It looks like it was written by someone enamored with Java who doesn't value simplicity. --Ahalfaker 23:28, 7 July 2011 (UTC)

Re-running data aggregation. I still don't know why I missed a bunch of editors, but I'm getting them now. Also spent some time talking to Melanie and Staeiou about an experiment that we want to run next week with huggle revert warnings. --Ahalfaker 00:30, 8 July 2011 (UTC)

Friday, July 8th


Answered WSC's questions about the last sprint right away today. After I was done with that, I found that the process I had been running for data aggregation had failed because of a user with an editcount of 35, but no revisions in the revision or archive tables. It looks like the person was a wiki-stalker whose edits were suppressed via SQL delete. I updated the script to ignore such cases and re-ran. It looks like I am more than half-way done. In the meantime, I'll get to documenting the huggle revert warning experiment. --Ahalfaker 17:26, 8 July 2011 (UTC)
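The "ignore such cases" change amounts to something like the sketch below: skip any user whose editcount is positive but who has no revisions to aggregate. The function name, the `(user_id, editcount)` pairs and the `fetch_revisions` callable are assumptions standing in for the real database queries.

```python
def aggregate_users(users, fetch_revisions):
    """Aggregate per-user revision counts, skipping users with no revisions.

    users: iterable of (user_id, editcount) pairs.
    fetch_revisions: callable returning a list of revisions for a user_id
    (hypothetical stand-in for querying the revision and archive tables).
    """
    rows = []
    skipped = []
    for user_id, editcount in users:
        revisions = fetch_revisions(user_id)
        if editcount > 0 and not revisions:
            # Editcount says there were edits, but none can be found.
            skipped.append(user_id)
            continue
        rows.append((user_id, len(revisions)))
    return rows, skipped
```

Keeping the skipped ids around makes it easy to check afterwards how many users fell into this case.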

Never mind. Meetings all day :(. The dataset is ready for Monday, though. --Ahalfaker 00:36, 9 July 2011 (UTC)