User:Renklauf/Archive 4

Monday April 30th

  • PT welcome template data - investigate issues with z5 template
  • Gadgets click-tracking follow-up with Aaron


Tuesday May 1st

  • 28bot postings -- strange issue with missing postings; I imagine this is related to the issue I saw with the PT welcome templates
  • ImageTaggingBot - worked on removing users that received both test and control templates. It may not be worth doing this for all templates, but I'm investigating how much of an effect this had
  • wrote a script to remove rows specified by a user list from a tsv file - this is to be used in accomplishing the above task
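
A minimal sketch of what such a row-removal script could look like. The function name `drop_users`, the `user_column` parameter, and the assumption that the user name sits in the first column are all illustrative; the original script is not shown here.

```python
import csv
import io


def drop_users(tsv_in, tsv_out, excluded_users, user_column=0):
    """Copy rows from tsv_in to tsv_out, skipping rows whose user is excluded.

    tsv_in / tsv_out are open text file objects; user_column is the
    (assumed) index of the column holding the user name or id.
    Returns the number of rows dropped.
    """
    excluded = set(excluded_users)  # set lookup keeps this linear in file size
    reader = csv.reader(tsv_in, delimiter="\t")
    writer = csv.writer(tsv_out, delimiter="\t", lineterminator="\n")
    dropped = 0
    for row in reader:
        if row and row[user_column] in excluded:
            dropped += 1
            continue
        writer.writerow(row)
    return dropped


# Example with in-memory data; real use would pass open file handles.
src = io.StringIO("alice\t3\nbob\t5\ncarol\t1\n")
dst = io.StringIO()
n_dropped = drop_users(src, dst, ["bob"])
```

Streaming row by row avoids loading the whole file, which matters if the tsv holds one row per posting.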


Wednesday May 2nd

  • removing users from ImageTaggingBot data that received multiple templates in the test set (analytics dev chat)
  • de-duping newly processed ImageTaggingBot data, then re-running analysis on each case
  • debugging issues with generating "PT wiki welcome" and "28Bot" postings

The old results for blocking in the ImageTaggingBot experiment turned out not to be conclusive after removing the users that received both templates. There is still a single case where edit activity seems to be significantly affected, but in the negative direction for the test template (the control outperformed it).


Thursday May 3rd

  • Communicated with Nimish regarding a script to generate mobile views in Uganda in April, located squid inputs, executed script on emery, communicated results to Global Dev
  • Debugging postings script that generates revisions filtered by user, comment, and diff text -- this is necessary to generate postings for 28Bot and welcome templates
  • I think I've found at least part of the issue: it seems that if the user's talk page didn't exist before the posting, then the API call fails to produce the diff text
  • Fixed this; it looks like the above suspicion was correct
  • Kicked off postings for 28bot


Friday May 4th

  • Postings for PT welcome tests
  • 28bot metrics and analysis
  • Discovered a bias in the 28bot template postings on analysis. This may still yield a result based on the response of users after a certain number of warnings
  • Check in with the team regarding 28bot analysis and experiments backlog


Monday May 7th

  • RscprinterBot
  • generate postings and metrics
  • analyze in comparison to registered users for 28bot (143, 144)
  • Write method to generate postings for a specified user list
  • PT welcome template - metrics, begin analysis


Tuesday May 8th

  • RscprinterBot
  • Modified postings.py to process arbitrary revision results and not rely explicitly on talk page messages. This means we can now also get reverted users from comments, and it opens the door to pre-filtering on other revision info rather than blindly admitting records from the table
  • Generated correct postings, completed analysis
  • Welcome templates
  • Incubator: generating postings on internproxy, reviewing templates to weigh in on research questions
  • PT: investigating issue with postings here - solution may be suitable application for revision filters
  • A note on the welcome template postings: we may want to consider requesting a modification to the NewUserMessage extension so that it tags revision comments, since not doing this dramatically increases the time to mine postings from the revision table, especially in enwiki


Wednesday May 9th

  • Welcome Templates - tracking down postings / debugging mining scripts
  • Incubator - the postings script died last night due to an error; in any case, the sheer volume of revision records is too large to parse. We could filter these out by requiring that some expression be present in the revision comment. I'm going to look into ways to do this, since the current scheme is simply too resource intensive. I have one solution that would involve parsing users from the "What Links Here" output, which is what I'm going with at the moment
  • PT - tracking specific revisions including the z5 template tag in PTwiki, the scripts are still missing these revisions
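
The comment-based pre-filter mentioned above could be sketched roughly as follows. This is an illustration only: the dict layout of a revision record and the name `filter_by_comment` are assumptions, and the real postings.py may represent rows differently.

```python
import re


def filter_by_comment(revisions, pattern):
    """Yield only the revisions whose comment matches the given expression.

    revisions: iterable of dicts with a "comment" key (assumed layout).
    pattern:   regular expression that must occur somewhere in the comment.
    Matching is case-insensitive, which is a choice made for this sketch.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for rev in revisions:
        comment = rev.get("comment") or ""  # tolerate missing/NULL comments
        if regex.search(comment):
            yield rev


# Hypothetical revision rows for illustration.
sample = [
    {"rev_id": 1, "comment": "Welcome! ({{Z5}})"},
    {"rev_id": 2, "comment": "fixed typo"},
    {"rev_id": 3, "comment": None},
]
kept = [r["rev_id"] for r in filter_by_comment(sample, r"\{\{z5\}\}")]
```

Because the filter is a generator, it can sit in front of the expensive diff-text retrieval so that only matching revisions are ever fetched.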


Thursday May 10th

  • Make IRC Cloak request
  • Welcome Templates
  • Writing code to parse Template Links (e.g. https://pt.wikipedia.org/wiki/Especial:Páginas_afluentes/Template:Z5) to speed up data generation of template postings
  • developed method to tie in user ids from Template Links into posting scripts - this speeds up the process of parsing template postings without tagged edit summaries
  • discovered retrieving revisions for welcome templates on the incubator cannot be done via the slave - currently a blocker, these don't appear to be accessible via fenari either
  • debugging issues with retrieving PT welcome template postings - I think this issue arises from case sensitivity - testing... these appear to be fixed now, postings are generating
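
As a rough illustration of the Template Links idea, the sketch below uses the MediaWiki API's `list=embeddedin` query instead of scraping the special page itself, which is a substitution on my part and not necessarily what the original code did. It only covers transclusions (`list=backlinks` would be the analogue for plain links), and the helper names, the namespace choice (3, user talk), and the response handling are assumptions.

```python
import json
from urllib.parse import urlencode

API_URL = "https://pt.wikipedia.org/w/api.php"


def embeddedin_url(template, namespace=3, limit=500, cont=None):
    """Build an API URL listing pages in `namespace` that transclude `template`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": namespace,
        "eilimit": limit,
        "format": "json",
    }
    if cont:
        params["eicontinue"] = cont  # continuation token from a previous page
    return API_URL + "?" + urlencode(params)


def users_from_response(payload):
    """Extract user names and the continuation token from a JSON response body."""
    data = json.loads(payload)
    users = []
    for page in data.get("query", {}).get("embeddedin", []):
        title = page["title"]
        # Drop the namespace prefix, then any subpage such as "/Archive".
        name = title.split(":", 1)[1] if ":" in title else title
        users.append(name.split("/", 1)[0])
    next_cont = data.get("continue", {}).get("eicontinue")
    return users, next_cont


url = embeddedin_url("Template:Z5")
```

The resulting user list can then be used to restrict the revision query to those users' talk pages, rather than scanning the whole revision table.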


Friday May 11th

  • Create EE profile
  • Respond to Karyn's email about metrics
  • Welcome Templates
  • It was determined that for the moment revisions for incubator tests cannot be found
  • PT postings successfully completed
  • Since the PT wiki db is only available on fenari, I'm going to work on exporting the necessary tables to another db to give the required scripts access
  • exports underway
  • Article top-level category audit
  • While the exports are underway on PT wiki I'm revisiting the code used for generating top-level categories to assess the amount of work needed to have this table refreshed periodically and also to plan on how to develop improvements to the existing algorithm


Monday May 14th

  • Welcome Templates
  • PT - Complete migration of data from fenari PTwiki production tables
  • PT - modify source to compute metrics
  • PT - begin analysis - includes de-duping data
  • Incubator - check in to see if getting revisions for the incubator is feasible
  • Necromancy counts for Karyn
  • generate tsv of revision counts for users
  • modify queries to join with page table to filter by namespace
  • build counts of editors that made at least 30 and at least 50 edits in the period from 6 to 3 months ago and no edits in the following period up to today
  • Respond to Dario's email regarding macro-categories and first edit page tracking
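
The lapsed-editor counts described above could be computed along these lines. This is a sketch under stated assumptions: months are approximated as 30 days, edits arrive as `(user, timestamp)` pairs, and the name `lapsed_editors` is made up for illustration. The actual work was done with SQL queries joined against the page table.

```python
from datetime import datetime, timedelta


def lapsed_editors(edits, today, thresholds=(30, 50)):
    """Count editors active 6 to 3 months ago who have not edited since.

    edits: iterable of (user, timestamp) pairs, already filtered by namespace.
    Returns {threshold: number of editors with at least that many edits in
    the window and zero edits afterwards}.
    """
    window_start = today - timedelta(days=180)  # ~6 months ago (assumption)
    window_end = today - timedelta(days=90)     # ~3 months ago (assumption)
    in_window = {}
    active_since = set()
    for user, ts in edits:
        if window_start <= ts < window_end:
            in_window[user] = in_window.get(user, 0) + 1
        elif window_end <= ts <= today:
            active_since.add(user)
    return {
        t: sum(1 for u, n in in_window.items()
               if n >= t and u not in active_since)
        for t in thresholds
    }


today = datetime(2012, 5, 14)
old = today - timedelta(days=100)
edits = ([("alice", old)] * 30 + [("bob", old)] * 50
         + [("bob", today - timedelta(days=10))] + [("carol", old)] * 55)
counts = lapsed_editors(edits, today)
```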


Tuesday May 15th

  • Welcome Templates
  • PT - There may be some revisions missing from the original set that I was using - regenerating
  • PT - Revisit analysis - found that editors who had made at least one edit before receiving the template preferred the default graphical template


Wednesday May 16th

  • Go through Template Testing Report and meta stuff on new experiments - comment
  • Get counts for Imagetaggingbot and PT Huggle
  • Necromancy - edit behaviour before and after - waiting on user lists
  • Review "Necromancy", "Edit feedback", and "Timestamp position modification"


Thursday May 17th

  • Write scripts to process barnstarred editors
  • Build edit, warning, and block counts for barnstarred editors
  • Analysis on distribution of edits, blocks, and warnings for above editors


Friday May 18th

  • Build edit, warning, and block counts for barnstarred editors 3 months before their barnstar was received
  • Analysis on distribution of edits, blocks, and warnings for above editors
  • Macrocategories
  • Some modifications to Template report draft


Monday May 21st

  • Mine data for "barnstarred" editors for three-day periods beginning at the following points before the barnstarring (days before): 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360
  • Analysis on this data - generate plots which demonstrate the effect of the barnstarring relative to each of these periods
  • Notes for team regarding recent and upcoming experiments
  • Weekly summary
  • Produce script to provide failsafe for editor threshold script
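
The three-day sampling windows used for the barnstar data can be sketched as below. The function names and the half-open window convention (start included, end excluded) are assumptions made for illustration, not taken from the original scripts.

```python
from datetime import datetime, timedelta

# Days before the barnstar at which each sampling window begins.
OFFSETS = range(30, 361, 30)


def windows_before(barnstar_ts, offsets=OFFSETS, length_days=3):
    """Return (offset, start, end) for each window preceding the barnstar."""
    result = []
    for d in offsets:
        start = barnstar_ts - timedelta(days=d)
        result.append((d, start, start + timedelta(days=length_days)))
    return result


def count_in_windows(edit_timestamps, barnstar_ts):
    """Count one editor's edits falling in each window, keyed by offset."""
    stamps = list(edit_timestamps)
    return {
        d: sum(1 for ts in stamps if start <= ts < end)
        for d, start, end in windows_before(barnstar_ts)
    }


barnstar = datetime(2012, 5, 1)
edits = [
    barnstar - timedelta(days=29),  # inside the 30-day window
    barnstar - timedelta(days=26),  # after that window closes
    barnstar - timedelta(days=60),  # first instant of the 60-day window
]
per_window = count_in_windows(edits, barnstar)
```

The same windows can be reused for warning and block counts by passing those timestamps instead of edit timestamps.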


Tuesday May 22nd

  • Macro-categories
  • Produce script to provide failsafe for editor threshold script
  • Review Template Experimentation presentation slide deck


Wednesday May 23rd

  • Macro-categories work
  • Produce script to provide failsafe for editor threshold script


Monday June 11th

  • First day back from vacation - bringing myself up to speed
  • Worked through backlogged emails
  • Chatted with team members regarding outstanding tasks
  • Sprint preparation for Post-Edit Feedback


Tuesday June 12th

  • Macro-categories - debugging
  • working out issues with storing categories which have no subcategories
  • building web-API for retrieving page category data from db42 tables
  • Chat with Dario about analytics requirement for post-edit feedback
  • Monthly Product Meeting


Wednesday June 13th

  • Macro-categories
  • Regenerated subcategories data structure
  • Resolved issues with generating page category links and missing page category entries
  • Analytics infrastructure meeting - this turned into a conceptual sprint on udp2log and click-tracking logging and the implications that it had on the validity of our data


Thursday June 14th

  • Meet with Evan and Jessie
  • Went over datasources for analysis work
  • discussed tools and content of my analysis work
  • Brownbag with David MacDonald - Social Translucence
  • Macro-categories
  • Spent time tracing some of the issues that prevent shortest-paths for some categories from being computed
  • ran into issues on internproxy while attempting to increase the maximum depth for the recursion used to build the category links structure
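
One way to sidestep a recursion depth limit is to compute the distances iteratively. The sketch below is not the existing macro-categories code: it is a plain breadth-first search over an assumed `subcats` mapping (category to list of subcategories) and gives each category its shortest distance from the nearest top-level category.

```python
from collections import deque


def category_depths(subcats, roots):
    """Shortest distance from any root category to each reachable category.

    subcats: dict mapping a category to its list of subcategories; categories
             with no subcategories may simply be absent from the dict.
    roots:   the top-level (macro) categories.
    Uses an explicit queue, so depth is not bounded by the recursion limit,
    and cycles in the category graph are visited only once.
    """
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        cat = queue.popleft()
        for child in subcats.get(cat, ()):
            if child not in depth:  # first visit is the shortest path in BFS
                depth[child] = depth[cat] + 1
                queue.append(child)
    return depth


# Toy graph with a cycle (Physics -> Optics -> Physics) and a leaf category.
graph = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics"],
    "Optics": ["Physics", "Lasers"],
}
depths = category_depths(graph, ["Science"])
```

Categories missing from the result are unreachable from every root, which is one way to surface the pages that cannot be categorized.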


Friday June 15th

  • Complete WMF review work
  • Submit wellness claims
  • Macro-categories: Continue investigating missing path distances between nodes that prevent some pages from being categorized
  • Spec out work for performing analysis on new 1K editor groups (most of this will replicate what was done in May)


Hackathon Tasks

  • backup category tables
  • Mobile reports for Amit
  • find the data source
  • regenerate categories
  • basic interface for article categories - Django