Research talk:Mapping Wikipedia in the Middle East and North Africa

Some things we need:

  • A better name. I propose "Locations and Representations" (Bernie and I think it should be called: "Mapping Wikipedia in the Middle East and North Africa" Markgraham 20:13, 9 November 2011 (UTC)) Done
  • Fell grant number. Done
  • Draw upon our grant proposal for a more elaborate explanation.
  • We now have code for the aggregation script. We will publish our algorithm on this page soon. Not done
  • Geography subpage needs to be filled out.
  • Benefits needs to be elaborated beyond the bullets.
  • Dissemination should mention translation. Done
  • Methods section needs to be elaborated according to what we agreed. It needs to accentuate the specific process where we are asking for private data. Discussed below:
  • We need to add more of a broader project justification. Answer the 'so what' question about the research. Explain why it is important. Markgraham 20:13, 9 November 2011 (UTC)
  • Richard: could you make a start on a new section about the 2012 workshops? We'll use this as a discussion starting point Markgraham 20:13, 9 November 2011 (UTC) Done

Algorithm summary, from Ahmed (to be included in the main page):

  1. If an article has more than a certain number of revisions, group its revisions under a single label.
  2. Group together the articles in each level1 region that do not meet the criterion in 1.
    • If a grouping from 2 meets the anonymity limit, break it into groups that each meet the anonymity limit and label each group's revisions with a unique label.
    • If not, add its articles to those from other level1 regions in the same level0 that also do not meet the limit.
  3. For each level0 (country), group together the articles not meeting the criterion in 2.
    • If these groupings meet the anonymity limit, do the same as in 2 for the level1 groups.
    • If not, group these articles with others from adjacent countries (this should only be the case in small Wikipedia versions, not in English or French).
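
The steps above could be sketched roughly as follows. This is only an illustration of the summary, not the project's actual aggregation script: the names Article, chunk, aggregate, article_threshold and anonymity_limit are hypothetical, the summary does not say whether the per-article cutoff and the anonymity limit are the same number (they are kept separate here), and the "adjacent countries" step is simplified to a single cross-country pool because no adjacency data is given.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    level0: str      # country
    level1: str      # first-level region inside the country
    revisions: int   # number of revisions to the article


def chunk(articles, limit):
    """Split articles into groups that each hold at least `limit` revisions.

    Returns an empty list when the articles together fall short of `limit`.
    A remainder too small to stand alone is folded into the last group.
    """
    groups, current, count = [], [], 0
    for article in articles:
        current.append(article)
        count += article.revisions
        if count >= limit:
            groups.append(current)
            current, count = [], 0
    if current and groups:
        groups[-1].extend(current)
    return groups


def aggregate(articles, article_threshold, anonymity_limit):
    """Map each label to the titles of the articles grouped under it."""
    labels = {}

    # Step 1: heavily edited articles get a label of their own.
    by_level1 = defaultdict(list)
    for article in articles:
        if article.revisions > article_threshold:
            labels[f"article:{article.title}"] = [article.title]
        else:
            by_level1[(article.level0, article.level1)].append(article)

    # Step 2: pool the rest per level1 region; undersized pools move up.
    by_level0 = defaultdict(list)
    for (level0, level1), members in sorted(by_level1.items()):
        groups = chunk(members, anonymity_limit)
        if groups:
            for i, group in enumerate(groups):
                labels[f"{level0}/{level1}#{i}"] = [a.title for a in group]
        else:
            by_level0[level0].extend(members)

    # Step 3: pool per country; undersized pools go to a cross-country pool.
    # Assumption: one shared pool stands in for "adjacent countries".
    leftover = []
    for level0, members in sorted(by_level0.items()):
        groups = chunk(members, anonymity_limit)
        if groups:
            for i, group in enumerate(groups):
                labels[f"{level0}#{i}"] = [a.title for a in group]
        else:
            leftover.extend(members)
    if leftover:
        labels["cross-country"] = [a.title for a in leftover]
    return labels
```

Note that in this sketch the final cross-country pool can itself fall below the anonymity limit, so it would still need a check before any data is released.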

The idea we're articulating is to be able to get an assessment of the locations of the revisions, within our Research:Edit locations/Areas of interest, without revealing the specific location of a specific editor. Insofar as it is important that this data be made available to the research public broadly thereafter, we want to consider a mechanism that is, if not purely anonymous, at least sufficiently anonymized for most intents and purposes. One key to this mechanism is the "number of revisions" that we are grouping together. This is a tunable parameter in our script, which we can set to a different number depending on the criteria we encounter and discuss here and with the project team. Blurky 17:08, 3 November 2011 (UTC)
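
As a rough illustration of how the tunable "number of revisions" might be tried out, the sketch below lists the groups that would fall short of a chosen minimum. The function name undersized_groups, the labels and the revision counts are all made up for the example and are not taken from the project's script.

```python
def undersized_groups(group_revisions, min_revisions):
    """Return, in sorted order, the labels of groups below `min_revisions`.

    `group_revisions` maps a group label to the total number of revisions
    aggregated under that label.
    """
    return sorted(
        label
        for label, total in group_revisions.items()
        if total < min_revisions
    )


# Hypothetical totals per label, for illustration only.
totals = {"EG/Giza#0": 70, "EG#0": 55, "cross-country": 5}

# Raising the minimum flags more groups as needing further merging.
flagged_at_50 = undersized_groups(totals, 50)
flagged_at_60 = undersized_groups(totals, 60)
```

Running a check like this at several candidate values would show how much detail is lost as the minimum is raised, which could inform the discussion of where to set it.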

Unfortunately the algorithm needs a little work. Rich Farmbrough 12:01 9 November 2011 (GMT).
