Research:Labs2/Hackathons/August 6-7th, 2014

Wiki Research Hackathon
Wikis work in practice, but not in theory. Let's change that.
August 6-7th, 2014
Share ideas, tools, analysis, and discussion...
You can join local meetups or participate 100% virtually.
That depends on your time zone.
...to improve collaboration, explore new ideas, and answer some pressing questions.
Organize local and virtual meetups and help others get set up in Wikimedia Labs.

[ sign up ] [ #wikiresearch ]

The L2 research hackathon day is an opportunity for anyone interested in research on wikis, Wikipedia, and other open collaborations to meet, share ideas, and work together. It's being organized by researchers in academia and the Wikimedia Foundation, but we want anyone interested in research to participate. Whether or not you consider yourself a researcher, or would ever want to be one, come with questions, answers, data, code, crazy ideas... or just your insatiable curiosity.

We will meet both virtually via Google Hangout and locally for those who are able to attend local meetup groups. You can take part through a persistent Google Hangout and IRC channel (#wikimedia-research) throughout the day, even if there is no local meetup in your neck of the woods.

Who can participate?

Everyone who is interested is welcome to participate. You don't have to be a researcher or a programmer to get involved. We need your ideas, your questions, and your insights into how wikis work, where wiki-work breaks down and how things can be helped. Just find a meetup group (local or virtual) that works for you and add your name to the list of attendees.

[ sign up ]

How will we meet?

In order to be as inclusive as possible, we'll be organizing both local and virtual meetups. Local meetups are organized by Wikipedians all over the world. Anyone is welcome to become a local host. We currently have local meetups confirmed for:

We have virtual meetups confirmed for:

Hosts will be responsible for determining how they'll organize their own events and how they'll synchronize with others. We recommend that hosts organize their events around the global synchronization periods described below, but that may be impractical in certain circumstances. Check with your host about how they plan to handle synchronization.

When will it happen?

It turns out that "When will it happen?" is a complicated question. Since this hackathon is a global event, participants will be joining from all over the world, and because of the way time zones work, many of us will be awake while others are sleeping.

To ease the pain of time zones, local and virtual meetups will be organized into sync groups based on similar time zones within the Americas, Europe/Africa and Asia/Oceania. Each group shares at least an hour of overlap with its neighboring group(s). Meetup hosts can take advantage of these overlapping periods to share ideas and results with neighboring sync groups. If you are participating virtually, you are welcome to join whichever local meetup(s) and synchronization groups fit your location and schedule best.


[Schedule chart: meetup timing by hour (UTC), from Aug. 6th (Wednesday) through Aug. 8th (Friday). Rows shown: Virtual (Asia/Oceania); Virtual (Europe/Africa); London, UK; Virtual (Philadelphia, PA); Philadelphia, PA; Virtual (Americas). Key: scheduled timing, approximate timing, conference.]

What will we do?

Suggested meetup agenda

Below is a suggested agenda for meetup groups to work from. Times are based on a 9AM start time. Actual start times will depend on your time zone and what your host has planned.

0900-0915 — Meetup begins
Introductions, ice-breakers, discussion of the event format, schedule and goals.
0915-1000 — Idea sharing (sync)
Participants review, share, and refine the project ideas in a lightning presentation format. Synchronization with the other meetups is optional.
1000-1200 — Morning Work Session
Project groups begin pursuing their projects for the remainder of the morning.
1200-1215 — Preliminary project reporting
Each project reports back to the group about what they've been doing.
1215-1400 — Lunch break
Participants eat and/or continue their work over lunch.
1400-1630 — Afternoon Work Session
Project groups re-convene to continue working.
1630-1715 — Final project reporting (sync)
Project groups report on their activities for the day. Synchronization with the other meetups is optional.
1715-1730 — Event closing
Coordinate project reporting, discuss ideas for subsequent meetups or events and reflect on what worked and what didn't.
1730-1900 — Socializing!
Participants are invited to join each other at a nearby cafe or restaurant for drinks, food, etc.

The Wiki Research Hackathon is an opportunity for anyone interested in research on wikis, Wikipedia, and open collaboration to meet, share ideas, and work together. It is targeted at Wikimedians, students, researchers, coders and anyone interested in crunching and visualizing data, designing new tools, and producing new knowledge about Wikimedia projects and their communities.


The goal of this event is to:

  • share knowledge about research tools and datasets (and how to use them)
  • ask burning research questions (and learn how to answer them)
  • get involved in ongoing research projects (or start new ones)
  • design new data-driven apps and tools (or hack existing ones)

Presentations & demos

We've lined up a series of speakers to demonstrate data resources and libraries for accessing and processing wiki data. Presentations will happen over an "On-Air" Google Hangout and speakers will be available for questions via IRC (#wikimedia-research). Talks will be recorded so that those who cannot watch the presentations live will still be able to review them afterwards.


Halfak's wiki data processing libraries (EpochFail, a.k.a. "halfak")
Along with quantitative research comes data and analysis code. In this presentation, EpochFail will introduce you to four Python libraries that capture code he uses on a regular basis to get his wiki research done.
  • MediaWiki Utilities is a general data processing library that includes connectors for the API and MySQL databases as well as an XML dump parser and revert detection.
  • Wiki-Class is a machine learning library that is designed to train, test and deploy automatic quality assessment class detection for Wikipedia articles.
  • MediaWiki-OAuth provides a simple interface for performing an OAuth handshake with a MediaWiki installation (e.g. Wikipedia).
  • Deltas is an experimental text difference detection library that implements cutting-edge research to track changes to Wikipedia articles and attribute authorship of content.
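To give a feel for what "revert detection" means, here is a minimal sketch in plain Python. It is not the MediaWiki Utilities API; the function name `detect_reverts`, the `radius` parameter, and the input format are assumptions made for illustration. It treats a revision as a revert when its text is byte-for-byte identical to a recent earlier revision, which is one common way to define an "identity revert".

```python
import hashlib
from collections import deque


def detect_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id) for identity reverts.

    `revisions` is an iterable of (rev_id, text) pairs in chronological
    order. `radius` (hypothetical default) caps how many revisions a
    single revert may undo.
    """
    # Keep the checksums of the most recent revisions only.
    recent = deque(maxlen=radius + 1)
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        window = list(recent)
        # Search newest-first for an earlier revision with identical content.
        for i in range(len(window) - 1, -1, -1):
            old_id, old_checksum = window[i]
            if old_checksum == checksum:
                reverted = [rid for rid, _ in window[i + 1:]]
                if reverted:  # ignore edits that change nothing
                    yield rev_id, reverted, old_id
                break
        recent.append((rev_id, checksum))


# Toy page history: revision 4 restores the text of revision 2.
history = [(1, "a"), (2, "a b"), (3, "vandalism"), (4, "a b")]
reverts = list(detect_reverts(history))
```

In this toy history, revision 4 is reported as reverting revision 3 back to revision 2. Real page histories would come from the API, the database, or an XML dump rather than a hand-written list.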
Quarry -- Web-based querying system (Yuvipanda)
Quarry is a web-based, collaborative database client that will allow you to explore a live copy of Wikipedia's MySQL database. Quarry is designed to make it easy to get started with Wikipedia data. Using Quarry, you can run SQL statements, download query results and share your queries and results with others. Unlike getting a shell account approved for Wikimedia Labs, getting started with Quarry will take less than 5 minutes.
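As a rough idea of the kind of SQL statement you might run, the sketch below counts edits per user. It is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory toy table as a stand-in for the live MySQL database, and the table and column names (`revision`, `rev_id`, `rev_page`, `rev_user_text`) and the sample rows are assumptions, so check the actual schema before running anything similar in Quarry.

```python
import sqlite3

# In-memory stand-in for the real database; the schema here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user_text TEXT)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (1, 10, "Alice"),
        (2, 10, "Bob"),
        (3, 11, "Alice"),
        (4, 12, "Alice"),
    ],
)

# Count revisions per user, most active first.
QUERY = """
SELECT rev_user_text, COUNT(*) AS edits
FROM revision
GROUP BY rev_user_text
ORDER BY edits DESC, rev_user_text
LIMIT 10
"""
rows = conn.execute(QUERY).fetchall()
for user, edits in rows:
    print(user, edits)
```

Only the text of `QUERY` would be pasted into a web-based client; the surrounding Python exists just to make the example runnable on its own.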


Why are we doing this?

The scholarly research community studying Wikimedia projects, the Wikimedia Foundation, and the Wikimedia community all have a shared interest in answering research questions about Wikimedia projects using public datasets, developing new tools and sharing their work with others. This hackathon is intended to kick-start L2, a space for improved collaboration between wiki editors, researchers and Foundation staff.


Resources

Participant information

Pre-event checklist

Sign up for the local or virtual meetup you'll be attending
If you haven't signed up yet, make sure to notify your local host so that they know to expect you.
Sign up for a Wikimedia account
This account works across all Wikimedia sites including here on Meta and all language Wikipedias.
Create a Tool Labs account
... to get access to data, server resources, etc. This is optional, but highly recommended. If you're interested in running MySQL queries against live copies of Wikipedia, you should get a head start by following the guide to create a Wikimedia Labs account and request access to Tool Labs. Email us if you get stuck.
File your research ideas
Record your research ideas for the hackathon and check out other participants' ideas. If you'd like to present your idea during a conference session, make sure that your meetup is scheduled to participate and that your local host knows.

Communication

Communication and coordination will happen mostly via IRC and a Google Hangout. If you already have an IRC handle, please join the #wikimedia-research channel. Otherwise, you can use the web-based IRC client.

Google Hangouts will be set up for conference sessions that cross meetup groups. Many meetups will be participating in conference sessions. Participants who have filed an idea will have an opportunity to present it and ask for help. Coordinate with your local host to make sure that you have a slot.

We need your help!

We need hosts for local events and help writing documentation about how to create a Wikimedia Labs account and get access to datasets. If you'd like to get involved, feel free to be bold or contact the event organizers.

Event organizers

For any questions about the event (including volunteering for a local meetup), you can reach us at wrh@wikimedia.org or leave a message on the talk page.