Research:Claim Selection for WikiGrok

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

WikiGrok is an experimental MediaWiki feature used by the Mobile team to increase user engagement on mobile devices. Its goal is to offer a way to participate to users who are willing to make light contributions to Wikipedia on mobile devices.

To lower the threshold for participation, WikiGrok should be equipped with millions of questions that are easy for humans to answer on mobile devices and yet not easy for machines to answer. This research aims to propose a methodology for finding a series of questions that the Mobile team can ask users for experimentation in the short run, as well as a methodology for finding such questions more systematically in the long run, based on the results of the short-term experimentation.

Questions based on Wikidata Claims

We started by counting the English Wikipedia articles (items) in the class trees of person, organization, event, work, place, and term, which are some of the main Wikidata classes. The results are shown in the following table:

Class          Item code   No. of items
person         Q215627     1259119
organization   Q43229      203796
event          Q1190554    47048
place          Q3389680    7946
term           Q1969448    22122

A sample API query used to compute the above number for class person is:

http://wdq.wmflabs.org/api?q=claim[31:%28tree[215627][][279]%29]%20AND%20link[enwiki]&noitems=1
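
The sample query above can presumably be generated for each class in the table by substituting the numeric part of its item code. The following is a minimal Python sketch under that assumption. The names `build_wdq_url`, `CLASS_IDS`, and `WDQ_ENDPOINT` are illustrative and not part of the original project code, and the sketch only builds the URL string without contacting the service.

```python
from urllib.parse import quote

# Endpoint taken from the sample query above.
WDQ_ENDPOINT = "http://wdq.wmflabs.org/api"

# Numeric item codes copied from the table above (the leading "Q" is dropped).
CLASS_IDS = {
    "person": 215627,
    "organization": 43229,
    "event": 1190554,
    "place": 3389680,
    "term": 1969448,
}


def build_wdq_url(class_id: int, wiki: str = "enwiki") -> str:
    """Build a query URL shaped like the sample query above.

    In the query, 31 refers to "instance of" (P31) and 279 to
    "subclass of" (P279). The structure is copied from the sample;
    this helper itself is a hypothetical illustration.
    """
    query = f"claim[31:(tree[{class_id}][][279])] AND link[{wiki}]"
    # Keep brackets and colons literal, as in the sample URL;
    # parentheses and spaces are percent-encoded.
    encoded = quote(query, safe="[]:")
    return f"{WDQ_ENDPOINT}?q={encoded}&noitems=1"


if __name__ == "__main__":
    for name, class_id in CLASS_IDS.items():
        print(name, build_wdq_url(class_id))
```

For `CLASS_IDS["person"]` the function reproduces the sample URL character for character, which is a quick check that the encoding matches.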

Given that "person" has the highest number of articles, we decided to focus on questions related to human (Q5), an instance of person. To this end, we considered all the items in the class human and their corresponding claims. We then counted the co-occurrences of claims and identified all claim pairs that co-occur more than 1000 times in English Wikipedia. (Note that the choice of 1000 is arbitrary. At the time of the research, this threshold gave us 794 claim pairs to consider, excluding pairs that include instance of, P31.) Using the list, we identified potential questions of interest. For example, on English Wikipedia the claims politician (occupation) and lawyer (occupation) co-occur 5900 times. One natural question to ask users on all politician pages is "Is this person a lawyer?". This is a question that a machine cannot answer easily, while a human reading the Wikipedia page of a politician should be able to answer it relatively easily based on the information already available on the page or on previous knowledge.
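
The counting step described above can be sketched roughly as follows. This is a minimal Python illustration, not the code used in the research: the data layout (a dictionary mapping each item to a set of `(property, value)` claims), the function names, and the toy items are all assumptions made for the example. P106 is the Wikidata property for occupation, and the toy threshold is lowered from 1000 so that the tiny sample produces a result.

```python
from collections import Counter
from itertools import combinations

INSTANCE_OF = "P31"  # pairs involving this property are excluded, as in the text


def count_claim_pairs(items):
    """Count how often each unordered pair of claims appears on the same item."""
    counts = Counter()
    for claims in items.values():
        # Drop "instance of" claims and sort so each pair has one canonical order.
        usable = sorted(c for c in set(claims) if c[0] != INSTANCE_OF)
        counts.update(combinations(usable, 2))
    return counts


def frequent_pairs(counts, threshold=1000):
    """Keep only the pairs that co-occur more than `threshold` times."""
    return {pair: n for pair, n in counts.items() if n > threshold}


def question_for(claim):
    """Turn a claim such as ("P106", "lawyer") into a yes/no question.

    The fixed "a" article is a simplification for this sketch.
    """
    _, value = claim
    return f"Is this person a {value}?"


# Hypothetical toy data: three items with occupation (P106) claims.
toy_items = {
    "item_a": {("P31", "human"), ("P106", "politician"), ("P106", "lawyer")},
    "item_b": {("P31", "human"), ("P106", "politician"), ("P106", "lawyer")},
    "item_c": {("P31", "human"), ("P106", "politician"), ("P106", "writer")},
}

pair_counts = count_claim_pairs(toy_items)
selected = frequent_pairs(pair_counts, threshold=1)

for (first, second), n in selected.items():
    # On pages that carry one claim of the pair, ask about the other.
    print(n, question_for(first), question_for(second))
```

With the toy data only the politician/lawyer pair passes the filter, which mirrors the example in the text: pages tagged with one occupation become candidates for a question about the other.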