Research:Claim Selection for WikiGrok

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

WikiGrok is an experimental MediaWiki feature used by the Mobile team to increase user engagement on mobile devices. Its goal is to give users who are willing to make light contributions to Wikipedia from mobile devices an opportunity to do so.

To lower the threshold for participation, WikiGrok should be equipped with millions of questions that are easy for humans to answer on mobile devices, yet not easy for machines to answer. This research aims to propose a methodology for finding a set of questions that the Mobile team can ask users in short-term experiments, as well as a methodology for finding such questions more systematically in the long run, based on the results of the short-term experimentation.

Questions based on Wikidata Claims

We started by identifying the number of English Wikipedia articles (items) in the class trees of person, organization, event, work, place, and term, which are some of the main Wikidata classes. The results are shown in the following table:

Class	Item code	Number of items
person	Q215627	1259119
organization	Q43229	203796
event	Q1190554	47048
place	Q3389680	7946
term	Q1969448	22122

A sample API query used to compute the above number for the class person is:

http://wdq.wmflabs.org/api?q=claim[31:%28tree[215627][][279]%29]%20AND%20link[enwiki]&noitems=1
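The same query can be generated for every class in the table. The following is a minimal sketch that only builds the query URLs and does not send any request. The helper name build_wdq_url and the CLASS_IDS mapping are hypothetical names introduced for illustration; the endpoint, the query pattern, and the item codes are copied from the sample query and the table above. As we read the query, it selects items whose instance of (P31) value lies in the subclass of (P279) tree of the given class and that have an English Wikipedia link.

```python
from urllib.parse import quote

# Endpoint copied from the sample query above.
WDQ_API = "http://wdq.wmflabs.org/api"

# Item codes from the table above, without the leading "Q".
CLASS_IDS = {
    "person": 215627,
    "organization": 43229,
    "event": 1190554,
    "place": 3389680,
    "term": 1969448,
}


def build_wdq_url(class_id, wiki="enwiki"):
    """Hypothetical helper: build the count query URL for one class.

    The query follows the pattern of the sample above: instance of (P31)
    anything in the subclass of (P279) tree of the class, restricted to
    items linked from the given wiki.
    """
    query = f"claim[31:(tree[{class_id}][][279])] AND link[{wiki}]"
    # Keep brackets and colons literal, as in the sample query;
    # percent-encode parentheses and spaces.
    encoded = quote(query, safe="[]:")
    return f"{WDQ_API}?q={encoded}&noitems=1"


if __name__ == "__main__":
    for name, class_id in CLASS_IDS.items():
        print(name, build_wdq_url(class_id))
```

For the class person, this reproduces the sample query URL shown above character for character.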

Given that "person" has the highest number of articles affected, we decided to focus on questions related to human, Q5, an instance of person. To this end, we considered all the items in the class human and their corresponding claims. We then counted the number of co-occurrences of claims and identified all pairs of claims that co-occur more than 1000 times in English Wikipedia. (Note that the choice of 1000 is arbitrary. At the time of the research, this threshold gave us 794 claim pairs to consider, excluding pairs that include instance of, P31.) Using this list, we identified potential questions of interest. For example, we know that on English Wikipedia, politician (occupation) and lawyer (occupation) co-occur 5900 times. One natural question to ask users on all politician pages is "Is this person a lawyer?". This is a question that a machine cannot answer easily, while a human reading the Wikipedia page of a politician should be able to answer it relatively easily, based on the information already available on the page or on previous knowledge.
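
The counting step described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the project: the function names count_claim_pairs and frequent_pairs are hypothetical, each item is assumed to be available as a set of (property, value) claims, and the toy data uses plain labels in place of Wikidata item codes. The toy threshold of 1 stands in for the threshold of 1000 used in the research.

```python
from collections import Counter
from itertools import combinations

INSTANCE_OF = "P31"  # pairs involving this property are excluded


def count_claim_pairs(items):
    """Count how many items contain each unordered pair of claims.

    `items` is an iterable of sets of (property, value) tuples,
    one set per item (assumed input format).
    """
    counts = Counter()
    for claims in items:
        # Drop instance-of claims and sort so that each unordered pair
        # is always represented by the same tuple.
        kept = sorted(c for c in set(claims) if c[0] != INSTANCE_OF)
        for pair in combinations(kept, 2):
            counts[pair] += 1
    return counts


def frequent_pairs(counts, threshold):
    """Keep the pairs that co-occur more than `threshold` times."""
    return {pair: n for pair, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Toy data with placeholder value labels, not real Wikidata output.
    toy_items = [
        {("P31", "human"), ("P106", "politician"), ("P106", "lawyer")},
        {("P31", "human"), ("P106", "politician"), ("P106", "lawyer")},
        {("P31", "human"), ("P106", "politician"), ("P21", "male")},
    ]
    counts = count_claim_pairs(toy_items)
    for pair, n in frequent_pairs(counts, threshold=1).items():
        print(pair, n)
```

With the toy data, only the politician and lawyer pair passes the threshold. A frequent pair such as this one is what suggests asking "Is this person a lawyer?" on pages of items that already carry the politician claim.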