Research talk:Task recommendations/Work log/2014-07-17
Thursday, July 17th
Today, I want to get three things done:
- Gather a sample of the most common "returnTo" pages in the main namespace that are edited
- For each returnTo page, gather 15 similar articles that implement a set of filters (that I'll describe below)
Filters
- article_length > 0
  - Sanity check that it's not blank
- page_namespace == 0
  - Main namespace (article)
- input_title != output_title
  - Don't return the same page we're searching with
- Exclude pages in en:Category:Living people
  - No biographies of living people -- too difficult for newbies to edit without being reverted
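A minimal sketch of how these filters might be applied to candidate articles, assuming the similarity search hands back simple records. The `Candidate` class and its field names (`title`, `namespace`, `length`, `categories`) are hypothetical stand-ins, not a real API.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical record for one search result; field names are assumptions.
    title: str
    namespace: int
    length: int
    categories: set = field(default_factory=set)


def passes_filters(input_title, candidate):
    """Apply the four filters listed above to a single candidate."""
    return (
        candidate.length > 0                                # not blank
        and candidate.namespace == 0                        # main namespace only
        and candidate.title != input_title                  # not the page we searched with
        and "Living people" not in candidate.categories     # no BLPs
    )


def similar_articles(input_title, candidates, n=15):
    """Return up to n candidates that pass every filter, in their original order."""
    kept = [c for c in candidates if passes_filters(input_title, c)]
    return kept[:n]
```

Filtering before truncating to `n` matters here: cutting to 15 first and filtering afterwards could leave fewer than 15 results even when more valid candidates exist.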
Sample of returnTos
I originally thought that we could just sample the top N returnTo pages on user registrations, but now I realize that I'll need to sample from the whole set. If we only look at the most common article titles, we could miss a very large proportion of articles that aren't returned to frequently. Stupid long tail distributions. No worries. It shouldn't be too hard to sample.
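A rough sketch of the difference between the two sampling strategies, assuming the returnTo titles are available as a plain list with one entry per registration. The function names are hypothetical, and sampling uniformly over distinct titles is only one reasonable reading of "the whole set".

```python
import random
from collections import Counter


def top_n(return_tos, n):
    """The n most frequent returnTo titles (the approach being abandoned)."""
    return [title for title, _ in Counter(return_tos).most_common(n)]


def sample_whole_set(return_tos, n, seed=0):
    """Uniform random sample of n distinct returnTo titles, long tail included."""
    distinct = sorted(set(return_tos))  # sorted so a fixed seed is reproducible
    rng = random.Random(seed)
    return rng.sample(distinct, min(n, len(distinct)))
```

With a long-tailed distribution, `top_n` only ever sees the head, while `sample_whole_set` gives a rarely visited title the same chance of selection as a popular one.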
OK so, I'm going to need to filter returnTos. I don't want any returnTos that are not in the main namespace. I could probably also filter out BLPs, but I'm not sure that would really matter. I should probably filter returnTos to those that were edited by the newly registered user within 24 hours or so. Query time. --Halfak (WMF) (talk) 18:53, 17 July 2014 (UTC)
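One way the returnTo filter could look, sketched in Python rather than as the actual query. The function name, its arguments, and the shape of `edits` (pairs of page title and timestamp for the new user) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def keep_return_to(return_to, namespace, registration, edits,
                   window=timedelta(hours=24)):
    """Keep a returnTo page if it is in the main namespace and the newly
    registered user edited it within `window` of registering.

    edits: iterable of (page_title, timestamp) pairs for that user (hypothetical shape).
    """
    if namespace != 0:
        return False
    deadline = registration + window
    return any(
        title == return_to and registration <= ts <= deadline
        for title, ts in edits
    )
```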