Research talk:Towards Modeling Citation Quality


topic allocation

Hi, this is an interesting project, but wouldn't it be better to use the WikiProject tags from the talk page rather than Scoring Platform's drafttopic tool? Drafttopic is designed to suggest possible WikiProjects for new draft articles, but it can't compete in accuracy with the actual WikiProject tags on talk pages. WereSpielChequers (talk) 21:26, 5 September 2019 (UTC)

WereSpielChequers, good question. My take: not all articles have WikiProject tags, and many articles are claimed by multiple projects with no obvious "primary" project (and hence no primary topic). If the goal is to assign a more-or-less canonical topic to any given article, then the existing WikiProject tags are better used as training data than applied directly. Cheers, Jmorgan (WMF) (talk) 15:24, 6 September 2019 (UTC)
For what it's worth, I've been looking at the topics in the enwiki citation dataset that I downloaded from https://figshare.com/articles/Accessibility_and_topics_of_citations_with_identifiers_in_Wikipedia/6819710, and I've found them very confusing. My focus has been math-related sources. There are surprisingly few citations tagged STEM.mathematics in the dataset, and a number of those that are tagged this way are not math-related (e.g., Julie Kennedy, "Katherine Mansfield In Picton", found on the page Picton,_New_Zealand). Also, searching the dataset by page name (e.g., "Euclidean algorithm" or "Linear algebra") returns a number of citations, all tagged with other topics (e.g., Geography.Europe, Culture.Performing arts, etc.).