Research talk:Automated classification of article importance/Work log/2017-03-27

Monday, March 27, 2017

Today I will continue the conversation with WPMED, and continue training and evaluating a classifier that uses the clickstream data.

WPMED categories of Low-importance

Certain types of articles in WPMED are automatically Low-importance. Per their importance scale, these include at least: very rare diseases, lesser-known medical signs, equipment, hospitals, individuals, historical information, publications, laws, investigational drugs, detailed genetic and physiological information, and obscure anatomical features. Or, per our conversation on their talk page: "people, books, laws, journals, organizations". Can we easily identify some of these categories? Let's look at some examples:

Title | Potential categories | Wikidata
Patient Protection and Affordable Care Act | Anything matching "legislation"? | instance of "legislation"
Alexander Fleming | "People from…", "People educated at…", and several others | instance of "human"
Health Insurance Portability and Accountability Act | Matches against "legislation"? | Nothing
Medicare (United States) | Not sure | instance of "government program", "publicly funded health care", and "health insurance in the United States"
Benjamin Rush | Births, deaths, "People from…", etc… | instance of "human"
JAMA (journal) | "…medical journals", "Publications established in…" | instance of "scientific journal"
Merck & Co. | "Companies listed on…", "Companies based in…", "Pharmaceutical companies…" | instance of "business enterprise"

Identifying individuals should be feasible; I'm not sure about the others. I'll dig further into the Low-importance mispredictions to see what I can find.
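To make the idea concrete, here is a minimal sketch of how the two signals from the examples (Wikidata "instance of" labels and category names) might be combined. It is only an illustration under assumptions: the function name `low_importance_kind`, the two lookup tables, and the kind labels are hypothetical, the patterns are drawn only from the handful of examples above, and the inputs are assumed to have been fetched already as plain strings.

```python
import re

# Hypothetical mapping from Wikidata "instance of" labels (as seen in the
# examples above) to the kinds WPMED treats as Low-importance.
INSTANCE_OF_KINDS = {
    "human": "person",
    "legislation": "law",
    "scientific journal": "publication",
    "business enterprise": "organization",
}

# Hypothetical category-name patterns, used as a fallback when Wikidata
# has nothing useful. They only cover the examples listed above.
CATEGORY_PATTERNS = [
    (re.compile(r"^People (from|educated at)\b"), "person"),
    (re.compile(r"\blegislation\b", re.IGNORECASE), "law"),
    (re.compile(r"medical journals$|^Publications established in\b"), "publication"),
    (re.compile(r"^(Pharmaceutical companies|Companies (listed on|based in))\b"),
     "organization"),
]


def low_importance_kind(instance_of, categories):
    """Return a Low-importance kind for an article, or None if no rule matches.

    instance_of: iterable of Wikidata "instance of" labels (strings).
    categories: iterable of Wikipedia category names (strings).
    """
    # Prefer Wikidata, since an "instance of" statement is less ambiguous
    # than a substring match on a category name.
    for label in instance_of:
        kind = INSTANCE_OF_KINDS.get(label)
        if kind is not None:
            return kind
    # Fall back to category-name matching.
    for category in categories:
        for pattern, kind in CATEGORY_PATTERNS:
            if pattern.search(category):
                return kind
    return None


# Illustrative inputs only; the category strings below are made up to
# resemble the patterns in the table, not copied from the actual articles.
person = low_importance_kind(["human"], ["People from Example Town"])
law = low_importance_kind([], ["Example health legislation"])
unknown = low_importance_kind(["government program"], [])
```

Under these assumptions an article with "instance of: human" is flagged as a person, an article with no Wikidata statement can still be caught through a "legislation" category, and something like a government program falls through with `None`, which matches the "Not sure" row above.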
