Research talk:Measuring article importance
Comparing view rate and inlinks with WikiProject importance
The plots above summarize my progress thus far. Data can be accessed here: [1] --Halfak (WMF) (talk) 20:16, 5 November 2014 (UTC)
Navboxes
Halfak: Some time ago I created a script (which seems to be broken right now =/) to plot a graph relating the number of backlinks to the size of articles in a category. One of the things I noticed is that navboxes sometimes account for many of the links to a given article, so for articles on similar subjects the number of links oscillates near the number of links in the navbox for that subject. I uploaded the other graphs I had saved. Helder 15:23, 30 January 2015 (UTC)
- Helder, you're right and I think that this could cause substantial issues. I have been doing some work to extract "organic" inlinks -- inlinks that are not automatically added by templates. This would exclude links from navboxes. I do this by processing the XML dumps and extracting wikilinks from the page text. I actually just got a good version of this dataset together so I should have plots of "all inlinks" vs. "organic inlinks" soon. I'll ping when they are ready. --Halfak (WMF) (talk) 17:07, 30 January 2015 (UTC)
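To illustrate the idea of "organic" inlinks discussed above, here is a minimal sketch in Python. It is not the actual pipeline used for the dataset; it assumes the raw wikitext of each page has already been read out of an XML dump into a dictionary, and the names `organic_link_targets`, `count_organic_inlinks`, and `SKIPPED_PREFIXES` are hypothetical. Because template calls such as `{{Some navbox}}` are left unexpanded in raw wikitext, links that a navbox would add never appear in the match results. Links typed inside template arguments in the page text are still matched, which may or may not be desired.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Only the target (before any "#" or "|") is captured.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Assumption: links into these namespaces are not counted as article inlinks.
SKIPPED_PREFIXES = ("File:", "Image:", "Category:")


def organic_link_targets(wikitext):
    """Return the set of link targets written directly in the page text."""
    targets = set()
    for match in WIKILINK_RE.finditer(wikitext):
        title = match.group(1).strip().replace("_", " ")
        if not title or title.startswith(SKIPPED_PREFIXES):
            continue
        # Page titles are case-insensitive in their first character.
        targets.add(title[0].upper() + title[1:])
    return targets


def count_organic_inlinks(pages):
    """Count, for each target, how many distinct pages link to it in their text.

    `pages` maps a page title to its raw wikitext.
    """
    inlinks = Counter()
    for title, wikitext in pages.items():
        for target in organic_link_targets(wikitext):
            if target != title:  # ignore self-links
                inlinks[target] += 1
    return inlinks
```

Comparing these counts with the total inlink counts for the same pages would give a rough estimate of how many links come from templates.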
Notability
Halfak, is it the case that an important article (with the measure you want to define) will be considered notable? Does that mean the reverse is also true? I'm asking to figure out how much we can use your findings to increase article coverage. Thanks! --LZia (WMF) (talk) 17:15, 6 July 2015 (UTC)
- Notability is a complex topic! So, just within English Wikipedia, there are varying criteria for notability based on subject matter (e.g. academics, books, web content, etc.). However, there is a sort of common denominator, which is the "General Notability Guideline". The TL;DR of this is that there is "significant", "independent" coverage of the topic in "reliable" sources. So, I'd imagine that a "notability" detector would want to search for coverage in reliable sources. It seems that this minimum requirement for notability sets a lower bound for topics that could ever be covered in Wikipedia, given its citation/source-heavy focus.
- This was the plan of an IEG-funded project I was involved in: Grants:IEG/Automated_Notability_Detection. Regrettably, we haven't gotten that project off the ground yet. We got as far as generating some representative samples of new article creations and automatically identifying which subject-specific guidelines might apply. I'm sure Bluma.Gelley would be happy to share or collaborate on that work.
- Now, it seems like you are asking one of two potential questions:
- Is there a minimum level of importance necessary to be considered notable? or maybe a sufficient level?
- Do notability and importance correlate strongly?
- For #1, I think the answer is clearly "no" in the case of Wikipedia. As I have been approaching 'importance', I would argue there are certainly articles of minuscule importance that "belong" in the encyclopedia should someone wish to include them.
- For #2, I think the lines are very blurry. Many of the measurement proxies I'd like to use to look at importance would also suggest notability. However, I have not looked at this before, so it is hard to say. Regrettably, I only know of judgments made at the threshold of notable/not-notable. There's no notion of "more notable or less notable" that is codified in a way we might examine easily. --Halfak (WMF) (talk) 22:51, 6 July 2015 (UTC)
Notability is defined arbitrarily, but noteworthiness is likely to be proportional to overall importance. Models designed to capture importance only within WikiProject scales will not reflect global importance measures and so will not be proportional to noteworthiness. EllenCT (talk) 04:18, 19 July 2016 (UTC)
Overall scale please
I recommend that models be built to measure importance in general, instead of within specific WikiProject scales. The inherently subjective nature of overall importance should be addressed through the use of mean opinion scores. EllenCT (talk) 04:18, 19 July 2016 (UTC)
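For readers unfamiliar with the term, a mean opinion score is simply the average of the ratings that several raters give to the same item. A minimal sketch in Python follows; the function name `mean_opinion_scores`, the input shape, and the 1 to 5 rating scale are assumptions made for illustration, not part of any existing proposal.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average the ratings per article.

    `ratings` is an iterable of (article_title, score) pairs, one pair per
    rater judgment. The 1-5 scale used in the example is hypothetical.
    """
    scores_by_article = defaultdict(list)
    for article, score in ratings:
        scores_by_article[article].append(score)
    return {article: mean(scores) for article, scores in scores_by_article.items()}


# Example: three raters judge the overall importance of two articles.
example = [
    ("Article A", 5), ("Article A", 4), ("Article A", 3),
    ("Article B", 2), ("Article B", 2), ("Article B", 5),
]
scores = mean_opinion_scores(example)
```

The resulting per-article averages could then serve as the target variable for a model of overall importance.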