GALILEO Masters 2004/proportion of geo-specific articles

To estimate the proportion of geo-specific articles, you have to sample articles at random and categorize them. The "Random page" function is an easy way to draw such a sample.
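Clicking "Random page" by hand works for a small sample. For larger samples, a minimal sketch is shown here, assuming the MediaWiki Action API is used instead of the "Random page" link (the page itself does not say how the sample was drawn). The helper names `random_sample_url` and `titles_from_response` are hypothetical; the sketch only builds the request and parses a response, the download itself is left out.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard Action API endpoint of the German Wikipedia.
API_URL = "https://de.wikipedia.org/w/api.php"


def random_sample_url(n):
    """Build a query URL asking for n random pages from the article namespace."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": 0,  # namespace 0 = articles only
        "rnlimit": n,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(body):
    """Extract the page titles from the JSON body returned for the query above."""
    data = json.loads(body)
    return [page["title"] for page in data["query"]["random"]]


print(random_sample_url(60))
```

The categorization into locations, possibly related articles and unrelated articles still has to be done by reading each sampled article.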

First sample

I sampled 60 articles from the German Wikipedia and sorted them into three categories:

Result

1. locations like de:Verkehrs- und Tarifverbund Stuttgart, de:Lechtaler Alpen (area), de:Via Regia (road), de:Glandorf, de:Bergkamen, de:Großes Zeughaus Danzig (place): 9

2. articles that may have an indirect relation to a location

people, groups, organisations, films... like de:Germanen, de:Enercon, de:Michael de Larrabeiti: 14
other objects that may be connected to a location like de:Bockbier, de:Chow-Chow, de:Kauri, de:Cheerleading, de:SMS Pommern (Schiff): 6

3. articles without any relation to a specific location

astronomical objects like de:Artemis (Asteroid), which have a location, but not on Earth: 3

Summary

  • 9/60 locations
  • 20/60 articles that may have a relation to one or more locations
  • 31/60 articles with no specific location at all

Conclusion

The sample is fairly small, so at a confidence level of 95% you can only conclude that:

  • between 5.96% and 24.04% (15%) of the articles are about locations
  • between 39.02% and 64.31% (51.67%) of the articles will not have geo-information connected to them
  • between 21.40% and 45.26% (33.33%) of the articles may be connected to locations in some way
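The page does not name the method behind these intervals. A minimal sketch, assuming the usual normal-approximation (Wald) interval with z = 1.96, reproduces the figures above up to rounding in the last digit. The function name `wald_interval` is chosen for illustration.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def wald_interval(hits, n, z=Z_95):
    """Return (lower, estimate, upper) for the proportion hits/n."""
    p = hits / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p, p + margin


# Counts taken from the summary of the 60-article sample.
counts = [("locations", 9), ("no location", 31), ("possibly related", 20)]

for label, hits in counts:
    lower, p, upper = wald_interval(hits, 60)
    print(f"{label}: {lower:.2%} to {upper:.2%} ({p:.2%})")
```

With only 60 articles, the margin is roughly 9 to 13 percentage points, which is why the intervals are so wide.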

Given the roughly 120,000 articles in the German Wikipedia, around 18,000 (at least 7,152) should have a location, and around 58,000 (at least 42,828) are either locations or may be connected to a location in some way.
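The extrapolation is plain scaling of the sample shares to the total article count. The sketch below assumes that the second figure counts every article outside category 3 (29 of 60), so its lower bound is the complement of the upper bound of the no-location interval. The helper `scale` is hypothetical.

```python
TOTAL_ARTICLES = 120_000  # approximate size of the German Wikipedia, from the text


def scale(share, total=TOTAL_ARTICLES):
    """Convert a share of the sample into an absolute article count."""
    return round(share * total)


# Locations: point estimate 9/60, lower bound 5.96%.
locations = scale(9 / 60)
locations_min = scale(0.0596)

# Locations plus possibly related articles: everything outside category 3.
# Lower bound = 1 - upper bound (64.31%) of the no-location interval.
related = scale((9 + 20) / 60)
related_min = scale(1 - 0.6431)

print(locations, locations_min, related, related_min)
```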

To estimate the share of location articles with a margin of error of ±5 percentage points, you need a sample of at least about 200 articles.
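This sample size follows from solving the margin-of-error formula for n. A minimal sketch, assuming the same normal approximation with z = 1.96 and the 15% estimate from the first sample as a planning value (the function name `required_sample_size` is chosen for illustration):

```python
import math


def required_sample_size(p, margin, z=1.96):
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= margin."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Planning value p = 0.15 from the first sample, margin of 5 percentage points.
n_locations = required_sample_size(0.15, 0.05)   # 196, i.e. about 200

# Worst case p = 0.5, if no prior estimate were available.
n_worst_case = required_sample_size(0.5, 0.05)   # 385

print(n_locations, n_worst_case)
```

The requirement depends on the planning value: proportions closer to 50%, such as the no-location share, need a larger sample for the same margin.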

Incidentally, the sample shows that between 12.63% and 34.04% (23.33%) of all articles in the German Wikipedia are about people, groups, organisations, films... (~28,000)