Talk:List of Wikipedias by sample of articles/Archives/2009


A question

Just to understand how it works: during the last update the numbers for the Ossetic language changed, but the large articles we created during January were not reflected, most notably the one about Nicaragua (probably in the list of 1000 as an article about a country). The link to this rating is usually in the texts we use for attracting people to Wikipedia, as it emphasizes better practices than creating lots of unnecessary stubs, so I would like to understand the system better. Thank you for the great work you do. Amikeco 13:14, 3 February 2009 (UTC)

Nicaragua is not in the List of articles every Wikipedia should have. So, it unfortunately doesn't affect the score. Also, it is very important to put the interwiki links from the English articles to the Ossetic articles for them to be counted. --MarsRover 20:10, 3 February 2009 (UTC)
Thank you for your kind reply. I've found the base of the misunderstanding: the English list of "most wanted" articles has been changed since the fork we are using was made. The English text is „there should be articles on most or all of the 243 countries.. However, for the smaller Wikipedias, some of the more high-priority countries to have articles on are..“ (and a small list follows), while in our version (once translated from the Russian one) it is just one line without any „though“s: at least one sentence on every country from the list of countries (linked to the full list). That's why our latest changes to articles Nicaragua, Guyana and Cameroon had no effect here, now I see. Bad luck, this month our featured country is Bulgaria, also out of the list :) Thank you again, now it's much clearer. Amikeco

Wikipedia no page

There is a strange error on the missing articles page. For example, in the case of Catalan, it says: "en:Pablo Picasso wikipedia.NoPage: Illegal character in ca:Fitxer:Pablo picasso.jpgthumbrightPablo Picasso!" --Meldor 22:05, 3 February 2009 (UTC)

The article looks fine. I guess it was some sort of network problem when reading the page. --MarsRover 02:14, 4 February 2009 (UTC)

Problematic translations, or: Bad luck if your language is too rich in words


I accidentally found the article on wikipedia:behavior. I sincerely cannot find a translation into any of the Ripuarian languages. Once every few months, I try to pick a missing topic and write a Ripuarian stub or better for it. For "behaviour", I believe it would have to be a disambiguation page, but of a non-word.

We have a bunch of possible translations for behaviour. The English version of the article says: "Behavior … refers to the actions or reactions of an object or organism, usually in relation to the environment. Behavior can be conscious or unconscious, overt or covert, and voluntary or involuntary." Well said. This is a wide field, and some of our specific translations (briefly) are:

  • Benimm - social behavior, or conduct, of humans (and sometimes of domesticated animals and pets)
  • Benämme - similar, but seen from a different viewpoint; also well-behavedness
  • Donn - willful action
  • Refläx - unconscious, involuntary reaction
  • Räsong - obedient behaviour (also including drill and (unwilling) subordination)
  • Hüre - listening, including obedience, and reactive or responsive behaviour
  • Jehorsche - another set of flavours of obedience
  • Kujonneer - willful treatment or mistreatment of subordinates or dependants, pestering, etc.
  • Ajeere - to act, consciously or unconsciously, visibly and in a noticeable way
  • Hanteere - to do something manually, to handle, etc. (There are dozens of far more specific words describing human, but not necessarily only human, handling and similar activities. All of them describe some sort of behaviour.)

I'll stop here. The list is far from complete. We lack a concept, and a word for it, that summarizes all these interesting aspects into a topic such as "behavior". When we talk about "behavior" as in "behaviorism", we use loanwords. Those are, of course, much more specific than their originals.

One of the downsides of the possible translations is that they are not very systematic. Mostly, we cannot take one of the qualifiers of the English description and then have, say, two translations. "Behaviour of an organism" and "behaviour of an object" do not translate either. So we cannot easily make a list and say: the sum of all this is behavior, anything else is not. It would be too fuzzy.

There are a lot of afterthoughts. The Sapir–Whorf hypothesis comes to mind, along with more philosophical questions: is a concept such as behavior useful and valid even if a language has no word for it, and can people who lack the word really not use the associated concept in their thoughts, as Wittgenstein suggests?

There are likely more examples of words missing for translation into various languages. I just wanted to illustrate one a little, giving some background. --Purodha Blissenbach 11:23, 2 March 2009 (UTC)

I agree that en:Behavior is a problematic entry. But I think this should be discussed at Talk:List of articles every Wikipedia should have. --Yerpo 12:00, 2 March 2009 (UTC)

1001 articles?

In each row, absent + stubs + short + long = 1001 articles, not 1000 as claimed at the top of the page! This is most easily seen for the last entries where the site couldn't be contacted, but it applies to all rows. Either the total of 1000 articles given at the top of the page is incorrect, or there is a bug in the script somewhere (counting from 0?). Just thought I'd draw this to your attention... --HappyDog 11:16, 24 April 2009 (UTC)
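Because every listed article lands in exactly one column, a row's absent + stubs + short + long always equals the length of the article list, so a total of 1001 reflects the list itself rather than an off-by-one in the counting. A minimal Python sketch of such a tally, assuming (hypothetically) that stubs are under 10,000 weighted characters, short articles run from 10,000 to 30,000, and long articles are above 30,000, borrowing the 10k/30k borders mentioned elsewhere on this page. `classify`, `tally` and the thresholds are illustrative names and values, not the actual script's code:

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical cut-offs in weighted characters; the real script's thresholds may differ.
STUB_LIMIT = 10_000
LONG_LIMIT = 30_000


def classify(size: Optional[int]) -> str:
    """Put one listed article into exactly one column (None = no article found)."""
    if size is None:
        return "absent"
    if size < STUB_LIMIT:
        return "stubs"
    if size <= LONG_LIMIT:
        return "short"
    return "long"


def tally(sizes: Sequence[Optional[int]]) -> Counter:
    """Count each column for one wiki's row.

    Each article is classified exactly once, so the column counts
    always add up to len(sizes), i.e. to the length of the article list.
    """
    return Counter(classify(s) for s in sizes)


if __name__ == "__main__":
    row = tally([None, 4_500, 12_000, 45_000])
    print(dict(row), sum(row.values()))
```

With a 1001-entry list the four columns necessarily sum to 1001, whatever the individual counts are.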

No, it's not a bug. We really do now have 1001 articles. See Talk:List of articles every Wikipedia should have#Now have 1001 articles. --MarsRover 19:37, 24 April 2009 (UTC)
OK - I have updated the info at the top of the page. --HappyDog 00:35, 27 April 2009 (UTC)

Language names

Where does this list get its language names from? Is it the same list that is used across Wikimedia? If so where does it get them from? Thanks -- 15:59, 3 May 2009 (UTC)

The language names in the score table originated from the "Local Language" names in the List of Wikipedias table. These names were copied into the source code (List of Wikipedias by sample of articles/Source) about a year ago, so they might be out of date. --MarsRover 18:25, 3 May 2009 (UTC)

Problem with :sl

@MarsRover: there was a rogue redirect syntax in the article about Sufism on slwiki (sl:Sufizem), so the script returned an error - see missing articles list for sl. I fixed the article now, so could you please run this part again? If you can't do it without running the whole thing again, no problem, it can wait a month. --Yerpo 08:35, 5 May 2009 (UTC)

I ran it again and updated the table. The sl.wp score remains the same since the previous "error" was not counted against the score. --MarsRover 22:04, 5 May 2009 (UTC)
Thanks. I thought that it caused the script to count sl:Sufizem as missing. --Yerpo 07:01, 6 May 2009 (UTC)


How come the ml article corresponding to en:Francis of Assisi got demoted from being an "article" between April and the beginning of May, although there were no significant edits? A bug? --Jacob 02:37, 9 May 2009 (UTC)

The article list used for scoring changed in mid-April. The en:Francis of Assisi article was replaced in the list with en:Laozi. (See Talk:List of articles every Wikipedia should have#Replacement suggestions). --MarsRover 05:12, 9 May 2009 (UTC)

Change to calculation

The interwiki links in each article are excluded from the calculation of article size. This has always been the case in the formula. A few people noticed an issue in the calculation: it only excluded interwiki links with two-letter language codes, and it didn't exclude the carriage return and line feed after each interwiki link. I changed the script to do both now. It seems more in the spirit of the original goal. This means the articles will each have fewer characters. I didn't think this would have a big effect on the scores, but of 12 wikis I tested, 10 had scores that went down. If you have a lot of articles barely at the 10k or 30k border, or your language weight is high, you will be hit hardest. Shall I keep that change or revert to the original way? --MarsRover 16:58, 30 May 2009 (UTC)
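A rough Python illustration of the difference between the old and the new exclusion, assuming interwiki links of the form `[[code:Title]]`, each on its own line. `LANG_CODES` is a hypothetical, heavily abbreviated stand-in for the full list of language codes; matching only known codes keeps links such as `[[Category:...]]` from being stripped. This is a sketch of the idea, not the script's actual code:

```python
import re

# Hypothetical, abbreviated set of language codes; a real script would need the full list.
LANG_CODES = {"ca", "de", "fr", "os", "sl", "ksh", "simple", "zh-min-nan"}


def strip_interwikis_old(text: str) -> str:
    """Old behaviour as described: only two-letter codes, line break after the link is kept."""
    return re.sub(r"\[\[[a-z]{2}:[^\]]*\]\]", "", text)


def strip_interwikis_new(text: str) -> str:
    """New behaviour as described: any known code, plus the CR/LF that follows the link."""
    codes = "|".join(re.escape(c) for c in sorted(LANG_CODES, key=len, reverse=True))
    pattern = rf"\[\[(?:{codes}):[^\]]*\]\](?:\r?\n)?"
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    sample = "Body text.\r\n[[de:Beispiel]]\r\n[[ksh:Beispill]]\r\n[[Category:Examples]]\r\n"
    print(len(sample), len(strip_interwikis_old(sample)), len(strip_interwikis_new(sample)))
```

On the sample, the old pattern leaves the three-letter `[[ksh:...]]` link and an empty line behind, while the new one removes both interwiki links together with their line breaks, so the measured size shrinks, which is why scores near a threshold can drop.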

I think you should keep the changes. As you say, it's more in the spirit. If somebody starts pulling his hair out over his language losing score, well, that's his problem. It's not like any wiki is getting better or worse treatment than all the others. --Yerpo 21:10, 30 May 2009 (UTC)
*Pulls hair* --Yerpo 06:38, 3 June 2009 (UTC)
Hey, you improved in rank. It was a good change :-) --MarsRover 06:46, 3 June 2009 (UTC)

average or median?

I would think that the median article size would be a more interesting measure than the average or the number of articles larger than x bytes, as it is less dependent on fringe effects (add 100 bytes to a hundred articles and the stats look *much* better) and independent of a single mega-article. I'm not suggesting removing the "number of articles larger than x bytes" column, but possibly (if there would be too many numbers and they can't all be squeezed in) replacing the average size. \Mike 11:27, 7 August 2009 (UTC)

Good idea. I would strongly support replacing average with median. I imagine that in many Wikipedias the listed articles don't have a normal distribution of size and in those that do, there won't be much change anyway. Number of articles larger than x bytes should stay because they are useful in a different way. --Yerpo 07:51, 23 October 2009 (UTC)
I'd oppose it -- if you improved the bottom 200-300 articles but the relative ranks stayed relatively unchanged (as they often do), you'd get a median that didn't change at all in response to the work that's been done. Almafeta 00:33, 24 October 2009 (UTC)
I added both average and median so you can see the difference. Sort of interesting. Any more opinions on whether we should have average, median or both columns? --MarsRover 06:59, 24 October 2009 (UTC)
I'd like both if it's not too much of a hassle. As Almafeta said, when I improve the bottom ones, I'd like to see something change. but as Mike said, median is not very susceptible to mega-articles. A side note, if you're going to keep both, it might be better to replace "average" with "mean". ...Aurora... 11:29, 26 October 2009 (UTC)
I prefer the median. When the bottom ones are improved, you can see the change in all the columns (stubs become articles, the score and the growth increases, ...), whereas the only column where you can see the real approximate size of how big articles really are is the "median" column. --Meldor 14:29, 27 November 2009 (UTC)
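A small Python illustration of the two arguments in this thread, using made-up article sizes (not real data from any wiki): one mega-article pulls the mean far above the typical size while the median stays at the middle article, and padding the smallest articles moves the mean but leaves the median unchanged.

```python
from statistics import mean, median

# Made-up sizes in characters for five listed articles, one of them a mega-article.
sizes = [1_000, 2_000, 3_000, 4_000, 200_000]

# The same articles after the two smallest each gain 500 characters.
improved = [1_500, 2_500, 3_000, 4_000, 200_000]

# The mean reacts to both the outlier and the edits to the bottom articles.
print("mean:", mean(sizes), "->", mean(improved))

# The median is the middle value, so neither change moves it here.
print("median:", median(sizes), "->", median(improved))
```

Here the mean goes from 42,000 to 42,200 while the median stays at 3,000, which is the trade-off between Mike's and Almafeta's points and an argument for showing both columns.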

Calculating language weight

How is the language weight calculated? The reason I ask is... well, to be honest, nothing terribly important; I'd just like to see how Lojban fares... c.c; Almafeta 23:18, 6 October 2009 (UTC)

Click on the link that is the word "Weight" in the table. Everything is explained there. --Yerpo 05:18, 7 October 2009 (UTC)
Thanks... That gives Lojban a weight of 1.2 (1179/971), weighing the English standard version against the Lojban version. And Interlingua has a weight of 1.0 (1179/1167). Almafeta 06:37, 23 October 2009 (UTC)
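The arithmetic above as a tiny Python sketch: the weight is the character count of the English sample text divided by the character count of the same text in the target language. Rounding to one decimal is an assumption based on the figures quoted here, and `language_weight` is a hypothetical helper, not part of the scoring script.

```python
def language_weight(english_chars: int, language_chars: int) -> float:
    """Ratio of the English sample's length to the translated sample's length, to one decimal."""
    if language_chars <= 0:
        raise ValueError("language_chars must be positive")
    return round(english_chars / language_chars, 1)


if __name__ == "__main__":
    print(language_weight(1179, 971))   # Lojban
    print(language_weight(1179, 1167))  # Interlingua
```

This gives 1.2 for Lojban and 1.0 for Interlingua, matching the figures quoted above.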
You can tell MarsRover to update the script (the default value has been used for Lojban until now), but I'm sure he'll see this discussion. This Omniglot page link you gave looks useful for determining weights for a couple of other languages as well. --Yerpo 06:53, 23 October 2009 (UTC)

Extended list

For my own entertainment, I ran the stats for the wikis by the extended list of articles. It's over at User:Almafeta/List_of_Wikipedias_by_extended_list_of_articles if you want to take a look. (From here on out, I'm going to be using that script to get an idea of my wiki's performance in more restricted lists on specific topics.)

My favored wiki's absolute value dropped by almost half, but their relative rank rose a few notches. How does yours do? Almafeta 02:26, 30 October 2009 (UTC)

Similar for slwiki, only the rank remained the same. Not surprising for the list that includes such obscure topics as magicians (25), racehorses (20) and American cars (some 40). --Yerpo 14:12, 30 October 2009 (UTC)
Now that the Catalan Wikipedia is first in the ranking, it is looking for higher goals. Could there be some consensus on making an extended list on Meta? I think the extended list linked above is clearly unusable, as there are surely more important articles than a single article about a racehorse. However, there are some important articles on philosophy and mathematics, for example, which don't appear in this simple list and which would be useful to work on. --Meldor 13:28, 7 January 2010 (UTC)
I'm sure people will take an interest if you start the list here. For starters, I suggest throwing out the obviously irrelevant stuff and adding topics you deem important. Yerpo 19:03, 7 January 2010 (UTC)