Requests for comment/Large scale language inaccuracies on the Scots Wikipedia/other wikis

Rationale

Originally this topic was opened with the focus on the actions of mainly one user on the Scots Wikipedia. His actions, albeit well-intentioned, nonetheless exposed a large vulnerability not just in the Scots Wiki, but in small language Wikis in general. The conversation that followed quickly turned to the fate of the Wiki itself: how best to fix the damage that had been done, to what extent pruning is preferable to fixing, and whether the project is even salvageable. And how can this sort of thing be prevented in the future? This conversation is still ongoing and probably will be for a while. In large part as a result of the mainstream coverage this issue generated, several native Scots speakers have stepped up to help fix and moderate the Wiki.

More broadly though, what happened raises questions about other small language Wikis. There are 300+ public Wikis, of which the vast majority are tiny, that might suffer from similar problems that come with having little to no oversight. If you’re someone who doesn’t understand the language a particular Wiki is written in, it’s impossible to tell for yourself. The idea has been raised to start a “Small Wiki Audit”, which would have people fluent in one of these languages assess the quality of the articles written there. This too is still a work in progress. --ReneeWrites (talk) 13:40, 28 August 2020 (UTC)

Previous rationale

Wiki user AmaryllisGardener has made a significant number of contributions on Wikipedia overall, of which the vast majority were done on the Scots Wikipedia. There he contributed over 27,000 articles, making up close to half of the total number of articles on that Wiki altogether.

The problem is that none of these articles were written in Scots. AmaryllisGardener does not know the language, as seen in for instance this exchange. Despite this he is treated as somewhat of an authority on the language, judging by the contributions on his talk page by people who don't know the language either.

However, the bigger problem is the tens of thousands of articles and edits that were done in an endangered language. The articles use US-en grammar instead of Scots grammar, and the English words are replaced with a Scottish translation (some of which were not correct, either). For words where the author couldn't find a Scottish equivalent, either English was used instead, or a new word was made up altogether (like "pheesicist", although in Amaryllis's defense, he did not create that word; another user who doesn't speak Scots did).

Huge parts of the Scots Wikipedia cannot function as a resource because of this, and do active harm to the language they pretend to be written in. Scots is a struggling language, and having it replaced with the dressed-up skeleton of another language is cultural vandalism at an unprecedented scale.

For more discussion and commentary, see this thread on Reddit.

--ReneeWrites (talk) 21:18, 25 August 2020 (UTC)


Wikimedia UK statement, 26 August:

Daria Cybulska, director of programmes and evaluation at Wikimedia UK said: “We do not own or control the Scots-language Wikipedia, which as with all parts of the Wiki community, is edited and managed by volunteers.

“We are aware of the concerns that have emerged about the content of the Scots-language Wikipedia and are in touch with the Wikimedia Foundation and volunteer editor community to offer support in helping to ensure that these issues are addressed.

“We are exploring ways of supporting the existing Scots Wikipedia editor community, by offering help with editing training for newcomers, facilitating partnerships with authoritative language organisations and organising editing events to harness current interest and energy.” from The Guardian



Integration with small wiki audit

A possible solution to many of these issues could be organized through the small wiki audit. Zoozaz1 (talk) 03:24, 31 August 2020 (UTC)

Other wikis

It can't just be AmaryllisGardener who has offended within Wikimedia projects, and it can't just be Scots. We perhaps need to start investigating that by taking a look at wikis for languages with small numbers of speakers that have an inordinate number of pages. Some may be due to dedicated native-speaking editors, but there are sure to be some with this same problem (while I would hope, God willing, there aren't, it seems too probable that there are). SecretName101 (talk) 23:22, 30 August 2020 (UTC)

Anything under the creole section here may be incorrect. Zoozaz1 (talk) 00:47, 31 August 2020 (UTC)
It's also worth looking at the list of wikis with prolific contributors, put together by PiRSquared17, here. There are a lot of small wiki projects where over 50% of the contributions are the work of a single editor. 192.76.8.79 01:20, 31 August 2020 (UTC)
At first glance at that list, there are some wikis with top contributors whose babel tags indicate low-fluency in the language. SecretName101 (talk) 01:53, 31 August 2020 (UTC)

Norfuk / Norfolk / Pitkern

Amaryllis has 800 edits on the Norfuk/Norfolk Wikipedia, whose language has ~400 native speakers according to Enwiki, and plenty of the edits/pages he made are still intact. There is no way any of these are accurate, and seeing that the wiki only has about 800 articles in total, and that many of the others not written by him also don't seem to be authentic, I think it might be due for a nuking too. Yvzcvtp (talk) 20:07, 30 August 2020 (UTC)

Perhaps more concerning is that there isn't a single editor on the Pitkern wikipedia who's listed a babel level of greater than 1. [1] 192.76.8.79 20:47, 30 August 2020 (UTC)

The Norfuk/Pitkern wiki seems to have a lot of issues identical to those of the Scots wiki.
The request for new languages page is here. The entire proposed editing community (at least according to the list of Editors in the box) consists of a single native speaker. Surely this shouldn't have gotten through the language committee, even by the standards of 2005.
The native speaker who founded the wiki seems to have left around 2012.
The current administrator is Russian, and identifies as having a low level understanding of Norfolk, yet despite this they have created dozens and dozens of articles, and made thousands of edits.
A number of other editors on the wiki seem to have had a limited/questionable understanding of the language. The user Bobbbcat, for example, created the article on Hawaii despite not being sure what the correct translation of Hawaii is.
The entire wiki needs to be checked over by a native speaker, but given that there are only ~400 of them in existence I'm not sure how practical that is. 192.76.8.79 00:04, 31 August 2020 (UTC)
@192.76.8.79: I'd e-mail the Pitcairn Islands government and the Norfolk Island government over that. WhisperToMe (talk) 00:48, 31 August 2020 (UTC)
This edition seems quite concerning. It's not clear if a separate Wikipedia is viable for this language, which lacks both modern native speakers and a significant written history to draw on. While contributions in Pitkern are welcome of course, this seems like it really might be better as something like a Wikibook, e.g. something like wikibooks:Danish, where an introduction to the language, vocabulary, significant differences, and so on could be written. A full encyclopedia might be too much to bite off. Any relevant articles on the edition, perhaps the ones that Pall Mall contributed to, such as pih:Pitkern Ailen, could be moved over to the Wikibook as examples of how to write in the language. SnowFire (talk) 03:23, 31 August 2020 (UTC)
Or just move it to Incubator so that editors can still edit the wiki, and when, or if, a native speaker decides to step in it can move back? --Sabelöga (talk) 11:01, 5 September 2020 (UTC)
Sure, I'm just saying that if moved back to the Incubator or some other archive, I don't think it's likely this project will ever be viable, even if a native speaker came back. If a fluent volunteer wanted to help out, I'd suggest that they write a public-domain Wikibook on this language first, and an entire encyclopedia eighth or so. Only multiple fluent volunteers would raise this back to viability IMO, and I don't expect that to ever happen. SnowFire (talk) 01:43, 7 September 2020 (UTC)
However, I support the wiki closure, because everything said about it is true. Unfortunately, there hasn't been anyone on the wiki who speaks Norfuk/Pitkern for more than 7-8 years (and I'm not sure if they knew it as a native language), and it seems that no one will come. Coconutic (talk) 18:47, 16 October 2020 (UTC)

Haitian Creole

This seems worrying to me, along with the fact that the discussion on the main page and the kafe are in English and French, not Haitian Creole. There seems to be a systemic problem with languages that are vaguely similar to, or descended from, a vastly more popular one. The small wiki audit, if implemented, could hopefully catch much of this. Zoozaz1 (talk) 23:21, 30 August 2020 (UTC)

@Gilles2014: How well do you know Haitian Creole? –MJLTalk 03:32, 31 August 2020 (UTC)
He wrote "Mwen pa palé Krelo aisyen mè mwen Kompran" ("I don't speak Haitian Creole, but I understand it") on his talk page in 2016. PiRSquared17 (talk) 05:58, 31 August 2020 (UTC)
Oh.. well he's the only admin on the project, so that's pretty awkward. Htwiki might want to look into that. –MJLTalk 07:28, 31 August 2020 (UTC)
Zoozaz1, most educated Haitians speak French fluently as a matter of course and English well; quite a few also speak Spanish. All four of those languages are accepted on that wiki. I've tried to help out there off and on, and I've seen a few editors who are likely native speakers and a few more that know enough to write simple sentences. From the discussions I've seen, there seems to be more than one "correct" version of Haitian Creole, with at least two different spelling standards. (My own contribution has been primarily fixing wikitext, which doesn't require knowing any of the local language.) WhatamIdoing (talk) 21:16, 1 September 2020 (UTC)
I think something like Haitian Creole is exactly the language that the small wiki audit should investigate. It could just be that for ease of discussion many of the conversations aren't in the language, but it could also be because many/some contributors of articles aren't native speakers of the language, and the small wiki audit could (hopefully) help in figuring out which one it is. Zoozaz1 (talk) 21:36, 1 September 2020 (UTC)
  • The most prolific contributor to htwiki is a native Haitian speaker. PiRSquared17 (talk) 05:58, 31 August 2020 (UTC)
    • He hasn't been active for years, and most of his contributions were the creation of articles about US towns by unflagged bot (very similar to the English Wikipedia's handling of the US Census). WhatamIdoing (talk) 21:09, 1 September 2020 (UTC)

This page (mit-ayiti.net) is worth a look with regard to ht.wp. SashiRolls (talk) 03:24, 3 October 2020 (UTC)

Jamaican Patois

Jamaican Patois seems to be afflicted with a similar problem. See here, one of the most recent discussions on the main page. As a side note, the babel categories don't seem to be working for that wiki. Could anyone try to fix that? Zoozaz1 (talk) 23:34, 30 August 2020 (UTC)

Ladino

Seeing how few people know Ladino compared to how many know Spanish on the Ladino Wikipedia seems problematic. Zoozaz1 (talk) 00:47, 31 August 2020 (UTC)

@Zoozaz1: I'd check if there are Ladino associations in Greece and Turkey. They might be able to monitor those Wikis. WhisperToMe (talk) 00:48, 31 August 2020 (UTC)

De-index small wikis by default?

A huge aspect of the tragedy on Scots Wikipedia is the fact that a bastardized parody of Scots has pervaded search results and possibly even affected the Scots public curriculum. Given the inherent challenges of quality control for wikis under a certain size and active editorship, should we de-index them from search engines by default until they meet some agreed upon threshold criteria? Axem Titanium (talk) 07:02, 4 September 2020 (UTC)

How would that help native speakers finding the wiki though? --Sabelöga (talk) 11:02, 5 September 2020 (UTC)
How do native speakers ever find wikis? The Scots Wikipedia has been up for a dozen years and only now did a Scots speaker notice it and bring its sorry state to attention. I think leaving the interwiki language links up in the sidebar while de-indexing it from search engines is plenty of exposure for native speakers. They could also find their language from the top page of Wikipedia.org. Axem Titanium (talk) 09:58, 6 September 2020 (UTC)
That's not what happened. AG was criticized by native speakers many years ago but brushed off or ignored that criticism, and there was no one (no community, no admin) to stop him when he nonetheless continued writing in a fake language. That drove away competent speakers. --Janwo (talk) 08:19, 8 September 2020 (UTC)

Malagasy Wiktionary

There are a lot of wikis that are largely bot-made, which is an issue. This wiki in particular, however, I've heard has a large number of inaccurate articles. Is there anything to be done? Yvzcvtp (talk) 14:16, 8 September 2020 (UTC)

See also Talk:Small_wiki_audit#The_problem_with_mg.wikt (--Janwo (talk) 11:31, 10 September 2020 (UTC))
And also Small wiki audit/Malagasy Wiktionary RexSueciae (talk) 02:41, 16 September 2020 (UTC)

Frisian Wikipedias

  1. Saterland Frisian Wikipedia
  2. West Frisian Wikipedia
  3. North Frisian Wikipedia

(1) Saterland Frisian has roughly 2,000 speakers left. One of the three admins is, by their own account, somewhat of an expert in the language (they even have their own article). (2) Actually, seems pretty fine. (3) Mostly the work of a single user: Murma174. –MJLTalk 07:25, 10 September 2020 (UTC)

With regard to the Seeltersk wiki: Pyt actually compiled the dictionary on the language, so we can assume they have enough expertise. Concerning Murma174, this user claims to be a native speaker. --OosWesThoesBes (talk) 10:35, 21 September 2020 (UTC)

Asturian Wikipedia

A redditor notes that the Asturian Wikipedia is basically written in Spanish; see [3]. Plantdrew (talk) 14:55, 11 September 2020 (UTC)