Research:Converting Wikidata into a lexical resource and knowledge database in Arabic dialects
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
Today, all Arab states are characterized by a form of diglossia[1][2]. Although Modern Standard Arabic is the main language of official life, work, administration and education, Arab people do not use it spontaneously in everyday communication[1]. Instead, they use Arabic dialects for that purpose[1]. Although these dialects are considered varieties of Arabic[1], they differ from each other and from Modern Standard Arabic at the morphological, phonological, orthographic and semantic levels[3][4].
Examples[5]:
- قيدت آش باش نشري في الكاغذ و مشيت للعطار ياخي ما لقيتش اللي نلوج عليه is commonly understood by speakers of Tunisian Arabic. However, it is unintelligible to an Arabic speaker from the Middle East, even though it contains no loanwords from European languages. For comparison, the sentence reads "لقد سجلت ما سأشتريه في الورقة و ذهبت إلى السوق و لكنني لم أجد ما أبحث عنه." in Modern Standard Arabic and "I wrote on the sheet all the things that I should buy and I went to the market. However, I did not find what I needed." in English.
- False friends are common between the varieties of Arabic: بندق means "pine nut" in Tunisian and "hazelnut" in Modern Standard Arabic, and بطيخ means "melon" in Tunisian and "watermelon" in Modern Standard Arabic.
Because of these linguistic differences between Arabic dialects and Modern Standard Arabic, readers may misunderstand a small but important part of what they read in Arabic Wikipedia.
To solve this problem, we propose to translate the sum of all human knowledge into Arabic dialects by adding labels, descriptions and aliases in these dialects to all Wikidata entities. We also propose to translate Wikidata's interface into Arabic dialects so that users can reach the information they need in Wikidata without having to be proficient in Modern Standard Arabic or in a foreign language.
Methods
Adding labels and aliases: To add labels in a given Arabic dialect to Wikidata entities, I will first use the Wikidata Query Service to retrieve the Modern Standard Arabic labels of those entities.
Next, I will process the retrieved data using Microsoft Office Excel 2007. I will keep the labels that are identical in the Arabic dialect and change only those that differ, with reference to the following resources:
- Boukef, M. K. (1986). Médecine traditionnelle et pharmacopée: Les plantes dans la médecine traditionnelle tunisienne. Agence de coopération culturelle et technique.
- Ben Abdelkader, R. (1977). Peace Corps English-Tunisian Arabic Dictionary. Peace Corps.
- Blog posts, webpages and corpuses:
Finally, I will add the processed data to Wikidata using QuickStatements.
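The three steps above can be sketched in Python as a rough illustration; this is not the project's actual tooling, which relies on Excel for the middle step. The function names, the use of `wdt:P31` (instance of) to select a class of entities, and the language code `aeb-arab` for Tunisian in Arabic script are assumptions made for the example. The output lines follow the tab-separated QuickStatements version 1 syntax for labels (`L` followed by the language code).

```python
def label_query(class_qid, target_lang="aeb-arab"):
    """Build a SPARQL query for items with an MSA label but no dialect label.

    Assumption: the entities of interest are selected by "instance of" (P31).
    """
    return (
        "SELECT ?item ?arLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} ;\n"
        "        rdfs:label ?arLabel .\n"
        '  FILTER(LANG(?arLabel) = "ar")\n'
        "  FILTER NOT EXISTS {\n"
        "    ?item rdfs:label ?dialectLabel .\n"
        f'    FILTER(LANG(?dialectLabel) = "{target_lang}")\n'
        "  }\n"
        "}"
    )


def label_commands(rows, overrides, target_lang="aeb-arab"):
    """Convert (QID, MSA label) rows into QuickStatements V1 label commands.

    `overrides` is a hand-built table mapping an MSA label to its dialect
    equivalent; labels missing from it are assumed identical and kept as-is.
    """
    commands = []
    for qid, msa_label in rows:
        label = overrides.get(msa_label, msa_label)
        commands.append(f'{qid}\tL{target_lang}\t"{label}"')
    return commands


if __name__ == "__main__":
    # The word pair comes from the example sentence in the introduction;
    # the QID is a placeholder, not a real Wikidata identifier.
    overrides = {"ورقة": "كاغذ"}
    rows = [("Q_PLACEHOLDER", "ورقة")]
    print(label_query("Q5"))
    print("\n".join(label_commands(rows, overrides)))
```

Keeping the override table separate from the retrieved rows mirrors the manual step described above: only the labels that differ between the two varieties need human attention. Labels containing double quotes would need escaping before being pasted into QuickStatements.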
Adding descriptions: To add a given description in a given Arabic dialect to Wikidata entities, I will use the Wikidata Query Service to find the entities whose statements match the pattern that corresponds to the description.
Next, I will process the retrieved data using Microsoft Office Excel 2007, simply adding the required description to each entity.
Finally, I will add the processed data to Wikidata using QuickStatements.
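As a hedged sketch of the description workflow, the snippet below builds a query from a pattern of property and value pairs and then emits one QuickStatements version 1 description command (`D` followed by the language code) per matching entity. The function names and the `aeb-arab` language code are assumptions, and the example pattern (instance of human, `P31` = `Q5`, with country of citizenship, `P27`, set to Iraq, `Q796`) is only one plausible way to select a group such as "Iraqi people".

```python
def description_query(pattern, target_lang="aeb-arab"):
    """Build a SPARQL query for items matching every property/value pair
    in `pattern` that do not yet have a description in `target_lang`."""
    triples = "\n".join(
        f"  ?item wdt:{prop} wd:{value} ." for prop, value in pattern.items()
    )
    return (
        "SELECT ?item WHERE {\n"
        f"{triples}\n"
        "  FILTER NOT EXISTS {\n"
        "    ?item schema:description ?desc .\n"
        f'    FILTER(LANG(?desc) = "{target_lang}")\n'
        "  }\n"
        "}"
    )


def description_commands(qids, description, target_lang="aeb-arab"):
    """Attach the same description to every QID as QuickStatements V1 lines."""
    return [f'{qid}\tD{target_lang}\t"{description}"' for qid in qids]


if __name__ == "__main__":
    # Hypothetical pattern: humans with Iraqi citizenship.
    print(description_query({"P31": "Q5", "P27": "Q796"}))
    # The QIDs and the description text below are placeholders.
    for line in description_commands(["Q_PLACEHOLDER"], "DESCRIPTION_IN_DIALECT"):
        print(line)
```

Because every entity matched by one pattern receives the same description, the processing step reduces to pairing a fixed string with each returned identifier.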
Translating the MediaWiki system messages: The MediaWiki system messages used in Wikidata's interface are translated into Arabic dialects using translatewiki.net.
Results
During this research project,
- We added labels, descriptions and aliases in Arabic dialects to many Wikidata entities and properties.
- We translated MediaWiki system messages so that the Wikidata interface can be displayed in Arabic dialects.
Added labels
Category | Modern Standard Arabic | Tunisian, Arabic Script |
---|---|---|
Iraqi people | List | List |
Syrian people | List | List |
Languages | List | List |
Countries | List | List |
Emirati people | List | List |
Plants | List | List |
Fruits | List | List |
Colours | List | List |
MediaWiki translation
Language | Status |
---|---|
Algerian Arabic | Statistics |
Egyptian Arabic | Statistics |
Moroccan Arabic | Statistics |
Tunisian, Arabic Script | Statistics |
Tunisian, Latin Script | Statistics |
References
1. Maamouri, Mohamed (1998). "Language Education and Human Development: Arabic Diglossia and Its Impact on the Quality of Education in the Arab Region".
2. Zughoul, Muhammad Raji (1980). "Diglossia in Arabic: Investigating Solutions". Anthropological Linguistics 22 (5): 201–217.
3. "Encyclopedia of Arabic Language and Linguistics - Brill Reference". referenceworks.brillonline.com. Retrieved 2018-03-10.
4. Aguadé, Jordi (2006-03-05). "Writing dialect in Morocco". EDNA, Estudios de dialectología norteafricana y andalusí (in Spanish) 10: 253–274. ISSN 1187-7968.
5. Turki, Houcemeddine; Vrandečić, Denny; Hamdi, Helmi; Adel, Imed (2017-10-30). Using WikiData as a multi-lingual multi-dialectal dictionary for Arabic dialects.