Duration:  2020-December – 2021-February
Bengali Wikipedia, gender gap, Wikipedia's growth, Wikipedia's impact

In December 2020, after a journey of almost 17 years, the Bengali Wikipedia crossed the milestone of 100,000 articles. Over this period, the Bengali-language edition of the world's largest encyclopedia has changed in many ways, with a promising increase in overall performance in terms of both community growth and content. This paper analyzes the factors associated with this journey, including the number of active editors, the number of content pages, and pageviews, along with the connection between these parameters and outreach activities. The gender gap is a worldwide problem and is quite prevalent in Bengali Wikipedia as well; it has remained largely unchanged over the years and consequently leaves a conspicuous disparity in the movement. The paper examines the present state of Bengali Wikipedia through quantitative factors, with a relative comparison against other regional languages.

## Introduction

The idea of a web-based free encyclopedia came into being in January 2001 with the launch of English Wikipedia, and it soon spread to other languages thanks to its high impact and 'anybody can contribute' nature.[1] On 27 January 2004, Wikipedia, the largest online encyclopedia, started its journey in the Bengali language. From the very beginning, English Wikipedia has experienced accelerating growth with high accuracy,[2][3][4] but the growth rate has not been the same for all language editions around the world. Small Wikipedias such as the Bengali one have seen their traffic increase comparatively slowly over the years. On 25 December 2020, Bengali Wikipedia crossed the milestone of 100,000 articles after a journey of almost 16 years and 11 months.[5] Notably, although it took more than 13 years to create the first 50,000 articles,[6] the next 50,000 were created within the following three and a half years. The growth is therefore progressive and encouraging, and it calls for a dedicated study of the status of Bengali Wikipedia along this whole journey: which factors contributed the most, and which essential factors fell behind. On a platform like Wikipedia, where the content is produced by the community, i.e., its dedicated volunteers,[7][8][9] the rate of new content creation, the role of active editors, and similar measures are considered vital parameters,[10][11] and it is worth identifying what affects them positively or negatively. Outreach activities, including contests, edit-a-thons, and other campaigns, are organized from time to time, and the impact of these events is also a major question. Pageviews are another important parameter for gauging people's interest and the position of Bengali Wikipedia in web trends.[12][13] In this paper, these cases are examined quantitatively with respect to the growth of new content, active editors, new users, etc. over the years.

In the initial stage of Wikipedia, growth was prioritized the most.[14] Later, it was noticed that a gender gap is significantly prevalent in Wikipedia and its associated platforms in terms of contribution and participation,[15] although the ratio is normal in terms of reading Wikipedia.[16] Bengali Wikipedia also shows a strong contrast between the participation of male and female contributors, which has an obvious polarizing impact on the content and overall perspective of Wikipedia.[17][18] As Wikipedia is a major part of today's knowledge ecosystem, it is imperative to maintain balance and become truly diverse by reducing this gap. It was found in 2011 that 90% of Wikipedia editors stated their gender as male.[19] A 2013 study found that only 4.09% of Bengali Wikipedia users identified themselves as female in their preference settings;[20] the sample comprised users who registered between 2010 and March 2013. Later studies also supported the finding that the disparity is strongly prevalent in the global movement.[21][22] This paper presents an analysis of the current state of the gender gap in Bengali Wikipedia with respect to the number of female contributors, their growth over the years, their percentage in various user groups, and their relative position among other Indic languages.

## Data

Wikimedia Statistics provides publicly available, openly licensed data about Wikimedia projects across different metrics. Most of the data used in this paper is taken from there, including the growth in total content pages, active editors, registered users, and pageviews, for the period from the birth of Bengali Wikipedia (January 2004) to December 2020, when it crossed the milestone of 100,000 articles. The Wiki Comparison Statistics from the Product Analytics team of the Wikimedia Foundation (WMF) are used for mobile edits and related information up to December 2020. For gender-related information, queries were run to extract the information available from the user preference section. Applications for certain permissions in Bengali Wikipedia were examined along with the query results to obtain the number of female contributors among the users who stated their gender in the user preference section.

## Quantitative results

### New content pages

Fig 1. New content pages of Bangla Wikipedia from January 2004 to December 2020

In the early period, Bengali Wikipedia had very few active editors, and the development rate was therefore quite slow. At that time, the emphasis was on creating short, stub-like articles, so that users would at least find something and might feel encouraged to contribute by expanding them. An article is termed a stub if it does not contain enough information to be an encyclopedic entry.[23] In September 2006, 3176 new articles were created in this way, as shown in Figure 1, of which 1886 were created by group bots. However, the stub articles did not attract many contributions from newcomers, and the Wikipedian community ended the practice of creating very short stub articles, deciding together to focus on quality rather than quantity, as can be seen from the discussions in Bengali Wikipedia's archive. As a result, the rate of new content page creation decreased substantially over the following months, and then increased slowly over the years. From 2014 onwards, the average rate of article creation exceeded 500 per month. In 2019, this rate crossed 1000 articles per month and, within the same year, 1500 articles per month. In 2020, the average rate of article creation was 1788 per month. Of all new content pages, 92.924% come from registered users, 4.480% from group bots, and 2.587% from anonymous IP users.

Several studies have found that most contributions to Wikipedia come from a small portion of very active contributors.[24][25][26][27] This observation holds for Bengali Wikipedia as well. In terms of edit count, the top 10 contributors made 22.776% of all edits by users and IP addresses (excluding bot edits). In terms of article count, the top 10 contributors created 26.97% of the articles in Bengali Wikipedia. These two percentages reach 51.81% and 56.76%, respectively, when the top 50 contributors are considered.
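The concentration figures above are shares of a total held by the top k contributors. A minimal sketch of that calculation follows; the function name `top_k_share` and the sample counts are hypothetical and are not taken from the Bengali Wikipedia data.

```python
from typing import Dict


def top_k_share(counts: Dict[str, int], k: int) -> float:
    """Return the percentage of the total contributed by the k largest entries."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sorted(counts.values(), reverse=True)[:k]
    return 100.0 * sum(top) / total


# Hypothetical edit counts per contributor (bot accounts assumed to be excluded already).
example_edits = {"A": 500, "B": 300, "C": 100, "D": 60, "E": 40}
print(top_k_share(example_edits, 2))
```

The same function applies to article counts per contributor; only the input mapping changes.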

### Registered users

Fig 2. Total registered users of Bangla Wikipedia from January 2004 to December 2020

A registered user is any person who has created an account on a Wikimedia project. Here, the users who have registered on the Bengali Wikipedia site are considered. Naturally, not all registered users edit, but the number reflects internet users' interest in contributing to Wikipedia and hints at what sparked that interest. The number of registered users has also gradually increased throughout this journey. For example, in 2005, 71 new accounts were registered per month on average, which rose to 160 in 2010, 1084 in 2015, and 1978 in 2020. The peaks in registrations coincide with the peaks in new content pages on Bengali Wikipedia, with the highest number of registrations being 4098 in July 2019, as shown in Figure 2.

### Active editors

Fig 3. Active editors of Bangla Wikipedia from January 2004 to December 2020

An active editor is a registered, non-bot user who has made at least 5 edits to content namespaces during a given month. In this study, the number of active editors has been analyzed on a yearly basis from January 2004 to December 2020. The growth rate in Bengali Wikipedia has been quite promising for the past few years, as shown in Figure 3. The earliest peak in the number of active editors is found in April 2006. At the end of March 2006, a small group of Wikipedians was formed, and an article was published in an acclaimed national daily newspaper inviting people to contribute to Bengali Wikipedia.[28] The immediate impact is noticeable: before April, the number was around 5 at best, whereas it went straight above 20 in April. In absolute numbers, however, the growth over the following years was not satisfactory.
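The active-editor definition can be expressed as a filter over edit records. The sketch below is illustrative only: the `Edit` record and its fields are hypothetical, and Wikimedia Statistics computes this metric from its own data pipeline rather than from code like this.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set


class Edit(NamedTuple):
    user: str                # account name, or an IP address for anonymous edits
    registered: bool         # False for anonymous IP edits
    is_bot: bool             # True for bot accounts
    content_namespace: bool  # True if the edit was made in a content namespace
    month: str               # e.g. "2020-04"


def active_editors(edits: Iterable[Edit], month: str, min_edits: int = 5) -> Set[str]:
    """Registered, non-bot users with at least min_edits content edits in the month."""
    counts = Counter(
        e.user
        for e in edits
        if e.month == month and e.registered and not e.is_bot and e.content_namespace
    )
    return {user for user, n in counts.items() if n >= min_edits}


# Hypothetical sample: only "Alice" satisfies every condition.
sample = (
    [Edit("Alice", True, False, True, "2020-04")] * 5
    + [Edit("Bob", True, False, True, "2020-04")] * 4       # too few edits
    + [Edit("BotX", True, True, True, "2020-04")] * 50      # bot account
    + [Edit("192.0.2.1", False, False, True, "2020-04")] * 9  # anonymous
    + [Edit("Carol", True, False, False, "2020-04")] * 7    # non-content namespace
)
print(active_editors(sample, "2020-04"))
```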

#### Events and active editor growth

The noteworthy peaks in the graph of active editors occur in February 2017 (from 189 in the previous month to 350), February 2018 (from 209 to 347), and July 2019 (from 200 to 449). At each of these times, a large-scale article contest was under way in Bengali Wikipedia, attracting many newcomers to join and contribute. A smaller but still visible change appears in December 2015 as well, when countrywide workshops were being held to celebrate the 10th anniversary of Bengali Wikipedia. The last peak (from 319 to 444) was in April 2020, when a Special Edit-a-thon was held during the COVID-19 pandemic. The impact of these contests and similar activities is therefore quite evident in the growth figures. As expected, however, the number of active editors has not grown at the same pace as registrations. From 2014, the 10th anniversary of Bengali Wikipedia, the performance graphs saw an increase that did not fade with time. During this period, a good number of events were organized to celebrate the 10th anniversary, including workshops, school programs, photowalks, and a conference, along with substantial press coverage in various media in Bangladesh. Events were also organized in West Bengal, India, for the same occasion.

Various events are organized from time to time so that more people become aware of Bengali Wikipedia and feel motivated to contribute. Sometimes the goal of these events is to work collaboratively and improve the quality of a certain topic. These events include contests, edit-a-thons, etc., which have been shown to help reduce gaps and encourage the engagement and retention of newcomers.[29][30][31] Not every event produces a peak in the active editor or new content curve, but such events still have an important impact on quality. They are usually organized by the affiliates operating in this region or by Bengali Wikipedia community members without any affiliation. For example, in 2020, a total of 11 online contests or edit-a-thons were organized on Bengali Wikipedia, with durations ranging from 24 hours to several months. Of these, 6 events were organized or supported by Wikimedia Bangladesh, the approved chapter of the Wikimedia Foundation[32] working in the Bangladesh region, and the remaining 5 events were organized by Bengali community members.

#### Mobile edits

Fig 4. Top three Wikipedias with the majority mobile editors from 2018-2020

As of 2020, Bengali Wikipedia has the highest percentage of majority mobile editors among all Wikimedia projects. This metric is defined as the proportion of non-bot registered users who performed more than 50% of their monthly edits using the mobile web or a mobile app. Bengali, Hindi, and Arabic Wikipedia have held the top three places for majority mobile editors since 2018, the first year for which data is extracted from the Wiki comparison table prepared by the Product Analytics team of WMF. Bengali Wikipedia was the highest on this list in 2018 and the second-highest in 2019. As of 2020, Hindi Wikipedia stands second on this list and Arabic Wikipedia third, as shown in Figure 4. Bengali Wikipedia also ranks 16th among all Wikipedias in terms of the number of mobile edits. Together, these two figures imply that users mostly contribute to Wikipedia from their mobile devices. As of 2020, 52.33% of the active Bengali Wikipedia contributors with more than 5 edits are from the Bangladesh region, and the second-highest percentage (11.38%) is from India. It should be noted that Bengali is the official and national language of Bangladesh and an official language of a few states of India.[33]
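The majority-mobile-editor metric can be illustrated with a short sketch. The input format (a mapping from user to mobile and total edit counts for one month) and the function name are assumptions made for illustration; this is not the Product Analytics team's implementation.

```python
from typing import Dict, Tuple


def majority_mobile_share(edits_by_user: Dict[str, Tuple[int, int]]) -> float:
    """Percentage of editors whose mobile edits exceed 50% of their monthly edits.

    edits_by_user maps a non-bot registered user to (mobile_edits, total_edits)
    for one month; users with zero edits are ignored.
    """
    editors = [(m, t) for m, t in edits_by_user.values() if t > 0]
    if not editors:
        return 0.0
    majority = sum(1 for m, t in editors if m / t > 0.5)
    return 100.0 * majority / len(editors)


# Hypothetical month: u1 and u4 edit mostly on mobile; u2 sits exactly at 50%,
# which does not count as "more than 50%".
month_edits = {"u1": (8, 10), "u2": (5, 10), "u3": (0, 4), "u4": (3, 3)}
print(majority_mobile_share(month_edits))
```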

### Article depth

In order to measure the collaborative quality of the encyclopedia, article depth is utilized as a rough indicator. It shows the frequency of updating articles in a certain Wikipedia. It is rough in the sense that it doesn't measure the academic quality, rather it reflects the collaborative approach to build the encyclopedia.

Article depth is defined as:[34]

$$\text{Depth} = \frac{\text{Edits}}{\text{Articles}} \cdot \frac{\text{NonArticles}}{\text{Articles}} \cdot \left(1 - \frac{\text{Articles}}{\text{Total}}\right) \tag{1}$$

Non-Articles include user pages, redirects, images, project pages, categories, templates, and all talk pages. Total refers to simply Non-Articles + Articles.

Equation 1 can be simplified to:

$$\text{Depth} = \frac{\text{Edits}}{\text{Total}} \cdot \left(\frac{\text{NonArticles}}{\text{Articles}}\right)^{2}$$
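The simplification follows from the definition Total = NonArticles + Articles, which turns the last factor of the depth formula into a ratio of NonArticles to Total. The intermediate steps, using only the symbols already defined, are:

$$
\begin{aligned}
1 - \frac{\text{Articles}}{\text{Total}} &= \frac{\text{Total} - \text{Articles}}{\text{Total}} = \frac{\text{NonArticles}}{\text{Total}}, \\
\text{Depth} &= \frac{\text{Edits}}{\text{Articles}} \cdot \frac{\text{NonArticles}}{\text{Articles}} \cdot \frac{\text{NonArticles}}{\text{Total}} = \frac{\text{Edits}}{\text{Total}} \cdot \left(\frac{\text{NonArticles}}{\text{Articles}}\right)^{2}.
\end{aligned}
$$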

With a depth score of 326 as of December 2020, Bengali Wikipedia is currently ranked 4th among all Wikipedias and 2nd among those with 100,000+ articles. Its stub ratio (Articles/Total) is 0.114.
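As a rough numerical check, the two forms of the depth formula can be computed side by side. This is a minimal sketch: the function names are hypothetical, and the input counts are invented for illustration and are not Bengali Wikipedia's actual figures.

```python
import math


def depth_full(edits: int, articles: int, non_articles: int) -> float:
    """Depth as the product of three factors (the original definition)."""
    total = articles + non_articles
    return (edits / articles) * (non_articles / articles) * (1 - articles / total)


def depth_simplified(edits: int, articles: int, non_articles: int) -> float:
    """Depth in the simplified form: Edits/Total * (NonArticles/Articles)^2."""
    total = articles + non_articles
    return (edits / total) * (non_articles / articles) ** 2


# Hypothetical counts chosen for easy arithmetic.
edits, articles, non_articles = 1000, 100, 300
print(depth_full(edits, articles, non_articles))
print(depth_simplified(edits, articles, non_articles))
print(articles / (articles + non_articles))  # stub ratio, Articles/Total
```

Both functions should agree up to floating-point rounding, which is why the comparison is best done with a tolerance rather than exact equality.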

### Pageviews

Fig 5. Total pageview of Bangla Wikipedia from January 2016 to Dec 2020

Pageviews are another important parameter for understanding how Wikipedia is reaching its users, and they are also thought to be linked with web search trends.[35] The pageview statistics for the last 5 years are shown in Figure 5. Access to Wikipedia has increased manifold over the years, and mobile access in particular has grown strongly from January 2016 to December 2020. Among the three access options (desktop, mobile app, and mobile web), most readers use the mobile web.

However, the quantity shown in Figure 5 comes from Wikimedia Stats and is not completely accurate. In January 2018, an unprecedented peak is visible in desktop access classified as user traffic. From an analysis of the results for this period, it is assumed that this was bot or spider access that was mistakenly classified as user access. Among the Indic language Wikipedias, this phenomenon is found only in Bengali Wikipedia. For this particular month, most of the access came from the United States of America, and the unique device count did not differ from that of a usual month, both of which support this assumption.

## Gender bias

The gender gap has constantly been an unwanted companion of the worldwide Wikimedia movement. As the movement grows bigger, the gap becomes more clearly visible, and Bengali Wikipedia is no exception. In this study, female participation in Bengali Wikipedia among the users who have stated their gender is analyzed in two-year periods. A rough comparison with other Indic language Wikipedias is also presented, followed by the percentage of female participation in the active Bengali Wikipedian community. In the MediaWiki software, the base platform of Wikipedia and its sister projects, the option to state one's gender in the preferences panel was implemented in 2010.[36] Results before 2010 are therefore not considered for the gender-related quantities in this study.

To the best of the author's knowledge, no study with an explicit focus on Bengali Wikipedia has been done before, but a 2013 study [36] showed that, among 1589 registered users, only 4.09% identified themselves as female in their preference settings. That study considered users who registered from 2010 up to March 2013. In this paper, the numbers from 2011 to 2020 are considered in two-year periods for Bengali Wikipedia. In the 2011-12 period, the female percentage was 3.69%, which increased to 7.41% in the 2019-20 period. It should be noted, however, that over the same timeline the total number of registered users increased by almost 490%. The detailed result is presented in Table 1. It is not an exact estimate, as the major portion of registered users did not state a gender preference, but the gap is obvious from the portion who did. Comparing the two-year periods, it is apparent that female participation is increasing very slowly.

Table 1: Percentage of registered users in Bengali Wikipedia who expressed their gender as female, in two-year periods from 2011-12 to 2019-20

| Year | Total registered users | Percentage of users setting gender | Percentage of users expressing gender as female |
|---|---|---|---|
| 2019-20 | 47,429 | 1.45 | 7.41 |
| 2017-18 | 37,657 | 2.31 | 5.51 |
| 2015-16 | 26,325 | 4.00 | 4.85 |
| 2013-14 | 15,803 | 6.88 | 4.69 |
| 2011-12 | 8,043 | 14.47 | 3.69 |
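The arithmetic behind Table 1 can be sketched as follows. The percentage increase uses the registration totals of the first and last periods (8,043 and 47,429), which gives roughly 489.7% and matches the "almost 490%" stated in the text. The absolute counts are back-calculated from the rounded percentages in the table, so they are approximations and not figures reported by the study; the function names are hypothetical.

```python
from typing import Tuple

# (period, registered users, % setting gender, % of those expressing gender as female)
TABLE_1 = [
    ("2011-12", 8043, 14.47, 3.69),
    ("2013-14", 15803, 6.88, 4.69),
    ("2015-16", 26325, 4.00, 4.85),
    ("2017-18", 37657, 2.31, 5.51),
    ("2019-20", 47429, 1.45, 7.41),
]


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


def approx_counts(registered: int, pct_gender_set: float, pct_female: float) -> Tuple[int, int]:
    """Approximate (users who set a gender, users who chose female).

    Both values are reconstructed from rounded percentages, so they may be
    off by a user or so from the true counts.
    """
    gender_set = registered * pct_gender_set / 100.0
    female = gender_set * pct_female / 100.0
    return round(gender_set), round(female)


print(round(percent_increase(TABLE_1[0][1], TABLE_1[-1][1]), 1))
for period, registered, pct_set, pct_female in TABLE_1:
    print(period, approx_counts(registered, pct_set, pct_female))
```

Reading the percentages alongside the approximate counts helps separate two effects: the falling share of users who set a gender at all, and the share of female users among those who do.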

To find the position of Bengali Wikipedia among the language editions of the Indian subcontinent and neighboring regions, the Wikipedias of these locations are also considered for the 2019-20 period to depict the latest picture. According to the list of Wikipedias by country,[37] the languages of Bangladesh, India, Nepal, Bhutan, and Sri Lanka are selected, which gives a total of 28 languages. These Wikipedias started their journeys at different times, from the early period of Wikipedia to as recently as April 2020 (Awadhi Wikipedia).[38] The number of registered users per year also differs across these Wikipedias, so a direct comparison is not possible because of the high variation among the parameters. To make a reasonable comparison, however, a threshold of 500 user registrations within the 2019-20 period is applied. This filter yields 14 Wikipedias from this broader region. The detailed result is shown in Table 2. It is evident from the numbers that, under this filter, Bengali Wikipedia ranks lowest among the Wikipedias of this region. Kannada Wikipedia shows the best result with a promising 44.55% female participation.

Table 2: Percentage of registered users who expressed their gender as female in the 2019-20 timeline, among Indic language Wikipedias with more than 500 registrations

| Code | Language | Total registered users | Percentage of users setting gender | Percentage of users expressing gender as female |
|---|---|---|---|---|
| kn | Kannada | 3,950 | 5.11 | 44.55 |
| mr | Marathi | 8,017 | 1.05 | 21.43 |
| te | Telugu | 5,186 | 1.89 | 20.41 |
| ta | Tamil | 9,289 | 2.42 | 18.22 |
| si | Sinhalese | 3,339 | 2.99 | 15.00 |
| ml | Malayalam | 7,168 | 1.98 | 14.79 |
| ne | Nepali | 1,766 | 3.91 | 13.04 |
| hi | Hindi | 47,750 | 1.19 | 12.65 |
| pa | Eastern Punjabi | 976 | 4.92 | 12.50 |
| as | Assamese | 1,358 | 4.56 | 11.29 |
| or | Oriya | 845 | 4.26 | 11.11 |
| ps | Pashto | 884 | 5.20 | 8.70 |
| gu | Gujarati | 3,019 | 1.69 | 7.84 |
| bn | Bengali | 47,429 | 1.45 | 7.41 |
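The selection and ranking behind Table 2 can be sketched as a filter followed by a sort. The sample uses four rows from the table plus one invented row ("xx") to show the threshold in action. Whether the threshold is inclusive is an assumption here, since the text speaks of a threshold of 500 while the caption says more than 500; the names `WikiRow` and `rank_by_female_share` are hypothetical.

```python
from typing import List, NamedTuple


class WikiRow(NamedTuple):
    code: str          # language code
    registered: int    # registrations in the 2019-20 period
    pct_female: float  # % of gender-setting users who chose female


def rank_by_female_share(rows: List[WikiRow], min_registrations: int = 500) -> List[WikiRow]:
    """Keep wikis at or above the registration threshold, highest female share first."""
    eligible = [r for r in rows if r.registered >= min_registrations]
    return sorted(eligible, key=lambda r: r.pct_female, reverse=True)


rows = [
    WikiRow("bn", 47429, 7.41),
    WikiRow("kn", 3950, 44.55),
    WikiRow("gu", 3019, 7.84),
    WikiRow("hi", 47750, 12.65),
    WikiRow("xx", 120, 50.0),  # hypothetical wiki below the threshold, not in the study
]
ranked = rank_by_female_share(rows)
print([r.code for r in ranked])
```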

Next, female participation among users holding various rights in Bengali Wikipedia is considered, which can be a better approximation of female participation among active editors. Active Wikipedians can be granted certain rights, with specific features to use according to their expertise; the administrators are considered the most experienced and are selected through community voting. The quantitative results achieved after analyzing the holders of the administrator, autopatrolled, rollbacker, and reviewer rights are shown in Table 3. The absence of active female contributors in Bengali Wikipedia is apparent from these results. Female users constitute only 4.23% of those with the autopatrolled right, and the figures are even lower for rollbacker and reviewer, at only 2.63% and 2.78%, respectively. There is no female contributor with the admin or file mover right in Bengali Wikipedia.

Table 3: Percentage of registered users in Bengali Wikipedia who expressed their gender as female in 2019-20, by user right

| Status | Total number of users with this status | Number of users with this status who specified a gender | Percentage of users who specified their gender as female |
|---|---|---|---|
| File mover | 18 | 12 | 0 |
| Reviewer | 54 | 35 | 2.86 |
| Rollbacker | 58 | 37 | 2.70 |
| Autopatrolled | 133 | 68 | 4.41 |

According to the wiki policies, users generally apply for a certain right, and admins accept or reject the request depending on whether the user actually fulfills the criteria for that right. An examination of the past 104 applications in the 2019-20 period for the autopatrolled, reviewer, and rollbacker rights found that not a single contributor who stated their gender as female applied for these rights.[39][40]

When outreach activities like contests, edit-a-thons, etc. are organized, a surge in the number of active editors and registered users is perceived. Some of them keep contributing in the later phase, and some don't, which is obvious and also apparent from the active editor graph. Editor retention is another important parameter, at which the female contributors are lagging.

## Conclusion

This paper presents a brief analysis of Bengali Wikipedia's journey to 100,000 articles, considering its various aspects. The impact of outreach activities is reported by connecting them with the growth of total content pages, the number of active editors, and newly registered users. The growth of these parameters in Bengali Wikipedia is very promising. However, the study shows that these positive outcomes have failed to bring along a significant demographic group on this journey: female contributors. Despite the progress in other parameters, the gender gap persists and has shown little improvement over the years. Future work may cover more specific aspects of the impact of outreach activities, with a particular focus on the gender gap, through surveys and interviews with female Wikipedians to understand the downsides and obstacles they face in the overall movement.

## Declaration

No grant was received for this study.

This is an accepted manuscript of Wiki Workshop 2021.

## Cite this study

To cite, you can use the following template:

#### ACM Ref

Ankan Ghosh Dastider. 2021. A Brief Analysis of Bengali Wikipedia's Journey to 100,000 Articles. In Companion Proceedings of the Web Conference 2021 (WWW '21 Companion). Association for Computing Machinery, New York, NY, USA. DOI:https://doi.org/10.1145/3442442.3452340

#### Bibtex

    @inproceedings{10.1145/3442442.3452340,
        author = {Dastider, A.G.},
        title = {A Brief Analysis of Bengali Wikipedia's Journey to 100,000 Articles},
        year = {2021},
        isbn = {978-1-4503-8313-4/21/04},
        publisher = {Association for Computing Machinery},
        address = {New York, NY, USA},
        url = {https://doi.org/10.1145/3442442.3452340},
        doi = {10.1145/3442442.3452340},
        booktitle = {Companion Proceedings of the Web Conference 2021},
        articleno = {5},
        numpages = {6},
        keywords = {Bengali Wikipedia, Wikipedia's growth, Indic language Wikipedia, Wikipedia's impact, Gender gap},
        location = {Ljubljana, Slovenia},
    }

