Research:Anatomy of English Wikipedia Did You Know traffic
Summary: This research examines the traffic of 544 English Wikipedia Did You Knows (DYKs) to try to determine which variables play a role in determining the number of page views an article will get on the day it appears. It largely concludes that the number of dependent and independent variables makes it too difficult to isolate specific reasons why one type of article performs better than another, though there are some general times and topics that will likely result in greater views.
As a contributor to English Wikipedia, one of the community opportunities I enjoy most is the ability for my work to be featured on the front of English Wikipedia in the Did You Know section. It is a form of recognition by the community that new content development work was worth acknowledging and featuring. As a person interested in metrics who recently made a slight change in focus for content development, I wanted to understand the implications for Did You Know traffic.
The purpose of this research paper revolves around that and is two-fold: To understand why page views to DYKs I have authored perform the way they do, and to understand why Gibraltar related hooks perform the way they do. To a lesser degree, the purpose is to understand how DYK traffic as a whole works.
As basic background on English Wikipedia’s DYK: article hooks are reviewed by at least one other contributor against a set of criteria including article size, expansion timeliness, copyright compliance, and neutrality. A hook features one or more new or expanded articles. Once approved, the hook is moved into a holding area. The holding area is then moved to a queue by an administrator. Each holding area holds between 6 and 8 approved hooks, with the top approved hook always containing a picture. A new holding area appears on the front of Wikipedia every 6 to 12 hours. When prepping a holding area, most people try to include a selection of topics so not all hooks are about the same general topic or the same geographic area. Depending on the number of approved hooks, the actual size of the holding area and the length of time on the main page fluctuate. There is generally a discussion on the DYK talk page before changing the total number of article hooks.
Data regarding 544 DYK articles was gathered. Of these, 421 were written by me. Henceforth, I will refer to my own DYK work using my user name, LauraHale. To give a fair representation of Gibraltar related hooks, three contributors’ DYKs were selected for inclusion: all of Gibmetal77’s 12 DYKs, all of ACP2011’s 18 DYKs between October 1, 2012 and June 1, 2013, and Prioryman’s 41 DYKs between January 1, 2013 and July 21, 2013. All three have had a number of DYKs about Gibraltar, which is why they were selected as representative. There is a substantial difference between LauraHale’s DYKs and those of Gibmetal77, ACP2011 and Prioryman in terms of focus. The first are mostly sport and female centric. The latter are more focused on history, military, politics and Scientology related topics. For additional points of reference and to offset some of the topic focus problem, all 25 DYKs from TonyTheTiger from January 1 to July 23, all 20 DYKs from Drmies from January 1 to July 23 and all 3 DYKs from Arsonal were included. All 544 DYKs account for about one month’s worth of hooks but should not be construed as inherently representative of DYK hooks appearing on English Wikipedia. For example, some perennial topics are not greatly represented in this sample, including race horses, mushrooms, popular culture, non-British military history, the University of Michigan, baseball, men’s football, India, current events and biology topics.
Each hook was tagged as representing a continent and a country, unless this was not possible for a generic topic like chicken harvester. If a hook mentioned multiple countries, the first location was used. For example, Valerie Ogoke references both Australia and the United States. The article was tagged in the data set as North America, United States because Valerie is first referenced as an American. The exception to this was Gibraltar related hooks. When a hook or an article linked in the hook has Gibraltar in the name, it is automatically listed as Gibraltar if it refers to that area in the UK/Spain. If the hook is about someone or something from Gibraltar despite other keywords, it was also geographically tagged as Gibraltar. Hooks with Gibraltar in the name that are not about the territory are topic tagged as Gibraltar. This allows for comparison against articles that people wrote about non-Gibraltar territory Gibraltars. Articles were also tagged by gender, as either male, female or neither. This depended on whether the article was about a specific person, pronoun usage in the hook, pictures in the hook, etc. All military hooks were genderized as male unless the hook explicitly contained a female reference point. For cases where there was no gender relation, such as chicken harvester or Gold Base, neither was used. Articles were tagged by the contributor receiving credit. Lastly, articles were tagged by topic such as history, tourism, military, sports, Paralympics, Olympics, swimming, athletics, television, literature, biology, etc. Each article could be topic tagged three times. Articles were also tagged with the date and day of the week they appeared. For many articles, the time of day an article was published and its order in the holding area were also noted.
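The tagging scheme described above could be represented as a simple record type. The following is only a sketch: the class name, field names and validation rules are assumptions made for illustration and are not taken from the original data set, and the contributor, topic tag and date in the example are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

GENDERS = {"male", "female", "neither"}
MAX_TOPICS = 3  # each article could be topic tagged three times


@dataclass
class TaggedHook:
    """One row of the data set; field names are illustrative assumptions."""
    article: str
    contributor: str
    gender: str                      # "male", "female" or "neither"
    continent: Optional[str] = None  # None when no location applies
    country: Optional[str] = None
    topics: List[str] = field(default_factory=list)
    has_picture: bool = False
    appeared_on: Optional[date] = None
    time_slot: Optional[str] = None  # e.g. "02:00" (UTC+2); not always recorded
    position: Optional[int] = None   # order in the holding area, if noted
    views: Optional[int] = None

    def __post_init__(self):
        # Reject tags that fall outside the scheme described in the text.
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender tag: {self.gender!r}")
        if len(self.topics) > MAX_TOPICS:
            raise ValueError("at most three topic tags per article")

    @property
    def weekday(self) -> Optional[str]:
        """Day of the week derived from the appearance date, if known."""
        return self.appeared_on.strftime("%A") if self.appeared_on else None


# A generic topic with no location and no gender relation.
# Contributor, topic tag and date are placeholders, not values from the study.
example = TaggedHook(
    article="Chicken harvester",
    contributor="ExampleUser",
    gender="neither",
    topics=["Agriculture"],
    appeared_on=date(2013, 1, 7),
)
print(example.weekday)
```

Deriving the day of the week from the date, rather than storing it separately, avoids the two fields drifting out of sync when corrections are made.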
The results of this research are broken down into three main sections. The first looks at the results across all 544 DYKs. This is followed by the results for Gibraltar related hooks, and lastly by hooks by LauraHale with pictures.
First, it should again be acknowledged that it is not entirely certain that the DYKs presented are representative of all DYKs: the 421 hooks by LauraHale out of the 544 total mean her hooks are overweighted and the sample is not truly random.
Given the perception of a gender gap in content presented, there is the potentially interesting question of the relative number of page views for DYKs for male related hooks, female hooks and neither gendered hooks. 206 of the 544 hooks were labelled male, 293 female and 45 neither. Male topics with a picture did substantially better on average than female topics and non-gendered topics with a picture: 7365.0, 4611.7 and 4585.6 respectively. When the median is used instead, the picture changes dramatically: 4000, 3771 and 2225 respectively. This suggests that male topics with a picture may have a slightly better chance of getting views than female topics. A different pattern exists for hooks that do not have a picture. In this case, neither gendered topics have the highest average and median at 2964.65 and 1671. Then we have the same problem as before, with male topics having the higher average and female topics having the higher median. The averages for male and female are 1677.4 and 1346.3 respectively. The medians are 717 and 1003 respectively. The likely cause for this is that the male outliers at the top are much, much greater, with the female articles more bunched. The graph below demonstrates this to a certain extent.
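The way a few large outliers can raise an average while leaving the median low can be sketched with a small example. The view counts below are made up purely for illustration and are not drawn from the study's data.

```python
from statistics import mean, median

# Hypothetical view counts, NOT taken from the study's data set.
bunched = [900, 1000, 1100, 1200, 1300]       # values close together
with_outlier = [400, 500, 600, 700, 9000]     # one very popular hook

for label, views in (("bunched", bunched), ("with outlier", with_outlier)):
    print(f"{label}: mean={mean(views):.1f}, median={median(views)}")
```

In this sketch the group with the outlier has the higher mean but the lower median, which is the same kind of disagreement between average and median described above.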
Geographic interest likely plays a role in article popularity. Certain countries, because of their share of English Wikipedia’s viewership, should attract more views based on nationalistic topic interest than others. As the representation by country in this sample is small, countries have been placed in broad geographic regions. Overall, Europe had the highest average, with 4290.7 views for its 103 hooks, and also the largest median. Asia, with 9 hooks, had the second highest average at 4104.8, but ranked third on the median with 1411, behind the Middle East with 1924.5. The Middle East was third on average with 2734.5 for its 6 hooks. North America, with 42 hooks, ranked fourth with an average of 2452.2 and median of 1231. The 12 non-specific hooks ranked fifth on average with 2018.7 but sixth on the median with 1023. Following this are Oceania (338 hooks), the Caribbean (2 hooks) and South America (1 hook). As previously discussed, pictures and first hook location can skew this data. Only 5 regions had at least one hook with a picture. With this in mind, sorting regions by popularity for pictured hooks, Europe, with 13 DYKs, again comes out on top with an average of 11690.3 views but second on the median with 8174.5 views. Asia, with 2 DYKs, is second on average with 10622.5 but first on median with the same number of views. No other average/median differences in ranking occur, with North America with four hooks ranked third with an average of 4798.5, Oceania with 28 hooks ranked fourth with an average of 4738.5 views, and non-specific last with an average of 1962.3 for three pictured DYKs. The order changes when only DYKs without pictures are included. Europe still comes out on top with an average of 3139.7, but the Middle East is first on the median with 1924.5 to Europe’s 1650.5. The Middle East is second on average with 2734.5.
The remaining regions, in descending order of average, are Asia with an average of 2328.5 and a median of 1402, North America with an average of 2205.2 and a median of 1195.5, non-specific with an average of 2037.5 and a median of 855, Africa with an average of 1432.6 and a median of 1140, Oceania with an average of 1082.1 and a median of 783, the Caribbean with an average of 801.5 and a median of 802, and South America with an average of 710 and a median of 710.
Some of the low regional averages and medians could be a result of the topics included. For the Caribbean and the Middle East, every hook was about women’s sport and contained no picture. Other regions, with the relative exception of Oceania, have the benefit of having a bit more diversity in their representation. Europe possibly benefited from the over representation of military and history related DYKs that other regions lacked in their sample. Non-specific lacks traditionally popular topics for this type, including mushrooms, rocks and animals.
Another possible reason for the lower views for some regions may be that the DYKs did not run in the most ideal time slot or on the most ideal day of the week. The next analysis looks at the averages for page views by continent where, for at least two distinct time periods, three or more hooks were run at the same time. All times are given in UTC+2. It also looks at day of the week where at least two days each have three or more hooks run on them.
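The inclusion rule just described (a region is analysed only if at least two time slots each have three or more hooks) can be sketched as follows. The function name, the tuple layout and the sample values are hypothetical and serve only to illustrate the filtering step.

```python
from collections import defaultdict
from statistics import mean, median


def slots_meeting_criteria(hooks, min_hooks=3, min_slots=2):
    """Return {region: {slot: (mean, median)}} for regions that qualify.

    hooks is an iterable of (region, slot, views) tuples. A slot is kept
    when it has at least min_hooks hooks; a region is kept when at least
    min_slots of its slots are kept.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for region, slot, views in hooks:
        grouped[region][slot].append(views)

    result = {}
    for region, slots in grouped.items():
        kept = {s: v for s, v in slots.items() if len(v) >= min_hooks}
        if len(kept) >= min_slots:
            result[region] = {s: (mean(v), median(v)) for s, v in kept.items()}
    return result


# Hypothetical sample, not values from the study.
sample = [
    ("A", "02:00", 100), ("A", "02:00", 200), ("A", "02:00", 300),
    ("A", "10:00", 10), ("A", "10:00", 20), ("A", "10:00", 60),
    ("A", "18:00", 999),                      # only one hook: slot dropped
    ("B", "02:00", 5), ("B", "02:00", 6), ("B", "02:00", 7),  # one slot only
]
summary = slots_meeting_criteria(sample)
print(summary)
```

The same function works for day of the week by passing the weekday in place of the time slot.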
For Africa, three distinct time periods for posting DYKs exist. They are 2:00 AM UTC+2, 10:00 AM UTC+2 and 6:00 PM UTC+2. There clearly is a ranking in time preferences, with hooks at 2:00 AM performing best with an average of 1,773 and a median of 1,391. Hooks running at 6:00 PM have the second best average of 1,711 but the third best median at 580. This all suggests the ideal placement for hooks is at 2:00 AM, but that factors other than time are likely to impact page views.
For Africa, the best performing days of the week are, in order based on average: Monday, Wednesday, Thursday, Sunday, Saturday and Friday. The median order changes only for the first two days. Overall, this suggests the best day to get views for an African DYK is a Monday or Wednesday at 2:00 AM or 6:00 PM.
For Asia, with only 8 total hooks, there were not enough for an analysis by day of the week. As for time, 3 hooks were posted at 2:00 AM UTC+2 and 3 were posted at 6:00 PM UTC+2. The 2:00 AM time performed better, with an average of 7,001 and median of 1,506 views compared to 6:00 PM with an average of 1,180 and median of 1,299. Asian themed hooks seeking to maximize page views may want to try for a time of 2:00 AM UTC+2.
For Europe, which has an over representation of April Fools’ Day hooks, military and history hooks and Gibraltar hooks, 97 DYKs have been identified as having a time associated with them. There are three times with three or more hooks: 2:00 AM UTC+2, 10:00 AM UTC+2 with 24 hooks, and 6:00 PM UTC+2 with 34 hooks. View averages and medians were examined overall, with pictures and without pictures. The best time to post a DYK, both overall and without a picture, is 6:00 PM UTC+2. For European hooks with a picture, the best time to post a hook to get views is 2:00 AM or 10:00 AM. Overall, the second best time to post a hook is 10:00 AM UTC+2.
Looking at the relative popularity of European DYKs by day of the week posted, overall the most popular day is Monday with an average of 9255.2 and a median of 7391.5. After that, the order does not match for average and median. For average, the most popular days following Monday are Tuesday, Wednesday, Thursday, Friday, Sunday and Saturday. For median, this order is Wednesday, Friday, Tuesday, Thursday, Sunday and Saturday. This order changes a fair bit when hooks are examined by day either with or without images. For European DYKs with pictures, the order by day using average is Monday with 26946.6, Wednesday with 19931.5, Tuesday with 6607, Thursday with 4292, Saturday with 4061.3 and Sunday with 2448 views. The median order for European hooks with pictures is Wednesday, Monday, Tuesday, Thursday, Saturday and Sunday. For European DYKs without pictures, average and median order again do not match. From most to least popular on average, the order is Monday with 6133.2 views, Tuesday with 5700.7 views, Wednesday with 2769.3, Thursday with 2703.0, Friday with 2668, Sunday with 1667.1 and Saturday with 1132.5. Using median, the most to least popular days for European hooks without pictures are Monday with 3711, Wednesday with 2762, Friday with 2583, Thursday with 1950, Sunday with 1487, Tuesday with 1461.5 and Saturday with 784 views. Overall, if one were trying to maximize the views for a European hook, the ideal day is a Monday or Wednesday at 10:00 AM UTC+2 with a picture, or at 6:00 PM.
For non-specific DYKs, only two periods meet the inclusion criteria, that of Friday and Saturday. In this case, with 4 hooks, Friday posted non-specific hooks outperform the 3 Saturday hooks by an average/median of 3,147/2,642 to 678/495.
For North American hooks by time, there are 7 hooks for 2:00 AM UTC+2 and 12 hooks for 10:00 AM UTC+2. Hooks posted at 10:00 AM perform better on average and median than those posted at 2:00 AM, 2882.6/1947.5 to 1508/1150.5.
For North American hooks, three or more hooks in the sample were posted on Monday, Tuesday, Wednesday, Saturday and Sunday. Of these, Saturday is the best performing day of the week with an average/median number of views of 5679.1/3910.5 for the 6 hooks in the sample. The next best performing day is Tuesday with an average/median of 2757.6/2766.5 for the 8 hooks in the sample. The next best performing day on average and median for North American hooks is Monday with 1721/1688 based on the 5 sample hooks. Sunday and Wednesday are the worst performing days, with Wednesday ranking last on the average and Sunday ranking last on the median. If trying to maximize views for a North American DYK, the available data suggests publishing on a Tuesday at 10:00 AM UTC+2.
There are three times at which three or more hooks from the sample were posted for Oceania. They are 2:00 AM UTC+2, 10:00 AM UTC+2 and 6:00 PM UTC+2 with 30, 54 and 45 hooks respectively. The differences in average and median by time for 2:00 AM, 10:00 AM and 6:00 PM are negligible, with average/median views of 1192.6/643, 1146.6/1012 and 1128.2/829 respectively. With only one occurrence of a picture each for 2:00 AM and 6:00 PM, the impact of pictures is negligible.
For Oceania hooks, the average range by day was about 800, with the top day, Wednesday, averaging 1895.7 and the least popular day, Sunday, averaging 1000.4. The median range is smaller at around 600, with the most popular day, Wednesday, having a median of 1260.5, and the least popular day, Saturday, having a median of 608. With the exception of the Saturday/Sunday swap on the bottom, the average and median ranking from most popular to least is Monday, Thursday, Tuesday, Friday, Saturday and Sunday. When only picture hooks are included, the day popularity changes dramatically, though to be fair Sunday is represented by only one hook. Sunday with a picture is the most popular and has an average/median of 8181/8181. Saturday with a picture is next with an average/median of 4656.7/4061.5. Wednesday with a picture is third with an average/median of 3673/3124. Thursday with a picture has an average/median of 3527.2/2900. Friday with a picture has an average/median of 3309.5/2410. Monday with a picture has an average/median of 1600.7/1001. Tuesday with a picture is last with an average/median of 1113.1/806. The order for Oceania hooks without a picture by day is the same as overall. Thus the best day to post a hook to maximize views is Monday overall, or Wednesday if there is a picture.
There are 1,422 total topic related tags used for the 544 hooks, of which 93 are unique. 27 tags are used only once, 12 are used only twice and 5 are used only three times. Given the relatively small sample size for articles with these tags, these topics will not be examined. On the other side, the most popular tags are Sports used 423 times, Paralympics used 154 times, Olympics used 128 times, History used 54 times, Basketball used 48 times, Military used 41 times, Athletics used 39 times, Popular culture used 38 times, Softball used 34 times, Field hockey used 32 times, Skiing used 31 times, and Swimming used 31 times.
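Tag totals of this kind (total tags, unique tags, tags used only once, most common tags) can be tallied with a counter. The hooks and tags below are hypothetical and only illustrate the counting; they are not the study's data.

```python
from collections import Counter

# Hypothetical topic tags for four hooks (up to three tags each).
hook_topics = [
    ["Sports", "Paralympics", "Swimming"],
    ["Sports", "Olympics"],
    ["History", "Military"],
    ["Sports", "Paralympics"],
]

# Flatten the per-hook tag lists and count each tag.
tag_counts = Counter(tag for topics in hook_topics for tag in topics)

total_tags = sum(tag_counts.values())   # every tag use, with repeats
unique_tags = len(tag_counts)           # distinct tags
used_once = sorted(tag for tag, n in tag_counts.items() if n == 1)

print(total_tags, unique_tags)
print(tag_counts.most_common(2))
print(used_once)
```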
The average number of views by topic were found. The most popular “topic” was April Fools, that is hooks appearing on April 1. The four hooks averaged 22800.7 views despite only one containing a picture. The next most popular topic was Scientology, with its four hooks averaging 12191 views. Species related hooks came in third with 7925.8 views. Culture related hooks came in fourth with 7168.5 views. Biology related hooks came in fifth with 6700 views. Tourism, with 10 hooks, averaged 5138.8 views and was ninth. Television was tenth and the military was eleventh with 4724.2 views on average. Hooks playing on Gibraltar while not being about the European Gibraltar ranked twenty-first topic wise with an average of 2223.6 for the six hooks. Rowing and field hockey were at the very bottom with 460.3 and 434.5 views on average.
As the results above show, page views do not appear to be driven by a single variable. Thus, topic should be looked at with these other considerations in mind. The first is to consider the impact a picture or lack of a picture has on a hook. Using only picture/no picture pairs with at least 2 hooks of each type, eighteen distinct topics are left. Using only averages, biology and species topic hooks with pictures finish at the top with an average of 13527. In second is music with a picture, averaging 12154.5 views. In third are military hooks with pictures at 10074.5. The top non-picture related topic is species with an average of 4191.6 views, followed by science with an average of 3905 views. Next is history with an average of 3842.8 views for a hook without a picture. The average difference on the whole between hooks with a picture by topic and those without is 4697.7, with the median difference being 4650.2. Pictures really do give a noticeable boost in page views on the whole, though there are some anomalies that are worth further investigating. These include architecture, where hooks without a picture average 3763, noticeably more than architecture hooks with a picture, which average 2831.6 views, and politics, where the average is 2175 views for a hook with a picture compared to 2000 for a hook without a picture.
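The picture versus no-picture comparison by topic can be sketched as below: keep only topics with at least two hooks of each type, then take the difference of the two averages per topic. The function name, tuple layout and numbers are assumptions for illustration only and do not reproduce the study's figures.

```python
from collections import defaultdict
from statistics import mean, median


def picture_boost(hooks, min_each=2):
    """Return {topic: mean(views with picture) - mean(views without)}.

    hooks is an iterable of (topic, has_picture, views) tuples. Topics with
    fewer than min_each hooks of either type are left out.
    """
    by_topic = defaultdict(lambda: {True: [], False: []})
    for topic, has_picture, views in hooks:
        by_topic[topic][bool(has_picture)].append(views)

    diffs = {}
    for topic, groups in by_topic.items():
        if len(groups[True]) >= min_each and len(groups[False]) >= min_each:
            diffs[topic] = mean(groups[True]) - mean(groups[False])
    return diffs


# Hypothetical sample, not values from the study.
sample = [
    ("x", True, 1000), ("x", True, 3000), ("x", False, 500), ("x", False, 1500),
    ("y", True, 800), ("y", True, 1200), ("y", False, 1500), ("y", False, 2500),
    ("z", True, 5000), ("z", False, 100), ("z", False, 200),  # only one pictured hook
]
diffs = picture_boost(sample)
print(diffs)
print(mean(diffs.values()), median(diffs.values()))
```

A negative difference, as for topic "y" in this made-up sample, corresponds to the kind of anomaly noted above where hooks without a picture average more views than hooks with one.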
Topic popularity was also broken down based on male, female and neither focus, removing all cases where there were not at least two occurrences of male and female. This leaves 23 unique topic sets. Of them, neither gender military hooks have the highest average views with 6911.5 views, which compares to 4671.4 for male military hooks, ranked fourth, and 4092.5 for female military hooks, ranked seventh. The second most popular is male music with an average of 5169.6, while female music ranks tenth at 3365.5 average views. Neither gender history ranks third with 4747.2 views, while male history averages 4567.9 views and ranks fifth, and female history ranks twenty-fourth with an average of 1626.2 views. Male popular culture ranks sixth with an average of 4122.7, while female popular culture ranks thirteenth with an average of 2546.6. The first topic where women outperform men and neither is swimming, with an average of 3182.8 and a rank of eleventh. Male swimming ranks 32 with average views of 1333. While Paralympic and Olympic female DYKs are close in average at 1707.1 and 1902.7 respectively, male Paralympic and Olympic DYKs are much further apart at 1274.1 and 364.9, though this is likely explainable by the over representation of field hockey in the men’s Olympic DYKs, which are the worst performing of all topics by gender with an average of 308.6 views.
With the different variables giving different results for relative topic popularity, no one topic in and of itself would likely lead to maximum views for someone writing a DYK. That said, biology, Scientology, music, science, military, history and popular culture related topics are likely the best choices for someone aiming for this goal.
There are 47 Gibraltar hooks in the sample from the three contributors. They had an average of 4112.7 views per article, median of 2612 views per article. First ranked hooks, that is hooks with pictures, performed measurably better than DYKs without pictures: Pictured hooks had an average of 7771.7 and median of 4936 views compared to hooks without pictures which averaged 2994.6 views and median of 2213.5 views. Pictures play a key role in the success of a hook getting page views.
Three contributors were selected for Gibraltar representative hooks: Gibmetal77, ACP2011 and Prioryman. Counted were 12, 14 and 22 DYK articles respectively. The mean and median traffic totals differ for each contributor’s Gibraltar related hooks. Prioryman has the best performing Gibraltar related hooks on average, with 4870.9 views each. ACP2011 is the second best performer with an average of 3684.4 views. Gibmetal77 is last with an average of 3285.5. If the median is looked at, the order and numbers change dramatically. The medians for each contributor’s Gibraltar related hooks fall between 2,200 and 2,700. The best performer is ACP2011, then Gibmetal77 and then Prioryman. Not all Gibraltar related hooks are created the same: some contributors perform better than others.
The Gibraltar hooks have 121 topic tags assigned across all 47 of them, with 23 of these topics being unique. The most common DYK topics are history (38), military (33), politics (5) and tourism (4). No other topic has more than 4 hooks. The most popular topic on average for Gibraltar related hooks is military with 4213.8 views (median 2674), followed by history with 4064.3 (median 2674), politics with 243.63 (median 2225) and tourism with 2417.7 (median 2077.5). Tourism related hooks for articles including The Rock Hotel, Sikorski Memorial, Gibraltar, Gibraltar Museum and Grand Casemates Gates just did not interest main page viewers as much. This same pattern for topic popularity is mirrored when only hooks with pictures are counted. For hooks without pictures, politics and tourism trade places, with politics averaging 2485.75 (median 1760) views and tourism averaging 2579.3 (median 2222) views.
Topics can also be understood as connected to gender. Gibraltar hooks were sorted by topic connection to a gender. Biographies about men were labelled male, while biographies about women were labelled female. All military articles, unless explicitly about a female, were labelled male. Hooks not fitting into these easy groups were then assessed based on whether the hook had a picture of a male or female, or whether the hook text around a DYK discussed men or women. Remaining articles not falling into male or female groups were labelled neither. There were 35 male hooks (9 hooks with pictures), 4 female hooks (0 hooks with pictures) and 8 neither hooks (2 hooks with pictures). With this in mind, male hooks performed the best with an average of 4558.4 views. Neither gendered hooks performed second best with an average of 3339.6 views. Female hooks came in last with an average of 1759.2 views. When the picture benefit is taken out of the equation, the gap is slightly reduced to 3265.2 for male, 2645.8 for neither and 1759.2 for female.
Another way of understanding article popularity is day, time and location in the list for the DYK. The day of the week with the highest average views is Wednesday with 6199.1, followed by Monday, Tuesday, Thursday, Friday, Sunday and Saturday with 1568. On the issue of time, Gibraltar DYKs ran in seven different time slots, including Gibraltar local times of 2:00 AM (16 DYKs), 2:50 AM (1 DYK), 10:00 AM (10 DYKs), 2:00 PM (1 DYK), 6:00 PM (13 DYKs), 7:30 PM (1 DYK) and 8:35 PM (3 DYKs). Excluding times with only one DYK, the time with the highest average views was 6:00 PM UTC+2, which averaged 6260.4 views. This was followed by 10:00 AM with an average of 3837.6 views, and then 2:00 AM with an average of 3292.6 views. This pattern holds true for the average number of views for DYKs with pictures in the same time slot, though the 10:00 AM slot has a median 708 higher than the 6:00 PM slot. For DYKs without pictures, 6:00 PM has the highest average with 4361.1, followed by 2:00 AM with 3377.6 views and then 10:00 AM with 1497.6 views. The last consideration is the location in the DYK list. Hooks with pictures, listed number 1, average the highest views with 7771.7. This is not surprising. The remaining order is 2, 3, 5, 6 and 4. Hooks appearing fourth average 1256.75 views compared to hooks appearing fifth, which averaged 3245.7 views. Gibraltar hooks that are likely to get the most views are ones with pictures or in the second position, posted on a Wednesday at 6:00 PM (UTC+2). Gibraltar hooks that are likely to get the fewest views are ones in the fourth position, posted at 10:00 AM local time on a Saturday.
Based on the sample of 47 hooks, the ideal DYK about Gibraltar in terms of getting page views would be a DYK written by Prioryman that was a male centric military article where the hook had a picture and appeared on a Wednesday at 6:00 PM (UTC+2). If the goal was to assure the least number of views for a DYK, the hook would be written by Gibmetal77 and be a tourism or political article about a woman, published on a Saturday at 10:00 AM (UTC+2) in the fourth spot.
Laura Hale’s DYKs
The analysis of Laura Hale’s DYKs only covers hooks with pictures, given how heavily her hooks are weighted in the total analysis.
LauraHale’s DYKs with pictures were published in 2011, 2012 and 2013. Her best performing year was 2012 with 26 DYKs with pictures, followed by 2011 with 7 and 2013 with 1. Excluding 2013, there was little difference in the average number of page views for a DYK with a picture between these two years. 2012 averaged slightly more than 2011 with 4418.1 compared to 4274.5. Month wise, she has had 1 DYK with a picture in February, July and August. Excluding these three months, her best performing month was January (2 DYKs) with an average of 6232 views, followed up by March (4 DYKs) with an average of 5947.5 views, November (4 DYKs) in third with 5080.5. For days of the week for LauraHale's DYKs with pictures, Monday (6 DYKs) was the best performing day with an average of 7270.3 views. Next is Tuesday (3 DYKs) with 6433.3 views, Sunday (3 DYKs) with 4071.6 views, Wednesday (6 DYKs) averaging 3673 views, Thursday (5 DYKs) with 3527.7 views, and Friday (6 DYKs) last with an average of 2688.3 views.
LauraHale had 23 female gendered hooks with a picture, 9 male hooks and 2 non-gendered hooks. Female hooks with pictures averaged a bit more than male, with 4777.7 views to 4664.1. This difference is rather small and suggests gender does not play a role in the total article views for her DYKs with pictures.
In terms of region of the hooked article with a picture, and ignoring Europe because it has only one hook, LauraHale’s hooks from Oceania perform the best on average with 4738.5 views compared to the United States with 3114.5 views and non-specific with 1962.3 views. Ideally, given these three options, Oceania hooks with pictures would be the best way of getting page views.
Another way of looking at LauraHale’s hooks with pictures is by sport. For sports with 2 or more pictures, water polo had the best average with 7,063.0 views. Following this were the Olympics with an average of 6,409.1. In third was basketball with an average of 6,293.0, followed by swimming with an average of 5,271.8. Paralympics are less popular than the Olympics with an average of 4,433.9 views. Wheelchair basketball follows with an average of 3,823.6. Softball is next with an average of 3,282.9 views. Skiing, which is entirely Paralympic related, is last with an average of 1,996.7 views.
Given the variables for LauraHale’s DYKs with pictures explored, if writing a hook with a picture for page views, the data suggests the way to maximize page views is for the hook to be posted in March on a Monday, with an Oceania hook about a water polo player.
The 544 hooks in total represent about a month’s worth of DYKs, which is a fair number but not a completely representative sample. If repeating this research for wider conclusions, four changes should be made. First, the selection of DYKs included should be broadened to include a greater diversity of DYK contributors and topics, and the overall total of DYKs included should be expanded. Second, other variables that could be important should be included and analysed. These include the total number of articles in a single hook, the total number of hooks in a DYK queue, the topic representation, the length of the hook, the count of specific words in a hook, notations for DYK hooks pulled from the queue while on the main page, the DYK reviewer and the DYK nominator. Third, other variables impacting traffic should also be explored. For example, if a DYK on a topic runs on the same day that the topic is in the news, this should be noted. Holidays might also be worth considering as hooks on April 1 get a large number of views. Social media linking to the article on the day of the DYK should also be examined as this could be a potential driver of traffic. Fourth, as corrections were found in the original data, not all other data was corrected, which could lead to some other errors; the data should be rechecked.
One of the things demonstrated by this research is that there is an overwhelming number of variables involved in determining the relative popularity of a DYK. Some patterns are clear: certain days of the week are likely to attract more views than others, hooks with pictures almost always outperform similar hooks without a picture, some topics are likely to be much more popular than others, some regional hooks perform better than others, especially when they appear at certain times of the day, and certain contributors are able to average more views for their hooks than others. Specifically writing a DYK to maximize views would be somewhat difficult as there would be few guarantees of success.
If one were truly looking to create a DYK with a goal of maximizing page views, the data included in this sample overall would suggest the following characteristics would be beneficial: a European themed hook posted on a Monday or a Wednesday with a picture, with a biology, Scientology, music, science or military related topic, run at either 2:00 AM UTC+2 or 6:00 PM UTC+2.
From a DYK point of view, this data would seem to suggest that the relative “interestingness” of the hook is not as important as these other factors in determining whether a person reads the hook. Rather, highlighting components like location or gender for different topics in a hook may cause greater interest in it. People prepping DYK areas for the queue should also be more conscious of when they schedule geographically centric hooks and place them at a time that will maximize interest. Certain days of the week will naturally get fewer views, and there is not much preparers can do about that other than consider making the main page presence longer on those days and shorter on days that attract greater views.
- Metrics are the focus of my doctoral research and I have written several pieces about Wikimedia projects to better understand the community.
- The interest in Gibraltar hooks relates to the topic being a perennial one on DYK with regard to sanctions, because of an alleged effort to use English Wikipedia to advertise for the Gibraltar tourism board.
- The record number of new/expanded articles in a single DYK hook is over thirty.
- Exceptions do exist for special occasions like holidays or major pre-planned events like the Olympics and Paralympics.
- The decision to limit the sample to 544 DYKs was arbitrary, based on a larger question of when to stop. Time constraints prevent manual data collection for all DYKs. A point for further research might be to build on the data from the 544 selected DYKs, adding more columns and more DYKs to see how this affects the results.
- Some hooks contain multiple DYKs. If an article is ranked number 1 but is not itself the pictured article in the hook, it is not counted as having a picture.
- The total number of articles in a hook was not looked at, nor was the relative placement of the article within a multi-article hook. There comes a point where the number of variables becomes overwhelming. While this is an interesting research question that may affect the results, it should be taken up by another researcher.
- Factors not explored in this research that potentially impact the outcome include hooks with multiple articles. The total number of articles appearing in the same hook was not included.
- Because of the differing number of hooks present from holding area to holding area, this number may not be entirely useful for hooks near the bottom. Whether a hook was last was not recorded. It is possible that this is another variable that could account for traffic and that was not properly recorded.
- Some hooks contain alternative words for hooked articles, which makes identifying their location in the archive difficult. At other times, the article hook appeared as a DYK on the main page, but was removed while the prep area was live. If that was the case, it does not appear in the archive and there is no explanation as to where it went.
- The x axis is derived from the relative position of the DYK on the list. The list was sorted highest to lowest before the graph was made.
- The continent and country mentioned first in the hook determined hook location. If none was mentioned, the linked articles were used. If neither allowed location identification, non-specific was used.
- Oceania has more male representative articles.
- This includes one time of 2:10 AM.
- This includes one time of 5:00 PM, one time of 6:25 PM and one time of 6:50 PM.
- One of these three was actually posted at 6:27 PM UTC+2.
- 1:48 AM, 2:15 AM, 2:19 AM and 2:50 AM are each the posting time for one DYK. They have been grouped with 2:00 AM UTC+2 with 31 hooks.
- 10:05 AM and 10:21 AM are each the posting time for one DYK. They have been grouped with 10:00 AM UTC+2.
- 5:33 PM, 6:30 PM and 6:35 PM are each the posting time for one DYK. They have been grouped with 6:00 PM UTC+2.
- Please note the sample of only two hooks.
- Monday was April Fools' Day and most of these hooks were Europe-based. This makes them more visible as there are fewer outliers present.
- There were no hooks with pictures for Friday.
- This includes one hook posted at 1:08 AM and one at 2:34 AM.
- This includes one hook each posted at 9:15 AM, 10:20 AM, 10:21 AM, 10:30 AM and 10:45 AM.
- Given the huge amount of data mining involved, an attempt was not made to get a posting time for every hook, unlike for other regions. 10:00 AM also includes one 10:15 AM hook.
- They include American football, Business, Mountains, Religion, Theatre, Baseball, Crime, Education, Hotel, Judo, Kickboxing, Morocco, Museum, Poker, Powerlifting, Social media, Spain, Tennis, Airport, Badminton, Beach volleyball, Bobsled, Boxing, Dance, Easter, Elections, England, Feminism, Food, Health, High schools, Horse racing, Insects, Journalism, Lakes, Modern pentathlon, Movie, Nature area, Reproduction, River, Roller derby, Snowboarding, Transportation, United Kingdom, Vision, and Weightlifting.
- The overrepresentation of sports reflects the primary inclusion of DYKs by LauraHale and TonyTheTiger. The history and military tags are primarily a result of Gibraltar related DYKs. The popular culture DYKs are primarily a result of TonyTheTiger and Drmies.
- The decision not to do median was made based on time constraints of the researcher.
- Non-Gibraltar hooks appear to do better for Tourism if the average below is looked at. This is likely a result of an English bump and a Christianity related bump for tourism related articles about churches. The contributor hooks selected did not provide enough data to give a better idea how Christian hooks perform relative to other topics.
- The use of 2 or 3 or 4 or 5 as representative is not consistent in this research when looking for minimum comparison points. This is because there is a very real desire to make comparisons, but the sample size and the ability to conduct research in a timely manner are limiting factors. At the same time, hooks with pictures are always limited because queues for DYKs are at least 6 long and only one of them ever will be a picture. The odds are not in favour of having an even balance of picture to non-picture hooks under the best of conditions.
- They include Biology, Species, Music, Military, Science, Popular culture, Water polo, History, Olympics, Basketball, Swimming, Sports, Paralympics, Species, Science, History, Wheelchair basketball, Military, Architecture, Biology, Softball, Architecture, Popular culture, Politics, Politics, Skiing, Swimming, Music, Olympics, Sports, Basketball, Paralympics, Water polo, Skiing, Softball, Wheelchair basketball.
- In this case, both topics with pictures include only two hooks and are the same. Both are also Gibraltar country hooks written by Prioryman.
- The sample size is small at 2 and 3 respectively.
- The possible exception is any hook run on April 1 with a humorous April Fools' theme.
- The explanation for why tourism related hooks perform less well may be tied to other factors. Two of those articles were expanded by Gibmetal77, who averages fewer views than Prioryman. Only one of those hooks ran on a Wednesday. Two of them were fifth in the list and one was sixth. While one had a picture, it ran on a Sunday in July in the afternoon European time and only had 1933 total views.
- None of the DYKs in the sample appeared in seventh or eighth place. This, coupled with the time separation, suggests most hooks appeared on days when hooks were appearing in groups of six and running for eight hours, though this may not always be the case.
- Most errors corrected after the fact concerned whether a hook was pictured. This is because the initial assumption was that a hook was pictured if “(pictured)” appeared in the DYK listing on the article or on the writer’s talk page. All of these are ACP2011 and Prioryman hooks not involving Gibraltar.
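
Several of the notes above describe grouping off-schedule posting times with the nearest nominal slot (for example, 1:48 AM with 2:00 AM UTC+2). A minimal sketch of that grouping follows. It assumes three slots eight hours apart, as the notes suggest, and that "nearest" means the smallest gap on a 24-hour clock; the function names are hypothetical and the grouping in this research was done by hand.

```python
from datetime import datetime

# Nominal posting slots in UTC+2, assuming an eight-hour rotation.
SLOTS = ("2:00 AM", "10:00 AM", "6:00 PM")
MINUTES_PER_DAY = 24 * 60


def to_minutes(clock):
    """Convert a time such as '6:25 PM' to minutes after midnight."""
    parsed = datetime.strptime(clock, "%I:%M %p")
    return parsed.hour * 60 + parsed.minute


def nearest_slot(clock, slots=SLOTS):
    """Return the nominal slot closest to clock, wrapping around midnight."""
    posted = to_minutes(clock)

    def distance(slot):
        gap = abs(posted - to_minutes(slot))
        # A gap of more than twelve hours is shorter going the other way.
        return min(gap, MINUTES_PER_DAY - gap)

    return min(slots, key=distance)


for posted in ("1:48 AM", "10:21 AM", "5:33 PM"):
    print(posted, "is grouped with", nearest_slot(posted))
```

On days when the main page rotated on a different schedule, the `SLOTS` tuple would need to change accordingly.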