Research:Understanding Search Engine To Wikipedia/Literature Review

Background

Studying the relationship between Google and Wikipedia is difficult, especially using only public data. Google does not publicly release search volumes, and its Trends data is normalised and perturbed with noise to prevent reconstruction of the original volumes. Its deployment of Knowledge Panels also follows no strict pattern, and the panels are aggregated from various unspecified sources. This makes it difficult to test causal models of the relationship between Google’s treatment of Wikipedia links and their page views.

However, a better understanding of the relationship between Google and Wikipedia would shed light on the value of Wikipedia on the internet, and of open-access, community- and peer-driven projects. Contextualising Wikipedia behaviour with Google would also help us better understand the reuse of Wikimedia content, especially if studied in the context of Knowledge Panels. Such an analysis would allow us to contribute to the characterization of the economic value of Wikipedia on the internet [1]. Another matter of concern for Wikimedia is how search drives traffic to Wikipedia: for example, they recently released a dataset to explore search relationships with Wikipedia across different axes [2].

A key aspect of the relationship between Google and Wikipedia that is not currently explored in the literature is what proportion of Google search queries are either answered by Wikipedia or lead to a search user visiting Wikipedia. In other words, what fraction of search interest expressed on Google is met by Wikipedia? Here, we consider interest to be met by Wikipedia if a user clicks on a link to Wikipedia. We already know that some of these views are “cannibalized” by Google via the Knowledge Graph panels, but we do not (at the moment) focus on this, even though it could be argued that viewing these panels also counts as interest being “met” by Wikipedia.
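The quantity described above can be written down as a simple ratio. The following is a minimal sketch, not a method from the literature: the function name and the example counts are hypothetical, and it assumes each search leads to at most one counted Wikipedia click. In practice neither input is directly available, since Google does not publish search volumes.

```python
def fraction_met_by_wikipedia(wikipedia_clicks: int, total_searches: int) -> float:
    """Fraction of searches that were followed by a click on a Wikipedia link.

    Assumes (for illustration only) at most one counted click per search,
    so the result always lies between 0 and 1.
    """
    if total_searches <= 0:
        raise ValueError("total_searches must be positive")
    if not 0 <= wikipedia_clicks <= total_searches:
        raise ValueError("wikipedia_clicks must be between 0 and total_searches")
    return wikipedia_clicks / total_searches


# Hypothetical counts for one topic over one day; not real data.
example = fraction_met_by_wikipedia(wikipedia_clicks=250, total_searches=1_000)
print(f"Fraction of search interest met by Wikipedia: {example:.2f}")
```

Views of Knowledge Graph panels are deliberately left out of the numerator here, matching the definition of "met" used above.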

Related Literature

Work by Vincent et al 2019 [3] and McMahon et al 2017 [4] attempts to disentangle the relationship between Wikipedia and search engines, and highlights the importance of Wikipedia for search, and specifically Google search. They find that Google is very reliant on Wikipedia in returning relevant search results, and that Wikipedia articles appear in over 80% of (first) results pages and in the particularly important “top three links” over 50% of the time. Aside from the direct links to Wikipedia on search results, the Google “Knowledge Graph” and its effect on Wikipedia has also been studied. Indeed, the Knowledge Graph was the primary motivation behind the study, with early reports in 2013 suggesting that traffic to Wikipedia pages dropped by nearly 33% as a potential consequence of the Knowledge Graph.

These Knowledge Graph panels are often put together using Wikipedia-based data, but they do not credit Wikipedia, and they often “cannibalize” the page views that would otherwise go through to Wikipedia. As we mentioned before, exploring the exact nature of this relationship is difficult given that we do not have publicly available data on when a Knowledge Panel is added to a Google search result. The study conducted by McMahon et al attempts to quantify precisely this by using a web plug-in that hides Wikipedia-based results in order to study click-through rates, and finds that Knowledge Panels indeed stop people from visiting Wikipedia. In another study (Johnson et al, 2021 [5]), the authors show that, on DuckDuckGo, Wikipedia is sought after by users even when the information is already present in an “information module” (similar to Knowledge Panels).

So we see that Wikipedia is crucial for web search engines, as it greatly improves the quality of search. On the other side, Wikipedia also benefits heavily from search engine traffic, with nearly 75% of its traffic coming from search, of which 90% is Google (link). The relationship is therefore seemingly symbiotic, but many of its details are still hidden from us.
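Taken together, the two proportions quoted above imply that roughly two thirds of all Wikipedia traffic arrives via Google. A minimal sketch of the arithmetic, treating both figures as approximate (the function and constant names are illustrative only):

```python
# Approximate shares quoted in the text above; treat them as rough estimates.
SEARCH_SHARE_OF_TRAFFIC = 0.75  # "nearly 75%" of Wikipedia traffic comes from search
GOOGLE_SHARE_OF_SEARCH = 0.90   # of that search traffic, 90% is Google


def google_share_of_total_traffic(search_share: float, google_share: float) -> float:
    """Share of all traffic attributable to Google: the product of two proportions."""
    for name, value in (("search_share", search_share), ("google_share", google_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return search_share * google_share


share = google_share_of_total_traffic(SEARCH_SHARE_OF_TRAFFIC, GOOGLE_SHARE_OF_SEARCH)
print(f"Google's share of all Wikipedia traffic: about {share:.1%}")
```

Because "nearly 75%" is itself an upper-end rounding, the result is best read as an approximate ceiling rather than a measured value.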

Understanding how the relationship between Google and Wikipedia differs across topics also contributes to the existing literature on how search and navigation differ by topic (Piccardi et al 2021 [6], Dimitrov et al 2018 [7], Rodi et al 2017 [8]). We are also interested in the temporal properties of the proportion of Wikipedia visits, and in estimating models that predict these proportions based on current events (Xie et al, 2019 [9]).

References

  1. Piccardi, Tiziano; Redi, Miriam; Colavizza, Giovanni; West, Robert (2021-04-19). "On the Value of Wikipedia as a Gateway to the Web". Proceedings of the Web Conference 2021 (Ljubljana Slovenia: ACM): 249–260. ISBN 978-1-4503-8312-7. doi:10.1145/3442381.3450136. 
  2. "Searching for Wikipedia". 
  3. Vincent, Nicholas; Johnson, Isaac; Sheehan, Patrick; Hecht, Brent (2019-07-06). "Measuring the Importance of User-Generated Content to Search Engines". Proceedings of the International AAAI Conference on Web and Social Media 13: 505–516. ISSN 2334-0770. doi:10.1609/icwsm.v13i01.3248. 
  4. McMahon, Connor; Johnson, Isaac; Hecht, Brent (2017-05-03). "The Substantial Interdependence of Wikipedia and Google: A Case Study on the Relationship Between Peer Production Communities and Information Technologies". Proceedings of the International AAAI Conference on Web and Social Media 11 (1): 142–151. ISSN 2334-0770. doi:10.1609/icwsm.v11i1.14883. 
  5. "Searching for Wikipedia: DuckDuckGo and the WMF". 
  6. Piccardi, Tiziano; Gerlach, Martin; Arora, Akhil; West, Robert (2023-01-13). "A Large-Scale Characterization of How Readers Browse Wikipedia". ACM Transactions on the Web: 3580318. ISSN 1559-1131. doi:10.1145/3580318. 
  7. Dimitrov, Dimitar; Lemmerich, Florian; Flöck, Fabian; Strohmaier, Markus (2019-06-25). "Different Topic, Different Traffic: How Search and Navigation Interplay on Wikipedia". The Journal of Web Science 6. ISSN 2332-4031. doi:10.34962/jws-71. 
  8. Rodi, Giovanna Chiara; Loreto, Vittorio; Tria, Francesca (2017-02-02). "Search strategies of Wikipedia readers". PLOS ONE 12 (2): e0170746. ISSN 1932-6203. PMC 5289465. PMID 28152030. doi:10.1371/journal.pone.0170746. 
  9. Chelsy Xie, Xiaoxi; Johnson, Isaac; Gomez, Anne (2019-05-13). "Detecting and Gauging Impact on Wikipedia Page Views". Companion Proceedings of The 2019 World Wide Web Conference (New York, NY, USA: ACM). doi:10.1145/3308560.3316751.