Traffic reporting

Traffic reporting refers to collecting traffic information about Wikimedia content, then processing that data to present a meaningful insight.

Examples

  1. Someone might be interested in discussing the extent to which Wikipedia is a popular source of information about a set of topics. To introduce data into the discussion, they might collect pageview data on a set of Wikipedia articles, then present a report that says, "These 20 articles had 1,000,000 pageviews in the past month." That information might provide insight about whether developing Wikipedia articles is a useful way to share information as compared to other communication strategies.
  2. Someone might share images in Wikimedia Commons, then be curious about how many people view those images. They might collect pageview data for images that interest them, then use that information as a basis for thinking about how useful it is to spend time sharing images in Wikimedia projects.

Definitions

  • Web traffic - the attention that readers give to particular online media. People who communicate online typically want more web traffic from more relevant audiences, and they might want the audience to respond in a particular way. There is no standard way to measure or report web traffic. Different websites report traffic in different ways. The Wikimedia community has its own standards and customs for discussing traffic, as does every other online community and website.
  • Pageview - a user request for a Wikimedia page or file. Pageview is the standard measure for reporting web traffic for any Wikipedia article. One pageview represents one user request for an article.
  • Unique user - a single count per person who uses a web service. Many web traffic reports include a report of unique users. For privacy reasons, Wikimedia projects are designed to technically avoid collecting this common traffic reporting metric.
  • End user - a person who will use a tool or process. Often "end user" refers to a person with no technical background or special training, who will use a tool in the manner of a typical person.

Traffic reporting for Wikipedia articles

Pageviews Analysis

Main article: Pageviews Analysis

The Pageviews Analysis tool is the most basic tool available for giving pageview reports on Wikipedia articles to typical Wikimedia community members. If given the title of a Wikipedia article and a date range, this tool reports how many pageviews that article got in that range. You can enter up to 10 pages for comparison.

  • Pageviews Analysis - this is the tool
  • Community Tech/Pageview stats tool - Development of the tool was advanced following the 2015 Community Wishlist Survey, and this page reports the work responding to that survey
  • Typical Wikipedia users access this tool by starting at any Wikipedia article, clicking "view history" in the top menu, then clicking "page view statistics". Doing this generates a report for the past 20 days for that article. The reporting period can be extended back to the article's creation date or to 2015-07-01, when data collection for this tool began, whichever is later.

Anyone who wants a quick traffic report of up to 10 Wikipedia articles may use this tool. It has a range of options for providing variations in the information it reports.
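The text above does not say where the tool gets its numbers. As a minimal sketch, assuming the public Wikimedia REST API for per-article pageviews is the data source, the following shows how a title and date range could be turned into a request URL, how the earliest reportable date follows from the 2015-07-01 cutoff, and how a response could be totaled. The function names are illustrative and are not part of any tool.

```python
from datetime import date
from urllib.parse import quote

# Earliest date with data, per the text above.
DATA_START = date(2015, 7, 1)

# Assumption: the public Wikimedia REST API endpoint for per-article pageviews.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def earliest_reportable(created: date) -> date:
    """Return the later of the article's creation date and the data start."""
    return max(created, DATA_START)


def pageviews_url(title: str, start: date, end: date,
                  project: str = "en.wikipedia") -> str:
    """Build a daily per-article pageviews URL for one title and date range."""
    # Titles use underscores for spaces and must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{API_ROOT}/{project}/all-access/user/{article}/daily/"
            f"{start:%Y%m%d}/{end:%Y%m%d}")


def total_views(payload: dict) -> int:
    """Sum the 'views' field over every item in a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))
```

Fetching the URL is left out so that the sketch runs offline. A real request should send a descriptive User-Agent header.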

MusikAnimal established the tool.

Massviews Analysis

The Massviews Analysis tool is the most basic tool available for giving pageview reports on more than 10 Wikipedia articles to typical Wikimedia community members. If given a list of titles of Wikipedia articles and a date range, this tool reports how many pageviews each of those articles got in that range.

Here are the steps for using this tool with PagePile:

  1. Identify a set of Wikipedia articles for which pageview reports are desired
  2. Use another tool, called "Page Pile", to create a list naming each article in that set
    1. Open Page Pile
    2. Create a new pile
    3. Note the identification number for the new pile - this number can be used as input for various tools that generate various reports
  3. Input the Page Pile identification number into Massviews, along with a date range
  4. The output is a report of pageview traffic for multiple articles
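The output of the steps above is, in essence, one total per article. The sketch below illustrates that shape only and is not how Massviews is implemented: a plain list of titles stands in for the Page Pile, and `daily_views` is a hypothetical callable that returns the daily counts for one title. The sample numbers are made up for illustration.

```python
from typing import Callable, Iterable


def massviews_report(
    titles: Iterable[str],
    daily_views: Callable[[str], list[int]],
) -> list[tuple[str, int]]:
    """Return (title, total views) pairs, most viewed first."""
    totals = {title: sum(daily_views(title)) for title in titles}
    # Sort by total descending, then by title so ties come out in a fixed order.
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical stand-in for a real data source, for illustration only.
SAMPLE = {"White_lead": [4, 5], "White_lie": [2], "White_lion": [9, 0]}

report = massviews_report(SAMPLE, lambda title: SAMPLE[title])
for title, views in report:
    print(f"{title}\t{views}")
```

Passing the data source in as a function keeps the report logic independent of where the counts come from.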

MusikAnimal established the tool.

WikiShark

WikiShark is a tool notable for providing access to Wikipedia traffic since 2008.

Wikipedia Tools for Google Spreadsheets

Wikipedia Tools for Google Spreadsheets is a set of tools that provide and analyze data about sets of Wikipedia articles.

The tool is provided by Tomayac.

Wikipediaviews.org

Wikipediaviews.org is a tool for presenting pageview information. If given a list of titles of Wikipedia articles and a date range, this tool reports how many pageviews each of those articles got in that range.

The tool is provided by Vipul.

Stats.grok.se

Stats.grok.se, also known as "Henrik's tool", was a pageview reporting tool which was a predecessor to the Pageviews Analysis tool and was accessed in the same way. The tool was the only common way for end users of Wikipedia to access pageview information from January 2008 to early 2016, when software changes broke the tool.

This tool is no longer functional. It was uniquely influential in guiding Wikipedia traffic discussions for the entirety of its working existence.

The tool was provided by Henrik.

Traffic reporting for Wikimedia Commons files

Wikimedia Commons is a Wikimedia repository for non-text media files, most commonly images. Most images in other Wikimedia projects are actually hosted in Wikimedia Commons. For example, most images in any Wikipedia are stored in Commons, and any image in Commons might be reused in multiple Wikipedias for different languages.

Anyone who shares or curates files in Wikimedia Commons might wish to track how many views a file is getting. Files in Commons are typically viewed outside of Commons, for example in a Wikipedia article, and not in the Commons file cataloging system. Because of this, counts for views to Commons files typically include pageview reports for Wikipedia articles containing those files.

GLAMorgan

GLAMorgan is a tool which takes any given Wikimedia Commons category and generates reports of pageviews for the Wikipedia articles which contain the files in that category.
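The arithmetic behind this kind of report can be sketched as follows. This is not GLAMorgan's implementation. It assumes two hypothetical inputs are already in hand: a mapping from each file to the articles that embed it, and a mapping from each article to its pageviews.

```python
def file_view_report(usage, article_views):
    """Return (per-file totals, category total).

    usage: dict mapping a file name to the set of articles that embed it.
    article_views: dict mapping an article title to its pageviews.
    Both inputs are hypothetical; obtaining them is out of scope here.
    """
    per_file = {
        name: sum(article_views.get(article, 0) for article in articles)
        for name, articles in usage.items()
    }
    # Count each article once for the category total, even when it embeds
    # several files from the category.
    distinct_articles = set().union(*usage.values())
    category_total = sum(article_views.get(a, 0) for a in distinct_articles)
    return per_file, category_total
```

Note that the category total is not the sum of the per-file totals, because one article may embed several files from the same category.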

Data source

See wikitech:Analytics/Data/Mediacounts for data on actual file downloads and for methods of using that data.

Usefulness of this information

Who uses this data

Web traffic reporting is useful for anyone who needs information about the popularity, use, and impact of content on Wikipedia.

Communications professionals are the group most likely to invest time in tracking and considering web traffic reports. Data scientists, communication researchers, and anyone with a general interest in the present state of communication might also be interested in this data.

How this data is used

Typically web traffic data is collected with the intent to compare it to other, similar traffic data reports. For example, someone might collect Wikipedia web traffic data, and then collect communication data on the traffic to another website or communication platform, then compare the traffic numbers. The comparison of traffic reports provides insight and justification for making decisions about labor investments in communicating through different channels.

To what extent does this data matter

Web traffic data from Wikimedia projects is a way to measure and describe the popularity of Wikimedia projects as an information source. For many topics in many languages, Wikimedia projects host information resources which are extremely popular or the most popular sources of information for that topic. If an individual or organization has supporting evidence which establishes that Wikipedia is a popular source of information for their field of interest, then that individual or organization might respond by engaging with the Wikimedia community and Wikimedia projects to collaborate in sharing information.

Original Domas stats

Tools providing consultation statistics

Raw data

Raw consultation data is made available by Domas Mituzas. Files can be found here. One compressed file is created every hour and contains counts of every request for a page from Wikipedia and other related projects. The data is not processed and also includes requests for pages which do not actually exist. For example, redirected pages are counted separately, as are pages accessed using different encodings.
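Because the raw counts are unprocessed, anyone who wants one number per article has to merge the variants themselves. The following is a minimal sketch of one way to do that, assuming that differently encoded requests differ only in percent-encoding and in spaces versus underscores. The raw files carry no redirect information, so the `redirects` map is a hypothetical input that would have to come from elsewhere.

```python
from collections import Counter
from urllib.parse import unquote


def normalize_title(raw: str) -> str:
    """Percent-decode a requested title and use underscores for spaces."""
    return unquote(raw).replace(" ", "_")


def merge_counts(rows, redirects=None):
    """Sum hits per normalized title, following a redirect map if given.

    rows: iterable of (raw_title, hits) pairs.
    redirects: optional dict mapping a redirect title to its target
    (hypothetical input; the raw files carry no redirect information).
    """
    redirects = redirects or {}
    merged = Counter()
    for raw, hits in rows:
        title = normalize_title(raw)
        merged[redirects.get(title, title)] += hits
    return merged
```

For example, `White_lie` and `White%5Flie` would be counted as the same page after normalization.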

Data format

The data is stored in files with names such as pagecounts-20100412-170000.gz, indicating the date (12 April 2010) and hour (17:00). Each line in the file has the following format:

en White_lead 9 122588
en White_lie 2 138038
en White_lies 3 18907
en White_light 2 45042
en White_light_scanner 1 7881
en White_lion 9 152551

where the four fields correspond to

  • the project code
  • the article name
  • the number of hits
  • the total of bytes transferred
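As a sketch of how the file name and record layout described above could be read programmatically, the following parses one file name into a timestamp and one line into its four fields. The names `PageCount`, `parse_filename`, and `parse_line` are illustrative only.

```python
import re
from datetime import datetime
from typing import NamedTuple


class PageCount(NamedTuple):
    project: str
    title: str
    hits: int
    bytes_transferred: int


def parse_filename(name: str) -> datetime:
    """Extract the date and hour from a name like pagecounts-20100412-170000.gz."""
    match = re.fullmatch(r"pagecounts-(\d{8})-(\d{6})\.gz", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")


def parse_line(line: str) -> PageCount:
    """Split one record into its four space-separated fields."""
    project, title, hits, size = line.split()
    return PageCount(project, title, int(hits), int(size))
```

For the sample above, `parse_line("en White_lead 9 122588")` yields project `en`, title `White_lead`, 9 hits, and 122588 bytes.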

The project code may also carry a suffix, to be interpreted as follows:

  • en.b for Wikibooks
  • en.d for Wiktionary
  • en.m for Wikipedia mobile
  • en.m.b for Wikibooks mobile
  • etc.
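A small sketch of decoding project codes according to the list above follows. Only the suffixes named in the list are covered. The list is not exhaustive, so any other suffix is reported as "unknown" rather than guessed.

```python
# Suffixes taken from the list above; the list is not exhaustive, so any
# other suffix is reported as "unknown".
SUFFIXES = {"b": "Wikibooks", "d": "Wiktionary"}


def describe_project(code: str) -> tuple[str, str, bool]:
    """Return (language, project name, is_mobile) for a project code."""
    language, *rest = code.split(".")
    # Per the list above, an "m" directly after the language marks mobile.
    mobile = bool(rest) and rest[0] == "m"
    if mobile:
        rest = rest[1:]
    if not rest:
        project = "Wikipedia"  # a bare language code means Wikipedia
    else:
        project = SUFFIXES.get(rest[0], "unknown")
    return language, project, mobile
```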

Tools

The Kiwix project provides a few tools to download and merge these usage stats. They are available here:

To get merged, cumulative, and consequently smaller log files, call these three scripts periodically, passing the directory where you want to store the logs as the first and only argument.
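The Kiwix scripts themselves are not reproduced here. Purely as an illustration of what merging hourly files into a smaller cumulative file can mean, the sketch below sums the hits per project and title over every hourly file in a directory and writes one combined file. The function name and the output name `merged.gz` are assumptions, and byte counts are dropped for brevity.

```python
import gzip
from collections import Counter
from pathlib import Path


def merge_hourly_files(directory: str, output_name: str = "merged.gz") -> Counter:
    """Sum hits per (project, title) over every pagecounts-*.gz in a directory."""
    totals = Counter()
    for path in sorted(Path(directory).glob("pagecounts-*.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                fields = line.split()
                if len(fields) != 4:
                    continue  # skip malformed records
                project, title, hits, _size = fields
                totals[(project, title)] += int(hits)
    # Write one cumulative file; byte counts are dropped in this sketch.
    with gzip.open(Path(directory) / output_name, "wt", encoding="utf-8") as out:
        for (project, title), hits in sorted(totals.items()):
            out.write(f"{project} {title} {hits}\n")
    return totals
```

Holding all totals in memory is fine for a small sample but may not scale to full hourly dumps. A production script would likely merge in a streaming fashion.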

Archives

Files are available at dumps.wikimedia.org.

See w:en:User:Emijrp/Wikipedia Archive#Domas visits logs.

Daily user pageviews for all local Wikipedias (assembled by Dušan Kreheľ, separately for each local Wikipedia, in d0cmf format, from 2015-07 onward) are stored on archive.org (link).

See also
