Grants:Project/Dataviz 4 Wiki


status: not selected
Dataviz 4 Wiki
summary: Let’s make data visualisation on Wikimedia projects better!
target: Wikimedia Commons, English Wikipedia, Italian Wikipedia
type of grant: tools and software
amount: 56.138,17 €
nonprofit: Yes
contact: caranti(_AT_)balcanicaucaso.org, lauricella(_AT_)balcanicaucaso.org
volunteer: Mimauri, Kjeanclaude, Ecpp
organization: Osservatorio Balcani e Caucaso Transeuropa/Centro per la Cooperazione Internazionale (OBCT/CCI)
this project needs: volunteer, advisor
created on: 16:41, 17 February 2020 (UTC)


Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

Data visualisation is sometimes better than plain text at conveying information. However, creating data visualisations on Wikipedia and other Wikimedia projects is cumbersome, so they are underused and, when they are used, they cost contributors a lot of time. Moreover, the results are often graphically unsatisfactory, and in most cases the underlying data cannot easily be downloaded for reuse, which undermines Wikimedia’s mission of disseminating content effectively.


What is your solution to this problem?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.

Building on our experience with Wikipedia editing, data journalism and data visualisation, we plan to create a new system that makes it easy to use existing MediaWiki technologies (namely the Data namespace and w:template:Graph).

The solution will allow contributors to upload datasets to Wikimedia Commons and then use them directly to create data visualisations on other Wikimedia projects. Moreover, we want to allow readers to download the datasets in their preferred format.
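As an illustration of the download side, here is a minimal sketch of how a dataset stored in the Data namespace could be converted to CSV for a reader. It assumes the tabular-data JSON layout used on Commons (a `schema.fields` list of column definitions and a `data` list of rows); the function name and the sample dataset are hypothetical and are not part of the proposed tool.

```python
import csv
import io
import json


def tab_json_to_csv(tab_text: str) -> str:
    """Convert a tabular-data JSON document into CSV text."""
    doc = json.loads(tab_text)
    # Column names come from the schema; rows are plain lists of values.
    header = [field["name"] for field in doc["schema"]["fields"]]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(doc["data"])
    return buffer.getvalue()


# Hypothetical sample dataset, for illustration only.
SAMPLE_TAB = json.dumps({
    "license": "CC0-1.0",
    "schema": {"fields": [
        {"name": "year", "type": "number"},
        {"name": "population", "type": "number"},
    ]},
    "data": [[2001, 1000], [2011, 1200]],
})

print(tab_json_to_csv(SAMPLE_TAB))
```

Other export formats (for example xlsx) would follow the same pattern: read the schema for the header, then write the rows.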

Using Commons makes sense because contributors are already used to going there to upload or find content that can be used in every project. This solution complements other solutions that may be based on Wikidata: while some of the datasets could be imported into Wikidata, this would take a long time and is not a process that single users can handle. Uploading datasets to Commons, on the other hand, would be a first step towards separating data from their visual form, which in turn allows imports and exports to other sister projects in the Wikimedia ecosystem. Moreover, while data on Wikidata must be PD (CC0), datasets on Commons can use other free licenses, allowing us to re-use data from free-but-not-PD sources.

Below is a quick description of how we plan to design such a tool.

What kind of tool?

An external tool, probably hosted on Toolforge, based on an approach similar to the one we used in RAWGraphs: through a visual interface, users connect to a data source in .tab format via a link, map data fields to visual variables, make some visual choices, and export the wikicode needed to render the chart using Extension:Graph.
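The last two steps of that workflow (mapping data fields to visual variables and exporting wikicode) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the parameter names (`width`, `height`, `type`, `xAxisTitle`, `yAxisTitle`, `x`, `y`) are assumed to match those of w:template:Graph:Chart and should be checked against the template documentation.

```python
def build_chart_wikicode(fields, rows, x_field, y_field,
                         chart_type="line", width=400, height=200):
    """Return wikicode for a chart that maps two data fields to the x and y axes."""
    # Find the column positions of the fields chosen by the user.
    x_idx = fields.index(x_field)
    y_idx = fields.index(y_field)
    # The template is assumed to take comma-separated value lists, so values
    # containing commas or pipes would need escaping (not handled here).
    x_values = ",".join(str(row[x_idx]) for row in rows)
    y_values = ",".join(str(row[y_idx]) for row in rows)
    params = [
        f"width={width}",
        f"height={height}",
        f"type={chart_type}",
        f"xAxisTitle={x_field}",
        f"yAxisTitle={y_field}",
        f"x={x_values}",
        f"y={y_values}",
    ]
    return "{{Graph:Chart|" + "|".join(params) + "}}"


# Hypothetical data, for illustration only.
example = build_chart_wikicode(
    fields=["year", "population"],
    rows=[[2001, 1000], [2011, 1200]],
    x_field="year",
    y_field="population",
)
print(example)
```

In the proposed tool the field mapping would come from the visual interface rather than from function arguments; the sketch only shows that the exported wikicode is a plain string that an editor can paste into an article.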

Do we even know what types of problems users are facing? What will be different about your tool to ensure that it actually meets the needs of Wikipedia editors?

We based the proposal on the issues we found trying to make charts using current solutions.

  • Polestar/Lyra: both have a visual interface, but they are quite complex to learn and focused on Vega. The code they generate is not directly compatible with Extension:Graph, since it must be re-encapsulated and, as far as we know, not all Vega functions are supported by the extension.
  • Vega editor: it requires knowledge of the Vega grammar, and creating even simple charts is not easy. It provides no graphical interface and no support for .tab datasets.
  • Templates (e.g. w:template:Graph:Chart): they don't support .tab datasets and require data to be encoded in (yet) another format.

What type of visualizations will your tool support developing?

We want to provide a simple-to-use tool that allows the creation of charts meant specifically for Wikipedia, that is extensible, and that relies on the existing extension. For now the goal is to support the most common charts (bar charts, line charts) and possibly to test some more advanced ones.

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

  1. Simplify the creation of visualisations on Wikimedia projects by developing a new data visualisation tool in close contact with the community
  2. Promote the use of data visualisations in the Wikimedia community by demonstrating our tools and, more generally, the data available
  3. Promote the re-use of data and visualisations available on Wikimedia projects among data journalists and other readers

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.

Outputs
  • a tool allowing for the upload of datasets to Commons from csv or xlsx files, also allowing copy and paste interaction to insert data
  • a tool to simplify the creation of data visualization
  • documentation in at least 2 languages (English and Italian)
  • community engagement with at least two wiki language communities (English and Italian)
  • outreach about reusable data on Wikimedia projects among data journalists
Outcomes
  • our tools will continue to be used by the community
  • the use of our tools will expand beyond the original language communities
  • data on Wikimedia will be reused by journalists and other readers

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.

  1. Total participation: 100 (sample users, audience to our presentations)
  2. Number of newly registered users: Not Applicable
  3. Number of content pages created or improved, across all Wikimedia projects: 16,000 (we are counting Data pages on Commons as content pages)
  4. Number of sample users: 50
  5. Number of new datasets on Commons: 8,000
  6. Number of new data visualisations on Wikimedia projects: 8,000

Project plan

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

We are dividing this section into two subsections: the first describes people’s general roles and activities, the second describes project activities in more detail. The table below gives a rough timeline of the work. We expect to start in September 2020, but this may change because of the disruption caused by the COVID-19 pandemic.

Month | Activity (dev) | Activity (dissemination)
September 2020 | Kick-off meeting and Software Design | Kick-off meeting; Writing manual about data usage and database rights
October 2020 | Coding; Milestone: Alpha Version |
November 2020 | |
December 2020 | Testing and Debug; Milestone: Public Beta | Testing alpha and finding beta testers
January 2021 | |
February 2021 | Deployment; Milestone: Release Candidate | Beta testing
March 2021 | |
April 2021 | | Writing manual about tools
May 2021 | | Online and offline dissemination
June 2021 | Tools refinement |
July 2021 | |
August 2021 | |

People’s general activities

  • Niccolò Caranti will act as liaison between the team and the Wikimedia community
  • Giuseppe Lauricella will act as project/technical coordinator
  • Michele Mauri will coordinate the design and development of the tool

DensityDesign Lab (Politecnico di Milano) will provide resources for the design and development of the tool.

Specific activities

Development
  • We will address two needs: (1) ease the upload and download of data in tab format, and (2) ease the creation of charts using existing technologies on Wikimedia (e.g. Graph extension).
  • To address the first need, we will provide a new tool allowing for the upload of datasets to Commons from csv or xlsx files, also allowing copy and paste interaction. The tool won’t require any log-in.
  • To address the second need, we will design a new tool, based on the state of the art, to simplify the creation of data visualization using the built-in extensions of Mediawiki (e.g. Graph extension).
  • While we have already carried out a preliminary review of the state of the art, we will deepen it by involving the Wikimedia community to understand which practices and approaches have already been identified, in order to propose a solution that eases existing practices.
  • We will build our tools on existing, robust solutions that can guarantee support in the future. We will probably start from the work done by DensityDesign on RAWGraphs, an open source platform which will soon release a new version thanks to a crowdfunding campaign.
  • We will develop the tool on git-based platforms (e.g. GitHub) in order to give the whole community access to the code, as well as a space for reporting issues.
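The first need listed above (turning a csv file into a dataset for the Data namespace) can be sketched as follows. This is a minimal illustration, not the planned implementation: it assumes the tabular-data JSON layout used on Commons (`license`, `schema.fields`, `data`), it guesses column types from the cell contents, and it expects every row to have as many cells as the header. The function names and sample data are hypothetical, and Commons may impose further constraints (for example on field names or required description fields) that are not handled here.

```python
import csv
import io
import json


def _to_number(text):
    """Parse text as int or float; raise ValueError if it is not numeric."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def _is_numeric(text):
    """Return True if the text can be parsed as a number."""
    try:
        _to_number(text)
        return True
    except ValueError:
        return False


def csv_to_tab_json(csv_text, license_id):
    """Build a tabular-data JSON string from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # A column is typed "number" only if every non-empty cell parses as one.
    numeric = [
        all(_is_numeric(row[i]) for row in body if row[i] != "")
        for i in range(len(header))
    ]
    fields = [
        {"name": name, "type": "number" if numeric[i] else "string"}
        for i, name in enumerate(header)
    ]
    # Empty cells become null; numeric columns are converted, others kept as text.
    data = [
        [None if cell == "" else (_to_number(cell) if numeric[i] else cell)
         for i, cell in enumerate(row)]
        for row in body
    ]
    doc = {"license": license_id, "schema": {"fields": fields}, "data": data}
    return json.dumps(doc, ensure_ascii=False, indent=2)


# Hypothetical input, for illustration only.
SAMPLE_CSV = "place,year,population\nA,2001,1000\nB,2001,\n"
print(csv_to_tab_json(SAMPLE_CSV, "CC0-1.0"))
```

Pasted data could go through the same function once it has been split into rows, and xlsx files would need a spreadsheet-reading library in front of it.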
Dissemination
  • We will write documentation in at least two languages (English and Italian). This will include guidelines on how to create new datasets and how to reuse existing ones, including a discussion of copyright.
  • We will work with volunteers to translate the documentation and spread knowledge of the tools in more languages.
  • Wikipedia users learn more by imitation than by reading manuals, so we will ourselves spread graphs created with our tools so that people can discover and imitate them.
  • We will organise at least one seminar (during Wikimania or another wiki conference) to present our tools to the wiki community.
  • We will present our tools to data journalists at one of the meetings of the EDJNet project.
  • We will also propose a presentation at one of the several conferences about data journalism that take place yearly in Europe (e.g. Dataharvest).

As a demo, we plan to upload to Commons the demographic datasets of the approximately 8,000 Italian municipalities and to use them in the Italian Wikipedia articles, in cooperation with the WikiProject “comuni italiani”. Since data from ISTAT (the Italian National Institute of Statistics) are released under a CC BY 3.0 license, they cannot be used on Wikidata (discussion on Italian Wikipedia) but can be uploaded to Commons. This will allow us to update and improve the current data visualisations, while also demonstrating on a large scale how our tool works.

Budget

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

Request for WMF
Item | Cost | Hours | Rate | Notes
Wikipedia liaison | 18.040 € | 880 | 20,50 €/hour | Niccolò Caranti
Technical Coordinator/Manager | 14.994,70 € | 691 | 21,70 €/hour | Giuseppe Lauricella
Development | 15.000 € | 150 | 100 €/hour | DensityDesign
Travel | 3.000 € | n/a | n/a |
Administrative fees | 5.103,47 € | n/a | n/a | 10% of the above
Total | 56.138,17 € | | | 60.617,20 $ (Oanda, Feb 20, 2020)
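The figures in the table above can be cross-checked with a few lines of arithmetic. The sketch below uses only the hours, rates and amounts stated in the table (written with a decimal point instead of the European decimal comma); the variable names are illustrative.

```python
from decimal import Decimal

# Hours and hourly rates in euros, as listed in the budget table.
staff = {
    "Wikipedia liaison": (Decimal("880"), Decimal("20.50")),
    "Technical Coordinator/Manager": (Decimal("691"), Decimal("21.70")),
    "Development": (Decimal("150"), Decimal("100")),
}
travel = Decimal("3000")

# Cost per item is hours multiplied by rate.
costs = {item: hours * rate for item, (hours, rate) in staff.items()}
subtotal = sum(costs.values(), Decimal("0")) + travel

# Administrative fees are 10% of everything above, rounded to cents.
admin_fees = (subtotal * Decimal("0.10")).quantize(Decimal("0.01"))
total = subtotal + admin_fees

for item, cost in costs.items():
    print(f"{item}: {cost} EUR")
print(f"Subtotal: {subtotal} EUR")
print(f"Administrative fees: {admin_fees} EUR")
print(f"Total: {total} EUR")
```

The computed total matches the amount requested in the table, 56.138,17 €.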
Cofinancing
Item | Cost | Source | Notes
Lorenzo Ferrari | 2.000 € | OBCT/CCI | Editorial coordinator of EDJNet; he will advise us on the reusability of data on Wikimedia by data journalists and will help us organize a presentation with journalists and media that are part of the network
Budget narrative

The final objective of the project is to create software whose design takes into account the need to remain "alive" over time and to be available to users for further development. For this reason, the available resources will be committed on three fronts: software design and development; improvement and adaptation to user needs; and dissemination of knowledge about the tools and the opportunities they offer, both inside and outside the Wikipedia community, with a focus on data journalists.

Niccolò Caranti will act as Wikipedia liaison, in charge of community engagement and dissemination. He will keep the other team members supplied with the information they need about the inner workings of Wikimedia projects. Together with the other team members he will write the manuals and the documentation. He will be an alpha tester and will find the beta testers in the community. He will personally engage the English and Italian communities, and contact translators in order to reach other language communities.

Giuseppe Lauricella will coordinate the organizational and technical side of the project, supervise the development of the software, debug the tools, and assess the feasibility of change requests before passing them on to the developers. He will collect and process the data necessary for the "Italian Municipalities" demo and will contribute to the technical parts of the manuals.

DensityDesign lab will provide the developers, and its scientific director Michele Mauri will coordinate the design and development of the tool in a volunteer role.

Travel money will be used for internal meetings and external outreach.

OBCT will supervise the administrative side (contracts, payments, and travels) of the involvement of the grantees and DensityDesign Lab.

Community engagement

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.

We will focus on the English and Italian communities. Working with the English community will give us a huge reach, including other language communities through translations. The Italian community is the one we know best, and because of where we are located we can easily organise in-person meetings at a low cost.

Relying on the experience gained by the DensityDesign team in developing open-source tools (such as RAWGraphs), development will be carried out by collecting and addressing feedback from the community.

Community engagement will be one of the pillars of this project. Our purpose is not just to create great tools, but to have them widely used. For this purpose we will engage the community in many ways, including:

  • general discussion pages, such as the village pump on enwiki and the “bar” on itwiki
  • WikiProjects that use graphs widely (e.g. WikiProject Municipalities of Italy on itwiki, since every article about a “comune” has a graph of its demographic evolution that could be remade)
  • Facebook groups such as “Wikipedia in italiano”, “Wikipedia weekly”
  • mailing lists

COVID-19 planning

Our offline activities should take place after the COVID-19 pandemic has ended: specifically, we plan to carry out (online and) offline dissemination in May-August 2021. If travel is still unsafe at that time, the events we intend to take part in will probably be moved online (in which case we will participate online) or cancelled (in which case we will carry out our dissemination differently). In either case the project will remain feasible. We are also planning a kick-off meeting for autumn 2020, which is even easier to hold online.

Get involved

Participants

Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

We are dividing this section into two subsections: the first one describes the main people participating in this project, the second one describes the two organisations involved.

Main people
  • Niccolò “Jaqen” Caranti (grantee) has been a Wikipedian since 2006. He has been a sysop on Italian Wikipedia since 2007 and on Wikimedia Commons since 2013. He co-organised itWikiCon 2017, the first conference of the Italian-speaking Wikimedia community, and several smaller events. In 2017 he worked for the WMF as Italian Language Specialist Strategy Coordinator. He has collaborated with Wikimedia Italy on several projects, including as Wikipedian in Residence at MUSE, the science museum in Trento. He has been working at OBCT/CCI since 2018: his responsibilities include the Wiki4MediaFreedom initiative and sharing data visualizations from the European Data Journalism Network.
  • Giuseppe Lauricella (grantee) is a historian, database expert and web app developer working in Python, Java and PHP. He has worked for the Bank of Italy, the European University Institute and the University of Siena, where he also taught economics and computer science for humanists for years. In recent years he has taught at the University of Modena and worked as a software developer and SEO manager for OBCT/CCI within the EDJNet (European Data Journalism Network) project.
  • Michele Mauri (volunteer) is a full-time researcher at Politecnico di Milano, Design Department. He is the scientific director of DensityDesign Lab, which focuses on visual representation and information design. He collaborated on several open-source projects such as RAWGraphs. His research focuses also on Wikipedia, both as object of study (e.g. the “Contropedia” platform, see “Digging Wikipedia: The Online Encyclopedia as a Digital Cultural Heritage Gateway and Site”) and as a space for the improvement of data visualization practices (see “Designing diagrams for Wikipedia”).
Organisations
  • Osservatorio Balcani e Caucaso Transeuropa/Centro per la Cooperazione Internazionale (OBCT/CCI) will act as fiscal sponsor. OBCT, an operative unit of the CCI, has been involved with Wikipedia since 2015, when it first hosted a Wikipedian in Residence in cooperation with Wikimedia Italia. In 2016 it started the Wiki4MediaFreedom initiative, in the context of which OBCT has organized 4 international edit-a-thons (in Serbia, Bulgaria, Spain and Germany, in cooperation with the local Wikimedia chapters) and 2 writing and translation contests. Since 2018 OBCT has been contributing to Wikimedia projects with data and data visualizations from the European Data Journalism Network (EDJNet), which it coordinates. OBCT has also contributed to articles about the European Parliament, organized a celebration for the 19th birthday of Wikipedia at its headquarters in Trento, and is active in projects with schools and universities. CCI is a non-profit association.
  • DensityDesign is a Research Lab in the Design Department of the Politecnico di Milano. It focuses on the visual representation of complex social, organizational and urban phenomena. Although producing, collecting, and sharing information has become much easier, robust methods and effective visual tools are still needed to observe and explore the nature of complex issues. Its research aim is to exploit the potential of information visualization and information design and provide innovative and engaging visual artifacts to enable researchers and scholars to build solid arguments. By rearranging numeric data, reinterpreting qualitative information, locating information geographically, and building visual taxonomies, we can develop a diagrammatic visualization—a sort of graphic shortcut—to describe and unveil the hidden connections of complex systems. Its visualizations are open, inclusive, and preserve multiple interpretations of complex phenomena. DensityDesign is committed to collaborating with other researchers and organizations devoted to academic independence and rigor, open enquiry, and risk taking to enhance our understanding of the world.
Volunteers
  • Volunteer Hello, I think I can help as an Alpha and Beta tester. Kjeanclaude (talk) 08:47, 20 February 2020 (UTC)
  • Volunteer I am working on a thesis about Wikipedia and Data visualization, so I would like to participate as a volunteer Ecpp 21:11, 5 March 2020 (UTC)

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?

Survey

In December 2019 we conducted a survey on data visualisation on Wikipedia in English and Italian.

The survey was promoted through various channels including Italian Wikipedia’s bar, Facebook groups Wikipedia Weekly and Wikipedia in italiano, mailing lists, etc.

43 people answered:

  1. Most people said that they would like to find more graphs in Wikipedia articles.
  2. A slight majority have wished more than 10 times that they could insert a graph in an article, about one third fewer than 10 times, while 14% of the respondents have never felt that need.
  3. Nearly everybody finds interactive graphs useful.
  4. A slight majority would prefer a more versatile but complex future instrument, while a slight minority would prefer a simpler but more rigid tool.
  5. Three quarters of the respondents have never used development environments inside Wikipedia.
  6. A huge majority think it would be desirable to have a data visualisation tool closely interacting with Wikidata.
  7. A huge majority think it would be important to be able to use already published tables as data sources.
  8. 70% of the respondents would be willing to collaborate in developing such a tool.

Here are the detailed results:

  1. Would you like to find more graphs in Wikipedia articles? (chart)
  2. Have you ever wished you could insert a graph while writing an article? (chart)
  3. Do you find interactive graphs useful? (chart)
  4. Would you prefer a possible future instrument that is more user-friendly, but with limited flexibility, or more powerful, with greater possibilities of expansion, but with a steeper learning curve? (chart)
  5. Have you already used other development environments (e.g. LUA) inside Wikipedia? (chart)
  6. Do you think it would be desirable to have a data visualization tool closely interacting with Wikidata? (chart)
  7. Would it be important to be able to use already published tables as data sources? (chart)
  8. Would you be willing to collaborate in developing such a tool, even just as a sample user? (chart)

Proposal

Endorsements

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  • We need graphs and we need simple tools: this project is a great idea!--Ferdi2005[Mail] 16:49, 17 February 2020 (UTC)
  •   Support we need more datasets and interesting data visualizations on the articles --Sabas88 (talk) 17:23, 17 February 2020 (UTC)
  •   Support I'd love to see a development of the Data namespace on Commons, this could be an interesting experiment. Sannita - not just another it.wiki sysop 17:49, 17 February 2020 (UTC)
  •   Support - I have known DensityDesign Lab of Politecnico di Milano for many years. They have been a partner of Wikimedia Italia for the Archeowiki project in 2012-2013. They are also the developers of the free and open-source visualization framework RAWGraphs. I can vouch for their extensive experience both with data visualizations and working with data from Wikipedia. Jaqen has shown his capabilities as an organizer through participating in various projects with Wikimedia Italia (including Archeowiki, among others). I think this project is well-suited for a Wikimedia project grant. CristianCantoro (talk) 20:30, 17 February 2020 (UTC)
  •   Support Data visualization can be very useful in many Wikipedia articles and the team has all the competences to deliver the expected results. Mirko Tavosanis (talk) 09:08, 18 February 2020 (UTC)
  •   Support We really do need some aid in producing visual support to WP articles; Wikidata is way too complex for non-experts. M&A (talk) 11:07, 18 February 2020 (UTC)
  •   Support; A much needed improvement. [Disclosure: I am occasionally paid to teach at Politecnico di Milano. However, no one from PoliMi has ever communicated with me about this proposal.] Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:03, 18 February 2020 (UTC)
  •   Support I believe this initiative could, besides improving the visual aspects of articles, also pave the way for the long-awaited global templates. Much needed! —XanonymusX (talk) 12:57, 18 February 2020 (UTC)
  •   Support Very interesting initiative! A more effective way for data visualization would be very useful. --Uomovariabile (talk) 13:51, 18 February 2020 (UTC)
  •   Support User-friendly tools are very important to show results of Wikimedia projects. Nonoranonqui (talk) 14:07, 18 February 2020 (UTC)
  • Great idea! Libcub (talk) 00:59, 19 February 2020 (UTC)
  • Excellent and very useful initiative! We all need data visualizations on Wikipedia articles. Also, we could need to import useful datasets for re-use. Kjeanclaude (talk) 08:43, 20 February 2020 (UTC)
  •   Support A better data visualization could help Wikimedia projects to achieve in a better way their mission to spread knowledge. It could also help all projects to "fill the gap" with other and newer websites, that use more powerful software. --CristianNX (talk) 09:37, 22 February 2020 (UTC)
  •   Support Data visualization often is more immediate and expressive than a lot of words. The current lack of evident user friendly tools IMHO is a limiting aspect (the current tools are almost hidden and you may bet that less experienced users would never notice them or, if they do, would have no clue about how using them), therefore I fully endorse this proposal. Developing a simple (from the user experience perspective) yet effective tool that also less experienced users can easily access, will be definitely a big improvement for the projects. The discussion below clearly highlights that the current tools are practically unknown to the large majority of contributors and that's a real pity. If this proposed project would help both in bringing them up and in easing the user access and interaction, definitely any cent spent for this grant would be worth.--L736Etell me 10:28, 22 February 2020 (UTC)
  •   Support Wikidata needs new tool! Alessandra Boccone (talk) 14:48, 24 February 2020 (UTC)
  •   Support Interesting project that uses the little known Data namespace. Afnecors (talk) 13:44, 26 February 2020 (UTC)
  •   Support As a huge fan of data visualization! Daimona Eaytoy (talk) 17:51, 28 February 2020 (UTC)
  •   Support Reasonable proposal, good luck for development and communications when it's successful! —DerHexer (Talk) 11:40, 20 March 2020 (UTC)