This is an essay. It expresses the opinions and ideas of some Wikimedians but may not have wide support. This is not policy on Meta, but it may be a policy or guideline on other Wikimedia projects. Feel free to update this page as needed, or use the discussion page to propose major changes.

Wikidata is licensed as CC0. The choice of license was pragmatic: it allows the structured data the widest possible latitude in adoption. However, this license is fundamentally irreconcilable with CC-BY-SA: content from Wikidata may be used anywhere, but content added to Wikidata may not come from any CC-BY-SA licensed source.
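The one-way flow described above can be illustrated with a minimal sketch. The table and the function name can_import are hypothetical, invented for this illustration; the sketch models only the two licenses discussed in this essay and is not legal advice.

```python
# Hypothetical model of the one-way relationship between CC0 and CC-BY-SA.
# Keys are source licenses; values are the licenses of projects that may
# accept content under that source license.
ALLOWED_TARGETS = {
    "CC0": {"CC0", "CC-BY-SA"},  # public-domain dedication: may be used anywhere
    "CC-BY-SA": {"CC-BY-SA"},    # share-alike: reuse must stay under CC-BY-SA
}


def can_import(source_license: str, target_license: str) -> bool:
    """Return True if content under source_license may be added to a
    project that publishes under target_license."""
    return target_license in ALLOWED_TARGETS.get(source_license, set())


print(can_import("CC0", "CC-BY-SA"))  # Wikidata content into a CC-BY-SA project: True
print(can_import("CC-BY-SA", "CC0"))  # CC-BY-SA content into Wikidata: False
```

The asymmetry in the two results is the point: data can leave Wikidata freely, but nothing under CC-BY-SA can legitimately enter it.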

In effect, wherever Wikidata overlaps with the products of other projects, this requires either forking project missions, which creates redundancy, or violating copyright (so-called 'license laundering'). It also causes completely unnecessary stress and friction between contributors.

Effect on authors

CC0 does not adequately represent or protect droit moral, a collection of legal concepts which are not generally recognized in the USA but are enshrined in law in other jurisdictions. These rights are non-economic, but have been discussed in Wikimedia communities since at least 2003 [1].

One area of effect is the ability to disclaim authorship, especially of an altered work. With Wikidata's implementation it is not possible to dissociate an author from a node which the author has edited without third-party intervention. This matters particularly in a poisoned pool event, where a person or group deliberately spoils the accuracy of an upstream dataset to harm competitors or to hold the only known-accurate data. In such a case bad data may be credited to a contributor, potentially harming their reputation.

Despite this, for many and probably most contributors CC0 will have a very minor influence on their copyrights. My conclusion is therefore that CC0 will have little effect on authors in general, but that the scenario above is a specific risk they should consider.

Effect on reusers

The use of CC0 is often championed by for-profit organizations, because it allows them to use crowd-sourced data to build proprietary, revenue-generating products. While this is not in itself negative or harmful, it generously rewards organizations for exploiting volunteer efforts and discourages them from releasing their own products as open data.

For example, Google Maps is used to collect traffic data from millions of volunteer devices being carried in millions of vehicles, which Google processes and markets both in real time and for retrospective analysis, yet the raw data is not available from free APIs. (It is available through Google's own scripts - var trafficLayer = new google.maps.TrafficLayer(); - which, you guessed it, also collect data about the device which is accessing the map layer.)

As has been shown by Wikimedia projects, OpenStreetMap, and a quiverfull of others, CC0 is not necessary to bring about wide adoption and use of open data. Perhaps the most technically challenging, because of the variety of input sources (many of an academic nature and others of an amateur nature), is OpenWeatherMap, an amazing project which makes live weather data and forecasts available under CC-BY-SA 4.0.

Open APIs are largely not under CC0 or equivalent licenses, and yet are reported to be more widely used than proprietary ones. (In one interpretation, I suppose, web servers themselves constitute an open API consumed by HTML browsers.) OpenStreetMap data can reliably be discovered in Bing and Google maps within a few minutes of edits, suggesting that OpenStreetMap edits are both closely monitored and intimately integrated into enterprise-level re-use.

My conclusion, therefore, is that the effect of CC0 on reusers is to encourage proprietary reprocessing of data, with no actual benefit as regards data uptake. It may also discourage corporate release of mashup products under a public license: under US law it would likely be illegal for corporate officers to allow such an asset dilution.

General conclusions

While the CC0 license appears, on the face of it, to be the best pragmatic choice for making data available, in practice it is not supportive of the Wikimedia missions. It is also unnecessary, given the history of other open data projects. While it may have little effect on most contributors, it will increase risks for academics and others whose reputation (and moral rights) may be endangered by subsequent content alteration.

Looking at the big picture suggests that CC0 is not a good long-term license for crowd-sourced or multi-sourced content, whether that is structured data or inspired prose.