European Commission copyright consultation/Data mining



The European Commission is considering modernizing European copyright laws. To get feedback and input on this modernization, the Commission has published a series of questions, and is looking to interested stakeholders (like our community) to answer them. This is a vital opportunity to participate in a dialogue that could have a major impact on copyright laws and the future of the free knowledge movement. More background is available from the European Commission.

We would like to prepare a draft response here, as a collaborative experiment. If we wish to respond, it will need to be finalized before the end of January 2014 (see the proposed timeline).

Welcome to the discussion! Please help by answering the questions below.

Text and data mining

Text and data mining/content mining/data analytics[1] are different terms used to describe increasingly important techniques used in particular by researchers for the exploration of vast amounts of existing texts and data (e.g., journals, web sites, databases etc.). Through the use of software or other automated processes, an analysis is made of relevant texts and data in order to obtain new insights, patterns and trends.

The texts and data used for mining are either freely accessible on the internet or accessible through subscriptions to e.g. journals and periodicals that give access to the databases of publishers. A copy is made of the relevant texts and data (e.g. in browser cache memories, in computers' RAM, or on the hard disk of a computer) prior to the actual analysis. Normally, it is considered that to mine protected works or other subject matter, it is necessary to obtain authorisation from the right holders for the making of such copies unless such authorisation can be implied (e.g. content accessible to the general public without restrictions on the internet, open access).

Some argue that the copies required for text and data mining are covered by the exception for temporary copies in Article 5.1 of Directive 2001/29/EC. Others consider that text and data mining activities should not even be seen as covered by copyright. None of this is clear, in particular since text and data mining does not consist only of a single method, but can be undertaken in several different ways. Important questions also remain as to whether the main problems arising in relation to this issue go beyond copyright (i.e. beyond the necessity or not to obtain the authorisation to use content) and relate rather to the need to obtain “access” to content (i.e. being able to use e.g. commercial databases).

A specific Working Group was set up on this issue in the framework of the "Licences for Europe" stakeholder dialogue. No consensus was reached among participating stakeholders on either the problems to be addressed or the results. At the same time, practical solutions to facilitate text and data mining of subscription-based scientific content were presented by publishers as an outcome of “Licences for Europe”[2]. In the context of these discussions, other stakeholders argued that no additional licences should be required to mine material to which access has been provided through a subscription agreement and considered that a specific exception for text and data mining should be introduced, possibly on the basis of a distinction between commercial and non-commercial.

Question 53

53) (a) [In particular if you are an end user/consumer or an institutional user:] Have you experienced obstacles, linked to copyright, when trying to use text or data mining methods, including across borders?

(b) [In particular if you are a service provider:] Have you experienced obstacles, linked to copyright, when providing services based on text or data mining methods, including across borders?

(c) [In particular if you are a right holder:] Have you experienced specific problems resulting from the use of text and data mining in relation to copyright protected content, including across borders?

Yes

  • Your name here

No

  • Your name here

No opinion

  • Your name here

Comments

Instructions: If yes or no, please explain.

  • ...

Proposed Foundation answer

I respectfully disagree with the premise of the Free Knowledge Advocacy Group EU's answer, so I propose the following significant change. --LVilla (WMF) (talk) 03:44, 31 January 2014 (UTC)

The Wikimedia projects demonstrate both the risks of current data mining policy in the EU, and the success of the rest of the world's policies on databases and database rights.
The risks to us stem from the complete uncertainty around database and data mining rules in the EU. This makes it extremely difficult for the communities who are creating our data sources (such as Wikidata and Wikipedia) to understand when they can or cannot use a given data source, as facts that seem unprotectable on their face may implicate other rights that are not obvious from the data themselves.
On the flip side, Wikipedia (and soon Wikidata) is among the most widely mined and analyzed data sources on the planet. This has occurred because of our commitment to making this information freely available, and it demonstrates that creativity and innovation are compatible with a scheme that reduces barriers to participation rather than increasing "protection".

Question 54

54) If there are problems, how would they best be solved?

Responses

[Open question]


Proposed Foundation response

I respectfully disagree with the premise of the Free Knowledge Advocacy Group EU's answer, so I propose the following answer, based on Kaldari's comment above and the Creativity4Copyright answers. --LVilla (WMF) (talk) 03:47, 31 January 2014 (UTC)

The EU should avoid creating new rights to protect previously unprotectable information, such as the database right and the suggested data mining right. Instead, legislation should formally clarify that data mining (and the use of databases) is not prohibited by copyright, and that contracts and technical protection measures cannot be used to override that position.

Question 55

55) If your view is that a legislative solution is needed, what would be its main elements? Which activities should be covered and under what conditions?

Responses

[Open question]

  • ...

Proposed Foundation position

I respectfully disagree with the premise of the Free Knowledge Advocacy Group EU's answer, so I propose the following significant change, based in part on the C4C answer. --LVilla (WMF) (talk) 04:02, 31 January 2014 (UTC)

As noted above, text and data mining should be specifically excepted and allowed, and the database directive should be repealed. In addition, TPMs and contracts should not be allowed to override these statutory decisions. This would ensure that vast amounts of information would be broadly available to the public and to researchers, which Wikipedia's experience shows will lead to a variety of new uses and means of delivery.

Question 56

56) If your view is that a different solution is needed, what would it be?

Responses

[Open question]

  • ...

Proposed Foundation answer

Based on the Creativity4Copyright suggestions, I propose the following answer for the official Foundation response: --LVilla (WMF) (talk) 02:51, 31 January 2014 (UTC)

Only a legislative approach can solve the issues faced. See our answers to questions 54 and 55.

Question 57

57) Are there other issues, unrelated to copyright, that constitute barriers to the use of text or data mining methods?

Responses

[Open question]

  • There are privacy concerns with data analysis. A great deal of information about individuals can be derived from it. There should be strict precautions to prevent human rights from being violated through data analysis. --NaBUru38 (talk) 14:40, 11 January 2014 (UTC)
  • ...

Proposed Foundation answer

Based on the comments above and the Creativity4Copyright suggestions, I propose the following answer for the official Foundation response: --LVilla (WMF) (talk) 04:06, 31 January 2014 (UTC)

A variety of problems further complicate the use of text or data mining methods. Lack of clarity around privacy rules for data related to individuals, the use of contracts and technical protection measures to impede legally authorized access to information, and the use of proprietary or patent-encumbered data formats can all undermine the promise of data mining.

References

  1. For the purpose of the present document, the term “text and data mining” will be used.
  2. See the document “Licences for Europe – ten pledges to bring more content online”: http://ec.europa.eu/internal_market/copyright/docs/licences-for-europe/131113_ten-pledges_en.pdf