Wikisource and Gallica

This page is a proposal made by User:ThomasV. I would like to import texts from Gallica to Wikisource. I first discussed this project on IRC, and I thought I needed more input.

Gallica is a French website owned by the government. It offers digitized texts that are in the public domain in France (author's death plus 70 years).

For quite some time, Gallica has been a primary source for French Wikisource contributors, who copy-pasted texts from there and then edited them to improve readability. To my knowledge, this copy-pasting has raised no legal issue so far; the copy-pasted texts are in the public domain.

In order to do things faster, I would like to write a robot that does the same thing automatically. The robot will explore the Gallica website systematically, and download texts from there. The robot will edit these texts (removal of page numbers, wikification) and upload them to Wikisource. The bot will do pretty much what people at Wikisource have been doing manually so far.
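As a rough illustration of the editing step only (not the download step), here is a minimal Python sketch. It is not the actual bot. It assumes that a page number appears as a line containing only digits, optionally wrapped in dashes, and that "wikification" means no more than adding a section heading and normalizing blank lines; the function names are hypothetical, and real Gallica text may need different rules.

```python
import re

# Assumption: a page number is a line holding only digits,
# optionally wrapped in dashes, e.g. "12" or "- 12 -".
PAGE_NUMBER = re.compile(r"^\s*-?\s*\d+\s*-?\s*$")


def remove_page_numbers(text: str) -> str:
    """Drop every line that looks like a bare page number."""
    kept = [line for line in text.splitlines() if not PAGE_NUMBER.match(line)]
    return "\n".join(kept)


def wikify(title: str, text: str) -> str:
    """Very small 'wikification': add a heading and collapse blank-line runs."""
    body = re.sub(r"\n{3,}", "\n\n", text).strip()
    return f"== {title} ==\n\n{body}\n"


def prepare_for_wikisource(title: str, raw: str) -> str:
    """Hypothetical helper chaining the two steps described above."""
    return wikify(title, remove_page_numbers(raw))


if __name__ == "__main__":
    sample = "First paragraph.\n\n- 12 -\n\nSecond paragraph.\n13\nThe year 1850 was cold."
    print(prepare_for_wikisource("Sample text", sample))
```

Numbers inside a sentence (such as a year) are left alone, because only lines that consist entirely of a number are removed. A human would still need to proofread the result, as contributors do today.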

I would like to draw the attention of the Wikimedia community to this project, and to the copyright issues that might be associated with it. Although the texts hosted by Gallica are in the public domain, the French law on databases is complex, and I find it quite difficult to understand.

I found this copyright disclaimer on the site itself: http://gallica.bnf.fr/les_droits.htm

The French law on intellectual property can be found at http://www.legifrance.gouv.fr/ . Here are the articles related to IP and databases:

( 11 Articles )

CODE DE LA PROPRIETE INTELLECTUELLE (Partie Législative)

Première partie : La propriété littéraire et artistique
Article L112-3
Article L122-5
Article L332-4
Article L341-1
Article L341-2
Article L342-1
Article L342-2
Article L342-3
Article L342-4
Article L342-5
Article L343-1

Sorry, I cannot provide links because they are session-dependent. Excerpts from the relevant articles are copy-pasted at the end of this page.

I would be glad to receive some feedback on this. Basically, the question is whether or not I am allowed to copy public domain texts from a government database to Wikisource. The point is that although the texts are public, the government might claim rights on the structure of the database itself. I know this might sound picky, but I do not want to invest time in a project and later be told that I have to revert all the edits made by my bot.

ThomasV 06:44, 2 September 2005 (UTC)


The producer of a database, understood as the person who takes the initiative and bears the risk of the corresponding investments, enjoys protection of the contents of the database when the constitution, verification or presentation of those contents attests to a substantial financial, material or human investment.
This protection is independent of, and is exercised without prejudice to, the protections resulting from copyright or from any other right in the database or one of its constituent elements.

The producer of a database has the right to prohibit:
1. Extraction, by the permanent or temporary transfer of all or a qualitatively or quantitatively substantial part of the contents of a database to another medium, by any means and in any form whatsoever;

2. Reuse, by making available to the public all or a qualitatively or quantitatively substantial part of the contents of the database, whatever the form. These rights may be transferred, assigned or licensed.
Public lending is not an act of extraction or reuse.