Talk:Abstract Wikipedia/Licensing discussion

Comments are welcome in any language.

We would like to invite the community to discuss these recommendations and hopefully to find consensus around the licensing decision. The goal is to keep the discussion open for about four weeks, and, if necessary, to extend and restructure it. In case this turns out to be insufficient to reach a consensus, we might restructure the licensing discussion to focus solely on Wikifunctions for now, and then follow up with a discussion about Abstract Wikipedia.

To guide the license choice, it may be useful to consider and discuss the following questions: What are the long-term objectives of the projects, and how can a copyright license support these objectives? Should the people involved in creating Abstract Content receive credit? How valuable is it to preserve consistency and compatibility with the licenses on Wikipedia?

The specific question to the community is whether to follow the given recommendations, or whether a different proposal would be more to the liking of the community. Particular questions of interest surround the licensing of Function Implementations, and the licensing of Abstract Content (and thus Output Content).

Even if you are just supporting the recommendations above, it would be great to see your voice expressed explicitly, in order to get a better understanding of what the community tends towards. If no rough consensus can be reached, we will organize a formal vote.

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
The first ten days of discussion are frozen to focus on newer discussion and avoid confusion.
Please see the "Restructuring the discussion to move on" section and the new request for specific feedback below. Thanks. Quiddity (WMF) (talk) 18:51, 2 December 2021 (UTC)

Comments

Views on the recommendations

  1. Support Thadguidry (talk) 22:17, 22 November 2021 (UTC)
  2. Support Kasyap (talk) 06:06, 23 November 2021 (UTC)
    • Kasyap, could you please clarify whether you support CC0 for Abstract content, or CC BY-SA, or either? (As currently framed, the "recommendations" allow either, so your "support" vote could be interpreted as support for either.) --Andreas JN466 06:56, 24 November 2021 (UTC)
      • Andreas, I support CC0 for Abstract content. -- The preceding unsigned comment was added by Kasyap (talk) 04:28, 29 November (UTC)
  3. Support. Specifically in favor of CC0 for Abstract Wikipedia because it seems highly questionable to me whether most of its articles are copyrightable at all. --MisterSynergy (talk) 13:12, 24 November 2021 (UTC)
  4. Support CC0 for Function Signatures, Abstract content, and Output content. Maybe CC0 for Function implementation, to the extent that function implementation can itself be abstract content / output content of yet other functions. I am still thinking but I think this is my position. Blue Rasberry (talk) 15:34, 24 November 2021 (UTC)
  5. Support CC0 for function signatures, Apache for function implementations, CC BY-SA for both Abstract content and Output content.
    I am specifically convinced that, while the fact that Jupiter is a planet is not copyrightable, a construct that compares Jupiter to all other planets in the Solar System is sufficiently elaborate to make me think it would be more defensible under a CC BY-SA license than under CC0. In this, I totally agree with the recommendations from the dev team. Please note that this, to me, is a non-negotiable baseline - I will support any proposal that grants more freedom than this (i.e. CC0 for abstract/output content), but will fight tooth and nail against any proposal that will grant less freedom than this. Sannita - not just another it.wiki sysop 20:01, 24 November 2021 (UTC)
  6. Oppose. CC0 is not viral and should therefore be prohibited altogether. CC0 contradicts the basic idea of free licenses, namely that they propagate themselves. Ralf Roletschek (talk) 09:49, 25 November 2021 (UTC)
  7. Oppose the use of CC0 for abstract content/output content. End users should always be able to see that the information comes from Wikimedia (traceability), and re-users should be forced to use free licences for derivative works; CC0 ensures neither. (See "The rise of the underdog" for a more detailed discussion.) --Andreas JN466 10:08, 25 November 2021 (UTC)
    • The purpose of copyright is to protect your creative expression; it does not allow you to force users into exercising a particular data provenance protocol. --MisterSynergy (talk) 10:35, 25 November 2021 (UTC)
      • CC BY-SA requires attribution. It's why there is a "Wikipedia" link in the Knowledge Graph panel, and why voice assistants tell you when they read out a Wikipedia article (if I recall correctly, it took some prodding from the WMF to make them do so). --Andreas JN466 10:53, 25 November 2021 (UTC)
  8. Support the combination of CC-0 (function signatures), Apache [or MIT or LGPL] (code), CC-BY-SA (abstract content and generated output); Oppose other variations. To preserve history of generated output, I would recommend a) adding a link to generated output that points to the abstract content used to generate it, b) clarifying in terms of use that for re-users, it is sufficient to link to the generated output alone. --Eloquence (talk) 20:26, 26 November 2021 (UTC)
  9. Support CC0 for function signatures, Apache for function implementation, CC-BY-SA for abstract content and output content. But I would appreciate it if we could use MIT for function implementation as well... --Ameisenigel (talk) 16:33, 27 November 2021 (UTC)
  10. Choices are being made as soon as what is grammatically (im)possible is defined, for example generate text(pronoun, language) = text (to take a currently polemical example, which by no means captures the full scope of the problem). I agree with Andreas, Eloquence, and Ameisenigel above: CC0 should be rejected in order to require attribution for any abstract/output content derived from these processes. SashiRolls (talk) 21:40, 29 November 2021 (UTC)
  11. Support CC0 for signatures (because they are more like a mathematical formula than a creative work) and Apache for code (which is one of the widely used licences in software). But when it comes to a licence for the content, I oppose CC0. I perceive the abstract content (as a whole) not as a set of pure information (which is the case with Wikidata contents) but as prose written in a specific artificial language. In my opinion this would be like declarative programming code, which also consists almost entirely of facts and declarations. That's why I would like the authors of abstract articles to be protected under copyright just as the authors of other language versions are. Msz2001 (talk) 19:00, 1 December 2021 (UTC)
  12. Regarding the open questions on Abstract content and the code of Wikifunctions: for abstract content, I lean towards CC0, as I would expect it to be mostly machine generated. For Wikifunctions code, we have had a tradition of GPL. But then the Lua in Wikipedia is CC BY-SA? For a business-related project I have been involved in, it was Apache 2.0. I would perhaps be slightly in favor of GPL. -- Finn Årup Nielsen (fnielsen) (talk) 15:17, 7 December 2021 (UTC)
    I was confused by the terminology. I would say "Output Content" could be CC0. The "Abstract Content" is, I suppose, the "declarative" content. I now think that CC BY-SA would be better than CC0. It would allow information to move between wikis, though it may be difficult to see how the abstract content can be set up for Wikipedia content. -- Finn Årup Nielsen (fnielsen) (talk) 15:28, 7 December 2021 (UTC)

Support but with modifications

  • I support most of it, and I'm not necessarily asking for a modification. However, I'd love to see at least an explanation for the choice of a permissive license for the implementation. See a more detailed question below. --Amir E. Aharoni (talk) 12:26, 23 November 2021 (UTC)
  • I support the output content being licensed CC BY-SA, as long as we have a description of how attribution will work *and* an assurance that the output content CC BY-SA can be maintained *in practice*. What I'm worried about is a situation such as the following:
A big tech actor takes the abstract content and the "lower level" software without attribution and then runs the software with Wikidata and the Abstract Content to get the same Output Content that Wikipedia might use, without any attribution required. Thus Wikipedia would have to attribute, but big tech would not have to attribute the same content. Can you reassure us that such a situation would not arise, and, if it did, that WMF legal would go to court to enforce the CC BY-SA license? I'll note that there are lots of cases of Wikipedia content being used without attribution, but no legal action is taken to enforce the license. Would the same situation arise here in practice? Smallbones (talk) 15:12, 24 November 2021 (UTC)
Regarding non-compliance with the license, I want to refer to this section on English Wikipedia. Given that, do you still hold your recommendation for CC BY-SA, or do you prefer another license? --DVrandecic (WMF) (talk) 21:29, 25 November 2021 (UTC)
Well, I do know for a fact that the WMF spoke to Amazon about their lack of attribution in Alexa – with, I believe, a positive result in the end. Admittedly though not the same as going to court. --Andreas JN466 00:36, 26 November 2021 (UTC)
Exactly. There are many options in such a case, and Smallbones' request to give assurances about going to court would limit the WMF's options considerably. --DVrandecic (WMF) (talk) 00:28, 30 November 2021 (UTC)
  • In general terms, I would adopt CC0 for data and functions, and CC BY-SA for the generated content. -Theklan (talk) 16:51, 24 November 2021 (UTC)
  • Only support CC BY-SA for Abstract/output content, object to CC0 for Abstract/output content. Would prefer GPL to Apache. A key point to me is that re-users should have to indicate provenance. If lots of voice assistants, info panels etc. all end up agreeing with each other, end users should be able to see that this is because they all draw on the same Wikimedia source, and not because the entire world is in agreement. The other key point is that re-users should be forced to use the same licence for derivative works rather than converting free (gratis and libre) volunteer work into proprietary works. --Andreas JN466 18:29, 24 November 2021 (UTC)
  • Support CC BY-SA and GPL licenses, as appropriate. I wrote my reasoning in the section below. --Strainu (talk) 19:40, 24 November 2021 (UTC)

Questions and clarifications

I hardly know what this discussion means.

  1. If this same content were coming from the most closed and restricted corporate actor, then what licenses would be applied to the various components? I want to know the precedent in the field.
  2. Can you show a third-party non-wiki example of these components in action anywhere, and also describe the licenses in place for the parts?
  3. What professional field produces this kind of content? Is this software, math notation, creative text, or what?
  4. Do different components routinely have different licenses?
  5. For which components, if any, are there past claims that they are uncopyrightable?
  6. Is this more of a casual wiki licensing decision, or is this a situation where the Wikipedia platform is an Internet giant and what we decide may potentially influence a generation of development in a field?
  7. "Wikimedia Foundation's submission in response, we explained that AI algorithms should be treated like any other software tool and that the tool's user should be considered the copyright holder" - link to that please. That seems an ethical and social position which needs to come from organized and informed Wikimedia community discussion, and not from a legal opinion. In the Wikipedia ecosystem a claim like that seems like it should come from discourse rather than from an office or role of designated authority. I am not aware of any past such community conversation.

Thanks. Blue Rasberry (talk) 00:07, 23 November 2021 (UTC)

Hi, I will try my best to answer the questions, but note that I am not a lawyer. Also, in order to speed up the communication, I did not have my answers checked by multiple departments etc., so please bear with me: I might make mistakes, might have forgotten some things, and might not be aware of everything.
  1. I am not sure what you mean by precedent in this field, sorry. To the best of my knowledge we are the first large-scale system aiming to allow a community to collaborate on and create multilingual content and generate language using a symbolic approach to make this content available to many. Other systems have been doing some parts of these, such as the Canadian weather service which has been using NLG technology for a long time in order to publish weather reports, etc. I do not know what kind of licenses they have used for their components. If I were to imagine the most closed and restricted actor, I would easily imagine that they don't publish the code at all, that they don't publish the abstract content at all, and that they merely publish the resulting natural language text under the normal rules of copyright. I hope that answers the question.
  2. No, I am not aware of any such system.
  3. I am not aware of any professional field producing text in many languages using a similar system.
  4. Yes. Think about Wikimedia in general: Wikipedia's text is published under CC BY-SA, Wikidata under CC 0, our software under various licenses (e.g. MediaWiki under GPL), and the images on Commons are also using a large number of licenses.
  5. The text by the Legal department discusses this, in particular for facts and for APIs.
  6. It is hard to predict the future. Wikidata's licensing decision has been used in several external places as an argument in favor of using CC 0 in other places too. Whether the decision for licensing the components of Wikifunctions and Abstract Wikipedia will have a similar effect or not remains to be seen.
  7. Here is the link to the document which has also been added into the project page by Quiddity.
Thanks for asking! DVrandecic (WMF) (talk) 22:23, 23 November 2021 (UTC)
@DVrandecic (WMF): Thank you for the decisive answers. I now have a better understanding that the choices we have to make are new. I was missing that context. This is helpful, I will think more. Blue Rasberry (talk) 22:40, 23 November 2021 (UTC)

I see what the function signatures and implementations may look like, and I agree that CC0 and the Apache License, respectively, are reasonable. But I have no idea what exactly the abstract content is going to look like. I'm quite sure that the example with Jupiter being the largest planet is an example of bare information that is not copyrightable. But the full article will certainly be more complex, and I wonder if the selection and arrangement of facts is creative enough to be copyrighted. So what do you imagine the abstract content will look like? And how is the information produced by each statement going to be joined to produce readable text (something more complex than a first-grader's "It's a pencil. It's blue. I like it. I use it very often")? Msz2001 (talk) 13:45, 1 December 2021 (UTC)

@Msz2001: The Jupiter example is here: Abstract_Wikipedia/Examples/Jupiter. The text is basically the CC BY-SA text of the Simple English Wikipedia article. It reads as follows:
Jupiter is the largest planet in the Solar System. It is the fifth planet from the Sun. Jupiter is a gas giant, both because it is so large and made up of gas. The other gas giants are Saturn, Uranus, and Neptune.
Jupiter has a mass of 1.8986×10^27 kg (318 earths). This is more than twice the mass of all the other planets in the Solar System put together.
Jupiter can be seen even without using a telescope. The ancient Romans named the planet after their god Jupiter (Latin: Iuppiter). Jupiter is the third brightest object in the night sky. Only the Earth's moon and Venus are brighter.
Jupiter has 79 moons. Of these, around 50 are very small and less than five kilometres wide. The four largest moons of Jupiter are Io, Europa, Ganymede, and Callisto. They are called the Galilean moons, because Galileo Galilei discovered them. Ganymede is the largest moon in the Solar System. It is larger in diameter than Mercury. In 2018 another ten very small moons were discovered.
Recreating that text in Abstract Wikipedia and licensing it under CC-0 would be a blatant infringement of the Simple English Wikipedia's CC BY-SA licensing terms. --Andreas JN466 18:25, 1 December 2021 (UTC)
Many thanks for your answer. I see now what it's going to look like and I'm against the CC0 too. Msz2001 (talk) 18:29, 1 December 2021 (UTC)

Discussions

  • How big of a decision is this, and how much can our decision affect future Wikimedia activities, organization, and expenses? If this is a relatively small Wikimedia community decision, then a typical talk page chat like this one is sufficient. If there are big issues here then perhaps we have a bigger conversation. I do not have a good understanding of the potential impact of the answers we have here, but I do have some intuition that the decisions made here will have effects at scale, and potentially apply licenses to much or almost all of the Wikimedia Movement's future output. I neither want to be hasty in setting standards, nor do I want to delay this discussion too much. If this really is a big decision, I would like diverse participation in the discussion. The Wikimedia Foundation is investing in this tool and while I trust the WMF for what it is, that organization does not reflect the diversity of our movement, and also it has a tendency to adopt practices which align with the commercial tech industry when there are other options. Again, I lack understanding of how important the decision here is, but I do feel hesitation when a big decision is put on the table and I cannot recognize who would benefit from the various choices that can be made. I think there is not supposed to be a choice here at all, but rather just a description of current perspectives, but we already know that there is a lot of diversity in licensing beliefs, laws, and actual user practices. Those things already are not aligned with each other.
I would be more happy here the more that I thought that the outcome of this discussion had informed Wikimedia community participation within it.
Big thanks to everyone who put this discussion together. This is a great topic to discuss. Blue Rasberry (talk) 22:57, 23 November 2021 (UTC)
@Bluerasberry: I agree, I would love to see wide participation when discussing this question. Then again, I also understand that the discussion is abstract and about a difficult topic with many subtleties. I don't think it is easy to know and understand all the ramifications of the different decisions. I don't claim to do so myself, which is why we are having this conversation, to make the best possible decision.
You write that a bigger issue needs a bigger conversation than a talk page. What do you mean by that? We have traditionally made very big decisions on talk pages, and I thought that's one of our strengths.
On the one side, I do not think that the impact of the licensing decision for the code in Wikifunctions is as big as you describe: it would not apply licenses to all of the movement's future output. Function application to content from other sources, in general, would and should have no effect on the license of the output. If you put a copyrighted text into a "capitalize" function, the result remains copyrighted text - the license of the code for the function that was applied does not make any difference.
On the other side, the decision about the license of Abstract Content might have an impact on Wikipedia and how the content of Wikipedia can be reused. For example, if we choose a CC 0 license for Abstract Content we would now offer an alternative text to the handwritten text, without the restrictions of CC BY-SA.
Ultimately, we believe that the copyright licenses used by the Wikimedia projects should empower as many users as possible to easily collaborate by sharing knowledge (see also the legal note on the use of CC).
It would be extremely helpful for me to understand which questions in particular you have. Then I can try to give some thoughts about them. The page also offers some questions which might help us guide this discussion, in case the big picture gets too big. Thanks for clarifying! --DVrandecic (WMF) (talk) 00:13, 24 November 2021 (UTC)
I do not know what I think, except that I am unsure how big of a decision this is. I am going to take a break from this conversation after making this post, but thanks if you have any brief response.
Denny: you say "-the licensing decision for the code in Wikifunctions ... not apply licenses to all of the movement's future output", and also " if we choose a CC 0 license for Abstract Content we would now offer an alternative text to the handwritten text, without the restrictions of CC BY-SA".
To me, this sounds like using data science to generate content for humans to read. I think this has the potential to quickly account for 50%+ of Wikimedia content, and possibly ~99%+. The idea is to convert Wikipedia's content to uncopyrightable statements, then use some algorithm to present the facts in every language for every reader.
Denny, you say "I do not think that the impact of the licensing decision for the code in Wikifunctions is as big as you describe".
If we have this process for extracting meaning from copyrighted Wikipedia text and converting it to uncopyrightable info, then when does our process get compared with anyone else doing the same thing to other media, like news, journalism, research, or novels? Can the same process scramble a Harry Potter book into a public domain approximation with renamed characters and changed situations? What will Disney's lawyers say when someone copies the Wikimedia community process and does approximate movie remakes of their content? I know there are lots of experiments out there with students and hobbyists, but when Wikipedia takes a position on this, then that seems like a milestone.
Whatever the outcome of this, I want it to advocate for the community. The Wikimedia community will choose the option which advances the Wikimedia Mission. To make an informed choice, I wish that I better understood what corporate interests want here. You seem to be saying that corporate advocates have not yet taken a position. If corporations have, then I do not want anyone normalizing their position yet before the Wikimedia community has its own independent discussion on the matter.
Questions:
  1. What percentage of Wikimedia readers will have a significantly different user experience as a result of this decision? I think the answer is ~100%
  2. What percentage of Wikimedia content will be modified by this process? I think the answer is ~100%
  3. Who is going to talk about what the Wikimedia community decides here? A lot of people, including outside the wiki community
I will answer the questions on the main page with my opinion and position.
  1. What are the long-term objectives of the projects, and how can a copyright license support these objectives?
    The goal is a next-generation increase in the amount, quality, and production speed of Wikimedia content. Copyright licenses are inherently designed for the benefit of corporations and commercialization, and only the unusual activist licenses like those of Creative Commons can favor communities and public benefit. Almost no Wikimedia contributors make actionable claims on copyright with their contributions, and almost all Wikimedia community members would like to see increased use of free and open licensing if not government regulation compelling less corporate copyright protection for global community benefit. There are years of precedent of license-violating re-use of user contributions to Wikipedia, and none among the Wikimedia Foundation, Creative Commons, or existing community organizations have ever been able to defend content creators effectively from inappropriate content use through the Wikimedia platform. For the technical components which this project describes, unchecked inappropriate reuse is even easier and more likely. Copyright protection will not favor Wikimedia community members in this case but will favor corporate actors. I do not wish to adopt the same idealistic myths of user protection for functions and their output, because those myths favor corporations and not the community.
  2. Should the people involved in creating Abstract Content receive credit?
    No. When communities enter the copyright game they cannot compete with corporate players and commercialization, and business interests will overwhelm community contributions. The precedent to make here is not an internal Wikimedia community decision, but the demonstration that if Wikimedia projects take a position then that position is a societal norm which should apply even outside of the Wikimedia platform. Sooner or later corporations are going to take a position on this issue, and when they start suing each other and the general public, the courts are going to ask what the precedent is. If Wikimedia projects have already established a precedent which normalizes advocating for community and public benefit, then corporate interests will have much more difficulty in claiming that other positions are natural. The community has a lot to gain by normalizing its own claims and practices that AI generated media and as much of the related machinery as possible is ineligible for copyright. These claims are reasonable and sensible as well, even though soon corporate propaganda will claim otherwise in an attempt to capture and privatize the public commons.
  3. How valuable is it to preserve consistency and compatibility with the licenses on Wikipedia?
    It is only valuable if that past consistency supports Wikimedia community plans for the future.
    1. When Wikipedia was established, the conversation on licensing was less developed and there was less corporate competition. Activists are more experienced now and understand free and open media better. The discussion now matters more and is more informed than past discussions.
    2. Licenses with Wikipedia have changed in the past. Originally Wikipedia used the GNU Free Documentation License (GFDL) for copyright. Somewhere along the way, the Wikimedia Foundation changed the copyright text from saying that Wikipedia text was available with either CC licenses or the GFDL to "Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply". We used to say that we had to mention the GFDL as terms of the license, but now that license commitment is fulfilled by saying "additional terms may apply". The social context for change was that some people wanted to distance themselves from the GFDL for whatever reasons. This did not happen with a broad discussion or published legal opinions. It just happened because enough people wanted it and no one objected. Likewise, Abstract Wikipedia can do new things and break precedent also. A similar conversation is happening with a transition from CC 3.0 to 4.0; the 3.0 license does not describe updates while the 4.0 license does, and there is discussion about transitioning licenses somehow through some process, converting some 3.0 licenses to 4.0. Somehow interpretations change with culture and time and what seemed impossible can just change.
    3. Rules of Wikimedia projects exist to serve the community. The community does not exist to serve the rules. When making big decisions, we do not need to assume that all precedents are a divine consensus from infallible early commentators. There are some guiding principles we carry forth, like the fifth pillar which says "ignore all rules", and which we can apply to say that if the community wants a change then we can enact it.
Thanks. Blue Rasberry (talk) 15:30, 24 November 2021 (UTC)

Alternate proposals

  • Both CC0 and CC-BY-SA SHOULD be supported for Abstract Content. Regardless, Wikidata material about real-world entities, languages etc. MUST stay CC0 only
It SHOULD be possible to publish Abstract Content and Output Content under either of CC0 and CC-BY-SA, depending on what the user chooses. If CC0 licensing is supported, content that is so licensed can and SHOULD be stored in Wikidata (subject to agreement by that project). One could expect that this would mostly involve either statements of fact that are not feasible for the current ("flat", non-compositional) Wikidata model, but where the choice of some peculiar expression or presentation is largely immaterial, or literal, verbatim "translations" of claims, quotations, texts etc. from pre-existing sources in the public domain.
A dedicated Abstract Wikipedia site, probably loosely comparable to the current Wikimedia Commons, would then be used to work on CC-BY-SA content, including content that's imported from other Wikimedia projects. Regardless, CC-BY-SA-licensed formal content about real-world entities (including lexical and/or grammatical data about languages) MUST NOT be published on Wikidata and/or commingled with the existing CC0 material - this would make Wikidata licensing legally uncertain, which would destroy much of its current usefulness to outside reusers! Hupaleju (talk)
If non-CC0 content is to be mixed together with CC-BY-SA content on Wikidata, it would indeed need to be very clearly separated, also technically. Agreed.
Content that is either CC0 or CC BY-SA (or in fact, under any other or even no license) can be used with functions on Wikifunctions. Abstract Content can be published under a multitude of licenses. Agreed.
One of the questions is, what should the Abstract Content of Abstract Wikipedia, that will be used in the Wikipedias, be published under? I am not in favor of having two different licenses for that, and I understand that is your proposal? --DVrandecic (WMF) (talk) 01:26, 30 November 2021 (UTC)
To clarify my POV explicitly, in my view, if non-CC0 content-- including CC-BY-SA! -- is to be mixed together with existing Wikidata material, this ought not to occur "on" Wikidata at all but on a separate site with a distinct project identity. IOW, "Wikidata" itself ought to ensure that all of *its* material relating to the real world is still CC0 licensed, regardless of how it's encoded. Anything else would introduce confusion and serious legal uncertainty for reusers. A separate Wikimedia site could nonetheless freely *import* material at will from Wikidata itself, and new Wikibase federation mechanisms make this quite technically feasible.
As to the matter of "what should the Abstract Content of Abstract Wikipedia, that will be used in the Wikipedias, be published under", the obvious answer to me, given that such Abstract Content will generally involve creative, copyrightable expression and presentation (plus of course the fact that much content will need to be imported from existing Wikipedias) is CC-BY-SA.
However, in my view, it would be quite desirable to allow for CC0 'Abstract Content' for other purposes -- perhaps most obviously the previously-discussed "Abstract Descriptions" for Wikidata entities, but also allowing for enhancements to the data model of Wikidata itself. The key difference however is that such claims, while somewhat more complex than what Wikidata allows at present, will nonetheless be a matter of "sweat of the brow" effort to make facts about the real world machine-understandable under the *model* of Abstract Content, as opposed to arranging content in a highly original and expressive presentation to create compelling, high-quality encyclopedic text across languages. This is what justifies having two licenses for two radically different projects. --Hupaleju (talk) 14:03, 30 November 2021 (UTC)

Objections to the recommendations

  • Object to CC0 for Abstract content/text output. A key point to me is that re-users should have to indicate provenance. If lots of voice assistants, info panels etc. all end up agreeing with each other, end users should be able to see that this is because they all draw on the same Wikimedia source, and not because the entire world is in agreement. Another key point to me is that re-users should be forced to use the same licence for derivative works rather than converting free (gratis and libre) volunteer work into proprietary works. --Andreas JN466 18:43, 24 November 2021 (UTC)Reply
  • I object to releasing code under the Apache License v2.0. There is a significant loss of value to the Wikimedia movement in generating functions that can be immediately proprietarized by someone else. A copyleft license like GPL should be used instead for all the same reasons we use CC-BY-SA as the default license for text. Furthermore, using the GPL is advantageous in that code which is licensed under any compatible license may be used under the GPL - this means we can reuse MIT, Apache 2, and plenty of other licensed-code if the code license is GPLv3 or later. That's a huge starting advantage for a project like this. Legoktm (talk) 06:28, 28 November 2021 (UTC)Reply
  • I would also like to object to Apache 2.0 as the initial default per Legoktm, although I am not opposed to allowing code to be licensed under Apache 2.0 so long as the GPL or a compatible copyleft license is the default provided in the interface when adding new code. Mahir256 (talk) 03:23, 29 November 2021 (UTC)Reply

FAQs - Questions and answers

Some questions that showed up during the internal discussion and their answers.

From a licensing perspective, does text generated with AW differ from text directly contributed by the community?
That's exactly the question we have to answer with this discussion. Directly contributed text on the Wikipedias is published under a CC BY-SA license. Legal recommends that CC BY-SA would be an option for the Abstract Content and thus for the Output Content generated from the Abstract Content, so it may be the same if we decide so. But it could also be a different license, in particular CC0 or CC BY.
Is there any way to ensure that a given language's community has the final word on what makes it into an article?
Yes! Albeit not legally but technically: every Wikipedia language community will always have the opportunity to locally override Output Content coming from Abstract Wikipedia (or even to not enable it in the first place). The local communities retain the final decisions on the Wikipedia articles.
Will the list of contributors to an Abstract Wikipedia article be accessible via a "History" tab within each Wikipedia page where it is used?
It depends. There are (at least) two ways to create articles from Abstract Wikipedia: either by having individual function calls in the body of the article in the language Wikipedia, or by having the whole article created from the abstract content in Abstract Wikipedia. Regarding the first way, see below. In the second case: yes, we can link to the history of the abstract content easily.
How will the "History" tab work if there is partial AW-content & partial local-content?
The History tab does not currently include changes to imported content, be it through changes in templates, transcluded pages, media in Commons, etc. So if we wanted to do that, it would be a bigger change throughout the ecosystem. Recent Changes, though, could easily integrate such changes.
What about via the API if someone is crawling or spidering Wikipedia?
For content or history? For history, see above - we usually do not do that currently with any form of included content. For content, depends on the exact API, whether one is asking for the raw content (in which case, no), or for the rendered content (in which case, yes).
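
As a rough illustration of the override behaviour described in the answers above, the following Python sketch shows one way the precedence could work: a local community's own text wins over Output Content, and a wiki that has not enabled Abstract Wikipedia receives nothing from it. All names here (`LocalWiki`, `abstract_enabled`, `local_override`, `resolve_article_text`) are hypothetical and chosen only for illustration; they do not correspond to any actual MediaWiki or Wikifunctions API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalWiki:
    """Hypothetical per-language settings; not an actual MediaWiki structure."""
    abstract_enabled: bool = True         # a community may choose not to enable it
    local_override: Optional[str] = None  # locally written text, if any


def resolve_article_text(wiki: LocalWiki, output_content: Optional[str]) -> Optional[str]:
    """Return the text a reader would see, giving the local community the final word."""
    # Locally written content always takes precedence.
    if wiki.local_override is not None:
        return wiki.local_override
    # Otherwise fall back to Output Content, but only if the wiki opted in.
    if wiki.abstract_enabled:
        return output_content
    # Not enabled and no local text: nothing is shown.
    return None
```

In this sketch the precedence is purely technical, matching the answer above that the guarantee is "not legally but technically" enforced.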

For Function Implementation, consider following Wikimedia Commons

Hi, with the "Function Implementation" section, I'd recommend taking a similar approach to Commons - see commons:Commons:Licensing - i.e., allow all free licenses, as long as they meet certain requirements. Commons isn't mentioned in this section, so I'm not sure if this has been overlooked or considered but dismissed, but this seems to be a good way to store multiple smaller works, since it maximises the number that can be stored without causing license problems. Probably there are issues with combining code with multiple licenses together, but that same thing happens when you create a compilation of Commons media, so presumably it's manageable. If there are specific issues with individual licenses (e.g., GFDL for images), then add exceptions for them like commons:Commons:Licensing#GNU_Free_Documentation_License. Moreover, have a community process that lets you decide which licenses are acceptable - including new licenses that might appear in the future. Thanks. Mike Peel (talk) 07:52, 23 November 2021 (UTC)Reply

Thanks for the comment, Mike! Legal indeed outlined that possibility, and we are thinking about it. My current thinking is to launch with only a single license, and then see how things work out - and then, once we have more data and can see whether there is a need for other licenses, to implement support for multiple licenses if that would help the project. But I would like to hear arguments for and against using multiple licenses, and also if that's a requirement pre-launch or if that can wait. --DVrandecic (WMF) (talk) 22:40, 23 November 2021 (UTC)Reply

Why permissive and not copyleft?

Most of the recommendations make sense to me, but why is the Apache license suggested for the implementation and not a copyleft license, such as GPL or AGPL? Why is it good for the license to be permissive? Wouldn't there be advantages to using a stricter copyleft license to protect the usage?

AGPL, in particular, was in the news recently. I'm not a lawyer, and I'm not sure whether such a thing is useful for WikiFunctions, but my intuition tells me that it's a good thing to consider.

At the very least, the choice of a permissive license must be substantiated and not just declared. Amir E. Aharoni (talk) 12:24, 23 November 2021 (UTC)Reply

Hi Amir! In the section on Software, legal explains that their recommendation follows current best practices for our software development. Most of our software work is published under less restrictive licenses than the GPL, with some notable exceptions (such as MediaWiki itself). The recommendation is to follow our current best practices. Note that this is not a declaration, but a recommendation with a discussion, which is what we are doing here right now. --DVrandecic (WMF) (talk) 22:40, 23 November 2021 (UTC)Reply
MediaWiki itself is kind of a big thing. Most MediaWiki extensions are GPL, too (although notably, WikiLambda is MIT; why?). The content of our projects is CC-BY-SA, which is also copyleft.
The reason for publishing the iOS app under a non-copyleft license is Apple's App Store policies, which are not friendly to copyleft, but the iOS app is far from being the most important thing in our universe, and it doesn't represent all the best practices.
So just saying that permissive licenses are our current "best practices" is quite unclear, and it's not a good enough explanation. Amir E. Aharoni (talk) 06:14, 25 November 2021 (UTC)Reply
@DVrandecic (WMF): I do not believe "Most of our software work is published under less restrictive licenses than the GPL" is correct. I believe the opposite, that most Wikimedia code is under a copyleft license like the GPL. As Amir says, saying MediaWiki is an exception is like saying most of the internet is under a restrictive license and Wikipedia is an exception, so we should use a restrictive license. In this analogy, what we care about is Wikipedia and MediaWiki - the other stuff is mostly tangential.
Furthermore, WMF employment contracts designate GPLv2 or later as the "typical license" that code is released under. Code posted on Phabricator defaults to the GPL unless otherwise specified. I don't understand why we wouldn't continue using copyleft here. Legoktm (talk) 06:18, 28 November 2021 (UTC)
@Legoktm: Sidenote: I'm going to revert your edits in the project page, because that part of the text was supplied by the legal team. However, we'll ask them to take a look at the suggested changes. Thanks for the input here and there! Quiddity (WMF) (talk) 23:47, 29 November 2021 (UTC)Reply
@Quiddity (WMF): ack. Would it be ok to add a "[disputed]" or similar tag and a pointer to this discussion then? Legoktm (talk) 02:13, 30 November 2021 (UTC)Reply
@Legoktm: Now re-added, edit approved. :) Quiddity (WMF) (talk) 17:49, 30 November 2021 (UTC)Reply
I would also find it more consistent to have a copyleft license for the function implementations. I understand from the previous answer that most of the software work of the Wikimedia Foundation uses permissive licenses, but I'd argue MediaWiki itself is the exception that counts, and, most importantly, all the contributed content is currently copyleft. This means that the information remains free, even when taken outside of Wikipedia and modified; and this is desirable. I don't think the function implementations should be handled differently. A copyfree license would mean that someone could modify the content, modify the function implementations, and not provide the new implementation, effectively making the new content potentially very hard to use. I think the ideal license would be a dual (A)GPL + CC-BY-SA license, so we are sure about the compatibility, and we still allow re-using the functions for other things outside Wikipedia (maybe translated into another programming language), while guaranteeing that the modified functions will still be available. Copyleft is how Linux and many projects have thrived, including Wikipedia itself. I'm afraid a copyfree license for parts of the contributed content could weaken Wikipedia. The other licensing decisions otherwise look good to me. --Raphael.jakse (talk) 07:33, 24 November 2021 (UTC)

License for other templates, modules, and gadgets

It's not exactly on the topic of WikiFunctions, but it's strongly related: Now that we're finally discussing the WikiFunctions license, can this be also an opportunity to discuss the license for the hundreds of thousands of templates, modules, and gadgets on the sites? They are much more similar to code than to content, but by default the same content license applies to them. Is it good? Perhaps we could give the editors a choice to apply GPL, Apache, BSD, CC0, CC-BY-SA or some other license to the templates, modules, and gadgets they write? It's probably possible to do it using code comments, edit summaries, or talk pages, but it would be nice to have something more structured, for example a dropdown next to the Publish button. Amir E. Aharoni (talk) 12:30, 23 November 2021 (UTC)Reply

Hi Amir! I think it would be great to have a discussion about the licensing of the Lua code published on the various projects, of the gadgets, and all the other code that is all over the projects. Personally, I am not sure whether CC BY-SA is a great choice for that, as it is a license that is not meant for source code. But I think that the discussion we are aiming for here right now is already complex enough as it is, and would invite that question to be tackled independently as I don’t see much of a dependency between these questions. --DVrandecic (WMF) (talk) 22:43, 23 November 2021 (UTC)Reply
I've replied at User_talk:Amire80#License_for_other_templates,_modules,_and_gadgets since it seems off-topic for this page. Legoktm (talk) 06:42, 28 November 2021 (UTC)Reply

Please clarify which recommendation for Abstract content you would like people to support

The "overview and request" section says "We recommend to publish Abstract Content under the same license as the content of Wikipedia, i.e. CC BY-SA", but the two sections further down detailing the actual recommendations by the legal department and the development team say either CC0 or CC BY-SA might be good. The overview needs to be made consistent with the recommendations somehow, otherwise it won't be clear what people supported or commented on. --Andreas JN466 07:38, 24 November 2021 (UTC)Reply

I've edited the "Overview and request" section to make it consistent with the actual recommendations. See [1] --Andreas JN466 20:51, 24 November 2021 (UTC)Reply
Hi Andreas! I am sorry you found the text confusing. The overview stated "We recommend to publish Abstract Content under the same license as the content of Wikipedia, i.e. CC BY-SA." until you changed it. The recommendation by the development team also states "The development team further recommends to choose CC BY-SA for Abstract Content and Output Content.", and then follows up with a few paragraphs of text explaining why we prefer CC BY-SA. I have re-read the text, and whereas you claim that the development team recommends CC-0 for Abstract Content, I can not find such a sentence (and it would be wrong). Please let me know what led you to that reading so we can fix it. Because of that I am reverting your change to the summary. --DVrandecic (WMF) (talk) 22:19, 24 November 2021 (UTC)Reply
@DVrandecic (WMF): Well, then you won't revert this edit, correct? Because that literally said that the team recommends "to choose either CC0 or CC BY-SA for the Abstract Content and the Output Content." And if the wording after this edit of mine now indeed reflects your recommendation – which I would welcome – then you must also say that you differ from the recommendation of the legal department, because the legal department's recommendation is, and I quote, "Recommendation: Abstract content should be licensed under CC BY-SA or CC0."
So, do you agree with the legal department that CC0 would be good, or do you disagree and recommend CC BY-SA only? Please be clear. Right now you're hedging, because the last paragraph of the team's recommendations brings in CC0 through the backdoor, where you say, On the other hand, one could argue that by putting Abstract Content under CC0 we open the space for a larger amount of possible reuse in applications we cannot even imagine yet, nevermind properly decide the legal framework for. CC0 most certainly allows the most freedom in the reuse of the Abstract Content. So right now, the section is written in such a way that agreeing with you means agreeing that abstract content "should be CC BY-SA and not CC0, but could also be CC0". In other words, agreeing with you right now, as the section stands, gives you carte blanche to do whatever you like later on, and claim you had community support for your position. Regards, --Andreas JN466 22:44, 24 November 2021 (UTC)Reply
I am sorry that the development team's recommendation is still confusing to you. The first paragraph says that the development team agrees with the recommendation by legal, and then reiterates said recommendation - which is that the Abstract Content should be published under CC 0 or CC BY-SA.
The third paragraph then reads "The development team further recommends to choose CC BY-SA for Abstract Content and Output Content." That does not mean we differ from the recommendation by legal, but we follow it and we further specify the option CC BY-SA over the option CC 0 (because legal's recommendation says to choose either one or the other). But I consider that recommendation to be fully aligned with legal's recommendation. We are just more specific by choosing one of the two options.
Your edit though makes it sound as if the legal team has only recommended to use CC BY-SA, which it has not. It said that it is one of two options. I am afraid your edit is misrepresenting the recommendation by the legal team. I am rewriting that to make it clearer that this is the summary of legal.
Since you are asking explicitly: I agree with the legal department that CC 0 would be a good option. I also think that CC BY-SA would be an even better option.
I am not sure how to state more explicitly which license the development team recommends for Abstract Content than with the sentence "The development team ... recommends to choose CC BY-SA for Abstract Content ..." and by having that also stated in the overview as "We recommend to publish Abstract Content under ... CC BY-SA." I think both sentences are rather clear. I wonder if anyone else is also confused about what our recommendation is.
And yes, we list a few arguments for both licenses, in order to get the ball rolling. I think that both licenses would have interesting consequences, and we will implement either of these licenses, whatever the community chooses. --DVrandecic (WMF) (talk) 21:32, 25 November 2021 (UTC)Reply
Well, that wording is better. You now make clear in the first paragraph that you are summarising Legal's recommendation, whereas previously you presented what Legal said as (also) the development team's own recommendation. You said, right in the first paragraph:
The development team recommends to follow the recommendations by Legal, i.e. to choose CC0 as the license for Function Signatures; Apache for Function Implementations (and to start with a single license, and only when we recognize the need for multiple licenses to extend Wikifunctions to support multiple licenses); to choose either CC0 or CC BY-SA for the Abstract Content and the Output Content.
If you still do not see the problem, could you ask someone outside WMF to read that sentence and then tell you what they understand the dev team's recommendation for abstract content is, based on that sentence?
And even though the wording is now better – thank you for that! – there is still a problem. What I have objected to, from the beginning, is that from your "Overview and request" it is not even apparent to the casual bystander that CC0 is an option here – an option that is prominently featured in both your recommendation and even more so in Legal's recommendation. As I've said before, when I first read the overview, I thought, "There is nothing to see here. CC BY-SA – yay!" And then, when I read the rest of the page the next day, incl. that sentence above, I could hardly believe my eyes. I wonder how many people have read your overview since your announcement and stopped there, thinking, as I did at first, that there is no controversy here, no need to even get involved in the discussion. (In fact, people on Facebook did say exactly that.)
Are you okay with people being misled like that by your overview? Is it your intention to keep people like Ralf above, who have a fundamental issue with CC0, away from a discussion which might result in CC0 being adopted? If not, then I'd ask that you kindly rework your overview accordingly, so people get an accurate idea of the discussion's scope.
Incidentally, it would also help to clarify who "We" is in the "Overview and request", given that you have recommendations from both Legal and the development team on the page. By the same token, it is not clear what "these recommendations" in the "Overview and request" actually refers to. Does it refer to the preceding paragraph? Does it refer to the recommendations from both Legal and the development team given further below? --Andreas JN466 23:29, 25 November 2021 (UTC)Reply
I have clarified the "we" and the "these recommendations" in the "Overview and request". I hope this is now better. -- DVrandecic (WMF) (talk) 00:43, 30 November 2021 (UTC)
Yes, much better. Thank you for the updates. --Andreas JN466 00:00, 1 December 2021 (UTC)Reply

We've been here before

I'm reminded of the early licence discussions for Wikidata – which, as we all know, ended up importing masses of data from CC BY-SA Wikipedia and republishing them under CC 0, even though this was categorically ruled out in the early stages:

Alexrk2, it is absolutely true that Wikidata under CC0 would not be allowed to import content from a Share-Alike data source. Wikidata does not plan to extract content out of Wikipedia at all. Wikidata will provide data that can be reused in the Wikipedias. And a CC0 source can be used by a Share-Alike project, be it either Wikipedia or OSM. But not the other way around. Do we agree on this understanding? --Denny Vrandečić (WMDE) (talk) 12:39, 4 July 2012 (UTC)

This past U-turn has left a sour taste for me and permanently undermined trust. It's left me with a feeling that the WMF generally wants to push CC 0 as the superior licence and that any statements to the contrary are mere lip service. So right now, my suspicion is that whatever anyone here says – we'll end up with CC 0 for Abstract content and output content sooner or later. This will remove traceability of content to the Wikimedia movement, while re-users will be free to make any derivative content proprietary. I find neither desirable.

Therefore I would like anyone who does not want CC 0 to be applied to Abstract content and output content to be very clear about it here. Then at least it can't be said later on that the entire community agreed with a recommendation that said CC 0 might be the best way to go – which is what both the WMF legal department's and the development team's recommendations are currently saying. --Andreas JN466 12:22, 24 November 2021 (UTC)Reply

I agree that anyone who does not want CC 0 to be applied to Abstract Content should state so on this page. Legal says that CC 0 is an option, and recommends using either CC 0 or CC BY-SA. The development team further narrows down the recommendation to CC BY-SA. --DVrandecic (WMF) (talk) 22:25, 24 November 2021 (UTC)
It doesn't narrow it down at all. Even now, after this edit, the development team section starts by saying the team recommends to follow the recommendations by Legal and ends with a list of advantages CC0 would have over CC BY-SA. --Andreas JN466 23:39, 24 November 2021 (UTC)Reply
You say it doesn't narrow it down. The text says, summarized: "Legal recommends two options, CC BY-SA and CC 0. We recommend CC BY-SA.". For me that reads a lot like narrowing it down. -- DVrandecic (WMF) (talk) 21:36, 25 November 2021 (UTC)Reply
You say, summarised, "We recommend CC BY-SA, but one could argue that CC0 is more future-proof, and it most certainly allows the most freedom." That's about as narrow as a barn door. --Andreas JN466 23:29, 25 November 2021 (UTC)Reply
Yes, I would agree with your summary, although not with your judgement. -- DVrandecic (WMF) (talk) 00:44, 30 November 2021 (UTC)Reply

Let's stay compatible, let's stay copyleft

I support a copyleft license for all code (GPL, very likely) and copyrightable elements (CC BY-SA), as it is more fair to volunteers and external contributors. Also, it's highly likely that, at least in the beginning, a prime use-case will be people moving content around between Wikipedia and Abstract Wiki. The licenses should not prevent us from doing that.--Strainu (talk) 19:35, 24 November 2021 (UTC)Reply

Thanks for the comment, Strainu! Just a quick note: your reasoning about Abstract Content and Wikipedia content to both be under the same license is sound. Regarding the code though, a copyleft license such as GPL would *not* allow us to move code from Wikifunctions to the Wikimedia projects. Just because both GPL and CC BY-SA are copyleft licenses does not mean that they are compatible and that code can be moved from one license to another. --DVrandecic (WMF) (talk) 00:28, 25 November 2021 (UTC)Reply
DVrandecic (WMF), what I meant was that code should be license-compatible with other code we use and content should be compatible with other content.--Strainu (talk) 14:50, 25 November 2021 (UTC)Reply

Do we really have a choice?

(This refers to “Abstract Content” and “Output Content” as defined on the front page. “Function signatures” and “Function implementations” are ignored here. Disclaimer: IANAL.)

The notion of this discussion is that we can pretty much freely choose a license for Abstract Wikipedia, with the options being the license CC-by-sa and the license-waiver CC0. However, this is based on the premise that the works to be created are copyrightable at all—which I doubt to be the case. After reading the WMF position on copyright of AI/ML generated content, I can understand how one could argue that these works may be technically copyrightable under some conditions, but I am not sure whether this really translates 1:1 to this situation.

“Abstract content” is pretty much non-copyrightable factual data taken from CC0-Wikidata, but with an added copyrightable selection and order/arrangement. The latter two aspects seem theoretically copyrightable. However, a threshold of creativity needs to be passed here in order to have these copyrightable aspects in the “Abstract content”. I suspect that this will often not be the case, since both selection and order/arrangement seem to be pretty much pre-determined by data availability from Wikidata, and text render function availability from Wikifunctions; I also expect generation of “Abstract content” to be a highly schematic process (read: no creativity required), so that much of it will eventually be created with automated processes.

The textual form of the “Output content” will be pretty much determined by the text render functions used. For copyrighted text, it legally is usually not a problem to extract the information (pretty much its “abstract content”), and to rephrase it into another textual form containing the very same information in the very same order. When done properly, this does not infringe existing copyright of the original textual representation. I fail to see how this “Output content” should be copyrightable to an extent that users are effectively restricted to comply with CC-by-sa. I also do not understand how the authors of the “Abstract content” should be considered as authors here when the actual textual representation is much rather a result of work done by function authors at Wikifunctions.

In summary, choosing CC-by-sa might give us the illusion that the content here is copyrightable and protected under the terms of that license, when effectively this might not be the case for large and substantial amounts of the content if one were to litigate this in court. This choice would also make it difficult to reuse data, since the legal situation from a reuser perspective seems to be unnecessarily complicated, which often scares potential reusers away entirely.

I would also like to mention that this situation is not really new. People are still surprised that facts are being imported from CC-by-sa Wikipedias to CC0 Wikidata. However, the CC-by-sa license does not make this legally impossible at all. Another notable example of a problematic licensing situation is OSM with its OdbL copyleft license which tries to put up a fence around the project’s data, but fails to be an effective protection as desired by many OSM contributors. —MisterSynergy (talk) 01:14, 25 November 2021 (UTC)Reply

The WMF lawyers seem to have looked at this, and came up with CC BY-SA as an option. The ultimate output of Wikifunctions is text, much like Wikipedia. Why would CC BY-SA make reuse of these texts more onerous than use of Wikipedia texts? --Andreas JN466 01:34, 25 November 2021 (UTC)Reply
I am not sure how much WMF lawyers have actually looked at the details. It all seems to be based on this paper, but the outcome is not really transferred to this situation. IMO a key problem is that “Abstract content” seems to be only weakly copyrightable at best due to its nature as factual data and the pretty low level of creativity required to generate it.
CC-by-sa for potentially not copyrightable content would confuse reusers. Regular Wikipedia articles are overwhelmingly indeed copyrightable, thus CC-by-sa is a sane choice there. —MisterSynergy (talk) 10:29, 25 November 2021 (UTC)Reply
MisterSynergy As I understand it, abstract content means you compose sentences much as you do in ordinary language use. You have a database of knowledge about the world and its constituents and their possible relationships, and words for them to enable you to talk about them. Abstract content translates these statements into a kind of machine language, so they can then be more easily translated into any human language, using a limited vocabulary, but you are still fundamentally writing text, expressing ideas, etc., just as you do when you write in a human language. --Andreas JN466 10:46, 25 November 2021 (UTC)Reply
Here we are getting to the actual point. I would like to hear from WMF/Denny whether this question of “Abstract content” copyrightability has been evaluated explicitly, since my impression of the nature of “Abstract content” is a much simpler one. —MisterSynergy (talk) 10:52, 25 November 2021 (UTC)Reply
Good point. But it's not just that, actually. There is also a language translation process involved before you get to an output sentence like "Jupiter is the largest planet of the solar system". Each and every type of abstract content has to be assigned a human-made or human-reviewed translation into every human language to be served. There will be innumerable challenges based on the intricacies of languages' different grammatical structures – differences in terms of how grammatical gender is or isn't distinguished, case endings, verb endings, ergative languages etc. – all of which will require output checking and fine-tuning by humans. Translation is a creative act, even when assisted by AI; translations are copyrightable. --Andreas JN466 11:04, 25 November 2021 (UTC)Reply
Yes, but all of this is in scope of Wikifunctions, thus the code license applies here. The process of “Abstract content” generation is really not much else than a selection of facts to be included in the output, from all the facts available in Wikidata, and also its arrangement in the “Output content”. This process might principally include some copyrightable aspects as described above, but I am pretty sure that it is extremely schematic and can probably be automated to a large extent, e.g. by using some sort of generic “Abstract content” templates that fit for all data items of a given type (e.g. humans, cities, companies, etc.).
Generally, the difficult parts of “Abstract Wikipedia” are the data and lexicographical repository Wikidata (excessive to compile, we’re already 9 years into this), and Wikifunctions (complicated because it needs to generate the text). Licensing of both is pretty much undisputed. Any user can come and use these tools to generate their own texts, without using “Abstract content” from Abstract Wikipedia, but using information from Wikidata only. In some sense, the “Abstract content” is simply a demonstration how Wikifunctions can be used to generate text for a given data item and users are free to alter the output as much as they like. We can of course not require them to make any textual output of Wikifunctions functions available under a specific license. —MisterSynergy (talk) 13:17, 25 November 2021 (UTC)Reply
Have a look at Abstract_Wikipedia/Examples/Jupiter. This seems more than "extremely schematic" to me. I don't agree that the resulting text lacks any kind of creativity, or "Schöpfungshöhe", and is just like an alphabetical series of phone book entries that could not really have had any other meaningful form. --Andreas JN466 13:30, 25 November 2021 (UTC)Reply
The Development Team actually put that very nicely, when they say:
  1. Editors have a very fine-grained selection of which facts are being displayed and which are not. In Wikidata we strive for completeness over careful selection.
  2. Editors have a very fine-grained control of the order the facts are being displayed in, constituting narrative elements, which are not available in Wikidata.
  3. We expect the natural language generation to allow editors to express to some degree of emphasis and selection of wording.
All of these point towards Abstract Content being more similar to text than to a collection of facts, and therefore we suggest that we follow the same license that we use for the text in Wikipedia, which is CC BY-SA.
That Jupiter text is a good example of that. --Andreas JN466 23:55, 25 November 2021 (UTC)Reply
Hi @MisterSynergy:, I wanted to answer your question whether the legal team has evaluated the question of Abstract Content copyrightability explicitly: yes, they have. We discussed this at length. A bit more text below. -- DVrandecic (WMF) (talk) 00:58, 30 November 2021 (UTC)Reply
That’s a great question, and honestly, a very difficult one. In the end we will only have clarity given more legal precedent, which is currently lacking, or clearer legislation. Until then it will remain guesswork.
Will every text generated for Abstract Wikipedia clearly pass the bar to be copyrightable? Probably not. Will some text generated for Abstract Wikipedia pass the constraint to be copyrightable? I very much hope so.
Will this decision be cited as precedent for other projects generating text? Very likely. What kind of precedent do we want to be? What kind of precedent will be most conducive towards our mission to a world with more free knowledge? One could argue that a precedent of CC 0 could move other publishers to also choose CC 0, as we have seen with Wikidata. Others (I assume Andreas to be one of them) could argue that particularly for algorithmically manipulated text, a license such as CC BY-SA will strengthen transparency and thus would be a better precedent. I am split on that.
Will we want to use text from the Wikipedias as a source to guide the creation of content for Abstract Wikipedia? Yes, it would be naive to assume that this won’t happen. Would having a different license for that be problematic for that goal? Maybe. It will take a considerable time to be able to catch up with the prose in the Wikipedia articles. One could easily argue that any content in Abstract Wikipedia will be a new creation from scratch, and not merely just a translation of an existing Wikipedia article. -- DVrandecic (WMF) (talk) 01:36, 30 November 2021 (UTC)Reply
You know, Denny, if the availability of free knowledge is the goal here, what's the problem with people accessing that knowledge on Wikipedia, where it will always BE available? And don't kid people: CC BY-SA will still result in global reuse, just as it does today. There's surely no need to "Flickr-wash" Wikipedia content from CC BY-SA to CC0 (which would "maybe" be problematic, you say?? Wow!) to ensure that. --Andreas JN466 07:53, 1 December 2021 (UTC)Reply

Heather published this as a chapter in the recent MIT Press Wikipedia@20 book (eds. Joseph Reagle and Jackie Koerner, open access).

As argued in that chapter, there are consequences for both end users and volunteers when "Wikipedia’s facts are now increasingly extracted without credit by artificial intelligence processes that consume its knowledge and present it as objective fact." --Andreas JN466 13:19, 25 November 2021 (UTC)Reply

I asked on the talk page of Wikimedia Enterprise API whether it would be possible to hold a video conference with the customers of the Wikimedia Enterprise API. This could be a chance to understand how they currently process information from the Wikimedia projects. Stating the source of a statement is important and, from my point of view, should be done even when the license does not require it; I also see it as a responsibility of the public who consume content to think about the source. Consumers should check that a source is mentioned and contact the people responsible if it is not. As I understand it, this is independent of the license of the content. As far as I know, some knowledge graphs, such as the Google Knowledge Graph, use a kind of text mining. I am not sure that a CC BY-SA license can stop them from doing that, so the information could reach users that way as well. As far as I have read, there were discussions about an artificial intelligence system called Copilot, offered by GitHub, which proposes code snippets to users and is based on a large collection of code from free software projects and free texts. I read a German opinion piece about that on Netzpolitik.org, so I am not completely sure whether the legal view mentioned is a wider consensus.--Hogü-456 (talk) 19:31, 25 November 2021 (UTC)Reply
Just wanted to say +1. -- DVrandecic (WMF) (talk) 00:48, 30 November 2021 (UTC)Reply
As Heather mentions, for a while, around 2013, Google dropped the Wikipedia hyperlink from the Knowledge Graph panel, just presenting the word "Wikipedia" in pale grey as attribution (here and here are some examples of what that looked like). The result was a significant drop in Wikipedia pageviews that caused alarm in the WMF (see fundraising and Knowledge Graph references on that page) and led to the decision to exceed the 2013/14 fundraising target, to prepare for future challenges. Eventually, the hyperlink returned, but this episode and the past troubles with Amazon Alexa illustrate that even with CC BY-SA in place, proper attribution is not guaranteed. Without it, the WMF and the volunteer community would be completely powerless. --Andreas JN466 00:29, 1 December 2021 (UTC)Reply

Question about dates

The content page (overleaf) says, "If no rough consensus can be reached, we will organize a formal vote." The mailing list announcement says that you hope to have a consensus by 20 December.

Could you reassure us please, DVrandecic (WMF) and Quiddity (WMF), that this potentially quite far-reaching matter won't be decided on a vote held four days before Christmas? And that if there is no consensus discernible by Christmas, we will have that vote sometime in January 2022, once everybody is back at their desks?

Everybody who's intending to celebrate Christmas this year would appreciate that, I'm sure. Thanks --Andreas JN466 17:19, 25 November 2021 (UTC)Reply

Yes, of course. Plus I'm re-titling this section to avoid anyone else being confused by it. Quiddity (WMF) (talk) 20:59, 25 November 2021 (UTC)Reply

CC-BY?

As Wikinews uses CC-BY, it may be a good idea to use a license compatible with it (as abstract content can be used in any Wikimedia wiki in the future). One preferred version:

  • For content that only combines constructors (such as Abstract_Wikipedia/Examples/Jupiter), we license it as CC-BY.
  • For content containing transitional text (i.e. text that is translated but is not abstract content, see d:Template:TranslateThis), that transitional text remains under CC-BY-SA. In many cases (abstract) articles will contain translatable text that is not yet in abstract form.

--GZWDer (talk) 16:06, 26 November 2021 (UTC)Reply

80.111.67.25's comment about international law

IP 80.111.67.25 wrote this on the project page in the "Facts" section, but it is not part of the officially provided legal analysis so I'm moving it here. Quiddity (WMF) (talk) 18:50, 27 November 2021 (UTC)Reply

"It should be noted that this long-term project is effectively unprecedented. International Law is comprised of many treaties. Data Privacy Law is a relatively underdeveloped aspect of International Law and nothing other than regional Laws and Guidelines have been issued." — Preceding unsigned comment added by 80.111.67.25 (talk) 15:32, 26 November 2021 (UTC)Reply

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.

Pinging previous participants. Thank you for your comments above and elsewhere! Please see the refreshed focus on two specific questions below, and add your signature/reasonings in the two sections. @Amire80, Bluerasberry, Eloquence, GZWDer, Hogü-456, Jayen466, Kasyap, Legoktm, Lucas Werkmeister, Mike Peel, Msz2001, Ralf Roletschek, Raphael.jakse, Sannita, SashiRolls, Smallbones, Strainu, Thadguidry, and Theklan:. Thanks! Quiddity (WMF) (talk) 03:05, 3 December 2021 (UTC)Reply

Restructuring the discussion to move on

(Note: Some additional discussion has been happening on Telegram / IRC [2] [3], on the mailing list, and some on Facebook. Feel free to take a look there too)

Given that these discussions have been going on for ~10 days, and in order to facilitate consensus finding, I am boldly proposing the following: to summarize the consensus so far and restructure the discussion in order to find a resolution on the open questions.

The full discussion is above, and like every summary this summary will be incomplete.

It seems that the remaining open questions are the following two:

  1. What license should be the (either default or only) license for code implementations? The two strongest contenders are GPL and Apache. Please let us know below whether you have a strong preference for GPL, for Apache, whether you’re OK with either, or none.
  2. What license should the Abstract Content for Abstract Wikipedia be licensed under? The two strongest contenders are CC0 and CC BY-SA. Please let us know below whether you have a strong preference for CC0, for CC BY-SA, whether you’re OK with either, or none.

This also means that at this point we assume that the general framework of the discussion is accepted, that is, that we may assume consensus on the following points:

  1. Everything is published under a free license.
  2. We are launching with only a single license for implementations. We *may* add the option to have other compatible open source licenses for software contributions after launch. This is a discussion for a later point.
  3. Textual content on Wikifunctions (i.e. documentations, project pages, talk pages, etc.) is all published under CC BY-SA. For the sake of consistency, we will use the 3.0 version of the CC BY-SA license at this point.
  4. Function Signatures and other structured objects (besides Abstract Content and Code Implementations) are published under CC0.
  5. Output Content has the same license as the input to the functions producing the Output Content. This means in particular that Output Content used for Wikipedia will be published under the same license as the license we choose for Open Question #2 above.

This is all mostly compatible with the previous recommendations (besides the open questions). I understand that there have been individual arguments that don’t fit exactly into how we are funneling this discussion at this point, and I apologize for that, but in order to facilitate the discussion and focus on the biggest points of contention, I propose to focus the discussion as suggested.

In order to support transparency, the entirety of the discussion will remain available. We will update the content page accordingly by the end of the week if no one yells. As we said before, we plan to close this discussion on December 20th. The plan is to write up the consensus the week before that, and see if that resonates as consensus. --DVrandecic (WMF) (talk) 18:40, 2 December 2021 (UTC)Reply

@DVrandecic (WMF): You say in the Overview (overleaf), "Textual content on Wikifunctions will be published under CC BY-SA." Above you say, "This means in particular that Output Content used for Wikipedia will be published under the same license as the license we choose for Open Question #2 above." Are these not contradictory? I thought textual content and output content (= text produced) are the same. --Andreas JN466 02:53, 4 December 2021 (UTC)Reply
@Jayen466: Thanks for asking for clarification. With "textual content" I mean handwritten text on Wikifunctions, e.g. documentation of functions, project pages, policies, user pages, talk pages, etc. With "Output Content" I mean the output of a function applied to Abstract Content. I hope that makes it a bit clearer. -- DVrandecic (WMF) (talk) 05:48, 4 December 2021 (UTC)Reply

@DVrandecic (WMF):

Apart from my general concerns about this project: It seems that using three different licenses for one project is simply confusing. Stating that the function signatures are unlikely to be protectable by copyright is similar to stating that a pixel of a photo is not protected by copyright.

In copyright you must look at the work as a whole and not at each bit individually, otherwise you misunderstand the whole point of the definition of a copyrightable work.

I propose therefore to use CC BY-SA in combination with GPL, which are two fully compatible licenses, the one for text and the other for software. Habitator terrae (talk) 18:36, 19 November 2022 (UTC)Reply

Thank you for raising your concern, and for your input.
I see that you are an avid contributor to Commons, and that your last few contributions were under a number of different licenses, so it is not a concept that is foreign to you. In fact, Commons has many more than three licenses to deal with. Your suggestion would reduce the number of licenses from three to two, which does not seem to make a major difference, to be honest.
But more importantly, the community has discussed this topic and come to a decision, the decision has gone through legal review, and the suggestion you raise is not novel, as it is one of the options that have been discussed by the community. I don't see the benefit in reopening this discussion at this point with your proposal, as I do not see it raising novel arguments or options compared to the last time this was discussed. -- DVrandecic (WMF) (talk) 01:31, 22 November 2022 (UTC)Reply
@DVrandecic (WMF): ?
The point is not that there are different licenses. The point is that there are different licenses for the same content, but on different layers, which could make it difficult to differentiate between the two from a copyright perspective and thereby weaken the general copyleft.
This comes from the following idea:
By licensing something under CC0 you imply that there is something that can be licensed, which means something that is protectable by copyright. Otherwise you would have used the Public Domain Dedication.
But as we see, the function signatures, taken individually, aren't protected by copyright. So it could cause confusion about whether CC0 applies to the combination of the function signatures. But this combination is pretty much the essence of the work; the function implementations, on the other hand, are often merely technical implementations, which on their own don't constitute a copyrightable work.
So that's what I meant by licensing the pixels separately. It looks like uploading an image to Commons with two licenses: one for the pixels, the other for the photo.
Habitator terrae (talk) 21:03, 8 December 2022 (UTC)Reply

Open Questions

Function Implementations

Summary of the differences

The main difference is that the GPL is a copyleft license, whereas Apache is not. This means that the GPL requires that people who distribute any GPL-licensed software must also make their source code available under the GPL license. Apache is a more liberal license and allows for wider reuse with fewer restrictions. One advantage of the Apache license is that it clearly covers patent rights, while older versions of the GPL and other open source licenses do not mention patents. If we choose GPL, code from Wikifunctions can only be reused in projects published under the GPL or in internal projects which are not published. If we choose Apache, code from Wikifunctions can be used more widely.

!Votes about the Licence for Function Implementations

Reasonings and/or signatures are welcome.

For Apache and against GPL
For GPL and against Apache
  •   Support GPL seems to make more sense in the context of building an expanding commons serving a very wide variety of language communities, per discussion below. I wouldn't like to see renderers for specific languages becoming proprietary and standing outside the commons, limiting access to an entire language community to a specific player. (I might still change my mind, given a powerful argument to the contrary, but for now this choice makes the most sense to me.) --Andreas JN466 21:09, 8 December 2021 (UTC)Reply
  •   Support Allowing large corporations to profit from our free labor while giving nothing in return is not in our best interests. Like Jayen I'm worried about a powerful corporation taking our content and porting it to new languages and locking it behind a paywall without sharing those improvements with us. An Apache license also makes it harder to incorporate code from other projects. MediaWiki and code posted on Phabricator are GPL licensed, and so any code from there cannot be used in this new project if it is Apache licensed. Our projects are largely CC-By-SA licensed, and an Apache License or CC0 license makes it even more difficult to incorporate that content because CC-By-SA requires any combination of works to have an SA component. Our ecosystem is built around copyleft components, and choosing an Apache license makes it harder for us to cooperate across projects while making it easier for corporations to exploit our unpaid labor. I will not contribute to a Wikimedia software project that does not include a copyleft component; if I wanted Amazon to profit off my work I'd work for Amazon, I wouldn't just hand it to them for free. Wugapodes (talk) 06:35, 16 December 2021 (UTC)Reply
Either is fine, but I prefer Apache
  •   Support I share some of the thoughts Denny mentioned below and do not really see a need to use the more restrictive GPL instead of Apache. --Ameisenigel (talk) 20:37, 2 December 2021 (UTC)Reply
  •   Support However, I could see LGPL being a reasonable middle-ground. As I understand it, it would essentially impose a copyleft obligation on re-users that distribute modified versions of functions, while permitting the use of the library of functions (or a subset thereof) to build non-copyleft applications.--Eloquence (talk) 06:15, 3 December 2021 (UTC)Reply
  •   Support I think the main issue with both GPL and CC-BY-SA in this particular context, where we are planning to build many small building blocks to be combined together into larger things, is that they are making attribution (I see GPL's "keep the legal notice" as a form of attribution) legally required (instead of a more "academic" moral code) in all contexts for all reasons without exceptions. I am worried about an explosion in the list of required attributions. Imagine, 100 years down the road, how long the list of all attributions necessary for everyone using this code will be, when it is mixed up and down with other code (all required to retain the license, because of copyleft). (Note: I prefer CC0 over Apache, see below.) Mitar (talk) 13:32, 7 December 2021 (UTC)Reply
Either is fine, but I prefer GPL
  •   Support I suspect there may be situations where implementations of certain functions could be Apache-licensed while others can (and maybe certainly should) be kept GPLed. As an example (and a shameless plug), Ninai makes higher-level decisions about how to construct sentences based on the constructors one uses (do I use "was running" or "ran" in a particular past-tense context?), and calls functions in Udiron to actually create/manipulate syntax trees for those sentences (add the verb "was" between the subject and verb, make the verb a gerund). Perhaps the low-level functions (in this case, of Udiron) could be Apache-licensed, while the high-level functions (in this case, of Ninai) could be kept GPLed? Mahir256 (talk) 19:51, 2 December 2021 (UTC)Reply
    I'd argue that the "viral", copyleft nature of the GPL itself would make it quite unreasonable to host both Apache- and GPL-licensed code as part of the same project site. Should Wikilambda support forms of cross-site federation in the future, as are now available in Wikibase, it would be rather straightforward to host a separate project for GPL code. --Hupaleju (talk) 00:57, 3 December 2021 (UTC)Reply
  • As I already wrote, I'm inclined to GPL, LGPL, or AGPL, but I'm not really rejecting Apache. What I am really asking for is an explanation: why would Apache provide an ideal level of permissive flexibility? Even more importantly, why is permissive flexibility desirable? Currently, it just states this, as if it's obvious, even though it isn't. If this explanation is good, I may change my mind. --Amir E. Aharoni (talk) 07:28, 3 December 2021 (UTC)Reply
    Hi @Amire80:! I offer a lengthy explanation in the discussion section below. Let me know if you have still questions after that. -- DVrandecic (WMF) (talk) 17:16, 3 December 2021 (UTC)Reply
    Replied there. Amir E. Aharoni (talk) 11:47, 5 December 2021 (UTC)Reply
  •   Support. There are good reasons for both, it's more a political statement.--Strainu (talk) 08:28, 3 December 2021 (UTC)Reply
  •   Support I prefer GPL. John Samuel 12:36, 18 December 2021 (UTC)Reply
Neither is fine (please give a counter-proposal)
  • CC0 From Wikidata perspective, Wikifunctions is more than a backend for Abstract Wikipedia. It is likely to develop into general inference engine that complements data stored in Wikidata. Wikidata and Wikifunctions will co-develop into mutually interdependent projects that cannot be used separately. Licensing Wikifunctions under Apache License or even GPL therefore means effectively limiting Wikidata to the same license.
    To give a concrete example, Wikidata currently stores extensive tables of lexeme forms. If inference rules for predictable forms are added to Wikifunctions, most of the predictable forms are going to be removed from Wikidata in order to ease maintenance and distribution. Once that's done, it will be impossible to fetch form tables from Wikidata without going through Wikifunctions and complying with Wikifunctions license. Wikidata already excludes more complex lexeme forms, which are going to be available only via Wikifunctions.
    For this reason I suggest using the same license as Wikidata: CC0. — Robert Važan (talk) 15:03, 3 December 2021 (UTC)Reply
    • PS: As an alternative to licensing entire function implementations under CC0, I would suggest licensing executable code (without comments, full symbol names, etc.) under CC0. That would allow bundling of Wikifunctions extracts with corresponding Wikidata subsets under CC0. — Robert Važan (talk) 15:19, 3 December 2021 (UTC)Reply
      Hello @Robert Važan: Note that Apache does *not* limit its reuse to the same license (it is not copyleft / viral / share-alike). Furthermore, Apache offers consideration regarding patent and liability which CC 0 does not. -- DVrandecic (WMF) (talk) 17:18, 3 December 2021 (UTC)Reply
      @DVrandecic (WMF): IANAL, but AFAIK you cannot just wholesale relicense Apache code under MIT or CC0. This will get particularly problematic when table-like rule sets get moved forth and back between Wikidata and Wikifunctions. Mentioned lexeme inflection rules lie on such a boundary between data and code. ML models blending Wikidata and Wikifunctions content might be affected too. — Robert Važan (talk) 01:07, 4 December 2021 (UTC)Reply
      That is correct - you can't just relicense it. But you can use Apache code in your own projects, no matter if that project is otherwise licensed under MIT, Apache, GPLv3, or any or no other license at all, and share and republish it together - which is not the case for the GPL. IANAL as well. -- DVrandecic (WMF) (talk) 01:40, 4 December 2021 (UTC)Reply
    •   Support I would also prefer simply CC0 (over both GPL and Apache) for many of the reasons Robert Važan mentioned. I really worry that with anything other than CC0, anyone using the code will have to have a lawyer, or pretend to be a lawyer, to think through the implications of using the code in their context, for their idea: can they do it, what are the requirements, where to put the notice, what can I link with/use, can I use this license for that or this additional code. I have been in too many small open source project meetings spending too much time talking about those things. Large players will be able to figure that out anyway. So I would prefer to optimize for simplicity. Mitar (talk) 13:32, 7 December 2021 (UTC)Reply
      I just want to point out that for the most pleasant lawyer-free experience, Apache is a more worry-free choice than CC 0, due to its explicit coverage of patents. -- DVrandecic (WMF) (talk) 23:14, 9 December 2021 (UTC)Reply

Discussion about the Licence for Function Implementations

One of the people we discussed this question with is Luis Villa. Luis was Deputy General Counsel at the WMF, led the revision of the Mozilla Public License, was part of the team representing Google in the Google-Oracle lawsuit, served on the board of directors of the Open Source Initiative, as an invited expert at the Patents and Standards Interest Group at the World Wide Web Consortium, and on the Legal Working Group of OpenStreetMap. He is now co-founder at Tidelift.

Luis raised many good points, but one in particular stuck with me, because it made me pause and led me to think about licenses for open projects like ours differently: “Do we expect any structural features of the project to create share-alike (or anti-sharing) pressures? As a concrete example, the biggest pressure to participate in the Linux kernel is not the license; it is the immense velocity of Linux kernel development. This creates a strong engineering reason to work with the community—participate actively, or else your changes will get left behind. Similar pressure drives many corporate contributions to OpenStreetMap.”

I have seen such pressures in place with Wikidata, and the same pressures are also in place with Wikipedia. In both cases it is not the license that protects the project (in Wikidata’s case it is most obviously not the license), but it is the activity of the community, it is the fact that if you want to stay compatible with Wikidata, if you want to stay up to date with your knowledge, you can’t just take Wikipedia and start maintaining it yourself. A number of projects tried that, none of them with any noticeable impact.

I believe that the same will be true for Wikifunctions. I hope for Wikifunctions to become an active, lively project, that will constantly grow and develop and discover and extend to new horizons and new use cases. We will cover more and more languages, improve the constructors for abstract content, and discover how to reuse knowledge between languages in order to quickly gain coverage on new languages which are underrepresented on the Web.

Is there a benefit in our code being reused widely? Yes! The more people use it, the more mistakes and improvements will be found and will trickle back. I want users on StackOverflow and similar Websites to answer questions with links to Wikifunctions, and to create functions on Wikifunctions so they can answer those questions. I want them to copy the code without having to think about the license to their answers, to their forums, to their own projects, and so much more.

Would the GPL stop others from copying the code? Would it stop them, as was discussed above, from taking the natural language library, copying it wholesale to their project, and then making improvements that they are not giving back? No, not at all. Even if it was GPL, they still could just copy the functions, and use that to create the text they need in the languages they need, and just publish the text. The value for another organization will be in the output of these functions, that is, in the fact that they can create text in many different languages, and not in the functions themselves. Even if the code was released under GPL, there is nothing that would compel such players to give back to the project. Or, as James Forrester put it much more succinctly: more restrictive licenses such as the GPL will not stop the actors we want to stop, but they may very well stop the actors we want to share with.

What will compel such actors to give back to the projects? If we actively grow the project, if we keep adding languages, if we extend the expressivity of our language, then forking our code just doesn’t make sense, just as forking Wikidata or forking OpenStreetMap or forking Wikipedia does not make sense. It is not the license, as Luis points out correctly, it is other pressures.

A less liberal license will only make it more likely that we will be used more by big players with legal departments who can deal with the complexities of a five thousand word license. There are projects where the GPL may be a great choice - complex, monolithic software projects like GIMP, MySQL, or Blender which are in a space where the risk of being commercialized and closed in a product is a real risk. But for a library of functions which are not even an application?

According to Wikipedia, the GPL is incompatible with being distributed through the Apple and Mac app stores. Shall we really use a license that is incompatible with being distributed on such widely popular devices?

Don’t misconstrue me as an enemy of the GPL. It has its uses. I don’t think Wikifunctions is one. I don’t expect to change the mind of anyone who came here to advocate for copyleft licenses, but I hope to make it clear why I think a project such as Wikifunctions would benefit itself, and would be most beneficial to others, by adopting a less restrictive license such as Apache.

My recommendation is to vote for the Apache license. These are my thoughts on the question, and not the opinion of the WMF. --Denny (WMF) (talk)

I want to note 2 things:
  • With GPL, you don't have to compel anyone to do anything. If you like their derivative work, you're free to just re-incorporate it in the base software. I know there are ways around that and they're easier to implement for small chunks of code than for monolithic applications, but that's not a reason to just give up on that ability.
  • You make a generalization around app stores. There is a single app store noted there. It has a significant market share, but it can't be described as "the most widely used devices". Android appstores, including the Google Play and the Huawei equivalent, do list GPL software. There are even dedicated OSS repos, such as en:F-Droid.--Strainu (talk) 10:42, 3 December 2021 (UTC)Reply
@Strainu: Thanks! The Wikipedia article was misleading me. I honestly was not aware that the Google Play store is compatible with the GPL. I fixed my statement accordingly. -- DVrandecic (WMF) (talk) 06:01, 4 December 2021 (UTC)Reply
I am sceptical of the idea that Copyleft licences favour Big Tech more than CC0. Google had CC BY-SA Freebase and chose to transfer its content to CC0 Wikidata. A player the size and wealth of Google can easily take CC0 material and create a proprietary black box derivative. (Also, I wouldn't necessarily rely on a Google lawyer for conscientious advice on what best serves our interests.) --Andreas JN466 18:34, 4 December 2021 (UTC)Reply
@Jayen466 with all due respect, your "Google lawyer" jab is pretty much a low blow, and I would really like you to strike it off from your intervention. Thanks. -- Sannita - not just another it.wiki sysop 15:46, 5 December 2021 (UTC)Reply
I think it is only prudent, when taking advice from professionals with a history of defending the commercial interests of for-profit companies that have a very substantial commercial interest in our work, to bear the possibility of conflicts of interest in mind. (There is nothing improper in professionals working for a company like Google. We all have to earn an income.) --Andreas JN466 16:08, 5 December 2021 (UTC)Reply
Sorry, but this explanation is theoretical, generic, and not very convincing.
Here's a realistic, easy-to-imagine scenario:
  1. Wikimedia editors and developers (paid or volunteer) make functions that generate help documentation for MediaWiki features. They work well in English, French, Spanish, German, Polish, and Russian.
  2. A translation agency somewhere in the E.U. notices this, sets up its own MediaWiki instance with the WikiLambda extension and copies the functions from Wikifunctions. They write abstract content that generates multilingual user manuals for consumer electronics products. So far, so good: everyone is supposed to be happy about this.
  3. That agency hires developers and linguists who extend the renderers so that they are able to work in Hungarian, Finnish, Turkish, Albanian, and Maltese.
  4. It is subsequently bought by a software company, which extends the renderers to Arabic and Japanese, and starts selling the package as "the global technical documentation translator".
  5. A religious publisher buys a license for this software, develops renderers for twenty languages of Western Africa, and uses it to prepare its tracts in these languages.
So we have software, which is modified and distributed: scenarios where viral licenses are supposed to be useful.
Can Wikimedia force them to share the code that supports Hungarian, Finnish, Turkish, Albanian, Maltese, Arabic, Japanese, and languages of Western Africa? With a copyleft license, it can. With a permissive license, it cannot. After a couple of years, Wikimedians will perhaps develop renderers for Finnish, Hungarian, and Japanese. Other languages will probably have to wait for many more years until capable volunteers will make something for them, or to apply for grants from Wikimedia or other nonprofits.
Now, you could say that if these companies couldn't sell it without having to give the code back to Wikimedia, maybe they wouldn't develop support for these other languages in the first place. Well, maybe, but there's also another scenario: They do make it, they do share the code with Wikimedia and the rest of the world, but they build their business model around developing abstract content that would be useful for commercial reusers.
You could also say that they can avoid the virality by keeping the software on their own computers and offering text generation as a service. That's why I mentioned AGPL as a possibility: this license is perhaps better for online services. --Amir E. Aharoni (talk) 11:47, 5 December 2021 (UTC)Reply
@Amire80: I think your scenario is easy to imagine, but to be honest I don’t find it particularly convincing. Why would the software company in steps 2-4 offer to sell the software as off-the-shelf software? This would mean that its clients would need to train enough of its employees all about the proprietary set of constructors of that company and how to express their content in these constructors. The clients, such as the religious company in step 5, if they want to buy and use that software, would need to train a large enough team to avoid the bus factor within their company just to learn how to use that proprietary system. That’s an investment that I don’t see as particularly likely (how many companies employ in-house translators in the first place? For this team, each member of the team would likely be more expensive than a translator)
In my opinion, a much more likely scenario is that a company would offer the service of full-fledged translation of natural language text. The offer would be “give us your text, and we will give you the text in the languages you pay for - but unlike our competition, we don’t increase our cost linearly with the number of languages, but every language beyond the first one is much cheaper”. And that company then uses their internal fork of Wikifunctions to create the content they sell.
But in the scenario I describe, whether Wikifunctions uses Apache, GPL, AGPL, BSD, or any other license simply doesn’t matter. Since the running of the software is entirely internal to the service company, no free license could force them to give back their changes to the source code.
I don’t see a scenario where the GPL would lead to a benefit, but I see many scenarios where the GPL would be discouraging reuse. E.g. the fact that you can’t use it in any app that one wants to disseminate, even for free, through the Apple AppStore. What is more likely? The scenario that you describe, or the scenario that a hobbyist would like to use a function within an App they want to share on the iPhone? -- DVrandecic (WMF) (talk) 22:48, 6 December 2021 (UTC)Reply
About your religious organization question: strange as it may seem, I know of at least one such organization that did exactly this: bought per seat licenses and trained dozens or even hundreds of staff. It was doing it five years ago, when SAAS was already popular. (For privacy reasons, I won't mention this organization's name publicly, but it's well-known. I learned about it by talking directly to translation staff. They showed me the thick training materials and the software working on their desktops.)
About the iOS app: this dilemma would be relevant if we were developing a competitor to SwiftPM, NPM or Crates.io, but as far as I understand, WikiFunctions' primary goal is supporting the development of a multilingual Wikipedia (specifically Wikipedia, not even another kind of wiki!), so its most complex and unique algorithms are expected to be in language generation. (Apple's unfriendliness to GPL cannot be an excuse to every instance of deciding not to use a copyleft license. Wikipedia app's code is permissive, but the text it serves is copyleft. Besides, Apple is not even that important outside North America.) --Amir E. Aharoni (talk) 20:10, 7 December 2021 (UTC)Reply
To make it clear: Wikifunctions's primary goal is to support the development of a multilingual Wikipedia, but I hope that we will have many, many more use cases that we will cover. I would be surprised if the most complex and unique algorithms would be expected to be about NLG. Wikidata's primary goal is to support Wikipedia, but since then many more goals have been enabled. I fully expect and hope that Wikifunctions too will enable new use cases that we currently don't even dream about. And to stay with your statement: "Wikipedia app's code is permissive, but the text it serves is copyleft." - Wikifunctions is akin to the app code, and the text is akin to Abstract Content. -- DVrandecic (WMF) (talk) 21:58, 9 December 2021 (UTC)Reply
@Amire80: That's not how the world works. Forks and derivative works are not an asset. They are a burden on downstream projects. Contributors, whether individuals or large businesses, have to fight to get their modifications upstreamed. As opensource maintainer I have turned down pull requests from others and I had to work to get my own pull requests merged. Large projects from Linux kernel to Wikipedia are inundated with contributions. Maintainers of these projects have no need or desire to scavenge content. Instead, they have inclusion policies to limit contributions. I see Wikifunctions the same way - as a curated central repository of the most valuable code. Private projects should be allowed to grow around Wikifunctions without restriction to meet their specific needs. It's the same with Wikidata except that Wikidata curates central ontology instead of code. — Robert Važan (talk) 04:50, 8 December 2021 (UTC)Reply
I'm not sure how it is related to what I wrote.
Wikimedia has never been a movement focused on making reusable software. Wikimedia is a movement focused on making Free Knowledge, which uses Free Software because it's necessary for Free Knowledge. Wikifunctions is not likely to change this, and I'm not even sure that there's an intention for it to happen. Over the years, we've made some software with the intention to make it reusable as components, for example CSSJanus, standalone Visual Editor, jquery.uls, and maybe some more things. Did anyone of them actually become popular reused components? As far as I know, they have not, but do correct me if I'm wrong. It's not a big failure, because all of them are useful for making Free Knowledge, and Free Knowledge is being made, at least in some languages (I do wish that our excellent internationalization tools and practices were more known and used in the industry, however; but I digress).
The most important functions on Wikifunctions are going to be functions that are useful for generating content for Wikipedia, especially renderers. If anyone wants to contribute a function to Wikifunctions and see it actually used, this is going to be the primary test: is it useful for at least one Wikipedia. If it's a renderer that will make it possible to generate 20,000 useful articles for the Arabic Wikipedia, it will probably be "merged" (Arabic is just a random example). I put "merged" in quotes, because this is also the difference between wikis and other Free Software projects: on the wikis, we don't actually submit patches and wait for code review. We do it on Gerrit, but on the wikis, most things are published first and checked later, and this includes templates and modules, which are comparable to wikifunctions. On Wikimedia projects, only sensitive templates and modules are protected and have to go through "review".
Wikifunctions is supposed to make the development of renderers possible. I don't think that there is anything in the universe that can make the development of renderers easy. It will always require difficult software development and applying advanced metalinguistic knowledge. If there is no one in the Wikimedia community who has the skills to develop a renderer for Arabic, it will not be developed (and again, it's just an example, and the same goes for any other language, even English). But if there's someone who is external to the community, and has these skills, and has the motivation to develop it, then they can develop it.
And now the question is what are the possible motivations. Using existing Wikifunctions, and the general renderer framework to develop new renderers for commercial purposes can be such a motivation. That's the example I gave above. If the renderers framework is good enough, then the motivation will be strong enough to also want to fund the development of new renderers and release them as Free Software, available for use for Wikimedia and anyone else, and there will be more ways to build business models around it. Amir E. Aharoni (talk) 10:27, 8 December 2021 (UTC)Reply
Robert Važan, I do think it makes sense to think in terms of languages, as Amir is suggesting, because the input of linguists (native speakers of scores of languages) will be absolutely crucial. If the idea is to reach more language communities, isn't it desirable for the tools supporting access to any particular language community to be accessible to all (including Wikimedia), rather than just one company holding the rights to that technology? --Andreas JN466 21:04, 8 December 2021 (UTC)Reply
@Amire80 and Jayen466: I don't want to get too involved, because this project is not my primary focus. I just wanted to make a few points to contribute to the discussion. I think I am in the minority here in that I see Wikifunctions as a general-purpose inference engine that will complement Wikidata whereas others see Wikifunctions as mere NLP engine for Abstract Wikipedia. When I was talking about private projects meeting their specific needs, I meant NLP applications beyond Abstract Wikipedia: dictionaries, spellcheckers, translators, sentiment analysis, claim extraction, full-text search, ... These projects will be easier to do with more permissive licensing. People working on them are likely to become part-time Wikifunctions contributors, because it is in their interest to improve upstream project they depend on. — Robert Važan (talk) 02:22, 9 December 2021 (UTC)Reply
@DVrandecic (WMF): You're still missing the point I raised above. Follow the Commons approach, which has already been demonstrated to work well. Support multiple licenses as early as possible, don't restrict things to a single license where there are compatible ones out there. It just needs the license to be defined by a template like they are on Commons, which shouldn't be too difficult to implement. Thanks. Mike Peel (talk) 21:55, 5 December 2021 (UTC)Reply
Not sure whether this is really a good idea here. At Commons, there is usually one main creator/uploader of a file who can choose from a range of licenses. Most follow-up edits are either purely maintenance such as categorization and stuff, or minor modifications of the main work. Functions, however, are probably much more a collaborative effort where I cannot imagine how multiple licenses could work. Should the initial creator select a license and all others have to live with it? I don't think this works, just as it would not work for Wikipedia. --MisterSynergy (talk) 23:23, 5 December 2021 (UTC)Reply
@Mike Peel: The question is whether this is needed before launch. As I suggested above, let us start with a single license, and then, after launch, assess how and whether a multi-license model à la Commons is beneficial. This is a complex question (how would the different implementations play with each other? How do we display licensing information? How do we support, as MisterSynergy points out, the fact that different people might be working on a piece of code together? etc), and will require potentially a significant amount of implementation work. It would be great to postpone this discussion until we have a better understanding of how the system develops, and what the benefits of having multiple licenses would be. -- DVrandecic (WMF) (talk) 23:12, 6 December 2021 (UTC)Reply
After reading your position, I was reminded that StackOverflow's code is in fact licensed under CC-BY-SA, but I have never seen anyone respect that when copying code and using/remixing it in their own project. Has anyone ever changed their codebase to CC-BY-SA as a consequence? No. Also, what exactly counts as copying here? If I look at the answer and I write into my code something very much based on the answer on StackOverflow, is that copying or not? To me this shows that licensing such fragments of code under a copyleft license does not achieve the effect the license hopes for. It just makes everyone a copyright infringer. So I agree with your position: it makes little practical sense to use a copyleft license; being a live and large project is a much better protection. Mitar (talk) 13:32, 7 December 2021 (UTC)Reply


Abstract Content for Abstract Wikipedia

Summary of the pros and cons

The discussion has been mostly around whether Abstract Content is copyrightable at all, which is a prerequisite for CC BY-SA.

The main difference is that CC BY-SA is a copyleft license whereas CC0 is not. This means text incorporating CC BY-SA content must be attributed and must be shared under the same license. CC0 has no such requirements and allows for easier reuse.

Wikipedia content is under CC BY-SA (version 3.0, unported).

!Votes about the Licence for Abstract Content

Reasonings and/or signatures are welcome.

For CC BY-SA and against CC 0
  •   Support This would be easier for reusing Wikipedia content on Abstract Wikipedia. --Ameisenigel (talk) 20:44, 2 December 2021 (UTC)Reply
  •   Support as previously stated. I agree that CC-0 is appropriate for Abstract Descriptions on Wikidata.--Eloquence (talk) 06:11, 3 December 2021 (UTC)Reply
  •   Support as described above, it's essential to have free flow between Abstract and other Wikipedias.--Strainu (talk) 08:30, 3 December 2021 (UTC)Reply
  •   Support I am still specifically convinced (as I was in the first round) that, while the fact that Jupiter is a planet is not copyrightable, a construct that compares Jupiter to all other planets in the Solar System is sufficiently elaborate to make me think it would be more defendable with a CC BY-SA license, than a CC0. In this, I totally agree with the recommendations from the dev team. Full disclosure: I will very likely join the AW team in January 2022; while this did not influence my position about the question, I wanted to go public with it for transparency's sake. --Sannita - not just another it.wiki sysop 17:13, 3 December 2021 (UTC)Reply
  •   Support as default, possibly w/ a way to explicitly flag subsets of abstract content that seem CC0 (by necessity). I.e. if we decide that subsets are clearly not copyrightable, there should be a way to note that on the projects and not ask reusers to guess. –SJ talk  18:37, 3 December 2021 (UTC)Reply
  •   Support per the reasoning expressed by Heather Ford here (end users can tell info comes from Wikimedia, plus problem-free re-use of Wikipedia content per Denny and CC licence rather than proprietary status for derivatives) --Andreas JN466 02:25, 4 December 2021 (UTC)Reply
  •   Support The most notable thing we license as CC0 is Wikidata, which is like a database: large and complex in aggregate, but consisting of tiny statements that are hard to license even with a permissive license, let alone a viral one. Abstract Content is not similar to that. In form, abstract content is more like code: Being able to write in a human language is not enough to be able to write it; no one will ever be able to write abstract content without at least some understanding of programming. In function, and in name, abstract content is content. There are some small exceptions, but we usually don't license code or content as public domain, and that's the right thing to do.
    As for the output, I am not a lawyer, but I doubt that it's as easy to enforce the "BY" and "SA" parts of CC-BY-SA with the pure output of the functions as it is with concrete content written by humans. But since this output is expected to be often mixed with concrete content, they should have the same license. --Amir E. Aharoni (talk) 11:00, 5 December 2021 (UTC)Reply
  •   Support I believe that any content created by Abstract Wikipedia should be required to be sourced to Abstract Wikipedia in its future uses. As I alluded to above some concepts exist in some languages while being absent from others. Concepts already coded into wikidata do not yet exist in other languages and may never do. A convenient example is agender gender. Standardized algorithmic decisions regarding how this will be rendered for people (and possibly for fictional characters) would involve, it seems to me, a certain amount of sweat (and possibly raising) of the brows (would it be different for people and for fictional characters? Would the author's own pronoun use be authoritative, overriding standard Abstract Wikipedia usage for fictional characters? for people? etc.) This is only the tip of the iceberg, of course. The same verb in different languages may obviously have different verbal valences, logical entailments ... Responsibility and humility suggest that it's best to require signing one's work. (killing the author also kills authority)... SashiRolls (talk) 19:44, 6 December 2021 (UTC)Reply
  • I don't want CC 0. I want at minimum a CC BY license. CC 0 is not a good way to import content from CC BY sources (for example, data from some organisation). I don't want a copy (or a part) of Wikidata. ✍️ Dušan Kreheľ (talk) 14:23, 13 December 2021 (UTC)Reply
  •   Strong support per prior arguments. Vaticidalprophet (talk) 19:36, 15 December 2021 (UTC)Reply
  •   Support Same as my comment on software licensing. Wugapodes (talk) 06:38, 16 December 2021 (UTC)Reply
For CC 0 and against CC BY-SA
  • I am still not convinced that there is sufficient copyrightable aspect in Abstract Content to consider CC by-sa a viable option for it. Thus CC0. Unfortunately, the arguments brought forward for CC by-sa are mainly of non-legal nature, such as "we might not want to compete with (non-abstract) Wikipedias by choosing a less restrictive license", or "we want re-users to provide data provenance information by choosing a copyleft license, in order to enhance verifiability". MisterSynergy (talk) 20:22, 2 December 2021 (UTC)Reply
  •   Support If the output should have the same license as the Abstract Content (which I really don't think is necessary), then I think it follows that the Abstract Content should be CC 0 since I don't think the output can get copyright. My explanation is in the discussion below. Ainali talkcontributions 10:58, 4 December 2021 (UTC)Reply
  •   Support I may be missing something, but CC BY-SA implies an author. If I understand correctly, the author is Wikifunctions; it feels very weird to have a non-person as author (and also a bit weird to have a different licence for the function and the output of said function). Legal advice would be nice here. Cheers, VIGNERON * discut. 21:08, 6 December 2021 (UTC)Reply
  •   Support I think we should not be trying to expand what can be copyrightable further. By going with CC-BY-SA we would be signalling that a large community (the Wikipedia community) is behind the idea that these artifacts can be and should be copyrightable. I always saw CC-BY-SA as a hack on copyright to get fewer things restricted by copyright, and I think we should not forget that. So having fully restricted content which the copyright holder licenses under CC-BY-SA (to retain some rights), making it more available, is a win. But having something which could be fully open and not copyrighted at all, and putting a license on top of it to restrict it, is losing, at least from my perspective of free culture. Mitar (talk) 13:39, 7 December 2021 (UTC)Reply
  •   Support a large part of the content wouldn't be copyrightable and for the average user it'd be hard to tell which part has copyright and which part hasn't. We must remember that the attribution requirement prevents many users from using the content when there is no practical way to attribute (e.g. in flyers, surveys, posters, etc.), and having the license would mean that large shares of non-copyrightable content would still not be used, because many users may have problems telling the difference. SFBB (talk) 12:57, 14 December 2021 (UTC)Reply
  •   Support CC0 because it's more open and especially for this type of content I feel like it's more aligned with the vision of Wikimedia. Not to mention the risk of this setting a negative precedent if CC BY-SA is selected. --Abbe98 (talk) 09:32, 17 December 2021 (UTC)Reply
  •   Support Support for CC0, firstly because it is more open and considering the success of Wikidata as well as the ambitions of Abstract Wikipedia Content for supporting multiple human languages. John Samuel 12:28, 18 December 2021 (UTC)Reply
Either is fine, but I prefer CC BY-SA
  •   Support I think that some abstract content constructors by themselves could be CC0, but at a certain level of complexity (particularly when context is specified that would control their presentation) they might warrant CC-BY-SA licensing. For more elaboration, see the example I posted in the discussion below. Mahir256 (talk) 18:43, 3 December 2021 (UTC)Reply
  •   Support I do not “prefer” CC BY-SA but I agree that it should be the default licence. Please see my comments under “Neither”, below. I do not agree that “Either is fine”; I believe that there will be cases when CC BY-SA would not be wholly appropriate or fully effective. Most particularly, I oppose CC BY-SA for abstract or output content that is equivalent to content that is in the public domain in the United States. I would support CC BY-SA “unless otherwise noted” (as at en:Wikipedia), with the expectation that the community would develop policy and guidance around the inclusion of content that should not (or cannot) be licensed CC BY-SA (most particularly, public domain and equivalent content).--GrounderUK (talk) 12:54, 9 December 2021 (UTC)Reply
  •   Support While I get the "output different to input" concern, I do believe that the reasoning provided, most notably similarity to content in function, if not form, and that of interoperability with Wikipedia - these reasons are sufficient to at least favour CC BY-SA. Nosebagbear (talk) 13:43, 13 December 2021 (UTC)Reply
Either is fine, but I prefer CC 0
Neither is fine (please give a counter-proposal)
  • The licence should be an automatic consequence of the licensing of the original content. This means that an original composition of CC0 content would be CC0. A modification of CC BY-SA content would be CC BY-SA. However, we should aim to move away from monolithic licensing at the article level. The abstract content should be understood (where it is the case) as a composition of modified content within which different subsisting copyright and licensing applies. This means that public domain content can be “translated” into abstract content and (effectively) remain within the public domain with CC0 licensing. Subject to WMF agreement, it would also allow the inclusion of NC content. This would be suppressed in the default BY-SA output content, just as BY-SA content would be suppressed in CC0 output content. In any event, where abstract or output content is licensed as CC BY-SA, we should avoid an absolute requirement for the “translation” to require attribution in addition to the attribution requirements arising from our use of the original content. CC BY-SA requires us (and subsequent reusers) to specify how the original content was modified; this should be sufficient attribution. --GrounderUK (talk) 10:48, 4 December 2021 (UTC)Reply
@GrounderUK: Yes, the license of the Output Content should be the same as the license of the Abstract Content, agree! The question here is not about the license of the output content in general, but rather how the Abstract Content for Abstract Wikipedia specifically should be licensed. -- DVrandecic (WMF) (talk) 23:07, 6 December 2021 (UTC)Reply
@DVrandecic (WMF): We agree on output content, but here I am referring only to abstract content for Abstract Wikipedia. Abstract content that is a derivative (“translation”) of public domain content can have CC BY-SA licensing only for the derivative. I oppose this because it applies restrictions on under-served language versions that do not apply to the well-served language version that uses (untranslated) content that is in the public domain. Of course, most Wikipedia articles are not wholly within the public domain, although they are required to indicate which parts are. Clearly, the BY-SA licensing does not (and cannot) apply to those parts. However, if we “translate” such an article into abstract content we would be compelled to license the whole “translation” as a BY-SA derivative. I propose, instead, to divide the original up according to its actually effective licensing and publish only the “translation” of BY-SA parts into abstract content as a BY-SA derivative. The “translation” of public domain parts of the original could then be published either as a BY-SA original or with a CC 0 waiver. I would oppose the BY-SA option (to repeat myself). Similar thinking might be applied to legitimate use of copyright material within the original article, but here I believe that we would not necessarily have the right to create the derivative. If we do, I would support BY-SA licensing for our “translation” of those parts.--GrounderUK (talk) 01:40, 7 December 2021 (UTC)Reply
As has been clarified previously, this discussion is only about how the Abstract Content in Abstract Wikipedia itself should be licensed! If one specifically intends to create a "verbatim", literal translation of existing public domain content into Abstract Content, I agree that this should be CC0 licensed but it should also not be part of Abstract Wikipedia as a project. Rather, it should probably be included in Wikidata, since it is after all a matter of structured factual claims about some real-world entity (viz. a publication). --Hupaleju (talk) 10:14, 7 December 2021 (UTC)Reply
Thanks, @Hupaleju:. I am happy to confirm that my comments relate only to public domain content that appears in an existing Wikipedia article. That includes an addition of public domain content into an existing Abstract Wikipedia article’s abstract content. The difficulty is that such public domain content would not be verbatim or with minor modifications, so an explicit CC 0 waiver seems to me to be the least problematic approach, since we might otherwise give the impression that our abstract representation of the public domain content is copyright protected.--GrounderUK (talk) 12:21, 7 December 2021 (UTC)Reply
Is that in any way different from, e.g. Wikipedia already incorporating text from PD sources? We have text from an out-of-copyright Encyclopædia Britannica in English Wikipedia. We have the full text of the Gettysburg Address on the respective Wikipedia page. There is no marker on the latter that explains that it is PD. I think that would be equivalent to the situation you describe? -- DVrandecic (WMF) (talk) 19:05, 7 December 2021 (UTC)Reply
@DVrandecic (WMF):It is very much my hope that the situations should be as nearly equivalent as possible. WMF Terms of Use say: “When you contribute content that is in the public domain, you warrant that the material is actually in the public domain, and you agree to label it appropriately.” For unaltered PD text, that requirement is easy to comply with. With an abstract representation of that text, it is not clear whether the abstract content is actually in the public domain, but I doubt it. To preserve equivalence, I should be required to dedicate it to the public domain under a CC 0 waiver. However, it seems likely that I cannot license a contribution under BY-SA and simultaneously apply a CC 0 waiver to some part of that contribution. This is not a problem where content is already in the public domain, since the CC BY-SA explicitly excludes PD content (“You do not have to comply with the license for elements of the material in the public domain”). The challenge we face is how to get our “adaptations” of PD content into the public domain. Sorry, I am not trying to be difficult, I just cannot see how we can secure equivalent access to PD content under a pure BY-SA licence. Perhaps we should just include it in the original language and link to a CC 0 abstract representation once one (magically) becomes available?—GrounderUK (talk) 21:02, 7 December 2021 (UTC)Reply
Yes, I agree, it would be good to preserve equivalence as much as possible. At the same time I hope that this can be something we account for later. It's a very grey area. Thank you for patiently explaining! -- DVrandecic (WMF) (talk) 21:51, 9 December 2021 (UTC)Reply
@DVrandecic (WMF): You’re welcome! Please note my qualified support above preferring CC BY-SA “unless otherwise noted”. Thank you. --GrounderUK (talk) 01:31, 10 December 2021 (UTC)Reply
Noted! -- DVrandecic (WMF) (talk) 01:34, 10 December 2021 (UTC)Reply
Maybe CC-BY. Do we want CC-BY-SA in order to import code from Wikipedia? We have more imports from data sources on Wikipedia. ✍️ Dušan Kreheľ (talk) 14:23, 13 December 2021 (UTC)Reply

Discussion about the Licence for Abstract Content

My main argument for CC BY-SA for Abstract Content for Abstract Wikipedia is to allow for a smooth and unproblematic reuse of Wikipedia content to very directly inspire and guide the creation of Abstract Content without having to worry about a possible copyright infringement.

A second argument is that there is a certain risk when using CC 0: should Abstract Wikipedia be reasonably successful, it might actually create an alternative to Wikipedia proper that, even though still worse than Wikipedia, might become preferable due to the license. This unfair advantage might, in the long run, lead to wider availability of worse content. By keeping the license the same as Wikipedia we would not create an incentive to use worse content. This argument has developed from Luis’ question above and from Legal’s guiding question.

My recommendation is to use CC BY-SA for the Abstract Content for the Abstract Wikipedia. These are my thoughts on the question, and not the recommendation of the WMF. --Denny (WMF) (talk) 18:43, 2 December 2021 (UTC)Reply

If people are willing to choose worse content over having to comply with the terms of a CC BY-SA license, to me that is an argument against CC BY-SA. If it's a good license to choose, we shouldn't have to force people to use content which uses it. - Nikki (talk) 05:38, 3 December 2021 (UTC)Reply
@DVrandecic (WMF): I think you are just assuming that "smooth and unproblematic reuse of Wikipedia content" legally requires Abstract Content for Abstract Wikipedia to be licensed under CC-BY-SA. We do not really know that, and I find it sad that we would take a defensive, safe position just because we are unclear, and lock up a new set of knowledge under a non-public-domain license. With this we would even signal that we think such knowledge can and should be copyrightable in the first place. Have you asked Luis whether he thinks that using Wikipedia content to derive Abstract Content really requires this license? To me, creating Abstract Content is like writing new content while taking knowledge and inspiration from existing sources. It is as if I read books on a subject and then went to Wikipedia and wrote an article on that subject: I can pick any license for that new content. The fact that it was inspired by, and uses knowledge from, some other books does not prevent this. Only if we are able to make a tool that automatically converts Wikipedia content into Abstract Content might we be unable to claim that it is just inspiration; such automatically derived content would then probably really be just a different representation (translation) of the same Wikipedia content. Mitar (talk) 07:44, 9 December 2021 (UTC)Reply
Luis and I have not talked about this particular question. -- DVrandecic (WMF) (talk) 21:52, 9 December 2021 (UTC)Reply

Some users above have raised the concern that Abstract Content, taken in general, might be uncopyrightable. I find this very hard to believe wrt. Abstract Content for original encyclopedic text (specifically, for Abstract Wikipedia), which is what's being discussed in this section. One should keep in mind that even the so-called "structure, sequence and organization" of a copyrighted work can itself be protected when it involves some sort of original creation, as opposed to being dictated by outside constraints or entirely customary. Moreover a typical, high-quality encyclopedic article will also contain a significant amount of creative and original expression beyond that - and one of the goals of Abstract Content is to allow for this sort of expressive judgment to be faithfully preserved even when working across languages. Thus, I see CC-BY-SA as the only viable license for a project with Abstract Wikipedia's scope.

At the same time, the sensible choice of CC-BY-SA for the language-independent "source" content of the Abstract Wikipedia should not foreclose the possibility of licensing some content in Abstract Text form as CC-0 within some other, very different project(s). I'd argue that this opportunity should be discussed very clearly since it will have major impacts on how these projects are organized for the foreseeable future. An Abstract Wikipedia that's licensed as CC-BY-SA cannot realistically be hosted "within" or "under" Wikidata without creating an undue burden for that project (it can of course use Wikibase and federate *with* Wikidata!); while it might be entirely proper for Wikidata to contain Abstract Text structures of its own, for its own purposes (including e.g. for Abstract Descriptions, Abstract Senses of lexical data, etc.), licensed under CC0 as with the existing structured content of Wikidata itself. --Hupaleju (talk) 23:38, 2 December 2021 (UTC)Reply

Yes, to make it very explicit (and I am happy with suggestions to make it more explicit, if you find a place where to say it): This is only about the Abstract Content for Abstract Wikipedia. For example, in particular, Abstract Content for Abstract Descriptions should and will be licensed under CC0, following how Descriptions in Wikidata are licensed now. The same for Abstract Glosses. Fully agreed!
In fact, if someone wants to create Abstract Content and publish it without a free license, that will also be totally fine (but it wouldn't be hosted on Wikimedia projects, as we only host open licensed content). -- DVrandecic (WMF) (talk) 00:01, 3 December 2021 (UTC)Reply
If our Abstract Content would be licensed CC by-sa, someone could also create their Abstract Content and publish it under CC0. The Output Content might even be very similar to ours, depending how much (or little) protection it actually enjoys. —MisterSynergy (talk) 09:18, 3 December 2021 (UTC)Reply
  •   Support for the recommendation. Mathematics belongs to all of humanity and access to it should not be restricted by legal terms; many function definitions and simple implementations will overlap with mathematical concepts and insights that nobody, not even Wikipedia, has a right to put their conditions on. More practically, more permissive licenses will simplify usage for all the people and small organisations that do not have a large corporate legal department to protect them. We would fail the open source community if we adopted a license that excludes a big share of modern OSS projects (as a developer, I have founded and maintained GPL and Apache Licensed projects, and would like to use AW in either case, which [to my legal understanding] only works if we choose a more permissive license). --Markus Krötzsch (talk) 07:19, 3 December 2021 (UTC)Reply
    @Markus Krötzsch: Thank you a lot for your answer! Would you mind if I put / would you mind putting your vote on the specific options above? -- DVrandecic (WMF) (talk) 17:26, 3 December 2021 (UTC)Reply
  • I prefer licensing under CC 0. This makes it easier to reuse something without violating the license. Usually I learn something by looking at existing material about a topic and then adapting those examples for my use case. If I need to collect the URLs of the pages I took them from and forget to do so in one case, then I violate the license. As I understand CC-BY-SA, the license also applies to derivatives of a work. So I am not sure to what extent a user can publish possibly copyrightable content under another license when an implementation of it exists in Abstract Wikipedia, for example if the Lexemes used in it are changed. So the question is how long a work remains a derivative of something else, and whether CC-BY-SA prevents external users from using Abstract Text in their projects that are licensed under other licenses. This is a question on which I would like a legal investigation, as I understand there are different views about what the decision implies for external users of the abstract text generation system. --Hogü-456 (talk) 18:11, 3 December 2021 (UTC)Reply

To provide an example of my reasoning for my abstract content license vote (I should note that, despite being de-0, I'm only using a German example here because I expect no one else to understand or care for a Bengali one), consider the constructor combination

 Action(
   L(617591)["S2"],
   Agent(Speaker()),
   Hesternal(),
   Topic(Concept("Q187880")),
   Source(Instance(Concept("Q326653"), Definite())))

This by itself, yielding the base rendering into German ("Ich nahm gestern dem Buchhalter Excalibur weg.", using the default parameters of Ninai/Udiron), is likely worth keeping CC0, and permutations of the arguments after the first argument to "Action()" are likely worth keeping CC0 as well; Action(L(...), Topic(...), Hesternal(...), Source(...), Agent(...)) would be an example of such a permutation. When contextual modifications to components are specified in a wrapper around that combination, such as

 With(
   # begin contextual modifications
   Apply("Action", Passive()),
   Apply("Action/Agent", Nosism()),
   Apply("Action/Hesternal", InitialEmphasis()),
   Apply("Action/Source/Instance/Definite", Emphasis()),
   # end contextual modifications
   Action(
     L(617591)["S2"],
     Hesternal(),
     Source(Instance(Concept("Q326653"), Definite())),
     Agent(Speaker()),
     Topic(Concept("Q187880"))))

which might yield the result "Gestern wurde Excalibur diesem Buchhalter von uns weggenommen." (possibly grammatically incorrect—I'm trying as best as I can), then I could see the resulting abstract content being CC-BY-SAable (with attribution to the person who decided to apply those contextual modifications). @MisterSynergy: for thoughts on this. Mahir256 (talk) 18:43, 3 December 2021 (UTC)Reply
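The wrapper notation above can be read as addressing nodes of the constructor tree by slash-separated paths and attaching modifiers to them. The following is only a minimal sketch of that reading in Python; the `Node` class and the `find` and `apply_modifier` helpers are hypothetical names made up for illustration and are not the Ninai/Udiron API. The lexeme and concept arguments are left out to keep the toy tree small.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One constructor call in a toy abstract-content tree (hypothetical)."""
    name: str
    children: list["Node"] = field(default_factory=list)
    modifiers: list[str] = field(default_factory=list)


def find(node: Node, path: str) -> Node:
    """Resolve a slash-separated path such as "Action/Source/Instance"."""
    head, _, rest = path.partition("/")
    if node.name != head:
        raise KeyError(path)
    if not rest:
        return node
    next_name = rest.split("/", 1)[0]
    for child in node.children:
        if child.name == next_name:
            return find(child, rest)
    raise KeyError(path)


def apply_modifier(root: Node, path: str, modifier: str) -> None:
    """Attach a contextual modifier to the node addressed by path."""
    find(root, path).modifiers.append(modifier)


# Toy tree mirroring the shape of the example above.
action = Node("Action", [
    Node("Hesternal"),
    Node("Source", [Node("Instance", [Node("Definite")])]),
    Node("Agent", [Node("Speaker")]),
    Node("Topic"),
])

apply_modifier(action, "Action", "Passive")
apply_modifier(action, "Action/Agent", "Nosism")
apply_modifier(action, "Action/Hesternal", "InitialEmphasis")
apply_modifier(action, "Action/Source/Instance/Definite", "Emphasis")
```

In this reading the bare tree and the list of modifiers are separate pieces of data, which matches the distinction drawn above between the base combination and the contextual choices layered on top of it.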

Question is: what exactly is copyrightable in your example? This has unfortunately not really been addressed yet (by WMF/Denny).
Abstract Content is not automatically copyrightable just because it looks complex or lengthy, or is expensive to compile. As I said earlier (context: #Do we really have a choice?), the potentially copyrightable aspects of Abstract Content are selection and arrangement: you select facts/information (from Wikidata) to be included in the Output Content and arrange them in a way that the Output Content makes some sense; you also select functions (from Wikifunctions) to be used for the text generation, which is how one can vary expression. The Abstract Content itself, however, in spite of having a somewhat complex form, is a technical notation that is basically a mapping of factual data as input parameters to functions; no creativity is required to compile the textual form of Abstract Content. In my opinion it has some conceptual similarity to, for instance, a JSON representation of a Wikidata item, which is also nothing but a technical representation of factual data; JSON (and other) representations of factual data are not published under a copyleft license either, for good reasons.
While I do acknowledge that we could publish selection and arrangement of the Abstract Content under CC by-sa, I am convinced that this is not what anyone would expect—neither Abstract Content authors, nor reusers of Abstract Content/Output Content. Any license that requires some sort of action by the reuser would thus very likely lead to frustration on both ends.
Besides all of this, we need to consider that the difficult parts of Abstract Wikipedia are Wikidata (which needs to be rich in data) and Wikifunctions (which does the complex job of text rendering). Both are available to external users, thus anyone could generate their own Abstract Content and use or release it under any conditions. Which Output Content is generated with this tool, and how it is used, is effectively not under our control, and we should thus not deliberately try to control it if this does not work anyway. --MisterSynergy (talk) 19:33, 3 December 2021 (UTC)Reply
I would argue that even with that, the control of the form of the output is little to none. The form may change depending on languages, later edits etc. This means that what the person creating the abstract content has done is to express an idea. He or she has not expressed the form of the output yet, and it is neither totally under their control, nor will the person have knowledge of the possible outputs of all the languages it can be rendered in. And since copyright can only be given for a certain expression and not for an idea, the output, in my opinion, can not be copyrightable at all. The best we can do is to be clear about that and acknowledge the output as CC 0. Ainali talkcontributions 19:32, 3 December 2021 (UTC)Reply
@Ainali: Compare it to literature. If an author writes a novel in English, then a Japanese or Russian translation may not contain a single letter written or controlled by the original author, yet there is no doubt that they have intellectual property rights – the novel is still their work, even if they share the copyright of the translation with a translator or publisher. If what you say were true, anyone could translate any book that's still under copyright into another language without seeking permission from the author. This is most definitely not the case. --Andreas JN466 02:41, 4 December 2021 (UTC)Reply
Not really, since there is no original form to compare with. As the abstract content only expresses the idea of something rather than controlling the form, it is not comparable to regular translations. This is obvious since there may be nearly infinite ways to construct the abstract content (reordering the individual functions, writing functions in different programming languages, etc.) that will produce the same output. In literature, there is an expression of the idea that is being translated. In Abstract Wikipedia this would be important because if a word has synonyms, and the translations have synonyms in turn, it would generate copyright for all permutations, which is the opposite of getting copyright for the expression of the idea. Let's consider this pseudocode as an example (which in English perhaps could render as 'the sky is white as fog'):
comparison(determined,subject(Q527),characteristic(Q23444),object(Q37477))
All these Swedish outputs would be correct:
  • Himlen är vit som dimma.
  • Himlen är vit som mist.
  • Skyn är vit som dimma.
  • Skyn är vit som mist.
These are four different forms from the same idea, and this is just the tip of the iceberg in a very simple example. And of course in my code, reordering the subject, characteristic and object would (in a good implementation) result in exactly the same output, which shows that it is the idea that is expressed, and that the form of the Abstract Content is largely irrelevant and has little to no bearing on the output. Ainali talkcontributions 08:19, 4 December 2021 (UTC)Reply
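The point about synonyms and argument order can be made concrete with a small Python sketch. It is a toy under stated assumptions, not how Wikifunctions renders text: the `LEXICON_SV` table and the `comparison` function are hypothetical, the item-to-word mapping is simply read off the pseudocode and the four Swedish outputs listed above, and grammar handling is reduced to a fixed sentence pattern.

```python
from itertools import product

# Hypothetical toy lexicon: Swedish word choices per Wikidata item,
# taken from the four outputs listed above.
LEXICON_SV = {
    "Q527": ["Himlen", "Skyn"],    # subject: the sky
    "Q23444": ["vit"],             # characteristic: white
    "Q37477": ["dimma", "mist"],   # object: fog
}


def comparison(**roles: str) -> list[str]:
    """Return every Swedish rendering of 'subject is characteristic as object'.

    Roles are passed by keyword, so the order in which they are written
    in the abstract content cannot affect the result.
    """
    subjects = LEXICON_SV[roles["subject"]]
    traits = LEXICON_SV[roles["characteristic"]]
    objects = LEXICON_SV[roles["object"]]
    return [f"{s} är {t} som {o}." for s, t, o in product(subjects, traits, objects)]


# Two differently ordered notations of the same idea.
a = comparison(subject="Q527", characteristic="Q23444", object="Q37477")
b = comparison(object="Q37477", subject="Q527", characteristic="Q23444")

for sentence in a:
    print(sentence)
```

Both calls return the same four sentences, so in this sketch the written order of the roles carries no information, while the number of possible outputs grows with every synonym added to the lexicon.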
What could be copyrightable though is the form of the Abstract Content. Which means that what actually is a translation of the form in Swedish is this:
jämförelse(bestämd,subjekt(Q527),egenskap(Q23444),objekt(Q37477))
This is hugely different from generating the output. I hope that this shows that rendering the idea is not comparable to translating an expressed form. Ainali talkcontributions 08:44, 4 December 2021 (UTC)Reply
@Ainali: This is getting too foggy for me ...   I'm not sure I follow. Could you please apply your reasoning to a concrete example, Abstract_Wikipedia/Examples/Jupiter, the Simple English Wikipedia article? To me, it sounds like you're saying all the translations we might generate of this (or any other) CC BY-SA text via Wikifunctions/Abstract Wikipedia would be CC0. Is that it? --Andreas JN466 10:14, 4 December 2021 (UTC)Reply
Not really. What I am saying is that the generated output is not so much a translation as the result of following instructions to construct a way to express an idea. This would be even clearer if the sentence were built up by calculating something, like "San Francisco is the 4th largest city in California". With a slightly more advanced function than the one in the example, the rank could be calculated on the fly through a query to Wikidata, and the number itself wouldn't be in the Abstract Content at all and might also change over time. How could the author of the Abstract Content get copyright for the output, the form the idea is expressed in, when that form changes over time without his or her knowledge or influence? Ainali talkcontributions 10:27, 4 December 2021 (UTC)Reply
There is dynamic content in Wikipedia as well (age etc.). Can we agree that if we start with simple:Jupiter and end up with a version of that text in Swahili or Marathi for use in Amazon Alexa, Apple Siri or Google Assistant, then we've produced a translation and derivative work that's still CC BY-SA? --Andreas JN466 11:13, 4 December 2021 (UTC)Reply
No, I don't agree that this is translation. That example is deceptive in the sense that the outputs look like translations because the functions are too simple. Consider the San Francisco example instead, made generic for any city and administrative unit. The description of that could be: generates a sentence which ranks something compared to similar items in some location by some property. It would then behave something like (oversimplified pseudocode):
rank(this,property(P1082),location(administrative unit(this)))
Depending on where it is used it could generate:
  • Stockholm ist die größte Stadt im Stockholms län.
  • San Francisco is the fourth largest city in California.
  • Gimo är den andra största staden i Östhammars kommun.
To me, it looks quite clear that these are not translations at all. Ainali talkcontributions 11:35, 4 December 2021 (UTC)Reply
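A minimal Python sketch of the ranking idea described above, under loud assumptions: the `POPULATION` table uses placeholder places and made-up figures purely for illustration, and the `rank_sentence` function and `ORDINALS_EN` table are hypothetical. A real implementation would presumably fetch population values (Wikidata property P1082, as in the pseudocode) at render time rather than from a hard-coded dictionary.

```python
# Made-up population figures for placeholder places (illustration only).
POPULATION = {
    "Region X": {"Town A": 120_000, "Town B": 450_000, "Town C": 80_000},
}

# English ordinals used in the sentence pattern; rank 1 needs no ordinal.
ORDINALS_EN = {1: "", 2: "second ", 3: "third ", 4: "fourth "}


def rank_sentence(place: str, region: str) -> str:
    """Render an English sentence ranking `place` by population within `region`."""
    towns = POPULATION[region]
    ordered = sorted(towns, key=towns.get, reverse=True)
    rank = ordered.index(place) + 1
    return f"{place} is the {ORDINALS_EN[rank]}largest city in {region}."


before = rank_sentence("Town A", "Region X")
POPULATION["Region X"]["Town A"] = 500_000  # the data changes, the call does not
after = rank_sentence("Town A", "Region X")

print(before)
print(after)
```

The call is identical both times; only the underlying data differs, and with it the rendered sentence. That is the sense in which the rank is not part of the abstract notation itself.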
Using Abstract Wikipedia to create CC0-licensed language versions of CC BY-SA Wikipedia articles is licence laundering. --Andreas JN466 18:10, 4 December 2021 (UTC)Reply
I don’t think that an exact reproduction of regular Wikipedia content will be the norm. The Jupiter example is meant to demonstrate that the system will likely be able to generate text on a Simple Wikipedia level, and what Abstract Content will roughly look like. The reproduction of existing content is not a core element of the example. That said, Abstract Wikipedia articles will of course have some conceptual similarity to regular Wikipedia articles since both aim to be encyclopedic articles about a given topic. This form factor already defines much of the expected structure of the article, as well as the content to be incorporated. --MisterSynergy (talk) 20:24, 4 December 2021 (UTC)Reply
I agree. And perhaps we even need to be restrictive if someone tries to capture an article word for word at a length that reaches the threshold of originality. Most of all, writing articles one at a time would be an incredibly inefficient way to use Abstract Wikipedia. Its strength comes when you write something that covers all planets in the universe, all mammals or all association soccer players at a time, not one. Ainali talkcontributions 20:37, 4 December 2021 (UTC)Reply
The fact that it should be possible (eventually) to recreate an existing Wikipedia article in Abstract Wikipedia indicates to me that the level of creativity of an Abstract Wikipedia article is basically no different than that of a conventional Wikipedia article. Which, I am happy to say, is also the development team's view. The question to me is how quickly machine translation will progress. Advances in that field may well outpace Abstract Wikipedia development. For example, if I stick the Simple English Wikipedia article on Jupiter in DeepL, I already get a perfect translation today. (Afterthought: One could actually re-purpose Simple English Wikipedia as a "Designed for machine translation" Wikipedia.) --Andreas JN466 22:06, 4 December 2021 (UTC)Reply
Well, most of the content will probably not reach the threshold of originality, or will not be copyrightable for other reasons that I have already shown. And putting CC BY-SA on that would be copyfraud, since there is no one holding the right to license it as such. So the fact that some content could possibly reach the technical level for copyright (although I am still not convinced that recreating an article that momentarily happens to generate the output of something existing in one language can hold copyright if the moving parts of it change) only means that we either have to have mixed licenses for the output, or that what could be licensed as CC BY-SA will need to be CC 0 and that we have to fight content that cannot be licensed CC 0 the same way we fight copyright infringements on other projects. Machine translation does not have anything to do with Abstract Wikipedia or any of its parts, so I have no clue why you are bringing it up in this discussion. Ainali talkcontributions 22:22, 4 December 2021 (UTC)Reply
We have established we disagree profoundly on the licensing ... let's leave it at that. As for machine translation, my understanding is that the whole purpose of Abstract Wikipedia is to write articles – using Wikidata items and simple grammatical structures – that will be easy to translate into languages whose Wikipedias are currently poorly developed. So if they lack an article on a topic, the relevant language translation of the Abstract Wikipedia article can be substituted. The point is to give readers articles in their language, and to give re-users (like voice assistants and search engines) basic material to use for customers speaking those languages. In this way, Alexa e.g. can learn to speak Kannada, or Marathi, or Swahili, etc. In other words, Abstract Wikipedia articles are "easy-to-translate articles written in an artificial language". Now in Simple English Wikipedia we already have "easy-to-translate articles" that today's free machine translation programs can do a pretty good job with. And as soon as machine translation of Simple English articles can serve a language like Marathi, or Swahili, there is a ready-made alternative source of content fulfilling the same purpose as Abstract Wikipedia. --Andreas JN466 22:37, 4 December 2021 (UTC)Reply
I think you have either misunderstood how Abstract Wikipedia will work, or you have a very different definition of translation than I do. This will be generating text, not from text at all, but from functions. It will not be possible to drag and drop text and get output. You will have to carefully design the idea that you want to express, something that no machine translation is capable of doing today. So no existing text will be able to serve as a ready-made alternative. At most, it will be possible to look at, and be inspired by, Simple English Wikipedia because it has already cut away a lot of the text that is more about the form than the idea. But there will still be the need for a human in the loop to create the Abstract Content. Ainali talkcontributions 23:04, 4 December 2021 (UTC)Reply
Yes, using Abstract Wikipedia to translate a Simple English article requires a human to write a code version of the article first (and to create the language tools to convert that code into a human language). Getting a machine translation from a Simple English Wikipedia article does not. I understand that the project isn't about recreating Simple English Wikipedia in Wikifunctions, but doing just that is one of the official examples given. I was merely speculating on when machine translation in the languages of interest will become good enough that you can simply write in English, complete with some dynamic content linked to Wikidata if you like, and get translations fit for purpose. (We don't have to continue that tangent here; feel free to contact me on my user talk page.) --Andreas JN466 12:24, 5 December 2021 (UTC)Reply
  • I'm thinking as I'm writing, but it feels like this is a tool and product situation, Wikifunctions being the tool and Abstract Content being the resulting product. If I make a sentence or a map with any tool, the way I use the tool to make the product is creative and copyrightable, but the result in itself is not, since everyone using the same recipe would get exactly the same result. For instance, on Wikimedia Commons, if I use a proprietary tool (and almost all cameras are), it doesn't influence the copyright of the result (and this is why one can choose the license). The other way around, if I produce the same drawing with Adobe or Inkscape, again it doesn't influence the copyright of the resulting map. With the same data and the same Wikifunctions, the abstract content will always be the same; in that case, what would a CC BY licence mean, and who would be the author? No one, everyone, the functions? It seems a bit pointless to even search for an author here (as with a table made by a template on Wikipedia: the creator of the template is not credited for the result on every page where the template is used). Cheers, VIGNERON * discut. 21:23, 6 December 2021 (UTC)Reply
    I agree with your tool / product view; it's the same image I use when I try to visualize the situation. And if it were only a single function call, I agree that there wouldn't be much to copyright. But my assumption is that the Abstract Content consists of a rather lengthy string of individual function calls, each producing a sentence or two, until a whole article is built up. The selection of these function calls, their order, etc., is what I think would be able to enjoy copyright status. The authors would be the contributors to the Abstract Content, *not* the functions and not the data. -- DVrandecic (WMF) (talk) 23:21, 9 December 2021 (UTC)Reply
  • @Mitar: CC BY-SA supports free culture in that any derivative works also have to be free. If you publish content as CC0, anyone can take it, make improvements or additions and put all of it in a proprietary box. Licences like CC BY-SA are designed to be viral, i.e. designed to lead to the creation of more free content, and designed to make sure that improvements made by others elsewhere can be fed back into the original project. Just sayin. ;) --Andreas JN466 19:25, 7 December 2021 (UTC)Reply
    I am not sure which of my comments you are addressing exactly, but in general I am trying to say that if nothing were copyrightable, we would have free culture and would not need copyleft licenses. They were made as a hack to invert what copyright is doing. Copyleft licenses in general allow one to take stuff, make improvements and additions, and put all of it in a proprietary box. Only if you start exposing that to others do you have to give them access under the same conditions. But if the thing is not copyrightable in the first place, then once you start exposing stuff, others can just take it, too. So copyleft licenses do not prevent one from making private proprietary improvements and getting rich from them; they only govern what happens when somebody exposes them publicly. So, in this case, if somebody takes all the knowledge encoded in Abstract Wikipedia and makes a nice AI model to do something, how exactly does requiring this AI model to be copyleft contribute back to the original project? I think it just complicates stuff (where do you put attribution, what happens when there are hundreds of inputs to this model, each requiring attribution, etc.). Contributing back is incentivized in other ways. So, I think it is important to try to argue that what Abstract Wikipedia is doing is ideally not copyrightable at all, instead of hoping for some legal enforcement and benefits out of that. I think we gain more benefits as a community if we can achieve that those artifacts are not copyrightable at all. Mitar (talk) 20:47, 7 December 2021 (UTC)Reply
    @Mitar: I'd expect a prominent use case to be that voice assistants (Alexa, Siri, Google, etc.) and search engines (Google, Bing, DuckDuckGo, etc.) will use our content to serve customers in the Global South voice assistance and infoboxes in their local languages (just like they presently read out Wikipedia articles, or display Wikipedia excerpts), as soon as we are able to make translations of abstract content in these languages available. Now, someone like Google who has a head start in machine translation could easily produce further language versions of our CC0 content and make them proprietary, i.e. neglect to share them and the relevant language tools with us. With CC BY-SA, this would not be possible, and this I think is a powerful argument in favour of CC BY-SA for abstract content and output content (i.e. the resulting foreign-language text). Moreover, CC BY-SA would ensure that re-users continue to credit Wikimedia, just as they do now. This is important for several reasons: 1. to ensure visibility of Wikimedia both for volunteer recruitment and revenue generation, 2. to ensure that end users know where (possibly contentious) info comes from and where they can change it if it is wrong, or one-sided. (More about this in Heather Ford's excellent Wikipedia@20 chapter, see [4].) --Andreas JN466 21:58, 7 December 2021 (UTC)Reply
    @Jayen466: Just focusing on your example. I think it is wishful thinking, and not legally required by CC-BY-SA, for "further language versions of our ... content ... and the relevant language tools" to be released publicly. Only the final output they render on the website would have to be under CC-BY-SA, because this is the final "remix" of the content, shown publicly. It is not clear to me what use that final content would be to us. So they can keep things internal and private and build whatever they want. CC-BY-SA does not help with that. Only if they do decide to release their internal improvements to the content do they have to release them under CC-BY-SA. So in fact using CC-BY-SA just makes it harder for those who do want to share their improvements to do so, because they have to attribute all sources contributing to their release. And that list of sources will just grow and grow over time. So we are making it harder and not easier to share. I doubt that if Google takes CC0 content, adds their stuff and makes their improvements public, they would do that under a different license or under no license at all. That is the only thing CC-BY-SA is protecting here. Mitar (talk) 07:37, 9 December 2021 (UTC)Reply
    @Mitar: There is every reason to think that Google would create proprietary products based on our content. Google generally prefers using proprietary formats for structured knowledge (for business reasons) [5], reportedly keeps the code for its proprietary Android apps and services secret although Android is in theory an open source model [6], and the Knowledge Graph is a proprietary black box. How does CC0 help free knowledge if it enables others to take it and make it non-free? CC BY-SA ensures that knowledge stays free. --Andreas JN466 13:33, 9 December 2021 (UTC)Reply
  • @VIGNERON: The abstract content certainly does have an author: it is written by a human, not by a machine. It is then machine-translated into the various languages, but a human being decides what statements to put into the abstract Wikipedia article and in which order. --Andreas JN466 19:28, 7 December 2021 (UTC)Reply
    To be more correct, a human may, in the simplest of cases, decide what statements to retrieve from Wikidata. Hopefully (if we want this to be any useful at scale), the human will rather only know what kinds of statements to retrieve, but will not know exactly which ones will be retrieved or the details of the facts they contain. (This is also why I think it cannot qualify as being described as translation.) Ainali talkcontributions 19:52, 7 December 2021 (UTC)Reply
    @Ainali: The aim of abstract Wikipedia, as stated in the first sentence overleaf, is "to create and maintain Wikipedia articles independent of language." Articles are articles. The editors will be writing articles. Once these articles exist, the aim then is to "generate text in multiple languages": translations of these articles. --Andreas JN466 22:07, 7 December 2021 (UTC)Reply
    No, it doesn't matter how much you claim it, users will not be writing articles. Users will create functions that will be used to generate articles, hopefully in bulk. By bulk I mean that they will create functions that can generate entire sets of articles, for example for all nature reserves that are in Wikidata. In their function form, they will not look anything like an article, and there might be math, algorithms and queries involved to generate the content for the article, way more advanced than just translating. And these will not be generated once and then stored for translation, but be made on the fly, with the latest updates in Wikidata, when someone wants to view an article in their language (well, I guess there will be some caching mechanisms, but the point still stands as the cache itself will be the output, not something to just translate in the end). What is stated is true for the entire project, not for an individual editor, who is only in the loop seeding the process. Ainali talkcontributions 22:34, 7 December 2021 (UTC)Reply
    Someone still has to put these functions together to define the abstract content for, say, Jupiter. Overleaf it is said, that Editors have a very fine-grained selection of which facts are being displayed and which are not. ... Editors have a very fine-grained control of the order the facts are being displayed in, constituting narrative elements, which are not available in Wikidata. We expect the natural language generation to allow editors to express to some degree of emphasis and selection of wording. (The grammar of that last sentence actually needs a bit of attention.) Narrative elements, Ainali. This sounds like writing to me. Yes, with Wikidata bells and whistles, with standard article templates for different types of topics, and not in a human language obviously, but there is still a narrative that the editor has "very fine-grained control" over. @DVrandecic (WMF):, @Quiddity (WMF):, You write on the Abstract Wikipedia page, "In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way." Did you mean to say this or not? --Andreas JN466 23:10, 7 December 2021 (UTC)Reply
    Yes, the hope is that the articles coming from Abstract Wikipedia will in many interesting cases be well beyond a simple function call, and expose all these features of narration etc. -- DVrandecic (WMF) (talk) 22:00, 9 December 2021 (UTC)Reply
    See the official description at Abstract_Wikipedia#Project: it's all about translating articles. "A particular language Wikipedia can translate this language-independent article into its language. Code does the translation ... translates the language-independent article from Abstract Wikipedia into the language of a Wikipedia. This allows everyone to read the article in their language." --Andreas JN466 22:31, 7 December 2021 (UTC)Reply
    Well, if that was literally true, we would finally see Wikipedia do something! Obviously, this is a simplification that is nowhere near precise enough to be used as a guide for the technology that will do this. It is a useful explanation for understanding the general concept, but not when it comes to coding it or figuring out licenses :) Ainali talkcontributions 22:39, 7 December 2021 (UTC)Reply

(Add your discussion points / argument here)

Unofficial result of the vote


Current as of 2021-12-16 14:25 UTC.

Licence vote result
What Licence
Function Implementations Apache
Abstract Content CC-BY-SA
License for Function Implementations.
Apache GPL CC0 User / Comment
✔️ Sannita
✔️ Thadguidry
✔️ Dušan Kreheľ
✔️ Andreas
✔️ Wugapodes
✔️ ✔️ Ameisenigel
✔️ ✔️ Eloquence
✔️ ✔️ Mitar
✔️ ✔️ Mahir256
✔️ ✔️ Amir E. Aharoni
✔️ ✔️ Strainu
✔️ Robert Važan
✔️ Mitar
License for Abstract Content.
CC BY-SA CC0 User / Comment
✔️ Ameisenigel
✔️ Eloquence
✔️ Strainu
✔️ Sannita
✔️ SJ
✔️ Andreas JN466
✔️ Amir E. Aharoni
✔️ SashiRolls
✔️ Dušan Kreheľ
✔️ Vaticidalprophet
✔️ Wugapodes
✔️ MisterSynergy
✔️ Ainali
✔️ VIGNERON
✔️ Mitar
✔️ SFBB
✔️ ✔️ Mahir256
✔️ ✔️ GrounderUK
✔️ ✔️ Nosebagbear
✔️ ✔️ Hogü-456
GrounderUK
Dušan Kreheľ

✍️ Dušan Kreheľ (talk) 00:23, 16 December 2021 (UTC) & ✍️ Dušan Kreheľ (talk) 07:50, 17 December 2021 (UTC)

Thank you for this summary table. I appreciate the detail of the comments-as-tooltips, I will try to remember that for the future.
I will note for anyone else reading, that the table doesn't include details about the earlier comments further above and offwiki (as noted at #Restructuring the discussion to move on), however they generally followed the same pattern of response, or the same people added similar comments here. Quiddity (WMF) (talk) 01:19, 17 December 2021 (UTC)

Proposed summary of the licensing discussion


At Abstract Wikipedia/Updates/2021-12-16 (translatable), Denny has written a proposed summary. I will also copy it below for easier reference. Quiddity (WMF) (talk) 01:25, 17 December 2021 (UTC)

The discussion about how to license the different components of Abstract Wikipedia is ebbing down. Given the status of the discussion, we propose the following summary and decision:

  • All contributions to Wikifunctions and the wider Abstract Wikipedia projects will be published under free licenses.
  • Textual content on Wikifunctions will be published under CC BY-SA 3.0.
  • Function signatures and other structured content on Wikifunctions will be published under CC 0.
  • Code implementations in Wikifunctions will be published under the Apache 2 license.
  • Abstract Content for Abstract Wikipedia will be published under CC BY-SA 3.0.

The last point was a particular focus of the discussion. As you can see, we are proposing to adopt CC BY-SA. This was the most common !vote. We also found the arguments for having the same license for Abstract Content as we already have for Wikipedia convincing: this will allow us to use Wikipedia as an inspiration for creating content without having to worry about a set of rather complex legal and also moral issues. This is much more likely to set up the relationship between the existing Wikipedia communities and the community that will work on Abstract Wikipedia for a good start.

There were three main arguments raised by the supporters of CC 0:

  • First, it was said that knowledge should not have a copyright in the first place, and it should be as free as possible. And whereas I deeply sympathize with that stance, more than twenty years ago Wikipedia decided to use a copyleft license, and Wikipedia did rather well.
  • Second, in a similar vein, it was said that abstract content should never have copyright. There is not much legal precedent for copyrighting abstract content, and therefore our licensing decision might influence the wider perception on the copyright status of abstract content. Given that, their argument is that we should encourage fewer things being copyrightable.
  • Last but not least, it was said that a lot of the Abstract Content probably would not pass the threshold required to be copyrightable in the first place (which means that applying a license based on copyright would not work). In order to not overclaim copyright protection on content that has no copyright, we should just declare everything public domain. But note that there are some Wikipedia articles that are so short and simple they also might not meet the threshold for copyright. But most of our articles do. And we don’t differentiate between these two. I expect a similar situation for Abstract Wikipedia, and hope that over time it will be increasingly obvious that much of our content will pass the threshold required for copyright.

Regarding the license for Code implementations: once Wikifunctions is launched and has been around for a while, and a community has formed, we will re-consider the question whether we should allow for several licenses, similar to how Commons does it. At that point we will understand much better how the system works, who is forming the community, and what kind of contributions we are missing by not having the option to have several licenses. I am looking forward to that discussion.

Note: the proposal leaves the decision open about the license for Abstract Descriptions for Wikidata (which likely should be under CC 0, in order to keep the licensing of Wikidata consistent), or about Abstract Content for other Wikimedia projects. These questions will come up in the future, as we get a better understanding of how the components interact with each other and the world. Given the limited participation of the community in this discussion, we will likely use a more lightweight approach towards deciding these questions, now that the wide strokes of the licensing questions are decided.

We are putting this proposal up until Monday, December 20, 2021, for the office hour, where we will adopt this as a decision (see below). We feel that the decision above reflects the will of the community. If you disagree that this decision is a valid outcome of the discussion, please raise your concerns by Monday.

Thank you all for participating in this discussion! Your voices and opinions were read carefully and were very valuable to us. --DVrandecic (WMF)

Quiddity (WMF) (talk) 01:25, 17 December 2021 (UTC)

Thank you, Denny (WMF) and Quiddity (WMF); that is a good summary. Let's see how it works out going forward, but for now I feel I should apologise for some of the more uncharitable things I said on this page, to Denny in particular. Season's Greetings! --Andreas JN466 20:33, 17 December 2021 (UTC)
While I do of course acknowledge this summary as describing the "valid outcome of the licensing discussion" as far as it goes, I nonetheless find it quite disappointing that the topic of Abstract Text as CC0 content in Wikidata (that AIUI encompasses quite a bit more than the simple Abstract Descriptions which are brought up in the summary; there is a well-known proposal for Abstract Glosses as descriptions of lexical Senses-- and allowing Abstract Content as a stored "type" for factual claims about an entity would then be a fairly natural extension as well) is once again being dismissed out-of-hand as something unimportant that can simply be 'tacked on' at some indeterminate time in the future. It is not so. AFAICT, if having CC0 Abstract Content in Wikidata is understood to be worthwhile, the whole project wrt. Abstract Content itself must be designed to make this feasible. That is why it is nonetheless appropriate to discuss this topic now, whilst the high-level design for Wikifunctions, Abstract Text and the like is still in flux. --Hupaleju (talk) 00:29, 18 December 2021 (UTC)

Decision


At Abstract Wikipedia/Updates/2021-12-21 (translatable), Denny has written the final decision, which includes a slight change from the proposal above, based on additional feedback and discussion. Thank you all for collaborating on this. Quiddity (WMF) (talk) 20:43, 23 December 2021 (UTC)

@Quiddity (WMF): You say that there is a slight change from the proposal above. Could you explain here what that slight change entails? --Andreas JN466 19:45, 7 February 2022 (UTC)
Just the two items under "we incorporated two more points of feedback". Quiddity (WMF) (talk) 20:28, 7 February 2022 (UTC)
Of which the first reads: "First, we will leave the question about the license of the generated content from the abstract content open for now. We will re-visit this question when we discuss the location of the abstract content for Abstract Wikipedia next year."
@Quiddity (WMF): Does this mean that "the license of the generated content from the abstract content", in other words the human-language article resulting from the CC BY-SA abstract content, could be anything other than CC BY-SA, contrary to the preference of the numerous editors logged above, and contrary to the development team's own recommendation? --Andreas JN466 23:27, 7 February 2022 (UTC)

Pinging editors


@Ameisenigel, Eloquence, Strainu, Sannita, Sj, Amire80, SashiRolls, Dušan Kreheľ, Vaticidalprophet, Wugapodes, Mahir256, GrounderUK, and Nosebagbear: Are all of you aware that the above decision was changed four days before Christmas, four days after the December 17 summary published above?

This affects the licensing of output content (the human-language content generated from the abstract content). While the status as of December 17, 2021 was that this would be CC BY-SA, in line with the development team's recommendation overleaf, this decision was rescinded on December 21, 2021, with that change only alluded to here on this page with the words "slight change" on December 23, 2021. --Andreas JN466 23:59, 7 February 2022 (UTC)

What an unhelpful comment. Please don't ping only people who you think will disagree with a change. Yes, I am aware; it looks like a slight change to me, and in a sane direction; fine tuning after a few days is not a rescindment nor surprising to anyone who read to the last line of the original announcement. –SJ talk 03:57, 8 February 2022 (UTC)
Unhelpful to whom? To my mind, the licensing of the output content is what matters most, as this is all end users will see, and all re-users will use in their commercial products. You are welcome to ping others; for now, I wanted to make sure that all those commenting here who wanted output content to be CC BY-SA – and who could be excused for thinking that this was what was agreed here – know that this was changed after their last involvement here on this page. I think that is legitimate, and I am more than disappointed that such a big change was only mentioned in passing here on this page as an unspecified "slight change" one day before Christmas and that Quiddity and Denny didn't ping all of us themselves, which would have been the decent thing to do. Gee, Sj ... words fail me. This could hardly have been handled in a less straightforward manner ... all the more so given that we were assured no such four-days-before-Christmas thing was going to happen. [7][8] Pinging User:Ralf Roletschek. --Andreas JN466 11:50, 8 February 2022 (UTC)
There has been no official update about the change of licences after 17 December. ✍️ Dušan Kreheľ (talk) 17:50, 8 February 2022 (UTC)

These are complicated decisions, and it was rightfully raised by a number of users that the answer might very well be "it depends". There are many open questions still, most importantly where to store the Abstract Content. On the particular point you are raising: we did not change the decision, but we refrained from making a decision about this particular question at this point in time. We will need to make a decision in time, obviously, and we will take the arguments and votes raised here, and ping the participants when the new discussion actually starts. You, too, will be pinged when we get to it. But I think it makes sense that we first figure out a few of the open questions. Once this is done, it might become easier to find consensus and discuss the issues at hand. --DVrandecic (WMF) (talk) 21:55, 9 February 2022 (UTC)
