In terms of process and timing, this round of requests for feedback is meant to solicit ideas from you, as a member of the Labs community, on how best to revise the Terms. We will try to respond as best we can, but the main purpose of this round is to hear all your thoughts. After the feedback round, we will prepare a draft revision of the Terms based on that feedback and other minor revisions to clarify statements in the existing Terms. We will then engage in a community discussion about the revised Terms.
We have identified three major topic areas under which we want to hear your feedback. In addition, for other areas of discussion or input, please submit your thoughts in the “Open Discussion” area below.
We plan to leave this discussion open until June 9, 2016. Thank you for all your help and feedback.
We ask that participants observe the Friendly spaces expectations when discussing these topics.
The current Terms do not indicate whether developers can use or integrate resources hosted on third-party servers (e.g. libraries, scripts, stylesheets, images). The use of such third-party resources might be considered problematic for the following reasons:
- Some users may consider third-party tracking to be intrusive of their privacy.
- Some users may not be aware of these practices, and some projects may not give adequate notice of them.
- Users of projects involving third-party resources may be subject to a higher risk of security issues or intentional attacks.
These concerns are heightened in cases where Labs-hosted tools and extensions are installed or available for use in our other Projects.
On the other hand, it might sometimes be easier for developers to link to third-party resources rather than uploading them first. Some external services also might not be easy to upload to or host on Labs. Furthermore, it is likely there are already some projects on Labs which use third-party resources; this is particularly problematic when such usage is undisclosed to end-users. Finally, perhaps a distinction should be made between loading third-party resources that will directly interact with an end-user and the loading of third-party resources on the back-end of a service. The latter practice is usually less intrusive to user privacy than the former since it does not result in the automatic transmission of user information to a third party (of course this information may still be forwarded, perhaps inadvertently, to a third-party resource on the back-end).
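To make the front-end/back-end distinction concrete, here is a minimal sketch of a back-end call that forwards only the fields a third-party service actually needs. The field names, the allow-list, and the `build_outbound_params` helper are hypothetical illustrations and are not part of any Labs API or of the Terms.

```python
# Hypothetical allow-list: the only fields this tool forwards to a third party.
ALLOWED_FIELDS = {"channel_name"}


def build_outbound_params(incoming: dict) -> dict:
    """Return only the allow-listed fields from an incoming request.

    Anything that could identify the end-user (on-wiki username, IP address,
    User-Agent, and so on) is dropped before the back-end contacts the
    third-party service.
    """
    return {key: value for key, value in incoming.items() if key in ALLOWED_FIELDS}


if __name__ == "__main__":
    incoming = {
        "channel_name": "ExampleChannel",
        "wiki_username": "ExampleUser",  # personal, must not be forwarded
        "client_ip": "203.0.113.7",      # personal, must not be forwarded
    }
    print(build_outbound_params(incoming))
```

An allow-list is used here rather than a block-list so that a newly added request field is withheld by default instead of being forwarded inadvertently.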
Any policy change will ideally provide a flexible way to address existing project usage and behavior while avoiding unnecessarily hindering the development of new projects. Revisions may also want to reflect the technical capability of enforcing such a TOU and of providing the end-user with the ability to consent to the use of third-party resources.
Finally, this discussion is not about disallowing linking to third party sites from a Labs page. Hyperlinks are fundamental building blocks of an open Web and we should avoid prohibiting this.
Please share your thoughts below about how we might (or might not) want to revise the Terms to address the use of third party tools:
Discuss Use of Third Party Resources
- What about the use of third party API's like that of Google? I used the Google API to check the channel id of a YouTube account based on someone entering the userid or channel name in the Wikidata field for YouTube channel ID. And what about the change I made by fetching the page of a YouTube account and gathering the channel id from the metatags? Mbch331 (talk) 19:32, 20 May 2016 (UTC)
- Calling out to a 3rd-party service from your backend code is generally fine. You should not be sharing personal information about your user (e.g. on-wiki username, IP address, etc) with that 3rd-party unless you have warned the user before and received their consent.
- ZZhou (WMF) could you clarify here and/or in the main document that the concerns for "use or integrate resources hosted on third-party servers" are specifically scoped to such uses that expose sensitive data to the third-party and have not been explicitly agreed to by the end user? The ticket I filed at T129936 which in part led to this discussion was specifically about forced browser interactions. I think an over broad interpretation of disallowing all third-party interactions from the backend or of disallowing consensual browser interactions with third-party servers would be harmful rather than helpful. BDavis (WMF) (talk) 18:37, 21 May 2016 (UTC)
- You are right that perhaps we should be more lenient towards third-party interaction on the back-end or allow users to opt-in to third-party tracking. I have revised the discussion section. --ZZhou (WMF) (talk) 22:40, 23 May 2016 (UTC)
- Hosting or mirroring on labs has the disadvantage of possible duplicate downloading and possible duplicate caching on the client side. --Purodha Blissenbach (talk) 07:43, 21 May 2016 (UTC)
- If a CDN is used, and the user has visited a site that used the same CDN with the same file (e.g. MaxCDN with Bootstrap), then it will be cached in the user's browser and not downloaded again. This has potential speed benefits, and uses less storage on labs because everyone does not have to download and copy the files into their individual tools. Also, other services such as reCAPTCHA are useful to prevent spam, because locally hosted services are often less effective. Tom29739 (talk) 18:26, 21 May 2016 (UTC)
- Tool Labs itself has the cdnjs and static tools designed to host widely used resources. I personally don't believe that any particular third-party CDN is so commonly used that there can be a guarantee of a high rate of local browser cache hits. Even if that can be proven false, trading an IP address, cookies, and HTTP referrer information for a small data download is not a decision that I feel Wikimedia should be forcing on its visitors.
- I'm creating such an API. The trouble with such a service is that it will be a traditional, unreadable text one. reCaptcha has a risk detection engine which makes it much more effective. The ConfirmEdit extension page on MediaWiki wiki says that unreadable text captchas are ineffective and reCaptcha is very effective. We shouldn't have to reinvent the wheel and go back to the dark ages for captchas. Most users would probably thank us for letting them use an easy checkbox rather than unreadable text. By disallowing reCaptcha, you are effectively giving spammers free access to tools. Tom29739 (talk) 21:44, 22 May 2016 (UTC)
- +1 (if I understand you correctly). Labs is for experiments, and those may (initially) require integrating third-party services. If a user has previously agreed to that I don't see harm, and I also think that in the long term, if a tool/application is useful for a wider range of audience, it should be cleaned up to not depend on any third-party services (and turned into an extension/gadget). But that should not be a prerequisite for starting to develop something. --Tim Landscheidt (talk) 23:30, 23 May 2016 (UTC) P. S.: Proxying requests is often as bad if the third party can correlate requests to their service with actions on Wikipedia.
I completely agree. I think the essential point here is consent. As long as the user explicitly consents ('Yes, I agree that some personal information will be shared with [host X, host Y]'), I think it is reasonable to allow this -- not unlike the current 'By using this project, you agree that any private information you give to this project may be made publicly available and not be treated as confidential.' message requirement. Valhallasw (talk) 08:20, 27 May 2016 (UTC)
- Currently, I'm using Google Analytics for one of my tools, but this change will disallow me from using it anymore. Hence, is there any service from WMF that can be used to collect some analytics on tools (e.g. number of visitors on pages over a time period)? Kenrick95 (talk) 09:02, 29 May 2016 (UTC)
- Hits can probably be measured manually, can't they? Not sure if it's possible to count hosts. But just putting a placeholder answer in expectancy that smarter guys will elaborate :) --Base (talk) 08:31, 31 May 2016 (UTC)
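As a rough illustration of measuring hits without a third-party analytics service, the sketch below counts requests per day and path from web server access log lines. It assumes a common-log-style line layout and ignores client addresses entirely; the log layout and the `count_hits` helper are assumptions, not a description of how logs on Tool Labs are actually stored or exposed.

```python
import re
from collections import Counter

# Assumed log line layout, e.g.:
# - - - [29/May/2016:09:02:00 +0000] "GET /mytool/ HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\]\s+"(?P<method>[A-Z]+)\s+(?P<path>\S+)'
)


def count_hits(lines):
    """Count requests per (day, path); lines that do not match are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            # Drop the query string so /tool?a=1 and /tool?a=2 count together.
            path = match.group("path").split("?", 1)[0]
            hits[(match.group("day"), path)] += 1
    return hits
```

This counts page hits only. Counting distinct visitors would require some per-client identifier, which is exactly the kind of information the discussion above is cautious about collecting.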
Privacy is important to end-users and developers alike, and we want to make sure that clear, useful information is provided regarding the treatment of information collected by projects on Labs.
- 1) Private information must be secured.
- 2) Private information can only be retained for a short period of time.
- 3) Private information can generally not be shared with third parties except with user consent.
We are interested in revising the end-user disclaimers to have all developers better notify end-users of the specific privacy practices applicable to projects on Labs. Currently, these disclaimers are only for projects that allow for account creation, collect private information, or contain beta or test wikis.
In addition to clarifying these existing disclaimers, it might be helpful for even Labs projects that do not collect private information to publish a disclaimer assuring end-users that private information is not being collected.
Please share your thoughts below on how we might want to change the current disclaimers or ask existing projects to revise their disclaimers.
Discuss Privacy Disclaimers
- Must the disclaimers be in English? Where is this limitation coming from? --Base (talk) 07:21, 31 May 2016 (UTC)
- I don't see anything there about language. I think people can write the disclaimer in any language. Although, of course, multilingual disclaimers are preferable. --Base (talk) 22:01, 1 June 2016 (UTC)
- @Base: You are right - it does not state the disclaimer language (although the disclaimer as written on that page is itself in English). We do not want to force developers to write disclaimers in languages they do not understand. At the same time, the purpose of disclaimers is to inform end-users and that purpose is defeated if a disclaimer is published in a language they also do not understand. Thus, the provision of the disclaimer on the Terms provides a way for developers who do not speak English to still publish a disclaimer to their end-users. Are you aware right now of many translated disclaimers (based on the one in the Terms) being published in other languages on Labs today? If so, is there still a way we can better ensure end-users will understand the language of any disclaimers they come across on Labs? --ZZhou (WMF) (talk) 17:23, 7 June 2016 (UTC)
- This is useful for users, but may also 'scare' people off because the disclaimers may make the user think they are giving away loads of private info by using a tool. Tom29739 (talk) 18:26, 21 May 2016 (UTC)
- Instead of lots of boilerplate text no one is going to ever read, maybe the WMF could create some sort of easily-recognisable visual identity (like privacy icons). --Tgr (talk) 10:14, 22 May 2016 (UTC)
The TOU currently contains a section entitled “What can and can’t be done with user information?” This section provides details regarding the types of data that can be collected from end-users, and the ways in which it must be stored and handled. We would like to ensure that developers understand the requirements -- are these parameters clear, and helpful when planning a project?
If you are collecting private information, you are required to inform end-users of that fact, and to tell them how you will use it and how long you will retain it. Is it easy for developers to create notices for end-users detailing this information? Do you use the list in this section of the TOU as guidelines for this notice?
A notice that specifically details how data will be used or handled in regard to a certain project is called a privacy statement. We are considering setting baselines for the information that must be provided to end-users in these privacy statements — e.g., the type of information that the project collects, whether the information is expressly shared with third parties outside of the Wikimedia Foundation, how long you will retain the information, etc. Would guidelines of this sort be useful to you when you write privacy statements for your projects?
Please comment below on whether or not the “What can and can’t be done with user information?” section is helpful; if not, please suggest what sort of information would be useful for you. Additionally, please comment on the suggestion that all projects provide a privacy statement including certain baseline information about their data collection and handling practices.
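One way to picture such a baseline is as a small structured record that every project fills in. The sketch below is only an illustration; the `PrivacyStatement` class and its field names are hypothetical and are not a format proposed in the Terms.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyStatement:
    """Hypothetical baseline fields for a project's privacy statement."""

    project: str
    data_collected: list = field(default_factory=list)  # e.g. ["on-wiki username"]
    shared_with_third_parties: bool = False
    retention_days: int = 0

    def render(self) -> str:
        """Produce a short plain-text notice for end-users."""
        if not self.data_collected:
            return f"{self.project} does not collect private information."
        collected = ", ".join(self.data_collected)
        sharing = "is" if self.shared_with_third_parties else "is not"
        return (
            f"{self.project} collects: {collected}. "
            f"This information {sharing} shared with third parties "
            f"and is retained for {self.retention_days} day(s)."
        )
```

For example, `PrivacyStatement("ExampleTool", ["on-wiki username"], False, 90).render()` yields a one-paragraph notice, while a project that collects nothing gets the short "does not collect" form.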
Discuss Privacy Statements
- Technically speaking, every tool or instance that's web accessible collects user information, at least in form of webserver logs. I would advise to create a "standard" set of what's being typically collected, and recommend/require a separate disclosure if information beyond that is being collected. Max Semenik (talk) 21:16, 20 May 2016 (UTC)
- Applications hosted on Tool Labs do not have access to the end user's IP address. This information is stripped from the request by the proxy server that also terminates the HTTPS connection and routes to the appropriate backend web server. The proxy server for other Labs projects relays the original IP address in an X-Forwarded-For header. User-Agent is available to both as is some level of information on the on-wiki user if OAuth is used by the application.
- The XFF header is truly only needed by a very small number of applications in Labs and it would be nice to change the proxy so that this is an explicit grant that must be asked for rather than a default privilege. For Labs hosts which have a public IP address, there is no intervening proxy that can anonymize access. Projects requiring this direct access to their clients should also in my opinion be required to both justify their need and be subject to some disclosure and retention policy for the data they do collect and retain. BDavis (WMF) (talk) 21:55, 20 May 2016 (UTC)
- Pretty much anything wanting to expose a network service to the world that isn't HTTP/HTTPS needs its own public IP. What sort of disclosure and retention policy do you have in mind? --Krenair (talk • contribs) 22:10, 20 May 2016 (UTC)
- A listing of the data collected that could be considered sensitive (IP addresses, usernames, etc) and the duration for which that collected data is archived by the service. There may be some common classes of services that could be covered by a shared policy to make things easier for the people running the service. One example of a common class is irc bots which log content for the channels they join. BDavis (WMF) (talk) 22:18, 20 May 2016 (UTC)
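The explicit-grant idea for the X-Forwarded-For header raised in this thread can be sketched as a header filter in the proxy. This is a minimal illustration under stated assumptions, not the actual Labs proxy configuration; the `GRANTED_PROJECTS` set and the `forward_headers` function are hypothetical.

```python
# Hypothetical set of projects that have justified a need for client IPs.
GRANTED_PROJECTS = {"example-antiabuse"}


def forward_headers(project: str, headers: dict, client_ip: str) -> dict:
    """Return the headers a proxy would pass on to the backend.

    Any client-supplied X-Forwarded-For is always discarded; the real client
    address is added only for projects with an explicit grant.
    """
    forwarded = {
        name: value
        for name, value in headers.items()
        if name.lower() != "x-forwarded-for"
    }
    if project in GRANTED_PROJECTS:
        forwarded["X-Forwarded-For"] = client_ip
    return forwarded
```

With this shape, relaying the client address becomes an opt-in that a project has to request, rather than a default privilege.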
Although we ask developers in the Terms to “not use or install any software unless the software is licensed under an Open Source license,” many open source licenses do not require the publication of the source code where such software is used exclusively on the server side of a web service, as the case is for many projects hosted on Labs.
We are interested in whether we should have some sort of requirement (or encouragement) in the Terms for developers to publish their source code, except for perhaps security sensitive code. We are also interested in what type of processes we should set up to allow developers to easily do so. Requiring the publication of source code alleviates problems with abandoned projects, as tracked on Phabricator here. At the same time, we should think how we should handle enforcement on existing projects.
Please share your thoughts about whether we should require the publication of source code in our Terms and, if so, what processes you would suggest to allow for easy compliance:
Discuss Source Code Publication
- Strongly endorse in principle, and believe it should be required for any tool accessible/usable by non labs users outside labs terminal, such as web accessible services. John Vandenberg (talk) 20:57, 20 May 2016 (UTC)
- Endorse in principle, however it will take a looong migration period and might piss off a few users. Tact is required here. Max Semenik (talk) 21:09, 20 May 2016 (UTC)
- People tend to be reluctant when asked to publish half-done stuff or work-in-progress. Sometimes, debugging code or similar should not, or must not, be made public. So, we might need a two-layer approach. Doable but complicated. Needs broad acceptance. — The preceding unsigned comment was added by Purodha (talk) 2016-05-21T07:56:35 (UTC)
- Endorse. Any code exempted as 'security sensitive code' should be extremely well justified and overseen by project or Labs administrators. In fact I'm not actually sure if there is a good reason to allow anything to be exempted like this. --Krenair (talk • contribs) 22:08, 20 May 2016 (UTC)
- I read 'security sensitive code' as a reference to passwords and tokens needed to access services. Like Krenair, I also can't think of any other code product that should be on Tool Labs or Labs generally that would be sensitive. BDavis (WMF) (talk) 22:21, 20 May 2016 (UTC)
- If it's like we do in prod where the private config containing passwords is kept separately and not available outside of the servers, I think that's fine. --Krenair (talk • contribs)
- Cough cough countervandalism bots cough. Max Semenik (talk) 23:18, 20 May 2016 (UTC)
- I'm a bit concerned about the hurdles it imposes for what would otherwise be quick one-off tools, if they need to have a repository approved and created for them (which might be overkill, anyway). Indeed, just the fact that "it will be published" may discourage quickies, no matter that probably no one will be interested in them and that there would have been no problem in providing it on request. Platonides (talk) 20:45, 22 May 2016 (UTC)
- Well, I hope that we're not talking about requiring that the code is in a Wikimedia hosted source code repository, which is not user/hack friendly, and would exclude a lot of current users and usages. Even hacks can be easily thrown in a repo hosted on gitlab/github/bitbucket/etc/etc, either personal dumping repos, forks of maintained libraries/toolkits, but hopefully people would collaborate in shared dumping repos, and even migrate their mature hacks into maintained libraries and toolkits. But I do agree we can and should have some sensible limits on this 'source code publication' rule so it doesn't apply to 'one-liners' e.g. criteria like web accessible, performs API writes, is on a bot flagged account, etc. John Vandenberg (talk) 21:26, 22 May 2016 (UTC)
- I'm sure using github or bitbucket would be fine. I agree it would be good to have some sort of threshold, but how would the threshold be defined? Scripts for one time use? Scripts below a certain number of lines? Kaldari (talk) 18:11, 23 May 2016 (UTC)
- Endorse in principle. There should probably be some kind of exception for small one-off scripts though. Also, we may want to have a grand-fathering provision for old tools that aren't being actively maintained (but are still being used). Kaldari (talk) 18:11, 23 May 2016 (UTC)
- The problem I see with this is the exemptions others request above. What is a one-off script? We had examples in the past where users (at least in my perception) wanted to assert that they have a legal right to access a tool's source code. I wouldn't want to give those users the power to harass developers who might not be good or consistent at publishing their source code. The problem of abandoned tools is different (and covers the question of publication): If a developer abandons a tool and there were a process for other developers to take over (T87730), those new developers could take over the tool (and publish the source code). On the other hand, if there were only a requirement to publish the code, new developers would have to fork it and figure out how to set it up on their own. So IMHO T87730 should be fixed, with no requirement to publish code. --Tim Landscheidt (talk) 00:12, 24 May 2016 (UTC)
- I agree with Tim that there are essentially multiple questions here: 1) should all code on tool labs be open source, 2) does that mean others have the right to demand the source code, 3) can a tool be taken over once the owner disappears? I think the answer to 3) should be yes, and the TOU should be adapted to make that possible legally. I think it is in line with our mission to require 1) and possibly to require 2). Valhallasw (talk) 10:25, 27 May 2016 (UTC)
- Endorse. I'm pro source code publication with the obviously necessary exclusion of configuration files which contain credentials (database passwords, OAuth secrets, etc). My reasoning for this is that the computing resources provided by Labs and Tool Labs are funded by Wikimedia donations. Developers who choose to make use of those resources in my mind are obligated to contribute to the goals of the Wikimedia movement and to respect its values to justify use of the resources. Publication and libre licensing of source code is necessary for the value of freedom and the right to fork. BDavis (WMF) (talk) 16:06, 27 May 2016 (UTC)
- This makes me a bit uneasy. Of course project/tool creators should be encouraged to publish their source code, but making it a requirement has a number of concerns, as outlined by others. How about a "Freedom of Information" style of system, where users may request that the administrator(s) of a particular tool publish the source code of that tool, and the administrator(s) are required to publish all code to a public repository, except code that is subject to specific exemptions (such as private passwords, secret counter-vandalism algorithms, CAPTCHA logic, and tools with significant security/privacy implications like UTRS). In the case of abandoned projects, Labs or Tool Labs admins could fulfil the role of the project/tool administrator. This, that and the other (talk) 04:31, 5 June 2016 (UTC)
- A problem with the FOIA style plan is that without a mandatory source code escrow system to back it up there may be no way to recover source for abandoned/neglected projects. This is not a theoretical situation. We have tools today on Tool Labs that have no source code on the Tool Labs server. This is possible for any compiled language including Java, C, and C++. It is also possible that the source is present, but unlicensed which is functionally the same as having no source code due to the default copyright status of software.
- I find the "software secrets" argument to be less than compelling due to the shared hosting nature of the Tool Labs project. Many, many users have shell access to the servers that power Tool Labs and there are innumerable known and unknown ways to gain a local privilege escalation on a Linux host that would allow you to read files owned by another user. The expectation of file content privacy on such a shared host can be no more strict than the expectation that the window of your car will not be smashed in and the contents of your locked vehicle revealed. That is to say that the only barrier to such a loss is societal convention. If there are truly secrets worth having on a Tool Labs server it should be assumed that they are already compromised. --BDavis (WMF) (talk) 05:43, 5 June 2016 (UTC)
- I'm late to the party (too late?), but to add my two cents, echoing others, I feel closed source software that directly affects a Wikimedia project is contrary to our mission. If you write bad code and don't want anyone to see it, requiring it to be open source just means others can help improve it. If you are unwilling to work with others and want all the credit for your work, you probably shouldn't be participating in what is supposed to be a collaborative project. For the cases of spammers/vandals, those folks are always going to find a way, but I suppose there could be some extreme exceptions where closed source code might be permitted. In that case we should require there be multiple maintainers in the event the service goes down and the sole maintainer is unreachable. If there's no legitimate reason for the source to be closed, I don't think a "freedom of information" system is going to work. Case in point: what we are seeing with Merlbot on dewiki, which is just appalling. Meanwhile phab:T87730 has been open for over a year. In short, our on-wiki dedication to openness and transparency should be mirrored on off-wiki projects that are clearly and directly related. -- MusikAnimal talk 17:59, 10 June 2016 (UTC)
- I don't think those two spheres can be compared due to the technical differences they have. If I edit a wiki, I don't have to spend a single thought on openness and transparency; MediaWiki makes all the magic happen, and even better, instantaneous. The only (!) case in which this happens in Labs is if a project's software is in operations/puppet, and the project's administrators do not ever test patches locally first. Otherwise, there will always be significant effort needed to document and update the code actually running. Looking at how often (paid) WMF developers have set up "temporary" software without immediately documenting and publishing its code, I don't think it is reasonable to hold volunteer developers to a higher standard. (NB: IMHO all code should be published; I only want to avoid an atmosphere of fear.) --Tim Landscheidt (talk) 07:04, 11 June 2016 (UTC)
Contribute a new idea, or talk about meta-level issues - go for it!
Some open questions are 1) the extent to which we want Labs to be a hosting service where the onus is on the developer to appropriately engage with their end-users and 2) the extent to which the Wikimedia Foundation should also develop guidelines, consistent with our main policies, to directly protect end-users of Labs projects.
Many thanks for your time, reflection, and wisdom.
- A core concept that should be defined is external usage. Tools that can only be accessed within the labs environment do not have the same problems re end-user privacy. Some tools only pull & push data onto the wikis, which means they don't have external usage, but still may need additional TOU, especially if they are critical functions. John Vandenberg (talk) 21:09, 20 May 2016 (UTC)
- @BDavis (WMF): I consider w:Wikipedia:Bots/Requests_for_approval/FacebookBot to be a critical service, as its disappearance would be very bad (removing all WP pages from Facebook wouldn't be bad, IMO, but leaving old/bad/etc WP pages on FB would be bad). If it was running on labs, even though it doesn't interface directly with users, we should still require that its code is open, so it can be maintained properly, and algorithms re-used for similar purposes (which avoids partnership lock in). And IMO, even if it isn't running on labs, it still should be open source for the same reasons, and the TOU could enforce that by way of additional TOU that kick in for "high volume API usage". John Vandenberg (talk) 01:56, 21 May 2016 (UTC)
- Thanks for the clarification John Vandenberg. I personally tend to agree that all projects hosted in Labs and Tool Labs should be published under an OSI approved license whether they are end-user facing or not. We should probably take that aspect of the discussion to the Discuss Source Code Publication section of this consultation.
- The idea of the enwiki or the general Wikimedia community requiring licensing and source code publication for bots that are granted certain on-wiki rights regardless of where the bot operates from is interesting. I think that however is a different discussion than the Labs TOU clarifications. BDavis (WMF) (talk) 18:29, 21 May 2016 (UTC)
- Sure; my intention here was not to discuss specific requirements, but to look at definitions that can be used to group tools into cohorts which have specific requirements. Specifically 'external usage' / 'end-user facing', which trigger a bunch of additional requirements. John Vandenberg (talk) 18:40, 21 May 2016 (UTC)
- One major issue that has come up repeatedly is the use of Labs to aggregate, analyze, and display public information about users' interactions with the site. Some specific examples:
- Showing which hours during a day a user typically edits the site.
- Showing the top number of edits that a user has made to pages within a MediaWiki namespace.
- Does aggregating, analyzing, and displaying this type of public information require user consent? To be clear, this is not working with any private data. Yet users have sometimes expressed dismay at having their public contribution activities ingested and redisplayed in certain ways.
- It would probably be helpful to dig up some of the past discussions related to this. --MZMcBride (talk) 22:23, 20 May 2016 (UTC)
- @MZMcBride: Good point, I will look for these. Do you have any idea where they might be (on our mailing lists (wikitech, labs-l) or somewhere else)? --ZZhou (WMF) (talk) 22:45, 23 May 2016 (UTC)
- Hi ZZhou (WMF). Requests for comment/X!'s Edit Counter and Wikimedia Blog/Drafts/Handling our user data - an appeal are probably decent starting points for the types of discussions I'm talking about. There have been many other discussions, but finding them is annoying. Maybe someone else will help out.
- In short, before Wikimedia Labs, there was the Wikimedia Toolserver, hosted by Wikimedia Deutschland. Some of the restrictions on aggregating data came from stricter German privacy laws and practices. When tools were moved from the Toolserver to Wikimedia Labs, and consequently were hosted more directly by Wikimedia Foundation Inc., the issues surrounding "profiling" editors re-arose. --MZMcBride (talk) 01:20, 24 May 2016 (UTC)
- @MZMcBride: Thanks! --ZZhou (WMF) (talk) 23:16, 24 May 2016 (UTC)
Hi ZZhou. When I go to <https://tools.wmflabs.org/dispenser/view/Checklinks>, I get a warning about "Leaving Wikimedia" and I'm required to press a "Proceed" button. Do you know anything about this? --MZMcBride (talk) 15:40, 29 May 2016 (UTC)