Grants talk:IEG/Citation data acquisition framework

Eligibility confirmed, round 2 2014


This Individual Engagement Grant proposal is under review!

We've confirmed your proposal is eligible for round 2 2014 review. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period.

The committee's formal review for round 2 2014 begins on 21 October 2014, and grants will be announced in December. See the schedule for more details.

Questions? Contact us.

Jtud (WMF) (talk) 17:14, 7 October 2014 (UTC)

Other languages


Please state clearly whether the amount also includes the rollout of this tool in other languages. --Ilario (talk) 22:36, 14 October 2014 (UTC)

It did include rollout in other languages. However, I have reduced the scope and funding request to bring it in line with something more likely to be funded. My expectation would be to propose additional work in a future IEG. Makyen (talk) 11:51, 20 October 2014 (UTC)

Make an example


Could you give an example (scenario) of how this tool would improve the daily writing of a Wikipedian? I have read the proposal but did not understand it clearly, and I am a technician; my colleagues will probably have even more trouble going into the details. --Ilario (talk) 22:38, 14 October 2014 (UTC)

I feel the same way. I came to this talk page looking for such an example. Blue Rasberry (talk) 18:10, 31 October 2014 (UTC)
@Ilario and Bluerasberry: The primary thing this framework enables is for Wikipedians without specific knowledge of JavaScript, or access to Citoid, to contribute to developing new descriptions of how citation information is obtained from webpages, or to maintaining existing ones.
The current direction is that the methodology for obtaining citation information is being consolidated in Citoid. The expectation is that Citoid will be used across all Wikipedia projects as an adjunct of VisualEditor and other tools which automatically generate citation information. While it is good to have automated citation generation available across all projects, it currently limits those who can contribute to the subset who can program in JavaScript. It also moves maintenance of the content off Wikipedia to GitHub, where access is controlled by the people authorized to actually update the project. Anyone can request to contribute, but ultimately the project is controlled by a very small number of people.
This framework provides a method for all Wikipedians to contribute to this development and maintenance without the need to know JavaScript. While knowledge of some programming concepts is desirable, the intent is to have a GUI which can be used to develop page-scraping descriptions.
How it would improve the daily writing of a Wikipedian:
Say you are using a particular website as a reference and want to generate multiple citations to different pages of the site. In general, you will want to do this using automated tools. Currently, if the tool you are using to generate citations does not recognize the particular website you want to cite, you are largely out of luck. You have to ask the developer to add that site to the tool, and hope that they get around to it at some point in the future. With Citoid (not this proposal), if you know JavaScript, you can attempt to contribute, but your contribution has to be approved by those who control the project.
With this framework, you can contribute directly, without the need to know JavaScript. If you need a new website to be supported, you can add support for it simply by editing a wiki page. There will also be a GUI available to help design how the page scraping is performed. — Makyen (talk) 20:51, 31 October 2014 (UTC)
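To make the idea above concrete, here is a minimal sketch of what an editor-maintained page-scraping description might look like and how a generic scraper could apply it. Everything here is invented for illustration (the field names, meta-tag names, and storage format are assumptions, not part of the proposal's actual design):

```python
# Illustrative sketch only: a declarative "scraping description" that an
# editor could maintain on an on-wiki page, applied by a generic scraper.
from html.parser import HTMLParser

# Hypothetical description: citation fields mapped to <meta> tag names.
# An editor could change this without knowing JavaScript.
EXAMPLE_DESCRIPTION = {
    "title": "citation_title",
    "author": "citation_author",
    "date": "citation_publication_date",
}

class MetaTagScraper(HTMLParser):
    """Collects <meta name=... content=...> pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if "name" in a and "content" in a:
                self.meta[a["name"]] = a["content"]

def scrape_citation(html, description):
    """Apply a field-to-meta-name description to a page, returning citation data."""
    parser = MetaTagScraper()
    parser.feed(html)
    return {field: parser.meta[meta_name]
            for field, meta_name in description.items()
            if meta_name in parser.meta}

page = """<html><head>
<meta name="citation_title" content="Example Article">
<meta name="citation_author" content="A. Author">
</head><body></body></html>"""

print(scrape_citation(page, EXAMPLE_DESCRIPTION))
# {'title': 'Example Article', 'author': 'A. Author'}
```

The point of the sketch is that the scraper code is written once, while the per-site description is plain data that any editor could maintain on-wiki.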
Makyen Can you simplify further? Give me an example of a tool people use to generate citations, and an instance when that tool does not work. I use tools to make citations from DOIs and PMIDs. Are these the kinds of tools you are talking about? I feel like I am so ignorant of this topic that I do not know where to begin asking questions. Blue Rasberry (talk) 22:16, 31 October 2014 (UTC)



This appears to be redundant with the Citoid service, which integrates Zotero's scraping plug-ins ("translators"). Zotero translators already have an active community of maintainers and a testing suite behind them. It's not clear that an on-wiki approach is superior to just having a well-curated set of scrapers and tests in a GitHub repository that's maintained in collaboration with other stakeholders in the citations community.

You can test Citoid here: --Erik Moeller (WMF) (talk) 20:57, 15 October 2014 (UTC)

@Erik Moeller (WMF): I don't view it as redundant to Citoid. Citoid keeps control of the algorithms in the hands of a very limited pool of developers rather than the wider Wikipedia community. In addition, Citoid's algorithms are written in JavaScript. This proposal puts control of the algorithms, and of how the data is presented within citations, in the hands of the Wikipedia community in general.
To a significant extent, it comes down to a question of: should how the data is obtained for citations, and how it is presented in citations, be controlled on-wiki (available to all Wikipedians), or be provided as a centralized service (input limited to a select few people)?
My point of view is that editing of how the data is obtained and presented should be open to a larger group of possible editors than is the case with Citoid.
Obviously, even with this proposal, the set of people who will actually edit such algorithms will be self-limited to a much smaller group than all people who make citations. However, the algorithms will be available on-wiki where editors can modify or add to them should they choose. As such, they will be available to be customized on a per Wikipedia project/language basis, if that is needed or desired.
In actuality, there is nothing that makes Citoid and this proposal inherently incompatible. It would be possible to have a Citoid "translator" which performed page scraping based on such algorithms, and/or an algorithm which accessed the Citoid service. Perhaps the appropriate division is that Citoid could be used to scrape pages, and these definitions could be used to define how that data is used within citations. That is certainly something which could be explored. Makyen (talk) 12:36, 20 October 2014 (UTC); add reply to 15:04, 20 October 2014 (UTC)
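The division of labor suggested above (Citoid scrapes, on-wiki definitions control presentation) could be sketched as follows. This is purely hypothetical; the mapping format, field names, and template parameters are invented here, and the real framework could store the mapping in any on-wiki form:

```python
# Hypothetical sketch: an editor-maintained, per-wiki definition controlling
# how already-scraped citation data is rendered into a citation template.
def render_cite_web(data, param_map):
    """Map scraped citation fields to template parameters according to an
    editor-maintained mapping, and emit {{cite web}} wikitext."""
    params = ["%s=%s" % (tpl_param, data[field])
              for field, tpl_param in param_map.items() if field in data]
    return "{{cite web |" + " |".join(params) + "}}"

# A mapping an editor could adjust per wiki, e.g. for a Wikipedia whose
# citation templates use different parameter names.
PARAM_MAP = {"title": "title", "author": "author", "url": "url"}

data = {"title": "Example Article", "author": "A. Author",
        "url": "http://example.com/a"}
print(render_cite_web(data, PARAM_MAP))
# {{cite web |title=Example Article |author=A. Author |url=http://example.com/a}}
```

Under this split, the scraping service could stay centralized while each wiki's community retains control over how the data appears in its citations.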
@Makyen: I'm a bit confused here – citoid, just like all other parts of our infrastructure, is open source and contributions are welcome from everyone. The "JavaScript" comment seems a bit misleading – that code runs on the server, after all. There's nothing stopping people from writing tools that use citoid in ASM, C++ or whatever (such as XUL scripts for Firefox plugins). Also, I think you're missing that code written as a bespoke solution for each wiki means that the vast majority of our wikis will never get that functionality; writing code shared by all significantly improves the experience on the ~800 of our 850 wikis that aren't so lucky as to have extremely active local developers like you. Jdforrester (WMF) (talk) 18:10, 20 October 2014 (UTC)
@Jdforrester (WMF):, I am not saying that there is a specific effort being made to exclude editors from making changes or additions to Citoid. I am saying that the number of people who want something different from, or additional to, the automatic citation generation, and can program in JavaScript, and know of the existence of Citoid, and are willing and able to use the repository, is much, much smaller than the number of people who would be willing to make changes or additions to a description held in an on-wiki page, or available from a GUI. As such, there is a significant inherent limitation in having the page scraping in JavaScript off-wiki. An example of wanting something different or additional from the automatic citation generation would be finding that a website which used to work has changed sufficiently that it no longer works, or wanting citations to be automatically generated for a site which is not currently supported.
It is very important that a solution be available to all of the wikis. A very simple solution is to have the algorithms default to what is available on the English Wikipedia, or whichever base is appropriate for the particular wiki in question. Doing this would allow the definitions to be changed on a per-wiki basis, if desired. This is arguably superior to a solution which provides a single definition that is not customizable per wiki. — Makyen (talk) 00:18, 31 October 2014 (UTC)
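The per-wiki fallback described above amounts to a simple lookup rule: use a local definition when the wiki has customized one, otherwise fall back to the shared default. A sketch, with all names and data invented for illustration:

```python
# Hypothetical sketch of per-wiki fallback: a shared default set of
# scraping definitions, overridable by each wiki's local copies.
DEFAULT_DESCRIPTIONS = {
    "example.com": {"title": "citation_title"},
}

def lookup_description(site, local_overrides):
    """Return the local wiki's scraping description for a site if it has
    customized one, falling back to the shared default otherwise."""
    return local_overrides.get(site, DEFAULT_DESCRIPTIONS.get(site))

# One wiki customizes example.com; another simply uses the default.
de_overrides = {"example.com": {"title": "og:title"}}
print(lookup_description("example.com", de_overrides))
# {'title': 'og:title'}
print(lookup_description("example.com", {}))
# {'title': 'citation_title'}
```

Every wiki gets working defaults with no local effort, while any wiki that wants different behavior can override a single definition.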
@Jdforrester (WMF):, I would also say that the way GitHub is architected is not entirely compatible with the philosophy behind Wikipedia. On GitHub, a project is ultimately controlled by one person, or a small number of people. Anyone can attempt to contribute by making a pull request. However, unlike on Wikipedia, that request has to be approved by the people who control the project. While this could be seen as analogous to making an edit request, the people actually in control are not those who have been granted administrator or other heightened user rights through the process used to grant those rights on any particular Wikipedia. Thus, it moves control of this information out of the hands of Wikipedians in general and into the hands of those who have not gone through whatever process is normal for the Wikipedia in which it is being used.
I am not saying that any abuse is happening, merely that using Citoid, as it currently exists, moves the acquisition of information which is critical to Wikipedia outside the control of the entire community.
This proposal begins to put control of that information back into the hands of the Wikipedia community at large, which is where Wikipedia's established policies and philosophy intend for it to be. — Makyen (talk) 21:07, 31 October 2014 (UTC)



Also, for mw:GSoC we don't accept any application which doesn't come with a completed microtask; similarly, I recommend that, first of all, the proposer fix a bug in citoid. --Nemo 09:34, 17 October 2014 (UTC)

@Nemo bis: I do not have a problem with the completion of a microtask being a prerequisite for funding. I may just go ahead and do so.
However, I do have a problem with the way you have presented the recommendation(?) of this requirement. If there is going to be such a requirement applied to IEG proposals which have programming components, then that requirement should be stated up-front, in the Guidelines, project selection criteria, or some other place.
You are effectively stating: "Prove you have the ability to act on the proposal you have made." In this instance, the way you have worded it here feels more like: "I don't believe you have the ability to do what you propose. Before we make any consideration of your proposal you must prove to us that you can do what you propose by performing this other work for us." At a minimum, different wording would have been appreciated.
On the other hand, I absolutely agree that the ability of the proposer to complete the work proposed is a very reasonable, and necessary, consideration in evaluating all proposals. In fact, such consideration is explicitly mentioned in the project selection criteria. However, if there are going to be explicit requirements for the method of demonstrating that ability, they should be stated up front, not at the back end. If there are going to be such explicit requirements for proposals with programming components, then thought should be put into determining what, if any, explicit requirements should apply across all types of proposals, not just those which are programming-based. There is an argument that technical tasks make it more difficult for a non-technical person to evaluate whether the proposer has the ability to perform the work required. This makes it desirable to have some method of demonstrating such ability.
I can certainly understand such an explicit requirement for applications for mw:GSoC. In such cases you are drawing on a pool of people who are, potentially, completely unknown within the Wikipedia community. Further, such individuals are, generally, at the beginning of their professional lives, usually with little or no professional track record. In addition, some demonstration of ability is not an abnormal requirement for a job interview, particularly in technical fields (e.g. programming). Expanding that to IEG is not unreasonable. On the other hand, such explicit demonstrations of ability are not nearly as common for contracting work, which IEGs also resemble.
I am sitting here trying to evaluate why the way you presented this stuck in my craw. I think it is that, by making this proposal, I feel I have put my reputation on Wikipedia on the line, stating that I can do what I have proposed. While I can understand that for some people their reputation here may not matter, my reputation here does matter to me. Your statement felt like you were saying: "I don't trust you. I think you are lying that you can do what you propose. I am assuming bad faith on your part. Before we make any consideration of your proposal, you must prove that you have the ability to produce what you propose by doing this other work for us (not work of your own choosing)."
As I said, I don't have a real objection to performing a "microtask", although I would prefer to have more say in exactly what that task is than you recommend should be available to me. Makyen (talk) 16:42, 20 October 2014 (UTC)
I'm sorry you took it this way. I thought my link was enough for you to find context, but apparently it wasn't; mw:Mentorship programs/Lessons learned explains something more. The microtask is not a way to challenge your reputation, it's something useful for you to learn about existing code etc.
As for suggesting microtasks, I'm happy to help our mentees because they enroll in programs specifically intended to make them learn, but here you're offering some work (for twice the money Google gives to students, FWIW) so I feel it should be your job to find one. Anyway, I asked about the issue tracker; I can't do more than that, but I'm sure that if you ask them they'll be glad to recommend you a suitable bug to fix. --Nemo 09:01, 29 October 2014 (UTC)

Aggregated feedback from the committee for Citation data acquisition framework

Scoring criteria (see the rubric for background) Score
(1 = weak alignment, 10 = strong alignment)
(A) Impact potential
  • Does it fit with Wikimedia's strategic priorities?
  • Does it have potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Innovation and learning
  • Does it take an innovative approach to solving a key problem?
  • Is the potential impact greater than the risks?
  • Can we measure success?
(C) Ability to execute
  • Can the scope be accomplished in 6 months?
  • How realistic/efficient is the budget?
  • Do the participants have the necessary skills/experience?
(D) Community engagement
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
  • Does it support diversity?
Comments from the committee:
  • In principle, improving citation tools is a high-impact project that aligns well with Wikimedia strategic priorities. Love the idea. However, there appears to have been little research into existing tools, and a strong case for why a new tool is necessary has not been made. Does not appear to be solving a key problem or prevalent issue.
  • Overlap with the existing Citoid project is a concern – if proposal could be transformed to complement Citoid, that would be preferable.
  • Purchase of testing equipment is not an ideal use of grant funds.
  • No notifications, no endorsements, not much activity on the talk page. No indication that the target community will actually use this tool.
  • The criteria for success are poorly defined. It is unclear whether a successful outcome would result in a tool that could actually be used on Wikipedia.
  • Trust the proposer's technical skills, but would like to see more details about execution.

Thank you for submitting this proposal. The committee is now deliberating based on these scoring results, and WMF is proceeding with its due-diligence. You are welcome to continue making updates to your proposal pages during this period. Funding decisions will be announced by early December. — ΛΧΣ21 17:06, 13 November 2014 (UTC)

Round 2 2014 Decision


This project has not been selected for an Individual Engagement Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding, but we hope you'll continue to engage in the program. Please drop by the IdeaLab to share and refine future ideas!

Comments regarding this decision:
We encourage you to look into how you can get involved with Citoid, and build more discussion with potential users of your proposed tools.

Next steps:

  1. Review the feedback provided on your proposal and ask for any clarifications you need using this talk page.
  2. Visit the IdeaLab to continue developing this idea and share any new ideas you may have.
  3. To reapply with this project in the future, please make updates based on the feedback provided in this round before resubmitting it for review in a new round.
  4. Check the schedule for the next open call to submit proposals - we look forward to helping you apply for a grant in a future round.
Questions? Contact us.