Grants:Project/Diegodlh/Web2Cit: Visual Editor for Citoid Web Translators


status: selected
summary: Tool to help non-technical users create and edit Citoid web translators to increase website coverage of this citation metadata retrieval service.
target: Wikipedia (especially non-English)
amount: USD 55,425
grantee: Diegodlh, Scann
contact: delahera(_AT_)gmail.com
created on: 22:35, 15 March 2021 (UTC)


Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

 
WikiCite 2017 - Citoid performance for news article citations - Research by Fuzheado & Gamaliel showing only 32% of the 90 most popular news sites cited in English Wikipedia could be successfully extracted using Citoid/Zotero, for the four basic reference fields: headline, date, author, and publication name.

The Citoid extension in Wikipedia's visual editor uses the Citoid API to resolve a URL, DOI, QID, etc, into a citation template. To do so, the Citoid service relies (in part) on Zotero web translators to get citation metadata from a website.

Websites which embed metadata appropriately are understood by generic translators. However, this is often not the case, and site-specific translators are needed, most of which rely on web scraping techniques. For instance, an evaluation by Fuzheado and Gamaliel done in 2017 found that even popular websites in English Wikipedia weren't displaying metadata properly.

Most of these site-specific translators seem to be for English sources (see here, here, or here). Contributions to Zotero's translators repository are open, but they require programming skills.

Zotero developers have always shown willingness to help with new translator requests, but demand may be too high (there are currently ~40 open issues with the "New translator" tag), and translators sometimes break. Although they will hire a dedicated person to work on translators starting in May, which may shorten review times, some translators pose cultural and language challenges. For example, one of Zotero's developers recently created a translator for a mainstream Argentinean newspaper, following a request from a non-technical user in their forums. Despite the developer's good will, it was built on the wrong cultural assumption that most last names in Argentina have two parts (in addition, the translator already seems to have stopped working).

Lack of Zotero web translator coverage forces editors to fall back on manually transcribing citation metadata. For the majority of editors using the visual editor, this is a cumbersome process that may deter them from adding references to their contributions, bias references toward sites that expose metadata appropriately, or leave broken citations.

What is your solution to this problem?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.

Based on a comment by User:Strainu, the idea is to develop Web2Cit: a visual translator editor that would enable non-technical users to collaboratively create and edit web translators, and to define test cases.

Web2Cit would have an API that the Citoid service could use as an additional source (i.e., in addition to official Zotero web translators, Crossref, Worldcat) to resolve URLs provided by Wikipedia editors using community-created translators.

Workflow would be:

  1. A user enters a source URL into the Citoid Extension of Wikipedia's Visual Editor.
  2. The URL can't be resolved, or the user is unhappy with the results (i.e., retrieved metadata has errors).
  3. A separate "community generated" section is shown, with citations formatted after results returned by Web2Cit API, using community translators. "Edit" and up/down-vote buttons may be available next to each of these citations (only for logged-in users).
  4. The user can choose one of these community-generated citations, or open Web2Cit to create a new translator or edit existing ones.

Web2Cit would also act as a web proxy server, adding structured metadata to websites using one of the community translators. This way, the proxied web site will be available for translation with official generic translators by any service relying on them; including the Citoid service (until they add Web2Cit as an additional source), Zotero's browser connectors, Zotero's ZBib, etc.

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

  1. By mid-2022, up-to-date figures of the current Citoid coverage gap (i.e., Wikipedia sources not understood by Citoid) will be available.
  2. By mid-2022, there will be an open source tool, Web2Cit, that enables Wikimedia, Wikipedia, Zotero and other communities to easily, non-programmatically, create and edit web translators, to collaboratively increase website compatibility with the citation metadata retrieval service, Citoid.
  3. By mid-2022, the tool will be known to and understood by as many Wikimedian communities as possible, across different languages.

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.

Goal 1: up-to-date Citoid gap figures

Outputs (what we will do from July 2021 through June 2022):

  • We will conduct research on different language Wikipedias to understand what the current Citoid coverage gap is; that is, which Wikipedia sources are not understood by Citoid

Outcomes (continued positive impact):

  • This research will provide a series of sources that the community can create site-specific web translators for using Web2Cit.
  • In addition, these up-to-date figures will provide a baseline value to compare against in the future, after Web2Cit has been available for some time.

Goal 2: Web2Cit development

Outputs (what we will do from July 2021 through June 2022):

  • We will develop Web2Cit front-end, API and web proxy, and make the source code available under free (libre) software licenses.
  • We will propose Citoid service, API and extension enhancements to more seamlessly support Web2Cit (optional; see Citoid enhancements below).

Outcomes (continued positive impact):

  • By enabling non-technical users to create and edit web translators, Web2Cit will help increase the coverage of websites supported by Citoid, hence encouraging the insertion of a higher diversity of references to Wikipedia articles. This would especially benefit non-English Wikipedias, since most site-specific translators currently available are for English sources.
  • In addition, most popular community translators may be identified and submitted to Zotero's translators repository, hence more widely benefiting all services relying on Zotero translators as well.
  • The release of the source code under free (libre) software licenses will enable continued improvements by the Wikimedia developer community.

Goal 3: spreading the word

Outputs (what we will do from July 2021 through June 2022):

  • We will continuously communicate with the communities, to provide updates about the status of development, and to get their feedback.
  • We will create written and video documentation and training materials.
  • We will set up mechanisms to engage the community in translating the tool to other languages.
  • We will organize a set of public workshops to present and explain the tool.

Outcomes (continued positive impact):

  • Engaging different language Wikipedia communities and providing documentation, training materials, translation tools and workshops, will help ensure wide and continued adoption of Web2Cit.

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.

As part of our efforts to promote the tool, we will organize a series** of workshops in English and Spanish. In these workshops we expect to have:

Three shared metrics:

  • Total participants: 10-20 for each workshop, for a total of 30-60 people engaged through the workshops
  • Newly registered users: N/A
  • Content pages improved: 20

Other metrics:

  • User-contributed translators: 15
  • Citations added using these community translators: 20
  • Languages in which the tool will be translated: 5 new languages aside from English and Spanish

** Note on the "series of workshops": initially we plan to organize 3 workshops, but if we find that there is a lot of community interest in receiving the training, we are open to giving more workshops as requested.

Project plan

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

Activities will run from July 2021 through June 2022, and are divided into three main areas. They are summarized here and described in further detail in the subsections that follow.

  • Research: the current Citoid coverage gap (i.e., sources used in Wikipedia not supported by Citoid) will be identified. This will provide a baseline value of coverage against which to compare in the future, and target sources to address with community translators using Web2Cit.
  • Software development: development of Web2Cit browser extension, API and web proxy, and Citoid enhancements if applicable. Note Citoid enhancements are optional and are not required for the success of the project (see below).
  • Dissemination: in order to make sure that the tool reaches its expected audience, we will be involving the community during the development of the tool and after the tool is developed so they can provide feedback and widely disseminate the tool in different language communities. In order to carry out this activity, we will create documentation, set up mechanisms for translation, and run workshops with the community.

Research

We expect most community translators will be created after the grant has ended and the service has been up and running for some time. However, we would like to provide a baseline value about the gap in Citoid coverage to compare against later on.

To do this, the source code of a sample of Wikipedia articles from different language Wikipedias will be scanned and Citation templates used will be extracted. Then, the URL of these citation templates will be fed to the Citoid API, and the response will be compared with the metadata in the template. If data do not match, it will be assumed that Citoid is not handling the URL appropriately. Finally, the proportion of websites correctly handled by Citoid/Zotero translators will be calculated. This coverage gap is expected to be reduced with the help of the tool proposed here.
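The comparison step described above can be sketched as follows. This is only an illustration under stated assumptions: the field names in `COMPARED_FIELDS`, the normalization rule, and the `fetch` callable (which stands in for a real request to the Citoid API) are all hypothetical and not part of the proposal.

```python
from typing import Callable, Dict, Iterable

# Hypothetical set of fields compared between a citation template and Citoid's reply.
COMPARED_FIELDS = ("title", "date", "author", "publication")


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def template_matches(template: Dict[str, str], citoid: Dict[str, str]) -> bool:
    """True if every compared field present in the template agrees with Citoid."""
    for field in COMPARED_FIELDS:
        if field in template:
            if normalize(template[field]) != normalize(citoid.get(field, "")):
                return False
    return True


def coverage(citations: Iterable[Dict[str, str]],
             fetch: Callable[[str], Dict[str, str]]) -> float:
    """Proportion of cited URLs whose Citoid metadata matches the template."""
    results = [template_matches(c, fetch(c["url"])) for c in citations]
    return sum(results) / len(results) if results else 0.0


# Usage with a fake in-memory "Citoid" instead of real network calls.
fake_citoid = {
    "http://a.example/1": {"title": "Hello World", "date": "2021-03-15"},
    "http://b.example/2": {"title": "Wrong"},
}
sample = [
    {"url": "http://a.example/1", "title": "hello  world", "date": "2021-03-15"},
    {"url": "http://b.example/2", "title": "Right title"},
]
ratio = coverage(sample, lambda url: fake_citoid.get(url, {}))
```

In this toy sample one of the two citations matches, so the estimated coverage is one half. A real study would also need to decide how strictly fields such as author names and dates are compared.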

We understand this approach has some limitations. The Citoid translator may have rendered wrong metadata when the citation was inserted. We assume the editor would have corrected this wrong metadata. However, this may not have been the case. Therefore, if the translator has been fixed since then, metadata won't match and we will be overestimating the number of mishandled URLs. On the other hand, if the translator has not been fixed yet, metadata will match, and we will be underestimating this number.

In addition, other resources will be considered as well, such as Citoid logs for domains on which they get Zotero "misses", Zotero translators repository issues with "New translator" tag, previous research by User:Fuzheado, etc. All in all, this research will help provide a list of sources to work with in the Workshops we plan to organize (see below).

Development

Development comprises six areas: Planning, Web2Cit frontend, Web2Cit backend, Web2Cit API, Web2Cit web proxy, and Citoid enhancements.

Development will borrow from visual scraping tools such as the open-source Portia, and from User:V111P's WebRef, a tool with similar goals from the pre-Citoid era, but which lacks collaborative features and integration with Citoid.

All software developed will be released under the GPL version 3 license, or another FLOSS license compatible with Wikimedia software policies.

Planning

If the grant is approved, in this initial stage we will more thoroughly consider the plan outlined below, including refinement of the system modelling and data architecture.

Web2Cit frontend

The Web2Cit frontend is the part facing the user.

It will comprise a sidebar injected onto the website to be translated, using a bookmarklet or a browser extension (in a way similar to how the Hypothesis web annotation bookmarklet/extension works). This sidebar will probably be developed using React.

To make sure users can switch URLs to better refine their translators and test cases, the browser's local storage will be used to keep draft translator definitions until the user is ready to save them.

The frontend can be further subdivided into two sections: the dashboard and the editor.

Dashboard
 
Sketch showing Web2Cit sidebar Dashboard. It depicts a web browser with URL http://thenewspaper/ loaded. In the sidebar's dashboard we can see user Diegodlh is logged in. The formatted citation produced with results from the Zotero translation server (Citoid API) is shown at the top. Below, two formatted citations, each produced with results from a community translator matching against the current URL, are shown. Translator author name, test results, user votes and edit/fork buttons are provided for each community translator.

The side panel would show a list of formatted citations.

The top part would show the citation formatted after the metadata extracted from the site using the Zotero official translators. These metadata will be retrieved from the Citoid API. (*)

Below, a list of citations will be shown, formatted after the metadata extracted using the community-contributed translators. These metadata will be retrieved using Web2Cit API.

Citations will be formatted in a way similar to how they are shown by the Citoid Extension, and in case of community-translated citations sorted according to translator popularity (i.e., number of votes received).

Next to each citation generated using one of the community translators, there will be:

  • the username who contributed the translator used;
  • an Edit button to edit the translator (including adding test cases); in principle, only the original author would be able to edit a translator, whereas other users would be able to fork it;
  • a button to get a proxied version of the URL (see Web2Cit web proxy below);
  • the result of the test cases defined in the translator (passed/failed);
  • up/down-vote buttons to rank the translator (only logged-in users), and vote count;
  • a button to export the translator used in Zotero's JS format (useful to submit them to Zotero repo).

Finally, below the list of community-generated citations, there will be a button to create a new translator.

(*) Alternatively, using a full Zotero translation framework (as done by Zotero browser connectors) will be considered, to know the UUID of the translator used (since this information is not returned by the Zotero translation server, hence nor by the Citoid API). This information may be useful to create community translators that partially rely on metadata extracted by official translators (see Editor section below).

Editor
 
Sketch showing Web2Cit sidebar editor. It shows the "Selection" step of the editing process. In this step the user clicks on elements of the website that contain relevant metadata to select them. All steps of the editing process are shown in tabs below: URL matching, Selection, Post-Processing, Mapping, and Saving.

Editing a new or existing translator will involve six steps/sections: URL matching pattern, selection, post-processing, mapping, testing, and saving.

Given that, in principle, community translators will be saved as Wikipedia User Scripts (see Saving subsection below), only users logged in to Wikipedia will be able to edit translators. Users will be asked to log in to Wikipedia using OAuth before being able to enter the Editor section of the frontend.

In principle, manually editing raw Zotero translator JavaScript code will not be available. If a user has sufficient knowledge to edit a JS translator, they can use Web2Cit to create a first draft, export it as a JS translator, make the necessary changes, and post it to Zotero's translators repository directly.

URL matching pattern

In this part, a URL matching pattern to which the translator should apply will be suggested to the user. The user will be able to modify this using wildcards, etc.
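As a minimal sketch of how wildcard-based URL matching might work, the snippet below assumes that `*` is the only wildcard and that it matches any run of characters. The exact pattern syntax is not fixed by this proposal, and the function name and example URLs are hypothetical.

```python
import re


def url_matches(pattern: str, url: str) -> bool:
    """Return True if the whole URL matches the pattern; '*' matches any text."""
    # Escape every regex metacharacter, then turn the escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, url) is not None


news_pattern = "https://www.example.com/news/*"
in_section = url_matches(news_pattern, "https://www.example.com/news/2021/story.html")
other_section = url_matches(news_pattern, "https://www.example.com/sports/story.html")
# A literal '?' in the pattern is not treated as a wildcard in this sketch.
with_query = url_matches("https://example.com/a?id=*", "https://example.com/a?id=42")
```

Escaping the pattern first keeps characters that are common in URLs, such as `.` and `?`, from being interpreted as regular expression syntax.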

Selection

In this part, the user will point and click on different elements of the website where the relevant metadata is. This will return the corresponding CSS or XPath selector. This will be partly based on tools such as open source visual scraper Portia, CSS Selector, DOM Element Picker, Selector gadget, etc.

In addition to the visible HTML nodes, in this step the user will also be able to select:

  • the url
  • HTML meta tags
  • JSON-LD fields (not yet supported by Zotero's generic translator)
  • fields resulting from Zotero official translators (including generic and site-specific translators). For this, the UUID of the translator used will be needed, to save it to the translator definition at the end.
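For the non-visible sources listed above, the following sketch shows one way HTML meta tags and JSON-LD blocks could be collected from a page. It uses only Python's standard `html.parser`; the class name and the sample page are hypothetical, and point-and-click selection of visible nodes via CSS or XPath selectors is not covered here.

```python
import json
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Collects <meta> name/property pairs and parsed JSON-LD script blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Ignore malformed JSON-LD in this sketch.


page = """<html><head>
<meta property="og:title" content="Example headline">
<meta name="author" content="Jane Doe">
<script type="application/ld+json">{"@type": "NewsArticle", "datePublished": "2021-03-15"}</script>
</head><body><h1 class="headline">Example headline</h1></body></html>"""

collector = MetadataCollector()
collector.feed(page)
```

After feeding the page, `collector.meta` holds the meta tag values keyed by name or property, and `collector.json_ld` holds each JSON-LD block as a parsed object whose fields could then be offered to the user for selection.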

Post-processing

Items selected in the previous step will be post-processed here. This includes basic operations such as trimming, splitting into separate items, merging two or more items into one, regex extraction, and advanced custom JS operations.
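The basic operations named above could look roughly like this. The function names, signatures, and sample strings are assumptions made for illustration only; the custom JS operations are not shown.

```python
import re
from typing import List, Optional


def trim(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()


def split(value: str, separator: str) -> List[str]:
    """Split one item into several, dropping empty pieces."""
    return [part.strip() for part in value.split(separator) if part.strip()]


def merge(values: List[str], joiner: str = " ") -> str:
    """Merge two or more items into one."""
    return joiner.join(values)


def extract(value: str, pattern: str) -> Optional[str]:
    """Return the first capture group (or the whole match) of a regex, if any."""
    match = re.search(pattern, value)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


byline = "  By Ana Pérez and Juan García  "
authors = split(extract(trim(byline), r"^By (.+)$"), " and ")
date = extract("Published: 2021-03-15 22:35", r"\d{4}-\d{2}-\d{2}")
title = merge(["Web2Cit", "Visual Editor"], ": ")
```

Chaining small operations like these lets a byline be reduced to a list of author names without the user writing any code, which is the kind of step the visual editor would expose.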

Mapping

In this step, users will map the items selected and post-processed above into the predefined fields that Zotero translators should output, including item type, title, authors, etc.

In addition to using selected and post-processed items, users may define hard-coded strings as well. For example, the default language of the sources for a given URL matching pattern.
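A mapping of this kind might be represented as a small table from output field to either a post-processed item or a hard-coded value. The rule format below (`"item"` versus `"value"` keys) is a hypothetical choice for this sketch; the output field names follow common Zotero item fields but are shown only as examples.

```python
from typing import Any, Dict


def apply_mapping(mapping: Dict[str, Dict[str, Any]],
                  items: Dict[str, Any]) -> Dict[str, Any]:
    """Build the translator output from selected items and hard-coded values."""
    output = {}
    for field, rule in mapping.items():
        if "value" in rule:
            # Hard-coded string, e.g. a default language for the URL pattern.
            output[field] = rule["value"]
        else:
            # Reference to an item produced by selection and post-processing.
            output[field] = items[rule["item"]]
    return output


items = {"headline": "Example headline", "published": "2021-03-15"}
mapping = {
    "itemType": {"value": "newspaperArticle"},
    "title": {"item": "headline"},
    "date": {"item": "published"},
    "language": {"value": "es"},
}
citation = apply_mapping(mapping, items)
```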

Testing

Finally, here users will be able to define test cases. That is, given a URL, they will specify the values expected for each output field. These tests will be run regularly by the backend, and the results will be (1) shown in the frontend dashboard (see above) and (2) used by the Web2Cit API to decide whether or not to use a translator (see below).
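To make the idea of a test case concrete, here is a rough sketch of a test runner. The test-case structure (`url` plus `expected` fields) and the `translate` callable are assumptions for illustration; in practice the translator would run against the live page.

```python
from typing import Any, Callable, Dict, List


def run_test_cases(translate: Callable[[str], Dict[str, Any]],
                   test_cases: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run each test case and report which expected fields did not match."""
    results = []
    for case in test_cases:
        actual = translate(case["url"])
        mismatches = {
            field: {"expected": expected, "actual": actual.get(field)}
            for field, expected in case["expected"].items()
            if actual.get(field) != expected
        }
        results.append({
            "url": case["url"],
            "passed": not mismatches,
            "mismatches": mismatches,
        })
    return results


# A fake translator that always returns the same metadata.
def fake_translate(url: str) -> Dict[str, Any]:
    return {"title": "Example headline", "date": "2021-03-15"}


cases = [
    {"url": "https://news.example/1", "expected": {"title": "Example headline"}},
    {"url": "https://news.example/2", "expected": {"date": "2021-03-16"}},
]
report = run_test_cases(fake_translate, cases)
all_passed = all(result["passed"] for result in report)
```

Keeping the mismatching fields in the report, rather than only a pass/fail flag, would let the dashboard show users which part of a translator needs fixing.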

Saving

(The storage strategy outlined here is one possibility. The topic has been opened for comments in the discussion page)

User-created translators and their associated properties (such as test results and popularity) will be saved in two central repositories: a database and a file storage.

When a user creates a new translator and clicks "save", the translation instructions or translator definition (including URL matching pattern, items selected, post-processing, mapping and tests) will be converted by an automated script to a Zotero JS translator, and be given a translator UUID (just like any official translator). Since this conversion is one way only (i.e., the JS translator can't be converted back to the translator definition), a file including both (that is, JS translator with commented out definition in custom XML/JSON format at the top) will be created.
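One possible shape for such a combined file is sketched below, assuming the definition is stored as JSON inside a JavaScript block comment. The marker strings, key names, and the placeholder JS body are hypothetical; the real conversion script and file layout are not specified in this proposal.

```python
import json
import uuid

# Hypothetical markers delimiting the embedded definition.
HEADER_START = "/* WEB2CIT-DEFINITION"
HEADER_END = "WEB2CIT-DEFINITION */"


def build_file(definition: dict, js_code: str) -> str:
    """Prepend the translator definition, as commented-out JSON, to the JS code."""
    body = json.dumps(definition, indent=2, sort_keys=True)
    # A URL pattern such as '.../*/news' contains '*/', which would close the
    # JS comment early. JSON allows '/' to be written as '\/', so escape it.
    body = body.replace("*/", "*\\/")
    return f"{HEADER_START}\n{body}\n{HEADER_END}\n{js_code}"


def read_definition(file_text: str) -> dict:
    """Recover the definition from the comment block at the top of the file."""
    start = file_text.index(HEADER_START) + len(HEADER_START)
    end = file_text.index(HEADER_END)
    return json.loads(file_text[start:end])


definition = {
    "uuid": str(uuid.uuid4()),
    "urlPattern": "https://news.example/*/story/*",
    "mapping": {"title": "headline"},
}
# Placeholder standing in for the generated JS translator code.
js_code = "function detectWeb(doc, url) { return 'newspaperArticle'; }"
file_text = build_file(definition, js_code)
restored = read_definition(file_text)
```

Because the definition travels with the generated code, the editor can reopen a translator from the saved file even though the JS itself cannot be converted back.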

In principle, the idea is to save this file as a Wikipedia User script. This way, Web2Cit would not have to take care of user accounts, storage and tracking of changes. However, by doing this, (1) only users logged in to Wikipedia will be able to create translators, and (2) only translator owners will be able to edit them (other users will have to fork them instead if they wish).

On the other hand, an entry will be created in the Web2Cit database including translator UUID, file location (i.e., path to User Script), and URL matching pattern for quick look-up. This database will also hold votes given to the translator and test results (which will be updated periodically and each time the translator is edited).

Web2Cit backend

The backend will communicate with the community translators database and will process requests sent to the API.

The translators database will hold the following information about the community translators:

  • UUID: translator's unique identifier;
  • URL matching pattern;
  • path: path to the User Script file, containing both translator definition and JS translator code (see Saving section above);
  • checksum: checksum of the last known version of the translator file; this is needed to decide if test results saved are reliable;
  • test result: results of the last translator test;
  • votes: up and down votes issued against the translator.

The backend will regularly go through the translators database, download translator code, and run the tests defined in them. It will update the translator checksum and test results accordingly.
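The record and the role of the checksum can be illustrated as follows. This is a sketch under assumptions: the field names, the use of SHA-256, and the `run_tests` callable are hypothetical choices, not a description of the final schema.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslatorRecord:
    uuid: str
    url_pattern: str
    path: str                      # path to the User Script file
    checksum: str                  # checksum of the last known file version
    tests_passed: Optional[bool]   # result of the last test run, if any
    upvotes: int = 0
    downvotes: int = 0


def checksum_of(file_text: str) -> str:
    """Checksum of a translator file (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(file_text.encode("utf-8")).hexdigest()


def results_are_reliable(record: TranslatorRecord, file_text: str) -> bool:
    """Saved test results only apply if the file has not changed since."""
    return record.tests_passed is not None and record.checksum == checksum_of(file_text)


def refresh(record: TranslatorRecord, file_text: str,
            run_tests: Callable[[str], bool]) -> TranslatorRecord:
    """Periodic job: re-run the tests and store the new checksum and result."""
    record.checksum = checksum_of(file_text)
    record.tests_passed = run_tests(file_text)
    return record


record = TranslatorRecord(
    uuid="example-uuid",
    url_pattern="https://news.example/*",
    path="User:Example/web2cit/example.js",
    checksum=checksum_of("version 1"),
    tests_passed=True,
)
reliable_before_edit = results_are_reliable(record, "version 1")
reliable_after_edit = results_are_reliable(record, "version 2")
refresh(record, "version 2", run_tests=lambda text: False)
```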

In addition, in response to API translation requests (see below), it will search the database for translators matching the URL provided, download the translator code from the path specified (and update the translator checksum if applicable), translate the URL provided with the translators downloaded, and return translation results. The backend may keep a cache of the translators code for better performance.

In response to API edition requests (see below), it will update translator information.

Web2Cit API

The API will be the interface to the Web2Cit backend. It will be used by the Web2Cit frontend and web proxy, and it may be used by the Citoid service as a source, in addition to already-supported Zotero translation server, Crossref, Worldcat.

In principle, this API will have two POST endpoints: translate and edit.

Translation endpoint

This endpoint will get a URL and return translation results using community translators matching the URL provided. It will reply in the same format as the Zotero translation server, with multiple results (one per applicable community translator). For each translation it will also return information about the translator used, such as UUID, author name and vote count.

This endpoint will accept an additional parameter to include or ignore translators which are listed in the database as having failed the test cases. The Citoid service will set this parameter to "ignore" (as we won't want failing translator results to be shown by the Citoid extension).
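The selection logic behind this endpoint might be sketched as below. The parameter name `failing`, the record keys, and sorting by a single vote count are assumptions for illustration; only the include/ignore behaviour and the ordering by popularity come from the text above.

```python
import re
from typing import Dict, List


def pattern_matches(pattern: str, url: str) -> bool:
    """'*' in the pattern matches any run of characters (assumed syntax)."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, url) is not None


def select_translators(translators: List[Dict], url: str,
                       failing: str = "ignore") -> List[Dict]:
    """Pick translators matching the URL, optionally dropping failing ones."""
    candidates = [t for t in translators if pattern_matches(t["pattern"], url)]
    if failing == "ignore":
        candidates = [t for t in candidates if t["tests_passed"]]
    # Most popular translators first.
    return sorted(candidates, key=lambda t: t["votes"], reverse=True)


translators = [
    {"uuid": "a", "author": "UserA", "pattern": "https://news.example/*",
     "votes": 3, "tests_passed": True},
    {"uuid": "b", "author": "UserB", "pattern": "https://news.example/*",
     "votes": 7, "tests_passed": False},
    {"uuid": "c", "author": "UserC", "pattern": "https://other.example/*",
     "votes": 9, "tests_passed": True},
]
url = "https://news.example/2021/story"
for_citoid = select_translators(translators, url, failing="ignore")
for_dashboard = select_translators(translators, url, failing="include")
```

Here the Citoid-style request only sees translator `a`, while a request that includes failing translators sees `b` and then `a`, which is what the Web2Cit dashboard would need in order to show failed tests to editors.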

Edition endpoint

This endpoint will be used to update the information about a translator in the database. It will be used when a translator is created or edited in the Web2Cit frontend, and when an up- or down-vote is issued against the translator, either from the Web2Cit frontend or from the Citoid extension.

This endpoint will require users to be logged in to Wikipedia, so it will expect the necessary OAuth token parameters.

Web2Cit web proxy

The web proxy takes a community translator UUID and a URL (e.g., https://web2cit.toolforge.org/<translator-uuidid>/<url>) and returns a proxied website with structured metadata embedded. Hypothesis' Via web proxy will be taken as an example.

As a result, the proxied website can be translated by a generic translator. In addition, the proxied webpage will have its canonical URL set to the original URL, which the generic translator will use as the source URL.
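A simplified sketch of the two proxy steps, parsing the request path and embedding metadata, is shown below. The `citation_*` meta tag name, the naive string replacement on `<head>`, and the function names are assumptions for illustration; a real proxy would need a proper HTML rewriter and would also have to rewrite relative links.

```python
from html import escape
from typing import Dict, Tuple


def parse_proxy_path(path: str) -> Tuple[str, str]:
    """Split '/<translator-uuid>/<url>' into translator UUID and target URL."""
    translator_uuid, _, target_url = path.lstrip("/").partition("/")
    if not translator_uuid or not target_url:
        raise ValueError("expected /<translator-uuid>/<url>")
    return translator_uuid, target_url


def embed_metadata(html: str, target_url: str, metadata: Dict[str, str]) -> str:
    """Insert a canonical link and meta tags right after the opening <head>."""
    tags = [f'<link rel="canonical" href="{escape(target_url, quote=True)}">']
    for name, content in metadata.items():
        tags.append(
            f'<meta name="{escape(name, quote=True)}" '
            f'content="{escape(content, quote=True)}">'
        )
    # Naive: only handles a literal '<head>' tag without attributes.
    return html.replace("<head>", "<head>\n" + "\n".join(tags), 1)


translator_uuid, target_url = parse_proxy_path("/1234-abcd/https://news.example/story")
original = "<html><head><title>Story</title></head><body></body></html>"
proxied = embed_metadata(original, target_url, {"citation_title": "Example headline"})
```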

Citoid enhancements

The Citoid service, API and extension changes proposed below will improve integration of Web2Cit with current workflows, but it is important to underscore that they are not necessary for the success of the project. As explained in the What is your solution section above, alternative workflows will be available until these changes are implemented (if they are).

Alternatively, some changes can be thought of as User Scripts, therefore not requiring in principle that they are included in the official source code.

Citoid service/API

The Citoid service may be updated to use Web2Cit service as a source, in addition to Zotero, Crossref, Worldcat, etc.

Citoid extension
 
Possible implementation of "Community-generated citation" section in the Citoid extension. After the user has entered a URL in the Automatic tab of the Citoid extension, in addition to the Citation generated using Zotero translation server results, a series of Community-generated citations (hidden by default) are shown, generated using Web2Cit API results (one per community translator matching the URL provided). Translator author, edit and up/down-vote buttons, and vote count are shown.
  • Make it understand Web2Cit sources from the Citoid API (see above).
  • Add a "community generated citations" section, with citation metadata obtained using different community translators. As indicated in the What is your solution section, each would have author name, and "Edit" and up/down-vote buttons for logged-in users (*). Also an "Add new" button at the end.

(*) Alternatively, the Citoid extension can report to the Web2Cit API which community-generated citation was finally inserted.

Communications

These activities will ensure the tool is as widely promoted and adopted by different language Wikipedias as possible. They are divided into four areas: documentation, translation, community engagement, and workshops.

In doing these activities, we'll make sure that all the materials:

  • are created both in English and Spanish;
  • are openly available in editable formats that allow for greater re-use;
  • allow for internationalization and localization to languages other than English and Spanish.
Documentation

We'll be creating different types of documentation.

  • a. Documentation on how the tool works for non-technical users, including the following:
    • video tutorials with scripted versions that allow for the videos to be translated into other languages;
    • written documents with proper visual marks that help people navigate the tool.
  • b. Training materials for the workshops (workshops to be described below).
  • c. Guiding and orientation materials for people that want to independently run a workshop to train people on how to use the tool in their own language.

For points a) and b), we plan to design some lightweight discussion sessions where we introduce the tool and understand which questions the potential user base might have around the tool. We will mainly collect feedback from three specific groups of targeted audiences: researchers, librarians and Wikipedia+Wikidata editors. This is a set of pilot outreach sessions that will help us collect feedback and input on the design of the tool and the training materials and the documentation. We'll run 3 sessions of around 10-20 people for this process.

All this information will be centralized in a landing page on Meta to allow for people to monitor any updates on the project. Resources will be uploaded to Wikimedia Commons.

Translation

The tool will be released in English and Spanish, and open to translation to any other language. The tool will be developed in an internationalized way, and it will be translated using translatewiki. We will make sure to deploy different community engagement strategies to recruit volunteers as translators, including reaching out to specific bilingual volunteers, different language communities and projects working with multilingual approaches with an interest in improving reliable sources on the Internet (as an example, RisingVoices could be one such project).

We will provide a short tutorial on using translatewiki if none exists and, if needed, we will hold short sessions to help anyone who has trouble using it.

Community engagement
  • The status of development will be communicated to different language Wikipedias and teams involved.
  • These communities will also be asked to fill in sources that they have been experiencing problems with, as explained below.
    • There are several community-generated lists: e.g., here, where users have been identifying sources that have problems with the Zotero translator. We will aim to systematize more of this information and identify more sources that could potentially be translated afterwards. The goal is to produce a table that identifies the following:
      • Conflicting URL;
      • Where the URL is coming from;
      • Language of the website.
    • This table will also be fed with some of the Research results explained above as well.
  • Interested individuals and groups who can help us test and translate the tool will be identified.
  • Mechanisms will be provided for the community to be engaged during the 12 months. We will share updates periodically through Noticeboard, the Wikilibrary Facebook groups and mailman lists, other communication channels for Spanish Wikipedia, and other regionally focused communication channels for India, Africa and CEE, which we consider to be regions that will benefit from this tool.
Workshops

We will run a set of public workshops with the intention to reach out to at least 30 power users and do skill transfer.

The goal of these workshops will be to have some power users or power adopters in small language communities that can socialize it both inside the wiki and other circles around them. The workshops will try to explain how people can use this tool to improve their local wiki, socialize the tool, and help people understand how they can use the tool to improve the sources they need.

Proposed methodology for the workshops

  • Before the workshop: We will invite communities to identify the sources they are having trouble with, and describe the problems that they are experiencing.
  • During the workshop:
    • Introduction: What the tool does, and why it is important to get involved and help us with the dissemination of the tool.
    • Training session: A walk-through of the tool followed by hands-on practice. For the training session, we will be using the table described above in the "Community engagement" section. As part of the workshop, we will be adding a field that asks participants to sign up for a source in which they will be using the translator. Then, participants will identify citation needed templates (using the Citation Hunt tool) and fill the gap using a reference from a source they have created a translator for.
    • Think-forward: How can the tool be improved? Is the tool doing what it set out to do? Is it too complicated to use? Are the documentation and training clear enough? What other aspects need to be covered?
  • After the workshop: We will send participants a post-workshop survey that asks for their feedback on the tool, the documentation and/or the workshop: what worked well, what could have been done differently, and what they would add, if anything.

Budget

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

Budget table
| Activity | Number | Rate per unit (USD) | Subtotals (USD) |
|---|---|---|---|
| Research | 44 hours | $30 | $1,320 |
| Software development | 1452 hours | $30 | $43,560 |
| Community engagement | 250 hours | $30 | $7,500 |
| Wire fees | | $300 | $300 |
| Additional buffer & contingencies costs | 5% of total | | $2,700 |
| Workshops logistics costs (Zoom subscription) | 3 months | $45 | $45 |
| TOTAL BUDGET | | | $55,425 |

Community engagement

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.

See the Communications section above.

  • Periodic updates on the development status.
  • Community involvement in translating the tool into languages other than Spanish.
  • Feedback group sessions.
  • Workshops.

In addition, to involve the Wikimedia developer community, code will be hosted in Gerrit, and issues will be tracked in Phabricator.

Get involved

Participants

Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

  • Diego de la Hera (User:Diegodlh). Project lead and main developer. I am an advocate for free access to knowledge and a developer committed to libre software, with a focus on web technologies. I am currently developing Wikidata's Wikicite plugin for Zotero. Full resume at https://diegodlh.conversodromo.com.ar
  • Evelin Heidel (User:Scann). Communication, documentation and public relations lead. Many years of experience in the Wikimedia movement and Wikipedia editing.
  • Second developer: to be defined
  • Research assistant: to be defined
  • ADVISORS. We are looking for advisors! Sign up below if you're interested in being one.
  • VOLUNTEER ROLES. In exchange for your participation we'll give you a cute barnstar exclusively designed for this project as a way to show you our eternal appreciation.
    • Project ambassadors. Do you want to be an ambassador in your language Wikipedia? This is a lightweight activity where you can help us disseminate the tool in your language Wikipedia, help us set up the workshops and suggest potential participants. Sign below if you're interested in taking part!
    • Language translators. Are you interested and willing to help us translate the tool once it's ready? Add the language you want to work on and your signature below so we can contact you if we get the grant and once we have the tool ready!
      • Language: id. Your signature: ··· 🌸 Rachmat04 · 08:03, 17 March 2021 (UTC); Jeromi Mikhael (talk) 03:11, 18 March 2021 (UTC)
      • Language: ro. Strainu (talk) 22:32, 17 March 2021 (UTC)
      • Language: pt. - Darwin Ahoy! 12:54, 18 March 2021 (UTC)
      • Language: sr. Aca (talk) 17:56, 18 March 2021 (UTC)
      • Language: bs. – Srđan (talk) 17:58, 18 March 2021 (UTC)
      • Language: de. Gnom (talk) 10:40, 20 March 2021 (UTC)
      • Language: ha. Uncle Bash007 (talk) 10:48, 23 April 2021 (UTC)
      • Language: ru. rubin16 (talk) 15:59, 1 May 2021 (UTC)
      • ... Your signature: ...
    • Workshop participants. Are you interested in taking part in one of the workshops to learn how to use the tool and apply that knowledge translating and improving source metadata? Sign your name below so we can contact you if we get the grant and once we have the tool ready!
  • Volunteer Happy to do trials or testing as needed AmandaSLawrence (talk) 03:50, 25 March 2021 (UTC)
  • Volunteer I would love to help test, build, translate, make documentation and anything else you need. I would love to be involved in helping on every part of it. LilMamaVhris35 (talk) 19:03, 30 July 2021 (UTC)
  • Volunteer Hello. If needed, I think I can create a small table of the top 20 most used French press sources for fr-wiki. It would contain the website URL, a test article URL, and whether something is missing or wrongly displayed (for instance: title, author, editor name and date). See you. Jurbop (talk) 07:26, 19 January 2022 (UTC)
Jurbop Wonderful! We are collecting problematic URLs in this spreadsheet, including what is expected and what Citoid returns. We will use this information to improve and test Web2Cit. You may add any non- or partially-working URLs there. You may also include your name in the Contact column and a link to the "list of top 20 most used French press sources" in the Comments column. Contact our Communications & Community Manager User:Scann if you have any questions. Thank you very much! --Diegodlh (talk) 13:41, 19 January 2022 (UTC)

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?

Discussion of the idea elaborated in this proposal has already been ongoing in the following fora (all of which have been notified of the publication of this proposal):

  • Zotero translators integration topic in the discussion page of the Shared Citations proposal. This is where the original idea was raised by User:Strainu.
  • Subpage "Visual Zotero/Citoid translator editor", where I summarized the idea (as it was at that moment). Feedback from the Wikimedia (and Citoid) community can be found in the discussion page and helped shape this proposal.
  • Zotero forum. Feedback from the Zotero developers helped shape this proposal as well.
  • Wikicite Telegram chat.
  • Wikimedia general Telegram chat.
  • Citoid talk page.

Other channels where this proposal has been or will be communicated include:

Endorsements

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  • This ambitious project seems well-prepared, and its budget seems large enough to achieve its goals. The project has the potential to drastically improve the references in our projects, which also paves the way for future projects like the Shared citations part of Wikicite that is being developed. Ainali talkcontributions 22:14, 16 March 2021 (UTC)
  • As a result of my idea, I'm really happy to see this proposal. I believe that once implemented it will make adding citations significantly easier for projects using Citoid. I'm also willing to participate as a programmer to this project if approved. Strainu (talk) 05:59, 17 March 2021 (UTC)
  • Why not? The proposal is great. We need this kind of idea to improve the usability of Citoid and make it a better, easy-to-use tool. ··· 🌸 Rachmat04 · 08:01, 17 March 2021 (UTC)
  • I am in favor of increasing translating articles and references across languages. The local differences may hinder this. Technical solutions are an important step in solving the problem. – Susanna Ånäs (Susannaanas) (talk) 09:08, 17 March 2021 (UTC)
  • Esteban (talk) 11:12, 17 March 2021 (UTC)
  • I think it's a good initiative to facilitate citations and to engage non technical users in the task of building tools.--Pepe piton (talk) 14:34, 17 March 2021 (UTC)
  • This will improve the experience for new and non-technical users who are non-native English speakers, so Strong support. Pablísima (talk) 16:42, 17 March 2021 (UTC)
 
WikiCite 2017 - Citoid performance for news article citations - Research showing only 32% of the 90 most popular news sites cited in English Wikipedia could be successfully extracted using Citoid/Zotero, for the four basic reference fields: headline, date, author, and publication name.
  • Support - One cannot overemphasize how badly this is needed. This one project can have massive benefits across the projects by making citations work better for thousands of users. Right now, if a Citoid 'scrape' fails to get the right fields from a web site, news site, etc. then the user is forced to either insert a poorly formed reference, or hand-edit it to fix it. Even worse, each and every time it is inserted, this manual process is required. Over and over. The right solution is to "fix" this by helping to create a Zotero translator, but this is highly complex and needs a special tool and a special Firefox browser setup. I tried pitching in to do this several times in the past but it was very intimidating, complex, and slow. User:Gamaliel and I even did a quick research project (see illustration) showing how poorly Citoid (not its fault!) did on many news sites because the metadata is so poorly formed on the side of publishers, and the Zotero translators are too hard to create to compensate for this. As a result, only a handful of folks from our community (like User:Zuphilip) ever wind up creating translators and fixing citation problems. If we can get more people to fix the translators through easier tools (like User:V111P created) then we have a great opportunity to solve this huge problem. User:Diegodlh has been extremely detailed in his approach and research, engaging with the WikiCite channel and people who have done work before him on this matter, and has shown deep knowledge about this issue and a robust approach to a solution. User:Scann is an experienced and engaged Wikimedian who can be a great advocate and community liaison. Highly endorse! -- Fuzheado (talk) 22:13, 17 March 2021 (UTC)
Oh wow, thanks for including those slides Fuzheado I was actually going to ask you if you had that research handy. When we were talking about this project I had first-hand experience for Spanish Wikipedia having issues with sources, so I see how this could be useful for "minority languages" (Spanish is not really a minority language, but you get the idea), but I didn't know that it was also a problem for English sources as well. Slide 10 shows that in most of the cases authors are not being retrieved properly, which is in itself a very interesting, convoluted problem, for a variety of reasons. Scann (talk) 11:59, 18 March 2021 (UTC)
  • Support per Fuzheado. Gamaliel (talk) 23:59, 17 March 2021 (UTC)
  • Zotero is amazing but adding a translator is very tedious; Web2Cit will benefit not only the Wiki ecosystem but also the whole open knowledge movement. Le Loy 03:49, 18 March 2021 (UTC)
  • Extremely useful project - Citoid is kind of broken in Portuguese as well in many websites. Per Fuzheado and others above. - Darwin Ahoy! 12:56, 18 March 2021 (UTC)
  • Support. If this proposal is successful it will be a great enabler for people (like me) who are not programming-savvy to assist with improving metadata for citations. This will be of great benefit to the projects. Thryduulf (talk: meta · en.wp · wikidata) 16:46, 18 March 2021 (UTC)
  • Support This sounds like a great idea. Samwalton9 (talk) 17:02, 18 March 2021 (UTC)
  • Support. This would be extremely useful. Srđan (talk) 17:57, 18 March 2021 (UTC)
  • Support – Perfect project with a cool idea. --Aca (talk) 17:58, 18 March 2021 (UTC)
  • Support Awesome idea. Honestly, I was only expecting a list of metadata that the user could map to Zotero, but a UI selector goes a step further.--Snaevar (talk) 18:01, 18 March 2021 (UTC)
  • Support Citoid is great but it is so frustrating when you have websites you regularly cite and Citoid doesn't work at all or doesn't work properly for them. I've often thought "if only I could get it to use this metadata on that website, it would be so much more useful" so this project fits this niche perfectly. Also a smarter Citoid has value outside of Wikipedia too. Once you have the correct metadata in hand, it should be possible to create citations in any of the popular formats that someone may need in their writing. Kerry Raymond (talk) 01:51, 19 March 2021 (UTC)
  • I think it is useful and necessary for the entire community to be able to facilitate the automatic bibliographic reference process. I will be attentive to collaborate. Virc587 (talk) 12:24, 19 March 2021 (UTC)
  • As a regular user I love using this tool, but found it quite confusing that it can't render some website translations properly. I think this project would help fix that problem. Jeromi Mikhael (talk) 14:37, 19 March 2021 (UTC)
  • I'm often frustrated by how poorly Citoid renders even popular German news websites. --Gnom (talk) 10:42, 20 March 2021 (UTC)
  • Support It solves a very common problem. I could help with the Spanish implementation. Wilfredor (talk) 22:31, 21 March 2021 (UTC)
  • Support - Very good idea and helpful for many communities using multiple languages. Hiperterminal (talk)
  • Making it as easy as possible to provide citations with tools like Citoid increases the likelihood that Wikipedia content will be cited and thus contributes to the quality and reliability of Wikipedia. AmandaSLawrence (talk) 03:48, 25 March 2021 (UTC)
  • Support Interested in testing and working with this tool. MargaretRDonald (talk) 04:40, 27 March 2021 (UTC)
  • Support This is a wonderful idea. I've looked into creating/fixing Zotero translators before, but even as someone who knows JavaScript it is extremely daunting. Making this easier to do, and opening it up to a wider community, should vastly increase the number and diversity of sources that Citoid can handle. the wub "?!" 23:23, 11 April 2021 (UTC)
  • This will greatly help with the creation and usage of citations. Remagoxer (talk) 13:21, 11 June 2021 (UTC)
  • Support -- I agree with everything @Kerry Raymond: wrote :) ~~~~
  • To help prove the purpose for which it has been requested. 154.160.19.89 00:29, 24 August 2021 (UTC)