Web2Cit: collaborative automatic citations for web sources

Wikipedia's automatic citation generator (Citoid) greatly reduces the time, effort and knowledge needed to insert citations, one of Wikipedia's main pillars. However, these automatic citations don't always work as expected (see Research).

Web2Cit is a collaborative automatic citation generator for web sources, meant to complement the citation results returned by Citoid. It is collaboratively controlled by the community through a set of relatively simple tools that aim to lower the technical skills needed to help improve automatic citations.

Getting started

Install

To use Web2Cit automatic citations in Wikipedia, install our user script in your Wikipedia account[Notes 1] by pasting the following code into your User:YourUserName/common.js page on Wikipedia:

// Web2Cit
mw.loader.load( '//en.wikipedia.org/w/index.php?title=User:Diegodlh/Web2Cit/script.js&action=raw&ctype=text/javascript' ); // Backlink: [[:en:User:Diegodlh/Web2Cit/script.js]]

Read our User script documentation for detailed instructions.

Use

 
Web2Cit user script. (left) The "Web2Cit" checkbox on the "Cite" dialog confirms successful installation. (right) The citation at the top comes from Citoid; the one at the bottom from Web2Cit.

Once installed, you should see a Web2Cit checkbox on your Visual Editor's "Cite" dialog, confirming installation.

Enter the URL you would like to cite and click on Generate. You should now get two citation results instead of one: the one at the top from Citoid, and the one at the bottom from Web2Cit.

This is all you need to know if you just want to use Web2Cit automatic citations in Wikipedia.

However, sooner or later you will find a URL for which both Citoid and Web2Cit results are incorrect. Read on to find out how to use Web2Cit to fix this.


Editing

Short video explaining how to use Web2Cit
Web2Cit workshop at one of the LD4 Wikidata Affinity Group calls
Web2Cit workshop in Spanish, part of the Wikiherramientas series

Web2Cit uses a set of three configuration files per website that determine how Web2Cit handles webpages (paths) from that website (domain). Read our Web2Cit basics page to find out more about how Web2Cit works and the parts that it is made of.

Because these configuration files are collaboratively defined, if you are not happy with Web2Cit extraction (also known as translation) results, you can simply edit the configuration files, and results will be updated for all Web2Cit users.

Open the editor

 
Click on any of the "edit" links on a translation summary page to edit the corresponding configuration file with the JSON editor.

To edit configuration files, first open the translation summary page for a target webpage on the Web2Cit server. To do so, click on the Web2Cit link at the lower-right corner of the citation results shown in the Cite dialog (see Getting started above). Alternatively, go to the Web2Cit server homepage, type in the target URL, and click Extract.

Then, click on any of the "edit" links that show on the translation summary page to edit the corresponding domain configuration file using our JSON editor, as explained in the subsections below. Refresh the translation summary page after each configuration change to see how results change accordingly.

Read the JSON editor section of our Editing documentation and our Server documentation to find out more about the editor and the server, including how to temporarily save configuration files to a personal sandbox space without affecting all Web2Cit users, how to instruct the Web2Cit server to use configuration files from this sandbox space, and how to include debugging information in the translation summary that may help diagnose unexpected outputs.

Add a translation test

Always define a translation test before anything else. Clearly stating what the expected output should be for each translation field for a given target webpage will help you and other Web2Cit collaborators maintain Web2Cit configuration files for a given domain.

First, define a translation test for your target webpage, indicating the expected output:

  1. Click on the "edit" link next to the "Expected output" header on the translation summary page to edit the domain's tests configuration file using our JSON editor.
  2. If no translation test for the target webpage has been created, add a new translation test and enter the webpage's path.
  3. Add one or more test fields and indicate the expected output. Read our Tests documentation to find out more.
  4. Save the configuration file back to the Web2Cit storage.
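
To illustrate what a translation test captures, the following JavaScript sketch represents a test as a webpage path plus the expected values for each field, and compares it against an actual translation output. The object shape (`path`, `fields`, `fieldname`, `goal`), the sample values, and the `checkTest` helper are assumptions made for illustration only. The Tests documentation remains the reference for the real file format, and Web2Cit may score matches differently than the strict equality used here.

```javascript
// Hypothetical translation test: expected output for one webpage path.
// Field names and structure are illustrative assumptions, not the real schema.
const exampleTest = {
  path: "/news/2021/some-article",
  fields: [
    { fieldname: "title", goal: ["Some article"] },
    { fieldname: "authorLast", goal: ["Doe", "Roe"] },
  ],
};

// Compare each expected field against the actual output.
// A field passes only if both value lists are identical, in order.
function checkTest(test, actualOutput) {
  return test.fields.map(({ fieldname, goal }) => {
    const actual = actualOutput[fieldname] ?? [];
    const pass =
      actual.length === goal.length &&
      goal.every((value, i) => value === actual[i]);
    return { fieldname, pass };
  });
}

// The title matches, but one expected author is missing from the output.
const report = checkTest(exampleTest, {
  title: ["Some article"],
  authorLast: ["Doe"],
});
console.log(report);
```

In this sketch the `title` field passes and the `authorLast` field fails, which is the kind of mismatch a translation template would then need to address.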

If defining a translation test is all you can or want to do, that's great already! Your test will help other contributors define translation templates (see below), and will be used to regularly check the health of the Web2Cit system (see Watch changes below).

Add a translation template

So far you or somebody else will have indicated the expected output for your target webpage. But that doesn't include how to actually get that result!

Web2Cit uses translation templates to extract citation metadata from web sources. To define a translation template based on your target webpage:

  1. Click on the "edit" link next to the "Translation output" header on the translation summary page to edit the domain's templates configuration file using the JSON editor.
  2. If no translation template based on your target webpage has been created previously, add a new translation template and type in the target webpage's path.
  3. Add one or more template fields, each including at least one translation procedure. Procedures comprise a series of selection and transformation steps that specify how to retrieve and transform citation metadata. Read our Templates documentation to find out more.
  4. Save the configuration back to the Web2Cit storage.
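
The idea of a procedure as a chain of selection and transformation steps can be sketched in JavaScript as follows. This is a conceptual illustration, not Web2Cit's implementation: the step types (`fixed`, `meta`, `join`, `split`, `trim`), the `page` object, and the `runProcedure` helper are all hypothetical. The Templates documentation describes the step types that Web2Cit actually supports.

```javascript
// Hypothetical selection steps: each returns a list of strings from a page object.
const selections = {
  fixed: (_page, config) => [config],
  meta: (page, config) => page.meta[config] ?? [],
};

// Hypothetical transformation steps: each maps a list of strings to a new list.
const transformations = {
  join: (values, config) => [values.join(config)],
  split: (values, config) => values.flatMap((v) => v.split(config)),
  trim: (values) => values.map((v) => v.trim()),
};

// Run one procedure: concatenate the outputs of all selection steps,
// then apply the transformation steps in order.
function runProcedure(procedure, page) {
  const selected = procedure.selections.flatMap(({ type, config }) =>
    selections[type](page, config)
  );
  return procedure.transformations.reduce(
    (values, { type, config }) => transformations[type](values, config),
    selected
  );
}

// Made-up page metadata with two authors in a single string.
const page = { meta: { author: ["Jane Doe ; John Roe"] } };
const authorProcedure = {
  selections: [{ type: "meta", config: "author" }],
  transformations: [{ type: "split", config: ";" }, { type: "trim" }],
};

const authors = runProcedure(authorProcedure, page);
console.log(authors); // [ 'Jane Doe', 'John Roe' ]
```

Keeping selection and transformation as separate steps means the same selected value can be cleaned up in different ways without changing how it is retrieved.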

In some cases, multiple templates per website might be needed. These can be grouped into separate translation subgroups based on URL path patterns. Read our Templates and Patterns documentation to find out more.
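
As a rough illustration of grouping paths by pattern, the JavaScript sketch below assigns URL paths to subgroups using a simplified glob matcher. The pattern syntax shown (`*` within a path segment, `**` across segments), the `pattern` and `label` property names, the `fallback` group, and both helper functions are assumptions for this example. The Patterns documentation defines the syntax Web2Cit actually accepts.

```javascript
// Convert a simplified glob into a RegExp: "**" matches any characters,
// "*" matches any characters except "/". Illustrative subset only.
function globToRegExp(glob) {
  // Escape regex metacharacters other than "*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const source = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" survives the next step
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${source}$`);
}

// Hypothetical pattern list; the first matching pattern decides the subgroup.
const patterns = [
  { pattern: "/news/**", label: "news" },
  { pattern: "/blog/*", label: "blog" },
];

// Return the label of the first matching pattern, or "fallback" if none match.
function findSubgroup(path, patternList) {
  const match = patternList.find(({ pattern }) =>
    globToRegExp(pattern).test(path)
  );
  return match ? match.label : "fallback";
}

const groups = ["/news/2021/item", "/blog/post", "/blog/2021/post", "/about"].map(
  (path) => [path, findSubgroup(path, patterns)]
);
console.log(groups);
```

Here `/blog/2021/post` lands in the fallback group because the single `*` in `/blog/*` does not cross a `/` in this simplified matcher.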

Once you are happy with your configuration, go back to Wikipedia and retry generating a citation for your target URL.

Watch changes

 
A sample Web2Cit monitor results page. Add any of these to your watchlist and be notified whenever test results change.

Finally, the Web2Cit monitor regularly checks whether translation outputs from Web2Cit match the expected outputs defined in translation tests.

Test results are written to a series of per-domain result pages on-wiki which you can add to your watchlist to get a notification whenever test results change. The full list of test result pages can be checked on the overview page.
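
Conceptually, being notified "whenever test results change" amounts to comparing one run of results with the next. The JavaScript sketch below shows that comparison for a single domain. The result shape (a map from path to pass or fail) and the `diffResults` helper are hypothetical; the actual monitor writes result pages on-wiki and relies on the watchlist for notifications.

```javascript
// Compare per-path test results from two monitor runs (true = test passed)
// and list every path whose result differs, including added or removed paths.
function diffResults(previous, current) {
  const paths = new Set([...Object.keys(previous), ...Object.keys(current)]);
  const changes = [];
  for (const path of paths) {
    if (previous[path] !== current[path]) {
      changes.push({ path, before: previous[path], after: current[path] });
    }
  }
  return changes;
}

// Made-up results: "/b" starts failing and "/c" is a newly added test.
const changes = diffResults(
  { "/a": true, "/b": true },
  { "/a": true, "/b": false, "/c": true }
);
console.log(changes);
```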

Read the Web2Cit monitor documentation to find out more about it.

Need help?

Documentation

Documentation about how Web2Cit works includes:

  • Basics: a quick overview of how Web2Cit works and the parts that make up the Web2Cit ecosystem.
  • Fields: translation field types and details.
  • Templates: what translation templates are and how they work.
  • Tests: what translation tests are and how they work.
  • Patterns: what URL path patterns are and how they work.
  • Editing: how to edit Web2Cit configuration.

User and developer documentation about the parts that make up the Web2Cit ecosystem can be reached from the Basics documentation page.

Support

Understanding Web2Cit can be challenging at first. If you need further help you can:

  • Open a new thread on the discussion page of this or any other Web2Cit page.
  • Open a new task on Phabricator, using the Web2Cit umbrella project tag.

Ask someone

Web2Cit is a collaborative effort to improve automatic citations in Wikipedia. If you have questions, you may also reach other members of the Web2Cit community directly. Check the Web2Cit contributors category for users who have added this category to their profile page. And feel free to add that category to your profile page too if you are a Web2Cit contributor yourself!

Contribute

Configuration

The simplest way of contributing to Web2Cit is by helping collaboratively create and maintain domain configuration files as described above.

Language translations

Web2Cit is collaboratively translated into different languages. Different parts of Web2Cit are translated differently:

Metawiki pages

To translate pages like this one, check whether there is a banner at the top with a "translate this page" link. If there is, click it to start translating. If not, the page is not ready for translation yet. You can raise this on the corresponding discussion page.

User interfaces

The Web2Cit server interface is available for collaborative translation on translatewiki.net.

This does not currently include the Web2Cit JSON editor. The plan is to make its interface and contents available for collaborative translation under the same translatewiki.net project. In the meantime, you may use the automatic translation provided by some web browsers.

Web2Cit monitor

The Web2Cit monitor produces overview, log and result pages that are built using custom templates. Please help us translate these templates so that those pages are available in your language (see T321606).

Documentation

Like most Wikimedia tools, Web2Cit is collaboratively documented. We have done our best to provide some basic general and technical documentation, but we understand there is still plenty of room for improvement. Feel free to improve what we currently have!

Development

All Web2Cit code is open source and free software. Please check the pages for our different software components to find out more about how to contribute:

Acknowledgements

Web2Cit was first developed with a grant from the Wikimedia Foundation, based on an idea proposed by Strainu.

The original team included:

Special thanks to our Advisory Board, who helped us from the beginning of the project with its development.

Web2Cit alternatives

Web2Cit may not always be the best choice. It may be worth considering the following alternatives:

  • If you think it's unlikely that somebody else will benefit from the extra work of configuring a website in Web2Cit, simply fix the citation generated by Citoid manually. This may be trivial if you only need to add or fix a field, but may require extra effort if the citation template must be changed. This is the fastest way, but it doesn't benefit others citing sources from the same website.
  • Talk to the webmaster of the website you are trying to cite and convince them to embed structured metadata. Webpages that include structured metadata are generally understood seamlessly by Citoid. This is the best solution long term.
  • If you know JavaScript (or can find someone who does), edit or create the specific Zotero translator for the website you are trying to cite. Note that it may take some time until your changes are merged into the Zotero repository, and then until updates are pulled into Wikimedia. This is the most advanced option, but also a more powerful one than Web2Cit.

See also

Notes

  1. Croatian and Romanian Wikipedias support Web2Cit as a gadget, which can be easily enabled from your Wikipedia preferences.