Web2Cit/Docs/Storage
The Web2Cit storage is the part of the Web2Cit ecosystem responsible for keeping the collaboratively defined configuration files that dictate the behavior of Web2Cit.
As will be explained in the following sections, these configuration files are JSON files saved as wiki pages on Meta-Wiki.
They are defined collaboratively, with the help of Web2Cit editing tools, on a per-domain basis, with up to three configuration files per domain: templates.json, patterns.json and tests.json.
Some of the concepts in this article are covered in a theoretical video for early adopters of Web2Cit on YouTube.
Location
Domain configuration files live in Meta, at Web2Cit/data/, one sub-directory deeper per hostname label, from the top-level domain all the way through the last subdomain.
You can find a full list of configuration files here.
For example, for hostname meta.wikimedia.org, configuration files would be at Web2Cit/data/org/wikimedia/meta/.
URL scheme (e.g., http, https), port and path are not part of the hostname (see T315020, though). For example, for URL https://meta.wikimedia.org/wiki/Web2Cit/Early_adopters#Domain_configuration_files, only meta.wikimedia.org is the hostname.
There are three configuration files per domain: templates.json (for translation templates), patterns.json (for URL path patterns), and tests.json (for translation tests). So, for example, the translation templates configuration file for meta.wikimedia.org would be at Web2Cit/data/org/wikimedia/meta/templates.json.
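The mapping from hostname to page title described above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not Web2Cit's own code: the function name config_page_title and the DATA_ROOT constant are made up for this example, and the sketch ignores any special handling discussed in T315020.

```python
from urllib.parse import urlsplit

DATA_ROOT = "Web2Cit/data/"  # storage root named in this section


def config_page_title(url_or_hostname: str, filename: str) -> str:
    """Return the Meta page title holding `filename` for the given hostname."""
    # urlsplit only fills in .hostname when a "//" authority marker is present,
    # so bare hostnames get one prepended.
    if "//" in url_or_hostname:
        target = url_or_hostname
    else:
        target = "//" + url_or_hostname
    # .hostname drops the scheme, port, path, query and fragment.
    hostname = urlsplit(target).hostname
    if not hostname:
        raise ValueError(f"no hostname found in {url_or_hostname!r}")
    # Labels are reversed: top-level domain first, last subdomain last.
    labels = reversed(hostname.split("."))
    return DATA_ROOT + "/".join(labels) + "/" + filename


print(config_page_title(
    "https://meta.wikimedia.org/wiki/Web2Cit/Early_adopters#Domain_configuration_files",
    "templates.json",
))
# Web2Cit/data/org/wikimedia/meta/templates.json
```

Passing either a full URL or a bare hostname gives the same result, because only the hostname takes part in the page title.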
Redirects
Redirects between configuration files are useful for domain aliases. For example, if www.example.com is an alias of example.com, configuration files of the former may be redirected to those of the latter so that the Web2Cit community does not have to maintain separate copies of the same files.
These redirects are followed both by Web2Cit core and by the JSON editor. Read the Domain aliases section of the Editing documentation for more information.
Format
All domain configuration files are written in JSON format (see below for alternative formats).
Generally speaking, our JSON files may have a combination of the following value types:
- Text strings. For example "xpath".[note 1]
- Booleans: true or false.
- Arrays: or lists, with zero or more values, separated by commas. For example: [ "one", "two", "three" ].
- Objects: with zero or more "key":value pairs separated by commas. For example:
{
"key1": value1,
"key2": value2
}
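As a quick way to check these value types outside the wiki editor, the sketch below parses one value of each kind with Python's standard json module and shows how inner double quotes are escaped when a string is serialised. The key names text, flag, list and object are invented for this example and are not part of any Web2Cit configuration file.

```python
import json

# One value of each type listed above; the key names are illustrative only.
source = '{"text": "xpath", "flag": true, "list": ["one", "two", "three"], "object": {}}'
parsed = json.loads(source)
for key, value in parsed.items():
    print(key, "->", type(value).__name__)

# Serialising a string lets the library escape inner double quotes.
escaped = json.dumps('some "quoted" text')
print(escaped)  # "some \"quoted\" text"

# An unescaped inner quote ends the string early, so the parser rejects it.
try:
    json.loads('"some "quoted" text"')
except json.JSONDecodeError as error:
    print("invalid JSON:", error.msg)
```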
The MediaWiki editor is not specialized for editing JSON files (unless pages have the JSON content model, see T305571). You may find it useful to make your edits using a separate editor and then pasting the result.
For each configuration file below there is an example file and a JSON-schema file available. The JSON-schema can be used to validate your JSON files, using stand-alone validators or text editor integrations.[1]
We recommend using json-editor,[2] which lets you edit JSON files via a simple form generated from our JSON-schema files (direct links available at each configuration file section below):
- If you are editing a pre-existing JSON file, paste it into the json-editor's "JSON Output" field to the right, and click on "Update Form".
- Fill in the form.
- Copy the JSON output from the field to the right, and paste it into Meta.
templates.json
The templates.json file contains an array of Template objects at its root.
Template objects
Each Template object represents a translation template and has a series of three key:value pairs:
- path key, with a string as value, representing the path of the webpage used as translation template, in the current domain (note that multiple Template objects with the same path value will be ignored). Do not include the hostname; just the path beginning with /. You may also include query (?) components. For example, for template webpage https://example.com/news/article?id=3, use /news/article?id=3.
- label key, with a string as value, representing the (optional) fancy name for this translation template.
- fields key, with an array of TemplateField objects as value. Note that multiple TemplateField objects with the same fieldname value (see below) will be ignored:
{
  "path": string,
  "label": string,
  "fields": TemplateField[]
}
You can use the array of template fields from the default fallback template as a basis for your custom translation templates.[note 2]
TemplateField objects
In turn, each TemplateField object represents a template field in the translation template and has a series of three key:value pairs:
- fieldname key, with a string as value, representing the name of the template field. See the Fields documentation for currently supported values.
- required key, with a boolean (true or false) as value, representing whether the template field should be marked as required or not; see the Templates documentation.
- procedures key, with an array of Procedure objects as a value.
{
  "fieldname": string,
  "required": boolean,
  "procedures": Procedure[]
}
Procedure objects
In turn, each Procedure object represents a translation procedure and has a series of two key:value pairs:
- selections key, with an array of Selection objects as value.
- transformations key, with an array of Transformation objects as value.
{
  "selections": Selection[],
  "transformations": Transformation[]
}
Selection objects
Each Selection object represents a selection step and has a series of two key:value pairs:
- type key, with a string as value, representing the specific type of selection step. See the Selection steps subsection of the Templates documentation for currently supported values.
- config key, with a string as value,[note 3] representing the specific configuration for the selection step. See the Selection steps subsection of the Templates documentation for currently supported values.
{
  "type": string,
  "config": string
}
Transformation objects
Finally, each Transformation object represents a transformation step and has a series of three key:value pairs:
- type key, with a string as value, representing the specific type of transformation step. See the Transformation steps subsection of the Templates documentation for currently supported values.
- config key, with a string as value,[note 3] representing the specific configuration for the transformation step. See the Transformation steps subsection of the Templates documentation for currently supported values.
- itemwise key, with a boolean (true or false) as value, representing whether the transformation should be applied to each item of the input independently (true), or to the entire input as a whole (false).
{
  "type": string,
  "config": string,
  "itemwise": boolean
}
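To see how the five object types nest, the following Python sketch assembles a one-template templates.json from small helper functions and prints it as JSON. The helper names (selection, transformation, procedure, template_field, template) are hypothetical and exist only in this example. The type and config values are borrowed from the excerpt in the YAML section, and the path is the sample path used above. The result is not checked against the JSON-schema.

```python
import json


def selection(step_type: str, config: str) -> dict:
    return {"type": step_type, "config": config}


def transformation(step_type: str, config: str, itemwise: bool) -> dict:
    return {"type": step_type, "config": config, "itemwise": itemwise}


def procedure(selections: list, transformations: list) -> dict:
    # Both keys live on the same Procedure object.
    return {"selections": selections, "transformations": transformations}


def template_field(fieldname: str, required: bool, procedures: list) -> dict:
    return {"fieldname": fieldname, "required": required, "procedures": procedures}


def template(path: str, label: str, fields: list) -> dict:
    # The path excludes the hostname and begins with "/".
    if not path.startswith("/"):
        raise ValueError("path must begin with '/'")
    return {"path": path, "label": label, "fields": fields}


title_field = template_field("title", True, [
    procedure(
        [selection("citoid", "title")],
        [transformation("range", "0", False)],
    ),
])

# templates.json holds an array of Template objects at its root.
templates = [template("/news/article?id=3", "fancy name", [title_field])]
print(json.dumps(templates, indent=2))
```

Building the structure from functions makes it harder to misplace a key at the wrong nesting level, which is an easy mistake when editing the raw JSON by hand.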
patterns.json
The patterns.json file contains an array of Pattern objects at its root.
Pattern objects
Each Pattern object represents a URL path pattern and has a series of two key:value pairs:
- pattern key, with a string as value, representing a glob path pattern that defines a URL matching group.
- label key, with a string as value, representing the (optional) fancy name for this URL path pattern:
{
  "pattern": string,
  "label": string
}
tests.json
The tests.json file contains an array of Test objects at its root.
Test objects
Each Test object represents a translation test and has a series of two key:value pairs:
- path key, with a string as value, representing the path of the webpage used as translation test, in the current domain (note that multiple Test objects with the same path value will be ignored). Just like with the path property of Template objects, do not include the hostname and make sure the path begins with /. You may also include query (?) components.
- fields key, with an array of TestField objects as value. Note that multiple TestField objects with the same fieldname value (see below) will be ignored.
{
  "path": string,
  "fields": TestField[]
}
TestField objects
Each TestField object represents a test field in the translation test and has a series of two key:value pairs:
- fieldname key, with a string as value: any of the translation field names supported.
- goal key, with an array of strings as value, representing the expected translation output or translation goal for a given translation field. Each string value must comply with the translation field's validation rule. Provide an empty array to explicitly express that no output is expected.
{
  "fieldname": string,
  "goal": string[]
}
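Because objects that repeat a path or fieldname value are ignored, it can help to look for repeats before saving a file. The Python sketch below is a simple lint under that assumption. The function name duplicated_values and the sample goal string are invented for this example, and the sketch only reports repeated values without deciding which object should be kept.

```python
from collections import Counter


def duplicated_values(items: list, key: str) -> list:
    """Return the values of `key` that appear on more than one object."""
    counts = Counter(item[key] for item in items if key in item)
    return sorted(value for value, count in counts.items() if count > 1)


# A tests.json-like array with a repeated path and a repeated fieldname.
tests = [
    {"path": "/news/article?id=3", "fields": [
        {"fieldname": "title", "goal": ["Some title"]},  # made-up goal value
        {"fieldname": "title", "goal": []},
    ]},
    {"path": "/news/article?id=3", "fields": []},
]

print("repeated paths:", duplicated_values(tests, "path"))
for test in tests:
    print(test["path"], "repeated fieldnames:",
          duplicated_values(test["fields"], "fieldname"))
```

The same helper works on templates.json, using "path" for Template objects and "fieldname" for TemplateField objects.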
Alternative formats
Using alternative more human-readable formats, such as JSON5 or YAML, may help you read and write configuration files manually. We do not currently support any of them, although we may in the future, as tracked in task T302694.
For now, you may use online converters to:[note 4]
- Convert a JSON configuration file to either JSON5 or YAML
- Edit the configuration file in JSON5 or YAML
- Convert back to JSON and validate with JSON-schema (see above)
- Save configuration file in JSON
JSON5
JSON5[3] closely resembles JSON but is more flexible, thus tolerating some common JSON mistakes. In our case, the following features may be of interest:
- keys may be unquoted:
{ unquoted: "value" }
- strings may be single-quoted, allowing double quotes inside them:
'single "quoted" string'
- trailing commas in objects and arrays are OK:
{ key1: value1, key2: value2, }
[ a, b, c, ]
YAML
YAML is indentation-based (like the Python programming language) and is much shorter and (usually)[4] easier to write and read.
This is a comparison between the JSON and YAML versions of an example template configuration file excerpt, with the JSON version first:
[
  {
    "path": "/",
    "label": "fancy name",
    "fields": [
      {
        "fieldname": "title",
        "required": true,
        "procedures": [
          {
            "selections": [
              {
                "type": "citoid",
                "config": "title"
              }, {
                ...
              }
            ],
            "transformations": [
              {
                "type": "range",
                "config": "0",
                "itemwise": false
              },
              {
                ...
              }
            ]
          }
        ]
      },
      {
        "fieldname": "itemType",
        ...
      }
    ]
  },
  {
    ...
  }
]
And the YAML version:
- path: /
  label: fancy name
  fields:
    - fieldname: title
      required: true
      procedures:
        - selections:
            - type: citoid
              config: title
            - ...
          transformations:
            - type: range
              config: '0'
              itemwise: false
            - ...
    - fieldname: itemType
      ...
- ...
Remember that the procedures key of a TemplateField object takes an array of Procedure objects as values, each with selections and transformations keys. So the following code is wrong, because it specifies two separate Procedure objects, one with a selections key, and another one with a transformations key:
...
"procedures": [
  {
    "selections": [
      {
        "type": "citoid",
        "config": "title"
      }
    ]
  },
  {
    "transformations": [
      {
        "type": "range",
        "config": "0",
        "itemwise": false
      }
    ]
  }
]
...
...
procedures:
  - selections:
      - type: citoid
        config: title
  - transformations:
      - type: range
        config: '0'
        itemwise: false
...
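The mistake above is easy to catch mechanically. The Python sketch below assumes, as the text describes, that every Procedure object should carry both a selections and a transformations key, and it lists the Procedure objects that lack one of them. The function name incomplete_procedures is hypothetical, and the JSON-schema files remain the reference for what is actually valid.

```python
REQUIRED_KEYS = ("selections", "transformations")


def incomplete_procedures(template_field: dict) -> list:
    """Return (index, missing keys) for each Procedure lacking a required key."""
    problems = []
    for index, proc in enumerate(template_field.get("procedures", [])):
        missing = [key for key in REQUIRED_KEYS if key not in proc]
        if missing:
            problems.append((index, missing))
    return problems


# The wrong layout: two Procedure objects, each with only one of the keys.
wrong = {"fieldname": "title", "required": True, "procedures": [
    {"selections": [{"type": "citoid", "config": "title"}]},
    {"transformations": [{"type": "range", "config": "0", "itemwise": False}]},
]}

# The intended layout: one Procedure object holding both keys.
right = {"fieldname": "title", "required": True, "procedures": [
    {"selections": [{"type": "citoid", "config": "title"}],
     "transformations": [{"type": "range", "config": "0", "itemwise": False}]},
]}

print(incomplete_procedures(wrong))  # [(0, ['transformations']), (1, ['selections'])]
print(incomplete_procedures(right))  # []
```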
Notes
- ↑ Text strings start and end with double quotes ". Therefore, avoid unescaped double quotes inside them. For example, in "some "quoted" text" it is not clear where the string starts or ends. If possible, replace double quotes with single quotes ': "some 'quoted' text". Alternatively, escape the inner double quotes with a backslash \: "some \"quoted\" text".
- ↑ Currently used fallback template definition available from the Web2Cit Core's source code repository here
- ↑ a b See T305903 for a proposal to use an array of config values instead.
- ↑ For example, toolkit.site's Data Format Converter