Web2Cit/Docs/Patterns

URL path patterns define separate translation subgroups within a website/domain based on the path of target webpages.

As described in the Basics documentation page, when a translation is requested for a target webpage, Web2Cit tries to match the webpage's path against the URL path patterns defined for its domain. It then returns a translation using the translation templates whose webpages/paths were sorted into the same pattern.

If no matching URL path pattern is found, a catch-all translation subgroup is used instead.

Pattern definition

URL path patterns are defined using glob patterns.

In addition, URL path patterns may include a label to help Web2Cit collaborators understand why they have been added.

Pattern matching

Targets and templates are sorted into the first URL path pattern group that matches the corresponding path. URL path pattern groups are tried in the order in which they are defined in the domain's patterns configuration file.
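The first-match rule can be illustrated with a minimal sketch. It is not Web2Cit's actual implementation: the names `PathPattern`, `globToRegExp` and `sortIntoPattern`, the `pattern`/`label` field names, and the example patterns are all hypothetical. The glob conversion is a deliberately simplified stand-in for a real glob matcher and only understands `*` (any characters within one path segment) and `**` (any characters, including `/`).

```typescript
// Hypothetical shape of one pattern entry; the field names are assumptions.
interface PathPattern {
  pattern: string;
  label?: string;
}

// Simplified stand-in for a real glob matcher (NOT minimatch).
// Supports only "*" (stays within one path segment) and "**" (may cross "/").
function globToRegExp(glob: string): RegExp {
  const source = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        // Escape regular expression metacharacters in the literal pieces.
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*")
    )
    .join(".*");
  return new RegExp(`^${source}$`);
}

// Return the first pattern, in definition order, that matches the path.
// If none matches, fall back to the "**" catch-all.
function sortIntoPattern(path: string, patterns: PathPattern[]): string {
  for (const { pattern } of patterns) {
    if (globToRegExp(pattern).test(path)) {
      return pattern;
    }
  }
  return "**";
}

// Hypothetical patterns, listed in the order they would be tried.
const examplePatterns: PathPattern[] = [
  { pattern: "/news/*", label: "top-level news articles" },
  { pattern: "/news/**", label: "nested news pages" },
];

console.log(sortIntoPattern("/news/today", examplePatterns)); // "/news/*"
console.log(sortIntoPattern("/news/2024/story", examplePatterns)); // "/news/**"
console.log(sortIntoPattern("/about", examplePatterns)); // "**"
```

Note that `/news/today` would also match `/news/**`, but it is sorted into `/news/*` because that pattern is defined first. Reordering the two entries would send every news path to `/news/**`.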

There are tools to test your glob patterns, such as DigitalOcean's Glob Tool. However, note that specific glob implementations may differ in their behavior. The Web2Cit core library currently uses minimatch, a glob matcher for JavaScript.

Note that URL query strings (i.e., the part of a URL beginning with a question mark ?) are excluded from URL path pattern matching. If you need to control which translation templates are used to translate a target webpage based on its URL query string parameters, consider using the Control template field and the URL selection step; see the Templates documentation for further information.

Web2Cit always ignores URL fragment components (i.e., the part of a URL beginning with #) in both target and template paths.
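As an illustration of which part of a URL takes part in pattern matching, the following sketch uses the standard `URL` class to separate the path from the query string and the fragment. The function name `splitForMatching` and the example URL are hypothetical, and this is only an approximation of the behavior described above, not Web2Cit's own code.

```typescript
// Split a URL into the path used for pattern matching and the query string.
// The fragment (the part starting with "#") is dropped altogether.
function splitForMatching(url: string): { path: string; query: string } {
  const parsed = new URL(url);
  return {
    path: parsed.pathname, // the only part compared against URL path patterns
    query: parsed.search, // excluded from pattern matching
  };
}

const parts = splitForMatching(
  "https://example.com/news/today?page=2#comments"
);
console.log(parts.path); // "/news/today"
console.log(parts.query); // "?page=2"
```

Under this sketch, two URLs that differ only in their query string or fragment yield the same path and would therefore be sorted into the same URL path pattern.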

Catch-all pattern

If none of the defined URL path patterns matches the target webpage's path, Web2Cit sorts the target into the translation subgroup defined by the ** catch-all glob pattern.