Web2Cit/Docs/Monitor

The Web2Cit monitor is the part of the Web2Cit ecosystem that uses the collaboratively defined translation tests to run regular checks of the Web2Cit translation system. It writes the results on-wiki so that domains which may need fixing can be identified easily and promptly.

How to use

Overview page

At meta:Web2Cit/monitor.

A list of all domains configured in Web2Cit, with a summary of the last check for each, transcluded from domain log pages (see below).

See draft example here.

Domain test result page

At meta:Web2Cit/monitor/com/example/www/results.

Saving this to meta:Web2Cit/monitor/com/example/www was considered, but the results page of a sub-domain named log (log.www.example.com) would then conflict with its parent domain's checks log page (see below).
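The page titles above suggest that the domain labels are reversed and joined with slashes, followed by a fixed page name. The following is a minimal sketch of that rule, inferred from the example titles only; the function name monitor_page_title is hypothetical and not taken from the monitor's source code.

```python
def monitor_page_title(domain: str, page: str = "results") -> str:
    """Build a monitor page title from a domain name (rule assumed from the examples)."""
    # Reverse the labels: "www.example.com" -> ["com", "example", "www"].
    labels = domain.lower().strip(".").split(".")
    reversed_path = "/".join(reversed(labels))
    # The fixed suffix ("results" or "log") keeps a page from sharing a
    # title with the pages of a sub-domain.
    return f"Web2Cit/monitor/{reversed_path}/{page}"


print(monitor_page_title("www.example.com"))
print(monitor_page_title("www.example.com", "log"))
```

With the fixed suffix, the results of a hypothetical log.www.example.com would live at Web2Cit/monitor/com/example/www/log/results, which no longer shares a title with the parent's log page.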

These pages are meant to be updated only when the test results for a domain change. That is, they are not necessarily updated every time a check is run. Because of this, users who want to be notified when the test results for a given domain change can watch (subscribe to) these pages.

See draft example here.

These result pages may be categorized with either Web2Cit passing tests or Web2Cit failing tests categories.
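The update rule and the categorization can be sketched as follows. This is an illustration only: the mapping of target paths to a pass/fail flag is an assumed data shape, and the function names are hypothetical. Only the two category names come from this page.

```python
from typing import Mapping, Optional

PASSING_CATEGORY = "Web2Cit passing tests"
FAILING_CATEGORY = "Web2Cit failing tests"


def results_changed(previous: Optional[Mapping[str, bool]],
                    current: Mapping[str, bool]) -> bool:
    """Return True when the results page would need a new revision."""
    # No previous results means the page does not exist yet.
    return previous is None or dict(previous) != dict(current)


def results_category(current: Mapping[str, bool]) -> str:
    """Pick the category for a results page from per-path pass/fail flags."""
    # Assumption: a domain with no failing path counts as passing.
    return PASSING_CATEGORY if all(current.values()) else FAILING_CATEGORY


last = {"/article/1": True, "/article/2": True}
now = {"/article/1": True, "/article/2": False}
print(results_changed(last, now), results_category(now))
```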

Domain checks log page

At meta:Web2Cit/monitor/com/example/www/log.

A list of checks run for a given domain. Every time a check is run, a new row is added at the top.

See draft example here.

Information concerning the version of the target paths (such as checksums of their corresponding HTML and Citoid responses) may be useful. However, we would need one row per target path for this (i.e., multiple rows per check). We cannot save this to the results page, because that page should only change if test results change. We may consider this when saving to a custom database (instead of Meta; see below).

Issues

Please report any issues to this page's discussion page, or to Phabricator using the w2c-monitor project tag.

Development

Where the source code is, setting up a local development environment, running tests, deploying to Toolforge, etc...

Installation

You need `pip` installed on your system.

pip install -r requirements.txt

How it works

The Web2Cit monitor is implemented in Python. It reads the domain configurations that Web2Cit needs in order to work as a Citoid complement, interprets the tests run against them, and writes the results.

Installation process

Installing

To install the Web2Cit monitor, clone the repository https://gitlab.wikimedia.org/superzerocool/w2c-monitor and install the dependencies with the command:

pip install -r requirements.txt

It supports virtual environments and can be run as described in the Toolforge Python job configuration. Note that, for the time being, results can only be written to local logs and to Meta; obtaining the parsed JSON from the endpoint responses is not yet available. Such an option could be provided by web2cit-server acting as an interpretation gateway.

Database

To create the database that stores the work queue, you only need to run the copy command, because a small database with the model already in place is distributed with the repository.

cp ./db/monitor.sqlite.dist ./db/monitor.sqlite

With this, the work queue can now be generated.
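The same step can be done from Python, which makes it easy to avoid overwriting an existing queue. This is a sketch under assumptions: the helper name init_queue_db is hypothetical, and the tables it reports depend entirely on what the distributed file contains.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import List


def init_queue_db(dist_path: str = "./db/monitor.sqlite.dist",
                  db_path: str = "./db/monitor.sqlite") -> List[str]:
    """Copy the distributed database if needed and list its tables."""
    target = Path(db_path)
    # Only copy when no database exists yet, so pending jobs are not lost.
    if not target.exists():
        shutil.copyfile(dist_path, target)
    conn = sqlite3.connect(str(target))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling init_queue_db() from the repository root would then play the role of the cp command and return the table names found in the copied file.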

Write Credentials in Meta

To generate write credentials in Meta, you must create a user account and then request an OAuth v2 token, so that the monitor can connect to the wiki as a user with write access.

The permissions that must be requested are:

Perform high-volume activity
Interact with pages
Perform administrative actions (this would be used to revert changes not made by the bot)

Once you have the authentication tokens, store them in the user-config.py file in the following form:

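# Default site: Meta-Wiki (family and site code).
mylang = 'meta'
family = 'meta'
# Account the bot logs in as on Meta-Wiki.
usernames['meta']['meta'] = 'MY BOT NAME'

# OAuth credentials, in this order: consumer token, consumer secret, access token, access secret.
authenticate['meta.wikimedia.org'] = ('consumer_token', 'consumer_secret', 'access_token', 'access_secret')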

(More information on this process can be found at https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth)

Reading from the Web2Cit API

The monitor reads its input by querying the Web2Cit API, either for specific domains or for all the domains published on Meta under meta:Web2Cit/data, where the patterns.json, templates.json, and tests.json files can be found.

The configuration files are read under the prefix where the configuration and test JSON files live. During this step, a domain is only counted among those configured to work with Web2Cit if its templates.json or tests.json file exists.
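The existence check can be sketched as a filter over page titles. The title layout used here (reversed domain labels between the prefix and the file name, by analogy with the monitor pages) is an assumption, as are the function name configured_domains and the exact spelling of the prefix.

```python
from typing import Iterable, List

DATA_PREFIX = "Web2Cit/data/"  # assumed prefix; adjust to the real one
REQUIRED_FILES = {"templates.json", "tests.json"}


def configured_domains(titles: Iterable[str],
                       prefix: str = DATA_PREFIX) -> List[str]:
    """Return the domains that have a templates.json or tests.json page."""
    domains = set()
    for title in titles:
        if not title.startswith(prefix):
            continue
        # Split "com/example/www/tests.json" into path and file name.
        path, _, filename = title[len(prefix):].rpartition("/")
        if path and filename in REQUIRED_FILES:
            # "com/example/www" -> "www.example.com"
            domains.add(".".join(reversed(path.split("/"))))
    return sorted(domains)


pages = [
    "Web2Cit/data/com/example/www/templates.json",
    "Web2Cit/data/com/example/www/patterns.json",
    "Web2Cit/data/org/example/patterns.json",
]
print(configured_domains(pages))
```

In this sketch a domain that only has a patterns.json page is skipped, which mirrors the rule described above.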

The API checks are performed for each domain. The server returns the results as JSON containing the evaluation of each path listed in the tests or templates files, just as the server's web interface does.

Writing to Meta

Results are written directly to Meta-Wiki, which stores the evaluations of all checked domains under a single prefix. They are stored on three kinds of pages:

A general results page, which compiles all the domains that have been checked at least once by the monitor,
A results page per domain, summarizing the evaluation of each path and the components of the paths, along with the score obtained by each evaluation, and
A page that summarizes the results of each evaluation (score per domain) and links to the evaluation made by the monitor at the time of the check (that is, a history of evaluations).

Writing to Meta can be complemented with local logs, which are also what a local demonstration or debug mode uses in order to avoid writing to Meta.

Change Monitor

The check for changes and for new domains runs continuously: a process (monitor.py) is called every 20 minutes. This review identifies:

new domains that can be checked ("first run" trigger);
the domains whose configuration file was modified in this period of time ("changed configuration" trigger); and
domains that have not been checked in a period of time ("programmed" trigger).
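The three triggers can be sketched as a single decision function. This is an illustration under assumptions: the function name pick_trigger and the returned strings are hypothetical, and the two time windows are the defaults mentioned under Functional commands (1 hour and 30 days).

```python
from datetime import datetime, timedelta
from typing import Optional

CHANGE_WINDOW = timedelta(hours=1)   # look-back for configuration edits
RECHECK_AFTER = timedelta(days=30)   # maximum age of the last check


def pick_trigger(now: datetime,
                 last_checked: Optional[datetime],
                 config_modified: Optional[datetime]) -> Optional[str]:
    """Return the trigger for a domain, or None if no check is needed."""
    if last_checked is None:
        return "first run"
    if config_modified is not None and now - config_modified <= CHANGE_WINDOW:
        return "changed configuration"
    if now - last_checked >= RECHECK_AFTER:
        return "programmed"
    return None


now = datetime(2024, 12, 17, 4, 0)
print(pick_trigger(now, None, None))
print(pick_trigger(now, now - timedelta(days=31), None))
```

The order of the tests is a design choice of this sketch: a domain that was never checked is reported as a first run even if its configuration also changed recently.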

This process adds the domains to a work queue. A separate execution process (runner.py), which runs every hour, reads the work queue and processes the checks requested in that period.

The work queue is managed with a simple SQLite database, so that a single file holds all the information about domain executions and keeps track of the pending ones.
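A work queue of this kind can be sketched with the standard sqlite3 module. The table layout, column names, and function names below are hypothetical; the schema shipped in monitor.sqlite.dist may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS queue (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    domain       TEXT NOT NULL,
    trigger_name TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending'
)
"""


def enqueue(conn: sqlite3.Connection, domain: str, trigger_name: str) -> bool:
    """Add a pending job unless the domain already has one; True if added."""
    already = conn.execute(
        "SELECT 1 FROM queue WHERE domain = ? AND status = 'pending'",
        (domain,),
    ).fetchone()
    if already:
        return False
    conn.execute(
        "INSERT INTO queue (domain, trigger_name) VALUES (?, ?)",
        (domain, trigger_name),
    )
    conn.commit()
    return True


def pending(conn: sqlite3.Connection) -> list:
    """Return (domain, trigger_name) pairs still waiting, oldest first."""
    return conn.execute(
        "SELECT domain, trigger_name FROM queue "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
enqueue(conn, "www.example.com", "first run")
print(pending(conn))
```

Skipping a domain that already has a pending job is one way to keep a monitor that runs more often than the runner from queuing the same check several times.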

Solution architecture

The recurring check problem is divided among several sub-packages in the repository, which are connected to one another through classes.

web2citwrapper: contains the logic for consuming the web2cit-server API, which allows results to be queried and imported by domain or directly by path.
monitor: contains the logic for finding and evaluating the files and domains to check, using the MediaWiki API.
writer: contains the logic for writing results to Meta, using Mako templates to simplify writing wikitext.

Functional commands

Monitor

To run the check process (the monitor), invoke the following command:

./bin/python3 monitor.py

By default, this looks for changes made to the domains' configuration files within the last hour, domains that have not been checked in the last 30 days, and domains that have never been checked.

This command only detects these cases and fills the work queue in SQLite, so that the runner command can run the checks.

Runner

To execute the writing process from the work queue, the following command must be invoked:

./bin/python3 runner.py

This command looks up all domains with pending checks whose execution time has expired and processes them from the pending work queue. Once a job has been executed, its internal state is changed so that it is not queued again.
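The runner step can be sketched as follows. This is not the code of runner.py: the queue table, its run_after column (a Unix timestamp), the status values, and the function name run_due_jobs are all assumptions made for the illustration.

```python
import sqlite3
import time
from typing import Callable, Optional

QUEUE_SCHEMA = """
CREATE TABLE IF NOT EXISTS queue (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    domain       TEXT NOT NULL,
    trigger_name TEXT NOT NULL,
    run_after    REAL NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending'
)
"""


def run_due_jobs(conn: sqlite3.Connection,
                 check: Callable[[str, str], None],
                 now: Optional[float] = None) -> int:
    """Run every pending job whose time has come; return how many ran."""
    now = time.time() if now is None else now
    due = conn.execute(
        "SELECT id, domain, trigger_name FROM queue "
        "WHERE status = 'pending' AND run_after <= ? ORDER BY id",
        (now,),
    ).fetchall()
    for job_id, domain, trigger_name in due:
        check(domain, trigger_name)
        # Mark the job as done so that it is not picked up again.
        conn.execute("UPDATE queue SET status = 'done' WHERE id = ?", (job_id,))
        conn.commit()
    return len(due)
```

Here check stands for whatever performs the domain check and writes the results; committing after each job means that a crash in the middle of a run leaves the remaining jobs pending.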

Manual execution (not recommended, expert only)

If you want to run a check manually without waiting for the runner, you can execute the command:

./bin/python3 main.py --domain <domain> --trigger <trigger>

This runs the check manually; the trigger argument records the reason why the command was executed by hand.
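A command line with these two options could be parsed as in the sketch below. Only the option names --domain and --trigger come from the command above; the parser itself, the help texts, and the trigger value "manual" are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two options of the manual command."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Run a single domain check by hand.",
    )
    parser.add_argument("--domain", required=True,
                        help="domain to check, e.g. www.example.com")
    parser.add_argument("--trigger", required=True,
                        help="reason recorded for this execution")
    return parser


# Parse an explicit argument list instead of sys.argv for the demonstration.
args = build_parser().parse_args(
    ["--domain", "www.example.com", "--trigger", "manual"]
)
print(args.domain, args.trigger)
```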

How to use in the Wikimedia wikis

Summary

The bot runs on Toolforge under the w2cmon account, with two cron jobs (Toolforge jobs). The bot account configured to make the changes is Web2cit-monitor-bot.

Following changes

To follow changes, you can use the monitor page to see new domains, or use this list to see the latest 15 changes.

All edits listed are minor edits (m) made by bots (b). The signed number in parentheses is the change in page size in bytes.

18 December 2024

06:25 Web2Cit/monitor/com/ajunews/www/results (−3) WikiCleanerBot: v2.05b - Fix CW error #16 - WCW (Unicode control characters). Tags: WPCleaner, Manual revert

17 December 2024

04:02 Web2Cit/monitor/de/sueddeutsche/www/log (+228) Web2cit-monitor-bot: Update domain check
04:02 Web2Cit/monitor/com/manga-news/www/log (+236) Web2cit-monitor-bot: Update domain check
04:02 Web2Cit/monitor/nl/gelderlander/www/log (+228) Web2cit-monitor-bot: Update domain check
04:01 Web2Cit/monitor/com/cronista/www/log (+229) Web2cit-monitor-bot: Update domain check
04:01 Web2Cit/monitor/com/cronista/www/results (−3) Web2cit-monitor-bot: Update domain check. Tag: Manual revert
04:01 Web2Cit/monitor/com/newsbank/nl/log (+220) Web2cit-monitor-bot: Update domain check
04:01 Web2Cit/monitor/ru/interfax/www/log (+228) Web2cit-monitor-bot: Update domain check
04:01 Web2Cit/monitor/ru/interfax/www/results (+1,097) Web2cit-monitor-bot: Update domain check
04:00 Web2Cit/monitor/au/org/brisbanecatholic/log (+219) Web2cit-monitor-bot: Update domain check
03:07 Web2Cit/monitor/com/cronista/www/log (+228) Web2cit-monitor-bot: Update domain check
03:06 Web2Cit/monitor/nl/gelderlander/www/log (+228) Web2cit-monitor-bot: Update domain check
03:05 Web2Cit/monitor/de/sueddeutsche/www/log (+228) Web2cit-monitor-bot: Update domain check
03:05 Web2Cit/monitor/com/newsbank/nl/log (+220) Web2cit-monitor-bot: Update domain check
03:05 Web2Cit/monitor/com/newsbank/nl/results (+629) Web2cit-monitor-bot: Update domain check