Toolhub/Progress reports/2020-09-25

Report on activities in the Toolhub project for the week ending 2020-09-25.

More research tasks resolved

Toolinfo v1.2.0-draft02 JSON schema published for review

A new v1.2.0-draft02 JSON schema is now on metawiki for review. A "final" v1.2.0 schema will probably not be released until Toolhub has a working toolinfo.json submission and storage system. Waiting for that milestone should reduce documentation churn, because putting the current draft into practice is likely to expose problems with it.

Changes from v1.2.0-draft01:

  • maxLength constraints added for all string types
  • Extracted #/definitions/url
  • Extracted #/definitions/url_multilingual_or_array
  • toolinfo_version replaced by $schema
  • toolinfo_language replaced by $language
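As a rough illustration of two of these changes, the sketch below checks string lengths against maxLength limits and follows a reference into a shared #/definitions/url entry. The schema fragment, field names other than those listed above, and all length limits are assumptions made for the example. They are not taken from the published draft.

```python
# Hypothetical schema fragment. The limits and the "name" and "url" properties
# are illustrative assumptions, not the published toolinfo schema.
SCHEMA = {
    "definitions": {
        "url": {"type": "string", "maxLength": 2047},
    },
    "properties": {
        "name": {"type": "string", "maxLength": 255},
        "url": {"$ref": "#/definitions/url"},
        "$language": {"type": "string", "maxLength": 16},
    },
}


def resolve(schema, node):
    """Follow a local '#/definitions/...' reference, if the node has one."""
    ref = node.get("$ref")
    if ref is None:
        return node
    target = schema
    for part in ref.lstrip("#/").split("/"):
        target = target[part]
    return target


def too_long(schema, record):
    """Return the names of string fields that exceed their maxLength."""
    errors = []
    for key, node in schema["properties"].items():
        rule = resolve(schema, node)
        value = record.get(key)
        limit = rule.get("maxLength")
        if isinstance(value, str) and limit is not None and len(value) > limit:
            errors.append(key)
    return errors
```

A real validator would use a full JSON Schema library. This sketch only shows why extracting a shared definition keeps the length limit for URLs in one place.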

This schema is also now being managed with the @wikimedia/jsonschema-tools library developed by the Foundation's Analytics team. This tool allows us to use a YAML file to describe the schema and produce versioned artifacts that are point-in-time copies of that master document. This should make updating and revising the schema easier in the future.
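The idea of a point-in-time copy can be sketched in a few lines. This is only a conceptual illustration, not how @wikimedia/jsonschema-tools is implemented. The "version" field name, the in-memory artifact store, and the length limits are assumptions made for the example.

```python
import copy


def materialize(current_schema, artifacts):
    """Store a frozen copy of the current schema under its version, once."""
    version = current_schema["version"]  # hypothetical field name
    if version not in artifacts:
        # Deep copy so later edits to the working document cannot alter
        # an artifact that has already been produced.
        artifacts[version] = copy.deepcopy(current_schema)
    return artifacts[version]


# Illustrative working document; the maxLength value is made up.
current = {"version": "1.2.0-draft02", "properties": {"name": {"maxLength": 255}}}
artifacts = {}
materialize(current, artifacts)

# Editing the working document afterwards leaves the stored artifact untouched.
current["properties"]["name"]["maxLength"] = 100
```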

Work in progress on Blubber and PipelineLib integration

After last week's announcement of a working development environment based on Docker and Docker Compose, Giuseppe suggested that we start working towards integration with the Wikimedia deployment pipeline tooling sooner rather than later.

Bryan spent several days this past week becoming more familiar with Blubber and PipelineLib. These tools are intended to make it easier to produce high quality Docker containers and test the code they contain. Using standard Foundation tooling will reduce friction when we get to the point of deploying the application to production.

The use of Poetry by Toolhub for Python packaging and dependency management appears to be a novel practice for Foundation projects. Bryan had some discussions with Dan Duvall on IRC about various ways that Poetry could be used with Blubber. This in turn led to some exploratory coding to see how the various approaches would work in practice.

The first experiment was using Blubber's "builder" configuration to install Poetry using Poetry's isolated installation bootstrapping script. This worked, but it was later found that Blubber's order of operations does not support copying an artifact from an earlier stage of a multi-stage build in time to use that artifact as an input to a later build stage. This issue was reported as T263597.

The second experiment was creating a custom base image with Poetry pre-installed. This was functionally lifting the builder configuration experiment's bootstrapping step up into the base image. The problem found with this approach was managing directory permissions from the Blubber integration. The user account that needs to own the $POETRY_VIRTUALENVS_PATH directory is added by Blubber. Blubber has the ability to generate chown instructions for a directory (by setting the lives.in setting), but this ability is only extended for a single directory per variant. Trying to get around this using multi-stage chaining runs back into the artifact copying ordering issues from T263597.

The third experiment was extending Blubber itself to understand how to interact with Poetry. Bryan wrote and locally tested a patch for Blubber that extends the syntax of Blubber's 'python' configuration to allow specifying a version of Poetry to install and use. With this patch and the proper configuration, Blubber will install Poetry using pip and then use Poetry rather than pip to install additional project dependencies. This allowed placing the correct instructions in each stage of Blubber's opinionated Dockerfile generation to avoid the issues encountered in the first two experiments.
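The order of operations described for the third experiment can be sketched as a short command list. This is only an illustration of the sequence. It is not the output of the Blubber patch, and the function name and the version string in the usage note are placeholders.

```python
def poetry_install_steps(poetry_version):
    """Return the install commands in order: pip first, then Poetry.

    This is a hypothetical helper for illustration; Blubber generates
    Dockerfile instructions rather than calling a function like this.
    """
    return [
        # Step 1: pip installs a pinned version of Poetry itself.
        ["pip", "install", f"poetry=={poetry_version}"],
        # Step 2: Poetry, not pip, installs the project's dependencies.
        ["poetry", "install"],
    ]
```

For example, poetry_install_steps("1.0.10") yields a pip command that pins Poetry followed by a poetry install command, which matches the pip-then-Poetry ordering described above.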

Assuming that Bryan can work with the Blubber project maintainer to refine and merge the patch, the next steps are integrating Blubber into the local development workflow and then starting work on PipelineLib integration (which requires Blubber).

Wrap up

Next week includes the end of the Foundation's FY20/21 Q1 and the start of Q2. The key result for Toolhub in the Q1 plan is currently showing 93% complete. The remaining work in the KR is closing out two mostly complete (Bryan thinks...) research tasks:

  • T261017: Determine basic hosting parameters for Toolhub
    • Waiting on feedback from Giuseppe and Reedy
  • T261023: Explore content moderation issues
    • A review from Risker would be appreciated