Toolhub/Progress reports/2021-03-26

Report on activities in the Toolhub project for the week ending 2021-03-26.

Direct tool registration

Tracked in Phabricator:
Task T195682

The UI to edit a record has been merged! This feature is now live on our demonstration server at https://toolhub-demo.wmcloud.org. Work is in progress on displaying edit history and diffs, which will complete our planned feature work for the January-March quarter.

Crawler enhancements

The crawler has been updated to handle some edge cases better. It now compares the list of toolinfo records found at a given URL in the current run against the list found at the same URL in its most recent prior run. When tools that were previously imported from the URL are no longer present in the current response, the crawler marks them as deleted in the database. This mark is a "soft delete": the record and its edit history are retained in the database, but the record is removed from the search index and excluded from most API responses.
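The comparison between runs can be sketched as a set difference on tool names. This is a minimal illustration only, not Toolhub's actual code: the `ToolRecord` class, the `soft_delete_missing` function, and the example URL are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    """Hypothetical stand-in for a stored toolinfo record."""

    name: str
    origin_url: str
    deleted: bool = False


def soft_delete_missing(previous, current_names):
    """Flag records from the prior run whose names are absent from the current run.

    Records are only flagged, never removed, so their history is kept.
    Returns the sorted names that were newly flagged.
    """
    removed = []
    for record in previous:
        if record.name not in current_names and not record.deleted:
            record.deleted = True
            removed.append(record.name)
    return sorted(removed)


# Assumed example data: the same URL listed three tools last run, two this run.
url = "https://example.org/toolinfo.json"
previous_run = [
    ToolRecord("alpha", url),
    ToolRecord("beta", url),
    ToolRecord("gamma", url),
]
current_names = {"alpha", "gamma"}

removed = soft_delete_missing(previous_run, current_names)
```

In this sketch the "beta" record stays in `previous_run` with `deleted` set, which mirrors the idea that a soft delete hides a record without discarding it.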

The crawler now also tracks which toolinfo records have been seen across URLs during a single run, so that it can detect and prevent "edit war" behavior when the same name appears at multiple URLs. In current data this has been seen primarily for a single user who has published multiple toolinfo.json collections with overlapping sets of tools. It may also appear in the future as collisions in the "name" field of records representing distinct tools.
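One way to picture the cross-URL tracking is a per-run map from tool name to the URL that first supplied it. The sketch below assumes a "first URL seen wins" policy, which the report does not state; `resolve_collisions` and the example URLs are hypothetical.

```python
def resolve_collisions(crawled):
    """Assign each tool name to the first URL that published it in this run.

    `crawled` is a list of (url, names) pairs in crawl order.
    Returns (owner, skipped), where `owner` maps name -> winning URL and
    `skipped` lists (name, losing_url, winning_url) for every collision.
    The first-seen-wins rule is an assumption made for this sketch.
    """
    owner = {}
    skipped = []
    for url, names in crawled:
        for name in names:
            first = owner.setdefault(name, url)
            if first != url:
                skipped.append((name, url, first))
    return owner, skipped


# Assumed example: two collections from one publisher share the tool "beta".
crawled = [
    ("https://example.org/a/toolinfo.json", ["alpha", "beta"]),
    ("https://example.org/b/toolinfo.json", ["beta", "gamma"]),
]
owner, skipped = resolve_collisions(crawled)
```

Collecting the skipped entries, rather than silently dropping them, gives the crawler something concrete to report when a collision occurs.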

We would still like to improve the reporting for crawl errors to make it easier for folks to debug collisions, soft deletes, and other issues that may arise during a crawler run.

Production logging

Bryan did some research and testing in an experimental patch towards adding support for Elastic Common Schema (ECS) event logging. The testing so far has used the official upstream ECS library for Python. Unfortunately, this library is low level and will need additional custom wrapper work to make it safe for use by Toolhub. For some log events, Django attaches data that violates the upstream library's assumption that only JSON-serializable data is ever present in a LogRecord object.
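The serialization problem can be illustrated with the standard library alone. The sketch below is not the upstream ECS library and not the wrapper Toolhub will use: `SafeJsonFormatter` and `FakeRequest` are hypothetical, and the field names only loosely follow ECS. It shows one possible mitigation, falling back to `repr()` for values that `json.dumps` cannot encode.

```python
import json
import logging

# Attribute names found on every LogRecord; anything else was added by the caller.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class SafeJsonFormatter(logging.Formatter):
    """Hypothetical formatter that never fails on non-serializable extras."""

    def format(self, record):
        payload = {
            "log.level": record.levelname.lower(),
            "log.logger": record.name,
            "message": record.getMessage(),
        }
        # Copy caller-supplied extras, whatever their type.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        # default=repr turns objects json cannot encode into strings.
        return json.dumps(payload, default=repr)


class FakeRequest:
    """Stand-in for a framework object attached to a log record."""


record = logging.LogRecord(
    "django.request", logging.ERROR, "app.py", 1, "Server error: %s", ("/x",), None
)
record.request = FakeRequest()  # not JSON serializable
record.status_code = 500

line = SafeJsonFormatter().format(record)
parsed = json.loads(line)
```

Without the `default=repr` fallback, `json.dumps` raises `TypeError` on the `request` value, which is the kind of failure a wrapper would need to guard against.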

Bugs and annoyances

Merged fixes:

Wrap up

The January-March quarter of planned work will end in the middle of next week. Our remaining named goal for the quarter is nearly complete. It is possible that the work will be completed before the arbitrary 2021-03-31 end date, but it may also extend into the early days of the next quarter. Either way is fine: we are pushing ourselves to move forward, but we are more concerned with doing things well than with hitting exact timelines.

Bryan and Srishti will be attempting to finalize plans for work in the April-June quarter in the coming week. In our high-level planning for the July 2020 - June 2021 fiscal year we projected deploying a "1.0" version of Toolhub by the end of June 2021. One of the questions to be examined next week is whether the more important part of that goal is the release date or the feature set. We have three more planned sets of features on the roadmap: lists of tools, annotations (community-maintained notes and details for tools), and moderation and patrolling. We must implement moderation and patrolling support before the 1.0 release. The other two features are also strongly desired, but could easily be reprioritized as 1.x follow-up features to be deployed after the initial launch.