ToolFlow is a tool that retrieves the output of other tools, aggregates, filters, and modifies these results, and automatically creates new outputs.

Screenshot of the ToolFlow tool.

Concepts

Workflow
A flowchart-like group of nodes that are connected to each other, with the output of one node being the input of another one. Once logged in, you can create new workflows, or fork existing ones. You can edit and run only your own workflows.
Node
A single step in a workflow. This can be an adapter to another tool (e.g. PetScan), or an operation on one (e.g. filter) or multiple (e.g. join) nodes. Every node creates a single output file. Output files for adapter nodes are just the output of the respective tool, represented in a standardized format.
Output mapping
Adapter nodes need to map the output of the respective tools to the standardized internal format. ToolFlow will suggest a mapping where possible, but for some tools, manual mapping may be required.
Inputs
The edges in the workflow are shown as inputs to each node. Adapter nodes do not have incoming edges, and therefore no inputs from other nodes; their input comes from the respective tools.
File
Each data file produced by a node is a JSONL file. The first line is a JSON object containing the file header, specifically the column definitions. Each subsequent line is a JSON array holding one row, with JSON values corresponding to the columns defined in the header. Files that are used by subsequent nodes are marked as temporary, and automatically cleaned up at a pre-determined time. Files that are not used again are marked as permanent, and not cleaned up unless the run is repeated manually.
Filter
A filter processes a data file and keeps only the rows matching the given condition. The output is another data file with the same header and a subset of the original rows.
Generator
A generator is a routine that converts a data file to an external output. At the moment, the only generator is a Wikipage edit function.
Schedule
A workflow can have one scheduler assigned. A scheduler consists of a run number and an interval (one day, week, or month). At each interval, the scheduler clears all files associated with the run and re-runs the workflow. If the workflow has a generator (e.g. Wikipage output), an edit will be performed if the resulting wikitext has changed. The edit will be done under your user name. A schedule can be viewed, added, or changed via the clock icon on the workflow page.
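
The data file format described under File can be sketched in Python as follows. This is only an illustration: the header layout used here (a `columns` list of objects with a `name` key) and the function name `read_datafile` are assumptions, and the actual header fields written by ToolFlow may differ.

```python
import io
import json


def read_datafile(stream):
    """Read a JSONL stream: one header object, then one JSON array per row."""
    header = json.loads(stream.readline())
    # Assumed (hypothetical) header layout: {"columns": [{"name": ...}, ...]}
    names = [col["name"] for col in header["columns"]]
    rows = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        values = json.loads(line)
        # Pair each value with its column name, by position
        rows.append(dict(zip(names, values)))
    return header, rows


# A small in-memory sample file: header line followed by two data rows
SAMPLE = "\n".join([
    json.dumps({"columns": [{"name": "page"}, {"name": "item"}]}),
    json.dumps(["Berlin", "Q64"]),
    json.dumps(["Paris", "Q90"]),
]) + "\n"

header, rows = read_datafile(io.StringIO(SAMPLE))
```

Reading line by line keeps memory use low for the header and lets each row be validated independently, which is the usual reason for choosing JSONL over a single JSON document.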

Node types

Tools

AListBuildingTool
Not quite sure what a-list-building-tool does, but it outputs wiki page/Wikidata item pairings.
PagePile
Generates a PagePile page list for a single wiki.
PetScan
Generates a PetScan page list (metadata to be implemented).
QuarryQueryLatest
The latest run for a specific Quarry query.
Sparql
Results of a SPARQL query.
WdFist
Results of WD-FIST (copy the Permalink link from the WD-FIST page after running it, and paste it into the node value).

Operations

Inner join on key
Takes two or more nodes and joins their rows into one, matching on a given key column name. Rows whose key value does not appear in all input files are removed. Similar to SQL INNER JOIN.
Join (merge by unique key)
Concatenates the output of two or more nodes, provided they all have the same header. To avoid duplicate rows, a column name is used as a key; only the first row for each key is passed to the output. Similar to sort -u.
Filter
Filters output of a single node by a condition or regular expression on a key column. Can either keep or remove matching rows.
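
The three operations above can be sketched in Python on rows held as dictionaries. This is a minimal sketch, not ToolFlow's implementation: the function names are hypothetical, the join assumes each key occurs at most once per input, and the filter assumes a regular expression that may match anywhere in the value.

```python
import re


def inner_join(key, *tables):
    """Merge rows sharing a key value; drop keys missing from any table."""
    # Index every table after the first by its key column
    indexes = [{row[key]: row for row in table} for table in tables[1:]]
    joined = []
    for row in tables[0]:
        k = row[key]
        if all(k in index for index in indexes):
            merged = dict(row)
            for index in indexes:
                merged.update(index[k])
            joined.append(merged)
    return joined


def merge_unique(key, *tables):
    """Concatenate tables, keeping only the first row seen for each key."""
    seen = set()
    merged = []
    for table in tables:
        for row in table:
            if row[key] not in seen:
                seen.add(row[key])
                merged.append(row)
    return merged


def filter_rows(table, key, pattern, keep=True):
    """Keep (or remove, if keep=False) rows whose key column matches."""
    regex = re.compile(pattern)
    return [row for row in table
            if bool(regex.search(str(row[key]))) == keep]


# Illustrative data only
pages = [{"item": "Q64", "page": "Berlin"}, {"item": "Q90", "page": "Paris"}]
labels = [{"item": "Q64", "label": "Berlin (city)"},
          {"item": "Q84", "label": "London"}]
more_pages = [{"item": "Q90", "page": "Paris, France"},
              {"item": "Q84", "page": "London"}]

joined = inner_join("item", pages, labels)
merged = merge_unique("item", pages, more_pages)
kept = filter_rows(pages, "page", "^B")
removed = filter_rows(pages, "page", "^B", keep=False)
```

Here `joined` contains only the Q64 row, because Q90 and Q84 are each missing from one input, and `merged` keeps the first Q90 row ("Paris") while discarding the later duplicate.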

Generators

Wikipage
Takes a data file and generates a wikitext table from it. The table is written to the given wiki/page. The edit will be done under your user name. If no generator edit has taken place on that page yet, the wikitext will be appended; otherwise the existing ToolFlow wikitext will be replaced.
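
The append-or-replace behaviour can be illustrated with a short Python sketch. The table uses standard MediaWiki table syntax, but the comment markers that delimit the generated section are hypothetical; how ToolFlow actually identifies its own wikitext on a page is not described here. Cell values are inserted without escaping, so values containing `|` would need extra handling.

```python
# Hypothetical markers delimiting the generated section (an assumption)
START = "<!-- toolflow-start -->"
END = "<!-- toolflow-end -->"


def to_wikitable(names, rows):
    """Render column names and rows as a MediaWiki table."""
    lines = ['{| class="wikitable"']
    lines.append("! " + " !! ".join(names))
    for row in rows:
        lines.append("|-")
        lines.append("| " + " || ".join(str(value) for value in row))
    lines.append("|}")
    return "\n".join(lines)


def update_page(page_text, table_text):
    """Replace an existing generated section, or append one if none exists."""
    block = f"{START}\n{table_text}\n{END}"
    begin = page_text.find(START)
    finish = page_text.find(END)
    if begin != -1 and finish > begin:
        # A previous generated section exists: swap it out in place
        return page_text[:begin] + block + page_text[finish + len(END):]
    separator = "\n" if page_text and not page_text.endswith("\n") else ""
    return page_text + separator + block


table = to_wikitable(["page", "item"], [["Berlin", "Q64"]])
first = update_page("Intro text", table)
second = update_page(first, to_wikitable(["page", "item"], [["Paris", "Q90"]]))
```

The first call appends the table after the existing page text; the second call replaces the earlier table, leaving the surrounding text untouched.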

Code

The web UI and API are at https://github.com/magnusmanske/toolflow (HTML/JS/CSS and PHP, respectively), and the background service that does the actual processing is at https://github.com/magnusmanske/toolflow_rs/. Feel free to submit issues and suggest new tools or improvements in the respective issue tracker.