Gulp
- This is a proposal based on T231891
Generic Unified List Processor (GULP)
This aims to define the capabilities of a to-be-created API to handle generic lists of pages/items.
Definitions
- list: a set of revisions for a site
- revision: a set of page entries
- entry: a title, a namespace, and optional metadata
- site: a wiki
Essentials
- Create a new list
- Add/remove page entries to/from the current revision of a list. This would not create a new revision automatically!
- Create a snapshot, i.e., freeze the current revision of a list
- Retrieve a (current or snapshotted) revision of a list
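The four essential operations could be sketched in memory as follows. This is only an illustration, not a proposed implementation: the class name `GulpList`, its method names, and the choice to keep the working revision editable after a snapshot are all assumptions made for the example.

```python
class GulpList:
    """Hypothetical in-memory list; a real service would persist its revisions."""

    def __init__(self, site):
        self.site = site
        self._current = set()   # editable current revision: {(title, namespace)}
        self._snapshots = []    # frozen revisions, oldest first

    def add(self, title, namespace=0):
        # Edits the current revision in place; no new revision is created.
        self._current.add((title, namespace))

    def remove(self, title, namespace=0):
        self._current.discard((title, namespace))

    def snapshot(self):
        # Freeze a copy of the current revision and return its index.
        self._snapshots.append(frozenset(self._current))
        return len(self._snapshots) - 1

    def get(self, snapshot_index=None):
        # With no index, return the current revision.
        if snapshot_index is None:
            return frozenset(self._current)
        return self._snapshots[snapshot_index]


pages = GulpList("enwiki")
pages.add("Berlin")
pages.add("Paris")
snap = pages.snapshot()
pages.remove("Paris")  # changes the current revision only, not the snapshot
```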
Minimum viable product
- Delete a revision or list (?)
- Import from various sources
  - All sources offered in PagePile
  - Wiki pages
- Export to various places
  - All consumers offered in PagePile
  - Listeria (V2?)
  - Wiki pages
- Combine lists (subset, union, diff, etc.)
- Filter lists
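Combining and filtering map naturally onto set operations if a revision is treated as a set of (title, namespace) pairs. The sketch below assumes that representation; the function names are hypothetical, and reading "subset" as the intersection of two lists is an assumption, not something this proposal defines.

```python
import operator

# Mode names follow the wording above; "subset" is read as intersection (assumption).
OPERATIONS = {
    "union": operator.or_,
    "subset": operator.and_,
    "diff": operator.sub,
}


def combine(a, b, mode):
    """Combine two revisions given as sets of (title, namespace) tuples."""
    if mode not in OPERATIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    return OPERATIONS[mode](a, b)


def filter_entries(entries, predicate):
    """Keep only the entries for which predicate(entry) is true."""
    return {entry for entry in entries if predicate(entry)}


a = {("Berlin", 0), ("Paris", 0), ("Talk:Paris", 1)}
b = {("Paris", 0), ("Rome", 0)}
articles_only = filter_entries(a, lambda entry: entry[1] == 0)
```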
Nice-to-have
- list and/or revision with optional expiration date (automatically updated lists)
  - possibly "temporary" lists with ~1h expiration date?
- maximum number of revisions (delete oldest one if capacity is reached)
- visual interface for pipelines (generate/filter/combine/output) based on lists
- Diff functionality between revisions
  - For the interface (what has changed?)
  - For storage (less space required)
- Metadata per entry (store/update/remove/query)
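The diff functionality listed above can serve both purposes with one structure. As a minimal sketch, assuming revisions are sets of (title, namespace) pairs and using hypothetical function names, a diff records added and removed entries; the interface can display it, and storage can rebuild a revision from its predecessor plus the diff.

```python
def diff_revisions(old, new):
    """Return the entries added and removed when going from old to new."""
    return {"added": new - old, "removed": old - new}


def apply_diff(old, diff):
    """Rebuild the newer revision from the older one plus a stored diff."""
    return (old - diff["removed"]) | diff["added"]


rev1 = {("Berlin", 0), ("Paris", 0)}
rev2 = {("Berlin", 0), ("Rome", 0)}
changes = diff_revisions(rev1, rev2)
```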
Extended version
What if it's not just "page lists", but any tables (general, or of one of several pre-defined types)?
- One table type would be "page title/page namespace", giving us the above lists.
- Others could be, say, Mix'n'match catalogs ("external ID/url/name/description/instance of")
Technical notes
Storage options for revisions
- SQLite files on disk, à la PagePile (or use PagePile transparently in the background). Works for large lists, especially for combination/filtering of lists.
- Commons Data: namespace (aka ".tab files"). Size limited.
- (MySQL) database. Might eventually outgrow capacity.
- Disk-backed object store
Any combination of the above could be used transparently, depending on the list (e.g., SQLite for large lists, MySQL for small ones). The storage backend could even change between revisions of the same list.
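Transparent backend selection could be as simple as a size check made whenever a revision is written. The threshold value and the backend labels below are placeholders chosen for illustration; the proposal does not specify them.

```python
# Hypothetical cut-off; a real value would come from measurement.
SQLITE_THRESHOLD = 100_000


def choose_backend(entry_count, threshold=SQLITE_THRESHOLD):
    """Pick a storage backend for one revision based on its size."""
    return "sqlite" if entry_count >= threshold else "mysql"


# The same list can change backend from one revision to the next.
revision_sizes = [500, 250_000, 800]
backends = [choose_backend(size) for size in revision_sizes]
```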
Identifiers
- One ID (number) for lists, another for revisions (like MediaWiki)
- One ID (number) for both. Using the list ID when asking for a revision would automatically use the latest revision. Simple but might confuse users.
- Combined ID (string), e.g. "123.456" or "123/456" (the latter could be useful in URLs; a missing revision part would automatically select the latest revision)
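For the combined-ID option, parsing could look like the sketch below. It accepts both separators mentioned above and returns `None` for the revision when it is omitted, leaving it to the caller to resolve that to the latest revision. The function name and the return shape are assumptions.

```python
import re

# List ID, optionally followed by "." or "/" and a revision ID.
_COMBINED_ID = re.compile(r"([0-9]+)(?:[./]([0-9]+))?")


def parse_combined_id(text):
    """Split "123/456", "123.456", or "123" into (list_id, revision_id)."""
    match = _COMBINED_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid list identifier: {text!r}")
    list_id = int(match.group(1))
    # A missing revision part means "use the latest revision".
    revision_id = int(match.group(2)) if match.group(2) else None
    return list_id, revision_id
```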
Data structures
- List
  - ID
  - Name
  - Description?
  - Site
  - User who created it (special privileges?)
  - Creation timestamp
  - Last update timestamp
  - Last revision
  - Optional: Maximum number of revisions (more => oldest gets deleted)
  - Optional: Maximum age of revisions (old ones get deleted)
  - Optional: Maximum age of list (auto-destruct for temporary lists)
  - Optional: Source
- Revision
  - ID
  - Date of creation
  - Entries
  - Is current or snapshot?
  - Previous Revision ID
- Entry
  - Page name
  - Page namespace
  - Metadata (possibly JSON object for flexibility)
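The structures above translate fairly directly into typed records. The sketch below is one possible reading: field names and types are assumptions, the optional limits default to `None`, and per-entry metadata is a plain dictionary so that it serializes to a JSON object.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Entry:
    page_name: str
    page_namespace: int
    metadata: dict = field(default_factory=dict)  # free-form, JSON-compatible


@dataclass
class Revision:
    id: int
    created: datetime
    entries: list = field(default_factory=list)   # list of Entry
    is_snapshot: bool = False                     # False means current revision
    previous_revision_id: Optional[int] = None


@dataclass
class PageList:
    id: int
    name: str
    site: str
    creator: str
    created: datetime
    updated: datetime
    last_revision_id: Optional[int] = None
    description: Optional[str] = None
    max_revisions: Optional[int] = None           # more => oldest gets deleted
    max_revision_age: Optional[timedelta] = None
    max_list_age: Optional[timedelta] = None      # for temporary lists
    source: Optional[str] = None


entry = Entry("Berlin", 0, {"wikidata": "Q64"})
entry_json = json.dumps(asdict(entry), sort_keys=True)
```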