User:EpochFail/Journal/Annotations system

  • This document summarizes requirements for an object/event annotation system for MediaWiki to support analytics (and logging).

Use cases

  • Track the agents/tools that make revisions
    • Bots
    • Tools
      • In wiki: Twinkle, Popups, etc.
      • Extra-wiki: Huggle, AWB
  • Track operations made by editors using an experimental interface in MediaWiki
    • AFTv5
      • Experiment - Link - Form

Linking annotations with wiki-objects

Annotations should be general enough to store arbitrary data about in-wiki objects as well as abstract events (e.g. logging in) that have no corresponding row in a table.

Lessons from NoSQL (e.g. MongoDB)

MongoDB has an interesting feature called a Database Reference, or DBRef (see the manual entry). A database reference stores, in a single field, the equivalent of a foreign key to a document (table row) in any collection, and drivers that support it can look the referenced document up automatically.
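As a rough illustration of the idea, the sketch below models a DBRef-style reference in plain Python, with in-memory dictionaries standing in for collections. The helpers `make_ref` and `resolve` and the sample data are hypothetical; the `$ref`/`$id` field names follow MongoDB's DBRef convention, but nothing here calls a MongoDB driver.

```python
# In-memory stand-ins for collections (tables), keyed by document id.
# The contents are made-up sample data.
collections = {
    "revision": {101: {"rev_id": 101, "page": "Main Page"}},
    "user": {7: {"user_id": 7, "name": "ExampleUser"}},
}


def make_ref(collection, doc_id):
    """Build a DBRef-style reference: the collection name plus the document id."""
    return {"$ref": collection, "$id": doc_id}


def resolve(ref):
    """Look up the referenced document; return None if it does not exist."""
    return collections.get(ref["$ref"], {}).get(ref["$id"])


# One annotation structure can point at a row in any collection.
annotations = [
    {"target": make_ref("revision", 101), "type": "tool", "value": "huggle"},
    {"target": make_ref("user", 7), "type": "experiment", "value": "AFTv5"},
]

for annotation in annotations:
    print(annotation["type"], "->", resolve(annotation["target"]))
```

The appeal for an annotation system is that a single `target` field can address revisions, users, or any other object type without one annotation table per object type.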

Strategies

Key-value annotations

See example table creation for revision annotations:

CREATE TABLE revision_annotation (
    rev_id INT UNSIGNED,
    type VARBINARY(255),
    value ???,
    KEY(rev_id, type),
    KEY(type)
)

The type of value is left as ??? because no available datatype would store every kind of potential value efficiently. Candidates:

  • VARBINARY(255): Limited to 255 bytes. Relatively efficient since size is variable. Inefficient for numbers. The size limit could encourage economical use, but could also lead to bugs caused by data truncation.
  • MEDIUMBLOB/LONGBLOB: Virtually unlimited for reasonable amounts of data (up to about 16 MB and 4 GB respectively). Relatively efficient since size is variable. Inefficient for numbers.
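The trade-offs above can be made concrete with a small sketch. It assumes numeric values are stored as text encoded to bytes (one plausible encoding, not something specified here) and uses a hypothetical helper, `truncate_to_column`, to mimic a size-limited column that silently drops excess bytes.

```python
import struct

MAX_VARBINARY = 255  # byte limit of a VARBINARY(255) column


def truncate_to_column(data, limit=MAX_VARBINARY):
    """Mimic a size-limited column that silently drops excess bytes."""
    return data[:limit]


# A number stored as text costs one byte per digit...
as_text = str(1234567890).encode("ascii")
# ...while a native unsigned 32-bit integer always takes 4 bytes.
as_uint32 = struct.pack("<I", 1234567890)

# A value longer than the limit loses its tail.
long_value = b"x" * 300
stored = truncate_to_column(long_value)

print(len(as_text), len(as_uint32), len(stored))  # prints "10 4 255"
```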

Preferably, annotation values will have a standardized data structure (e.g. JSON) to allow for relatively complex/related data to be stored in the annotation itself.
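A minimal sketch of JSON-encoded annotation values follows. It uses Python's built-in `sqlite3` module as a stand-in for MySQL and stores `value` as a BLOB; the helpers `annotate` and `get_annotations`, the index name, and the sample payloads are all assumptions made for illustration.

```python
import json
import sqlite3

# SQLite stands in for MySQL here; column types are simplified accordingly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision_annotation (rev_id INTEGER, type TEXT, value BLOB)"
)
conn.execute("CREATE INDEX rev_type ON revision_annotation (rev_id, type)")


def annotate(rev_id, type_, value):
    """Serialize any JSON-compatible value and store it as bytes."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    conn.execute(
        "INSERT INTO revision_annotation (rev_id, type, value) VALUES (?, ?, ?)",
        (rev_id, type_, payload),
    )


def get_annotations(rev_id, type_):
    """Return the decoded values of all annotations matching rev_id and type."""
    rows = conn.execute(
        "SELECT value FROM revision_annotation WHERE rev_id = ? AND type = ?",
        (rev_id, type_),
    )
    return [json.loads(row[0].decode("utf-8")) for row in rows]


# Hypothetical payloads: related fields travel together in one value.
annotate(101, "tool", {"name": "huggle", "kind": "extra-wiki"})
annotate(101, "experiment", {"name": "AFTv5", "step": "form"})

print(get_annotations(101, "tool"))
```

Storing JSON keeps the table schema fixed while letting each annotation type define its own fields, at the cost that the database cannot index or validate anything inside `value`.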