Toolhub/Data model

The goal of Toolhub is to make it easier for Wikimedians to find tools to use in their work. This data model describes what pieces of information the Toolhub collects and organizes to assist with that goal.

Summary

The table below lists the pieces of information that can be used to describe a tool in Toolhub. It omits some metadata fields that don't aid tool discovery and generally glosses over implementation details. The "Technical details" section below describes how this works in greater detail.

Name | Description | Note

Basic information
Name (Required) | Unique identifier | The name needs to be unique across all tools. Tool developers are encouraged to use namespaces/prefixes to reduce the risk of clashes. For example, all Toolforge tool records published via toolsadmin.wikimedia.org use the prefix "toolforge-".
Title (Required) | User friendly name |
Description (Required) | Tool description | This should be around 3-5 sentences describing the tool, what it's used for, etc.
URL (Required) | Link to where the tool is found |
URL alternates | Alternate links to the tool, e.g. for differing tool translations |
Author | Name of the tool developer |
Official maintainer [1] | User(s) with the ability to communicate official updates about the tool | Subject to a verification process
Sponsor | Organization that sponsored this tool's development | Example: Wikimedia Sverige has sponsored the development of several tools.
Subtitle | Brief pitch for tool. Longer than title, briefer than description | Longer than a title but shorter than a description. Currently has a character limit of 250 characters.
Bot username | If the tool is a bot, the username of the bot |
Tool type | Which type of tool? Web app, bot, gadget? | Choice of: web app, desktop app, bot, gadget, user script, lua module, template, command line tool, coding framework, other
Icon | Small icon that represents the tool | A link to a Wikimedia Commons file description page for an icon that depicts the tool.
Wikidata item [1] | The Wikidata item ID for the tool | Example: Q4063270 for AutoWikiBrowser.
OpenHub ID | The OpenHub ID for the tool | Given a project URL "https://openhub.net/p/foo", the OpenHub ID is foo.

When and where to use the tool
For wikis... | Which wikis should this be used on? | Use hostnames. Examples: * (all wikis), fr.wikipedia.org (French Wikipedia), *.wikisource.org (all Wikisources), www.wikidata.org (Wikidata)
Use cases [1] | Different tasks that can be performed with this tool | See "Tool use cases"
Audiences [1] | Applicable audiences of people | Must be one of the standard Wikimedia Resource Center audiences; see "Audiences".
Related topics [1] | Concepts that are related to this tool | Use Wikidata items
Collections [1] | This tool is a part of the following collections | Tools can be organized into arbitrary groupings for ease of reference
Supported languages | Languages the tool interface supports | ISO 639 language strings like "zh" and "scn". Use "*" if a tool is effectively available in all languages. If not defined, it is assumed the tool is only available in English.
Feedback URL | Where people should go to leave feedback |
Broken [1] | Yes/no flag for community members to indicate whether a tool is broken |
Experimental | Yes/no flag to mark that the tool is unstable and can change at any time |
Deprecated | Yes/no flag to mark that use of the tool is officially discouraged | This allows tools to be indexed while also helping to filter them out of search results
Replaced by | If a tool is deprecated, a link to the replacement tool |

Tool use guidance
Documentation URL [2] | Link to documentation | Can include official and user-generated docs
Screenshots [1] | Commons filename of screenshot | Still-image preview of the tool
Video [1] | Commons filename for video | Video related to the tool, including an introduction or a tutorial
Privacy policy URL | Link to applicable privacy policy |
Additional information [1] | Supplementary descriptive text |

For developers
License | Description of software license | Use SPDX identifiers
Repository | A link to the repository where the tool code is hosted. | Often this will be a git repository, but it could also be any other version control system or even a wiki page in the case of a user script, gadget, template, or lua module.
Technologies used | Technological concepts used in implementing the tool | Includes programming languages, data standards, etc.
API URL | Link to the tool's API if it exists |
Developer documentation URL | Developer documentation URL |
Bug tracker URL | Link to bug tracker on Phabricator, Trello, GitHub, etc. |

Volunteering
Volunteer (user assistance) [1] | These users offer to help people use the tool |
Code maintainer [1] | These users offer to help with maintaining code |
Testers [1] | People who have signed up to test new versions of tools |
Translate URL | Link to translation workflow |

  1. Annotation. Not included in the toolinfo.json schema.
  2. Documentation URL will also be available as an annotation to allow tracking of user-written documentation for tools.
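
The "For wikis" examples in the table use hostnames with asterisks as wildcards. As a rough illustration of how such values could be compared against a concrete wiki hostname, here is a minimal Python sketch. The function name `tool_applies_to` is hypothetical, and shell-style matching via `fnmatch` is an assumption of this sketch, not a description of how Toolhub itself evaluates the field.

```python
from fnmatch import fnmatchcase


def tool_applies_to(for_wikis, hostname):
    """Return True if `hostname` is covered by a "For wikis" value.

    `for_wikis` may be a single string or a list of strings.
    Assumption: "*" behaves like a shell-style wildcard.
    """
    patterns = [for_wikis] if isinstance(for_wikis, str) else list(for_wikis)
    host = hostname.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


# "*" covers every wiki.
print(tool_applies_to("*", "fr.wikipedia.org"))  # True
# "*.wikisource.org" covers every Wikisource language edition.
print(tool_applies_to("*.wikisource.org", "en.wikisource.org"))  # True
# An explicit list only covers the hostnames it names.
print(tool_applies_to(["fr.wikipedia.org", "www.wikidata.org"], "de.wikipedia.org"))  # False
```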

Taxonomy and controlled vocabulary

A taxonomy is a hierarchy of concepts. In the Toolhub taxonomy, the concepts represent attributes of tools. The Toolhub taxonomy seeks to enable filtering and browsing of tools by adding fields to the toolinfo.json schema and defining a controlled vocabulary for each of those attributes. A controlled vocabulary limits the values in a field to a predetermined (or controlled) set of options, to help ensure that tools are described consistently and that similar tools can be discovered together.

The Toolhub taxonomy is limited to tool attributes that require human curation. For example, an attribute like "coding language" may be an important tool attribute, but the list of coding languages that exist doesn't require human curation, so it isn't part of the proposed taxonomy. In contrast, an attribute like "use case" can have many different values which may overlap or conflict with one another. This is the type of attribute where a controlled vocabulary is useful.
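
To make the idea of a controlled vocabulary concrete, the following Python sketch checks a set of proposed values against a fixed list. The vocabulary used is the "Audiences" list from the v2 taxonomy in this document; the helper name `invalid_terms` is hypothetical and this is only an illustration, not Toolhub's implementation.

```python
# Controlled vocabulary: the v2 "Audiences" values from this document.
AUDIENCES = frozenset({
    "Admins",
    "Organizers and program coordinators",
    "Editors and content contributors",
    "Readers and content consumers",
    "Researchers",
    "Developers",
})


def invalid_terms(values, vocabulary):
    """Return the values that are not part of the vocabulary, in input order."""
    return [value for value in values if value not in vocabulary]


print(invalid_terms(["Developers", "Researchers"], AUDIENCES))  # []
print(invalid_terms(["Developers", "Power users"], AUDIENCES))  # ['Power users']
```

Because only listed terms pass the check, two people describing similar tools end up using the same labels, which is what makes filtering by those labels reliable.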

Taxonomy v2

During the Wikimedia Foundation's 2022-2023 fiscal year (July 2022-June 2023) the Toolhub team seeks to identify which additional tool attributes would be most useful to expand the data model and facilitate tool discovery. To identify a set of attributes and controlled vocabulary, User:TBurmeister_(WMF) completed a taxonomy research project and the team gathered community feedback that resulted in the following set of attributes.

Audiences

Who is the intended user of the tool?

Values:

  • Admins
  • Organizers and program coordinators
  • Editors and content contributors
  • Readers and content consumers
  • Researchers
  • Developers

Content types

With what type of content or data does the tool interact?

Values:

  • Articles
  • Audio
  • Books
  • Data
    • Bibliographic data
    • Categories or labels
    • Diffs and revision data
    • Event data
    • Geographic data
    • Linguistic data
    • Page metadata
    • Structured data
    • User data
  • Discussions
  • Drafts
  • Emails
  • Images
  • Links
  • Lists
  • Logs
  • Maps
  • References
  • Software or code
  • Templates
  • Videos
  • Watchlist
  • Webpages
  • Wikitext
Feedback received and changes implemented to this attribute

Feedback received:

  • "It would be good to have higher-kinded categories… I want to see only data tools that work with entity data only in Wikidata(not Discussion, Images, Files etc)" [1]

  • "Drafts seems extremely English Wikipedia-specific. Images, audio/sound, video and books overlap with files - it's not clear whether selecting only "files" would include all tools for working on Commons files or if I would need to include the other four categories as well. What's the difference between audio and sound anyway?" [2]
  • "Content types: looks like a thorough list; I wonder if this could be made a bit hierarchical in the future so that there's only 2-5 top-level data categories?"[3]

Changes implemented:

  • Add additional level of hierarchy to group content types and enable both broad or specific values to be applied.
  • Remove "Files".
  • Split "Maps" and "Geographic Data"
  • Split "Books" and "Bibliographic Data"
  • Rename "Audio or sound files" to "Audio"
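
One way to picture the added level of hierarchy is as a mapping from a broad value to its more specific children. The Python sketch below uses a small subset of the content types listed above. The rule that selecting a broad value also matches tools tagged with its children is an assumption made for illustration, and the names `CONTENT_TYPES`, `expand`, and `matches` are hypothetical.

```python
# Subset of the content type hierarchy from this document:
# each top-level value maps to its more specific child values.
CONTENT_TYPES = {
    "Articles": [],
    "Data": ["Bibliographic data", "Geographic data", "Structured data"],
    "Maps": [],
}


def expand(value):
    """Return the value together with its child values, if it has any."""
    return {value, *CONTENT_TYPES.get(value, [])}


def matches(tool_values, selected):
    """True if a tool tagged with `tool_values` matches the selected filter value.

    Assumption: a broad filter value also matches its children,
    but a specific filter value does not match its parent.
    """
    return bool(expand(selected) & set(tool_values))


print(matches(["Geographic data"], "Data"))  # True
print(matches(["Maps"], "Data"))             # False
print(matches(["Data"], "Geographic data"))  # False
```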

Tasks

What type of task does the tool help with? This is a more precise concept than "use case", which was proposed in the v1 taxonomy and is included in the data model as an annotation. This list of tasks was created as part of the taxonomy research and design process, which sought to map the large, uncontrolled list of tasks represented in various tool categorizations to the following more concise list of values:

Values:

  • Analysis
  • Annotating and linking
  • Archiving and cleanup
  • Categorizing and tagging
  • Citing and referencing
  • Communication and supporting users
  • Converting and formatting content
  • Creating content
  • Deleting and reverting
  • Disambiguation
  • Downloading or reusing content
  • Editing or updating content
  • Event and contest planning
  • Hosting and maintaining tools
  • Identifying policy violations
  • Identifying spam
  • Identifying vandalism
  • Listing and ranking
  • Merging content
  • Migrating content
  • Patrolling recent changes
  • Project management and reporting
  • Reading
  • Recommending content
  • Translating and localizing
  • Uploading or importing
  • User management
  • Warning users
Feedback received and changes implemented to this attribute

Initial questions:

  • Is "fixing" content more like Editing or more like Creating / Generating new content? Or do people generally consider it to be more like cleanup, closer to tasks like archiving unused pages or cleaning up sandboxes?
    • Similar questions about what is covered by "Patrolling" – too broad?
  • How do you feel about the number of values and what they capture? Is it too overwhelming? Should we try to make them even broader groupings? For reference: here is the even bigger list of terms that was used to generate this controlled vocabulary.

Feedback received:

  • "Tasks: love this list. "Patrolling" is probably too broad, as you say. "Communication and supporting users" seems broad as well; that could include tasks related to education, to building community, etc." [4]
  • "In the tasks there is a division which says Creating or uploading content IMHO these are two separate tasks supporting different projects creating content refers to article editing. While uploading is related to media files and may overlap with Converting and Formatting assuming its about files types and not page clean up."[5]
  • "I think adding and/or updating content is not well covered by the other tasks categories and could be a useful addition...I think it will be better to remove the "adding" part, because it can be considered a particular case within "updating" [...] I believe that "editing" mainly involves content introduced by the user with total or great freedom, while "updating" involves a fully or almost fully automatic change proposal with which the user only has to interact minimally."[6]

Changes to be implemented:

  • Revise the Tasks attribute values:
    • Remove "Creating or uploading content"
    • Add "Creating new content"
    • Rename "Generating and recommending content" to "Recommending content"
    • Add "Uploading or importing"
    • Rename "Editing" to "Editing or updating"
    • Remove "Patrolling"
    • Add:
      • Identifying policy violations
      • Identifying spam
      • Identifying vandalism
      • Patrolling recent changes
      • Warning users

Subject Domains

Is the tool targeted at helping in a specific type of wiki project or topic area?

Values:

  • Biography
  • Cultural heritage
  • Education
  • Geography and mapping
  • GLAM
  • History
  • Language and internationalization
  • Outreach
  • Science

Attributes proposed but excluded

The following attributes were excluded after feedback and discussion.

Platforms

Where does the tool run?

Proposed values:

  • Command-line
  • Desktop
  • MediaWiki
  • Mobile / smartphone
  • Web or browser

Initial questions:

  • Multiple of these values are already represented in the uncontrolled "tool type" field. Do we think it's worth having a controlled attribute for this concept?

Feedback received:

  • "The list of platforms is really confusing. An on-wiki gadget, for example, could come under desktop, mobile, MediaWiki and web/browser. I would expect to be able to distinguish mobile apps from web tools which work on mobile, web tools which work on mobile from web tools which only work on desktop, web tools which work on desktop from browser extensions, and tools on external websites from on-wiki tools. What about command-line tools that can be used in PAWS? Do they count as web/browser tools too?"[7]
  • "I'm a bit confused around the Platforms -- what would be the difference between desktop, mobile/smartphone, and web/browser? In my limited understanding, if a tool has a web interface, it nominally works for all three. Maybe there are tools that were designed specifically for mobile phones but I assume for most, web/browser covers it."[8]

Decision:

  • Exclude the proposed Platform attribute for now. Monitor tags and community-created lists to determine if this attribute would be useful or feasible in the future.

Programming languages

What programming languages does the tool use?

Proposed initial set of values and their Wikidata QIDs (see phab:T308030#8045397 for background):

  • JavaScript (Q2005)
  • JSON (Q2063)
  • Lua (Q207316)
  • MySQL (Q107385678)
  • Node.js (Q756100)
  • PHP (Q59)
  • Python (Q28865)
  • SPARQL (Q54871)
  • SQL (Q47607)

Initial questions:

  • Would you use this attribute to look for projects to contribute to based on your skills or learning goals?
  • Would you want this attribute to be broadened to include frameworks like Flask, Django, etc?

Feedback received:

  • "Programming languages: my personal thought here is that what's most useful about these attributes as a tool developer is seeing what other people are doing. Basically, I don't want to find myself "accidentally" doing something no one has ever done before, and so it's most useful for looking at solutions to "solved problems" and ensuring I'm adopting a tech stack that others are using."[9]
  • Several comments on JSON and other things that are not programming languages being included.[10]
  • "Frameworks might be better suited to separate uncontrolled attribute instead of being included with programming languages since tool authors could use any number of frameworks, which would vary based on programming language."[11]

Decision:

  • Exclude the proposed "Programming languages" attribute for now, and rely on annotations and the existing "technology_used" field in the data model (though that field is uncontrolled). Monitor tags and community-created lists to determine if this attribute would be useful or feasible in the future.

Taxonomy v1

These categories will be superseded by or integrated into the v2 Toolhub taxonomy described above.

Categories from Research Phase 1 Data Model

Audiences v1

This refers to the audience categories in the Wikimedia Resource Center, which currently are:

  • For program coordinators
  • For contributors
  • For developers
  • For affiliate organizers

Tool use cases

Use cases for tools are represented by a controlled vocabulary meant to represent different purposes a tool may serve. Tools can have multiple use cases.

To put it briefly, tools can be used for developing or consuming content, facilitating interactions among community users, writing code, and organizing projects. With respect to content-related tools, the type of content is treated separately from the thing done with the content; appropriate Wikimedia projects to use a given tool on are represented through a separate tool attribute.

  • Content format
    • Content pages (encyclopedia articles, original texts)
    • Media (images, videos, sound recordings)
    • Data (Wikidata items, structured file data)
    • Code
    • Templates
    • Documentation
  • Contributors
    • Prepare
      • Research
      • Collection curation (curating datasets, curating image sets)
    • Create
      • Page creation
      • Uploading
      • Drafting
    • Change
      • Annotating
      • Expanding
      • Copyediting
      • Formatting
      • Illustrating
      • Renaming
      • Merging
      • Splitting
      • Categorizing
      • Format conversion (e.g. OCR, video conversion)
    • Quality assurance
      • Copyright management
      • New page patrolling
      • Recent changes patrolling
      • Maintenance tagging
      • Assessment
    • Destroy
      • Reverting
      • Deleting
      • Suppressing
  • Interacting with users
    • Socializing users
      • Welcoming
      • Training and mentoring
      • Counseling and social support
    • Conduct
      • Reverting
      • Warning
      • Blocking
      • Dispute resolution
    • Other
      • Assistance (solving specific problems)
      • Talk page discussion
      • User rights (admin, rollback, etc.)
      • User activity analysis
  • Developers
    • APIs
    • Coding environments
    • Data services
    • Productivity tools
    • Tool development kits
    • Wikimedia operational tools
  • Organizers
    • Online project planning (WikiProjects, etc.)
    • Event planning
    • Contest organizing
    • Governance
    • Learning and evaluation
    • Worklist development
    • Project communication
    • Partnership development
  • Consumers
    • Reading
    • Data and metrics
    • Visualization and remixing
    • Large-scale content analysis

Technical details

This section mostly serves to document technical implementation details. You don't need to know most of this stuff for day-to-day use.

Toolhub's data model is split into two parts: the tool record and its annotations. Every tool described in Toolhub has a tool record containing basic information such as the tool name and author. Tool developers can either submit tool records directly through Toolhub's interface or compile the information in a toolinfo.json file, which Toolhub then crawls. The file can be hosted anywhere on the web, for example alongside the tool itself or in a Git repository. If tool record data is stored externally in a toolinfo.json file, it can only be edited there; likewise, a tool record originally submitted through Toolhub can only be edited through Toolhub. Allowing only one or the other avoids the risk of conflicts. We encourage tool developers to host their toolinfo.json files in a Git repository so that volunteers can submit pull requests.

Once Toolhub receives a tool record, volunteers can submit annotations to it. Annotations help make it easier to find tools by supplying additional information. Annotations cannot be stored in toolinfo.json files; they are meant to always be editable by community members.

Some parts of the data model rely on controlled vocabularies, where a field can only be defined using one of several pre-defined terms. Those are described above in the "Taxonomy and controlled vocabulary" section.
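
As an illustration of what a tool record in a toolinfo.json file might contain, the Python sketch below builds a minimal record with only the four required fields (name, title, description, url) and serializes it as JSON. The tool name, description text, and URL are made-up placeholders, and the helper `missing_required` is hypothetical; this is a sketch, not an official template.

```python
import json

# The four fields marked as required in the data model.
REQUIRED_FIELDS = ("name", "title", "description", "url")

# Placeholder values for a hypothetical tool.
record = {
    "name": "example-hypothetical-tool",
    "title": "Example tool",
    "description": (
        "A made-up tool used only to illustrate the record format. "
        "A real description should be around 3-5 sentences."
    ),
    "url": "https://example.org/tool",
}


def missing_required(rec):
    """Return the required field names that are absent from the record."""
    return [field for field in REQUIRED_FIELDS if field not in rec]


print(missing_required(record))       # []
print(json.dumps(record, indent=2))   # contents for a toolinfo.json file
```

Annotations are deliberately absent from this record, since they cannot be stored in toolinfo.json files.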

Toolinfo schema

Version 1.2.2

Version 1.2.2 of the schema, published on 16 March 2022. This schema introduces a new "person" data type and allows it to be used to declare multiple authors. The url_multilingual object definition no longer allows additional undeclared properties to pass validation.

Version 1.2.1 of the schema, published on 06 January 2022. This schema introduces two new tool types: "lua module" and "template".

Version 1.2.0 of the schema, published on 15 October 2021. This schema includes some new fields, but maintains backwards compatibility with the previous 1.1.1 and 1.0.0 schemas.

Changes from 1.1.1:

  • Update syntax for json-schema draft 7
  • Fix validation rules for "license" property. Prior schema referenced a non-existent spdx schema.
  • Add "user_docs_url" property.
  • Various description string copy edits.
  • MaxLength constraints added for all string types
  • Extracted #/definitions/url
  • Extracted #/definitions/url_multilingual_or_array
  • toolinfo_version replaced by $schema
  • toolinfo_language replaced by $language
{
  "title": "toolinfo",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, Wikimedia projects and associated data, not including the core MediaWiki software and its extensions.",
  "$id": "/toolinfo/1.2.2",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "maxLength": 255,
          "description": "Unique identifier for this tool. Must be unique for every tool. It is recommended you prefix your tool names to reduce the risk of clashes.",
          "examples": [
            "toolforge-admin",
            "user-bdavis_wmf-GlobalWatchlistReset.js"
          ]
        },
        "title": {
          "type": "string",
          "maxLength": 255,
          "description": "Human readable tool name. Recommended limit of 25 characters."
        },
        "description": {
          "type": "string",
          "maxLength": 65535,
          "description": "A longer description of the tool. The recommended length for a description is 3-5 sentences. Future versions of this schema will impose a character limit."
        },
        "url": {
          "$ref": "#/definitions/url",
          "description": "A direct link to the tool or to instructions on how to use or install the tool."
        },
        "keywords": {
          "type": "string",
          "maxLength": 2047,
          "description": "[DEPRECATED] Comma-delineated list of keywords. This parameter is deprecated and will be removed in the next major version.",
          "$comment": "Remove in version 2."
        },
        "author": {
          "oneOf": [
            {
              "type": "string",
              "maxLength": 255
            },
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/person"
              }
            }
          ],
          "description": "The primary tool developers."
        },
        "repository": {
          "$ref": "#/definitions/url",
          "description": "A link to the repository where the tool code is hosted."
        },
        "subtitle": {
          "type": "string",
          "maxLength": 255,
          "description": "Longer than the full title but shorter than the description. It should add some additional context to the title."
        },
        "openhub_id": {
          "type": "string",
          "maxLength": 255,
          "description": "The project ID on OpenHub. Given a URL of https://openhub.net/p/foo, the project ID is `foo`."
        },
        "url_alternates": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/url_multilingual"
          },
          "description": "Alternate links to the tool or install documentation in different natural languages."
        },
        "bot_username": {
          "type": "string",
          "maxLength": 255,
          "description": "If the tool is a bot, the Wikimedia username of the bot. Do not include 'User:' or similar prefixes."
        },
        "deprecated": {
          "type": "boolean",
          "default": false,
          "description": "If true, the use of this tool is officially discouraged. The `replaced_by` parameter can be used to define a replacement."
        },
        "replaced_by": {
          "$ref": "#/definitions/url",
          "description": "If this tool is deprecated, this parameter should be used to link to the replacement tool."
        },
        "experimental": {
          "type": "boolean",
          "default": false,
          "description": "If true, this tool is unstable and can change or go offline at any time."
        },
        "for_wikis": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/wiki"
              }
            },
            {
              "$ref": "#/definitions/wiki"
            }
          ],
          "default": "*",
          "description": "A string or array of strings describing the wiki(s) this tool can be used on. Use hostnames such as `zh.wiktionary.org`. Use asterisks as wildcards. For example, `*.wikisource.org` means 'this tool works on all Wikisource wikis.' `*` means 'this works on all wikis, including Wikimedia wikis.'"
        },
        "icon": {
          "$ref": "#/definitions/commons_file",
          "description": "A link to a Wikimedia Commons file description page for an icon that depicts the tool."
        },
        "license": {
          "type": "string",
          "maxLength": 255,
          "description": "The software license the tool code is available under. Use a standard SPDX license identifier like 'GPL-3.0-or-later'.",
          "examples": [
            "GPL-2.0-or-later",
            "GPL-3.0-or-later"
          ]
        },
        "sponsor": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "Organization that sponsored the tool's development."
        },
        "available_ui_languages": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/language"
              }
            },
            {
              "$ref": "#/definitions/language"
            },
            {
              "type": "string",
              "maxLength": 1,
              "enum": [
                "*"
              ]
            }
          ],
          "default": "en",
          "description": "The language(s) the tool's interface has been translated into. Use ISO 639 language codes like `zh` and `scn`. If not defined it is assumed the tool is only available in English."
        },
        "technology_used": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "A string or array of strings listing technologies (programming languages, development frameworks, etc.) used in creating the tool."
        },
        "tool_type": {
          "type": "string",
          "maxLength": 32,
          "enum": [
            "web app",
            "desktop app",
            "bot",
            "gadget",
            "user script",
            "command line tool",
            "coding framework",
            "lua module",
            "template",
            "other"
          ],
          "description": "The manner in which the tool is used. Select one from the list of options."
        },
        "api_url": {
          "$ref": "#/definitions/url",
          "description": "A link to the tool's API, if available."
        },
        "developer_docs_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to the tool's developer documentation, if available."
        },
        "user_docs_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to the tool's user documentation, if available."
        },
        "feedback_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to location where the tool's user can leave feedback."
        },
        "privacy_policy_url": {
          "$ref": "#/definitions/url_multilingual_or_array",
          "description": "A link to the tool's privacy policy, if available."
        },
        "translate_url": {
          "$ref": "#/definitions/url",
          "description": "A link to the tool's translation interface."
        },
        "bugtracker_url": {
          "$ref": "#/definitions/url",
          "description": "A link to the tool's bug tracker on GitHub, Bitbucket, Phabricator, etc."
        },
        "_schema": {
          "type": "string",
          "format": "uri-reference",
          "maxLength": 32,
          "description": "A URI identifying the jsonschema for this toolinfo.json record. This should be a short uri containing only the name and revision at the end of the URI path.",
          "examples": [
            "/toolinfo/1.2.1"
          ]
        },
        "_language": {
          "$ref": "#/definitions/language",
          "default": "en",
          "description": "The language in which this toolinfo record is written. If not set, the default value is English. Use ISO 639 language codes."
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    },
    "url": {
      "type": "string",
      "maxLength": 2047,
      "format": "uri"
    },
    "wiki": {
      "type": "string",
      "maxLength": 255,
      "pattern": "^(\\*|(.*)?\\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news|versity|data|voyage|media))\\.org)$"
    },
    "commons_file": {
      "$ref": "#/definitions/url",
      "pattern": "^https://commons.wikimedia.org/wiki/File:.+\\..+$",
      "maxLength": 2047
    },
    "language": {
      "type": "string",
      "maxLength": 16,
      "pattern": "^(x-.*|[A-Za-z]{2,3}(-.*)?)$"
    },
    "url_multilingual": {
      "type": "object",
      "properties": {
        "language": {
          "$ref": "#/definitions/language"
        },
        "url": {
          "$ref": "#/definitions/url"
        }
      },
      "additionalProperties": false
    },
    "url_multilingual_or_array": {
      "oneOf": [
        {
          "type": "array",
          "items": {
            "$ref": "#/definitions/url_multilingual"
          }
        },
        {
          "$ref": "#/definitions/url"
        }
      ]
    },
    "string_or_string_array": {
      "oneOf": [
        {
          "type": "string",
          "maxLength": 255
        },
        {
          "type": "array",
          "items": {
            "type": "string",
            "maxLength": 255
          }
        }
      ]
    },
    "person": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "maxLength": 255,
          "description": "The full/formatted name of the person."
        },
        "wiki_username": {
          "type": "string",
          "maxLength": 255,
          "description": "The person's Wikimedia username."
        },
        "developer_username": {
          "type": "string",
          "maxLength": 255,
          "description": "The person's Wikimedia Developer account username."
        },
        "email": {
          "type": "string",
          "maxLength": 255,
          "format": "email",
          "description": "Email address"
        },
        "url": {
          "$ref": "#/definitions/url",
          "description": "Home page or other URL representing the person."
        }
      },
      "required": [
        "name"
      ],
      "additionalProperties": false
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "$ref": "#/definitions/tool"
    }
  ]
}
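
The `wiki` and `language` definitions in the schema above constrain values with a regular expression and a `maxLength`. The Python sketch below copies those two patterns (with the JSON string escaping removed) and checks a few values against them using the standard `re` module. The helper names are hypothetical, and this only approximates what a full JSON Schema validator would do for these two definitions.

```python
import re

# Patterns copied from the "wiki" and "language" definitions in the 1.2.2 schema.
WIKI_PATTERN = re.compile(
    r"^(\*|(.*)?\.?(mediawiki|wiktionary|"
    r"wiki(pedia|quote|books|source|news|versity|data|voyage|media))\.org)$"
)
LANGUAGE_PATTERN = re.compile(r"^(x-.*|[A-Za-z]{2,3}(-.*)?)$")


def is_valid_wiki(value):
    """Check a "for_wikis" entry against the schema's pattern and maxLength (255)."""
    return len(value) <= 255 and WIKI_PATTERN.search(value) is not None


def is_valid_language(value):
    """Check a language code against the schema's pattern and maxLength (16)."""
    return len(value) <= 16 and LANGUAGE_PATTERN.search(value) is not None


for wiki in ("*", "fr.wikipedia.org", "*.wikisource.org", "example.com"):
    print(wiki, is_valid_wiki(wiki))

for code in ("zh", "scn", "english"):
    print(code, is_valid_language(code))
```

Of the values shown, only "example.com" and "english" fail: the first is not one of the recognized Wikimedia domains, and the second is neither a two- or three-letter code nor an "x-" private-use code.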

Version 1.1.1

Version 1.1.0, published on 30 June 2018, updated the schema with new fields while maintaining full backwards compatibility with the previous schema.

Version 1.1.1, published on 13 October 2018, corrects a typographical error from 1.1.0.

The JSON Schema is below.

toolinfo.json schema v1.1.1
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://tools.wmflabs.org/toolhub/schema/1.1.1",
  "title": "Wikimedia Tool",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, Wikimedia projects and associated data, not including the core wiki software and its extensions",
  "version": "1.1.1",
  "authors": [
    "Hay Kranen",
    "James Hare"
  ],
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "Unique identifier for tools. Must be unique for every tool. It is recommended you prefix your tool names to reduce the risk of clashes."
        },
        "title": {
          "type": "string",
          "description": "Human readable tool name. Recommended limit of 25 characters."
        },
        "subtitle": {
          "type": "string",
          "maxLength": 250,
          "description": "Longer than the full title but shorter than the description. It should add some additional context to the title."
        },
        "openhub_id": {
          "type": "string",
          "description": "The project ID on OpenHub. Given a URL https://openhub.net/p/foo, the project ID is `foo`."
        },
        "description": {
          "type": "string",
          "description": "A longer description of the tool. The recommended length for a description is 3-5 sentences. Future versions of this schema will impose a character limit."
        },
        "url": {
          "type": "string",
          "format": "uri",
          "description": "A direct link to the tool or to instructions on how to use or install the tool."
        },
        "url_alternates": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/url_multilingual"
          }
        },
        "keywords": {
          "type": "string",
          "description": "Comma-delimited list of keywords. This parameter is deprecated and will be removed in the next version."
        },
        "author": {
          "type": "string",
          "description": "The primary tool developer."
        },
        "repository": {
          "type": "string",
          "format": "uri",
          "description": "A link to the repository where the tool code is hosted."
        },
        "bot_username": {
          "type": "string",
          "description": "If the tool is a bot, the Wikimedia username of the bot. Do not include 'User:' or similar prefixes."
        },
        "deprecated": {
          "type": "boolean",
          "default": false,
          "description": "If true, the use of this tool is officially discouraged. The `replaced_by` parameter can be used to define a replacement."
        },
        "replaced_by": {
          "type": "string",
          "format": "uri",
          "description": "If this tool is deprecated, this parameter should be used to link to the replacement tool."
        },
        "experimental": {
          "type": "boolean",
          "default": false,
          "description": "If true, this tool is unstable and can change or go offline at any time."
        },
        "for_wikis": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/wiki"
              }
            },
            {
              "$ref": "#/definitions/wiki"
            }
          ],
          "default": "*",
          "description": "A string or array of strings describing the wiki(s) this tool can be used on. Use hostnames such as `zh.wiktionary.org`. Use asterisks as wildcards. For example, `*.wikisource.org` means 'this tool works on all Wikisource wikis.' `*` means 'this works on all wikis, including Wikimedia wikis.'"
        },
        "icon": {
          "$ref": "#/definitions/commons_file",
          "description": "A link to a Wikimedia Commons file description page for an icon that depicts the tool."
        },
        "license": {
          "$ref": "https://tools.wmflabs.org/spdx/schema/licenses.json#/definitions/license",
          "description": "The software license the tool's code is available under. Use a standard SPDX license keyword."
        },
        "sponsor": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "Organization that sponsored the tool's development."
        },
        "available_ui_languages": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/language"
              }
            },
            {
              "$ref": "#/definitions/language"
            },
            {
              "type": "string",
              "enum": [
                "*"
              ]
            }
          ],
          "default": "en",
          "description": "The language(s) the tool's interface has been translated into. Specify this field manually only if the tool does not handle interface translation through translatewiki.net. Use ISO 639 language codes like `zh` and `scn`. If not defined it is assumed the tool is only available in English."
        },
        "technology_used": {
          "$ref": "#/definitions/string_or_string_array",
          "description": "A string or array of strings listing technologies (programming languages, development frameworks, etc.) used in creating the tool."
        },
        "tool_type": {
          "type": "string",
          "enum": [
            "web app",
            "desktop app",
            "bot",
            "gadget",
            "user script",
            "command line tool",
            "coding framework",
            "other"
          ],
          "description": "The manner in which the tool is used. Select one from the list of options."
        },
        "api_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool's API, if available."
        },
        "developer_docs_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to the tool's developer documentation, if available."
        },
        "feedback_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to where tool users can leave feedback."
        },
        "privacy_policy_url": {
          "oneOf": [
            {
              "type": "array",
              "items": {
                "$ref": "#/definitions/url_multilingual"
              }
            },
            {
              "type": "string",
              "format": "uri"
            }
          ],
          "description": "A link to the tool's privacy policy, if available."
        },
        "translate_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool translation interface."
        },
        "bugtracker_url": {
          "type": "string",
          "format": "uri",
          "description": "A link to the tool's bug tracker on GitHub, Bitbucket, Phabricator, etc."
        },
        "toolinfo_version": {
          "type": "integer",
          "default": 1,
          "description": "The major version number of the Toolinfo schema used. The default value assumed is 1, referring to versions 1.0.0 and 1.1.0."
        },
        "toolinfo_language": {
          "$ref": "#/definitions/language",
          "default": "en",
          "description": "The language the toolinfo record is written in, if not the default value of English. Use ISO 639 language codes."
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    },
    "wiki": {
      "type": "string",
      "pattern": "^(\\*|(.*)?\\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news|versity|data|voyage|tech|media|mediafoundation))\\.org)$"
    },
    "commons_file": {
      "type": "string",
      "format": "uri",
      "pattern": "^https://commons.wikimedia.org/wiki/File:.+\\..+$"
    },
    "language": {
      "type": "string",
      "pattern": "^(x-.*|[A-Za-z]{2,3}(-.*)?)$"
    },
    "url_multilingual": {
      "type": "object",
      "properties": {
        "language": {
          "$ref": "#/definitions/language"
        },
        "url": {
          "type": "string",
          "format": "uri"
        }
      }
    },
    "string_or_string_array": {
      "oneOf": [
        {
          "type": "string"
        },
        {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      ]
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "$ref": "#/definitions/tool"
    }
  ]
}
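
To make a few of the v1.1.1 rules concrete, the following Python sketch checks a record against the required fields, the 250-character `subtitle` limit, and the `wiki`, `language`, and `commons_file` patterns. It assumes the escape character in those patterns is a backslash, and the helper names `as_list` and `check_record` are hypothetical. It covers only a handful of rules and is not a substitute for a full JSON Schema validator.

```python
import re

# Patterns transcribed from the v1.1.1 definitions, assuming the escape
# character in the published patterns is a backslash.
WIKI = re.compile(
    r"^(\*|(.*)?\.?(mediawiki|wiktionary|wiki(pedia|quote|books|source|news"
    r"|versity|data|voyage|tech|media|mediafoundation))\.org)$"
)
LANGUAGE = re.compile(r"^(x-.*|[A-Za-z]{2,3}(-.*)?)$")
COMMONS_FILE = re.compile(r"^https://commons.wikimedia.org/wiki/File:.+\..+$")

REQUIRED = ("name", "title", "description", "url")


def as_list(value):
    """Wrap a single string in a list; copy any other sequence into a list."""
    return [value] if isinstance(value, str) else list(value)


def check_record(record):
    """Return a list of problems found in one toolinfo record (hypothetical helper)."""
    problems = [f"missing required field: {key}" for key in REQUIRED if key not in record]
    if len(record.get("subtitle", "")) > 250:
        problems.append("subtitle is longer than 250 characters")
    # for_wikis accepts one hostname or a list of hostnames; default is "*".
    for wiki in as_list(record.get("for_wikis", "*")):
        if not WIKI.search(wiki):
            problems.append(f"for_wikis entry not recognised: {wiki}")
    # available_ui_languages accepts "*", one code, or a list of codes.
    for code in as_list(record.get("available_ui_languages", "en")):
        if code != "*" and not LANGUAGE.search(code):
            problems.append(f"language code not recognised: {code}")
    icon = record.get("icon")
    if icon is not None and not COMMONS_FILE.search(icon):
        problems.append("icon is not a Commons file description page URL")
    return problems
```

For example, a record whose `for_wikis` is `["zh.wiktionary.org", "*.wikisource.org"]` passes the hostname check, while `example.com` does not.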

Version 1.0.0

Hay's Tool Directory established a de facto standard for describing Wikimedia tools using JSON files. This standard has been retroactively established as version 1.0.0 of the toolinfo JSON schema.

toolinfo.json schema v1.0.0
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Wikimedia Tool",
  "description": "A tool is a piece of software that helps facilitate contribution toward, or consumption of, a Wikimedia project, not including the core wiki software and its extensions",
  "version": "1.0.0",
  "authors": [
    "Hay Kranen",
    "James Hare"
  ],
  "definitions": {
    "tool": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "title": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "url": {
          "type": "string"
        },
        "keywords": {
          "type": "string"
        },
        "author": {
          "type": "string"
        },
        "repository": {
          "type": "string"
        }
      },
      "required": [
        "name",
        "title",
        "description",
        "url"
      ]
    }
  },
  "oneOf": [
    {
      "type": "array",
      "items": {
        "$ref": "#/definitions/tool"
      }
    },
    {
      "type": "object",
      "$ref": "#/definitions/tool"
    }
  ]
}
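
The top-level `oneOf` means a toolinfo.json file may hold either a single tool object or an array of them. A consumer might normalize both shapes into a list before processing, as in this minimal Python sketch. The function name `load_tools` is hypothetical, and only the four required fields are checked.

```python
import json

# The four fields every version of the schema marks as required.
REQUIRED_V1 = ("name", "title", "description", "url")


def load_tools(text):
    """Parse toolinfo.json text and return a list of tool records.

    Hypothetical helper: accepts a single object or an array of objects,
    mirroring the schema's top-level oneOf.
    """
    data = json.loads(text)
    tools = data if isinstance(data, list) else [data]
    for tool in tools:
        if not isinstance(tool, dict):
            raise ValueError("each tool record must be a JSON object")
        missing = [key for key in REQUIRED_V1 if key not in tool]
        if missing:
            raise ValueError(f"record is missing required fields: {missing}")
    return tools
```

Returning a list in both cases lets the rest of a crawler treat every file the same way.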

Annotations

These are additional pieces of data that can be used to describe tools. Annotations cannot be submitted through toolinfo files; they are meant to be submitted through Toolhub only.

Currently planned annotations include:

  • Additional info – expands on the tool description.
  • Audiences – Wikimedia Resource Center audiences.
  • Broken – yes/no flag indicating that a tool is no longer working, recorded together with the username of the person who reported it and their report.
  • Collections – community-curated groupings of tools.
  • Documentation URL – link to user documentation, including both official documentation and user-generated documentation.
  • Official maintainer – the people who are currently responsible for maintaining the tool's code.
  • Related topics – links between tools and Wikidata items as another way of describing tools.
  • Screenshots – visual aids showing the tool in use.
  • Testers – people who have signed up to test new versions of the tool.
  • Use cases – controlled vocabulary outlining different uses for tools.
  • Video – tutorials and other such audio-visual guides.
  • Volunteer (user assistance) – people who have volunteered to help other users with using the tool.
  • Wikidata item ID – the Wikidata item ID for the tool.

Automated data inputs

Automatically generated data will factor into tool relevance. Note that not all of these inputs will be available right away, nor will they be available for every tool.

  • Tool availability – is the tool up? When was the last time it was up? How often is the tool down?
  • Translators – credits for translation, based on translatewiki.net statistics.
  • Total gadget users – based on data from the wikis
  • Active gadget users – based on data from the wikis
  • Web hits – for Toolforge tools, based on data from Toolforge
  • Unique devices – nice to have, but would probably be harder to accomplish in practice
  • Last updated – based on changes to git repository, probably?
  • Wikis where used – for gadgets
  • Toolforge maintainers