Wikirecords

This request is unrelated to the old Wikirecords (old) proposal.


Wikirecords, or Wikidata Records, is a proposed Wikimedia project for primary or secondary records. A primary record is, for example, this (free account required); a secondary record is simply any entry on any external website.

This is a proposal for a new WMF sister project.
Wikirecords
Status of the proposal
Status: under discussion
Details of the proposal
Project description: A new Wikimedia project for primary or secondary records
Is it a multilingual wiki? One wiki
Potential number of languages: The wiki is multilingual, but individual records are mostly monolingual
New features to require: No mandatory requirement other than Wikibase, but many nice-to-haves

See #Related projects/proposals for more information.

This will be a Wikibase installation. There are three entity types (alternatively, catalogs and entries can just be items):

  • Catalog: a reference to an external website or dataset. For example, VIAF is a catalog. Another example of a catalog is this. Not all catalogs are in the public domain or under a free license. Most catalogs are expected to become complete eventually, though this will not be achieved in the foreseeable future.
  • Entry: an individual entry directly provided by a catalog. Entries may have a label, a description, aliases and statements. Statements usually do not have sources, unless the external website provides them. If the catalog is not free, only uncopyrightable data should be added. It is Wikirecords' purpose to include all entries in a catalog (whether they are notable or not). Special kinds of entries include redirected or deleted (withdrawn) entries; in these cases the ID and the target (for redirects) are still recorded and queryable.
  • Property: similar to a property in Wikidata. Wikidata properties may be used directly (via federation), alongside catalog-specific properties. I don't think Wikirecords requires a property proposal process, as properties may be created as needed.
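The three entity types above can be pictured with a small data-model sketch. This is only an illustration, not a Wikibase schema: every class, field and ID below (`Catalog`, `Entry`, `EntryState`, `resolve`, the catalog ID "C1") is a hypothetical name chosen for the example, and the redirect-following logic is one assumed way to keep redirected and withdrawn IDs queryable.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EntryState(Enum):
    ACTIVE = "active"
    REDIRECTED = "redirected"  # the ID now points to another entry
    WITHDRAWN = "withdrawn"    # deleted upstream; the ID is kept for queries


@dataclass
class Catalog:
    catalog_id: str      # hypothetical local ID, e.g. "C1"
    name: str            # name of the external website or dataset
    free_license: bool   # if False, only uncopyrightable data may be added


@dataclass
class Statement:
    property_id: str     # federated Wikidata property or catalog-specific one
    value: str


@dataclass
class Entry:
    catalog: Catalog
    external_id: str     # the ID exactly as given by the catalog
    label: Optional[str] = None
    description: Optional[str] = None
    aliases: list[str] = field(default_factory=list)
    statements: list[Statement] = field(default_factory=list)
    state: EntryState = EntryState.ACTIVE
    redirect_target: Optional[str] = None  # external ID; redirects only

    def resolve(self, index: dict[str, "Entry"]) -> Optional["Entry"]:
        """Follow redirects in `index`; return None if nothing live is found."""
        seen = set()
        entry = self
        while entry.state is EntryState.REDIRECTED:
            if entry.external_id in seen or entry.redirect_target not in index:
                return None  # redirect loop or dangling target
            seen.add(entry.external_id)
            entry = index[entry.redirect_target]
        return None if entry.state is EntryState.WITHDRAWN else entry
```

A redirected entry keeps its own page and ID; callers that want the live record call `resolve` with an index of the catalog's entries keyed by external ID.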

Note that if another data source provides data about external entries, separate entries should be created. For example, the Samsung Freebase-Wikidata mapping should be considered a catalog whose entries refer to Freebase and Wikidata IDs; its data should not be imported into the entries for Freebase entities, as it is not provided by Freebase.

This project is intended to have many more entries than Wikidata. About 10 billion entries are expected, but we should first focus on smaller catalogs.

An important feature of Wikirecords is that you can match an individual record with a Wikidata item. After matching, you may import data from the record into Wikidata, or cite the record as a source in Wikidata items. Wikirecords can also find data mismatches between an entry and its Wikidata item; if a statement in the entry is wrong, it can be marked as such. Note that we do not remove or correct wrong statements unless they are fixed upstream. Unlike Mix'n'Match, confident matches may be synchronised automatically between Wikidata and Wikirecords.
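The mismatch check and the import step could work roughly as follows. This is a minimal sketch under stated assumptions: claims are simplified to plain `{property_id: set_of_values}` mappings, and the function names `find_mismatches` and `importable_claims` and the `marked_wrong` parameter are hypothetical, not part of any existing tool. P569 and P570 are the Wikidata properties for date of birth and date of death; the values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    property_id: str
    entry_values: frozenset
    item_values: frozenset


def find_mismatches(entry_claims, item_claims):
    """Compare two {property_id: set_of_values} mappings.

    Only properties present on both sides are compared. A property missing
    from the item is a candidate for import, not a mismatch.
    """
    mismatches = []
    for prop in sorted(entry_claims.keys() & item_claims.keys()):
        entry_vals = frozenset(entry_claims[prop])
        item_vals = frozenset(item_claims[prop])
        if entry_vals.isdisjoint(item_vals):
            mismatches.append(Mismatch(prop, entry_vals, item_vals))
    return mismatches


def importable_claims(entry_claims, item_claims, marked_wrong=frozenset()):
    """Return entry claims the item lacks, skipping values marked as wrong.

    `marked_wrong` holds (property_id, value) pairs flagged by editors. The
    entry itself is never modified, since wrong statements are only marked.
    """
    result = {}
    for prop, values in entry_claims.items():
        known = item_claims.get(prop, set())
        new = {v for v in values
               if v not in known and (prop, v) not in marked_wrong}
        if new:
            result[prop] = new
    return result


entry = {"P569": {"1732-02-11"}, "P570": {"1799-12-14"}}
item = {"P569": {"1732-02-22"}}
flagged = {("P569", "1732-02-11")}

print(find_mismatches(entry, item))
print(importable_claims(entry, item, marked_wrong=flagged))
```

Keeping the "marked wrong" flags outside the claim data means the record stays a faithful copy of the upstream source while still being excluded from imports.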

To prevent new users who do not know the purpose of Wikirecords from editing records, a protection system is introduced (this is not a mandatory feature): a catalog may be static, synchronised or open. In the following description, only the text in italics describes technical limitations.

  • Static catalogs are based on a database dump or a static dataset. They are not intended to be edited after creation (other than to fix import errors). Static catalogs may be edited only by catalog curators (a new proposed user group).
  • Synchronised catalogs are based on external websites and may be updated to follow the website. Usually extractors (external tools or bots) update them, either automatically or when invoked manually. Synchronised catalogs may be edited only by catalog curators and extractor bots (both proposed user groups).
  • Open catalogs are works in progress that can be edited by everyone.
  • In all cases, the "correctness" of statements and the match between an entry and a Wikidata item are not properly part of the entry (though they are stored on the entry page) and can be edited by everyone.
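The protection rules in the list above amount to a small permission check. The sketch below is one assumed way to express them; the group names `catalog-curator` and `extractor-bot`, the part names, and the function `may_edit` are hypothetical and do not correspond to existing MediaWiki user rights.

```python
from enum import Enum


class CatalogMode(Enum):
    STATIC = "static"
    SYNCHRONISED = "synchronised"
    OPEN = "open"


# Hypothetical names for the two proposed user groups.
CURATOR = "catalog-curator"
EXTRACTOR_BOT = "extractor-bot"

# Parts stored on the entry page that are not properly part of the entry.
ALWAYS_OPEN_PARTS = {"correctness", "wikidata-match"}


def may_edit(mode, user_groups, part="statement"):
    """Return True if a user in `user_groups` may edit `part` of an entry."""
    if part in ALWAYS_OPEN_PARTS:
        return True  # correctness marks and matches are open to everyone
    if mode is CatalogMode.OPEN:
        return True
    if mode is CatalogMode.STATIC:
        return CURATOR in user_groups
    if mode is CatalogMode.SYNCHRONISED:
        return bool({CURATOR, EXTRACTOR_BOT} & set(user_groups))
    raise ValueError(f"unknown catalog mode: {mode!r}")
```

For example, `may_edit(CatalogMode.STATIC, ["user"])` is false for an ordinary statement, while `may_edit(CatalogMode.STATIC, ["user"], part="wikidata-match")` is true.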

Proposed by

GZWDer (talk)

Alternative names

  • Wikirecords
  • Wikidata Records

Related projects/proposals

There are some related projects:

Wikidata

Wikidata may benefit from this project, and vice versa. This site is fundamentally different from Wikidata in its purpose: Wikidata items represent all the things in human knowledge, whereas entries in Wikirecords are about individual records. For example, we have only one item for George Washington, but we will have 121 entries in Wikirecords, one for each of his identifiers.

Mix'n'Match

This project is intended to be a successor to Mix'n'Match. The features are similar, but Wikirecords is more powerful. Some notes:

  • Mix'n'Match allows creating catalogs by scraping external websites. See "extractor" above for an analogue. Allowing alternative implementations will make it easier to scrape complex websites.
  • An "auto-matcher" may be implemented directly at the database level.

Mix'n'Match V2

A reimplementation of Mix'n'Match based on Wikibase, currently offline. Some advantages:

  • Entries are editable either manually or via QuickStatements
  • Allows external entries to be used by third-party websites
  • Allows SPARQL queries

Disadvantages (which may also affect Wikirecords, as the idea is largely based on Mix'n'Match V2):

  • MediaWiki currently lacks a way to make mass edits (up to millions) server-side, which would be a nice-to-have feature
  • The query service does not scale to mass editing (a new "batch" updater is needed, rather than the one that polls recent changes or the event-driven one under development)


Primary Source Tool

A somewhat similar idea that may be integrated into this project. The whole workflow can be implemented on Wikirecords. Wikidata will have a new feature to add claims from Wikirecords (this should be a built-in feature, not an external tool).

OpenRefine

OpenRefine imports a dataset locally and reconciles it with Wikidata items. Alternatively, datasets may be imported into Wikirecords, which allows collaborative matching.



Domain names

  • www.wikirecords.org
  • records.wikidata.org
  • Alternatively, a part of www.wikidata.org, but the current Wikidata is already overwhelmed and introducing such a big feature will make things worse. Also, it will be confusing if Wikirecords-only properties are introduced.

Mailing list links

Demos

People interested

Discussion

  • "This project is intended to have much more entries than Wikidata. ~10 billion entries is expected, but we should first focus on smaller ones. " How many entries is Wikidata intended to have? —Justin (koavf)TCM 07:28, 19 June 2020 (UTC)
    • This means: Wikidata may cover things covered by sources, but this project will be for individual entries in individual sources. Therefore the number of entries in Wikirecords is expected to be much larger than that in Wikidata. (In the future, there may be a transcription project recording every fact within every source, whether it is in a database or not, but this is not what we should do initially.)--GZWDer (talk) 07:43, 19 June 2020 (UTC)
  • "alternatively, catalogs and entries can just be items" - if you don't want to have considerable developer time spent on customizing wikibase, I am certain this is the right choice. Pick a property to relate entries and catalogs (like P31 for instances and classes in Wikidata) and all should be well. ArthurPSmith (talk) 13:19, 22 June 2020 (UTC)
    • Catalogs may even be items in Wikidata (used in Wikirecords via federation), and all items in Wikirecords will be entries.--GZWDer (talk) 13:52, 22 June 2020 (UTC)