Connected Open Heritage/Wikidata migration/Documentation
DATA EXPLORATION
- Set up milestone on Phabricator under Connected-Open-Heritage-Wikidata-migration, using the name of the database table, such as
- Set up page under
  - Fill it out with sample data.
  - Note: As of now, these are all created and filled out by this script. It only needs to be rerun if a new table is added to the WLM db.
- Look at the unique identifier of each item. Does it correspond to an identifier in an external source?
  - If yes, find or request an appropriate property.
  - If no (i.e. the ID is just for internal WLM use), this might mean the dataset is not suitable for import. Without a real-world reference, we can't tell much about the completeness or selection criteria of the data.
- Identify heritage status. Do all the items represent the same type of heritage protection (e.g. national monument in <country>)?
  - If not, how can the heritage status of each item be inferred?
  - Create or edit any necessary items, for example cultural monument of the Czech Republic (Q385405). It should at least have a country assigned and be a subclass of cultural property / national heritage site.
- Identify P31 for all the items -- something basic like building or ancient monument. Sometimes there's a separate column for this, such as type, that can be used instead of the default value where possible.
- Create necessary lookup tables.
  - Some fields have a limited range of distinct values, for example se-fornmin_(sv)/types. In SQL, you can check this with:
    SELECT DISTINCT columnname FROM tablename;
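To prioritize the mapping work, the same check can include a frequency count. A minimal sketch using a local SQLite copy (the table name, column name, and values below are made-up examples, not the real WLM schema):

```python
import sqlite3

# Hypothetical local copy of one WLM table; all names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monuments_example (type TEXT)")
conn.executemany(
    "INSERT INTO monuments_example VALUES (?)",
    [("gravfält",), ("gravfält",), ("runristning",), ("gravfält",)],
)

# Count each distinct value so the most common types can be mapped first.
rows = conn.execute(
    "SELECT type, COUNT(*) AS n FROM monuments_example "
    "GROUP BY type ORDER BY n DESC"
).fetchall()
for value, count in rows:
    print(value, count)
```

Sorting by frequency makes it easy to see which handful of values covers most of the rows.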
  - The script for this is here. Focus on mapping the most common ones first.
- Identify and download any necessary offline data.
  - This is to avoid doing live queries while running the program, which takes a lot of time.
  - Usually things like placenames and administrative units -- data that does not change often.
- Identify areas that can benefit from community input.
  - Problematic due to language.
  - Problematic due to lack of factual knowledge.
- Labels and descriptions:
  - Can the name column be used as-is for the label?
  - Descriptions can be made using the default P31/heritage type and the country/administrative location.
  - Can descriptions be made in extra languages, apart from the language of the dataset?
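The default-description pattern above can be sketched as a small template function (the type labels, locations, and template strings are hypothetical examples, not the project's actual templates):

```python
# Build a default description from the item's type (P31) and its
# administrative location. Templates and example values are made up.
def make_description(type_label: str, location_label: str, lang: str = "en") -> str:
    templates = {
        "en": "{type} in {location}",
        "sv": "{type} i {location}",
    }
    return templates[lang].format(type=type_label, location=location_label)

print(make_description("runestone", "Uppsala Municipality"))
print(make_description("runsten", "Uppsala kommun", lang="sv"))
```

Because the description only uses the P31 label and a location label, it can be generated in any language for which those labels exist, which is how extra-language descriptions become cheap to add.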
- Create a basic mapping file like
  - It contains data that apply to all the items.
  - If possible, use a unique property (for the ID number) that will be checked in addition to monument_article to see whether an item might already exist.
- Create statements for all relevant columns.
  - All statements have a source -- see phab:T155241.
UPLOADING
- Create a page with a preview of the processed data.
- Request for permission:
  - Link to the preview.
  - Describe how the data is processed.
  - Describe how already existing items are detected.
- Test upload of ~10 items.
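The existing-item detection described above (unique ID property first, monument_article as a fallback) can be sketched as follows. The lookup tables here are hard-coded example data; in the real pipeline they would be fetched from Wikidata, e.g. via a SPARQL query for all values of the unique ID property:

```python
from typing import Optional

# Example stand-ins for lookups normally built from Wikidata query results.
existing_by_id = {"12345": "Q111", "67890": "Q222"}   # unique ID value -> QID
existing_by_article = {"Storkyrkan": "Q333"}          # article title -> QID

def find_existing(row: dict) -> Optional[str]:
    # Prefer the unique ID property; fall back to the linked article.
    if row.get("id") in existing_by_id:
        return existing_by_id[row["id"]]
    if row.get("monument_article") in existing_by_article:
        return existing_by_article[row["monument_article"]]
    return None  # no match: a new item would be created

print(find_existing({"id": "12345", "monument_article": ""}))
print(find_existing({"id": "99999", "monument_article": "Storkyrkan"}))
print(find_existing({"id": "99999", "monument_article": "Nowhere"}))
```

Checking the unique ID before the article link matters: article links can point at items about a different subject (e.g. a settlement rather than the monument itself), while the ID property is unambiguous.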
- Upload of the dataset.