ESEAP Conference 2024/Submissions/Korean wikidata & Mix'n'Match


Korean wikidata & Mix'n'Match

Abstract/description

This is a workshop to enhance Korean wikidata via the Mix'n'Match catalogue for the Encyclopedia of Korean Culture (Q624626) and to get started on increasing the number of matches. The aim is to add the property P9475 (Encyclopedia of Korean Culture ID) to Qitems and to augment the wikidata of items that are missing vital properties.

The wikidata for many kowiki pages is limited to a Qitem and a link to the page. Many of these items have neither "instance of" nor "subclass of", and many fail to indicate the country the item is associated with.
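To get a feel for the scale of the problem, a query along the following lines can list items that are linked to a kowiki article but carry no "instance of" (P31), "subclass of" (P279) or "country" (P17) statement. This is a minimal sketch, not a tool used in the workshop: it assumes the public Wikidata Query Service endpoint and its predefined `schema:` and `wdt:` prefixes, and the `User-Agent` string is a placeholder to replace with your own contact details. Run over the whole of kowiki such a query may be slow, so keep the `LIMIT` small.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def bare_kowiki_items_query(limit: int = 100) -> str:
    """SPARQL listing items with a kowiki sitelink but no P31, P279 or P17."""
    return f"""
SELECT ?item ?article WHERE {{
  # ?article is the kowiki page; schema:about links it to its Qitem
  ?article schema:about ?item ;
           schema:isPartOf <https://ko.wikipedia.org/> .
  FILTER NOT EXISTS {{ ?item wdt:P31 [] }}   # no "instance of"
  FILTER NOT EXISTS {{ ?item wdt:P279 [] }}  # no "subclass of"
  FILTER NOT EXISTS {{ ?item wdt:P17 [] }}   # no "country"
}}
LIMIT {limit}
"""


def run_query(query: str) -> list:
    """Send a query to WDQS and return each result row as a {variable: value} dict."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent: replace with your own tool name and contact details.
    request = urllib.request.Request(url, headers={"User-Agent": "ekc-workshop-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [{name: cell["value"] for name, cell in row.items()}
            for row in data["results"]["bindings"]]
```

Calling `run_query(bare_kowiki_items_query(20))` needs network access; the query text can equally be pasted into the query.wikidata.org web interface.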

The knowledge contained in such wikipages cannot be accessed by the non-Korean-speaking world.

The language of communication will be English. Most of the session will therefore deal with techniques for those whose language is written in Roman script, so that they can tackle the language silos to which most of the rest of the world cannot contribute. (Every day, perhaps a quarter of the Korean wikidata items I encounter cannot be accessed by anyone other than a Korean speaker: the Qitems have no "instance of" (P31) and no country; they consist solely of an item number and a Korean label linked to a kowiki page. The knowledge on the kowiki page is locked away from the rest of the world.)

Korean attendees will find the matching easier and will not need most of the techniques demonstrated. However, the session will still be a useful introduction to the capabilities of Mix'n'Match.

A mechanism I have found useful for finding Qitems that are devoid of any wikidata is to try to match Qitems via the Encyclopedia of Korean Culture mix'n'match catalogue. (My wikidata languages are currently set to English, Korean, and Chinese, so when I find a wikidata item without an English label I typically add both an English and a Chinese label.) I would suggest that any Korean using mix'n'match set their Babel preferences to at least these three languages, because the driving force behind what I am doing is to open up the knowledge locked away in kowiki, whose wikidata lacks such things as "instance of", "subclass of", "country", "country of origin" and "country of citizenship". For cultural items I often just take the Google translation as the English label. For short items, I try always to supply a Roman-script transliteration as an alias.
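For a quick Roman-script alias, Hangul syllables can be decomposed arithmetically: each precomposed syllable in Unicode (U+AC00 to U+D7A3) encodes its initial, medial and final jamo. The sketch below romanizes letter by letter using Revised Romanization values, but it deliberately ignores the sound-change rules the official system applies between syllables, so its output is a rough alias to be checked by eye, not an authoritative transliteration. The table and function names are illustrative only.

```python
# Lookup tables in Unicode jamo order (19 initials, 21 medials, 28 finals).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa",
           "wae", "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Finals use their syllable-final values; index 0 means "no final consonant".
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m",
          "l", "l", "l", "p", "l", "m", "p", "p", "t", "t", "ng",
          "t", "t", "k", "t", "p", "t"]

FIRST_SYLLABLE = 0xAC00  # 가
LAST_SYLLABLE = 0xD7A3   # 힣


def romanize(text: str) -> str:
    """Letter-by-letter romanization of Hangul syllables; other characters pass through."""
    parts = []
    for char in text:
        code = ord(char)
        if FIRST_SYLLABLE <= code <= LAST_SYLLABLE:
            offset = code - FIRST_SYLLABLE
            initial, rest = divmod(offset, len(MEDIALS) * len(FINALS))
            medial, final = divmod(rest, len(FINALS))
            parts.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            parts.append(char)
    return "".join(parts)


print(romanize("수원화성"))  # suwonhwaseong
```

Because no assimilation rules are applied, 신라 comes out as "sinra" rather than the official "Silla"; any alias produced this way still needs a human check.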

Working on mix'n'match not only provides matches: opening up the Qitem and the kowiki page to check whether there is a match also shows the item's wikidata, together with its need to be supplemented. If Koreans were to work on this, I hope that they would supply some key wikidata properties (instance of, country, ...), making objects, people, places, and events accessible to the rest of the world via queries. (For example, when I realised that 성 at the end of a Hangul string implied "fortress", I added "instance of" fortress to all such items in my download of Qitems in OpenRefine. This instantly added well over 100 South Korean fortresses to wikidata. Immediately a British wikimedian, having found them, set about labelling them, while another (non-Korean-speaking) wikimedian added the kowiki pages' coordinates to wikidata, making them mappable. In other words, just adding "instance of" made these items accessible to the non-Korean-speaking world.)
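The 성 observation can be expressed as a simple filter. The sketch below is a plain-Python equivalent of that kind of OpenRefine filter, not the exact steps used: it takes hypothetical (Qitem, Korean label, has-P31) rows, keeps items without P31 whose label ends in 성, and emits QuickStatements-style tab-separated lines. The QID for "fortress" is passed in as a parameter because it should be looked up on Wikidata rather than copied from an example, and every QID in the sample is a placeholder. Each candidate still needs a human check: 성 also ends Chinese province names such as 광둥성 and many Korean given names.

```python
FORTRESS_SUFFIX = "성"


def fortress_candidates(rows, fortress_qid):
    """Return QuickStatements-style lines proposing P31 = fortress_qid.

    rows: iterable of (qid, korean_label, has_p31) tuples, e.g. exported
    from a spreadsheet or OpenRefine project (hypothetical layout).
    """
    lines = []
    for qid, label, has_p31 in rows:
        # Only propose a statement for items with no "instance of" whose label ends in 성.
        if not has_p31 and label.strip().endswith(FORTRESS_SUFFIX):
            lines.append(f"{qid}\tP31\t{fortress_qid}")
    return lines


# Placeholder QIDs, for illustration only.
sample = [
    ("QID_1", "남한산성", False),  # ends in 성, no P31: proposed
    ("QID_2", "경복궁", False),    # a palace, does not end in 성: skipped
    ("QID_3", "수원화성", True),   # already has P31: skipped
]
for line in fortress_candidates(sample, "QID_FORTRESS"):
    print(line)
```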

Any Korean speaker attending the session would learn how to make their way through a mix'n'match catalogue and would find the work considerably easier than a non-Korean speaker. However, English speakers familiar with mix'n'match would be able to assist and teach during the session, because mix'n'match behaves identically across all languages.

The Encyclopedia of Korean Culture catalogue has been chosen to demonstrate mix'n'match as it provides more information to allow a match than the various NLK catalogues and hence is more rewarding to work with.

In matching, I swap back and forth between English and Korean, both within the encyclopedia and on the kowiki page. I use Google Translate to give a Roman-script transliteration. Searches of wikidata and of any Korean site need to be performed in Hangul, so an open Google Translate window is useful. Use both the kowiki page and the encykor (Encyclopedia of Korean Culture) page to find identifying information to add to the wikidata.
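Searching wikidata in Hangul can also be done through the MediaWiki API's `wbsearchentities` module, which matches labels and aliases in a given language. The sketch below builds such a search URL with Korean as the search language and reduces a response to (ID, label, description) tuples. The helper names are hypothetical, and the response handling assumes the usual `search` list in that module's JSON output; the URL itself can simply be opened in a browser.

```python
import urllib.parse

API_URL = "https://www.wikidata.org/w/api.php"


def hangul_search_url(term: str, limit: int = 10) -> str:
    """Build a wbsearchentities URL that searches Korean labels and aliases."""
    params = {
        "action": "wbsearchentities",
        "search": term,     # the Hangul string, e.g. copied from the encyclopedia entry
        "language": "ko",   # search in Korean
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def summarize(response: dict) -> list:
    """Reduce a wbsearchentities JSON response to (id, label, description) tuples."""
    return [(hit["id"], hit.get("label", ""), hit.get("description", ""))
            for hit in response.get("search", [])]
```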

Qitems with no wikidata cannot be counted, nor can they be queried. This means that the only people who can find the knowledge contained in such wikipages are native speakers of that wiki's language.

It appears that the Korean wikimedian community does not use mix'n'match (https://mix-n-match.toolforge.org/#/). See, for example: https://mix-n-match.toolforge.org/#/catalog/4392 https://mix-n-match.toolforge.org/#/catalog/920 https://mix-n-match.toolforge.org/#/catalog/3992 https://mix-n-match.toolforge.org/#/catalog/3993 https://mix-n-match.toolforge.org/#/catalog/3994 and more.

I wish to see the Encyclopedia of Korean Culture matched to its corresponding wikidata, and have been working hard to that end despite neither speaking nor reading Korean. (My Hangul is at the level of a three-year-old.)

However, as with most tools, the positioning of items, buttons and links (and the colouring of links) in mix'n'match is identical whether one is reading it in Korean, Chinese, or English. I think it is worthwhile to show others how to use mix'n'match, to link some Qitems to this information source, and to improve their wikidata.

Relationship to ESEAP or to the theme

Collaboration beyond the horizon

The various language wikis cover not only global knowledge but also very local knowledge. This user would love to be able to find and count via wikidata such things as schools, fortresses, seowons, seodangs, hyanggyo... However, this is not possible when essential wikidata is missing. Many of the techniques shown in this workshop allow users to improve the wikidata for any language that does not use a Roman script, regardless of whether they can speak the language.

Thus monolingual speakers will become aware of the need to make wikis in non-Roman scripts more accessible, and more capable of tackling the inaccessibility of the many wiki pages that use non-Roman scripts.
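Counting only works once items carry the right statements. A count of one kind of thing in South Korea (Q884) might be sketched as the query below, assuming WDQS's predefined `wd:` and `wdt:` prefixes. Here `class_qid` is a hypothetical parameter for whatever Wikidata class you care about (for example the item for seowon or fortress, to be looked up on Wikidata). Items with no P31 or no P17 never satisfy the pattern, so they are silently left out of the total, which is exactly the gap described above.

```python
SOUTH_KOREA = "Q884"


def count_in_country_query(class_qid: str, country_qid: str = SOUTH_KOREA) -> str:
    """SPARQL counting items that are instances of class_qid (or of one of its
    subclasses) and whose country (P17) is country_qid."""
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} ;
        wdt:P17 wd:{country_qid} .
}}
"""
```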

(A previous talk at Wikimania 2023 (Korean Wikidata) led to collaboration with the Korean Wikimedia foundation.)

I hope that this presentation will lead to further collaboration with the Korean wiki community, as well as with other wiki language communities whose script is not Roman. The aim is to improve wikidata items that cannot currently be found via a SPARQL query.

Username/s

Session type

Workshop: 60-120 minutes

Level

Medium

Some familiarity with Wikidata will make the task easier. However, if this is your first exposure to Wikidata, please come anyway, as Wikidata is a key mechanism for linking beyond our language silos.

Duration

(Workshop) 60 minutes

Session outcomes

At the end of the session, participants

  1. will have matched numerous Qitems to their counterparts in the Encyclopedia of Korean Culture mix'n'match catalogue
  2. will have added to Qitems missing such information:
    1. P31 or P279 as appropriate ("instance of", "subclass of")
    2. some form of "country" (e.g. P17, P27, P291, P495, P1001 or P3005)
  3. will be confident moving between languages as they enhance wikidata
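To check a single item against the outcomes listed above, its JSON from `Special:EntityData` can be inspected for P31/P279 and for any of the country-type properties in the list. This is a hypothetical helper, not part of mix'n'match; it assumes the standard entity JSON layout, where statements sit under `claims` keyed by property ID, and the `User-Agent` is a placeholder.

```python
import json
import urllib.request

COUNTRY_PROPERTIES = ("P17", "P27", "P291", "P495", "P1001", "P3005")


def fetch_entity(qid: str) -> dict:
    """Download one item's JSON from Special:EntityData (needs network access)."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    request = urllib.request.Request(url, headers={"User-Agent": "ekc-workshop-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=30) as response:
        entities = json.load(response)["entities"]
    # If the returned key differs from qid (e.g. after a redirect), use the single entry.
    return entities.get(qid) or next(iter(entities.values()))


def missing_outcomes(entity: dict) -> list:
    """List which of the session's target statements the item still lacks."""
    claims = entity.get("claims", {})
    missing = []
    if "P31" not in claims and "P279" not in claims:
        missing.append("P31 or P279")
    if not any(prop in claims for prop in COUNTRY_PROPERTIES):
        missing.append("country (P17, P27, P291, P495, P1001 or P3005)")
    return missing
```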

After the session

  1. participants will
    1. know how to use mix'n'match, and
    2. continue to use their mix'n'match skills to make the knowledge in their own and other wikis more accessible.
  2. participants can liaise with those who can help in developing new identifiers particular to, and useful to, their language wikis.

My physical presence at ESEAP2024 will permit interactions with other participants outside the formal presentations, and hence make possible the interchange & dissemination of ideas to make language wikis like kowiki less of a 'silo'.