Talk:Project/in situ/trans situ

Welcome to this page, dedicated to discussing desired data models. Feel free to write in whichever language you want, especially in the comments section.

The goal of this page is to allow Wiktionary communities to formalize an ontology that addresses their common requirements, with the aim of providing a dedicated Wikibase instance implementing it.

It comes as an alternative to the per-linguistic-version approach, in situ, which aims at providing as many Wikibase instances as there are Wiktionary linguistic versions, each holding its very own hardwired ontology.

Requirements analysis

This section is specifically dedicated to expressing the requirements that an ontology should meet in order to fulfil the aim of the trans situ approach: creating a data model whose structure will benefit all linguistic versions of Wiktionary, and possibly beyond.

While much work has already been done on the digitization of lexicographic works, including within Wikimedia projects, none, as far as is known, was conducted with serving the needs of existing Wiktionary communities as its core aim.

Distinct linguistic versions of Wiktionary come with more or less distinct communities, which are neither completely sealed off from each other nor completely fluid. This is true both for

  • the people contributing: some contributors can and do contribute to several linguistic versions;
  • the lexicographical works themselves: information can be transferred from one linguistic version to another, possibly adapted and translated.

But these exchanges are structurally limited. No one can contribute to all of the 44 active Wiktionary instances, each project has its own way to structure data, and each community has its specific dynamics and rules despite their common ground. So each project makes its own choices regarding both lexicographical structure and other miscellaneous editorial policies, as well as deciding on a more or less extensive use of the available technical tools, such as basic wikicode, templates, modules and so on.

While this is very nice from the perspective of letting a diversity of analyses coexist, it also lowers the opportunities for cross-pollination, as the common software ground is not exploited with interoperability across linguistic versions in mind.

The trans situ approach is not to make distinct treatments impossible, but to require that structural choices be encoded explicitly. The overall goal is to leave enough flexibility for each community to present different analyses and ways to render them, while easing the transfer of information between analyses.

And to make this approach completely fair, the basic data model should not include any lexicological a priori. So rather than providing a turnkey domain-specific ontology, it should let each community build its own lexicological models within the system and create bridges between them. So what should be proposed is a meta-ontological model, not a lexicological one. Although a term like meta-ontological model may sound rather abstract, the solution should stick to something as concrete as possible. To a large extent, a bare MediaWiki instance already gives users what was just described, and that is what has allowed the different linguistic communities to structure their projects differently.
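
As a purely illustrative sketch, and not a committed design, the following Python fragment shows one way such a meta-model could stay lexicologically neutral: the only built-in notions are a generic statement and community-declared schemas, plus explicit bridges that map one community's analysis onto another's. All class and property names here are hypothetical and only serve to make the idea tangible.

    from dataclasses import dataclass, field

    @dataclass
    class Schema:
        # A lexicological model declared by one community, not imposed by the system.
        community: str
        classes: set[str] = field(default_factory=set)
        properties: set[str] = field(default_factory=set)

    @dataclass
    class Statement:
        # The only generic building block: subject, property, value, under a given schema.
        schema: Schema
        subject: str
        prop: str
        value: str

    @dataclass
    class Bridge:
        # An explicit, community-maintained mapping between two analyses.
        source: Schema
        target: Schema
        property_map: dict[str, str]

        def translate(self, s: Statement) -> "Statement | None":
            # Transfer a statement only when the mapping covers its property.
            if s.schema is self.source and s.prop in self.property_map:
                return Statement(self.target, s.subject, self.property_map[s.prop], s.value)
            return None

Under such a scheme, the transfer of information evoked above becomes an explicit, reviewable operation rather than a manual copy-and-adapt process.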

To start with something very concrete, it's certainly possible to restrict the project to a common concrete object. Even what constitute a lexicographical entry is already too much abstract, especially in a multilingual dictionary such as wiktionarian projects. One of the most concrete object that is conceptualized around language is utterance. Indeed, this term can refer to somewhat any kind of language expression, without throwing a lot of assumption about what it refers to. An utterance doesn't imply any specific modality of expression (oral, written, signed…), language typology (Ancient Greek, Esperanto, slang…) or even a restriction to human emitters (waggle dance, automata intercommunications, "noises" by so called "inanimate objects" such as cosmic microwave background).
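
To make this concrete, here is a minimal, purely hypothetical sketch of what an utterance record could look like as the base entity: it carries only the attested material and its provenance, while every lexicological judgement (language, part of speech, sense…) is pushed into separate, community-scoped annotations. None of these field names belongs to any agreed model; they are assumptions made for illustration only.

    from dataclasses import dataclass, field

    @dataclass
    class Utterance:
        # The bare attested material, with no lexicological interpretation attached.
        content: str                 # transcription, text, or a pointer to a media file
        modality: str                # e.g. "written", "oral", "signed"
        source: str                  # where the utterance was attested
        annotations: list["Annotation"] = field(default_factory=list)

    @dataclass
    class Annotation:
        # A community-scoped analysis of an utterance, always explicit about its origin.
        community: str               # e.g. "fr.wiktionary"
        prop: str                    # e.g. "language", "part of speech"
        value: str

    # Usage: record first, analyse later, and keep the two clearly separated.
    u = Utterance(content="bonjour", modality="written", source="constructed example")
    u.annotations.append(Annotation(community="en.wiktionary", prop="language", value="French"))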

So this approach, while not incompatible with integrating what other conceptualizations offer, departs from a radically different perspective.

For example, the OntoLex Lemon Lexicography Module does not mention the word utterance even once. Right from the start, it rushes to provide very abstract concepts. As Larry Masinter stated as early as 2009:[1]

[…] as an organization, W3C can, and should, define languages in which the meaning is defined in the document, in terms of abstractions rather than in terms of operational behavior.

On the contrary, this proposal attempts to focus on operational behavior first. Concrete utterances are always the starting point. Concepts and abstractions come later, as a way to rationalize their diversity. Both should be explicitly stated in models, so that no one is left with the delusion that the theory is more authoritative than the bare records which inspired it. To be clear, even a meta-model such as the one proposed here is no exception. At best it can bring more flexibility and state more explicitly that every analysis rests on some motivated postulates; but it does not lift this theoretical apparatus above its own foundation of motivated postulates. At the end of the day, the unfathomable diversity of reality predominates over our schemes to encompass it.

Data model proposals

This section is here to host your data model proposals. Obviously, the requirements analysis should be completed first in order to do something really relevant, but if you can't resist stubbing something without delay, here is the place for it.

Miscellaneous comments and ideas

Here is the section for all suggestions and feedback that don't fit in the other sections.

References
