Assessment of Wiktionary and Wikidata coverage of a sample of concepts (entities) extracted from scientific and technical (ST) sources.
Assessment of high-quality dictionaries

This is an individual research activity. The planned elapsed time for the project is 4 weeks, half time (I will be on partial leave from work and will not be paid by my company for this leave). I will use samples of scientific and technical (ST) terms extracted from the literature (open documents) and analyze the coverage and quality of these sample terms in Wiktionary and in Wikidata. The objective of this analysis is to identify the differences between Wiktionary and Wikidata. In the future, these differences could also be used to implement an automatic procedure to mutually enrich Wikidata and Wiktionary terms, also taking Wikipedia into account; in any case, such an extension will be proposed only after this quick evaluation project.
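The coverage comparison could be computed along these lines. This is a minimal sketch, assuming the lookups against Wikidata and Wiktionary have already been done and the matching sample terms collected into sets; the function `compare_coverage`, the trim-and-lowercase `normalize` rule, and the report fields are hypothetical choices for illustration, not a defined project method. Quality assessment is not modeled here, and the example terms are illustrative inputs, not real lookup results.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


def normalize(term: str) -> str:
    # Hypothetical normalization: trim whitespace and lowercase.
    # Real matching would also need to handle case-sensitive entries,
    # spelling variants, and multi-word terms.
    return term.strip().lower()


@dataclass
class CoverageReport:
    total: int
    in_wikidata: int
    in_wiktionary: int
    in_both: int
    only_wikidata: set[str]
    only_wiktionary: set[str]
    in_neither: set[str]

    @property
    def wikidata_coverage(self) -> float:
        # Share of sample terms found in Wikidata (0.0 for an empty sample).
        return self.in_wikidata / self.total if self.total else 0.0

    @property
    def wiktionary_coverage(self) -> float:
        # Share of sample terms found in Wiktionary (0.0 for an empty sample).
        return self.in_wiktionary / self.total if self.total else 0.0


def compare_coverage(
    sample: Iterable[str],
    wikidata_hits: Iterable[str],
    wiktionary_hits: Iterable[str],
) -> CoverageReport:
    terms = {normalize(t) for t in sample}
    # Keep only hits that actually belong to the sample.
    wd = terms & {normalize(t) for t in wikidata_hits}
    wkt = terms & {normalize(t) for t in wiktionary_hits}
    return CoverageReport(
        total=len(terms),
        in_wikidata=len(wd),
        in_wiktionary=len(wkt),
        in_both=len(wd & wkt),
        only_wikidata=wd - wkt,      # candidates to enrich Wiktionary
        only_wiktionary=wkt - wd,    # candidates to enrich Wikidata
        in_neither=terms - wd - wkt,
    )


if __name__ == "__main__":
    # Illustrative inputs only, not real lookup results.
    report = compare_coverage(
        sample=["graphene", "CRISPR", "qubit", "metamaterial"],
        wikidata_hits={"graphene", "crispr", "qubit"},
        wiktionary_hits={"graphene", "qubit", "metamaterial"},
    )
    print(report.wikidata_coverage, report.wiktionary_coverage)  # 0.75 0.75
    print(report.only_wikidata, report.only_wiktionary)  # {'crispr'} {'metamaterial'}
```

The `only_wikidata` and `only_wiktionary` sets are where the differences between the two projects would show up, which is the kind of output a later mutual-enrichment procedure might consume.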

I will stay in regular contact with the research community through the Wiki-research-l mailing list. There are two kinds of audiences: a) the research community around Wikimedia, mainly in natural language technologies; b) Wikimedia users interested in scientific and technical terms.

The deliverable will be an open-access paper (also peer-reviewed) presenting the results of this project and discussing, in particular, the differences in coverage and quality between Wikidata and Wiktionary for terms sampled from the scientific and technical literature. Such terms are both relevant to some users and challenging from a scientific point of view (they include many neologisms). The added value for Wikimedia projects will be a baseline for the future evolution of Wiktionary and Wikidata in the scientific and technical area. The paper will be indexed by Google Scholar and shared and discussed on the Wiki-research-l mailing list, also after the end of the project.


The final report, published as a paper, will document results for a significant sample of ST terms (at least 1,000 terms across different ST domains).


I will leverage my experience as a natural language processing engineer (see scholar:) and as a Wikimedia contributor since 2005.

  • The planned elapsed time for the project is 4 weeks, and I will work on it half time. During the project I will be on leave from work and will not be paid by my company for this leave. Hence, I request a partial reimbursement of 1,928.82 USD for this activity (full reimbursement would be 2,500 USD).