Research:Adapting Wikidata to support clinical practice using Data Science, Semantic Web and Machine Learning

Created
20:05, 30 May 2022 (UTC)
Duration:  2022-08 – 2024-01
Wikidata, Knowledge Graph Validation, Biomedical relation classification, Biomedical relation extraction




This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


Description

Semantic resources have proven effective at driving computer applications in a variety of domains, particularly healthcare[1]. They provide detailed knowledge about many aspects of medicine, including diseases, drugs, genes, and proteins[2], and can consequently be used to retrieve, process, and represent clinical information such as electronic health records[3] and scholarly publications[4]. These databases enable biomedical relation extraction[4], biomedical relation classification[5], biomedical data validation[6], biomedical data augmentation[7], and biomedical decision support[8]. However, the implementation of such biomedical resources in the Global South, particularly in Africa, is still limited by the lack of consistent funding and human capacity[9]. Here, open knowledge graphs, particularly Wikidata, can help reduce the financial and technical burden of developing digital health in developing countries[2]. As a free, collaborative, large-scale multilingual knowledge graph, Wikidata has become an established database that can represent multiple kinds of clinical information, particularly in the context of COVID-19[10]. Its representation in the Resource Description Framework (RDF) format enables the flexible enrichment of biomedical information using computer programs and crowdsourcing, the intrinsic and extrinsic validation of clinical knowledge, and the extraction of features from medical data for decision making and for human and machine learning[10]. Yet Wikidata still lacks a full representation of several facets of biomedical informatics[2], and its data suffers from critical inconsistencies[11]. For instance, Wikidata items about genes, proteins, and drugs have an average of more than 10 statements per item, whereas anatomical structures have an average of fewer than 4.6 statements per item[2].
Furthermore, more than 90% of the Wikidata statements about human genes and proteins are supported by references, whereas fewer than 50% of the statements about anatomical structures have references[2]. Moreover, the linguistic representation of biomedical entities in Wikidata is dominated by German and English, while other natural languages are only partially or rarely covered[10].

To address these concerns, this project aims to:

  • Turn Wikidata into a large-scale biomedical semantic resource that covers most aspects of clinical practice in a significant way (S1): This will be achieved by developing bots and tools that mass-import clinical information from external resources already aligned with Wikidata, and by creating machine learning algorithms that extract clinical information from the full texts and bibliographic information of scholarly publications indexed in PubMed, a large-scale bibliographic database hosted by the National Center for Biotechnology Information and the National Institutes of Health. This involves enriching the facets of biomedical knowledge already represented in Wikidata and supporting new kinds of clinical information that Wikidata has not covered so far.
  • Validate the biomedical information freely available in Wikidata (S2): This will be done through comparison with external resources, using the semantic alignments between Wikidata items and external biomedical resources, and through intrinsic validation, using SPARQL to identify mismatches between statements and shape-based methods such as ShEx and SHACL, as well as property constraints, to verify the formatting and data modeling of clinical knowledge in Wikidata. These methods will be coupled with Wikidata Game-like tools for the human validation of medical information in Wikidata.
  • Promote the biomedical use of Wikidata in the Global South (S3): This will be pursued through online capacity-building events about Wikidata and its tools for the biomedical community in Africa, and through the publication of surveys and position papers on biomedical applications of Wikidata in recognized research journals. The integration of Wikidata into Fast Healthcare Interoperability Resources (FHIR), an interoperability standard for electronic health records, is also envisioned to enable the use of human-validated subsets of Wikidata information for clinical reasoning in the Global South.

This project only uses public data and text, and never touches private, restricted, or personal data or health information. Completing these three tasks will not only significantly improve Wikidata as a secondary knowledge base for medical information but also yield a framework for curating other well-known medical knowledge graphs such as Disease Ontology. Because the project is reproducible, its methods can also be reused to enrich Wikimedia projects with knowledge about other research areas, such as social science and computer science, from open resources, particularly knowledge graphs and bibliographic databases.

Methods

Describe in this section the methods you'll be using to conduct your research. If the project involves recruiting Wikimedia/Wikipedia editors for a survey or interview, please describe the suggested recruitment method and the size of the sample. Please include links to consent forms, survey/interview questions and user-interface mock-ups.

Principles

Research

Using SPARQL and APIs for biomedical data validation and enrichment

As a query language for RDF knowledge graphs, SPARQL is a very useful tool for retrieving specific pieces of information, including inconsistencies[11]. APIs, in turn, are interfaces that allow computer programs to interact with open knowledge resources[2]. As an open knowledge graph in the RDF format, Wikidata has a SPARQL endpoint that allows semantic data to be extracted from Wikidata and represented in a variety of layouts, including tables, plots, graphs, and maps. This query service also supports federated queries with a number of external knowledge databases that expose SPARQL endpoints. Quest is a tool that generates QuickStatements instructions to add, modify, or delete Wikidata statements based on the output of a Wikidata SPARQL query. We can consequently use Quest to enrich and adjust Wikidata based on logical constraints implemented in SPARQL and, possibly, on query federation with other knowledge resources, particularly OBO Foundry. Wikidata also has an API that can be accessed through a Python library named Wikibase Integrator. This library can automatically read and edit Wikidata, and can therefore be used together with other Python libraries such as Biopython to enrich and adjust Wikidata through comparison with other knowledge resources such as MeSH. In this research project, we will run several SPARQL queries through Quest to automatically enrich and adjust Wikidata based on logical constraints, and we will use Wikibase Integrator with other Python libraries to add new Wikidata statements and to add references to existing ones through comparison with other knowledge resources.
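The idea of turning the output of a constraint query into edit instructions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Quest works internally: the constraint (a disease should list a drug under P2176 whenever the drug lists the disease under P2175) is chosen as an example, the helper names `run_query`, `entity_id`, and `to_quickstatements` are hypothetical, and the generated lines follow the tab-separated QuickStatements V1 layout without references. Any such output would still need human review before being submitted.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative logical constraint: if a drug has "medical condition treated"
# (P2175) pointing to a disease, the disease is expected to have
# "drug or therapy used for treatment" (P2176) pointing back to the drug.
MISSING_INVERSE_QUERY = """
SELECT ?disease ?drug WHERE {
  ?drug wdt:P2175 ?disease .
  FILTER NOT EXISTS { ?disease wdt:P2176 ?drug . }
}
LIMIT 100
"""


def run_query(query, endpoint=WDQS_ENDPOINT):
    """Send a SPARQL query and return the result bindings (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "constraint-sketch/0.1 (illustrative example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def entity_id(uri):
    """Turn 'http://www.wikidata.org/entity/Q42' into 'Q42'."""
    return uri.rsplit("/", 1)[-1]


def to_quickstatements(bindings, subject_var, prop, object_var):
    """Build one tab-separated 'subject, property, value' line per result row."""
    lines = []
    for row in bindings:
        subject = entity_id(row[subject_var]["value"])
        value = entity_id(row[object_var]["value"])
        lines.append("\t".join([subject, prop, value]))
    return "\n".join(lines)


# Made-up result row in the SPARQL JSON results layout (Q111 and Q222 are placeholders).
sample_rows = [
    {
        "disease": {"type": "uri", "value": "http://www.wikidata.org/entity/Q111"},
        "drug": {"type": "uri", "value": "http://www.wikidata.org/entity/Q222"},
    }
]
print(to_quickstatements(sample_rows, "disease", "P2176", "drug"))
```

In practice, `run_query(MISSING_INVERSE_QUERY)` would replace `sample_rows`; the network call is kept out of the top-level code so that the formatting step can be checked offline.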

Using ShEx for shape-based biomedical data validation

Shape Expressions (ShEx) is a semantic web language that describes RDF graph structures[12]. It has been extended to validate Wikidata statements against shape-based constraints[12]. Currently, there is a database of ShEx-based EntitySchemas, each of which is used to validate the data model of a particular category of Wikidata items. There is also a JavaScript tool, CheckShEx, that validates a Wikidata item against an EntitySchema. Despite these significant advances, most Wikidata classes related to biomedicine are not supported by ShEx-based EntitySchemas. A list of currently available EntitySchemas can be found here. WikiProject COVID-19 tried to bridge this gap by developing a series of EntitySchemas related to medicine. The list of COVID-19-related EntitySchemas is accessible here. However, a large number of biomedical classes still lack EntitySchemas, particularly those not linked to COVID-19 and infectious diseases. Furthermore, there is no automated way to link a Wikidata item to its respective EntitySchemas for validation purposes. A Wikidata property has been proposed to solve this problem, but the property proposal is still on hold. Further information can be found at the proposal page. The problem with such an approach is that it cannot be scaled to support groups of Wikidata items that are defined by conditions beyond "instance of" relations. Here, we propose to manually add EntitySchemas to support all kinds of biomedical Wikidata entities. We will also try to develop two tools to enhance how EntitySchemas are defined and reused across Wikidata. The first tool will create new EntitySchemas from existing ones using embeddings. The second tool will infer the EntitySchemas that apply to a Wikidata item, enabling automatic validation of the knowledge graph.
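One possible way to select schemas by conditions that go beyond "instance of" is a small rule registry, sketched below. Everything here is an assumption made for illustration: the schema identifiers (`E-disease`, `E-do-aligned`) are placeholders rather than real EntitySchema IDs, the `SchemaRule` structure and helper names are hypothetical, and an item is reduced to a dictionary mapping property IDs to sets of values. The sketch only selects candidate schemas; the actual validation would still be done by a ShEx validator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Simplified view of an item: property ID -> set of values (QIDs or strings).
Claims = Dict[str, Set[str]]


@dataclass(frozen=True)
class SchemaRule:
    schema_id: str                     # placeholder identifier, not a real EntitySchema
    description: str
    applies: Callable[[Claims], bool]  # condition deciding whether the schema is relevant


def has_value(prop: str, value: str) -> Callable[[Claims], bool]:
    """Condition: the item has the given value for the given property."""
    return lambda claims: value in claims.get(prop, set())


def has_property(prop: str) -> Callable[[Claims], bool]:
    """Condition: the item has at least one statement for the given property."""
    return lambda claims: bool(claims.get(prop))


RULES: List[SchemaRule] = [
    # "instance of" (P31) disease (Q12136)
    SchemaRule("E-disease", "items typed as diseases", has_value("P31", "Q12136")),
    # condition beyond "instance of": the item carries a Disease Ontology ID (P699)
    SchemaRule("E-do-aligned", "items aligned with Disease Ontology", has_property("P699")),
]


def infer_schemas(claims: Claims, rules: List[SchemaRule] = RULES) -> List[str]:
    """Return the identifiers of all schemas whose condition holds for the item."""
    return [rule.schema_id for rule in rules if rule.applies(claims)]


# Made-up item; the DOID value is a placeholder.
item = {"P31": {"Q12136"}, "P699": {"DOID:0000000"}}
print(infer_schemas(item))
```

Keeping each condition as a plain function makes it easy to add rules that combine several properties, which a single item-to-schema property could not express.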

Mapping biomedical informatics and Wikidata research and development

For years, biomedical informatics research has evolved to cover multiple aspects of clinical practice, using state-of-the-art techniques such as machine learning, information retrieval, image and signal processing, big data, and pre-trained language models[13]. Similarly, Wikidata research and development has been growing since 2012 to follow the changes in the multilingual and multidisciplinary coverage of the knowledge graph[14]. Here, bibliometrics can be very useful for assessing research outputs about biomedical informatics and Wikidata, as it uses the bibliographic metadata of scholarly publications to provide insights into the research publishing patterns of a community[13]. Likewise, empirical software engineering applies empirical research methods to the characteristics of a set of code repositories, including source code, pull requests, discussions, and issues, to study and evaluate the software engineering behaviors behind the tools developed for a given topic[15]. In this research project, we will extract bibliographic metadata for scholarly publications related to two research fields (Machine Learning for Healthcare in Africa and Wikidata Research) from Scopus, a curated large-scale bibliographic database maintained by Elsevier. Then, we will analyze them using four techniques:

  • Publishing Patterns and Time-Aware Analysis: We quantitatively analyze the most common values of every type of bibliographic metadata, including author information, source information, titles, abstracts, research fields, and open-access status. We then repeat the same analysis restricted to several periods, so that we can assess how research production in the considered area has evolved.
  • Network Analysis: We consider four types of bibliographic associations: Citation, Co-Citation, Co-authorship, and Bibliographic Coupling. For every kind of association, we construct networks for authors, sources, countries, and documents in multiple periods to assess how the field has been structured. We use Total Link Strength weighting to better visualize the nodes that contributed most to the establishment of the bibliometric networks. We use VOSviewer, a freely available software tool for generating bibliometric networks from the data of bibliographic databases, to generate our data visualizations[16].
  • Keyword Analysis: As author-generated and Scopus-generated keywords do not cover all aspects of the analyzed scholarly publications, we augment the data by extracting MeSH Keywords from PubMed, where applicable, using Biopython[17]. As almost all the research papers are written in English, we also use SpaCy pre-trained models to extract noun phrases from titles and abstracts and add them to the list of keywords[18]. Once this is done, we align the keywords to their corresponding Wikidata items using OpenRefine[19], an open-source software tool for tabular data cleaning and reconciliation. After that, we generate the list of the most common keywords by type and period, and we construct the keyword association networks for the field to study how each research topic interacts with the others.
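The network construction described in the list above can be illustrated with a small co-occurrence computation. This is a simplified sketch and not a reimplementation of VOSviewer: it assumes full counting (each shared publication adds 1 to a link), takes the total link strength of a node to be the sum of the weights of its links, and uses made-up records with hypothetical field names (`year`, `authors`). The same functions apply to keywords, sources, or countries.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(records):
    """Count how often each unordered pair of items appears in the same record."""
    links = Counter()
    for items in records:
        for a, b in combinations(sorted(set(items)), 2):
            links[(a, b)] += 1
    return links


def total_link_strength(links):
    """Sum, for every node, the weights of all links attached to it."""
    strength = Counter()
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return strength


def links_for_period(papers, field, start_year, end_year):
    """Build the co-occurrence links using only papers published within a period."""
    selected = (p[field] for p in papers if start_year <= p["year"] <= end_year)
    return cooccurrence_links(selected)


# Made-up bibliographic records.
papers = [
    {"year": 2019, "authors": ["A", "B", "C"]},
    {"year": 2021, "authors": ["A", "B"]},
    {"year": 2021, "authors": ["B", "D"]},
]

all_links = cooccurrence_links(p["authors"] for p in papers)
print(all_links.most_common())
print(total_link_strength(all_links).most_common())
print(links_for_period(papers, "authors", 2021, 2021))
```

Running the same computation per period, as in `links_for_period`, gives the time-aware view of how the network structure changes.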

After finishing this part, we will use the keywords generated for the research publications about machine learning for healthcare in Africa to classify these papers by research topic. We will then invite a number of African machine learning enthusiasts to write an overview of the research on each topic, in order to develop a survey of the field. Beyond this, we will extract detailed information about Wikidata- and Wikipedia-related repositories on GitHub to assess how the Wikimedia Technical Community uses these two Wikimedia projects in tools and to find out what should be done to strengthen the Wikimedia Technical Community, particularly in Africa.

MeSH Keywords for enriching clinical information in Wikidata

Currently, almost all machine learning algorithms for biomedical relation extraction rely on the full texts of scholarly publications[17]. However, the bibliographic metadata of research publications are easier to parse and more structured than full texts, and they provide significant insight into a paper's research findings[17]. Recently, a new field, Bibliometric-Enhanced Information Retrieval, has emerged to extract scientific knowledge from the bibliographic metadata of scholarly publications using information retrieval, semantic web, and machine learning techniques[17]. In this research project, we are mainly interested in the bibliographic metadata of biomedical scholarly publications indexed in PubMed, a bibliographic database of biomedical research publications maintained by NCBI[17]. In particular, the MeSH Keywords of PubMed publications are bibliographic data that can be used to enrich clinical information in Wikidata[17]. These keywords are controlled (derived from Medical Subject Headings) and follow a fixed layout (Heading/Qualifier), which makes them easy to process and allows their semantic alignment to Wikidata items through the MeSH Descriptor ID property. This interaction between MeSH Keywords and Wikidata relies on two Python libraries: Biopython and Wikibase Integrator. In preliminary work, we predicted the type of semantic relation between two MeSH Keywords based on the association of their qualifiers in PubMed[17]. The classification algorithm returns a Wikidata property (195 relation types) as well as a first-order semantic relation type metaclass (5 superclasses) that corresponds to the analyzed relation[17]. We achieved an accuracy of 70.78% for the class-based classification and 83.09% for the superclass-based classification[17].
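To make the notion of qualifier association concrete, the sketch below counts, for two MeSH headings, how often each pair of qualifiers appears together in the same publication. It is a simplified illustration and not the MeSH2Matrix implementation: the keyword lists are made up, the function names are hypothetical, and the parser assumes keywords written as `Heading/Qualifier` (optionally with several qualifiers and with `*` marking major topics).

```python
from collections import Counter, defaultdict


def parse_mesh_keyword(keyword):
    """Split 'Heading/Qualifier[/Qualifier...]' into (heading, [qualifiers]).

    A leading '*' (major-topic marker) is dropped from each part.
    """
    heading, *qualifiers = [part.strip().lstrip("*") for part in keyword.split("/")]
    return heading, qualifiers


def qualifier_association(publications, heading_a, heading_b):
    """Count qualifier pairs (qualifier of A, qualifier of B) seen in the same publication."""
    counts = Counter()
    for keywords in publications:
        qualifiers = defaultdict(set)
        for keyword in keywords:
            heading, found = parse_mesh_keyword(keyword)
            qualifiers[heading].update(found)
        for qa in qualifiers.get(heading_a, ()):
            for qb in qualifiers.get(heading_b, ()):
                counts[(qa, qb)] += 1
    return counts


def to_matrix(counts, row_qualifiers, col_qualifiers):
    """Lay the pair counts out as a matrix with a fixed qualifier order."""
    return [[counts[(r, c)] for c in col_qualifiers] for r in row_qualifiers]


# Made-up MeSH keyword lists, one list per publication.
publications = [
    ["Hepatitis C/*drug therapy", "Sofosbuvir/therapeutic use", "Humans"],
    ["Hepatitis C/drug therapy", "Sofosbuvir/therapeutic use/adverse effects"],
    ["Hepatitis C/epidemiology"],
]

counts = qualifier_association(publications, "Hepatitis C", "Sofosbuvir")
print(to_matrix(counts, ["drug therapy", "epidemiology"],
                ["therapeutic use", "adverse effects"]))
```

A matrix of this kind, built over a fixed list of qualifiers, is the sort of input a relation classifier could consume; the real keyword lists would come from PubMed records rather than from hand-written examples.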
In this research project, we plan to study the mechanism behind our biomedical relation classification approach using a generalization-based accuracy analysis[20] and Integrated Gradients[21] as model explainability methods. We will use the results to make our proposed approach, MeSH2Matrix, more accurate and to drive the search for PubMed references for unsupported biomedical relations in Wikidata. Moreover, we will try to combine corpus-based semantic similarity measures with the MeSH2Matrix approach to extract semantic relations from MeSH Keywords and add them to Wikidata using a Wikidata Game-like Toolforge tool.
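The difference between class-based and superclass-based accuracy can be shown with a toy computation. This is one simple reading of a generalization-based analysis, not the procedure of the cited work: the grouping of properties into superclasses below is hypothetical, and the labels and predictions are made up. It only illustrates why accuracy cannot decrease when predictions are compared at the coarser level.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def generalize(labels, superclass_of):
    """Replace every class label by its superclass."""
    return [superclass_of[label] for label in labels]


# Hypothetical grouping of Wikidata properties into superclasses.
SUPERCLASS_OF = {
    "P2175": "treatment",  # medical condition treated
    "P2176": "treatment",  # drug or therapy used for treatment
    "P780": "symptom",     # symptoms and signs
    "P828": "cause",       # has cause
}

# Made-up gold labels and model predictions.
y_true = ["P2175", "P2176", "P780", "P828"]
y_pred = ["P2176", "P2176", "P780", "P780"]

class_accuracy = accuracy(y_true, y_pred)
superclass_accuracy = accuracy(
    generalize(y_true, SUPERCLASS_OF), generalize(y_pred, SUPERCLASS_OF)
)
print(class_accuracy, superclass_accuracy)
```

Errors that disappear after generalization (here, P2175 predicted as P2176) are confusions between sibling classes, which is the kind of pattern such an analysis is meant to expose.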

Expanding the coverage and real-world applications of biomedical knowledge in Wikidata

Currently, Wikidata is mainly used to support semantic information and to develop educational dashboards about genomes, diseases, drugs, and proteins[2]. Yet biomedical knowledge is broader than this and needs to be represented in a less skewed way in Wikidata, so that this open ontological database can be used in various contexts of clinical practice[2]:

  • Despite the significant representation of biomedical knowledge in Wikidata, many classes and relation types are still not well covered in this open knowledge graph[2]. These include symptoms, syndromes, clinical trials, disease outbreaks, classifications, and surgeries, among other important types of biomedical items[2]. In this research project, we will try to add new Wikidata properties related to biomedicine, as we did for risk factor and medical indication. We will also define new classes for unsupported types of medical entities, together with data models for describing them in Wikidata, as we did for clinical trials[22].
  • Due to their structured format, knowledge graphs can be easily processed to extract features about a topic, using SPARQL as a query language or APIs as programmatic interfaces. In particular, the Wikidata API can be accessed through Wikibase Integrator to drive knowledge-based systems[10]. Similarly, the outputs of SPARQL queries can be embedded in HTML pages to create real-time dashboards for various applications[10]. In this research project, we will explain how computer scientists and medical specialists can use Wikidata to improve their work through a series of opinion papers and implementation papers. The applications we will deal with include the use of Wikidata to:
    • Support clinical decisions, health research, and medical education by driving FHIR RDF-based structured electronic health records
    • Create real-time bibliometric studies to evaluate health research and predict award-winning research
    • Augment keyword analysis for bibliometric analyses and literature reviews
    • Support biomedical ontology engineering
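Embedding a live query result in a web page, as mentioned in the list above, can be sketched as follows. The sketch relies on the embed page of the Wikidata Query Service, which takes the URL-encoded query after the `#`; the helper names, the example query, and the iframe dimensions are assumptions chosen for illustration.

```python
from html import escape
from urllib.parse import quote

EMBED_BASE = "https://query.wikidata.org/embed.html#"


def embed_url(sparql):
    """Return the query-service embed URL for a SPARQL query."""
    return EMBED_BASE + quote(sparql.strip(), safe="")


def iframe_snippet(sparql, width="100%", height="400"):
    """Return an HTML iframe that shows the live result of the query."""
    src = escape(embed_url(sparql), quote=True)
    return f'<iframe src="{src}" width="{width}" height="{height}"></iframe>'


# Illustrative query: number of items that carry a Disease Ontology ID (P699).
EXAMPLE_QUERY = """
SELECT (COUNT(?item) AS ?count) WHERE {
  ?item wdt:P699 ?doid .
}
"""

print(iframe_snippet(EXAMPLE_QUERY))
```

Because the iframe re-runs the query each time the page loads, a dashboard built this way reflects the current state of Wikidata without any server-side code.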

Dissemination

This project aligns with three points of the 2030 Wikimedia Strategic Direction: “Coordinate Across Stakeholders”, “Increase the Sustainability of Our Movement”, and “Identify Topics for Impact”. Developing a framework to update, enrich, and validate biomedical knowledge in Wikidata will help ensure better data quality for Wikidata in the healthcare context. Such quality in a freely available resource will increase the trustworthiness of Wikidata as a reference for physicians, pharmacists, and other medical professionals, which in turn will support better patient management and health education in the Global South. It will also help close representation gaps related to medical content for content, contributors, and readers, as defined in the knowledge gaps taxonomy. Extending the use of Wikidata for clinical practice will allow knowledge-based medical systems to be created at a low cost. This will contribute to three UN Sustainable Development Goals: “Good Health and Well-Being” (SDG 3), “Quality Education” (SDG 4), and “Sustainable Cities and Communities” (SDG 11). From the perspective of the Wikimedia Movement, the project will serve as a reference for Wikimedia affiliates and communities from Africa, particularly Wikimedia Tunisia and the African Wikimedia Developers Project, if they would like to continue working on the medical output of Wikidata and create projects about biomedical applications of Wikidata, or if they would like to formulate a research project and apply for the next editions of the Wikimedia Research Fund.

To measure the success of our research project, several objective metrics can be used to evaluate the reach and productivity of our upcoming work:

  • Number of scholarly publications in Scimago Q1 computer science and medical research journals: 3+
  • Number of proceedings papers in the main tracks of CORE A or A* scholarly conferences: 1+
  • Number of proceedings papers in the Workshops of CORE A or A* scholarly conferences: 2+
  • Number of office hours: 6+
  • Number of presentations in Wikimedia conferences: 3+
  • Number of attendees to office hours: 30+ per session

For the dissemination of this project, we envision publishing most of our research results in recognized scholarly journals as Open Access publications. We look forward to presenting our efforts in Wikimedia Research-related venues such as Wiki Workshop, Wikidata Workshop, and Wiki-M3L, as well as in premier scholarly conferences for Knowledge Engineering and Machine Learning (CORE A*), particularly SIGIR and WWW. We will publish our source code on GitHub under the MIT License for reproducibility purposes. We will participate in Wikimedia conferences (e.g., WikiArabia, WikiIndaba, Wikimania, and WikidataCon) to disseminate the outcomes of our work to the Wikimedia community. We will organize regular office hours where we demonstrate our tools live on YouTube and Zoom to the information retrieval, semantic web, biomedical informatics, and clinical medicine communities. All this work will be carried out in collaboration with SisonkeBiotik, an African community for machine learning in healthcare.

Timeline

Please provide in this section a short timeline with the main milestones and deliverables (if any) for this project.

Aim Task Description
S1 S1.A1 Month 1 - Month 12: Enriching Wikidata with biomedical knowledge available in external resources
  • Developing Wikidata bots and tools that use semantic alignments (e.g., Disease Ontology ID) between Wikidata items and their equivalents in external resources to extract and validate semantic relations between Wikidata items from these resources and mass import them to the Wikidata knowledge graph.
  • Applying machine learning models, semantic similarity, and natural language processing techniques to the bibliographic metadata available in PubMed, mainly the MeSH keywords, to extract and classify biomedical relations between Wikidata items. MeSH2Matrix has been developed by SisonkeBiotik and the Data Engineering and Semantics Research Unit as an approach for the MeSH keyword-based classification of Wikidata relations[17]. MeSH2Matrix will be used as a pillar for applying Bibliometric-Enhanced Information Retrieval to automatically enrich and validate biomedical relations in Wikidata.
S1.A2 Month 1 - Month 12: Enriching Wikidata with biomedical knowledge available in external resources
  • Adding new properties and classes based on our experience in this context. We have already proposed new Wikidata properties (e.g., risk factor and medical indication) and we have also added support for new Wikidata classes from scratch (e.g., COVID-19 app). We can reproduce this experience when needed. An example is the inclusion of knowledge about clinical trials in the Wikidata knowledge graph within the framework of WikiProject Clinical Trials[22].
S2 S2.A1 Month 4 - Month 10: Developing bots and tools for the cross-validation of Wikidata biomedical information from external resources
  • This work will build on the efforts of Wikimedia Deutschland in this context, particularly the Reference Island Project, which assigns a reference to a Wikidata statement when the statement exists in an external knowledge resource. The bots will verify whether Wikidata statements are available in external resources (e.g., Disease Ontology) based on semantic alignments (e.g., Disease Ontology ID) in Wikidata. Toolforge tools will combine the analysis of the bibliographic metadata of scholarly publications with human validation to decide whether a biomedical relation in Wikidata is accurate. RefB, funded by the WikiCred Grant Initiative, is an example of preliminary work on the use of PubMed data mining for validating Wikidata biomedical relations and adding reference support to them.
S2.A2 Month 4 - Month 10: Developing SPARQL-based approaches for the intrinsic validation of Wikidata clinical knowledge
  • Although shape-based methods (ShEx and SHACL) are useful, they do not allow Wikidata statements to be verified by comparing their values. Here, SPARQL can be used to identify mismatches between the values of two statements. In the context of the COVID-19 pandemic, we developed a SPARQL-based method for the validation of the epidemiological data about the disease[11]. We look forward to extending our work to cover other biomedical use cases of SPARQL-based validation that cannot be fulfilled by shape-based methods.
S2.A3 Month 4 - Month 10: Developing EntitySchemas in ShEx for validating the shape and representation of medical concepts in Wikidata
  • We will build upon the efforts of WikiProject COVID-19 to develop data models for supporting biomedical entities related to the ongoing pandemic. We will reuse the data modeling output of WikiProject COVID-19 and extend it to cover other aspects of clinical practice.
S3 S3.A1 Month 1 - Month 12: Organize office hours to demonstrate Wikidata and its medical outputs
  • We will build upon the success of our previous presentations at Wikimedia conferences (e.g., Wikimania 2019 and WikiArabia 2021) on the matter and customize our materials to make them better suited to the healthcare industry and the computer science community in Africa.
  • We will deal with the technical side of reusing Wikidata in intelligent systems for clinical practice, which was not covered in our previous presentations but was used in our research on the topic (e.g., Wikibase Integrator, MediaWiki API, and SPARQL). We will also show examples of several clinical applications where Wikidata can be very useful, based on what we have presented at Wikimedia conferences and in peer-reviewed research venues.
S3.A2 Month 1 - Month 12: Publishing scholarly publications about Wikidata-driven biomedical applications and about the management of biomedical information in Wikidata
  • Developing a roadmap for integrating Wikidata with Fast Healthcare Interoperability Resources and other open knowledge resources to enable the reuse of Wikidata in knowledge-based biomedical systems
  • Publishing research works in indexed scholarly journals about how to practically use Wikidata in the clinical context
  • Publishing research works in indexed scholarly journals about how to manage clinical information in Wikidata
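As an illustration of the value-comparison checks planned in task S2.A2, the sketch below pairs a SPARQL query with an equivalent local check. It is a minimal example under stated assumptions and not the method of the cited work: the rule that an item should not report more deaths (P1120) than cases (P1603) is chosen for illustration, the constraint list and function names are hypothetical, and the records are made up. Real items may hold several dated values per property, which a production check would have to align by date.

```python
# Illustrative consistency rule expressed in SPARQL: an item should not report
# more deaths (P1120) than cases (P1603).
DEATHS_EXCEED_CASES_QUERY = """
SELECT ?item ?cases ?deaths WHERE {
  ?item wdt:P1603 ?cases ;
        wdt:P1120 ?deaths .
  FILTER(?deaths > ?cases)
}
"""

# The same idea as named rules over already-downloaded records.
CONSTRAINTS = [
    ("deaths <= cases", lambda r: r["deaths"] <= r["cases"]),
    ("counts are non-negative", lambda r: r["deaths"] >= 0 and r["cases"] >= 0),
]


def find_violations(records, constraints=CONSTRAINTS):
    """Return (item, rule name) for every rule that a record breaks."""
    violations = []
    for record in records:
        for name, holds in constraints:
            if not holds(record):
                violations.append((record["item"], name))
    return violations


# Made-up records with placeholder item identifiers.
records = [
    {"item": "Q-A", "cases": 1200, "deaths": 35},
    {"item": "Q-B", "cases": 80, "deaths": 95},
    {"item": "Q-C", "cases": -5, "deaths": 0},
]

for item, rule in find_violations(records):
    print(f"{item} violates: {rule}")
```

Flagged items would then be handed to human reviewers, since a violation may come from an error in either of the two compared statements.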

Results

Once your study completes, describe the results and their implications here. Don't forget to make status=complete above when you are done.

Outputs

Wikimedia Events

Date Title Venue Details
30 June 2022 Wikidata as a resource for enriching Medical Wikipedia (in Arabic, 18 attendees) Arabic Wikipedia Day 2022

  Abstract
  Slides

12 July 2022 Bibliometric-Enhanced Information Retrieval as a tool for enriching and validating Wikidata (in English, 65 attendees) 2022 LD4 Conference on Linked Data   Slides

  Video
  Notes

13 August 2022 Let us play with PubMed to enrich Wikidata with medical information (in English, thirty online attendees and eight in-person attendees) 2022 Wikimania Hackathon
Wikimania 2022 in Tunisia
 Slides

  Video
  Notes

Office Hours

Date Title Details
25-26 August 2022 Growing AI for Healthcare in Africa: Telling our story (in English, 118 attendees)
SisonkeBiotik: Africa Machine Learning and Health Workshop
  Slides
23 October 2022 Let us solve the mysteries behind Wikidata (in French, 16 participants including 8 attendees and 10 contributors to the tutorial)
Wikidata Tenth Birthday

  Video
  Notes


Tutorials

Date Title
16 and 23 July 2022 Introduction to Wikidata, User Scripts, Wikidata Query Service, and OpenRefine (in French)
Wiki Wake Up Afrique

Research Venues

Date Title Venue Details
16 November 2022 Letter to the Editor: FHIR RDF - Why the world needs structured electronic health records Journal of Biomedical Informatics   Abstract

  Full Text

References

  1. Callahan, T. J., Tripodi, I. J., Pielke-Lombardo, H., & Hunter, L. E. (2020). Knowledge-based biomedical data science. Annual Review of Biomedical Data Science, 3, 23-41. PMC:8095730.
  2. Turki, H., Shafee, T., Hadj Taieb, M. A., Ben Aouicha, M., Vrandečić, D., Das, D., & Hamdi, H. (2019). Wikidata: A large-scale collaborative ontological medical database. Journal of Biomedical Informatics, 99, 103292. doi:10.1016/j.jbi.2019.103292.
  3. Sun, H., Depraetere, K., De Roo, J., Mels, G., De Vloed, B., Twagirumukiza, M., & Colaert, D. (2015). Semantic processing of EHR data for clinical research. Journal of Biomedical Informatics, 58, 247-259. doi:10.1016/j.jbi.2015.10.009.
  4. Kang, N., Singh, B., Bui, C., Afzal, Z., van Mulligen, E. M., & Kors, J. A. (2014). Knowledge-based extraction of adverse drug events from biomedical text. BMC Bioinformatics, 15(1), 64:1-64:8. doi:10.1186/1471-2105-15-64.
  5. Hong, G., Kim, Y., Choi, Y., & Song, M. (2021). BioPREP: Deep learning-based predicate classification with SemMedDB. Journal of Biomedical Informatics, 122, 103888. doi:10.1016/j.jbi.2021.103888.
  6. Nicholson, N. C., Giusti, F., Bettio, M., Negrao Carvalho, R., Dimitrova, N., Dyba, T., et al. (2021). An ontology-based approach for developing a harmonised data-validation tool for European cancer registration. Journal of Biomedical Semantics, 12(1), 1:1-1:15. doi:10.1186/s13326-020-00233-x.
  7. Slater, L. T., Bradlow, W., Ball, S., Hoehndorf, R., & Gkoutos, G. V. (2021). Improved characterisation of clinical text through ontology-based vocabulary expansion. Journal of Biomedical Semantics, 12(1), 7:1-7:9. doi:10.1186/s13326-021-00241-5.
  8. Tehrani, F. T., & Roum, J. H. (2008). Intelligent decision support systems for mechanical ventilation. Artificial Intelligence in Medicine, 44(3), 171-182. doi:10.1016/j.artmed.2008.07.006.
  9. Odekunle, F. F., Odekunle, R. O., & Shankar, S. (2017). Why sub-Saharan Africa lags in electronic health record adoption and possible strategies to increase its adoption in this region. International Journal of Health Sciences, 11(4), 59. PMC:5654179.
  10. Turki, H., Hadj Taieb, M. A., Shafee, T., Lubiana, T., Jemielniak, D., Ben Aouicha, M., ... & Mietchen, D. (2021). Representing COVID-19 information in collaborative knowledge graphs: the case of Wikidata. Semantic Web, 13(2), 233-264. doi:10.3233/SW-210444.
  11. Turki, H., Jemielniak, D., Hadj Taieb, M. A., Labra Gayo, J. E., Ben Aouicha, M., Banat, M., ... & Mietchen, D. (2022). Using logical constraints to validate statistical information about COVID-19 in collaborative knowledge graphs: the case of Wikidata. PeerJ Computer Science, 8, e1085. doi:10.7717/peerj-cs.1085.
  12. Labra Gayo, J. E. (2022). WShEx: A language to describe and validate Wikibase entities. In Proceedings of the 3rd Wikidata Workshop 2022 (Wikidata 2022) (pp. 4:1-4:12). Hangzhou, China: CEUR-WS.org. doi:10.48550/arXiv.2208.02697.
  13. Tran, B. X., Vu, G. T., Ha, G. H., Vuong, Q. H., Ho, M. T., Vuong, T. T., et al. (2019). Global evolution of research in artificial intelligence in health and medicine: a bibliometric study. Journal of Clinical Medicine, 8(3), 360. doi:10.3390/jcm8030360.
  14. Mora-Cantallops, M., Sánchez-Alonso, S., & García-Barriocanal, E. (2019). A systematic literature review on Wikidata. Data Technologies and Applications, 53(3), 250-268. doi:10.1108/DTA-12-2018-0110.
  15. Prana, G. A. A., Treude, C., Thung, F., Atapattu, T., & Lo, D. (2019). Categorizing the content of GitHub README files. Empirical Software Engineering, 24(3), 1296-1327. doi:10.1007/s10664-018-9660-3.
  16. Van Eck, N., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538. doi:10.1007/s11192-009-0146-3.
  17. Turki, H., Dossou, B. F. P., Emezue, C. C., Hadj Taieb, M. A., Ben Aouicha, M., Ben Hassen, H., & Masmoudi, A. (2022). MeSH2Matrix: Machine learning-driven biomedical relation classification based on the MeSH keywords of PubMed scholarly publications. In BIR 2022: 12th International Workshop on Bibliometric-enhanced Information Retrieval at ECIR 2022 (pp. 45-60). Stavanger, Norway: CEUR-WS.org. https://ceur-ws.org/Vol-3230/paper-07.pdf
  18. Vasiliev, Y. (2020). Natural Language Processing with Python and SpaCy: A Practical Introduction. No Starch Press.
  19. Delpeuch, A. (2020). A Survey of OpenRefine Reconciliation Services. In Proceedings of the 15th International Workshop on Ontology Matching co-located with the 19th International Semantic Web Conference (ISWC 2020) (pp. 82-86). Athens, Greece: CEUR-WS.org. https://ceur-ws.org/Vol-2788/om2020_STpaper3.pdf
  20. Turki, H., Hadj Taieb, M. A., & Ben Aouicha, M. (2022). How knowledge-driven class generalization affects classical machine learning algorithms for mono-label supervised classification. In International Conference on Intelligent Systems Design and Applications (pp. 637-646). Springer, Cham. doi:10.1007/978-3-030-96308-8_59
  21. Sundararajan, M., Taly, A., & Yan, Q. (2017, July). Axiomatic attribution for deep networks. In International conference on machine learning (pp. 3319-3328). PMLR. http://proceedings.mlr.press/v70/sundararajan17a.html
  22. Rasberry, L., Tibbs, S., Hoos, W., Westermann, A., Keefer, J., Baskauf, S. J., ... & Mietchen, D. (2022). WikiProject Clinical Trials for Wikidata. medRxiv. doi:10.1101/2022.04.01.22273328.