Source: OMOP2OBO v1.0 - Data limited to 5,000 rows


🗺 **What is OMOP2OBO?** 



- A significant promise of electronic health records (EHRs) lies in the ability to perform large-scale investigations of the mechanistic drivers of complex diseases. Despite significant progress in biomarker discovery, this promise remains largely aspirational (`PMID:32335224`, `PMID:30304648`).
- Linking molecular data to the clinical data stored in EHRs will support biologically meaningful analyses, and can be achieved by integrating knowledge about biology and pathophysiology from multiple ontologies.
- Like clinical terminologies, computational ontologies are classification systems that provide detailed representations of a specific domain of knowledge, consisting of a set of concepts and logically defined relationships. Unlike most clinical terminologies, ontologies are computable and interoperable: they can be logically verified using description logics and readily integrated with other ontologies and with non-ontological data, including data from basic science and clinical research (`PMID:30304648`).
- The usefulness of normalizing (i.e., mapping or annotating) clinical data to ontologies, like those in the Open Biological and Biomedical Ontologies (OBO) Foundry, has been recognized as a fundamental need for the future of deep phenotyping (`PMID:32335224`).
- Existing work has largely been limited to using ontologies to improve phenotyping in specific diseases (e.g., infectious [`PMID:31160594`] and rare diseases [`PMID:31231902`]) and to enhance specific biological and clinical domains (e.g., laboratory tests [`PMID:31119199`] and diagnoses [`PMID:29295235`]).
- Unfortunately, learning algorithms are not yet able to capture the complex clinical and biological semantics underlying these concepts and their relationships. Until a comprehensive, robust resource that includes mappings between multiple clinical domains and biomedical ontologies is created and validated, automatically generating inferences between patient-level clinical observations and biological knowledge will not be possible.

We developed `OMOP2OBO`, the first health system-wide integration and alignment between the Observational Health Data Sciences and Informatics' Observational Medical Outcomes Partnership (OMOP) standardized clinical terminologies and eight OBO biomedical ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, metabolites, hormones, vaccines, and proteins.

```{r picture1, echo = FALSE, fig.width=10, fig.height=10}
knitr::include_graphics("")
```

🤔 **Mappings Overview**

Each mapping file contains several tabs:

- For the drug ingredient and measurement files, the first tab contains extended information on the `OMOP` clinical terminologies that were used to create the mappings.
- The condition mapping file contains a similar but less extensive set of data, which is included within each ontology tab. In all files, the remaining tabs contain the results of mapping to a specific ontology. For conditions, symptoms were aligned to the `Human Phenotype Ontology (HPO)` and diagnoses were aligned to the `Mondo Disease Ontology (Mondo)`.
- For drug ingredients, all concepts were aligned to at least one `Chemical Entities of Biological Interest (ChEBI)` concept, and the remaining ontologies (`National Center for Biotechnology Information Taxon Ontology (NCBITaxon)`, `Protein Ontology (PR)`, and `Vaccine Ontology (VO)`) were mapped by drug class and/or type (e.g., biologics versus vaccines).
- For each measurement, all levels of the test result (above, below, and within the reference range) were mapped, not only those deemed clinically relevant. Results outside the reference range that are not currently deemed clinically relevant (per the literature or consultation with a domain expert) were annotated to the nearest relevant ontology concept ancestor. Each measurement was mapped along four components: the interpreted result to the `HPO`; the measurement substance (body fluids, tissues, and organs) to the `Uber Anatomy Ontology (Uberon)`; the entity being measured to `ChEBI` (chemicals, metabolites, or hormones), the `Cell Ontology (CL)` (cell types), or `PR` (proteins and protein complexes); and the species of the measured entity (organism taxonomy) to `NCBITaxon`.
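
Because each mapping file is organized into tabs, one way to explore a downloaded file is to load every tab into a named list. The sketch below is illustrative only. It assumes the file is an Excel workbook, that the `readxl` package is installed, and that the workbook has been saved locally. The file name and the helper `read_mapping_tabs` are hypothetical and are not part of the OMOP2OBO release.

```r
# Sketch only: read every tab of a mapping workbook into a named list.
# Assumes an Excel workbook and an installed readxl package.
read_mapping_tabs <- function(path) {
  tab_names <- readxl::excel_sheets(path)   # names of all tabs in the workbook
  tabs <- lapply(tab_names, function(tab) {
    readxl::read_excel(path, sheet = tab)   # one data frame per tab
  })
  stats::setNames(tabs, tab_names)          # label each data frame with its tab name
}

# Hypothetical local file name; replace with the path to a downloaded mapping file.
path <- "omop2obo_mappings.xlsx"
if (file.exists(path)) {
  mappings <- read_mapping_tabs(path)
  # Print the number of rows found in each tab.
  print(vapply(mappings, nrow, integer(1)))
}
```

Keeping the tabs in a named list makes it easy to inspect one ontology's results at a time, for example with `mappings[["<tab name>"]]`, without assuming the tab names in advance.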

The figure below shows an example of an `OMOP2OBO` mapping to an `OMOP` condition concept. It is included to illustrate the components that make up each mapping.

```{r picture2, echo = FALSE, fig.width=10, fig.height=10}
knitr::include_graphics("")
```

✅ **Mapping Verification and Validation**

To confirm that the mappings are both clinically and biologically meaningful, we performed extensive experiments assessing the accuracy, generalizability, and logical consistency of each released mapping set. Please note that the consistency experiments are still in progress and have only been applied to the `HPO` and `Mondo` mappings at this time.

***OMOP2OBO Users and Use Cases***

The OMOP2OBO mappings have been used in the following use cases.

*Patient Representation Learning and Rare Disease Subphenotyping*

- Callahan TJ, Hunter LE, Kahn MG. Leveraging a Neural-Symbolic Representation of Biomedical Knowledge to Improve Pediatric Subphenotyping. 2022. `Zenodo:5746173`

*Understanding and Investigating Long COVID or Post Acute Sequelae of SARS-CoV2 Infection (PASC)*

- Rando HM, Bennett TD, Byrd JB, et al. Challenges in Defining Long COVID: Striking Differences across Literature, Electronic Health Records, and Patient-Reported Information. medRxiv. 2021. `medRxiv:21253896`
- Coleman B, Casiraghi E, Callahan TJ, et al. Manifestations Associated with Post Acute Sequelae of SARS-CoV2 Infection (PASC) Predict Diagnosis of New-Onset Psychiatric Disease: Findings from the NIH N3C and RECOVER Studies. medRxiv. 2022. `medRxiv:22277388`
- Reese J, Blau H, Bergquist T, et al. Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs. medRxiv. 2022. `medRxiv:22275398`
- Deer RR, Rock MA, Vasilevsky N, et al. Characterizing Long COVID: Deep Phenotype of a Complex Condition. eBioMedicine. 2021; 74:103722. `DOI:10.1016/j.ebiom.2021.103722`

📥 **Download Current Mapping Release (`v1.0`)**

Additional information on each of the mapping sets is provided through Zenodo. Please be sure to read the information on these pages prior to using the mappings.

- `Condition Occurrence Mappings`
- `Drug Exposure Ingredient Mappings`
- `Measurement Mappings`

*Please note that the mappings shown on the Data page have been limited to the first 5,000 concept identifiers (sorted ascending).*
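
For readers who want to reproduce a comparable subset from a full mapping file, the limit described above can be sketched in base R as follows. This is not the dashboard's own code. The helper name, the `CONCEPT_ID` column name, and the toy values are assumptions made for illustration, and the sketch interprets the limit as applying to distinct concept identifiers rather than to rows.

```r
# Hypothetical helper: keep rows belonging to the n smallest concept identifiers.
# The column name "CONCEPT_ID" is an assumption, not taken from the release files.
limit_to_first_concepts <- function(mappings, id_col = "CONCEPT_ID", n = 5000) {
  ids <- sort(unique(mappings[[id_col]]))      # distinct identifiers, ascending
  keep <- head(ids, n)                         # the first n identifiers
  out <- mappings[mappings[[id_col]] %in% keep, , drop = FALSE]
  out[order(out[[id_col]]), , drop = FALSE]    # rows returned in ascending id order
}

# Toy table standing in for a mapping tab (values are made up).
toy <- data.frame(
  CONCEPT_ID = c(30L, 10L, 20L, 10L),
  LABEL = c("c", "a", "b", "a2")
)
limit_to_first_concepts(toy, n = 2)
```

With the toy table, only the rows for the two smallest identifiers (10 and 20) are kept, and a concept with several mapping rows keeps all of them.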

✨ **The OMOP2OBO Mapping Dashboard**

The OMOP2OBO Dashboard provides up-to-date information on the current `OMOP2OBO` mapping release. The dashboard is built in R using `Rmarkdown` and the `flexdashboard` framework, and the code behind it is publicly available.

💻 **Resources and Contact**

***Resources***

- `OMOP2OBO Algorithm`
- `OMOP2OBO Wiki`
- `Zenodo Community`

***Contact***

We'd love to hear from you! To get in touch with us, please:

- 🧵 Join or start a new discussion
- 🎯 Create an issue
- 💌 Send us an email