Digital Collaboratory for Precision Health Research (DCPHR)
Vasant Honavar, co-lead of CTSI’s informatics core services, and his team launched the Digital Collaboratory for Precision Health Research (DCPHR). The DCPHR combines the efforts of CTSI’s informatics team and the Center for Artificial Intelligence Foundations in Scientific Applications. Together with the Institute for Computational and Data Sciences, the Social Science Research Institute, and the Health Spoke of the National Science Foundation’s Northeast Big Data Hub, the DCPHR gives researchers access to large data sets through several discovery tools, along with the artificial intelligence (AI) and machine learning (ML) software stacks needed to analyze those data sets.
Observational Medical Outcomes Partnership (OMOP)
The DCPHR currently provides access to de-identified Penn State Health EHR data standardized using the OMOP common data model (CDM), which is maintained by the NIH-funded Observational Health Data Sciences and Informatics (OHDSI) consortium. The OHDSI consortium links 3,266 collaborators across approximately 75 countries with OMOP-based EHR data repositories that collectively contain 928 million unique patient records (representing about 12% of the world’s population).
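Because the OMOP CDM fixes table and column names, the same cohort query can in principle run against any OMOP-based repository. The following is a minimal sketch of that idea, not the DCPHR's actual access interface: it builds a tiny in-memory SQLite database with simplified versions of the OMOP `person` and `condition_occurrence` tables (only a few of their columns), fills them with synthetic rows, and selects people with a given condition concept who were born before a given year. The concept IDs, the sample rows, and the function names are illustrative assumptions; real concept IDs should be looked up in the OMOP vocabulary.

```python
import sqlite3


def build_demo_cdm():
    """Create an in-memory database with two simplified OMOP-style tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            gender_concept_id INTEGER,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
    """)
    # Synthetic rows; the concept IDs are placeholders for illustration only.
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
        (1, 8532, 1950),
        (2, 8507, 1985),
        (3, 8532, 1962),
    ])
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", [
        (10, 1, 201826, "2019-03-01"),
        (11, 1, 201826, "2020-06-15"),
        (12, 2, 201826, "2021-01-20"),
        (13, 3, 316866, "2018-11-05"),
    ])
    return conn


def cohort_person_ids(conn, condition_concept_id, born_before):
    """Return sorted person_ids with the condition who were born before a year."""
    rows = conn.execute("""
        SELECT DISTINCT p.person_id
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
          AND p.year_of_birth < ?
        ORDER BY p.person_id
    """, (condition_concept_id, born_before))
    return [row[0] for row in rows]


conn = build_demo_cdm()
cohort = cohort_person_ids(conn, 201826, 1970)
print(cohort)  # → [1]
```

Person 1 appears once even though two matching condition records exist, because the query selects distinct `person_id` values rather than condition rows.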
TriNetX
TriNetX allows Penn State researchers to access Penn State Health EHR data, de-identified data from the TriNetX Research Network (from more than 71 healthcare organizations), de-identified claims data from the Diamond Network (from 92 organizations), and de-identified claims data from the COVID-19 Research Network (from 78 additional organizations). Researchers can define study cohorts of interest by querying TriNetX networks on criteria such as medications, diagnoses, demographics, lab results, genomics, mortality, oncology, and procedures.
“PaTH to Health” Data
Wenke Hwang, co-lead of CTSI’s informatics core services, focuses on the development and curation of Penn State Health EHR data that conform to the standards of the PCORnet Common Data Model. This clinical research data repository splits all patient-level and encounter-level data into multiple tables linked by pseudo-identifiers in a HIPAA-compliant manner. The “PaTH to Health” data meet the national data standard and are harmonized with data from more than 70 PCORnet clinical sites. The repository is refreshed regularly (currently every other week), undergoes quarterly data quality checks, and has served as the data infrastructure for several successful multi-site grant applications.
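The idea of splitting patient-level and encounter-level data into separate tables joined by pseudo-identifiers can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual pipeline and not a statement of what HIPAA compliance requires: the keyed-hash (HMAC-SHA256) scheme, the secret key, the input field names (`mrn`, `visit_no`), and the use of a birth year are all hypothetical. The output column names loosely follow the PCORnet CDM DEMOGRAPHIC and ENCOUNTER tables (`PATID`, `ENCOUNTERID`, `ENC_TYPE`, `ADMIT_DATE`).

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real key would be stored and managed securely.
SECRET_KEY = b"demo-key-not-for-production"


def pseudo_id(real_id, key=SECRET_KEY):
    """Derive a stable pseudo-identifier from a real identifier with a keyed hash."""
    digest = hmac.new(key, real_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def split_records(raw_rows):
    """Split flat rows into a patient-level table and an encounter-level table."""
    demographic = {}
    encounter = []
    for row in raw_rows:
        patid = pseudo_id(row["mrn"])
        # One demographic row per patient, keyed by the pseudo-identifier.
        demographic[patid] = {
            "patid": patid,
            "birth_year": row["birth_year"],
            "sex": row["sex"],
        }
        # One encounter row per visit, linked to the patient only through patid.
        encounter.append({
            "encounterid": pseudo_id(row["visit_no"]),
            "patid": patid,
            "enc_type": row["enc_type"],
            "admit_date": row["admit_date"],
        })
    return list(demographic.values()), encounter


# Synthetic input rows for illustration only.
raw = [
    {"mrn": "MRN001", "visit_no": "V1", "birth_year": 1950, "sex": "F",
     "enc_type": "AV", "admit_date": "2021-01-05"},
    {"mrn": "MRN001", "visit_no": "V2", "birth_year": 1950, "sex": "F",
     "enc_type": "IP", "admit_date": "2021-02-10"},
    {"mrn": "MRN002", "visit_no": "V3", "birth_year": 1985, "sex": "M",
     "enc_type": "AV", "admit_date": "2021-03-15"},
]
demographic, encounter = split_records(raw)
print(len(demographic), len(encounter))  # → 2 3
```

Because the pseudo-identifier is derived deterministically, the two visits for the same patient share one `patid`, so the tables can still be joined without either table carrying the original record number.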