Penn State PaTH to Health Data Set

Penn State PaTH to Health supports investigators across Penn State and Penn State Health in conducting research that uses electronic health record data from Penn State Health Milton S. Hershey Medical Center. The data set is distinctive because it is organized according to the PCORnet Common Data Model, which arranges data in a standard structure (i.e., the same variable names, attributes, and other metadata). This approach mirrors those used by other large national research consortia.

Read the PaTH to Health Collaborator Guide

About PaTH to Health

About the Data

About the Common Data Model

The Common Data Model (CDM) describes patient-level data variables that are defined and organized in a standardized manner (i.e., with the same variable names, attributes, and other metadata). The CDM includes data on demographics, encounters, diagnoses, and procedures. A CDM data dictionary is available from PCORnet. The data incorporated in the CDM use pseudo-identifiers instead of real medical record numbers.
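Because every CDM table uses the same standardized variable names and carries the same pseudo-identifier for each patient, linking tables comes down to matching on that identifier. The minimal Python sketch below assumes two toy tables whose field names are modeled on the PCORnet CDM (`PATID`, `SEX`, `DX`, `DX_TYPE`). The records and the helper `diagnoses_by_sex` are hypothetical, and the PCORnet data dictionary remains the authoritative source for actual field definitions.

```python
from collections import Counter

# Hypothetical toy rows. Field names are modeled on the PCORnet CDM,
# but the values are invented for illustration only.
DEMOGRAPHIC = [
    {"PATID": "P001", "SEX": "F"},
    {"PATID": "P002", "SEX": "M"},
    {"PATID": "P003", "SEX": "F"},
]

DIAGNOSIS = [
    {"PATID": "P001", "DX": "E11.9", "DX_TYPE": "10"},
    {"PATID": "P002", "DX": "E11.9", "DX_TYPE": "10"},
    {"PATID": "P003", "DX": "I10", "DX_TYPE": "10"},
    {"PATID": "P003", "DX": "E11.9", "DX_TYPE": "10"},
]


def diagnoses_by_sex(demographic, diagnosis, code):
    """Count distinct patients with a given diagnosis code, grouped by sex.

    Tables are linked on the shared pseudo-identifier PATID,
    never on a real medical record number.
    """
    # Look up each patient's sex by pseudo-identifier.
    sex_by_patid = {row["PATID"]: row["SEX"] for row in demographic}
    # Collect distinct patients with the code (a patient may have repeat rows).
    patients = {row["PATID"] for row in diagnosis if row["DX"] == code}
    return Counter(sex_by_patid[p] for p in patients if p in sex_by_patid)


print(diagnoses_by_sex(DEMOGRAPHIC, DIAGNOSIS, "E11.9"))
```

Since the layout is standardized, the same function would work unchanged against any extract that follows the same field names.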

Penn State PaTH to Health contains health data on more than 1 million patients who have received medical care at Penn State Health Milton S. Hershey Medical Center since Jan. 1, 2011. The data set is refreshed quarterly.

Data Quality and Completeness

Data curation queries are used to assess the quality, completeness and characteristics of the data. Analysts within the Enterprise Information Management team at Penn State Health Milton S. Hershey Medical Center carefully examine the data curation results to identify common themes and opportunities for improvement.

Requesting Access to the Data

The Collaborator Guide describes how investigators across Penn State and Penn State Health can request access to PaTH to Health patient-level and clinical encounter-level data.

Researchers can request access to a 20 percent random sample of the data by completing the Penn State PaTH to Health data request form. This prep-to-research data set may include all data tables from the Common Data Model. If the request is approved, the files will be delivered securely in SAS format.
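This page does not say how the 20 percent sample is drawn. One plausible approach, shown here only as a hypothetical sketch, samples patients rather than rows, so that each table in the extract keeps all records for the selected patients and the tables still link on `PATID`. The function name `patient_level_sample`, the arbitrary fixed seed, and the table layout are assumptions for illustration.

```python
import random


def patient_level_sample(tables, fraction=0.20, seed=2011):
    """Hypothetical sketch: sample patients, then keep all their rows in every table.

    tables maps a table name (e.g. "DEMOGRAPHIC") to a list of row dicts,
    each carrying a "PATID" key. The fraction applies to distinct patients,
    not to rows, so related records stay together.
    """
    all_patids = sorted({row["PATID"] for rows in tables.values() for row in rows})
    k = round(len(all_patids) * fraction)
    rng = random.Random(seed)  # arbitrary fixed seed makes the sample reproducible
    chosen = set(rng.sample(all_patids, k))
    return {
        name: [row for row in rows if row["PATID"] in chosen]
        for name, rows in tables.items()
    }


# Hypothetical example: 10 patients, each with one demographic row and two encounters.
tables = {
    "DEMOGRAPHIC": [{"PATID": f"P{i:03d}"} for i in range(10)],
    "ENCOUNTER": [
        {"PATID": f"P{i:03d}", "ENCOUNTERID": f"E{i:03d}-{j}"}
        for i in range(10)
        for j in range(2)
    ],
}
sample = patient_level_sample(tables)
print(len(sample["DEMOGRAPHIC"]), len(sample["ENCOUNTER"]))  # 2 4
```

Sampling on patients rather than on individual rows avoids extracts in which, for example, an encounter appears without its patient's demographic record.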

The Penn State PaTH to Health project team will review and evaluate all data request forms using the following criteria:

  • The research question can be appropriately addressed by the available data
  • A clinical content expert is identified
  • A data analyst is identified

Submit a Data Request Form

Project Team

Investigators and Data Stewards

Cynthia Chuang, MD, MSc, and Wenke Hwang, PhD

Project Manager

Jody McCullough
717-531-0003, ext. 289350 or

Penn State PaTH to Health is covered by IRB protocol 00006433. The protocol provides regulatory oversight for answering queries against the full de-identified cohort of patients established for the PaTH Network.