The mission of Penn State PaTH to Health is to support research that uses patient data from multiple sources to advance scientific discovery.
Penn State PaTH to Health is committed to continuously improving our data quality, protecting the privacy of patient data, maintaining nationally recognized data standards (i.e., the PCORnet Common Data Model), and advancing methodologies for applying electronic health record (EHR) data in research.
The PaTH to Health data is closely tied to the PaTH Network. The PaTH Network is one of 13 Clinical Data Research Networks that comprise PCORnet, the National Patient-Centered Clinical Research Network. PCORnet is funded by the Patient-Centered Outcomes Research Institute (PCORI), an independent nonprofit, nongovernmental organization authorized by Congress in 2010.
This document provides Penn State researchers the guidelines for requesting access to PaTH to Health patient-level and clinical encounter-level data and allows the PaTH to Health data stewards to approve and monitor the progress of all collaborative research projects.
For technical assistance, contact the Department of Public Health Sciences Helpdesk at 717-531-6782 during regular business hours, 8 a.m. to 5 p.m. weekdays.
The PaTH Network provides an infrastructure for conducting observational studies and pragmatic clinical trials across six health systems when populations beyond a single health system are needed to answer important clinical questions.
The PaTH Network is a collaboration between Geisinger Health System, Johns Hopkins University, Johns Hopkins Health System, Penn State College of Medicine, Penn State Health Milton S. Hershey Medical Center, Temple Health System, Lewis Katz School of Medicine at Temple University, the University of Pittsburgh, UPMC and UPMC Health Plan, The Ohio State University, The Ohio State University Wexner Medical Center, University of Michigan, and Michigan Medicine.
This infrastructure includes institutional relationships with established data use agreements, a streamlined and centralized IRB review process, site champions to assist in identifying investigators, and data interoperability between the EHRs.
The PaTH infrastructure allows researchers to conduct secondary data analysis on clinical data, use EHR data to more easily identify study cohorts and/or eligible patients, efficiently recruit patients, and rapidly implement interventions. PCORnet has specified a Common Data Model (CDM), a set of individual-level and encounter-level data variables defined and organized in a standardized manner, with which all Clinical Data Research Networks (CDRNs) are required to comply. The CDM makes it easier to share, aggregate, analyze, and compare data for multi-site studies. For some studies, researchers might need clinical data elements that are not covered by the existing CDM.
The Penn State team (hereinafter referred to as “PaTH to Health”) has created a model to link these new data elements to the CDM so these core data elements need not be re-extracted every time for different research projects. Penn State researchers can potentially request all network sites to extract the same additional data elements. The CDM and additional PaTH-added data elements provide standardized variables that can be used as predictors, covariates, and outcomes in clinical research.
Locally, Penn State PaTH to Health provides data access, infrastructure, rigorous security standards, and regulatory support to assist investigators across Penn State University in conducting research that uses EHR data.
Penn State PaTH to Health is committed to continuously improving our data quality, protecting the privacy of patient data, maintaining nationally recognized data standards (i.e., the PCORnet Common Data Model), and advancing methodologies for applying EHR data in research.
PaTH is Patient Empowered Research. Our mission is to address the questions and concerns that matter most to the communities we serve so that they can make more informed health decisions.
About the Common Data Model
The Common Data Model (CDM) describes patient-level data variables defined and organized in a standardized manner (i.e., with the same variable name, attributes and other metadata). The CDM includes data on demographics, encounters, diagnoses and procedures. A CDM data dictionary can be found via PCORnet. The data incorporated in the CDM uses pseudo-identifiers instead of real medical record numbers.
Penn State PaTH to Health contains health data on more than 1 million patients who have received medical care at Penn State Health Milton S. Hershey Medical Center since Jan. 1, 2011; the dataset is refreshed quarterly.
Data Quality and Completeness
Data curation queries are used to assess the quality, completeness and characteristics of the data. Analysts within the Enterprise Information Management team at Penn State Health Milton S. Hershey Medical Center carefully examine the data curation results to identify common themes and opportunities for improvement.
Penn State PaTH to Health offers researchers the opportunity to securely work with and learn from varied patient populations, and we can support a range of studies including observational studies based on health record data and clinical trials.
Penn State PaTH to Health is covered under IRB protocol 00006433. The purpose of the protocol is to provide regulatory oversight for answering queries against the full de-identified cohort of patients established for the PaTH Network. It allows the Penn State PaTH to Health team to provide investigators with prep-to-research data so they can plan their individual research studies.
Penn State PaTH to Health contains health data on more than 1,026,425 patients who have received medical care at Penn State Health since Jan. 1, 2011; the data is refreshed quarterly.
The CDM splits all patient-level or encounter-level electronic health data into 15 data tables.
The PaTH to Health dataset contains records back to Jan. 1, 2011. Data are refreshed quarterly.
As of March 31, 2019, the dataset includes the following:
Data curation queries are used to assess the quality, completeness and characteristics of the data. Analysts within the Enterprise Information Management team at Penn State Health carefully examine the data curation results to identify common themes and opportunities for improvement.
PCORnet maintains implementation guidance to mitigate the variability in how the partner networks map their source data into the CDM. With each cycle, findings from data curation help inform the development and refinement of the implementation guidance and the data quality checks.
For contextual variables, the PaTH to Health team can create linkages to selected social determinants and environment data elements currently available within PaTH to Health. For example, all PaTH to Health study samples can be linked to the American Community Survey (Census Bureau) at the census tract or census block group level for selected social characteristics, educational attainment, housing situation, and insurance coverage. If applicable, the data stewards will discuss this option during their meeting with you.
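The linkage described above amounts to a left join on a geographic identifier. The following is a minimal sketch in Python, not the PaTH to Health team's actual procedure: the field names (`patid`, `tract_geoid`), the measure names, and all values are hypothetical and chosen only for illustration.

```python
# Hypothetical patient rows: a pseudo-identifier plus a census tract identifier.
patients = [
    {"patid": "P001", "tract_geoid": "42043021100"},
    {"patid": "P002", "tract_geoid": "42043021200"},
    {"patid": "P003", "tract_geoid": None},  # no tract could be assigned
]

# Hypothetical tract-level measures keyed by tract identifier (made-up values).
acs_by_tract = {
    "42043021100": {"pct_bachelors_or_higher": 41.2, "pct_uninsured": 4.8},
    "42043021200": {"pct_bachelors_or_higher": 28.5, "pct_uninsured": 7.1},
}


def link_to_acs(patients, acs_by_tract):
    """Left-join patients to tract-level measures.

    Every patient is kept; patients without a matching tract get None
    for each measure instead of being dropped.
    """
    measure_names = sorted({name for row in acs_by_tract.values() for name in row})
    linked = []
    for patient in patients:
        measures = acs_by_tract.get(patient["tract_geoid"], {})
        row = dict(patient)
        for name in measure_names:
            row[name] = measures.get(name)
        linked.append(row)
    return linked


linked = link_to_acs(patients, acs_by_tract)
```

Keeping unmatched patients with empty measures, rather than dropping them, makes it easy to report how much of a study sample could be linked.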
Penn State PaTH to Health employs extensive security measures to ensure all patient information remains safe and private. The CDM and de-identified study data sets are stored in the high-performance computing (HPC) system located on the Penn State Health campus.
Penn State PaTH to Health can provide Penn State researchers with de-identified health data.
Investigators may use a sample dataset to create the programming code to identify their study cohort (i.e., computable phenotype) and initial data analysis program.
Data access is a two- or three-step process.
In the first step, a Penn State investigator submits a data request form via REDCap. After approval from the data stewards, the requestor receives access to a random sample of 20 percent of the patients in the Penn State PaTH to Health dataset. This prep-to-research dataset may include all data tables from the CDM. Files will be delivered in SAS format via the HPC.
Note: Use of the 20 percent random sample dataset for testing research hypotheses, presentation or publication is NOT permitted by the College of Medicine Institutional Review Board or the Penn State PaTH to Health data stewards.
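The source does not describe how the 20 percent sample is drawn. As a sketch under that caveat, one common approach is to sample at the patient level and then apply the same set of sampled identifiers to every table, so that all tables describe the same patients. The function names, the `patid` and `encounterid` fields, and the seed below are hypothetical.

```python
import random


def sample_patient_ids(patient_ids, fraction=0.20, seed=2019):
    """Return a reproducible simple random sample of unique patient identifiers."""
    unique_ids = sorted(set(patient_ids))  # sort so the result depends only on the seed
    sample_size = round(len(unique_ids) * fraction)
    rng = random.Random(seed)
    return set(rng.sample(unique_ids, sample_size))


def restrict_to_sample(rows, sampled_ids, id_field="patid"):
    """Keep only the rows of a table that belong to sampled patients."""
    return [row for row in rows if row[id_field] in sampled_ids]


# Illustration with 1,000 made-up pseudo-identifiers and two encounters per patient.
all_ids = [f"P{n:04d}" for n in range(1, 1001)]
sampled = sample_patient_ids(all_ids)

encounters = [
    {"patid": pid, "encounterid": f"{pid}-E{k}"} for pid in all_ids for k in (1, 2)
]
sample_encounters = restrict_to_sample(encounters, sampled)
```

Fixing the seed makes the sample reproducible, which matters if the same prep-to-research sample must be regenerated after a data refresh.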
The researcher will provide the Penn State PaTH to Health team with the SAS code for their computable phenotype. The code will be executed against the full Penn State PaTH to Health dataset and the researcher will be given access to the full study cohort as a SAS dataset.
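To make the idea of a computable phenotype concrete, here is a small illustrative sketch. The process above calls for SAS code; Python is used here only to show the logic. The rule (ICD-10-CM E11 codes recorded on at least two distinct dates), the function name, and the field names are assumptions for illustration, not a validated phenotype or the actual CDM layout.

```python
from collections import defaultdict
from datetime import date


def type2_diabetes_cohort(diagnoses, min_distinct_dates=2):
    """Return patient IDs with E11.* diagnosis codes on enough distinct dates."""
    dates_by_patient = defaultdict(set)
    for dx in diagnoses:
        if dx["dx"].upper().startswith("E11"):
            dates_by_patient[dx["patid"]].add(dx["dx_date"])
    return {
        patid
        for patid, dates in dates_by_patient.items()
        if len(dates) >= min_distinct_dates
    }


# Made-up diagnosis rows for three hypothetical patients.
diagnoses = [
    {"patid": "P001", "dx": "E11.9", "dx_date": date(2015, 3, 2)},
    {"patid": "P001", "dx": "E11.65", "dx_date": date(2016, 7, 19)},
    {"patid": "P002", "dx": "E11.9", "dx_date": date(2017, 1, 5)},  # one date only
    {"patid": "P003", "dx": "I10", "dx_date": date(2018, 4, 11)},   # not an E11 code
    {"patid": "P003", "dx": "I10", "dx_date": date(2018, 9, 30)},
]

cohort = type2_diabetes_cohort(diagnoses)
```

Because the rule is expressed entirely in code, the same logic can be developed on the sample dataset and then run unchanged against the full dataset.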
The Information Technology team will extract additional data elements that are not currently available in the CDM. For contextual variables, the PaTH to Health team will create linkages to selected social determinants and environment data elements currently available within PaTH to Health.
The CDM is stored as SAS files on the high-performance computing (HPC) system. These data are accessed and analyzed remotely via secure virtual desktops. The HPC allows researchers to quickly and efficiently process large amounts of data while allowing results to be accessed for the life of the research project, with appropriate approvals.
For any other formal studies or clinical trials that may arise from these preparatory-to-research investigations, investigators will obtain their own separate IRB protocol outlining the specific details of the study, including the setting for the research.
Researchers can request access to the 20 percent random sample dataset by completing the Penn State PaTH to Health data request form via REDCap. After completion, the requestor will be contacted within three to five business days.
Jody McCullough, Research Manager, will coordinate a meeting with the requestor and the team’s data stewards Cynthia Chuang, MD, MSc (Internal Medicine) and Wenke Hwang, PhD (Public Health Sciences).
During the meeting, the data stewards will evaluate the request based on the following review criteria:
- Research question can be appropriately addressed by the available data
- Clinical content expert identified – College of Medicine and Clinical and Translational Science Institute can help to identify an expert
The data stewards will notify the requestor, during the review meeting, if their request has been approved, denied, or is pending.
Faculty and staff in the Departments of Public Health Sciences and Statistics offer collaboration and consultation in biostatistics, epidemiology and data management to researchers and students. These services are supported in part by the Clinical and Translational Science Institute Biostatistics/Epidemiology/Research Design (BERD) Core.
Limited consultation is provided for research projects. Generally, no more than four hours is provided free of charge, after which the faculty or staff member handling your project can discuss the billing rates for additional collaboration.
If approved, the requestor will have access to the prep-to-research data for six months. After six months, a progress update will be requested. If an update is not received, the access account will be locked.
The CDM and de-identified SAS data sets are stored in the HPC system located on the Penn State Health campus. The requestor will be provided access to the virtual desktop approximately five business days after the data request is approved.
The Department of Public Health Sciences at Penn State College of Medicine has created a captive Citrix-based research environment to encapsulate and secure access while working with external collaborators. The purpose of this document is to provide a basic description and functional orientation to the PHS Collaborative Research environment.
The environment uses a least-privilege model for granting permissions and access. Domain Security Groups, along with appropriate account membership, enforce compliance. Only authorized system administrators can modify permissions, and changes are initiated through a formal service request by the PI. In addition, all administrative and access events are logged and retained according to PSU Security Policy for auditing purposes. The environment is further defined and controlled by Domain Group and Citrix User Policy assignments, which can prevent any data from leaving the captive environment.
For piloting purposes, a Collaborative Research Account Request Form is used to capture requirements for the given research project, collaborator details and confidentiality/HIPAA attestation.
The data stewards will sign off on each collaborator account request form as Manager/Supervisor/Chair or Sponsor.
Based on the requirements of the research project, the Virtual Research environment will be built from two primary components:
- Unique Domain Security Group to contain all collaborative research accounts for the project. This Security Group will be used to authorize access to specific data resources for the project.
- Unique Citrix User Policy to which the Security Group will be assigned. This Policy defines the specific Citrix resources available to the project members as well as the restrictions that define the environment and maintain compliance. This will typically include restricting even clipboard copy/paste functionality to within the Citrix session, preventing any data from being saved, copied, or pasted to or from the captive research environment.
A project research environment can be defined to provide a number of user experiences: a list of applications that, when selected, appear to run on the user's local desktop; a full virtual desktop; or a combination of the two. Regardless of the mode, only the resources a user is authorized to access will be available within their Citrix session.
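The least-privilege, group-based model described above can be illustrated with a short sketch. This is not Citrix or directory-service code. It only shows the deny-by-default logic, and every group, account, and resource name in it is hypothetical.

```python
# Hypothetical mapping: project security group -> member accounts.
group_members = {
    "proj_alpha_researchers": {"jdoe", "asmith"},
    "proj_beta_researchers": {"asmith"},
}

# Hypothetical mapping: resource -> security groups granted access to it.
resource_groups = {
    "Random20": {"proj_alpha_researchers"},
    "Projects/alpha": {"proj_alpha_researchers"},
    "Projects/beta": {"proj_beta_researchers"},
}


def can_access(account, resource):
    """Deny by default; allow only if the account belongs to a granted group."""
    allowed_groups = resource_groups.get(resource, set())
    return any(account in group_members.get(group, set()) for group in allowed_groups)


def visible_resources(account):
    """List only the resources the account is authorized to see."""
    return sorted(name for name in resource_groups if can_access(account, name))
```

Under this sketch, granting or revoking access is a change to group membership rather than to individual resources, which mirrors the role the project Security Group plays in the description above.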
This section provides a basic functional orientation to a captive research environment from a collaborator point of view.
Open the Penn State College of Medicine Public Health Sciences Citrix Logon gateway within your web browser: https://phs-citrix.hersheymed.net
The Citrix Receiver client is required to utilize this environment. If you do not currently have the Citrix Receiver client installed on your system, you can download/install it by clicking the “Download Receiver” link on the login page.
The first time you enter logon credentials, you will be prompted to change the password.
Your new password should contain the following:
- Eight or more characters
- One or more capital letters
- One or more numbers
- One or more special characters (@#$%!)
After entering your new password, you will be prompted a second time to confirm it.
If your new password was not entered correctly (confirmation did not match or it does not meet complexity requirements), you will see a login screen with username and password fields and an incorrect password error.
From here, you can enter your original Logon credentials to start over.
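To avoid a failed password change, a candidate password can be checked against the rules listed above before it is entered. The sketch below is illustrative only and is not the gateway's own validation logic. It assumes that the special characters shown in parentheses (@#$%!) are the complete accepted set, which the source does not confirm. The function name is hypothetical.

```python
# Assumption: only the characters listed in the password rules count as special.
SPECIAL_CHARACTERS = set("@#$%!")


def password_problems(password):
    """Return a list of unmet rules; an empty list means all rules are met."""
    problems = []
    if len(password) < 8:
        problems.append("fewer than eight characters")
    if not any(ch.isupper() for ch in password):
        problems.append("no capital letter")
    if not any(ch.isdigit() for ch in password):
        problems.append("no number")
    if not any(ch in SPECIAL_CHARACTERS for ch in password):
        problems.append("no special character")
    return problems
```

For example, `password_problems("longenoughA1")` reports only the missing special character, while a password that satisfies every rule returns an empty list.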
Depending on your internet connection speed to the Penn State network, it could take several seconds to launch. You should receive a security disclaimer. Depending on the type of disclaimer, either click “OK” or “Permit Use.”
File access within the Citrix Desktop and any other applications will be made available to you as appropriate.
To work with PaTH to Health data, you will have access to “Random20,” a folder that contains the data tables for the 20 percent sample.
You will have Read and Write access to only this area from within your available applications to open data files, organize, and save your work. Users should place their code and all interim datasets in the “Projects” folder.
At the end of each session, please remember to click “Log Off” before closing your web browser. This will help ensure that no inactive sessions remain on the Penn State Citrix server for your account.