PaTH To Health Collaborator Guide

The mission of Penn State PaTH to Health is to support research that uses patient data from multiple sources to advance scientific discovery.

Penn State PaTH to Health is committed to continuously improving our data quality, protecting the privacy of patient data, maintaining nationally recognized data standards (i.e., PCORnet Common Data Model), and advancing methodologies for applying electronic health record (EHR) data in research.

The PaTH to Health data are built on the PaTH Network. The PaTH Network is one of 13 Clinical Data Research Networks that comprise PCORnet, the National Patient-Centered Clinical Research Network. PCORnet is funded by the Patient-Centered Outcomes Research Institute (PCORI), an independent nonprofit, nongovernmental organization authorized by Congress in 2010.

This document gives Penn State researchers guidelines for requesting access to PaTH to Health patient-level and clinical encounter-level data. It also enables the PaTH to Health data stewards to approve and monitor the progress of all collaborative research projects.

Contact Us

For general questions about PaTH to Health and its data, contact Jody McCullough at 717-531-0003, ext. 289350 or path@pennstatehealth.psu.edu.

For technical assistance, contact the Department of Public Health Sciences Helpdesk at 717-531-6782 during regular business hours, 8 a.m. to 5 p.m. weekdays.

Overview

About the PaTH Network

The PaTH Network provides an infrastructure for conducting observational studies and pragmatic clinical trials across six health systems when populations beyond a single health system are needed to answer important clinical questions.

The PaTH Network is a collaboration between Geisinger Health System, Johns Hopkins University, Johns Hopkins Health System, Penn State College of Medicine, Penn State Health Milton S. Hershey Medical Center, Temple Health System, Lewis Katz School of Medicine at Temple University, the University of Pittsburgh, UPMC and UPMC Health Plan, The Ohio State University, The Ohio State University Wexner Medical Center, University of Michigan, and Michigan Medicine.

This infrastructure includes institutional relationships with established data use agreements, a streamlined and centralized IRB review process, site champions to assist in identifying investigators, and data interoperability between the EHRs.

The PaTH infrastructure allows researchers to conduct secondary data analysis on clinical data, use EHR data to more easily identify study cohorts and/or eligible patients, efficiently recruit patients, and rapidly implement interventions. PCORnet has specified a Common Data Model (CDM), a set of individual-level and encounter-level data variables defined and organized in a standardized manner, with which all Clinical Data Research Networks (CDRNs) are required to comply. The CDM makes it easier to share, aggregate, analyze, and compare data for multi-site studies. For some studies, researchers might need clinical data elements that are not covered by the existing CDM.

The Penn State team (hereinafter referred to as “PaTH to Health”) has created a model to link these additional data elements to the CDM so that the core data elements need not be re-extracted for each research project. Penn State researchers can potentially request all network sites to extract the same additional data elements. The CDM and additional PaTH-added data elements provide standardized variables that can be used as predictors, covariates, and outcomes in clinical research.

About Penn State PaTH to Health

Locally, Penn State PaTH to Health provides data access, infrastructure, rigorous security standards, and regulatory support to assist investigators across Penn State University in conducting research that uses EHR data.
Penn State PaTH to Health is committed to continuously improving our data quality, protecting the privacy of patient data, maintaining nationally recognized data standards (i.e., PCORnet Common Data Model), and advancing methodologies for applying EHR data in research.

Mission Statement

PaTH is Patient Empowered Research. Our mission is to address the questions and concerns that matter most to the communities we serve so that they can make more informed health decisions.

About the Data

About the Common Data Model

The Common Data Model (CDM) describes patient-level data variables defined and organized in a standardized manner (i.e., with the same variable name, attributes and other metadata). The CDM includes data on demographics, encounters, diagnoses and procedures. A CDM data dictionary can be found via PCORnet. The data incorporated in the CDM use pseudo-identifiers instead of real medical record numbers.

Penn State PaTH to Health contains health data on more than 1 million patients who have received medical care at Penn State Health Milton S. Hershey Medical Center since Jan. 1, 2011; the dataset is refreshed quarterly.

Data Quality and Completeness

Data curation queries are used to assess the quality, completeness and characteristics of the data. Analysts within the Enterprise Information Management team at Penn State Health Milton S. Hershey Medical Center carefully examine the data curation results to identify common themes and opportunities for improvement.

Data Details

Penn State PaTH to Health offers researchers the opportunity to securely work with and learn from varied patient populations, and we can support a range of studies including observational studies based on health record data and clinical trials.

Penn State PaTH to Health is covered under IRB protocol 00006433. The purpose of the protocol is to provide regulatory oversight for answering queries against the full de-identified cohort of patients established for the PaTH Network. It allows the Penn State PaTH to Health team to provide investigators with prep-to-research data so they can plan their individual research studies.

Penn State Patient Demographics

Penn State PaTH to Health contains health data on more than 1,026,425 patients who have received medical care at Penn State Health since Jan. 1, 2011; the data is refreshed quarterly.

The CDM splits all patient-level or encounter-level electronic health data into 15 data tables.

An image from PCORnet shows the PCORnet Common Data Model, version 4.0. Elements in the data model are listed in multiple separate boxes, each with a header describing the type of data, such as demographics, encounters, lab results and more.

The PaTH to Health dataset contains records back to Jan. 1, 2011. Data are refreshed quarterly.

As of March 31, 2019, the dataset includes the following:

Data Quality and Completeness

PCORnet maintains implementation guidance to mitigate the variability in how the partner networks map their source data into the CDM. With each cycle, findings from data curation help inform the development and refinement of the implementation guidance and the data quality checks.

Contextual Variables (Optional)

For contextual variables, the PaTH to Health team can create linkages to selected social determinants and environmental data elements currently available within PaTH to Health. For example, all PaTH to Health study samples can be linked to American Community Survey (Census Bureau) data at the Census Tract or Census Block Group level for selected social characteristics, educational attainment, housing situation, and insurance coverage. If applicable, the data stewards will discuss this option during their meeting with you.
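The linkage described above can be pictured as a left join between patient records and tract-level survey values. The following is only a minimal sketch in Python, not the PaTH to Health team's actual linkage code; the field names (`patid`, `tract_geoid`, `pct_bachelors`, `pct_uninsured`) and all values are hypothetical and chosen for illustration.

```python
def link_contextual(patients, acs_by_tract, tract_field="tract_geoid"):
    """Attach tract-level contextual variables to each patient record.

    This is a left join: every patient row is kept, and patients whose
    tract has no survey entry simply gain no contextual columns.
    """
    linked = []
    for patient in patients:
        context = acs_by_tract.get(patient[tract_field], {})
        linked.append({**patient, **context})
    return linked


# Hypothetical study sample; identifiers and tract codes are made up.
patients = [
    {"patid": "P001", "tract_geoid": "42043020100"},
    {"patid": "P002", "tract_geoid": "42043999999"},  # no survey entry
]

# Hypothetical tract-level values keyed by census tract identifier.
acs_by_tract = {
    "42043020100": {"pct_bachelors": 31.5, "pct_uninsured": 6.2},
}

linked = link_contextual(patients, acs_by_tract)
```

A left join is used here so that patients without a matching tract remain in the study sample rather than being silently dropped.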

Data Security

Penn State PaTH to Health employs extensive security measures to ensure all patient information remains safe and private. The CDM and de-identified study data sets are stored in the high-performance computing (HPC) system located on the Penn State Health campus.

Data Access

Penn State PaTH to Health can provide Penn State researchers with de-identified health data.

Investigators may use a sample dataset to develop the programming code that identifies their study cohort (i.e., the computable phenotype) and their initial data analysis program.

Data access is a two- or three-step process.

Step 1: Preparation to Research Data Access

In this step, a Penn State investigator submits a data request form via REDCap. After approval from the data stewards, the requestor receives access to a random sample of 20 percent of the patients in the Penn State PaTH to Health dataset. This prep-to-research dataset may include all data tables from the CDM. Files will be delivered in SAS format via the HPC.

Note: Use of the 20 percent random sample dataset for testing research hypotheses, presentation or publication is NOT permitted by the College of Medicine Institutional Review Board or the Penn State PaTH to Health data stewards.
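To illustrate what a patient-level 20 percent sample means in practice, here is a minimal sketch in Python. It is not the procedure PaTH to Health uses to build the sample; it assumes that patients (not rows) are sampled once and that every table is then restricted to the sampled patients, so one patient's records stay together across tables. The `patid` field name, the seed, and the example rows are illustrative assumptions.

```python
import random


def sample_patients(patient_ids, fraction=0.20, seed=2011):
    """Draw a reproducible random sample of unique patient identifiers."""
    unique_ids = sorted(set(patient_ids))
    sample_size = round(len(unique_ids) * fraction)
    # A seeded generator makes the same sample reappear on every run.
    return set(random.Random(seed).sample(unique_ids, sample_size))


def filter_table(rows, sampled_ids, id_field="patid"):
    """Keep only the rows that belong to sampled patients."""
    return [row for row in rows if row[id_field] in sampled_ids]


# Hypothetical identifiers and encounter rows.
all_ids = [f"P{i:03d}" for i in range(100)]
encounters = [{"patid": pid, "encounterid": f"E{n}"}
              for n, pid in enumerate(all_ids * 2)]

sampled = sample_patients(all_ids)
sample_encounters = filter_table(encounters, sampled)
```

Sampling identifiers first and filtering tables second keeps the sample consistent: a patient is either present in every table or absent from all of them.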

Step 2: Study Cohort Data Access

The researcher will provide the Penn State PaTH to Health team with the SAS code for their computable phenotype. The code will be executed against the full Penn State PaTH to Health dataset and the researcher will be given access to the full study cohort as a SAS dataset.
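A computable phenotype is, in essence, a rule that turns records into a list of qualifying patients. The sketch below shows the idea in Python for illustration only; the code actually submitted in this step is SAS, and the rule shown here (at least two distinct encounters with a diagnosis code starting with a given prefix) is a hypothetical example, as are the field names `patid`, `encounterid` and `dx` and the sample rows.

```python
from collections import defaultdict


def cohort_by_diagnosis(diagnosis_rows, code_prefixes, min_encounters=2):
    """Return the patients with enough distinct qualifying encounters.

    A diagnosis row qualifies when its code starts with one of the
    given prefixes; repeated codes within one encounter count once.
    """
    prefixes = tuple(code_prefixes)
    qualifying = defaultdict(set)
    for row in diagnosis_rows:
        if row["dx"].startswith(prefixes):
            qualifying[row["patid"]].add(row["encounterid"])
    return {patid for patid, encounter_ids in qualifying.items()
            if len(encounter_ids) >= min_encounters}


# Hypothetical diagnosis rows (E11 is the ICD-10 category for type 2 diabetes).
diagnosis_rows = [
    {"patid": "A", "encounterid": "1", "dx": "E11.9"},
    {"patid": "A", "encounterid": "2", "dx": "E11.65"},
    {"patid": "B", "encounterid": "3", "dx": "E11.9"},
    {"patid": "B", "encounterid": "3", "dx": "E11.9"},
    {"patid": "C", "encounterid": "4", "dx": "I10"},
    {"patid": "C", "encounterid": "5", "dx": "I10"},
]

cohort = cohort_by_diagnosis(diagnosis_rows, ["E11"])
```

Because the rule depends only on the data and its parameters, the same code can be developed against the sample dataset and later executed unchanged against the full dataset.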

Step 3: Additional Data Elements (If Necessary)

The Information Technology team will extract additional data elements that are not currently available in the CDM. For contextual variables, the PaTH to Health team will create linkages to selected social determinants and environment data elements currently available within PaTH to Health.

About the CDM

The CDM is stored as SAS files on the high-performance computing (HPC) system. These data are accessed and analyzed remotely via secure virtual desktops. The HPC allows researchers to quickly and efficiently process large amounts of data while allowing results to be accessed for the life of the research project, with appropriate approvals.

For any other formal studies or clinical trials that may arise from these preparatory-to-research investigations, investigators will obtain their own separate IRB protocol outlining the specific details of the study, including the setting for the research.

This flowchart depicts the process through which Penn State investigators access PaTH to Health data. The chart shows Penn State Researchers, the Penn State PaTH to Health Team and Penn State IT Team, and uses arrows to depict how those groups interact with each other and with various sets of data.

Requesting Access to the Data

Researchers can request access to the 20 percent random sample dataset by completing the Penn State PaTH to Health data request form via REDCap. After completion, the requestor will be contacted within three to five business days.

Jody McCullough, Research Manager, will coordinate a meeting with the requestor and the team’s data stewards, Cynthia Chuang, MD, MSc (Internal Medicine) and Wenke Hwang, PhD (Public Health Sciences).

During the meeting, the data stewards will evaluate the request based on the following review criteria:

  • Research question can be appropriately addressed by the available data
  • Clinical content expert identified – College of Medicine and Clinical and Translational Science Institute can help to identify an expert
  • Data analyst identified – College of Medicine and Clinical and Translational Science Institute can help to identify an analyst

During the review meeting, the data stewards will tell the requestor whether the request has been approved, has been denied, or is pending.

Biostatistical Services

Faculty and staff in the Departments of Public Health Sciences and Statistics offer collaboration and consultation in biostatistics, epidemiology and data management to researchers and students. These services are supported in part by the Clinical and Translational Science Institute Biostatistics/Epidemiology/Research Design (BERD) Core.

Limited consultation is provided for research projects. Generally, no more than four hours of consultation is provided free of charge, after which the faculty or staff member handling your project can discuss the billing rates for additional collaboration.

Study Updates

If approved, the requestor will have access to the prep-to-research data for six months. After six months, a progress update will be requested. If an update is not received, the access account will be locked.

Virtual Desktop

Virtual Desktop Overview

The CDM and de-identified SAS data sets are stored in the HPC system located on the Penn State Health campus. The requestor will be provided access to the virtual desktop approximately five business days after the data request is approved.

The Department of Public Health Sciences at Penn State College of Medicine has created a captive Citrix-based research environment to encapsulate and secure access while working with external collaborators. The purpose of this document is to provide a basic description of, and functional orientation to, the PHS Collaborative Research environment.

Computing Environment

The captive environment uses a least-privilege model for granting permissions and access. Domain Security Groups, along with appropriate account membership, enforce compliance. Only authorized system administrators can modify permissions, and only in response to a formal service request from the PI. In addition, all administrative and access events are logged and retained according to PSU Security Policy for auditing purposes. The environment is further defined and controlled by Domain Group and Citrix User Policy assignments, which can prevent any data from leaving the captive environment.

For piloting purposes, a Collaborative Research Account Request Form is used to capture requirements for the given research project, collaborator details and confidentiality/HIPAA attestation.

The data stewards will sign off on each collaborator account request form as Manager/Supervisor/Chair or Sponsor.

Based on the requirements of the research project, the Virtual Research environment will be built from two primary components:

  • Unique Domain Security Group to contain all collaborative research accounts for the project. This Security Group will be used to authorize access to specific data resources for the project.
  • Unique Citrix User Policy to which the Security Group will be assigned. This Policy defines the specific Citrix resources available to the project members as well as the restrictions that define the environment and maintain compliance. These restrictions will typically confine even clipboard copy/paste functionality to the Citrix session, so that no data can be saved, copied, or pasted to or from the captive research environment.
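The relationship between the two components can be modeled in a few lines. The following is a conceptual sketch in Python, not Citrix or Active Directory configuration and not the environment's actual implementation; the class names, field names, group name, account names and resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    """Per-project group: holds the accounts and the data they may reach."""
    name: str
    members: set = field(default_factory=set)
    data_resources: set = field(default_factory=set)


@dataclass
class UserPolicy:
    """Per-project policy: published applications plus session restrictions."""
    group: SecurityGroup
    applications: set = field(default_factory=set)
    clipboard_outside_session: bool = False  # copy/paste stays in the session
    local_drive_access: bool = False         # no saving to local devices


def can_access(user, resource, group):
    """Least privilege: access requires membership AND an explicit grant."""
    return user in group.members and resource in group.data_resources


# Hypothetical project setup.
group = SecurityGroup(
    name="PROJECT-EXAMPLE",
    members={"collab01"},
    data_resources={"Random20", "Projects"},
)
policy = UserPolicy(group=group, applications={"SAS"})
```

Under this model, anything not explicitly granted is denied, which is the defining property of a least-privilege design.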

A project research environment can be defined to provide a number of user experiences: a list of applications which, when selected, appear to execute on the user's local desktop; a full virtual desktop; or a combination of the two. Regardless of the mode, users see only the resources they are authorized to access within their Citrix session.

Functional Orientation

This section provides a basic functional orientation to a captive research environment from a collaborator point of view.

Logging On

Open the Penn State College of Medicine Public Health Sciences Citrix Logon gateway within your web browser: https://phs-citrix.hersheymed.net

The Citrix Receiver client is required to utilize this environment. If you do not currently have the Citrix Receiver client installed on your system, you can download/install it by clicking the “Download Receiver” link on the login page.

This image shows the Citrix logon screen. The option for Receiver Download is circled in red. The screenshot includes the Penn State Health logo and username and password fields.

First-Time Logon

The first time you enter logon credentials, you will be prompted to change the password.

Your new password should contain the following:

  • Eight or more characters
  • One or more capital letters
  • One or more numbers
  • One or more special characters (@#$%!)

After entering your new password, you will be prompted a second time to confirm it.
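If you want to check a candidate password before typing it in, the rules above can be expressed as a short function. This is a minimal sketch in Python, not the logon system's own validation; it assumes that the five characters listed above (@#$%!) are the accepted special characters, which the list does not state outright.

```python
# Assumption: the listed characters are the full set of accepted specials.
SPECIAL_CHARACTERS = set("@#$%!")


def meets_requirements(password):
    """Check a password against the four stated complexity rules."""
    return (
        len(password) >= 8
        and any(ch.isupper() for ch in password)
        and any(ch.isdigit() for ch in password)
        and any(ch in SPECIAL_CHARACTERS for ch in password)
    )


def password_accepted(new_password, confirmation):
    """Both entries must match and the password must meet the rules."""
    return new_password == confirmation and meets_requirements(new_password)
```

For example, `password_accepted("Secure#2024", "Secure#2024")` returns `True`, while a mismatched confirmation or a password without a capital letter returns `False`.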

If your new password was not entered correctly (confirmation did not match or it does not meet complexity requirements), you will see a login screen with username and password fields and an incorrect password error.

From here, you can enter your original logon credentials to start over.

A screen shot shows the Penn State Health virtual desktop log on screen. The Penn State Health logo is on the left. On the right is the User Name and Password fields with a Log On button below it. Below the password field is a red circle with a white “x” in it with the words “Incorrect user name and password”, which is circled in red. This screen will appear if your new password does not meet the appropriate complexity requirements.

Application Menu

When your new password is successfully created, you will be logged in and will see your Citrix Application Menu.

A screen shot shows the Citrix Application Menu. The window has a header of Main listed, with a button of Select view and a reload button on the right. Below that it states Name and Description and lists available applications.

From here, any individual software applications or desktop made available to you will be displayed and you can click on each application to launch it.

While you are using an individual application, it will “appear” to run on your local system; however, it will be running at Penn State within the captive Citrix environment, and only the interface will be displayed on your local system.

You will not be able to access any local device drives/folders/files from within the application, nor will you be able to pass or save data locally through the clipboard or application file save functions.

When you open (click) an application link within Citrix, you should see a bar at the bottom of your screen.

You may see a bar like this when opening a Citrix application. The screenshot shows a dialog box with the words Do you want to open or save launch.ica (1/67KB) from phs-citrix.hersheymed.net? Three options are presented in box-shaped buttons: Open, Save or Cancel.

You might see this bar at the bottom of your screen when opening a Citrix application. The screenshot shows a dialog bar. On the left is a box that says launch.ica with an up arrow. On the right is a button that says Show all.

If you see boxes that say “Open,” “Save” and “Cancel,” choose “Open.” If you see the name of a file ending in .ica in a box at left and “Show All” at right, click on the file name to open the application.

Program Launch

Depending on your internet connection speed to the Penn State network, the application could take several seconds to launch. You should receive a security disclaimer. Depending on the type of disclaimer, click either “OK” or “Permit Use.”

A screenshot shows a Citrix Receiver-Security Warning window. The window shows an orange shield with a white exclamation point. The text says An online application is attempting to access information on a device attached to your computer. Two options state Block access. Do not permit the application to use these devices. and Permit use. Permit the application to use these devices. The Permit Use option is marked in red.

If you receive this security warning, choose “Permit Use.”

A screenshot shows a virtual WindowsServer2008 Enterprise desktop with Windows logo and an OK button marked in red. Below, a virtual WindowsServer2008 Enterprise window shows the words Windows Logon. The window contents state PSU/PHS Security and This Computer System is for authorized use only. With your login you agree to the appropriate use of this computer system. See PSU Policy AD/20 regarding obligations/responsibility. If you receive this security message, click OK.

File Access

File access within the Citrix Desktop and any other applications will be made available to you as appropriate.

To work with PaTH to Health data, you will have access to “Random20,” a folder that contains the data tables for the 20 percent sample.

You will have Read and Write access to only this area from within your available applications to open data files, organize, and save your work. Users should place their code and all interim datasets in the “Projects” folder.

A screenshot shows a window marked SAS in the window bar. Four yellow folders are in a left-hand panel: Contextual, Full Sample, Projects, Random 20. A right-hand panel shows copyright information in blue writing.

This screen shows the PaTH to Health dashboard, with various folders at left depending upon user access.

Logging Off

At the end of each session, please remember to click “Log Off” before closing your web browser. This will help ensure that no inactive sessions remain on the Penn State Citrix server for your account.

A screenshot shows a Citrix XenApp application toolbar. A search bar is on the left side of the toolbar. On the right are three options: Messages, denoted with a graphic of an envelope; Settings, denoted with a graphic of a gear; and Log Off, denoted by a graphic of a closed lock. The Log Off option is marked in red.

Users must click “Log Off” at the end of each Citrix session.