
Introduction to i2b2: Informatics for Integrating Biology and the Bedside

Information on this page comes from the “Introduction to i2b2” presentation given by Terri Shkuda, i2b2 Administrator, Research Informatics, Penn State College of Medicine.


Introduction to i2b2

What is i2b2?

i2b2 stands for Informatics for Integrating Biology and the Bedside.

It was developed at Harvard Partners Healthcare through a CTSA grant.

It provides a simple user interface for querying selected clinical and billing data from Penn State Health care delivery, from 1997 to the present.

i2b2's Fundamental Purpose

Cohort identification: Users search a de-identified database, without needing IRB approval, to determine whether a set of patients meeting specified criteria exists.

The data are presented as unique patient counts. This means a patient is counted exactly once if they ever met the criteria specified by the query.
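As an illustration of the counting rule, the following Python sketch assumes hypothetical observation rows of the form (patient_id, concept_code, observation_date). The row layout and the helper name unique_patient_count are illustrative assumptions, not i2b2's internal schema or code; the ICD-9 codes serve only as sample labels.

```python
from datetime import date

# Hypothetical observation rows: (patient_id, concept_code, observation_date).
# Illustrative only; this is not i2b2's actual schema.
observations = [
    ("P1", "ICD:034.0", date(2013, 1, 5)),
    ("P1", "ICD:034.0", date(2013, 2, 9)),   # same patient, second visit
    ("P2", "ICD:034.0", date(2013, 1, 20)),
    ("P3", "ICD:462", date(2013, 1, 22)),    # does not meet the criterion
]

def unique_patient_count(rows, concept_code):
    """Count each patient once if they EVER had the concept, however often."""
    return len({pid for pid, code, _ in rows if code == concept_code})

print(unique_patient_count(observations, "ICD:034.0"))  # 2
```

Patient P1 has two matching observations but contributes only one to the count.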

Research Tool Continuum

i2b2 is part of a research tool continuum that includes:

  • Hypothesis/feasibility testing: i2b2
  • Collaboration: Pure
  • Data capture: REDCap
  • Analysis: Statistics tools such as Excel, SPSS, SAS and R
  • Publication: PubMed and other sources (see Pure for details)
What is the Source of the Data in i2b2?

Data Sources

  • Cerner EMR (updated daily)
  • Eclipsys and Signature billing systems (updated daily)
  • Cancer Registry (updated monthly)
  • Personalized Medicine (updated monthly)

Data Warehouse

All of that information flows to Oracle Health Foundations (the data warehouse), where it goes through the de-identification process.

From there, de-identified information is passed into Penn State’s i2b2.

What is contained in the data?
  • De-identified data for more than 1,700,000 unique patients
  • Basic demographics (age, ethnicity, gender, language, marital status, race, religion, vital status)
  • Diagnoses and procedures (inpatient) coded with ICD-9 for Penn State Health Milton S. Hershey Medical Center billing
  • Diagnoses and procedures (inpatient) coded with ICD-10 for Penn State Health Milton S. Hershey Medical Center billing
  • Procedures coded with CPT (outpatient)
  • Lab tests performed by Penn State Health Milton S. Hershey Medical Center
  • Medications (administered during inpatient stays; medication lists reported at outpatient visits)
  • Personalized Medicine (sample availability, consent for research, tobacco)
  • Visit types (inpatient, outpatient, emergency, same-day care, etc.)
  • Providers (name, department, division, NPI, gender)
  • Cancer Registry
  • Key dates associated with each event above
Use Cases, Strengths and Limitations

Intended uses

  • Cohort identification for research
  • Not intended for clinical workflow

Strengths

  • Quick turnaround for count of patients
  • Immediate export capability of many concepts (de-identified)
  • Visualization tools for looking at demographic data in aggregate and timeline of events
  • Useful for retrospective studies using source data for secondary use

Limitations

  • No imaging or narrative data
  • Works best when you know the underlying codes
What i2b2 Does Not Contain
  • Summaries and clinic notes (free-form text)
  • Narrative reports (e.g., radiology, surgical pathology, operative reports)
  • Images (e.g., digital X-rays, EKGs, scanned documents)
  • Microbiology data (not yet recorded discretely in the EMR)
  • Family history and medical history (unless coded for billing purposes)
  • Genomic data

Coming Soon

  • Department location
  • Height, weight, BMI, systolic and diastolic BP, temperature, heart rate
  • Smoking assessments
  • ICD-O for oncology
  • Pulmonary function tests
  • Problem lists
From Hypothesis to Data Collection
  • Develop queries and refine them to define the cohort
  • Use analysis tools to review data, then refine the query as needed
  • Export data for preliminary exploration and analysis, or request a Standard Data Set or custom data set from Decision Support based on the query
    • Without IRB approval: de-identified data set
    • With IRB approval: data set with identifiers
i2b2 Tool Panel

A screenshot of the i2b2 application shows a wide left column and three narrower columns to the right. The left column is divided into three groups labeled Concept Group, Workplace and Previous Queries. The three narrower columns to the right are part of the Query Tool.

Left Panel

The left panel is available in both the “Find Patients” and “Analysis Tools” views of i2b2.

This left panel contains the following:

Concept Group: Lists all data concepts found within i2b2

Workplace: Shows favorites/bookmarked concepts/queries

Previous Queries: Lists all of your prior queries

Query Tool

The right panel, the Query Tool, contains the groups in which you gather your search concepts (combined with ANDs and ORs), as well as the results and graphs.
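In the i2b2 Query Tool, concepts placed in the same group are combined with OR, and the groups themselves are combined with AND. The Python sketch below models that logic under simplifying assumptions: each hypothetical patient is represented by the set of concept codes they have ever had, and dates, encounters and occurrence counts are ignored. The data and function names are illustrative only.

```python
# Hypothetical data: the set of concept codes each patient has ever had.
patient_concepts = {
    "P1": {"ICD:034.0", "CPT:87880"},
    "P2": {"ICD:034.1"},
    "P3": {"CPT:87880"},
    "P4": {"ICD:034.0", "ICD:034.1", "CPT:87880"},
}

def matches_group(concepts, group):
    """A patient satisfies a group if they have ANY concept in it (OR)."""
    return bool(concepts & group)

def run_query(data, groups):
    """A patient is returned only if they satisfy EVERY group (AND)."""
    return {pid for pid, concepts in data.items()
            if all(matches_group(concepts, g) for g in groups)}

group1 = {"ICD:034.0", "ICD:034.1"}  # strep throat OR scarlet fever
group2 = {"CPT:87880"}               # rapid strep test
print(sorted(run_query(patient_concepts, [group1, group2])))  # ['P1', 'P4']
```

Adding a concept to an existing group can only widen that group, while adding a new group can only narrow the result.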

Searching for Terms

i2b2’s Navigate Terms and Find Terms functions at the top left allow you to select your query terms.

Resources for Code Lookups

A Quick Cohort Example

Looking for: Patients with the diagnosis of strep throat and scarlet fever

  • Find the ICD diagnosis for Streptococcal Sore Throat and Scarlet Fever using Find Terms, Search by Names; then query for all patients with that ICD diagnosis
    • Approx. 16,150 patients
  • Date-restrict the search to 01/01/2013 through 03/01/2013
    • Approx. 259 patients
  • Find the CPT code for the rapid strep test (for example, by searching Google for “CPT rapid strep test”), then search for it using Find Terms, Search by Codes, CPT (87880); drag the Rapid Strep CPT code (87880) into Group 2 of your query
    • Approx. 97 patients
  • Apply a temporal constraint requiring the diagnosis and the procedure to occur in the same financial encounter
    • Approx. 28 patients
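The refinements in this example can be modeled as set operations. The Python sketch below assumes hypothetical observation rows of (patient_id, encounter_id, concept_code, date) and treats the date range as inclusive on both ends; the field names and helpers are assumptions for illustration, not how i2b2 evaluates queries internally. It contrasts treating the two groups independently with requiring them to share an encounter.

```python
from datetime import date

# Hypothetical rows: (patient_id, encounter_id, concept_code, date).
observations = [
    ("P1", "E1", "ICD:034.0", date(2013, 1, 10)),
    ("P1", "E1", "CPT:87880", date(2013, 1, 10)),  # same encounter as diagnosis
    ("P2", "E2", "ICD:034.0", date(2013, 2, 1)),
    ("P2", "E3", "CPT:87880", date(2013, 2, 15)),  # different encounter
    ("P3", "E4", "ICD:034.1", date(2012, 12, 1)),  # outside the date range
    ("P3", "E4", "CPT:87880", date(2012, 12, 1)),
]

DIAGNOSES = {"ICD:034.0", "ICD:034.1"}  # Group 1
STREP_TEST = {"CPT:87880"}              # Group 2
START, END = date(2013, 1, 1), date(2013, 3, 1)  # assumed inclusive

def encounters_with(rows, codes):
    """(patient, encounter) pairs with a matching code inside the date range."""
    return {(pid, enc) for pid, enc, code, d in rows
            if code in codes and START <= d <= END}

def independent_groups(rows):
    """Each group is satisfied somewhere in the date range, any encounter."""
    dx = {pid for pid, _ in encounters_with(rows, DIAGNOSES)}
    tx = {pid for pid, _ in encounters_with(rows, STREP_TEST)}
    return dx & tx

def same_encounter(rows):
    """Both groups must be satisfied within one shared encounter."""
    shared = encounters_with(rows, DIAGNOSES) & encounters_with(rows, STREP_TEST)
    return {pid for pid, _ in shared}

print(sorted(independent_groups(observations)))  # ['P1', 'P2']
print(sorted(same_encounter(observations)))      # ['P1']
```

Requiring a shared encounter can only shrink the cohort, which is consistent with the drop from approximately 97 to 28 patients in the example.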
Additional Query Refinement

i2b2 provides different ways to refine your queries. For example:

  • Temporal constraint: Constrains the timing of observed concepts for all groups.
  • Date constraints: Constrains dates for all terms within the group.
  • Number of occurrences: Returns only patients whose observations within the group meet the threshold in that time period. For example, “occurs more than 4 times” returns only patients with five or more occurrences.
  • Exclude: Removes from the results all patients who have any of the observations or demographics in the group.
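Occurrence thresholds and exclusions can be sketched the same way. This Python example assumes hypothetical rows of (patient_id, concept_code), one row per occurrence and already limited to the group's date range; the codes LAB:X and DX:EXCL and the helper names are made up for illustration. Note that “more than 4” is a strict comparison, so a patient with exactly four occurrences is not returned.

```python
from collections import Counter

# Hypothetical rows: (patient_id, concept_code), one row per occurrence,
# assumed to be already filtered to the group's date range.
observations = (
    [("P1", "LAB:X")] * 5      # five occurrences
    + [("P2", "LAB:X")] * 4    # four occurrences: not "more than 4"
    + [("P3", "LAB:X")] * 6    # six occurrences
    + [("P3", "DX:EXCL")]      # P3 also has a concept from an excluded group
)

def occurs_more_than(rows, codes, n):
    """Patients with strictly MORE than n occurrences of the group's concepts."""
    counts = Counter(pid for pid, code in rows if code in codes)
    return {pid for pid, c in counts.items() if c > n}

def has_any(rows, codes):
    """Patients with at least one observation of the group's concepts."""
    return {pid for pid, code in rows if code in codes}

included = occurs_more_than(observations, {"LAB:X"}, 4)  # P1 and P3
excluded = has_any(observations, {"DX:EXCL"})            # P3
print(sorted(included - excluded))                       # ['P1']
```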
Now that I Have Enough Patients in the Cohort, What's Next?
  • Rerun the same query and request a patient set by checking Patient List.
  • Use the visualization and export tools.
    • Demographic composition shows the distribution of age, sex, race, and vital status.
    • The Timeline plugin depicts temporal relationships among “concepts” (diagnoses, medications, lab tests) and when they occurred.
    • CARE (Cohort Analysis and Refinement Expeditor) provides histograms of demographic breakdowns for a selected subset of specific concepts.
    • Export de-identified data.
  • The Timeline, CARE and Export tools can show additional data that was not part of the query's filter criteria; all data is available for that cohort.
  • Request data set from Decision Support.
    • Turnaround time will be 5 to 7 business days for the Standard Data Set.
    • The data set you receive will be a Standard Data Set, either de-identified or with HIPAA identifiers, depending on what you are authorized to see (IRB-determined or non-research).
i2b2 Analysis Tools Example

CARE (Cohort Analysis and Refinement Expeditor)

  • Switch from Find Patients to Analysis Tools (make this selection at the very top right of the page).
  • From the Analysis Tools at the top:
    • Select CARE (Cohort Analysis and Refinement Expeditor) – Concept Demographics Histograms
    • For Patient Set, use the patient set from the last query (strep with scarlet fever), found in the Previous Queries tab.
    • Drag the Patient Set to the Patient Set field in the Analysis Tool window.
    • Choose a concept for the analysis.
    • Select the View Results tab.
i2b2 Data Export Example

From the Specify Data tab:

  • At the top of the screen, under Analysis Tools, select ExportXLS.
  • Drag the strep with scarlet fever patient set into the Patient Set box.
  • Locate Strep Test in your Workplace, then drag this to the Concept(s) box.
  • Navigate the Ontology (Medications) pane to locate Amoxicillin, then drag this parent folder to the Concept(s) box.
  • Under Output Options:
    • For Formatting, select one row per observation (detailed, 1 column per observation)
    • For Demographic data, select a few (e.g., Sex, Age)
  • Under Options (may cause long running time):
    • Select Resolve Concept/Modifier Codes

Select the View Results tab, then wait until the export is displayed.

Requesting Standard Data Set

Once you are satisfied with your patient set, you may request a Standard Data Set from Decision Support, which will cover all dates for each patient in the set.

Locate your query in the Previous Queries or Workplace window and copy the entire name to the clipboard.

Submit a Report Request to IT:

  • On the Infonet home page, choose Request a Report at the bottom of the page.
  • Click on Report and Service Request Form.
  • Log in to ServiceNow using your ePass credentials (hersheymed\ePass)
  • Complete the Report Request as follows:
    • Type of Request: New Report
    • Information Source: Other
    • Do you intend to use any of the information obtained from EIM to directly contact patients or research subjects? If you answer yes, EIM will ensure that the patients found are not deceased.
    • Date Report Needed: For Standard Data Set, allow for at least five to seven business days; for custom report, allow for four weeks.
    • Reason for Request: Research
    • Research Request Type: Choose the appropriate funding selection. Complete the remaining subfields with the type of research, the IRB number (if requesting PHI) and the grant number.
    • Provide a detailed description of your request including data fields: Identify this as an i2b2 query, include the query name (paste it from the clipboard), and state that you are requesting a Standard Data Set. If you want only a subset of the Standard Data Set, list the items you need and Decision Support will remove the unwanted data.
    • Click Submit.

The report (spreadsheets) will appear in your LaunchPad inbox, and a representative from Decision Support will contact you.

Find the LaunchPad inbox via Myapps (found in the Start menu) > login > Apps > LaunchPad > Documents > Inbox.