Respiratory Viral Phenotyping

Validated computable phenotypes for 8 respiratory pathogens in All of Us

Overview

This project developed and validated computable phenotyping algorithms for 8 common respiratory viral infections using electronic health record (EHR) data from the NIH All of Us Research Program.

Key Results

  • Per-pathogen accuracy ranging from 79% (adenovirus) to 97% (influenza) across all 8 pathogens
  • 300,000+ patients identified with respiratory viral infections
  • Algorithms validated against chart review gold standard
  • Published in Scientific Reports (2025)

Pathogens Covered

Pathogen              Accuracy  Patients Identified
Influenza             97%       89,000+
RSV                   91%       24,000+
COVID-19              95%       156,000+
Rhinovirus            85%       18,000+
Parainfluenza         82%       8,000+
Adenovirus            79%       5,000+
hMPV                  84%       6,000+
Seasonal Coronavirus  81%       4,000+

Methods

The phenotyping algorithms combine:

  • Diagnosis codes (ICD-10-CM)
  • Laboratory results (PCR, antigen tests)
  • Temporal logic to distinguish acute infections from historical mentions
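
As a rough illustration of how these three ingredients might fit together, the sketch below flags influenza evidence from diagnosis codes and positive lab results, then groups nearby evidence into episodes. It is not the published algorithm: the `Event` record, the 30-day `EPISODE_GAP`, and the choice of ICD-10-CM categories J09, J10, and J11 as the influenza code set are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: influenza is represented by ICD-10-CM categories J09, J10, J11.
INFLUENZA_ICD10_PREFIXES = ("J09", "J10", "J11")
# Assumption: evidence within 30 days of the previous evidence is the same episode.
EPISODE_GAP = timedelta(days=30)


@dataclass(frozen=True)
class Event:
    """Hypothetical flattened EHR record used only in this sketch."""
    person_id: int
    event_date: date
    kind: str   # "dx" for a diagnosis code, "lab" for a test result
    value: str  # ICD-10-CM code for "dx"; "positive" or "negative" for "lab"


def is_evidence(event: Event) -> bool:
    """True if the event supports an influenza infection."""
    if event.kind == "dx":
        return event.value.upper().startswith(INFLUENZA_ICD10_PREFIXES)
    if event.kind == "lab":
        return event.value.lower() == "positive"
    return False


def acute_episodes(events):
    """Return {person_id: [episode_start_date, ...]}.

    Evidence is sorted per person; an event starts a new episode only if it
    falls more than EPISODE_GAP after the latest evidence in the current one.
    """
    episodes = {}  # person_id -> list of [start_date, latest_evidence_date]
    evidence = sorted(
        (e for e in events if is_evidence(e)),
        key=lambda e: (e.person_id, e.event_date),
    )
    for e in evidence:
        person_eps = episodes.setdefault(e.person_id, [])
        if person_eps and e.event_date - person_eps[-1][1] <= EPISODE_GAP:
            person_eps[-1][1] = e.event_date  # extend the current episode
        else:
            person_eps.append([e.event_date, e.event_date])  # new episode
    return {pid: [start for start, _ in eps] for pid, eps in episodes.items()}
```

Grouping by a fixed gap is only one possible form of temporal logic. A real implementation would likely need pathogen-specific windows and explicit handling of "history of" codes.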

Validation was performed using manual chart review by clinicians with infectious diseases expertise.
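
To make the comparison against chart review concrete, the following sketch computes accuracy and two related metrics from paired labels. The function name `validation_metrics` and the input layout are hypothetical, and reporting PPV and sensitivity alongside accuracy is an assumption about what such a validation might include, not a description of the published analysis.

```python
def validation_metrics(pairs):
    """Compare algorithm output with chart review.

    pairs: iterable of (algorithm_positive, chart_review_positive) booleans,
    one pair per reviewed chart. Chart review is treated as ground truth.
    """
    tp = fp = tn = fn = 0
    for predicted, truth in pairs:
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and not truth:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        # Fraction of reviewed charts where algorithm and reviewer agree.
        "accuracy": (tp + tn) / total if total else None,
        # Fraction of algorithm-positive charts confirmed by review.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        # Fraction of reviewer-confirmed infections the algorithm found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
    }
```

Each metric is returned as `None` when its denominator is zero, so an empty or one-sided review sample does not raise an error.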

Impact

These phenotypes enable researchers across the All of Us community to:

  • Study respiratory viral epidemiology at scale
  • Identify risk factors for severe outcomes
  • Investigate post-viral sequelae (e.g., Long COVID)
  • Analyze health disparities in respiratory infections

Publication

Waxse BJ, Tran TC, Mo H, Denny JC. Computable phenotypes to identify respiratory viral infections in the All of Us research program. Scientific Reports. 2025;15(1):18680. DOI: 10.1038/s41598-025-02183-9

Code Availability

Example notebooks demonstrating phenotype implementation are available in the All of Us Researcher Workbench.
