This study presents PrimDx, a diagnostic test that applies machine learning (ML) to whole blood transcriptomic data to improve early detection and diagnosis of primary immunodeficiency (PID). Delays in PID diagnosis are common and contribute to preventable complications and morbidity, highlighting the critical need for accurate, accessible, and timely early-stage testing.
To demonstrate the utility of PrimDx in a clinical setting, the study included participants with bronchiectasis, a common presentation among PID patients.
Whole blood RNA sequencing was performed on samples collected from 66 individuals (aged 2–67 years) with antibody deficiencies and 89 controls. Both groups included participants with non-cystic fibrosis bronchiectasis (NCFB; 7 PID and 15 controls). More than 15,000 genes were quantified per sample, providing a comprehensive transcriptional profile. The data underwent systematic preprocessing, followed by feature selection with the Least Absolute Shrinkage and Selection Operator (LASSO) to extract key predictive markers. Many of these markers are not well characterized and appear to be predominantly expressed in lymphocytes.
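As an illustration of how LASSO-based feature selection of this kind can work, the following is a minimal sketch rather than the study's actual procedure. It assumes scikit-learn, a synthetic matrix standing in for the expression data (500 simulated "genes" instead of more than 15,000), an arbitrary penalty strength (`alpha=0.08`), and an L1-penalized linear model fitted to the binary label; none of these choices are taken from the study.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix (samples x genes).
# Sizes and effect strengths are illustrative assumptions, not study data.
n_samples, n_genes, n_informative = 155, 500, 5
X = rng.normal(size=(n_samples, n_genes))

# Only the first few simulated genes carry signal about the label.
true_coef = np.zeros(n_genes)
true_coef[:n_informative] = 2.0
y = (X @ true_coef + rng.normal(size=n_samples) > 0).astype(int)

# Standardize each gene, then keep the genes whose LASSO coefficient
# is non-zero. The L1 penalty shrinks most coefficients exactly to zero.
selector = make_pipeline(
    StandardScaler(),
    SelectFromModel(Lasso(alpha=0.08)),  # alpha is an assumed value
)
selector.fit(X, y)

selected = np.flatnonzero(selector[-1].get_support())
X_reduced = selector.transform(X)

print(f"kept {selected.size} of {n_genes} simulated genes")
```

In practice the penalty strength would normally be tuned (for example by cross-validation) and the selection step fitted on training data only, so that the holdout set does not influence which markers are kept.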
Multiple ML models were built and assessed using a structured pipeline that applied class balancing, feature scaling, and grid-search hyperparameter tuning with 5-fold cross-validation. Models were trained on 80% of the dataset and evaluated on a blinded 20% holdout set. Among the models, a Feature Subspace Ensemble method with logistic regression achieved the strongest performance, with 90% accuracy, a 93% F1-score, a 95% area under the receiver operating characteristic curve (ROC-AUC), and 95% average precision, and it remained robust across previously unseen datasets with batch variability. Among participants with NCFB, the model identified antibody deficiency with 93% accuracy.
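The pipeline described above can be sketched roughly as follows. This is a hedged illustration under several assumptions, not the PrimDx implementation: the data are synthetic, class balancing is approximated with `class_weight="balanced"`, the "Feature Subspace Ensemble" is interpreted as a random-subspace ensemble (scikit-learn's `BaggingClassifier` with sample bootstrapping disabled, so each logistic regression sees all samples but only a random subset of features), and the hyperparameter grid is invented for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected markers (assumed sizes, not study data).
X, y = make_classification(
    n_samples=155, n_features=40, n_informative=8,
    weights=[0.57, 0.43], random_state=0,
)

# 80% training set, 20% holdout set, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learner: logistic regression with class weights to offset imbalance.
base = LogisticRegression(class_weight="balanced", max_iter=1000)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # bootstrap=False keeps all samples; max_features < 1.0 gives each
    # base learner its own random feature subspace.
    ("clf", BaggingClassifier(base, n_estimators=25, bootstrap=False,
                              random_state=0)),
])

# Hypothetical grid: fraction of features given to each base learner.
param_grid = {"clf__max_features": [0.3, 0.5, 0.7]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="roc_auc")
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the untouched holdout set.
proba = search.predict_proba(X_test)[:, 1]
pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
    "average_precision": average_precision_score(y_test, proba),
}
print(search.best_params_, metrics)
```

Placing the scaler inside the pipeline means it is refitted on each training fold during cross-validation, which keeps information from the validation folds and the holdout set out of the tuning step.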
These findings demonstrate the feasibility and potential clinical utility of integrating RNA sequencing with calibrated ML models to support earlier, more accurate PID diagnosis, including in NCFB patients, potentially enabling timelier treatment and improved patient outcomes. Ongoing work aims to expand the transcriptomic reference library to improve diagnostic accuracy for common PID presentations and enhance clinical applicability.

