A Site-Specific LLM-Based PHI Redaction Tool to Support Note De-Identification for the United States Immunodeficiency Network (USIDNET)

Vichare, Vaibhavi; Molina, Monica; D'Aiello, Russell; Kilich, Gonench; Rundles, Charlotte Cunningham; Abraham, Roshini; Puck, Jennifer; Fuleihan, Ramsay; Notarangelo, Luigi; Marsh, Rebecca; Sullivan, Kathleen

doi:10.70962/CIS2026abstract.25

Meeting Abstract | CIS Meeting Abstracts 2026 | May 01 2026

Vaibhavi Vichare (1), Monica Molina (1), Russell D'Aiello (1), Gonench Kilich (1), Charlotte Cunningham Rundles (2), Roshini Abraham (3), Jennifer Puck (4), Ramsay Fuleihan (5), Luigi Notarangelo (6), Rebecca Marsh (7), Kathleen Sullivan (1)

(1) Children’s Hospital of Philadelphia
(2) Icahn School of Medicine at Mount Sinai, New York City
(3) Nationwide Children’s Hospital
(4) Division of Allergy/Immunology and Blood and Marrow Transplantation, University of California, San Francisco, Department of Pediatrics
(5) Division of Allergy, Immunology and Rheumatology, Department of Pediatrics, Columbia University Irving Medical Center
(6) National Institutes of Health
(7) Cincinnati Children’s Hospital Medical Center

Author and Article Information

Online ISSN: 3065-8993

© 2026 Vichare et al. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

J Hum Immun (2026) 2 (CIS2026): eCIS2026abstract.25.

https://doi.org/10.70962/CIS2026abstract.25

The United States Immunodeficiency Network (USIDNET) is a registry that collects de-identified patient data from hospitals across the country to study rare immune conditions, including inborn errors of immunity (IEI). The registry contains both structured data and clinical notes, such as pathology reports and imaging narratives. These text fields may include protected health information (PHI), which must be removed before the data can be added to the registry. To address this, we developed a tool called “PHIdentifier.” Manual de-identification is time-consuming and error-prone; PHIdentifier is designed specifically for clinical notes and removes PHI from the text automatically. This keeps valuable clinical details available for research while protecting patient privacy and supporting better patient care.

PHIdentifier runs in a secure high-performance computing (HPC) environment so that it can process large volumes of text efficiently. It combines the Qwen-2.5-7B-Instruct large language model (LLM) with rule-based checks to handle complex text patterns and to de-identify different types of note fields consistently. The workflow starts with standard text preprocessing, which organizes and cleans the notes so that they are structured and ready for further analysis. The tool then performs a multilayered de-identification process in which carefully crafted prompts instruct the model to detect PHI in the text. The model’s responses are combined with rule-based checks so that only sensitive information is replaced with placeholders and all other clinical content is preserved. Additional quality checks verify accuracy and consistency across notes, creating a reliable process that converts unstructured text into fully de-identified information.
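The layered step described above, in which model output and rule-based checks are combined before placeholders are substituted, might look roughly like the following Python sketch. It is not the PHIdentifier implementation: the regular expressions, the placeholder labels, and the `fake_detect` function (a stand-in for the prompt-driven Qwen-2.5-7B-Instruct call) are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical rule layer: patterns for identifiers that have a regular shape.
RULES = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}

# A detector maps a note to (phrase, label) pairs; it stands in for the LLM.
Detector = Callable[[str], List[Tuple[str, str]]]


def find_spans(text: str, detect: Detector) -> List[list]:
    """Collect [start, end, label] spans from the rules and from the detector."""
    spans = [[m.start(), m.end(), label]
             for label, pattern in RULES.items()
             for m in pattern.finditer(text)]
    for phrase, label in detect(text):
        if not phrase:
            continue  # ignore empty detector output
        for m in re.finditer(re.escape(phrase), text):
            spans.append([m.start(), m.end(), label])
    return spans


def merge_spans(spans: List[list]) -> List[list]:
    """Union overlapping spans so no part of a flagged region is left behind."""
    merged: List[list] = []
    for start, end, label in sorted(spans, key=lambda s: (s[0], -s[1])):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end, label])
    return merged


def redact(text: str, detect: Detector) -> str:
    """Replace each flagged span with a [LABEL] placeholder; keep all other text."""
    pieces, pos = [], 0
    for start, end, label in merge_spans(find_spans(text, detect)):
        pieces.append(text[pos:start])
        pieces.append(f"[{label}]")
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


def fake_detect(text: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for the model call; returns fixed phrases."""
    return [("Jane Doe", "NAME"), ("Mercy Hospital", "HOSPITAL")]


note = "Jane Doe seen at Mercy Hospital on 03/14/2021. Biopsy shows granulomas."
print(redact(note, fake_detect))
```

Taking the union of overlapping spans, rather than dropping one of them, is a deliberate choice in this sketch: when the two layers disagree on the boundaries of an identifier, the wider region is redacted.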

PHIdentifier was tested on 3,000 narrative and pathology notes, achieving a precision of approximately 97.9–98.1% and a recall of 95.7–97.1%. Nearly all flagged items were true PHI, and only a moderate number of nonsensitive elements were over-redacted. Very few direct identifiers, such as hospital names, locations, or years, were missed. This strong performance enables the tool to improve the overall quality and completeness of the registry dataset for rare immunodeficiency disorders. These results show how an LLM-based de-identification tool can make data collection more efficient while helping to protect patient privacy.
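For readers less familiar with these metrics, the short Python sketch below shows how precision and recall are conventionally computed from counts of correctly redacted PHI items (true positives), over-redacted nonsensitive items (false positives), and missed PHI items (false negatives). The counts are invented purely for illustration and are not taken from the study.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of redactions that were true PHI
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of true PHI that was redacted
    return precision, recall


# Illustrative counts only (assumed, not reported in the abstract).
precision, recall = precision_recall(tp=960, fp=20, fn=40)
print(f"precision={precision:.1%} recall={recall:.1%}")
```

In a de-identification setting, a false negative is a leaked identifier, so recall is usually the metric watched most closely; a false positive only costs some clinical detail.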
