Evaluation of a Natural Language Processing Model to Identify and Characterize Patients in the U.S. With High-Risk Non-Muscle-Invasive Bladder Cancer

An ASCO Reading Room selection

April 25, 2024

Vikram M. Narayan, MD, Despina Siolas, MD, PhD, Eric S. Meadows, PhD, Vladimir Turzhitsky, PhD, Arthur Sillah, MPH, PhD, Kentaro Imai, MD, MPH, Andrew J. McMurry, PhD, and Haojie Li, MD, PhD

JCO Clinical Cancer Informatics


Below is the abstract of the article.


Purpose

Treatment of non–muscle-invasive bladder cancer (NMIBC) is guided by risk stratification using clinical and pathologic criteria. This study aimed to develop a natural language processing (NLP) model for identifying patients with high-risk NMIBC retrospectively from unstructured electronic medical records (EMRs) and to apply the model to describe patient and tumor characteristics.

Methods

We used three independent EMR-derived data sets of adult patients diagnosed with bladder cancer in 2011-2020: one for NLP model development and training (n = 140), one for validation (n = 697), and one for application in the retrospective cohort analysis (n = 4,402). Deep learning methods were used to train the NLP model to recognize medical chart terminology for seven high-risk NMIBC criteria; model performance was assessed using the F1 score, weighted across features. A rule-based algorithm was then used to classify each patient as high-risk NMIBC (yes/no). Manually reviewed records served as the gold standard.
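The abstract does not state how the F1 score was weighted across the seven features. A minimal sketch, assuming support weighting (each feature's F1 weighted by its number of manually confirmed mentions, the same convention as scikit-learn's average="weighted"), could look like the following. The names FeatureCounts, f1 and weighted_f1, and the counts in the usage lines, are hypothetical and are not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class FeatureCounts:
    """Per-feature agreement between NLP extraction and manual chart review (hypothetical)."""
    tp: int  # feature found by the NLP model and by manual review
    fp: int  # feature found by the NLP model only
    fn: int  # feature found by manual review only


def f1(c: FeatureCounts) -> float:
    """F1 = 2TP / (2TP + FP + FN), i.e. the harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def weighted_f1(features: dict[str, FeatureCounts]) -> float:
    """Average of per-feature F1 scores, each weighted by its support (TP + FN).

    Support weighting is an assumption; the abstract only says "weighted across features".
    """
    total_support = sum(c.tp + c.fn for c in features.values())
    if total_support == 0:
        return 0.0
    return sum(f1(c) * (c.tp + c.fn) for c in features.values()) / total_support


# Hypothetical counts for two of the seven features, for illustration only
example_features = {
    "T1": FeatureCounts(tp=40, fp=5, fn=5),
    "CIS": FeatureCounts(tp=10, fp=5, fn=5),
}
```

With these illustrative counts, T1 has F1 = 8/9 and CIS has F1 = 2/3; because T1 has three times the support, the weighted average (5/6) sits much closer to the T1 score. A rare feature such as prostatic urethral involvement would likewise pull the weighted score down only slightly even if its own F1 were low.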

Results

The F1 scores after model training were >0.7 for all but one uncommon feature (prostatic urethral involvement). The highest areas under the receiver operating characteristic curve (AUCs) were observed for Ta (0.897) and T1 (0.897); the lowest AUC was for carcinoma in situ (CIS; 0.617). For high-risk NMIBC classification, the positive predictive value was 79.4%, the negative predictive value was 93.2%, and the false-positive rate was 8.9%. Sensitivity and specificity were 83.7% and 91.1%, respectively. Of 748 patients manually confirmed as having high-risk NMIBC, 196 (26%) had CIS (of whom 19% also had T1 and 23% also had Ta disease); 552 tumors (74%) had no associated CIS.
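The classification metrics above all derive from a 2x2 comparison of the algorithm's yes/no call against manual review, and an AUC can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below is illustrative only: ConfusionMatrix, auc and the counts in the usage line are hypothetical, and the abstract does not describe how the authors computed these values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionMatrix:
    """Patient-level yes/no classification vs. manual chart review (the gold standard)."""
    tp: int  # flagged high-risk, confirmed high-risk on review
    fp: int  # flagged high-risk, not high-risk on review
    tn: int  # not flagged, not high-risk on review
    fn: int  # not flagged, but high-risk on review

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def false_positive_rate(self) -> float:
        # FP / (FP + TN), which always equals 1 - specificity
        return self.fp / (self.fp + self.tn)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC: the share of positive/negative pairs in which
    the positive case scores higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts, for illustration only
example = ConfusionMatrix(tp=80, fp=20, tn=180, fn=20)
```

Because the false-positive rate and specificity share the same denominator (all patients who are not high-risk on review), they always sum to 1, which is consistent with the reported 8.9% and 91.1%. PPV and NPV, by contrast, also depend on how common high-risk NMIBC is in the evaluated population.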

Conclusion

The NLP model, combined with a rule-based algorithm, identified high-risk NMIBC with good performance and will enable future work to study real-world treatment patterns and clinical outcomes for high-risk NMIBC.
