Two machine learning models, XGBoost and RegCox, showed similarly superior performance in helping clinicians identify patients at high risk for gastrointestinal bleeding, an observational study found.
Among patients prescribed antithrombotic drugs, the regularized Cox proportional hazards regression (RegCox) and the extreme gradient boosting (XGBoost) models had the best performance in predicting gastrointestinal bleeding in the validation data set, each with an area under the curve (AUC) of 0.67 at 6 months and 0.66 at 1 year, though RegCox had marginally better discrimination, reported Jeph Herrin, PhD, of Yale School of Medicine in New Haven, Connecticut, and colleagues.
By comparison, the random survival forests model showed an AUC of 0.62 at 6 months and 0.60 at 1 year, and an existing model showed AUCs of 0.60 and 0.59, respectively, the authors wrote in JAMA Network Open.
Risk-prediction models can help physicians mitigate the risks of prescribing antithrombotics, such as direct oral anticoagulants, thienopyridine antiplatelets, or vitamin K antagonists, to patients with cardiovascular disease. The main goal of the study was to bring better knowledge of gastrointestinal bleeding risk into clinical treatment decisions.
"We were able to construct a larger cohort with longer follow-up than, to our knowledge, had been used before in model development, and the cohort included patients who received treatment with a range of antithrombotic agents, thus improving the generalizability and clinical relevance of the results," the authors wrote, as could not compare more models with a larger data set.
Existing models all have limitations, the authors noted, such as the exclusion of contemporary drugs (second-generation antiplatelets), the smaller data sets used to develop them for gastrointestinal bleeding, and an inability to incorporate medical advances into their algorithms.
"Although ML [machine learning] models can achieve superior performance, they are usually complex and thus sacrifice interpretability," wrote Fei Wang, PhD, of Weill Cornell Medical College in New York City in an . "Clinicians prefer to use models that they can understand and that align with their own experience and knowledge. This is an important reason why score-card type risk calculators, like HAS-BLED, are popular in clinical practice, despite the fact that their quantitative performances may not be high."
Herrin and colleagues developed the machine learning models to predict gastrointestinal bleeding risk at 6 months and 1 year. They evaluated the models in the validation group by assessing the AUC of receiver operating characteristic (ROC) curves and examining prediction density plots, along with specificity, sensitivity, and positive predictive value.
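For readers who want to see how such validation metrics are computed, here is a minimal Python sketch. It is not the authors' code. It assumes a simple binary outcome at a fixed horizon (bleed or no bleed by 6 months, say) and ignores censoring, which a time-to-event analysis like the study's would have to handle. The function names, toy labels, risk scores, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random bleeder gets a higher score
    than a random non-bleeder (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV when patients with
    score >= threshold are flagged as high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


if __name__ == "__main__":
    # Hypothetical toy data: 1 = GI bleed within the horizon, 0 = no bleed.
    outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
    risk_scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.8]
    print("AUC:", round(roc_auc(outcomes, risk_scores), 3))
    print(threshold_metrics(outcomes, risk_scores, threshold=0.5))
```

On this toy data the AUC is 13/15, or about 0.87, well above the 0.66 to 0.67 reported in the study. An AUC of 0.5 corresponds to chance-level ranking.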
Data came from over 306,000 adults in the U.S. medical and pharmacy claims of the OptumLabs Data Warehouse (OLDW) from Jan. 1, 2016 to Dec. 31, 2019. Patients had a history of atrial fibrillation, ischemic heart disease, or venous thromboembolism and no antithrombotic prescription in the prior 12 months. Those at risk for gastrointestinal bleeding-related cancer were excluded. The main outcome was time in days to first gastrointestinal bleed.
Patients' average age was 69, just over half of participants were men, and over 60% were white. Black patients were at higher risk for gastrointestinal bleeding than white patients and those of other races, and women were at higher risk than men. About 4% of participants had gastrointestinal bleeding during the 133-day follow-up.
A majority of participants were taking anticoagulants (57%), with 42% on antiplatelets. Most participants had hypertension (88%), while 46% were smokers, and 44% had valvular heart disease.
Among patients who experienced a GI bleed, 85.1% were taking antihypertensives, 61.6% were on anti-hyperlipidemic drugs, and 40.6% were taking proton pump inhibitors or gastroprotective agents.
For the RegCox model, the highest importance score variables included prior GI bleeding (0.72), "atrial fibrillation, ischemic heart disease, and venous thromboembolism combined" (0.38), and using gastroprotective agents (0.32).
"Although some machine learning models in our study showed better performance than traditional risk scores, the performance is modest (c statistic not very high). Also, the models seem to be better at identifying patients at low risk," Herrin told ľֱ in an email. "The modest performance indicates the tool might be better suited for the use as a supplementary tool to support clinical decision making in the context of other clinical information, rather than entirely relying on the model to make the decision."
Limitations of the study include the absence of data on uninsured patients and on Medicare patients outside Medicare Advantage, since the study used the OLDW claims database, which covers Medicare Advantage and private insurance. Without these patient groups, the findings cannot be fully generalized to all older patients.
"We will need to validate the algorithm in other settings or populations to see if the performance is similar, also consider embedding the algorithm into EHR to enable real-time risk prediction to help clinicians make decisions at the point of care," Herrin told ľֱ.
Disclosures
This research was funded by the Agency for Healthcare Research and Quality.
Herrin disclosed support from the Agency for Healthcare Research and Quality, CMS, Patient Centered Outcomes Research Institute, and the National Cancer Institute.
Other co-authors disclosed support from National Heart, Lung, and Blood Institute, National Institute on Aging, NIH, FDA, American Heart Association, Center for Medicare and Medicaid Innovation, Medical Device Innovation Consortium, and the National Science Foundation.
A coauthor reported support from AliveCor as well as being a formal investigator in a Medtronic trial.
Wang received funding from Sanofi, NIH, Office of Naval Research, National Science Foundation, IBM, American Air Liquide, Boehringer Ingelheim, and the Michael J. Fox Foundation for Parkinson's Research.
Primary Source
JAMA Network Open
Herrin J, et al "Comparative effectiveness of machine learning approaches for predicting gastrointestinal bleeds in patients receiving antithrombotic treatment" JAMA Netw Open 2021; DOI: 10.1001/jamanetworkopen.2021.10703.
Secondary Source
JAMA Network Open
Wang F "Machine learning for predicting rare clinical outcomes -- Finding needles in a haystack" JAMA Netw Open 2021; DOI: 10.1001/jamanetworkopen.2021.10738.