Abstract WP312: Identifying Acute Ischemic Stroke by Analyzing ICD-10 Claims Data Using Machine Learning Models
Introduction: Retrospective identification of patients hospitalized with a new diagnosis of acute ischemic stroke is important for administrative quality assurance, post-discharge clinical management, and stroke research. Administrative claims data have the benefit of widespread availability, but it is difficult to identify the clinical diagnosis of interest accurately and consistently from them.
Hypothesis: We hypothesized that decision tree and logistic regression models could be applied to administrative claims data coded using the International Classification of Diseases, 10th Revision (ICD-10) to create algorithms that accurately identify patients with acute ischemic stroke.
Methods: We used hospital records from our institution to develop a gold standard list of 243 patients continuously hospitalized with a new diagnosis of stroke from 10/1/2015 to 3/31/2016. We used 1,393 neurological patients without a diagnosis of stroke as negative controls. This list was used to train and test two machine learning methods that analyze diagnosis and procedure codes to identify ischemic stroke: one using a classification and regression tree (CART) and another using regularized logistic regression. We trained the models on 75% of the data and evaluated them on the remaining 25%.
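As an illustration of the 75%/25% split described above, the following is a minimal sketch rather than the authors' procedure. It assumes the split was stratified by class and used a fixed random seed, neither of which the abstract states. The function name `stratified_split` is hypothetical, and only the cohort sizes (243 cases, 1,393 controls) come from the text.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split indices into train/test sets, sampling each class separately.

    Stratification and the seed are assumptions made for this sketch;
    the abstract only reports a 75%/25% split.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)  # rounds down per class
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Cohort sizes from the abstract: 243 stroke cases (1), 1,393 controls (0).
labels = [1] * 243 + [0] * 1393
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the proportion of stroke cases roughly equal in the training and evaluation sets, which matters when cases make up only a small share of the cohort.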
Results: The CART model had κ=0.78, sensitivity of 96%, specificity of 90%, and a positive predictive value of 99%. The regularized logistic regression model had κ=0.73, sensitivity of 97%, specificity of 81%, and a positive predictive value of 98%.
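For readers unfamiliar with the reported metrics, the sketch below shows how Cohen's κ, sensitivity, specificity, and positive predictive value are conventionally computed from a 2×2 confusion matrix. The function name `binary_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not report its confusion matrices.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard agreement and accuracy metrics from 2x2 counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives relative to the gold standard.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # share of true cases that were flagged
    specificity = tn / (tn + fp)  # share of controls correctly cleared
    ppv = tp / (tp + fp)          # share of flagged patients who are cases

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)

    return {
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
    }


# Hypothetical counts for illustration only (not from the study).
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, κ and positive predictive value depend on the proportion of cases in the evaluation set, so they should be interpreted relative to the case-to-control ratio of the cohort.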
Conclusion: Compared with our gold standard, both the decision tree and logistic regression machine learning models showed very high accuracy in identifying patients with a new diagnosis of ischemic stroke from ICD-10-coded claims data. Applying these machine learning models to identify patients with ischemic stroke has widespread applications, especially in this period when national billing data have transitioned from ICD-9 to ICD-10 codes.
Author Disclosures: C. Esenwa: Research Grant; Modest; NINDS Institutional National Research Service Award (T32) training grant in neuroepidemiology. J. Luna: None. B. Kummer: None. H. Salmasian: None. D. Vawdrey: None. H. Kamel: None. M. Elkind: None.
© 2017 by American Heart Association, Inc.