Convergent Validity and Interrater Reliability of Estimating the ABCD2 Score From Medical Records
Background and Purpose—The ABCD2 score is increasingly used for risk stratification of transient ischemic attack patients. We sought to determine the reliability and convergent validity of retrospective ABCD2 score estimation from medical records.
Methods—We compared ABCD2 scores that were prospectively determined by a vascular neurology attending to scores determined retrospectively from medical record review. Emergency department records and neurology consult notes for patients with acute transient ischemic attack were abstracted with explicit ABCD2 scoring redacted. Scores were estimated by 2 independent raters using these records. Estimated ABCD2 component scores, total scores, and risk category were compared both between retrospective raters and with prospectively obtained scores. Reliability was assessed using unweighted κ statistics.
Results—Interrater reliability was substantial with 72% exact agreement in total score between retrospective raters (κ=0.64) and nearly perfect with 82% agreement for ABCD2 category (κ=0.71). Interrater agreement was best for age and diabetes mellitus and poorest for clinical features and duration. Agreement between the retrospective raters and the prospectively obtained score was >90% for age, blood pressure, and diabetes mellitus, but only ≈70% for clinical features and duration. The retrospectively estimated total ABCD2 score exactly matched the prospective score in 58% of patients for rater 1 and 44% of patients for rater 2. The retrospectively estimated ABCD2 category matched the prospectively scored category in 67% of patients for rater 1 and 71% of patients for rater 2.
Conclusions—The ABCD2 score can be abstracted from medical records with substantial interrater reliability but limited convergent validity. This may lead to misclassification of risk category in more than one third of patients.
Accurate risk stratification of patients with transient ischemic attack (TIA) is critically important to facilitate efficient evaluation and management of these patients. To this end, clinical risk scores, such as the ABCD2 score, have been developed to help identify patients at highest risk of stroke during the early period after TIA. The ABCD2 score is based on age, blood pressure, clinical features, duration of symptoms, and presence of diabetes mellitus. It has been validated in observational studies, both by the initial group developing the score and by independent groups, with mixed results. Although prospective ABCD2 scoring is used to aid clinical decision making, retrospective abstraction is frequently used both for research purposes and score validation. Several validation studies have either used retrospective extraction of ABCD2 scoring from medical records or failed to explicitly state how ABCD2 scoring was obtained.1–6 However, the accuracy of retrospective abstraction of the score from medical records has never been established. The aim of the current study was to assess the convergent validity (accuracy compared with a prospectively assigned score) and interrater reliability of retrospective estimation of the ABCD2 score from medical records.
Patients with suspected TIA who presented to the emergency department at our hospital between December 2010 and March 2012 and who were seen by a board-certified vascular neurology attending were included. TIA was defined in the traditional sense, as acute onset of focal cerebral or monocular symptoms lasting <24 hours and presumed to be due to a vascular cause. Patients were included only if there was complete resolution of symptoms at the time of emergency department presentation or hospital admission. All patients for whom there was sufficient clinical suspicion to justify diagnostic testing for a neurovascular cause were eligible for inclusion. This study was approved by the University of Pennsylvania Institutional Review Board. At our institution, standard attending initial evaluation templates for all patients with TIA include ABCD2 scoring, and attending physicians were explicitly instructed to perform ABCD2 scoring themselves and not rely on housestaff reports. The ABCD2 score prospectively calculated by the vascular neurology attending was considered the gold standard.
Medical records from the emergency department and the initial neurology resident consultation note were redacted for patient identifiers and for any explicit ABCD2 score documentation. Records were photocopied and distributed to 2 neurologists, who independently determined each component of the ABCD2 score and calculated the total score. Agreement between the 2 retrospective raters was assessed, as was agreement between retrospective and prospective raters, for both the total score and the individual component items of the score.
Analysis of interrater agreement for the retrospective scorers used weighted and unweighted κ scores assessing individual ABCD2 items, total ABCD2 score, and ABCD2 category (0–3, 4–5, 6–7). Comparison of each retrospective scorer with the prospective attending score (gold standard) was done using percent agreement.
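The two agreement measures named above can be illustrated with a minimal sketch. It assumes each rater's results are held as equal-length lists of category labels and computes unweighted Cohen's κ as (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal frequencies. The function names and the sample ratings are hypothetical and are not study data; the weighted κ is not shown.

```python
import math
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of patients for whom both raters gave the same value."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters on the same patients."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: sum over categories of the product of marginals.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        return math.nan
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ABCD2 categories for four patients (illustration only).
rater_1 = ["low", "low", "moderate", "moderate"]
rater_2 = ["low", "low", "moderate", "low"]
agreement = percent_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)
```

In this toy example the raters agree on 3 of 4 patients, but half of that agreement is expected by chance, which is why κ is lower than the raw percent agreement.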
Raters retrospectively reviewed medical records for 102 TIA patients. Mean patient age was 62 years; 53% were female. Based on prospective neurovascular attending scores, 33% of patients were categorized as low risk (ABCD2 score, 0–3), 58% as moderate risk (4–5), and 9% as high risk (6–7).
Interrater reliability of retrospective raters was substantial with 72% exact agreement in total score between raters (κ=0.64) and nearly perfect with 82% agreement for ABCD2 category (κ=0.71). The retrospectively estimated total ABCD2 score exactly matched the prospective attending score in 58% of patients for rater 1 and 44% of patients for rater 2; the retrospective ABCD2 category matched the prospectively scored category in 67% of patients for rater 1 and 71% of patients for rater 2. The Table summarizes these data and presents agreement between raters for the individual ABCD2 components.
In our cohort, the ABCD2 score was abstracted from medical records with substantial interrater reliability but limited convergent validity. Clinical features (C) and duration (D) display the greatest variability among all ABCD2 components in retrospective evaluation. These are also the only 2 components that rely on subjective history and patient self-report. Together, they account for 4 of the 7 possible points of the ABCD2 score. In risk stratification of TIA patients in clinical practice, the ABCD2 category rather than the exact score is often used to determine short-term risk of ischemic stroke, with the score divided into 3 categories corresponding to low (0–3), moderate (4–5), and high (6–7) risk. We found that roughly one third of patients were misclassified for ABCD2 category based on retrospective scoring.
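The effect of a 1-point disagreement in a history-based component on risk category can be sketched as follows. The category cut points (0–3, 4–5, 6–7) are those used in this study; the individual point assignments are not stated in this article and are assumed here from the commonly published ABCD2 definition (age ≥60 years, 1 point; blood pressure ≥140/90 mm Hg, 1; unilateral weakness, 2, or speech disturbance without weakness, 1; duration ≥60 minutes, 2, or 10–59 minutes, 1; diabetes mellitus, 1). The function and parameter names are hypothetical.

```python
def abcd2_score(age, systolic, diastolic, unilateral_weakness,
                speech_disturbance, duration_min, diabetes):
    """Total ABCD2 score (0-7) under the assumed point assignments."""
    score = 0
    if age >= 60:                             # A: age
        score += 1
    if systolic >= 140 or diastolic >= 90:    # B: blood pressure
        score += 1
    if unilateral_weakness:                   # C: clinical features
        score += 2
    elif speech_disturbance:
        score += 1
    if duration_min >= 60:                    # D: duration of symptoms
        score += 2
    elif duration_min >= 10:
        score += 1
    if diabetes:                              # D: diabetes mellitus
        score += 1
    return score


def abcd2_category(score):
    """Map a total score to the 3 risk categories used in this study."""
    if not 0 <= score <= 7:
        raise ValueError("ABCD2 score must be between 0 and 7")
    if score <= 3:
        return "low"
    if score <= 5:
        return "moderate"
    return "high"


# Hypothetical patient: the same record read with two different durations.
as_documented = abcd2_score(65, 150, 80, False, True, 45, False)
shorter_duration = abcd2_score(65, 150, 80, False, True, 5, False)
categories = (abcd2_category(as_documented), abcd2_category(shorter_duration))
```

In this hypothetical case, reading the symptom duration as 5 minutes rather than 45 minutes lowers the total score from 4 to 3 and moves the patient from the moderate-risk to the low-risk category.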
Estimation of clinical scores based on medical record review is widespread, but the accuracy of this approach needs to be confirmed for each individual clinical score. Formal evaluation of the National Institutes of Health stroke scale, for example, has demonstrated that it can be abstracted from medical records with a high degree of reliability and validity.7
Our results on the validity of retrospective abstraction of the ABCD2 score from medical records serve as a cautionary note. Our study has several limitations. The patient cohort came from a single tertiary care center, only patients presenting to the emergency department were included, and the vascular neurology attending's score was assumed to be the gold standard. Medical records reviewed by raters included neurology resident consult notes, which, given residents' considerable exposure to the ABCD2 score during training, might have included more relevant historical detail than a typical generalist physician or community-based neurologist would include in their documentation. This suggests that, if anything, our results might reflect an overly optimistic assessment of the validity of retrospective ABCD2 scoring. Finally, we did not assess the actual predictive value of ABCD2 scoring for short-term stroke risk (criterion validity), so the clinical significance of our findings remains uncertain. It is possible that more accurate ABCD2 scoring obtained prospectively might result in better performance of the ABCD2 score at predicting risk.
Ralph L. Sacco, MD, MS, was guest editor for this article.
- Received August 30, 2012.
- Revision received November 13, 2012.
- Accepted December 4, 2012.
- © 2013 American Heart Association, Inc.
- Josephson SA, Sidney S, Pham TN, Bernstein AL, Johnston SC
- Yang J, Fu JH, Chen XY, Chen YK, Leung TW, Mok V, et al
- Giles MF, Rothwell PM
- Kasner SE, Chalela JA, Luciano JM, Cucchiara BL, Raps EC, McGarvey ML, et al