Validity of Hospital Discharge Diagnosis Codes for Stroke
The Atherosclerosis Risk in Communities Study
Background and Purpose—Characterizing International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) code validity is essential given widespread use of hospital discharge databases in research. Using the Atherosclerosis Risk in Communities (ARIC) Study, we estimated the accuracy of ICD-9-CM stroke codes.
Methods—Hospitalizations with ICD-9-CM codes 430 to 438 or stroke keywords in the discharge summary were abstracted for ARIC cohort members (1987–2010). A computer algorithm and physician reviewer classified definite and probable ischemic stroke, intracerebral hemorrhage, and subarachnoid hemorrhage. Using ARIC classification as a gold standard, we calculated the positive predictive value (PPV) and sensitivity of ICD-9-CM codes grouped according to the American Heart Association/American Stroke Association (AHA/ASA) 2013 categories and an alternative code grouping for comparison.
Results—Thirty-three percent of 4260 hospitalizations were validated as strokes (1251 ischemic, 120 intracerebral hemorrhage, 46 subarachnoid hemorrhage). The AHA/ASA code groups had PPV 76% and 68% sensitivity compared with PPV 72% and 83% sensitivity for the alternative code groups.
The PPV of the AHA/ASA code group for ischemic stroke was slightly higher among blacks, individuals <65 years, and at teaching hospitals. Sensitivity was higher among older individuals and increased over time. The PPV of the AHA/ASA code group for intracerebral hemorrhage was higher among blacks, women, and younger individuals. PPV and sensitivity varied across study sites.
Conclusions—A new AHA/ASA discharge code grouping to identify stroke had similar PPV and lower sensitivity compared with an alternative code grouping. Accuracy varied by patient characteristics and study sites.
Current data may be inadequate to monitor the national incidence of cerebrovascular disease,1,2 a leading cause of death and disability in the United States.3 One approach for national surveillance is capturing International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes from hospital discharge and claims databases (administrative data).4–6 Characterizing the validity of ICD-9-CM codes is essential given their widespread use for surveillance and for epidemiological and health services research.7–11 Estimates of the accuracy of these codes can be used in sensitivity analyses to account for the misclassification of stroke events in administrative data.11 Documenting coding accuracy over time is particularly important to understand the potential effect on temporal trends in stroke incidence estimated from administrative data.12
Estimates of the validity of ICD-9-CM codes for stroke vary depending on the codes investigated4,11 and by patient and hospital characteristics.13–17 In 2013, the American Heart Association/American Stroke Association (AHA/ASA) published an updated definition of stroke, including ICD-9-CM codes grouped according to stroke subtypes (ischemic stroke, intracerebral hemorrhage [ICH], and subarachnoid hemorrhage [SAH]).18 The accuracy of these code groups for identifying stroke has not been reported. Also, the positive predictive value (PPV) of ICD-9-CM codes by patient sex and age has rarely been assessed,14,19 variation by race/ethnicity has not been explored, and most studies to date were conducted in a single geographic location.11
Using the Atherosclerosis Risk in Communities (ARIC) Study, we assessed the accuracy of hospital discharge ICD-9-CM coding of stroke. We estimated the PPV and sensitivity of the AHA/ASA code groups compared with previously validated alternative code groups11,20 overall and by stroke subtype (ischemic, ICH, and SAH). We characterized variation in ICD-9-CM code accuracy by patient sex, race, age, geographic location, hospital type (teaching versus nonteaching), and ICD-9-CM code position (first versus any position). We further investigated temporal trends in the accuracy of codes from 1991 to 2010.
The ARIC Study design is well documented.21 Briefly, a population-based cohort of 15 792 individuals aged 45 to 64 years was recruited in 1987–1989 in 4 communities: Washington County, Maryland; suburbs of Minneapolis, Minnesota; Jackson, Mississippi; and Forsyth County, North Carolina. This analysis included eligible hospitalizations of ARIC cohort members occurring from enrollment through December 31, 2010, or date of last contact if deceased or lost to follow-up.
Identification of Stroke Events
Hospitalizations and deaths were ascertained via annual follow-up phone calls, study examinations, and surveillance of hospital discharges in ARIC communities. Hospitalizations meeting ≥1 of the following criteria were eligible for medical record abstraction: (1) a discharge diagnosis ICD-9-CM code 430 to 438 (1987–1996) or 430 to 436 (since 1997); (2) ≥1 stroke-related keywords (see online-only Data Supplement) in discharge summary; or (3) diagnostic computed tomography or magnetic resonance imaging scan with cerebrovascular findings or admission to the neurological intensive care unit. A trained nurse abstracted records for each eligible hospitalization, including ≤21 ICD-9-CM discharge codes (see online-only Data Supplement).
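Criterion (1) above can be written as a small check on the three-digit ICD-9-CM category. The following is a minimal sketch, not the ARIC abstraction software: the function names and the representation of codes as strings such as "434.91" are assumptions made for illustration, and criteria (2) and (3) are not modelled.

```python
def code_eligible(icd9_code: str, discharge_year: int) -> bool:
    """Return True if one discharge code satisfies criterion (1).

    Hypothetical helper: assumes codes arrive as strings like "436" or "434.91".
    """
    try:
        # Keep only the three-digit category in front of the decimal point
        category = int(icd9_code.strip().split(".")[0])
    except ValueError:
        # V codes, E codes, and malformed entries cannot fall in 430 to 438
        return False
    # 430 to 438 through 1996, 430 to 436 from 1997 onward
    upper = 438 if discharge_year <= 1996 else 436
    return 430 <= category <= upper


def hospitalization_eligible(codes, discharge_year: int) -> bool:
    """A hospitalization qualifies if any of its discharge codes qualifies."""
    return any(code_eligible(code, discharge_year) for code in codes)
```

Under these assumptions, a record with code 437.1 would qualify in 1995 but not in 2001, which is the change in the code window described in the text.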
A computer algorithm and physician reviewer independently classified each event according to criteria adapted from the National Survey of Stroke.22 A second physician reviewer adjudicated in cases where the computer and initial reviewer disagreed.
A definite or probable stroke was defined as a sudden and rapid onset of neurological symptoms lasting >24 hours or leading to death in the absence of evidence for a nonstroke cause (see online-only Data Supplement). Events that did not meet these criteria were classified as possible stroke of undetermined type, out-of-hospital fatal stroke, or no stroke. Definite and probable strokes were classified further as SAH, ICH, or ischemic stroke (including embolic and thrombotic brain infarction; Table I in the online-only Data Supplement).23
ICD-9-CM Code Groups
ICD-9-CM codes were grouped according to stroke subtype using 2 approaches. First, we used ICD-9-CM codes matched to stroke subtypes by the AHA/ASA in 2013.18 We excluded codes for spinal and retinal infarcts (336.1, 362.31, and 362.32) because these events (n=3) were not validated in ARIC. Second, we used an alternative grouping of ICD-9-CM codes with high PPV and sensitivity in previous studies.5,11,20 The primary difference between the 2 code groupings was exclusion of ICD-9-CM codes 432 and 436 from the AHA/ASA code group.
Using ARIC classification as a gold standard, we estimated the PPV and sensitivity of the AHA/ASA and alternative code groups. PPV was the proportion of validated strokes among all hospitalizations with a given ICD-9-CM code group. Sensitivity was the proportion of hospitalizations with a given code group among validated strokes. In total, 216 hospitalizations were identified for validation in ARIC based on keywords without any stroke-related ICD-9-CM codes. Of these, 18 (8%) were validated as strokes, and these were included in analyses as false negatives. Code 486 (pneumonia, organism unspecified) was the most common primary ICD-9-CM code among these 216 events.
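The two definitions above reduce to counts of true positives, false positives, and false negatives. The sketch below shows one way to compute them from record-level flags; the function name and the toy records are hypothetical and are not ARIC data. Hospitalizations validated as strokes without a code in the group (such as the keyword-only events described above) fall into the false-negative branch.

```python
def ppv_and_sensitivity(records):
    """Compute PPV and sensitivity for one code group.

    records: iterable of (has_code_group, validated_stroke) boolean pairs,
    one pair per hospitalization.
    """
    tp = fp = fn = 0
    for has_code, validated in records:
        if has_code and validated:
            tp += 1          # coded and validated
        elif has_code:
            fp += 1          # coded but not validated
        elif validated:
            fn += 1          # validated but not coded (false negative)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


# Hypothetical toy records, for illustration only
toy = ([(True, True)] * 3 + [(True, False)]
       + [(False, True)] * 2 + [(False, False)] * 4)
ppv, sensitivity = ppv_and_sensitivity(toy)
print(f"PPV={ppv:.2f}, sensitivity={sensitivity:.2f}")  # PPV=0.75, sensitivity=0.60
```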
We calculated the PPV and sensitivity of code groups for ischemic stroke and ICH stratified by patient and hospital characteristics. Stratified analyses were not conducted for SAH because of a small number of events. Patient characteristics were sex, age at event (<65 years and ≥65 years), race (black and white), and study center. Analyses stratified by race included only hospitalizations of white participants and black participants from Forsyth County and Jackson (n=4235). Hospitalization characteristics included ICD-9-CM code position (first and any position), incident versus recurrent event, and teaching status. Among hospitalizations with symptoms present for ≥24 hours (n=3250), incident events were those with no history of stroke or transient ischemic attack recorded in the medical record. Hospital teaching status was determined by the presence of full-time internal medicine residents for hospitalizations (n=3936) that occurred at 31 hospitals located within ARIC Study communities.
To assess temporal trends, we calculated the PPV and sensitivity of the code groups from 1987 to 2010. Binomial regression was used to estimate the age-adjusted trend in the PPV of the AHA/ASA code group for ischemic stroke by sex and race from 1991 to 2010 because no ischemic strokes with AHA/ASA-identified ICD-9-CM codes occurred before 1991. Confidence intervals were calculated using the exact method. Analyses were performed using SAS 9.3 (Cary, NC).
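The text does not name the exact method used; for a binomial proportion such as PPV or sensitivity, the term usually refers to the Clopper-Pearson interval. The sketch below is written under that assumption and uses only the Python standard library (the study itself used SAS); all function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def _root_of_decreasing(f, target, iterations=60):
    """Bisection on [0, 1] for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_ci(x, n, alpha=0.05):
    """Two-sided Clopper-Pearson interval for x successes in n trials."""
    # Lower limit solves P(X >= x) = alpha/2, i.e. cdf(x - 1) = 1 - alpha/2
    lower = 0.0 if x == 0 else _root_of_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2.0)
    # Upper limit solves P(X <= x) = alpha/2
    upper = 1.0 if x == n else _root_of_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2.0)
    return lower, upper
```

For example, `exact_ci(18, 216)` would give an interval around the 8% of keyword-only hospitalizations that were validated as strokes.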
A total of 4318 stroke-eligible hospitalizations among 2533 persons were identified. Fifty-eight were excluded: 19 out-of-hospital fatal strokes, 24 possible strokes of undetermined type, 2 transfers from acute care facilities, and 13 hospitalizations for which ICD-9-CM codes were not available. Of 4260 remaining hospitalizations (among 2516 persons), 1417 (33%) were classified as definite or probable strokes in ARIC. By subtype, there were 1251 ischemic strokes, 120 ICH, and 46 SAH. The remaining 2843 events included rehospitalizations for prior stroke, transient ischemic attacks, and borderline events that did not meet ARIC clinical criteria.
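The counts in this paragraph can be reconciled with a few lines of arithmetic. The snippet below only restates the numbers reported above; the variable names are illustrative.

```python
identified = 4318
excluded = {
    "out-of-hospital fatal stroke": 19,
    "possible stroke of undetermined type": 24,
    "transfer from acute care facility": 2,
    "ICD-9-CM codes not available": 13,
}
analysed = identified - sum(excluded.values())

strokes = {"ischemic": 1251, "ICH": 120, "SAH": 46}
validated = sum(strokes.values())

# Analysed hospitalizations, validated strokes, remaining events, percent validated
print(analysed, validated, analysed - validated, round(100 * validated / analysed))
# 4260 1417 2843 33
```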
The PPV of individual ICD-9-CM codes 430 to 438 in any position ranged from 2% to 79% (Table 1). Among hospitalizations with a stroke-related ICD-9-CM code in the first position (n=2521), the PPV increased by an average of 7% (Table II in the online-only Data Supplement).
AHA/ASA Code Group
Thirty percent (1275 of 4260) of eligible hospital discharges included ICD-9-CM codes in the AHA/ASA code group (Table 2). The AHA/ASA code group had a PPV of 76% (range 57%–76% by stroke subtype) and 68% sensitivity (range 64%–93% by stroke subtype).
Six percent (276 of 4260) of hospitalizations occurred among individuals who had no new symptoms at admission but developed new symptoms during hospitalization. The PPV of the AHA/ASA code group for these in-hospital events was similar to that for events presenting with symptoms at admission (74% and 77%, respectively), and the sensitivity was identical (64%).
Alternative Code Group
In total, 39% (1649 of 4260) of hospital discharges included ICD-9-CM codes in the alternative code group (Table 2). The PPV for the alternative code group was 72% (range 40%–75% by stroke subtype) with 83% sensitivity (range 80%–93% by stroke subtype). For both code groups, PPV was highest for ischemic stroke and lowest for ICH, whereas sensitivity was highest for SAH and lowest for ischemic stroke.
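Because the number of true positives equals both (coded hospitalizations × PPV) and (validated strokes × sensitivity), the reported figures for the two code groups can be cross-checked against each other. The sketch below applies that identity to the rounded percentages given above; the function name is an assumption, and small discrepancies are expected from rounding.

```python
def implied_validated_events(n_coded, ppv, sensitivity):
    """Back-calculate validated events from TP = n_coded * PPV = n_true * sensitivity."""
    return n_coded * ppv / sensitivity


aha_asa = implied_validated_events(1275, 0.76, 0.68)
alternative = implied_validated_events(1649, 0.72, 0.83)
print(round(aha_asa), round(alternative))  # 1425 1430
```

Both values fall within about 1% of the 1417 definite or probable strokes validated in ARIC. The same identity underlies using PPV and sensitivity estimates to adjust stroke counts taken from administrative data.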
Across patient and hospital subgroups, the PPV of the AHA/ASA code group for ischemic stroke ranged from 68% to 85% with sensitivity between 24% and 93% (Table 3). PPV was higher for first compared with any position ICD-9-CM code, among black and younger patients, and at teaching hospitals. Sensitivity was higher for ICD-9-CM codes in any compared with the first position and among older patients (Table 3). Across the ARIC communities, PPV ranged from 68% to 80% and sensitivity from 60% to 70%. Both sensitivity and PPV were similar for incident and recurrent strokes. Patterns in the PPV and sensitivity of the alternative ICD-9-CM code grouping were similar (Table III in the online-only Data Supplement).
The AHA/ASA code group for ICH had higher PPV for first compared with any position ICD-9-CM code and among women, black, and younger patients (Table 4). Across study sites, PPV ranged from 31% to 80% and sensitivity varied from 69% to 90%.
Sensitivity of the AHA/ASA and alternative code groups for ischemic stroke increased over time (Table 3; Table III in the online-only Data Supplement). There was no consistent temporal trend in the PPV for either the AHA/ASA code group (Figure) or the alternative code group (data not shown), including after adjustment for age. Adjusting for age did not significantly change the pattern of increasing sensitivity (data not shown).
We expanded a preliminary validation of ICD-9-CM codes for stroke in the ARIC Study (1987–1995)23 to include hospitalizations of cohort members through December 31, 2010, and considered 2 approaches to grouping ICD-9-CM codes. The new code group published by the AHA/ASA in 2013 had slightly higher PPV and lower sensitivity than a previously validated alternative code group. Both PPV and sensitivity varied by patient and hospital characteristics.
The National Center for Health Statistics relies on ICD-9-CM codes 430 to 438 to estimate the number of hospital discharges for stroke and stroke-related mortality in the United States.24 The PPV of these codes in our study was ≈10% to 15% lower than previous estimates.11,25,26 The PPVs of the AHA/ASA and alternative code groups were also lower than previously reported for similar code groups.11,17 For example, the alternative code group for ischemic stroke had PPV 90% and 86% sensitivity among hospitalized patients in Seattle, Washington (1990–1996),20 compared with PPV 75% and 80% sensitivity in our study. We compare our findings with those of other studies with caution because differences in PPV may reflect differences in diagnostic criteria, disease prevalence, or both. PPV estimates differed by ≤29% using World Health Organization compared with Minnesota Stroke Survey definitions of stroke.12 Moreover, previous studies sampled hospitalized patients20,27 or postmenopausal women enrolled in Medicare,17 populations that may have a higher prevalence of stroke.
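The dependence of PPV on prevalence follows from Bayes' rule: for fixed sensitivity and specificity, PPV rises as the proportion of true strokes in the sampled population rises. The sketch below illustrates this with hypothetical inputs. Specificity was not estimated in this study, so the value of 0.90 is purely an assumption, as is the function name.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical sensitivity and specificity; only prevalence varies
for prevalence in (0.20, 0.33, 0.50):
    ppv = ppv_from_prevalence(0.68, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f} -> PPV={ppv:.2f}")
```

With identical coding performance, a study population enriched for stroke would therefore report a higher PPV than a community-based cohort.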
For both code groups, sensitivity was lowest for ischemic stroke. Of 1251 definite and probable ischemic strokes validated in ARIC, 436 (35%) did not include any of the ICD-9-CM codes in the AHA/ASA code group. Among these 436 events, 46% had an ICD-9-CM code 436, 17% had a code 434.9, and 15% had a code 435. Thus, low sensitivity of the AHA/ASA code group for ischemic stroke was primarily because of exclusion of ICD-9-CM code 436.
Low sensitivity and PPV for identifying ICH were surprising. Low sensitivity was due in part to miscoding of 12% of ICH events as ischemic strokes (ICD-9-CM codes 433, 434, or 436), an error identified previously.28 Hemorrhagic infarctions are classified as ischemic strokes in ARIC but may receive ICH-related ICD-9-CM codes, possibly contributing to lower PPV for ICH. Additionally, ARIC-adjudicated events were assigned a single classification, such that events with combined pathology were assigned the subtype thought to be primary by the ARIC adjudicators. This may contribute to classification of events with hemorrhage-related ICD-9-CM codes as ischemic strokes in ARIC and vice versa.
In addition to the knowledge of coders and quality of the medical chart, the accuracy of ICD-9-CM coding depends on the specificity of coding criteria and regional or departmental variation in diagnosis and coding practices.10,16,19,29 Previous studies documented differences in coding accuracy by hospital department (emergency versus neurology)10,15,16 and urbanicity.30 The accuracy of ICD-9-CM code groups varied by hospital teaching status and ARIC community. Use of evidence-based diagnosis measures and updated diagnostic criteria may be more common at teaching hospitals and vary by region. The sensitivity of the alternative code group for ischemic stroke was more similar for teaching and nonteaching hospitals, likely because of inclusion of the nonspecific, commonly used ICD-9-CM code 436.
The accuracy of code groups for ischemic stroke and ICH also varied by patient race, age, and over time. Higher PPV among blacks compared with whites was likely because of a higher prevalence of stroke among blacks, but also may suggest differential usage of diagnostic tools. Among hospitalizations where computed tomography or magnetic resonance imaging was performed, PPV of the AHA/ASA code group for ischemic stroke was more similar across races (data not shown).
Higher PPV among younger compared with older patients was unexpected, given increasing stroke prevalence with age. Previous studies reported no difference in ICD coding accuracy by patient age16,19,31 or higher PPV among older adults.14,15 One explanation for our findings may be a higher prevalence of comorbidities and recurrent symptoms from prior stroke among older adults, complicating diagnosis and increasing the misapplication of stroke-related ICD-9-CM codes.
In contrast to PPV, the sensitivity of the AHA/ASA code group for ischemic stroke was higher among older patients. Disease severity was associated with higher sensitivity of ICD-9-CM coding among Medicare beneficiaries,28 and it is possible that more severe strokes among patients age ≥65 contributed to differences in sensitivity. Differences by age were less pronounced using the alternative compared with AHA/ASA code group for ischemic stroke. ICD-9-CM code 436 increased the sensitivity of the alternative relative to AHA/ASA code group among all patients, but had a greater effect among younger patients. However, in the context of a prospective cohort study, temporal trends in coding accuracy complicate the interpretation of age-stratified estimates.
Few previous studies have investigated temporal changes in code accuracy. A systematic review11 identified no substantial variation in PPV or sensitivity between studies conducted before and after 2000, and a Minnesota Stroke Survey study12 reported no consistent trend in PPV from 1980 to 2000. Similar to our findings, the Rochester Stroke Registry10 documented no trend in PPV and increasing sensitivity of ICD-9-CM codes 430 to 438 from 1970 to 1989. Thus, interpretation of national trend data from administrative databases may need to consider the possibility that the sensitivity of discharge codes for identifying stroke events has increased. However, given fluctuating PPV and the lack of data on specificity, it is difficult to predict the effect of temporal increases in sensitivity.
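One way to see why rising sensitivity matters for trend data is to hold the true number of events and the PPV fixed and let only sensitivity change. The sketch below does this with hypothetical numbers; it is a simplification, since PPV fluctuated in practice, and the function name is an assumption.

```python
def expected_coded_count(n_true, sensitivity, ppv):
    """Coded hospitalizations implied by n_true * sensitivity = n_coded * PPV."""
    return n_true * sensitivity / ppv


# Hypothetical scenario: 1000 true strokes in both periods, PPV fixed at 0.75
early = expected_coded_count(1000, 0.60, 0.75)
late = expected_coded_count(1000, 0.70, 0.75)
apparent_change = (late - early) / early
print(f"{early:.0f} -> {late:.0f} coded events ({apparent_change:.1%} apparent increase)")
```

In this scenario the coded count rises by roughly one sixth although the true number of events is unchanged, which is the kind of artefact the paragraph above cautions against.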
Given evolving diagnostic procedures and definitions, temporal changes in ICD-9-CM coding accuracy are expected. For example, increased prevalence of imaging technology was expected to increase coding accuracy16,26 and was linked to decreased use of nonspecific ICD-9-CM codes 436 to 437 in the Pawtucket Heart Health Program5 from 1980 to 1991. An important change between 1987 and 2010 was the 1992 addition, to ICD-9-CM codes 433 and 434, of a fifth-digit clinical modification code indicative of cerebral infarction.20 We investigated the effect of this change by restricting the analysis to hospitalizations since 1992, which slightly increased sensitivity estimates for ischemic stroke using the AHA/ASA (70%) and alternative code groups (84%).
Strengths of this study include the application of a consistent validation methodology to >4000 hospitalizations across 24 years and 4 communities. We are the first to estimate the PPV and sensitivity of new ICD-9-CM code groups proposed by the AHA/ASA in 2013 and to compare these with previously validated code groups. We addressed limitations in the existing literature, including using imaging data in validation, calculating PPV by stroke subtype, and investigating variation by population subgroup for hemorrhagic events.11 Unlike previous studies,5,13,20 we did not exclude recurrent strokes or multiple hospitalizations per individual. We report no difference in the PPV and sensitivity for incident versus recurrent ischemic strokes, nor did eliminating multiple hospitalizations per individual substantially change our results (data not shown), in contrast to previous studies.11,17,32 These findings are important given that up to a third of in-hospital strokes may be recurrent.13
Limitations to this analysis include differences between the ARIC definition of stroke and the AHA/ASA 2013 definition of stroke.18 In particular, spinal and retinal infarctions were classified as ischemic strokes by the AHA/ASA but were not validated in the ARIC Study. However, ICD-9-CM codes for these events were rare in this data set (n=3). Subgroup analyses for validity of ICD-9-CM coding for SAH were not conducted because of the small number of events, and no analyses were stratified by indicators of disease severity, comorbidity, or outcomes, which may affect code accuracy.6,20 Additional research is needed to explore the validity of codes for strokes resulting from in-hospital procedures relative to strokes present at admission because code position alone may not be a sufficient proxy for this distinction.13
New groups of ICD-9-CM codes proposed by the AHA/ASA to identify stroke subtypes had similar PPV and lower sensitivity compared with previously validated ICD-9-CM code groups. Both sensitivity and PPV varied by patient characteristics, including age, by geographic region, and over time. Given their affordability and ubiquity, administrative data are likely to remain an important source for surveillance and health services research.6,29 With the expansion of electronic health record systems, future studies should focus on the identification and collection of information in addition to ICD-9-CM codes that is required for accurate stroke surveillance.
We thank the staff and participants of the ARIC study for their important contributions.
Sources of Funding
The Atherosclerosis Risk in Communities Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C). S.A. Jones was supported by NHLBI T32 training grant HL 007055-38 and the University of North Carolina at Chapel Hill Royster Society of Fellows.
Published in abstract form at the American Heart Association EPI/NPAM Conference, San Francisco, CA, March 18–21, 2014.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.114.006316/-/DC1.
- Received June 9, 2014.
- Accepted August 15, 2014.
- © 2014 American Heart Association, Inc.
- Barrett-Connor E, Ayanian JZ, Brown ER, Coultas DB, Francis CK, Goldberg RJ, et al
- Sidney S, Rosamond WD, Howard VJ, Luepker RV
- Go AS, Mozaffarian D, Roger VL, Benjamin EJ, Berry JD, Blaha MJ, et al
- Derby CA
- Mercaldi CJ, Ciarametaro M, Hahn B, Chalissery G, Reynolds MW, Sander SD, et al
- Leibson CL, Naessens JM, Brown RD, Whisnant JP
- Lakshminarayan K, Anderson DC, Jacobs DR Jr, Barber CA, Luepker RV
- Spolaore P, Brocco S, Fedeli U, Visentin C, Schievano E, Avossa F, et al
- Lakshminarayan K, Larson JC, Virnig B, Fuller C, Allen NB, Limacher M, et al
- Sacco RL, Kasner SE, Broderick JP, Caplan LR, Connors JJ, Culebras A, et al
- Tirschwell DL, Longstreth WT Jr
- 21. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol. 1989;129:687–702.
- Rosamond WD, Folsom AR, Chambless LE, Wang CH, McGovern PG, Howard G, et al
- Hall MJ, Levant S, DeFrances CJ
- Williams GR, Jiang JG, Matchar DB, Samsa GP
- Ellekjaer H, Holmen J, Krüger O, Terent A
- Hsieh CY, Chen CH, Li CY, Lai ML