Coding of Stroke and Stroke Risk Factors Using International Classification of Diseases, Revisions 9 and 10
Background and Purpose— Surveillance is necessary to understand and meet the future demands stroke will place on health care. Administrative data are the most accessible data source for stroke surveillance in Canada. The International Classification of Diseases, 10th revision (ICD-10) coding system has potential improvements over ICD-9 for stroke classification. Our purpose was to compare hospital discharge abstract coding using ICD-9 and ICD-10 for stroke and its risk factors.
Methods— We took advantage of a switch in coding systems from ICD-9 to ICD-10 to independently review stroke patient charts. Using a before-and-after design, we compared the accuracy of hospital coding of stroke (part I) and stroke risk factors (part II) with ICD-9 (April 2000 to March 2001) and ICD-10 (April 2002 to March 2003). For part I, 717 charts (461 from the first period and 256 from the second) were randomly selected for review; for part II, 249 charts (137 and 112, respectively) were selected. We used careful definitions of stroke and its types based on ICD-9, using the fourth- and fifth-digit modifier codes.
Results— Stroke coding was similarly good with ICD-9 (90% correct; CI95, 86 to 93) and ICD-10 (92% correct; CI95, 88 to 95). There were some differences in coding by stroke type, notably for transient ischemic attack, but these differences were not statistically significant. Atrial fibrillation, coronary artery disease/ischemic heart disease, diabetes mellitus, and hypertension were coded with high sensitivity (81% to 91%) and specificity (83% to 100%). ICD-10 was as good as ICD-9 for stroke risk factor coding.
Conclusions— Passive surveillance using administrative data is a useful tool for identifying stroke and its risk factors with both ICD-9 and ICD-10.
An impending surge in new stroke cases and associated costs is expected in Western countries from 2010 through 2030 as the 1945 to 1950 birth cohort reaches its seventh decade and ages thereafter. Surveillance is necessary to determine the future demands stroke will place on health care.1,2 Such surveillance should encompass both stroke events and associated stroke risk factors. These data will permit informed decision making about the allocation of healthcare resources to preventive and acute treatment programs.
Unlike cancer and some infectious diseases, stroke and its risk factors are subject to little active or passive surveillance. One advantage of passive surveillance using administrative data is that such data are readily available and cost-effective compared with active surveillance. This is particularly true in Canada, where health care has a centralized administrative structure. Administrative data have been used to quantify trends in stroke; however, they have been criticized for inaccuracy, with low sensitivity and specificity.3–8 A further disadvantage of administrative data is the inability to ascertain stroke severity, the most important short- and long-term prognostic variable. Nevertheless, stroke coding has been reviewed previously and found to be useful for high-level comparisons, particularly when compared against other diseases. We believe that a diagnostic accuracy of ≥85% is adequate for assessing trends over time. However, stroke risk factors have not previously been examined using administrative data, and validating existing coding would enrich the utility of administrative data for stroke surveillance.
Before fiscal year 2002/2003, medical centers in Alberta used the International Classification of Diseases, 9th Revision (ICD-9), Clinical Modification to code hospital discharge abstracts. However, numerous studies have reported inaccuracies with ICD-9.2,3,7,9,10 At the beginning of 2002, the 10th revision replaced ICD-9 province-wide. Compared with ICD-9, ICD-10 is qualitatively more intuitive and more specific for the diagnosis of ischemic stroke. Using a "before-after" study design, we sought to compare the proportion of correctly coded stroke patient charts in academic and community hospitals (part I) and to assess stroke risk factor coding (part II) with ICD-9 and ICD-10.
Data for parts I and II of this study were retrieved from a database of hospital discharge abstracts from the 3 adult acute care sites in the Calgary health region: a university hospital (Foothills Medical Centre) and 2 community hospitals (Peter Lougheed Centre and Rockyview General Hospital). These sites serve a population of ≈1.4 million people. Data from the Alberta Children's Hospital were not considered in this study. Each of the 3 acute care sites houses a computed tomography (CT) scanner and an MRI scanner. The data included patients admitted as inpatients as well as patients seen in the emergency department and discharged without admission. They did not include patients seen only in an outpatient clinic or physician's office, or those who did not seek medical attention. A single health records technologist coded hospital discharge abstracts at the university hospital. Before the study period, this person trained with the Calgary Stroke Team to learn about stroke and its clinical diagnosis and management. In addition, an ongoing dialogue between the health records department and the Calgary Stroke Team resolves coding issues. At the 2 community hospitals, coding was not centralized to a single health records technologist.
All patients with a discharge diagnosis of stroke (ICD-9 codes 430.x, 431.x, 433.x1, 434.x1, 435.x, 436, and 362.3) were identified for the 2000/2001 fiscal year (April to March); for the 2002/2003 fiscal year, ICD-10 codes I60.x, I61.x, I63.x, I64.x, H34.1, and G45.x were used to identify patients. In ICD-9, the fourth and fifth digits were used to include or exclude patients. Coding switched from ICD-9 to ICD-10 in the intervening year, and this gap between data collection periods allowed time to learn the new ICD-10 system. All patients (n=2529) with stroke in the primary diagnostic position, implying that stroke was the diagnosis most responsible for length of stay, made up the sampling frame. This approach has been shown to yield high specificity and positive predictive value (PPV).3 A research assistant trained by a stroke neurologist in the definitions of stroke and its risk factors performed the chart review, and the neurologist resolved any diagnostic ambiguities. The investigators had access to the same chart documents as the health records technologists. If a chart contained multiple admissions, only documents from the admission under review were used.
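The case-identification rule above is essentially pattern matching on the discharge code, and the fifth-digit modifier is what separates a 433/434 stroke from an occlusion without infarct. A minimal sketch, assuming codes arrive as dotted strings such as "434.91" (that input format and the helper name `is_stroke_code` are hypothetical); the code lists are transcribed from the text, and in ICD-9-CM categories 433 and 434 a fifth digit of 1 means "with cerebral infarction".

```python
import re

# Hypothetical input format: dotted code strings, e.g. "434.91" or "I63.4".
# ICD-9 list from the text: 430.x, 431.x, 435.x, 436, 362.3, and 433.x1 / 434.x1,
# where fifth digit 1 = "with cerebral infarction" (433.x0 / 434.x0 are excluded).
ICD9_STROKE = re.compile(
    r"(?:(?:430|431)(?:\.\d{1,2})?|435\.\d|436|362\.3\d?|43[34]\.\d1)"
)
# ICD-10 list from the text: I60.x, I61.x, I63.x, I64.x, G45.x, and H34.1.
ICD10_STROKE = re.compile(r"(?:(?:I60|I61|I63|I64|G45)(?:\.\d{1,2})?|H34\.1)")


def is_stroke_code(code: str, revision: int) -> bool:
    """True if a primary-position discharge code falls within the stroke definition."""
    pattern = {9: ICD9_STROKE, 10: ICD10_STROKE}[revision]
    return pattern.fullmatch(code.strip().upper()) is not None


for code in ["434.91", "434.90", "433.10", "436"]:
    print(code, is_stroke_code(code, 9))  # True, False, False, True
```

Because the whole string must match, a code such as 433.10 (acute arterial occlusion without infarct) falls outside the ICD-9 definition in this sketch, whereas 433.11 falls inside it.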
The major stroke types, namely subarachnoid hemorrhage (SAH), intracerebral hemorrhage (ICH), acute ischemic stroke (AIS), and transient ischemic attack (TIA), were defined as described in Table 1. A random sample of charts was drawn for review, stratified by major stroke type and by year. AIS was further oversampled in the ICD-9 cohort relative to the ICD-10 cohort to allow better assessment of the codes described as acute arterial occlusion without infarct. The sample size varied between 10% and 65% of the charts available and was chosen to give an expected precision of the sensitivity and specificity defined by a 95% CI width of 10%.
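The text does not state the planning formula behind the 10% CI width. A common choice, assumed here, is the normal-approximation sample size for a proportion, reading a 10% width as ±5 percentage points, with an optional finite-population correction. The function name and the example inputs (expected proportion 0.90, stratum of 300 charts) are hypothetical.

```python
import math
from typing import Optional


def sample_size_for_ci_width(p_expected: float, width: float = 0.10,
                             population: Optional[int] = None,
                             z: float = 1.959964) -> int:
    """Charts needed so that a 95% CI for a proportion is about `width` wide.

    Assumes the normal-approximation planning formula; the study's own
    planning method is not reported.
    """
    half_width = width / 2
    n0 = z ** 2 * p_expected * (1 - p_expected) / half_width ** 2
    if population is not None:
        # Finite-population correction for sampling without replacement.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size_for_ci_width(0.90))                  # 139
print(sample_size_for_ci_width(0.90, population=300))  # 95
```

Under these assumptions, a small stratum needs a much larger sampling fraction than a large one (95 of 300 charts is about 32%). This may be one reason the sampled fraction ranged from 10% to 65%.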
Patient charts were reviewed in detail. Physician history and physical examination notes, physician progress notes, CT and MRI reports (if available), and discharge summaries were used to determine the diagnosis most responsible for hospital length of stay and to assign a code. We recorded the proportion of charts adjudicated with imaging reports (CT or MRI) available. Events were coded as TIA if symptoms resolved within 24 hours of onset and, if imaging was performed, no detectable changes were evident. Using our determination as the gold standard, we then calculated and compared the proportion of AIS, SAH, ICH, and TIA cases correctly coded by hospital health records technologists with ICD-9 and ICD-10. For practical reasons, chart reviewers were not blinded to the codes assigned by the health records technologist.
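With the reviewer's adjudication as the gold standard, the proportion correct, the PPV for each coded stroke type, and κ can all be computed from one (coder label, reviewer label) pair per chart. The sketch below assumes κ refers to Cohen's unweighted kappa. The helper `agreement_summary` is hypothetical, and the labels in the example are illustrative, not study data.

```python
from collections import Counter


def agreement_summary(pairs):
    """Coder vs reviewer (gold standard) stroke-type labels, one pair per chart.

    Returns (proportion correct, Cohen's kappa, PPV per coded label).
    """
    n = len(pairs)
    coder = Counter(c for c, _ in pairs)
    reviewer = Counter(r for _, r in pairs)
    agree = Counter(c for c, r in pairs if c == r)

    observed = sum(agree.values()) / n
    # Chance agreement: products of the two raters' marginal counts, summed over labels.
    expected = sum(coder[k] * reviewer[k] for k in coder) / n ** 2
    kappa = (observed - expected) / (1 - expected)  # undefined if expected == 1
    # PPV for a label: share of charts the coder gave that label that the reviewer confirmed.
    ppv = {k: agree[k] / coder[k] for k in coder}
    return observed, kappa, ppv


# Illustrative labels only: 20 charts with some AIS/TIA confusion.
pairs = ([("AIS", "AIS")] * 8 + [("AIS", "TIA")] * 2
         + [("TIA", "TIA")] * 9 + [("TIA", "AIS")])
print(agreement_summary(pairs))
```

For these made-up labels, 85% of charts are coded correctly, chance agreement is 0.5, and κ is about 0.70. The PPVs are 0.8 for AIS and 0.9 for TIA.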
Statistical comparisons were made using Fisher's exact test, and all proportions are reported with exact CIs. Sensitivity and specificity for the diagnosis of stroke could not be assessed because nonstroke diagnoses were underrepresented by design. Instead, we report the PPV of coding by stroke type among patients with stroke, as well as the κ statistic as a measure of agreement between coder and researcher. PPVs were arbitrarily categorized as "poor" (<70%), "good" (70% to 79%), "very good" (80% to 89%), or "excellent" (≥90%). κ values of 0.61 to 0.80 were considered to indicate substantial agreement between health records technologist and researcher, and values of 0.81 to 1.00 almost perfect agreement.3
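The methods name Fisher's exact test and exact CIs but not the software used. The following standard-library sketch assumes that "exact CI" means the Clopper-Pearson interval, found here by bisection on binomial tail probabilities. The two-sided Fisher P value is computed by summing the hypergeometric probabilities of every table with the observed margins that is no more likely than the observed table. The function names are hypothetical, and this is not the authors' code.

```python
from math import comb, exp, lgamma, log, log1p


def _binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial P(X = i), computed in log space (0 < p < 1) to avoid overflow."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def _bisect(f, target: float, increasing: bool) -> float:
    """Solve f(p) = target on (0, 1) for f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        root_is_right = f(mid) < target if increasing else f(mid) > target
        if root_is_right:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact (Clopper-Pearson) two-sided CI for x correct codes out of n."""
    def upper_tail(p):  # P(X >= x), increasing in p
        return sum(_binom_pmf(i, n, p) for i in range(x, n + 1))

    def lower_tail(p):  # P(X <= x), decreasing in p
        return sum(_binom_pmf(i, n, p) for i in range(0, x + 1))

    lower = 0.0 if x == 0 else _bisect(upper_tail, alpha / 2, increasing=True)
    upper = 1.0 if x == n else _bisect(lower_tail, alpha / 2, increasing=False)
    return lower, upper


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(k):  # hypergeometric P(top-left cell == k | fixed margins)
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    observed = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Keep tables no more likely than the observed one (small tolerance for ties).
    probs = (prob(k) for k in support)
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))


print(clopper_pearson(0, 10))            # upper limit is about 0.31
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.486
```

Computing the binomial terms in log space keeps the interval stable for samples of several hundred charts, as in part I. For a 2x2 comparison of ICD-9 and ICD-10, the rows would be coding system and the columns correct versus incorrect.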
For part II, a 10% random sample of charts, including all available diagnostic positions, was drawn from the sampling frame. The sample size was estimated to provide, on average, a 95% CI width of 10%, without adjustment for multiple comparisons. Source documents, namely physician history and physical examination notes, physician progress notes, discharge summaries, nursing notes, and laboratory, echocardiogram, and ECG reports, were systematically reviewed for atrial fibrillation, coronary artery disease/ischemic heart disease, diabetes mellitus, history of cerebrovascular accident, hypertension, hyperlipidemia, renal failure, and tobacco use. Codes were assigned only to risk factors that fulfilled the diagnostic criteria outlined in Table 2. The investigators derived these criteria from current medical guidelines. Using the gold standard, we calculated and compared the sensitivity, specificity, and percent correct of hospital health records technologist coding of these stroke risk factors with ICD-9 and ICD-10.
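For each risk factor, every reviewed chart contributes one 2x2 observation: whether the coder recorded the factor and whether the reviewer found that it met the Table 2 criteria. A minimal sketch under that assumption (the names `risk_factor_accuracy` and `RiskFactorAccuracy` are hypothetical). It returns proportions, and a ratio whose denominator is empty is reported as NaN.

```python
from dataclasses import dataclass


@dataclass
class RiskFactorAccuracy:
    sensitivity: float         # coded, among charts where the reviewer found the factor
    specificity: float         # not coded, among charts where the reviewer did not
    ppv: float                 # reviewer-confirmed, among charts where it was coded
    proportion_correct: float  # reported as "percent correct" in the text


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def risk_factor_accuracy(pairs) -> RiskFactorAccuracy:
    """pairs: (coded, present) booleans for one risk factor, one per chart;
    `present` is the chart reviewer's gold-standard judgement."""
    tp = fp = fn = tn = 0
    for coded, present in pairs:
        if coded and present:
            tp += 1
        elif coded:
            fp += 1
        elif present:
            fn += 1
        else:
            tn += 1
    return RiskFactorAccuracy(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        proportion_correct=_ratio(tp + tn, tp + fp + fn + tn),
    )
```

When most charts lack a given factor, the true negatives dominate. This is why a factor can show high specificity and high percent correct despite low sensitivity, as the results report for hyperlipidemia and tobacco use.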
In part I, a total of 461 charts from 2000/2001 (ICD-9) and 256 from 2002/2003 (ICD-10) were randomly selected for review. The median age of patients was 71 years (interquartile range [IQR], 59 to 80), and 50.6% were female. Table 3 lists the level of agreement for stroke coding and the rate of correct coding of stroke type by coding scheme and type of hospital. The assessment of correct coding was based on clinical data alone in 24.3% of charts (no neurovascular imaging was done, or no imaging reports were available for review) and on clinical data plus neurovascular imaging reports in 75.6% of charts.
On the whole, ICD-9 coding was excellent, with 90% (CI95, 86 to 92) of strokes correctly coded; κ=0.86 (CI95, 0.81 to 0.91). ICD-10 was similarly good, with 92% (CI95, 88 to 95) correctly coded; κ=0.89 (CI95, 0.82 to 0.96). ICD-10 was not significantly better than ICD-9 (P=0.865). With ICD-10, TIA was correctly coded 97% of the time (CI95, 88 to 99) compared with 70% (CI95, 56 to 82) with ICD-9 (P=0.266). The range of coding errors was largely as expected, with stroke types confused with one another, notably TIA for AIS and ICH for SAH (Table 4). Use of the modifier codes (fourth and fifth digits) in ICD-9 excluded 95 cases of carotid endarterectomy in patients who had not had an index stroke on the noted admission. This substantially increased the accuracy of ICD-9 coding for ischemic stroke when code 433 was included in the stroke definition.
At the hospital level, ICD-10 coding accuracy was nonsignificantly higher at the university site (95%; CI95, 91 to 98) than at the community sites (80%; CI95, 68 to 90; P=0.503). Differences were less marked with ICD-9 (90% versus 89% accuracy; P=0.883).
In part II, a total of 137 charts from 2000/2001 (ICD-9) and 112 from 2002/2003 (ICD-10) were randomly selected for review. The median age of patients was 73 years (IQR, 65 to 81), and 49% were female. Table 5 summarizes the sensitivity, specificity, and PPV of stroke risk factor coding by ICD system and hospital site. Overall, coding of all risk factors was similar between ICD-9 and ICD-10. Global sensitivity was nonsignificantly lower with ICD-10 (58%; CI95, 52 to 63) than with ICD-9 (67%; CI95, 61 to 72; P=0.234). Overall specificity was the same, at 97% (CI95, 96 to 98) versus 97% (CI95, 96 to 99; P=1.000). Overall accuracy was similar, at 84% (CI95, 82 to 87) versus 87% (CI95, 84 to 89; P=0.691). Under both schemes, coding of atrial fibrillation, coronary artery disease/ischemic heart disease, diabetes mellitus, and hypertension was very good to excellent, with high sensitivity and specificity. Conversely, coding of history of cerebrovascular accident, hyperlipidemia, renal failure, and tobacco use ranged from poor to excellent, with quite low sensitivity but high specificity. Sensitivity for these risk factors improved only slightly with the switch to ICD-10. These trends were also observed at the hospital level, and no differences were observed between the university and community hospital sites.
Our data suggest that administrative coding of stroke and its risk factors is quite good and that no quantitative improvement was realized with the switch to the ICD-10 system. Two caveats to drawing reliable conclusions from such data deserve comment. First, by definition, such data apply only to hospital-based care. Therefore, individuals who do not seek medical attention or who are seen only in outpatient clinics or offices are not captured.2,11 For stroke, this results in a slight bias toward more severe strokes, because patients with only mild symptoms may not seek medical attention. This has been well demonstrated in Texas, where concurrent active and passive surveillance showed that some cases identified by active surveillance were missed completely by the passive surveillance system. Interestingly, the converse was also true: some cases identified by passive surveillance were missed by the active surveillance system.11
Second, administrative data depend on clerical staff interpreting the medical record and applying appropriate codes. Coding of stroke and its risk factors, like all hospital discharge abstract coding, depends on the quality of the chart data and the expertise of the coder. Charting is highly variable, and validation studies done in one setting may not apply in other jurisdictions. We have previously shown wide variation in ICD-9 stroke coding between rural and urban hospitals.12 Rural hospitals tended to code stroke with more general codes, whereas urban hospitals used more specific codes. The centralization of stroke services and the establishment of an inpatient stroke unit,13 as well as the training of a health records technologist at Foothills Medical Centre, may be 2 unique factors contributing to our results.
Our results concur with those of Goldstein,8 who showed that a substantial minority of patients (15% to 20%) coded as having a stroke never had one. We used modifier codes (fourth and fifth digits in the ICD-9 system) to exclude patients who were admitted for carotid endarterectomy but had not had an index stroke on that admission. The ICD-9 system includes several AIS codes described as acute arterial occlusion without infarct (433.00, 433.10, 433.20, 433.30, 434.10, and 434.90). In our study, all 95 cases of elective carotid endarterectomy were coded as acute arterial occlusion without infarct. As Goldstein suggested, these codes have particularly low accuracy for the diagnosis of true stroke. This problem does not occur with ICD-10.
The identification of stroke risk factors is more variable. Atrial fibrillation, coronary artery disease/ischemic heart disease, diabetes mellitus, and hypertension are identified with a high degree of confidence, whereas history of cerebrovascular disease, hyperlipidemia, renal failure, and tobacco use are identified less reliably. The poor coding of the latter 4 risk factors may be attributable to poor charting by physicians and nursing staff, a lack of perceived importance among health records coders, or a lack of time to "code everything." Education and understanding may help to improve this situation. The emergence of the electronic health record may allow automated and better coding of such risk factors within administrative databases. The ICD-10 system itself might benefit from more specific diagnostic codes for these comorbidities to improve their identification. Nevertheless, the ability to reliably identify stroke risk factors among stroke patients using administrative data is an important addition to the use of such data in health services research.
Our study has notable limitations. For practical reasons, we were unable to blind the chart reviewer to either the health records technologist's coding or the coding system used. Further, our system includes an active dialogue between the health records coder and the Calgary Stroke Team, so our results may be less generalizable to other jurisdictions. Our sampling frame was limited to patients with stroke in the primary diagnostic position. Although this improves the specificity of diagnosis, it may limit sensitivity, because patients with stroke coded in other diagnostic positions were not included.
Overall, our data provide evidence that the accuracy of stroke coding with ICD-10 is similar to that with ICD-9. The greater clarity of definitions in the ICD-10 system may provide a qualitative advantage.
M.D.H. was supported by the Heart and Stroke Foundation of Alberta/NWT/Nunavut and the Canadian Institutes of Health Research. We would like to thank Chris Makar for her ongoing commitment to stroke coding.
Both authors contributed equally to this work.
- Received March 12, 2005.
- Revision received April 10, 2005.
- Accepted May 3, 2005.
Truelsen T, Bonita R, Jamrozik K. Surveillance of stroke: a global perspective. Int J Epidemiol. 2001; 30: 511–516.
Tirschwell DL, Longstreth WT Jr. Validating administrative data in stroke research. Stroke. 2002; 33: 2465–2470.
Mayo NE, Goldberg MS, Levy AR, Danys I, Korner-Bitensky N. Changing rates of stroke in the province of Quebec, Canada: 1981–1988. Stroke. 1991; 22: 590–595.
Mayo NE, Neville D, Kirkland S, Ostbye T, Mustard CA, Reeder B, Joffres M, Brauer G, Levy AR. Hospitalization and case-fatality rates for stroke in Canada from 1982 through 1991. The Canadian Collaborative Study Group of Stroke Hospitalizations. Stroke. 1996; 27: 1215–1220.
Ostbye T, Levy AR, Mayo NE. Hospitalization and case-fatality rates for subarachnoid hemorrhage in Canada from 1982 through 1991. The Canadian Collaborative Study Group of Stroke Hospitalizations. Stroke. 1997; 28: 793–798.
Goldstein LB. Accuracy of ICD-9-CM coding for the identification of patients with acute ischemic stroke: effect of modifier codes. Stroke. 1998; 29: 1602–1604.
Benesch C, Witter DM Jr, Wilder AL, Duncan PW, Samsa GP, Matchar DB. Inaccuracy of the International Classification of Diseases (ICD-9-CM) in identifying the diagnosis of ischemic cerebrovascular disease. Neurology. 1997; 49: 660–664.
Leibson CL, Naessens JM, Brown RD, Whisnant JP. Accuracy of hospital discharge abstracts for identifying stroke. Stroke. 1994; 25: 2348–2355.
Piriyawat P, Smajsova M, Smith MA, Pallegar S, Al-Wabil A, Garcia NM, Risser JM, Moye LA, Morgenstern LB. Comparison of active and passive surveillance for cerebrovascular disease: the Brain Attack Surveillance in Corpus Christi (BASIC) Project. Am J Epidemiol. 2002; 156: 1062–1069.
Field TS, Green TL, Roy K, Pederson J, Hill MD. Trends in stroke occurrence in Calgary. Can J Neurol Sci. 2004; 31: 387–393.