Interrater Reliability of an Etiologic Classification of Ischemic Stroke
Background and Purpose Precise identification of the cause of stroke is critical to research and clinical practice. Published series of ischemic stroke show considerable variation in the proportion of cases classified as atherosclerotic large-vessel disease, lacunar infarct, cardioembolic stroke, stroke of other known cause, and stroke of undetermined etiology. We describe the development and use of an etiology-specific classification of ischemic stroke. The interrater reliability of the classification is then evaluated.
Methods A total of 160 cases of ischemic stroke in young adults were reviewed by paired neurologists who assigned cases to prioritized categories. The results of paired ratings were evaluated for each of the potential causes. Interrater agreement was assessed by means of κ, which is the chance-adjusted percent agreement.
Results For standard pairs, κ was fair to good for all causes except lacunar stroke (κ=0.31); however, pair-to-pair variation was greatest for lacunar strokes. Strokes of undetermined cause and hematologic/other cause were of borderline fair reliability.
Conclusions The utility of a stroke classification system is dependent on its intended use. An etiologic classification is useful in studies of the epidemiology and pathophysiological basis of stroke. Fair to good reliability for an etiologic classification of stroke can be obtained when criteria are explicit.
The correct identification of the cause of stroke is critical to both research and clinical practice. However, published series show considerable variation in the proportion of ischemic stroke classified as atherosclerotic large-vessel disease, lacunar infarct, cardioembolic stroke, stroke of other known cause, and stroke of undetermined etiology. While some of this variation may be due to differences in patient population, the evolving use of diagnostic procedures, and differences in diagnostic criteria, little attention has been given to assessing interobserver reliability in stroke classification that is etiology specific.
Because ischemic stroke subtyping is increasingly used in clinical trials and observational studies, there is a need to quantitate the reliability of these clinical judgments and develop procedures that maximize reliability within and between studies. Several reports have addressed interobserver reliability. However, either the number of patients was quite limited (<30)1 2 or case summaries3 or vignettes4 rather than actual cases were used. The independent classification of stroke etiology by pairs of neurologists in the Baltimore-Washington Cooperative Young Stroke Study provided an opportunity to study the interrater reliability of an ischemic stroke classification system with a large group of patients.
Subjects and Methods
The Baltimore-Washington Cooperative Young Stroke Study is a regional hospital-based registry initiated in 1988 to study the incidence and causes of stroke in young adults. At 41 regional hospitals, a trained neurological nurse abstracted the chart of every patient aged 15 to 44 years discharged with a primary or secondary diagnosis reflecting a possible ischemic stroke or intracerebral hemorrhage. Charts with International Classification of Diseases, 9th Revision (ICD-9) codes 431.00 to 438.00, 671.50 to 671.54, and 674.00 to 674.04 were reviewed. The abstracting process yielded a narrative summary describing transient ischemic attacks and past strokes, as well as the evolution of neurological symptoms and signs for the current event. Information on demographics, risk factors, neuroimaging, laboratory results, therapy, and autopsy data (when available) was also collected. These data were included with the case summary for review by the neurologists.
Based on this information, cases considered to represent a possible acute stroke were independently reviewed by two board-certified neurologists who classified the event as an ischemic cerebral infarction, intracerebral hemorrhage, or other diagnosis. Stroke was defined according to the criteria of the World Health Organization.5 The definitions of ischemic cerebral infarction and intracerebral hemorrhage were based on the criteria of the National Institute of Neurological Disorders and Stroke (NINDS) Stroke Data Bank.6
Each patient with an ischemic cerebral infarction was further classified by a pair of neurologists into nine categories according to written criteria developed by the study physicians for the project (see “Appendix”). The categories were divided into “higher priority” diagnoses (atherosclerotic vasculopathy, nonatherosclerotic vasculopathy, cardiac/transcardiac embolism, hematologic/other) and “lower priority” diagnoses (lacunar infarct, migrainous stroke, oral contraceptive related, other drug related, and indeterminate). Higher priority diagnoses were conditions for which well-defined positive criteria exist (atherosclerotic and nonatherosclerotic vasculopathy) and/or the probable mechanism of stroke is known (cardiac/transcardiac embolism, hematologic). Lower priority diagnoses included those of undetermined cause (indeterminate, lacunar infarct), those for which the mechanism is obscure (migrainous stroke), and those associated with conditions that often coexist with or are weakly linked with stroke (oral contraceptive related, other drug related).
For an individual event, the diagnosis of stroke subtype could be either probable or possible depending on the strength of clinical evidence. Two probable diagnoses were allowed if criteria were met for two conditions of equal priority. However, a lower priority diagnosis could not be coded as probable when a higher priority probable or possible diagnosis was present; the lower priority diagnosis had to be assigned a possible label. This system was intended to approximate the process of clinical diagnosis while preserving information about secondary, contributing, or multiple causes. The use of probable and possible labels preserved the ability to access data when a less likely but plausible (possible) cause was present.
Six board-certified neurologists with a special interest in stroke were assigned as standard pairs to review charts. A standard pair consisted of two neurologists who remained in that pair, whereas in nonstandard pairs the neurologists varied. The intent was to maintain the standard rater pairings and to balance the number of patients rated by each pair. Both objectives were compromised somewhat because of the necessity to use nonstandard pairs when the standard pairs could not meet on a timely basis. Nevertheless, each ischemic stroke patient was rated by one and only one pair of collaborating neurologists. Agreement and disagreement were tabulated for all pairs, all standard pairs, and by each standard pair. Subsequently, disagreement was adjudicated by a face-to-face consensus conference involving both members of the disagreeing pair and a third neurologist. If the two primary reviewers disagreed as to whether a stroke had occurred, even after the facts of the case had been clarified by discussion, the case was not considered a stroke.
The results of paired ratings were evaluated for each of the nine potential causes of ischemic stroke to which a patient could be assigned. Only probable diagnoses were considered in this analysis. Interrater agreement was assessed by means of κ,7 which is the chance-adjusted percent agreement. Statistics are presented for all raters, standard raters, and each standard rater pair as κ, and for positive and negative agreement.8 To assess the stability of the estimate of κ, 95% confidence intervals are presented around κ. Guidelines for evaluating κ suggest that values above 75% indicate excellent agreement, values between 75% and 40% indicate good to fair agreement, and those below 40% are considered poor agreement.9
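The statistics described above can be illustrated with a minimal sketch in Python. It assumes that each diagnostic category is reduced to a 2×2 table of "assigned" versus "not assigned" for the two raters, and that positive and negative agreement are computed as the usual proportions of specific agreement; the counts in the usage example are hypothetical and are not data from this study. The names `AgreementTable`, `agreement_stats`, and `interpret_kappa` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class AgreementTable:
    """2x2 table for one diagnostic category rated by two neurologists."""
    both_yes: int     # both raters assigned the category
    only_first: int   # only the first rater assigned it
    only_second: int  # only the second rater assigned it
    both_no: int      # neither rater assigned it


def agreement_stats(t: AgreementTable) -> dict:
    """Return kappa and the proportions of positive and negative agreement."""
    n = t.both_yes + t.only_first + t.only_second + t.both_no
    if n == 0:
        raise ValueError("table is empty")
    observed = (t.both_yes + t.both_no) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    first_yes = (t.both_yes + t.only_first) / n
    second_yes = (t.both_yes + t.only_second) / n
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    nan = float("nan")
    kappa = (observed - expected) / (1 - expected) if expected < 1 else nan
    discordant = t.only_first + t.only_second
    pos_den = 2 * t.both_yes + discordant
    neg_den = 2 * t.both_no + discordant
    return {
        "kappa": kappa,
        "positive_agreement": 2 * t.both_yes / pos_den if pos_den else nan,
        "negative_agreement": 2 * t.both_no / neg_den if neg_den else nan,
    }


def interpret_kappa(kappa: float) -> str:
    """Apply the guideline bands quoted in the text (75% and 40% cut points)."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical counts for 100 cases (not study data).
stats = agreement_stats(AgreementTable(both_yes=20, only_first=5,
                                       only_second=5, both_no=70))
print(round(stats["kappa"], 2), interpret_kappa(stats["kappa"]))
```

Because the category in this hypothetical table is present in a minority of cases, negative agreement (about 0.93) exceeds positive agreement (0.80), which is the same pattern reported for the study data.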
Results
The three standard rater pairs reviewed 131 charts (49 by pair 1-2; 30 by pair 3-4; and 52 by pair 5-6), and 29 charts were reviewed by the nonstandard pairs. Tables 1 and 2 show combined κs by diagnostic category for all raters and for standard rater pairs. Although κs for all raters in most categories (Table 1) fell in the 50% to 60% range, they varied from a high of 70% agreement for oral contraceptive–related stroke to a low of 28% agreement for lacunar infarcts. When only standard pairs were considered (Table 2), agreement generally improved, with absolute gains in κ of between 1% and 10% agreement. Only one category (other drug related) had a decrease (3%) in κ. The κ was dissected into its components of positive and negative agreement. Negative agreement, the ability to rule out a particular classification, was uniformly high, in the range of 80% to 99% agreement. Conversely, positive agreement was much lower, falling in the 40% to 70% range.
Tables 3, 4, and 5 present similar statistics for each of the standard rater pairs. While smaller numbers tend to enhance the variability of each of the measures of agreement, there is evidence of differences in the amount of agreement across rater pairs and diagnostic entities. In general, each pair tended to have different categories in which they displayed their best and poorest agreement. Pair 1-2 (Table 3) had the most difficulty with vasculopathy of uncertain cause (lacunar infarct) but had excellent agreement for migraine, other drug-related strokes, and atherosclerotic strokes. Pair 3-4 (Table 4) had the highest agreement in the cardiac/transcardiac embolus and lacunar infarct categories but had problems with the hematologic, indeterminate, and other drug-related categories. Pair 5-6 (Table 5) displayed excellent agreement in the oral contraceptive–related category but had difficulty with the migraine and lacunar infarct categories.
There were also two categories in which the level of agreement was especially variable. The greatest differences in performance occurred in the lacunar infarct category, where agreement ranged from poorer than chance to 78%. There was also considerable variation within the migraine category, where κ ranged from 38% to 100%. As suggested by the wide confidence intervals for each rater pair, this is partly due to the problem of small numbers.
Discussion
We developed an ischemic stroke classification using current concepts of stroke etiology. Because there is no gold standard for stroke cause, the merit of a classification system depends on its clarity, utility, and reproducibility. We developed explicit definitions of the major accepted categories of ischemic stroke and then tested their reliability.
Our study is the first large interobserver reliability study of an etiologic classification system for ischemic stroke in which actual cases were used. Our data show fair to good reliability for most ischemic stroke categories. The exceptions are lacunar stroke (poor reliability) and the hematologic/other and indeterminate categories (low range of fair reliability). Since each category was present in only a minority of cases, negative agreement is always higher than positive agreement.
A closer examination of two of the categories with low reliability is informative. The category of lacunar infarcts displayed less agreement than expected by chance in rater pair 1-2, while showing high reliability in pair 3-4. On scrutiny it was found that one member of pair 1-2 had misinterpreted the written criteria for lacunar stroke. In the case of the hematologic/other category, disagreement arose because the criteria did not define what constituted a hematologic condition (ie, there was no explicit, exhaustive list of included conditions). Therefore, disagreement occurred regarding whether the strength of association between a condition and cerebral infarction was sufficient to be considered a cause.
Other attempts to study interobserver agreement included the NINDS Stroke Data Bank.2 Based on history, physical findings, and brain computed tomographic scans, observers were asked to distinguish between cases of infarction, subarachnoid hemorrhage, and parenchymal hemorrhage and to identify subtypes of ischemic stroke. The primary categories for ischemic infarcts in the Stroke Data Bank included infarction cause unknown, infarction with normal angiogram, infarction with tandem arterial pathology, embolism from cardiac source, infarction due to atherosclerosis, and lacunes. The κ was 0.39, with agreement for hemorrhage fairly reliable. However, agreement in distinguishing subtypes of ischemic infarct was poor, with a κ of only 0.15.2
As clinical studies have evolved and the need to identify the pathophysiological basis of stroke has increased, recent studies have divided ischemic strokes primarily into subtypes of atherosclerotic, cardioembolic, small vessel, other known causes, and undetermined causes. Using this classification, which was similar to ours, Adams et al1 reported 20 cases, with agreement in 19 of 20. Gordon et al4 reported 18 case summaries reviewed by 24 neurologists with an overall κ of 0.64. There was good agreement for cardioembolic (κ=0.75) and large-artery (κ=0.69) stroke but poor agreement for undetermined cause (κ=0.12), indicating a reluctance of some observers to leave the etiology unassigned.
Two studies have reported interobserver reliability according to topographic localization. Kessler et al3 compared three classification systems: ICD-9, ICD-10, and the German Classification of Neurologic Diseases. These classifications are primarily topographic (according to the vessel involved) and temporal (eg, diagnosis of transient ischemic attack versus stroke) and are not etiology specific (eg, arterial embolus, cardiac embolus, etc). Thirteen case studies were reviewed by 81 raters with a κ of 0.38 for ICD-9 and 0.72 for ICD-10. Lindley et al10 reported the reliability of the Oxfordshire Community Stroke Project method of four topographic subtypes: total and partial anterior circulation infarction, lacunar infarction, and posterior circulation infarction. This type of classification would be limited to clinical studies in which stroke pathogenesis may not be critical, such as short-term thrombolytic trials. As therapies for ischemic stroke become more specific, the importance of precise diagnosis increases. For example, cases of carotid large-artery disease with artery-to-artery embolization should generally be treated with carotid endarterectomy if the stenosis is greater than 70%,11 while cardioembolic disease is generally treated with warfarin or aspirin.12 Pathogenic categories for making decisions are important for both research and patient care.
In our study the discrepancies between pairs in the degree of agreement for the various diagnoses illustrate that written criteria themselves are not the only determining factor for reliability. Even written criteria are subject to differing interpretation. The presence of learning within each of the standard pairs is suggested by the fact that the standard pairs had a higher reliability for eight of nine diagnoses than did all rater pairs (standard and nonstandard). This was consistent with the experience of van Donselaar et al13 with seizure diagnosis, in which discussion increased reliability when a relevant data item or rule was overlooked; consensus quickly followed upon discussion of the case.
These results have implications for improving reliability within and between studies. At least during the developmental stages of a project, independent judgments should be made by at least two individuals. Ideally, consensus decisions should involve all raters. This would serve a training purpose and result in refined written criteria and improved adherence to established criteria. Thus, the process of improving reliability is an iterative one. Even when reliability is satisfactory, there is still value in retaining a consensus approach to minimize errors of fact or interpretation.
Finally, in evaluating stroke classifications, the utility of a specific classification is dependent on its intended use. Topographic or temporal classifications can accurately classify virtually 100% of events and may be sufficient for some purposes. In contrast, an etiologic classification of ischemic stroke is frequently desired. We believe our classification can be used for the latter purpose with a high interrater reliability.
Appendix
Baltimore-Washington Cooperative Young Stroke Study Criteria
1. A stroke is a neurological deficit in a vascular distribution that either (a) lasts >24 hours or (b) lasts <24 hours and is associated with a relevant computed tomographic or magnetic resonance imaging abnormality.
2. If there is a history of a new neurological deficit occurring 24 hours before admission and a residual new deficit is documented on admission, the diagnosis of stroke will be given.
3. When the two primary reviewers disagree as to whether a stroke has occurred even after discussion, the case will not be considered a stroke.
4. Diagnoses may be probable or possible.
5. There can be two probable diagnoses if criteria for two conditions of equal priority are met.
6. Higher priority diagnoses include atherosclerotic vasculopathy, nonatherosclerotic vasculopathy, cardiac/transcardiac embolism, and hematologic/other. Lower priority diagnoses include vasculopathy of uncertain cause (lacunar infarct), oral contraceptive related, other drug related, migrainous stroke, and indeterminate. The lower priority diagnosis should not be coded as probable when a higher priority probable or possible diagnosis is present.
Examples:
(1) Atherosclerotic and atrial fibrillation: Code both as probable.
(2) Atherosclerotic and oral contraceptive related: Code atherosclerotic vasculopathy as probable, oral contraceptive related as possible.
(3) Oral contraceptive related and migraine: Code both as possible.
(4) Positive toxicology screen and ipsilateral 20% carotid stenosis: Code as possible drug related, possible atherosclerotic vasculopathy.
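The priority rule in item 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's coding procedure: it assumes that each category has already been judged probable or possible on its own evidence, and it encodes only the stated rule that a lower priority diagnosis is coded as possible whenever any higher priority diagnosis (probable or possible) is present. The function name `apply_priority_rule` and the dictionary layout are hypothetical. Combinations of two lower priority diagnoses, such as oral contraceptive related with migraine, are not derived by this rule alone and are left to the category-specific criteria.

```python
from typing import Dict

HIGHER_PRIORITY = {
    "atherosclerotic vasculopathy",
    "nonatherosclerotic vasculopathy",
    "cardiac/transcardiac embolism",
    "hematologic/other",
}
LOWER_PRIORITY = {
    "lacunar infarct",
    "oral contraceptive related",
    "other drug related",
    "migrainous stroke",
    "indeterminate",
}


def apply_priority_rule(findings: Dict[str, str]) -> Dict[str, str]:
    """Downgrade lower priority diagnoses when a higher priority one is present.

    `findings` maps a category name to "probable" or "possible", as judged
    from the evidence for that category alone (an assumed input format).
    """
    for category, strength in findings.items():
        if category not in HIGHER_PRIORITY | LOWER_PRIORITY:
            raise ValueError(f"unknown category: {category}")
        if strength not in ("probable", "possible"):
            raise ValueError(f"unknown strength: {strength}")

    higher_present = any(category in HIGHER_PRIORITY for category in findings)
    coded = {}
    for category, strength in findings.items():
        if category in LOWER_PRIORITY and higher_present:
            # A lower priority diagnosis cannot be probable alongside any
            # higher priority probable or possible diagnosis.
            coded[category] = "possible"
        else:
            coded[category] = strength
    return coded


# Atherosclerotic disease plus oral contraceptive use.
print(apply_priority_rule({
    "atherosclerotic vasculopathy": "probable",
    "oral contraceptive related": "probable",
}))
```

Two higher priority diagnoses that each meet probable criteria, such as atherosclerotic vasculopathy with atrial fibrillation, pass through unchanged and are both coded as probable.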
1. Atherosclerotic vasculopathy
Probable: Ipsilateral intracranial or extracranial disease by angiogram or noninvasive tests showing:
(1) Hemodynamically significant obstruction or
(2) >60% obstruction or
(3) Plaque with intraluminal clot.
Possible: Ipsilateral intracranial or extracranial disease by angiogram or noninvasive tests showing any detectable atherosclerotic disease.
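As an illustration, the probable and possible criteria for atherosclerotic vasculopathy can be written as a short Python function. This is a sketch only: it assumes the findings refer to ipsilateral intracranial or extracranial disease on angiogram or noninvasive tests, and the function name and parameters (`grade_atherosclerotic`, `stenosis_percent`, and the boolean flags) are hypothetical.

```python
from typing import Optional


def grade_atherosclerotic(
    stenosis_percent: Optional[float] = None,
    hemodynamically_significant: bool = False,
    plaque_with_intraluminal_clot: bool = False,
    any_disease_detected: bool = False,
) -> Optional[str]:
    """Return "probable", "possible", or None for ipsilateral disease.

    All arguments are assumed to describe the vessel ipsilateral to the
    infarct, as shown by angiogram or noninvasive testing.
    """
    over_60 = stenosis_percent is not None and stenosis_percent > 60
    if hemodynamically_significant or over_60 or plaque_with_intraluminal_clot:
        return "probable"
    some_stenosis = stenosis_percent is not None and stenosis_percent > 0
    if any_disease_detected or some_stenosis:
        # Any detectable atherosclerotic disease qualifies as possible.
        return "possible"
    return None


print(grade_atherosclerotic(stenosis_percent=20))  # possible
print(grade_atherosclerotic(stenosis_percent=70))  # probable
```

The 20% stenosis case corresponds to the coding example given earlier in these criteria, in which an ipsilateral 20% carotid stenosis is coded as possible atherosclerotic vasculopathy.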
2. Nonatherosclerotic vasculopathy
Angiographic, noninvasive, or other evidence of fibromuscular dysplasia, vasculitis, dissection, radiation changes, or other specific vasculopathy.
Probable: Angiographic evidence or a clinical presentation and noninvasive testing highly consistent with nonatherosclerotic vasculopathy.
Possible: Clinical presentation suggestive but testing not done and/or diagnostic tests equivocal.
3. Vasculopathy of uncertain cause (lacunar infarct)
Probable: Lacune not in any other higher priority category (eg, radiographic or lacunar syndrome without evidence of cardiac or atherothrombotic source).
Possible: Same as probable, except eliminate the higher priority coexisting condition limitation.
Lacunes must satisfy either of the two following conditions:
(1) Small (or ≤1.5 cm) deep lesion on imaging study compatible with deficit and sensorimotor, pure motor, pure sensory, ataxic hemiparesis, or clumsy hand dysarthria syndromes.
(2) Normal imaging study or deep lesion of unspecified size and typical pure motor, pure sensory, ataxic hemiparesis, or clumsy hand dysarthria syndromes (sensorimotor stroke not included).
Sensorimotor stroke with a normal imaging study or unspecified size should be classified as indeterminate.
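The two lacune conditions above can be expressed as a small Python check. This is a sketch under stated assumptions: the imaging result is assumed to be summarized as one of three hypothetical labels (`"small_deep"` for a deep lesion of ≤1.5 cm compatible with the deficit, `"normal"`, or `"deep_unspecified"` for a deep lesion of unspecified size), and the syndrome names are spelled as in the criteria. Whether a qualifying lacune is then coded probable or possible still depends on the priority rule and is not handled here.

```python
# Syndromes accepted without a demonstrated small deep lesion.
CLASSIC_SYNDROMES = {
    "pure motor",
    "pure sensory",
    "ataxic hemiparesis",
    "clumsy hand dysarthria",
}
# Sensorimotor stroke counts only when a small deep lesion is seen.
SYNDROMES_WITH_LESION = CLASSIC_SYNDROMES | {"sensorimotor"}


def meets_lacune_criteria(imaging: str, syndrome: str) -> bool:
    """Return True if either lacune condition is satisfied.

    `imaging` uses the assumed labels "small_deep", "normal", or
    "deep_unspecified"; any other value is treated as not qualifying.
    """
    if imaging == "small_deep":
        return syndrome in SYNDROMES_WITH_LESION
    if imaging in ("normal", "deep_unspecified"):
        return syndrome in CLASSIC_SYNDROMES
    return False


print(meets_lacune_criteria("small_deep", "sensorimotor"))  # True
print(meets_lacune_criteria("normal", "sensorimotor"))      # False
```

The second call returns False, which matches the instruction that a sensorimotor stroke with a normal imaging study or a lesion of unspecified size should be classified as indeterminate.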
4. Cardiac/transcardiac embolism
Probable:
(1) Atrial fibrillation, atrial flutter, sick sinus syndrome
(2) Recent (≤6 weeks before stroke) myocardial infarction
(3) Akinetic segment
(4) Cardiac thrombus
(5) Valvular vegetation or documented endocarditis
(6) Prosthetic heart valve
(7) Dilated cardiomyopathy
(8) Paradoxical emboli (right-to-left shunt and venous or systemic embolism)
Possible:
(1) Mitral valve prolapse without detectable clot
(2) Remote (>6 weeks before stroke) myocardial infarction without other abnormality
(3) Hypokinetic segment
(4) Mitral annular calcification
(5) Calcific aortic stenosis
(6) Right-to-left shunt only
(7) Other possible source of embolism (must specify on review sheet)
5. Hematologic/other
Antiphospholipid antibody and other causes for hypercoagulable state; lupus; other (specific documented cause of stroke not listed above).
(1) Lupus with negative echo, negative antiphospholipid antibody, negative angiogram, or angiogram not done.
(2) Complication of procedures.
If there is an immediate and underlying cause of stroke, the immediate cause should be considered the cause and the underlying cause should be noted as a comment.
(a) Radiation vasculopathy leading to bypass operation followed by a stroke in the postoperative period. Cause: “other,” specify “embolism from bypass graft.” Comment: “radiation vasculopathy is underlying cause.”
(b) Intractable seizure leading to WADA test followed by stroke. Cause: “other,” specify “WADA test.” Comment: “occurred during catheterization, no other explanation found.”
(c) Transient ischemic attack with carotid endarterectomy followed by stroke within 72 hours. Code similar to the first example except that comment notes atherosclerosis as underlying cause.
6. Migrainous stroke
Probable14 : At least one attack of migraine with associated neurological deficit persisting for >24 hours and/or magnetic resonance imaging or computed tomographic evidence of acute stroke. Also requires:
(1) Prior history of common, classic, or complicated migraine;
(2) Typical (for the individual patient) migrainous headache and/or neurological associations with the acute stroke presentation; and
(3) The absence of other coexisting conditions with strong potential for stroke (eg, rheumatic valvular disease, atrial fibrillation, clinical evidence of advanced cerebral or extracerebral atherosclerotic vascular disease, vasculitis [higher priority diagnosis]).
Patients with hypertension, diabetes, mitral valve prolapse, or concomitant use of oral contraceptives or estrogen replacement therapy are not, however, excluded from consideration.
Possible: Same as probable except eliminate the higher priority coexisting condition limitation.
7. Oral contraceptive or exogenous estrogen use
Probable: Current oral contraceptive use and no other higher priority diagnosis.
Possible: Same as probable without the higher priority limitation.
8. Other drug related
Probable: Drug use reported within 48 hours of the stroke or present on toxicology screen and no other higher priority diagnosis present.
Possible: Same as probable without the higher priority limitation.
9. Indeterminate
Should be coded when no other probable or possible diagnoses are satisfied.
This study was supported by a Grant-in-Aid from the American Heart Association, with funds contributed in part by the American Heart Association, Maryland Affiliate, Inc. Dr Kittner is also supported by a clinical investigator development award (K08-NS01319) from the National Institute of Neurological Disorders and Stroke.
The authors would like to acknowledge the assistance of the following individuals who have sponsored the Baltimore-Washington Cooperative Young Stroke Study at their institutions: Frank Anderson, MD; Clifford Andrew, MD, PhD; Christopher Bever, MD; Nicholas Buendia, MD; Remzi Demir, MD; John Echkholdt, MD; Nirmala Fernback, MD; Jerold Fleishman; Benjamin Frishberg, MD; Stuart Goodman, MD, PhD; Norman Hershowitz, MD, PhD; Luke Kao, MD, PhD; Ramesh Khurana, MD; John Kurtzke, MD; William Leahy, MD; William Lightfoote II, MD; Michael Miller, MD, PhD; Harshad Mody, MBBS; Marvin Mordes, MD; Seth Morgan, MD; Howard Moses, MD; Mark Ozer, MD; Roger Packer, MD; Philip Pulaski, MD; Nagbushan Rao, MD; Marc Raphaelson, MD; Solomon Robbins, MD; David Satinsky, MD; Michael Sellman, PhD; Arthur Siebens, MD; Harold Stevens, MD, PhD; Dean Tippett, MD; Michael Weinrich, MD; Roger Weir, MD; Richard Weisman, MD; Don Wood, MD; and Mahammed Yaseen, MD.
In addition, the study could not have been completed without support from the administration and medical records staff at the following institutions: in Maryland, Anne Arundel Medical Center, Bon Secours Hospital, Calvert Memorial Hospital, Church Hospital Corporation, Doctors Community Hospital, Franklin Square Hospital Center, The Good Samaritan Hospital of Maryland Inc, Greater Baltimore Medical Center, Greater Laurel Beltsville Hospital, Hadley Memorial Hospital, Harbor Hospital Center, Holy Cross Hospital, The Johns Hopkins Bayview Medical Center Inc, The Johns Hopkins Hospital, Howard County General Hospital Inc, Liberty Medical Center Inc, Maryland General Hospital, Mercy Medical Center, Montebello Rehabilitation Hospital, Montgomery General Hospital, North Arundel Hospital, Northwest Hospital Center, Prince George’s Hospital Center, Saint Agnes Hospital, Saint Joseph Hospital, Shady Grove Adventist Hospital, Sibley Memorial Hospital, Sinai Hospital of Baltimore, Southern Maryland Hospital Center, Suburban Hospital, The Union Memorial Hospital, University of Maryland Medical System, Department of Veterans Affairs Medical Center in Baltimore, and Washington Adventist Hospital; and in Washington, DC, Children’s National Medical Center, District of Columbia General Hospital, The George Washington University Medical Center, Georgetown University Hospital, Greater Southeast Community Hospital, Howard University Hospital, National Rehabilitation Hospital, Providence Hospital, Veterans Affairs Medical Center, and The Washington Hospital Center.
- Received June 27, 1994.
- Revision received October 13, 1994.
- Accepted October 13, 1994.
- Copyright © 1995 by American Heart Association
Adams HP, Bendixen BH, Kappelle LJ, Biller J, Love BB, Gordon DL, Marsh EE, and the TOAST Investigators. Classification of subtype of acute ischemic stroke: definitions for use in a multicenter clinical trial. Stroke. 1993;24:35-41.
Kessler C, Freyberger HJ, Dittmann V, Ringelstein EB. Interrater reliability in the assessment of neurovascular diseases. Cerebrovasc Dis. 1991;1:43-48.
Gordon DL, Bendixen BH, Adams HP, Clarke W, Kappelle LJ, Woolson RF, the TOAST Investigators. Interphysician agreement in the diagnosis of subtypes of acute ischemic stroke: implications for clinical trials. Neurology. 1993;43:1021-1027.
Foulkes MA, Wolf PA, Price TR, Mohr JP, Hier DB. The Stroke Data Bank: design, methods, and baseline characteristics. Stroke. 1988;19:547-554.
Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons, Inc; 1981:212-225.
Lindley RI, Warlow CP, Wardlaw JM, Dennis MS, Slattery J, Sandercock PAG. Interobserver reliability of a clinical classification of acute cerebral infarction. Stroke. 1993;24:1801-1804.
van Donselaar CA, Geerts AT, Meulstee J, Habbema JDF, Staal A. Reliability of the diagnosis of a first seizure. Neurology. 1989;39:267-271.