Agreement and Variability in the Interpretation of Early CT Changes in Stroke Patients Qualifying for Intravenous rtPA Therapy
Background and Purpose—Ischemic changes identified on CT scans performed in the first few hours after stroke onset, which are thought to represent early cytotoxic edema and development of irreversible injury, may have important implications for subsequent treatment. However, uncertainty and conflicting data exist about the ability of clinicians to correctly recognize and interpret these changes. We performed a detailed review of selected baseline CT scans from the NINDS rt-PA Stroke Trial to test agreement among experienced stroke specialists and other physicians on the presence of early CT ischemic changes.
Methods—Seventy baseline CT scans from the NINDS Stroke Trial were read and classified for the presence or absence of various early findings of ischemia by 16 individuals, including NINDS trial investigators, other neurologists, other emergency medicine physicians, and radiology or stroke fellows. CT scans included normal scans and scans from patients who later developed symptomatic intracranial hemorrhage, as well as scans on which the NINDS rt-PA Stroke Trial neuroradiologist identified clear-cut early CT changes. For each CT finding, κ-statistics were used to assess the proportion of agreement beyond chance.
Results—κ-Values (95% confidence interval [CI]) ranged from 0.20 (−0.20, 0.61) (fair agreement) to 0.41 (0.37, 0.45) (moderate agreement) among the 16 viewers, and the κ-value was only 0.39 (0.29, 0.49) (fair) in answer to the question “do early CT changes involve more than one third of the MCA [middle cerebral artery] territory?” There was substantial variability within each specialty group and between groups. κ-Values were only fair to moderate even among physicians experienced in selecting and treating acute stroke patients with rtPA. Observed agreement ranged from 68% to 85%. Physicians agreed on the finding of early CT changes involving >33% of the MCA territory 77% of the time, although the κ-value of 0.39 suggested only fair agreement beyond chance.
Conclusions—There is considerable lack of agreement, even among experienced clinicians, in recognizing and quantifying early CT changes. Improved methods of recognizing and quantifying early ischemic brain damage are needed.
Since US Food and Drug Administration (FDA) approval of intravenous rtPA therapy for stroke within 3 hours of symptom onset, some controversy has arisen concerning its use. One of the requirements that must be met before treatment is a noncontrast CT scan of the brain. One of the controversies surrounds the CT criteria for treatment.
In the NINDS rt-PA Stroke Trial on which FDA approval of this therapy was based, the CT scan was used primarily to exclude intracranial hemorrhage (ICH) or other unexpected pathology that might masquerade as acute stroke. However, secondary analysis of the NINDS data1 and experience with the use of intravenous rtPA given beyond 3 hours, most notably from the European Cooperative Acute Stroke Trial (ECASS),2 suggest that CT findings of advanced ischemia such as edema or mass effect or, in ECASS, more subtle changes involving >33% of the middle cerebral artery (MCA) territory are associated with increased risk of ICH. The ECASS investigators also suggested that such subtle changes involving <33% of the MCA territory might help identify the best candidates for thrombolysis.3 In the NINDS trial, edema and mass effect were not associated with a lack of response to therapy.4 These questions still require clarification.
Subtle changes of cerebral ischemia include hypoattenuation of the x-ray signal. Slight hypoattenuation of gray matter may be manifest as loss of the distinction between gray and white matter, especially between the basal ganglia and internal capsule or between the insular or frontoparietal cortex and underlying white matter. More marked hypoattenuation may appear as tissue “hypodensity,” in which either the gray or white matter appears darker than normal. Early swelling of brain tissue may be manifest as compression of cerebrospinal fluid (CSF) spaces, especially effacement of cortical sulci or the sylvian fissure.5 The pathological significance of such findings, in particular hypoattenuation of the x-ray signal, is uncertain, but it probably represents increased tissue water content caused by early cytotoxic edema,6 7 8 9 which may or may not signal irreversible injury. These early parenchymal changes will be collectively referred to as “early CT changes” in the present report.
Because of the critical importance of excluding ICH and the possible but uncertain importance of recognizing early CT changes of ischemia, uncertainty exists about the ability of clinicians to correctly interpret CT scans in hyperacute stroke patients. One recent analysis has shown that ICH can be missed by treating physicians.10 Although excellent agreement about recognition of early CT changes has been reported among neuroradiologists with specific expertise in acute stroke diagnosis,11 interobserver variability and reliability in interpreting early CT changes have not been evaluated systematically among the groups of clinicians who most frequently are responsible for interpreting CT scans in the emergency department and making decisions about rtPA treatment. At one of the NINDS rt-PA Stroke Trial sites, a small, unpublished pilot study showed a high prevalence of early CT changes but poor interobserver agreement in identifying them on representative baseline CT scans obtained during the NINDS study.
Herein, we report a more detailed, formal analysis of a sample of the NINDS rt-PA Stroke Trial baseline CT scans. The primary hypothesis of this substudy was that there would be disagreement among experienced stroke clinicians themselves, as well as between stroke specialists and other physicians, about the presence of early CT changes. Additional analysis of the NINDS database to determine whether these early CT changes predict outcome, response to rtPA therapy, or symptomatic ICH will be the subject of a separate report.
We used the baseline pretreatment CT scans from the NINDS rt-PA Stroke Trial. This double-blind, placebo-controlled, randomized trial compared rtPA and placebo in patients who could be treated within 3 hours of stroke symptom onset. The methodology and results of this trial have been published.12 The study was conducted at 8 centers (43 hospitals). Uniform standards were established for CT scanning, which was required before randomization in all patients. All of the scans were performed on third- or fourth-generation scanners with 10-mm slice thickness and without contrast enhancement. Technical factors included the following: 120 kV, 170 mA, matrix size 512×512, and scanning time 3 seconds for posterior fossa and 2 seconds for the supratentorial compartment. Original hard copies of all CT scans were sent to the coordinating center at Henry Ford Hospital in Detroit, Mich.
This new review of baseline CT scans in the NINDS rt-PA Stroke Trial was conducted because of the considerations outlined in the introduction to this article. The baseline NINDS rt-PA Stroke Study CT scans were reread for early ischemic changes to address the question of interobserver reliability as well as their relationship to outcome, treatment response, and ICH risk. Early changes were defined as described above, and a case-review form was designed. The baseline scans of 70 patients were randomly selected and stratified by patient status, allowing overlaps. The patients selected included those who had normal scans with baseline National Institutes of Health Stroke Scale (NIHSS) scores <15 (n=20) or ≥15 (n=20), who had abnormal scans demonstrating edema or mass effect (n=15), and who later developed hemorrhages within 36 hours of treatment (n=15). As a result of overlaps, of 70 patients, there were 38 normal scans (NIHSS ≥15 in 19 patients), 20 scans from patients who later developed hemorrhages, and 19 scans demonstrating edema or mass effect. According to the original NINDS definition, “edema” was defined as an area in which the tissue density was less than that of white matter but higher than CSF. “Mass effect” was defined as effacement of cerebral sulci, compression or effacement of the basal cistern/sylvian fossa, or compression of the ventricular system. The scans were read by 16 physicians in a single room at a day-long session at Henry Ford Hospital on July 9, 1997. The readers were unaware which CT scans were from ICH patients and which had been read by Dr Patel as demonstrating edema or mass effect.
The readers fell into ≥1 of the following categories: 9 investigators who treated patients in the NINDS trial, 9 neurologists, 3 emergency department physicians who treat stroke patients in the emergency department on a regular basis (1 of whom was also trained as a neurologist), and 2 stroke fellows, all of whom were assembled from members of the stroke teams at each of the participating centers in the NINDS rt-PA Stroke Trial. In addition, the group included 2 radiology fellows who were completing their training at Henry Ford Hospital. The review process began with a short tutorial session by Dr Patel, including case studies demonstrating each of the possible CT findings listed on the review form. Then, all scans selected for review (original films) were presented on a screen by use of an overhead projector. Clinical data summarizing the neurological deficit (side of lesion, NIHSS score, and time from stroke symptom onset to CT scan) were provided. Readers were allowed unlimited time to read the scans and could take them to a view box to scrutinize them, although this was done by only 2 readers. No discussion was allowed between observers. The same set of scans was reread by Dr Patel with the same clinical information.
Our hypotheses were as follows: (1) interobserver agreement would be fair; (2) there would be differences in agreement between types of physicians; and (3) agreement would be better with scans that demonstrated more marked changes, such as edema and mass effect.
To compare agreement within and among physician groups for each CT finding, κ-statistics were used to assess the proportion of agreement beyond that expected by chance. κ-Statistics were used where no gold standard was assumed. κ-Statistics were calculated by methods that provide an average of all the pairwise κ-statistics among the readers in a group.13 They were classified into 6 categories based on Landis and Koch14: “almost perfect” if κ=0.81 to 1.00, “substantial” if κ=0.61 to 0.80, “moderate” if κ=0.41 to 0.60, “fair” if κ=0.21 to 0.40, “slight” if κ=0.00 to 0.20, and “poor” if κ<0.00.
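As an illustration only, the averaging of pairwise κ-statistics and the Landis and Koch labels described above might be sketched as follows. This is a minimal sketch, assuming binary (present/absent) readings and Cohen's κ for each pair of readers; the function names and the toy readings are hypothetical and are not taken from the study.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two readers' binary (0/1) readings of the same scans."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    rate_a, rate_b = sum(a) / n, sum(b) / n
    # Agreement expected by chance from each reader's own positive rate.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(readings):
    """Average kappa over every pair of readers (one list of readings per reader)."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(readings, 2)]
    return sum(kappas) / len(kappas)


def landis_koch(kappa):
    """Label a kappa value with the 6 categories quoted in the text."""
    k = round(kappa, 2)  # the categories are stated to 2 decimal places
    if k < 0.0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented toy readings (1 = finding present), not study data.
reader_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reader_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
reader_c = list(reader_a)
group_kappa = mean_pairwise_kappa([reader_a, reader_b, reader_c])
print(round(group_kappa, 2), landis_koch(group_kappa))
```

Averaging over pairs keeps the calculation free of any gold standard, which matches the stated reason for using κ in this part of the analysis.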
The nature of the study sample, as well as the measurement procedure, can influence the κ-value. Kraemer15 provided an example of a data set based on a diagnostic procedure with both high sensitivity and specificity of 0.95, a low prevalence of 0.05, and a κ-value of 0.45. Therefore, to optimally measure the agreement beyond chance, we calculated an average “balanced Kappa”16 17 for each physician group, assuming balanced prevalence of CT findings. Balancing was accomplished by repeated sampling. We used the trial neuroradiologist’s reading as the gold standard to determine the initial prevalence of each CT finding. The observed and expected agreements were also reported. We also conducted pairwise comparisons of the difference in agreement among mutually exclusive physician groups (emergency department physicians, neurologists, stroke fellows, and radiology fellows) using a randomization test that adjusted for multiple comparisons.18
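The dependence of κ on prevalence can be illustrated numerically. The sketch below assumes, as one possible reading of the example cited above, 2 raters who share the same sensitivity and specificity and whose errors are independent given the true status; the text does not state how the cited value was derived, so this setup and the function name are assumptions. Under it, a prevalence of 0.05 gives a κ that rounds to 0.45, whereas a balanced prevalence of 0.50 gives a much higher value with the same raters.

```python
def kappa_two_raters(sensitivity, specificity, prevalence):
    """Expected Cohen's kappa between two equally accurate raters.

    Assumes the raters err independently of each other given the true status.
    """
    se, sp, p = sensitivity, specificity, prevalence
    # Probability that both raters call the finding present / absent.
    both_pos = p * se ** 2 + (1 - p) * (1 - sp) ** 2
    both_neg = p * (1 - se) ** 2 + (1 - p) * sp ** 2
    observed = both_pos + both_neg
    # Each rater's overall rate of positive calls.
    rate_pos = p * se + (1 - p) * (1 - sp)
    expected = rate_pos ** 2 + (1 - rate_pos) ** 2
    return (observed - expected) / (1 - expected)


low_prevalence = kappa_two_raters(0.95, 0.95, 0.05)
balanced = kappa_two_raters(0.95, 0.95, 0.50)
print(round(low_prevalence, 2), round(balanced, 2))
```

Observed agreement is identical (0.905) in both calls; only the chance-expected agreement changes, which is the motivation for reporting a prevalence-balanced κ.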
As a secondary analysis, the physician groups were compared with the gold standard (the trial neuroradiologist). The mean and standard deviation for sensitivity and specificity by physician groups were determined. We adjusted for multiple comparisons using the Bonferroni approach (α/6).
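The secondary analysis can be sketched in the same spirit: each reader's sensitivity and specificity are computed against the reference reading and then summarized as mean and standard deviation within a group. This is a minimal sketch with invented toy data; the function names are hypothetical, and the nominal α of 0.05 used for the Bonferroni threshold is an assumption, because the text gives only the divisor (6).

```python
from statistics import mean, stdev


def sensitivity_specificity(reader, gold):
    """Sensitivity and specificity of one reader's 0/1 calls against the reference."""
    true_pos = sum(1 for r, g in zip(reader, gold) if r == 1 and g == 1)
    true_neg = sum(1 for r, g in zip(reader, gold) if r == 0 and g == 0)
    n_pos = sum(gold)
    n_neg = len(gold) - n_pos
    return true_pos / n_pos, true_neg / n_neg


def summarize_group(readers, gold):
    """Mean and sample SD of sensitivity and specificity across a group of readers."""
    pairs = [sensitivity_specificity(r, gold) for r in readers]
    sens = [se for se, _ in pairs]
    spec = [sp for _, sp in pairs]
    return {
        "sensitivity": (mean(sens), stdev(sens)),
        "specificity": (mean(spec), stdev(spec)),
    }


# Invented toy data (1 = finding present), not study data.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
group = [
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
]
summary = summarize_group(group, gold)

# Bonferroni-adjusted threshold; ALPHA = 0.05 is assumed, not stated in the text.
ALPHA = 0.05
bonferroni_threshold = ALPHA / 6
print(summary, bonferroni_threshold)
```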
Among the 70 CT scans, the neuroradiologist (“gold standard”) found ≥1 early CT change in 29 (41%). The frequencies of the types of abnormalities found were as follows: loss of gray-white matter distinction in 23 (33%), hypodensity in 14 (20%), and compression of CSF spaces in 20 (29%). The κ-statistics (ie, agreement beyond chance) for each of the possible early CT changes among the 16 viewers are listed in Table 1. They ranged from a low of 0.20 (fair) for hypodensity involving >33% of the MCA territory to a high of 0.41 (moderate) for CSF compression. With regard to the answer to the question “do early CT changes involve more than 33% of the MCA territory?” the κ-statistic was 0.39 (fair).
The κ-statistics for each of the physician groups for each of the possible early CT changes are listed in Table 2. There was substantial variability within each group across findings and among groups. Agreement among emergency medicine physicians ranged from poor to fair but was not significantly different compared with agreement among other physician groups for any of the CT findings. The emergency medicine physicians differed from neurologists in determining compression of CSF spaces involving >33% of the MCA (P=0.04), but this difference was not statistically significant after adjustment for multiple comparisons. Even among the physicians who were involved in selecting and treating patients within the NINDS rt-PA Trial, κ-statistics were only fair to moderate. For instance, in answer to the question “do early CT changes involve more than one third of the MCA territory?” the κ-statistic among radiology fellows was 0.50 (moderate), and among treating physicians from the NINDS trial, it was 0.32 (fair).
When observations were limited only to those scans that the neuroradiologist had originally identified as demonstrating edema or mass effect, ie, those with the most advanced CT abnormalities, κ-statistics were still low in answer to the question “do early CT changes involve more than one third of the MCA territory?” They ranged from a low of 0.07 (slight) among emergency department physicians to 0.37 (fair) among treating physicians from the NINDS trial and 0.74 (substantial) among stroke fellows.
Interrater agreement (κ) also depends on how precisely the findings were defined and on the gold standard for the positive finding. To assess the quality of the gold standard, we compared the CT finding location at baseline with the lesion location at 24 hours for the placebo-treated group, excluding patients who had an old lesion at baseline. The positive predictive value was 96% (95% CI 92% to 100%).
When the neuroradiologist was used as the gold standard, on average (mean±SD), reviewers identified 78±16% of the true-positives for the presence of any early CT change (sensitivity) and 57±23% of true-negatives (specificity). For early CT findings involving >33% of the MCA, on average, reviewers identified 44±20% of the true-positives and 94±8% of the true-negatives.
Neither the method of viewing the films (use of an x-ray view box in addition to overhead projection of the films) nor possible improvement in CT image quality over the course of the study affected agreement beyond chance (κ-statistics) among viewers (P>0.14).
The main finding in the present study is that although early CT changes of ischemia may be present within 3 hours of stroke symptom onset, agreement is no better than moderate among all physician groups in recognizing and quantifying these findings. These results are consistent with previous studies that have shown that nonradiologists and others without specific expertise in stroke may miss subtle changes or even hemorrhage on CT.10 In the present study, identification and measurement of early CT findings were inconsistent even among experienced stroke physicians.
In one study limited to neuroradiologists who had particular expertise in evaluating diagnostic studies in acute stroke patients, interobserver agreement beyond chance, based on 78% prevalence of early CT changes, was reportedly moderate, with a κ-statistic of 0.53.11 The 3 expert readers agreed on CT changes in only 35 (70%) of 50 scans. In another study,19 almost identical interobserver agreement was reported among experienced neuroradiologists viewing a sample of scans with a 73% prevalence of hypodensity. Determining the exact extent of hypodensity was less reliable. Although the observed agreement in the present study, based on 53% prevalence, was similar (68%), our κ-statistic of 0.33 indicated only fair agreement beyond chance. These differences in κ-statistics are probably attributable to differences in the prevalence of abnormal findings in the 2 studies.
Interobserver agreement is not entirely explainable on the basis of experience and expertise, because in the present study, agreement was only fair among those stroke specialists who were principal investigators in the NINDS rt-PA Stroke Trial and who each had several years’ experience reading CT scans in hyperacute patients. It is possible that specific training in recognizing early CT changes can improve interobserver agreement and may account for some of the differences in reliability reported by different studies. No formal training other than a brief orientation was performed before the scans in the present study were read. More detailed training reportedly improved recognition of similar CT abnormalities in another study.20 After group training of the ECASS investigators, the number of patients enrolled in that 6-hour thrombolytic trial, with CT changes involving >33% of the MCA territory according to the central reading committee of neuroradiologists, was reduced from 8.3% in ECASS I2 to 4.6% in ECASS II.21 However, other investigators found approximately the same interobserver variability as in the present study even after specific training was provided.22
Surprisingly, it was also not clear from our results that 1 or more of the early CT changes proved to be easier or harder to detect than others. It was particularly sobering to find that the lowest κ-statistics were found among physicians asked to determine whether hypodensity involved >33% of the MCA territory. Even when scans were limited to those in which the neuroradiologist found the most definitive changes, agreement was still only fair to moderate in detecting changes involving >33% of the MCA territory. Therefore, it appears that determining the degree and extent of early CT changes is particularly troublesome when the “eyeball” technique is used.
The κ-statistic is commonly used to evaluate categorical data for the assessment of “agreement beyond chance” but has several complexities that should be considered in interpreting the results reported herein. Landis and Koch introduced 6-category κ values14 that were slightly different from the 3 categories (excellent >0.75, fair to good 0.40 to 0.75, and poor <0.40) proposed by Siegel.13 Seigel et al23 studied both grading schemes and concluded that the Landis and Koch method tended to be more charitable with their adjectives. We believed that the Landis and Koch scale would be more relevant for assessing agreement in an imaging study.
It is possible that relatively low κ-statistics such as observed in the present study can be found despite high levels of interobserver agreement.17 This can occur when the prevalence of the trait (eg, early CT changes in the present study) is low. This problem can be obviated by repeated sampling. However, even when this was done, a substantial lack of agreement persisted in the present study. As seen in Table 1, for instance, the observed agreement for detecting early CT changes involving >33% of the MCA territory was 0.77 (meaning that physicians agreed on this finding 77% of the time), but expected agreement was high (0.62); the κ-statistic was only 0.39, and the 95% CI of the κ-statistics ranged from 0.29 to 0.49, meaning that the agreement beyond chance was still no better than moderate. Although agreement in 77% of cases might seem acceptable, if the finding is an important and prevalent prognostic variable, disagreement among physicians in ≈25% of cases means that agreement can be considered no better than moderate.
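The figures quoted above can be checked against the standard definition of κ as the agreement observed in excess of chance, divided by the maximum possible excess. In the worked line below, the symbols P_o (observed agreement) and P_e (expected agreement) are notation introduced here for illustration.

```latex
\kappa = \frac{P_o - P_e}{1 - P_e} = \frac{0.77 - 0.62}{1 - 0.62} = \frac{0.15}{0.38} \approx 0.39
```

Because expected agreement is already 0.62, only 0.38 of the scale remains for agreement beyond chance, so an observed agreement of 0.77 recovers less than half of it.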
The importance of our observations obviously depends on the pathophysiological and clinical meaning of early CT changes. It is likely that early CT changes of hypoattenuation, such as hypodensity or loss of gray-white matter distinction, represent cytotoxic edema. If so, they may reflect similar pathological changes as do abnormalities on diffusion-weighted MRI.24 Reduced diffusion on MRI is thought to be due to restricted movement of intracellular water. Hypoattenuation on CT and decreased diffusion on MRI may or may not be a sign that the injury is irreversible. Experimental and early clinical observations indicate that abnormalities on diffusion-weighted MRI usually are irreversible, although some reversal after neuroprotective therapy has been reported.25 Even if this observation is confirmed, it is not clear that the same therapeutic effect would be seen on either diffusion MRI or CT abnormalities in human stroke patients. In a small study comparing PET and early CT in 13 patients,26 early CT changes were associated with PET findings consistent with the presence of both irreversible and reversible damage within the region of CT abnormality.
The clinical meaning of early CT changes is also unclear. If these changes represent severely damaged brain, then associated tissue softening and disruption of the blood brain barrier may predispose to ICH. Indeed, the European experience with rtPA suggests that when early CT changes exceed 33% of the MCA, there is a higher incidence of ICH.2 3 Edema and mass effect were also associated with ICH in the NINDS study.1 It is also logical that if these changes represent irreversible damage, response to therapy will be less as they evolve to include a larger proportion of the arterial territory at risk. This relationship has not been evaluated carefully to date. In the ECASS data, patients with a relatively small quantity of early CT change (<33% of the MCA territory) appeared to benefit most from rtPA,3 but the explanation for this finding is not immediately apparent. CT changes of edema or mass effect did not predict response to therapy in the NINDS rt-PA Stroke Trial,4 but a reanalysis of the NINDS data after all baseline scans were reread for all subtle early CT changes has now been completed and will be the subject of a separate report.
On the basis of the results of the present study, if distinguishing the presence and extent of early CT changes proves to be meaningful, particularly for predicting response to therapy, then either more reliable methods of CT quantification than the eyeball method or other imaging methods that can be quantified more reliably, such as diffusion MRI, will be needed.
The present study has several limitations. Although there were differences in some of the κ-statistics between physician groups, the number of physicians in each group was small, and the variability was high. Therefore, we have limited ability to reach definitive conclusions about agreement between groups.
Hard copies of the films were reviewed via overhead projectors and not on standard x-ray view boxes, as is usually done clinically. Furthermore, scans reviewed in the present study included many obtained in 1991 to 1992 on earlier-generation scanners, and it is likely that most CT scanners in use today have improved resolution. Analysis of our data did not suggest that either of these factors could explain all of the lack of agreement between readers, but our study was not designed to specifically address these questions.
Early CT changes may be present within the first 3 hours after the onset of stroke and may or may not influence patient selection for thrombolytic therapy. This study suggests that there is considerable lack of agreement, even among experienced clinicians, in recognizing and quantifying such early CT changes. Improved methods of recognizing and quantifying early CT changes are needed.
This study was supported by grants NO1-NS-02382, NO1-NS-02374, NO1-NS-02377, NO1-NS-02381, NO-NS-02379, NO-NS-02373, NO-NS-02376, NO1-NS-02378, and NO1-NS-02380 from the National Institute of Neurological Disorders and Stroke, Bethesda, Md.
Dr Frankel is a member of the Genentech Speaker’s Bureau.
- Received January 8, 1999.
- Revision received May 26, 1999.
- Accepted May 26, 1999.
- Copyright © 1999 by American Heart Association
Hacke W, Kaste M, Fieschi C, Toni D, Lesaffre E, von Kummer R, Boysen G, Bluhmki E, Höxter G, Mahagne M-H, Hennerici M, for the ECASS Study Group. Intracerebral hemorrhage after intravenous rt-PA therapy for ischemic stroke: the NINDS rt-PA Stroke Study Group. Stroke. 1997;28:2109–2118.
Generalized efficacy of rt-PA for acute stroke: subgroup analysis of the NINDS t-PA Stroke Trial. Stroke. 1997;28:2119–2125.
von Kummer R, Meyding-Lamade U, Forsting M, Rosin L, Rieke K, Hacke W, Sartor K. Sensitivity and prognostic value of early computed tomography in middle cerebral artery trunk occlusion. AJNR Am J Neuroradiol. 1994;15:9–15.
Schuir FJ, Hossmann KA. Experimental brain infarcts in cats, II: ischemic brain edema. Stroke. 1980;11:593–601.
Unger E, Littlefield J, Gado M. Water content and water structure in CT and MR signal changes: possible influence in detection of early stroke. AJNR Am J Neuroradiol. 1988;9:687–691.
von Kummer R, Weber J. Brain and vascular imaging in acute ischemic stroke: the potential of computed tomography. Neurology. 1997;49:S52–S55.
Marks MP, Holmgren EB, Fox AJ, Patel SC, von Kummer R, Froelich J. Evaluation of early computed tomographic findings in acute ischemic stroke. Stroke. 1999;30:389–392.
Siegel S. Nonparametric Statistics for the Behavioral Sciences. New York, NY: McGraw-Hill; 1998:284–289.
Edgington ES. Randomization Tests. 3rd ed. New York, NY: Marcel Dekker; 1995.
von Kummer R, Holle R, Grzyska U, Hofmann E, Jansen O, Petersen D, Schumacher M, Sartor K. Interobserver agreement in assessing early CT signs of middle cerebral artery infarction. AJNR Am J Neuroradiol. 1996;17:1743–1748.
von Kummer R, Holle R, Meier D, for the ECASS Group. Effect of training on the recognition of large ischemic lesions on CT scans obtained within 6 hours of stroke onset. Stroke. 1998;29:310. Abstract.
Hacke W, Kaste M, Fieschi C, von Kummer R, Davalos A, Meier D, Larrue V, Bluhmki E, Davis S, Donnan G, Schneider D, Diez-Tejedor E, Trouillas P, for the Second European-Australasian Acute Stroke Study Investigators. Randomised double-blind placebo-controlled trial of thrombolytic therapy with intravenous alteplase in acute ischaemic stroke (ECASS II). Lancet. 1998;352:1245–1251.
Dippel D, Holle M, van Kooten F, Moseley ME, Cohen Y, Mintorovitch J, Chilewitt L, Shimiza H, Kucharczyk J, Wendland MF, Weinstein PR. The validity and reliability of early infarct signs on CT in acute ischemic stroke. Cerebrovasc Dis. 1997;7:15. Abstract.
Seigel DG, Podgor M, Remaley NA. Acceptable values of Kappa for comparison of two groups. Am J Epidemiol. 1992;135:571–578.
Minematsu K, Fisher M, Li L, Davis MA, Knapp AG, Cotter RE, McBurney RN, Sotak CH. Effects of a novel NMDA antagonist on experimental stroke rapidly and quantitatively assessed by diffusion-weighted MRI. Neurology. 1993;43:397–403.
Fiorelli M, Marchal G, Iglesias S, Derlon JM, Viader F, Fieschi C, Baron JC. Early focal CT changes in patients with acute ischemic stroke: relationships with irreversibly damaged tissue. Stroke. 1998;29:310. Abstract.