Rapid Assessment of Perfusion–Diffusion Mismatch
Background and Purpose— For MR perfusion–diffusion (PWI-DWI) mismatch to become routine in thrombolysis patient selection, rapid and reliable assessment tools are required. We examined interrater variability in PWI/DWI volume measurements and developed a rapid assessment tool based on the Alberta Stroke Program Early CT Scores (ASPECTS) system.
Methods— DWI and PWI were performed in 35 patients with stroke <6 hours after symptom onset. DWI lesion and PWI (time to peak) volumes were measured with planimetric techniques by 4 raters and the 95% limits of agreement calculated. ASPECT scores were assessed separately by 4 investigators (2 experienced and 2 inexperienced) for DWI (MR DWI scores) and PWI (MR time to peak scores). MR mismatch scores were calculated as MR DWI scores minus MR time to peak scores.
Results— Interobserver variability was much greater for PWI (95% limit of agreement=±72.3 mL) than for DWI (95% limit of agreement=±12.6 mL). A semiautomated PWI volume (time to peak+2 s) was therefore used to calculate mismatch volume. MR mismatch scores ≥2 predicted 20% PWI-DWI mismatch by volume with mean 78% sensitivity (range, 72% to 84%) and 88% specificity (range, 83% to 90%). There was excellent agreement on mismatch classification using MR mismatch scores between experienced raters (weighted kappa scores of 0.94) with agreement in 34 of 35 cases. Agreement was less consistent between inexperienced raters (weighted kappa=0.49, 28 of 35 cases).
Conclusions— Variability in planimetric mismatch measurements arises primarily from differences in PWI volume assessment. High specificity and interrater reliability may make MR mismatch scores an ideal rapid screening tool for potential thrombolysis patients.
Mismatch between a larger perfusion-weighted imaging (PWI) abnormality and a smaller diffusion-weighted (DWI) lesion has been postulated to represent the ischemic penumbra.1,2 This tissue is at risk for infarction, but may also be potentially amenable to salvage with thrombolysis. Although the mismatch hypothesis remains unproven, it is being used with increasing frequency in acute stroke studies and even clinical practice.3,4 If PWI-DWI mismatch is to be incorporated into the routine selection process for thrombolysis, rapid assessment tools with good interrater reliability are required. There is presently no standardized method for rapid assessment of PWI-DWI mismatch.
It has previously been demonstrated that purely subjective assessments of PWI-DWI mismatch have poor interrater reliability.5 Conversely, interrater agreement of DWI volume has been shown to be excellent.6 We therefore hypothesized that the source of disagreement between raters assessing mismatch is related to interpretation of the perfusion images.
Although planimetric volume measurement is accurate, it requires time-intensive operator input, which may delay acute stroke therapy. Clinicians generally rely on qualitative assessments of images, sometimes in conjunction with rating systems. The Alberta Stroke Program Early CT Score (ASPECTS) is a validated semiquantitative scale useful for assessing the extent of ischemic changes within the middle cerebral artery (MCA) territory.7,8 This is a negative ordinal scale in which normal-appearing brains are scored as “10” and those with ischemic changes involving the entire MCA territory are rated “0.” The ASPECTS system has also been successfully applied to perfusion CT and MRI.9,10 By applying ASPECTS to DWI and PWI sequences, we have developed a novel tool known as the MR mismatch score.
We had 2 aims in this study, the first of which was to identify the sources of error in mismatch assessment and develop a solution. The second was to apply ASPECT scores to PWI and DWI and determine whether this novel tool could be used to identify patients with tissue at risk for infarction.
Forty patients with acute ischemic stroke were prospectively recruited from 9 centers participating in the Echoplanar Imaging Thrombolysis Evaluation Trial (EPITHET) and imaged with MRI 3 to 6 hours after symptom onset. Evaluation was restricted to the 35 of the 40 patients who had MCA territory infarction and technically adequate PWI data. Patients with anterior cerebral artery (n=1) and posterior circulation (n=3) infarcts were excluded because ASPECTS was designed for assessment of MCA territory infarcts. One further patient with technically inadequate PWI data was also excluded. Informed consent was obtained from the patient/next of kin and local human research committees approved the protocol.
Noncontrast CT scans were obtained before MRI. Patients with intracerebral hemorrhage or ischemic changes involving more than one third of the MCA territory were excluded as per the EPITHET protocol.11 MRI scans were obtained with 1.5-T EPI-equipped scanners (GE Signa/Siemens Vision/Symphony/Philips Intera). Perfusion-weighted images were obtained using a bolus of gadolinium diethylenetriamine pentaacetic acid (0.2 mmol/kg), injected at 5 mL/s followed by 15 mL of saline. Twelve to 16 slices (32 to 50 time points) were obtained. Slice thickness was 5 to 6 mm with a 1-mm gap, matrix size was 128×128 or 256×256, and field of view was 40×40 cm. Diffusion-weighted images were obtained with single-shot spin-echo EPI sequences. Sixteen to 20 slices (5 to 6 mm thick with a 1-mm gap) were obtained. Matrix size was 128×128 or 256×256, field of view was 40×40 cm, and TR/TE was 6000/107 ms. Diffusion gradient strength was varied between 0 and 22 mT/m, resulting in b values of 0, 500, and 1000 s/mm².
Postprocessing of raw perfusion images was performed centrally by a single investigator using the software package Stroketool (DIS, Dusseldorf, Germany).12 This software was used to plot the change in MRI transverse relaxivity, which is linearly related to gadolinium diethylenetriamine pentaacetic acid concentration, on a per-voxel basis over time. Maps of the time to peak of the impulse response curve (Tmax) were calculated using singular value decomposition. This technique allows the impulse response curve to be calculated as a deconvolution of the raw perfusion images using an arterial input function.13 The arterial input function was selected from the MCA contralateral to the affected hemisphere. Isotropic DWI images were obtained by averaging the signal from all orthogonal directions with the highest diffusion weighting (b=1000).
Regional PWI and DWI image analysis was performed using the Analyze software package (Biomedical Imaging Resource, Rochester, NY). Isotropic DWI hyperintense regions were outlined visually by 2 stroke neurologists and 2 stroke Fellows. The 2 stroke neurologists were considered experienced on the basis that each had >4 years performing volumetric analysis. The 2 stroke Fellows had less than 6 months’ experience and were considered to be inexperienced raters. Investigators were free to vary intensity window level and width settings. Tmax volumes were outlined in the same manner. Investigators individually calculated mean Tmax in contralateral homologous regions. The latter were mirror images of the ipsilateral regions of interest reflected on a 180° axis. Each investigator then applied a threshold to Tmax maps based on mean contralateral values +2 s and calculated a second volume. For each observer, mismatch was calculated as the difference between that observer’s perfusion volume (Tmax or Tmax+2 s) and DWI volume. Each investigator required approximately 20 minutes to complete all planimetric measurements for each patient.
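The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the Analyze workflow used in the study: the function name, the flat list of voxel Tmax values, the voxel volume, and the choice of an inclusive (≥) cutoff are all assumptions made for the example.

```python
def thresholded_volume_ml(tmax_s, contralateral_mean_s, voxel_volume_ml, delay_s=2.0):
    """Volume of the voxels whose Tmax is at least `delay_s` above the
    mean Tmax of the contralateral homologous region.

    tmax_s               -- Tmax values (seconds) of the voxels inside the outlined region
    contralateral_mean_s -- mean Tmax (seconds) in the mirrored contralateral region
    voxel_volume_ml      -- volume of one voxel in mL (hypothetical value in the example)
    """
    cutoff = contralateral_mean_s + delay_s
    # Inclusive comparison is an assumption; the text only states "+2 s".
    n_voxels = sum(1 for t in tmax_s if t >= cutoff)
    return n_voxels * voxel_volume_ml


# Made-up voxel values, for illustration only.
roi_tmax_s = [1.0, 2.5, 3.9, 4.0, 6.2, 8.0]
volume = thresholded_volume_ml(roi_tmax_s, contralateral_mean_s=2.0, voxel_volume_ml=0.5)
```

With a contralateral mean of 2.0 s the cutoff is 4.0 s, so three of the six example voxels are counted.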
A semiautomated Tmax volume was calculated separately using a +2-s threshold relative to the start of the impulse response. This Tmax+2 volume was used as the objective reference to which all semiquantitative mismatch assessments were compared. Standardized mismatch volumes were calculated as the difference between this semiautomated Tmax+2 s volume and the mean DWI volume measurement for all raters. A mismatch pattern was considered to be present if the standardized mismatch volume exceeded DWI volume by at least 20%.
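The standardized mismatch calculation can be written out as a short sketch. The 20% criterion is read here as the mismatch volume being at least 20% of the mean DWI volume, which is one interpretation of the definition above and should be treated as an assumption; the function name, the `min_fraction` parameter, and the handling of a zero DWI volume are likewise illustrative.

```python
def standardized_mismatch(tmax_plus2_ml, dwi_volumes_ml, min_fraction=0.20):
    """Return (mismatch volume in mL, mismatch pattern present?).

    tmax_plus2_ml  -- semiautomated Tmax+2 s volume (mL)
    dwi_volumes_ml -- DWI volumes (mL) measured by each rater for the same patient
    min_fraction   -- assumed reading of the 20% criterion (mismatch / DWI volume)
    """
    mean_dwi = sum(dwi_volumes_ml) / len(dwi_volumes_ml)
    mismatch_ml = tmax_plus2_ml - mean_dwi
    if mean_dwi == 0:
        # Assumption: with no DWI lesion, any perfusion deficit counts as mismatch.
        present = mismatch_ml > 0
    else:
        present = mismatch_ml >= min_fraction * mean_dwi
    return mismatch_ml, present


# Made-up volumes for two hypothetical patients.
large = standardized_mismatch(90.0, [48.0, 52.0, 50.0, 50.0])
small = standardized_mismatch(55.0, [50.0, 50.0, 50.0, 50.0])
```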
MR ASPECT Scores
ASPECT scores of DWI (MR DWI scores) and Tmax (MR Tmax scores) images were then recorded independently >2 weeks after the planimetric volume measurements. The 2 ASPECTS slices at and immediately superior to the basal ganglia were first identified on the structural T2-weighted images on which DWI and PWI sequences are based. MR DWI and MR Tmax scores were then rated by each investigator. Hyperintensity on DWI or Tmax prolongation within an ASPECTS region resulted in a deduction of 1 point on each score. MR mismatch scores were then calculated by subtracting MR Tmax scores from MR DWI scores.
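The scoring arithmetic can be illustrated with a small sketch. The 10 region labels follow the usual ASPECTS convention (caudate, lentiform nucleus, internal capsule, insula, and M1 to M6); the function names and the example region sets are hypothetical.

```python
# The 10 standard ASPECTS regions of the MCA territory.
ASPECTS_REGIONS = ("C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6")


def aspects_score(affected_regions):
    """Start from 10 and deduct 1 point for each abnormal region."""
    affected = set(affected_regions)
    unknown = affected - set(ASPECTS_REGIONS)
    if unknown:
        raise ValueError(f"not ASPECTS regions: {sorted(unknown)}")
    return len(ASPECTS_REGIONS) - len(affected)


def mr_mismatch_score(dwi_affected, tmax_affected):
    """MR DWI score minus MR Tmax score."""
    return aspects_score(dwi_affected) - aspects_score(tmax_affected)


# Hypothetical patient: DWI lesion in 2 regions, Tmax prolongation in 5.
dwi_regions = {"I", "L"}
tmax_regions = {"I", "L", "M2", "M3", "M5"}
score = mr_mismatch_score(dwi_regions, tmax_regions)
```

In this example the MR DWI score is 8, the MR Tmax score is 5, and the MR mismatch score is 3.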
Statistical analysis was performed using Stata (Statacorp). The 95% limits of agreement (mean difference±2 SDs) for DWI and PWI regions of interest were calculated and interrater differences illustrated using Bland-Altman plots.14 The ability of MR mismatch scores to predict PWI-DWI volume mismatch was assessed with receiver-operator characteristic curves for each rater. Interrater receiver-operator characteristic differences were tested using a χ² test of the area under each curve. Interrater reliability of MR DWI scores, MR Tmax scores, and MR mismatch scores was assessed with a weighted kappa analysis. Kappa scores were weighted to penalize differences of >1 as described previously.15
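A weighted kappa of this general kind can be sketched as follows. The exact weighting scheme is not given here, so the sketch assumes the simplest one consistent with the description: two scores count as agreeing when they differ by at most 1 point and as disagreeing otherwise. The function name and the `tolerance` parameter are illustrative, and the scheme in the cited reference may differ.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, tolerance=1):
    """Cohen's kappa with a step weight: full credit when two scores
    differ by at most `tolerance`, no credit otherwise (assumed scheme)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)

    def weight(x, y):
        return 1.0 if abs(x - y) <= tolerance else 0.0

    # Observed weighted agreement across cases.
    observed = sum(weight(a, b) for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance from the two raters' marginal distributions.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(
        weight(x, y) * ca * cb
        for x, ca in count_a.items()
        for y, cb in count_b.items()
    ) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Made-up scores: identical ratings, then ratings no better than chance.
perfect = weighted_kappa([0, 2, 4, 6], [0, 2, 4, 6])
chance = weighted_kappa([0, 0, 4, 4], [0, 4, 0, 4])
```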
The 35 patients (27 men; median age, 73 years; range, 39 to 87 years) were imaged with MRI at a median of 4.5 hours (range, 2.7 to 5.6 hours) after symptom onset. Median acute National Institutes of Health Stroke Scale score was 11 (range, 4 to 25).
Planimetric Diffusion-Weighted Imaging, Perfusion-Weighted Imaging, and Mismatch Volumes
Interrater differences in DWI lesion volume measurements for each patient assessed by the 4 investigators are illustrated in Figure 1. The reference DWI volume of each individual patient was the mean measurement of all 4 observers as previously described.14 The mean DWI reference volume of the entire sample of 35 patients was 51.5±52.4 mL. The mean difference between individual raters and the reference volumes ranged from a minimum of −1.8 mL to a maximum of +2.4 mL (Figure 1). The 95% limits of agreement, for absolute volumes, between all 4 observers were ±12.6 mL. Experienced observers measured slightly larger volumes on average relative to the inexperienced observers, but overall differences were not significant (Figure 1). Thus, DWI lesion volumes varied very little between observers.
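The agreement calculation can be sketched as below: each patient's reference volume is the mean across raters, and the limits are the mean difference ±2 SDs. Pooling every rater's differences into a single set is an assumption made for this example, as are the function name and the input layout.

```python
from statistics import mean, stdev


def limits_of_agreement(volumes_by_rater):
    """95% limits of agreement (mean difference +/- 2 SD) of each rater's
    volumes against the per-patient mean of all raters.

    volumes_by_rater -- one list per rater, each holding one volume (mL)
                        per patient, in the same patient order
    """
    n_patients = len(volumes_by_rater[0])
    # Reference volume per patient: mean measurement of all raters.
    reference = [mean(r[i] for r in volumes_by_rater) for i in range(n_patients)]
    # Pool all rater-minus-reference differences (assumed pooling).
    differences = [
        r[i] - reference[i] for r in volumes_by_rater for i in range(n_patients)
    ]
    centre = mean(differences)
    spread = 2 * stdev(differences)
    return centre - spread, centre + spread


# Made-up volumes for 2 raters and 3 patients.
lower, upper = limits_of_agreement([[10.0, 20.0, 30.0], [14.0, 20.0, 26.0]])
```

Because the reference is the mean of the raters, the pooled differences average to zero and the limits are symmetric, which matches the ± form in which the limits are reported.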
Interrater disagreement was much greater for PWI measures. The mean Tmax abnormality reference volume of the sample was 163.4±87.2 mL. The mean difference in Tmax volumes, relative to the reference value, ranged from −47.1 to +22.9 mL and the 95% limits of agreement were ±72.3 mL (Figure 1). Inexperienced observers tended to draw smaller regions of interest on average.
Mismatch volumes also varied widely between observers. The mean mismatch volume was 111.8±81.5 mL. The mean difference in mismatch volumes, relative to the reference value, was similar to that of the Tmax volume differences, ranging from −45.4 to +20.5 mL and the 95% limits of agreement were ±70.2 mL (Figure 1).
Perfusion-Weighted Imaging Threshold Application Effects
Application of a PWI threshold, relative to the contralateral hemisphere, was associated with a substantial decrease in interrater variability of perfusion deficit volume measurements. The mean Tmax+2 s abnormality volume was 93.8±62.8 mL. The mean difference in Tmax+2 s measured volumes, relative to the reference semiautomated value, ranged from −15.1 to +15.7 mL and the 95% limits of agreement narrowed to ±31.5 mL (Figure 2). The semiautomated reference Tmax+2 s volume was very similar to the mean volume calculated by all 4 raters.
Calculation of mismatch using Tmax+2 s volumes was also associated with reduced interobserver variability. The mean mismatch volume was 42.2±60.7 mL. The mean difference in mismatch volume, relative to the reference value, ranged from −13.4 to +14.7 mL and the 95% limits of agreement were ±33.1 mL.
MR Mismatch Scores
Examples of MR mismatch score assessment are shown in Figure 3. The median MR DWI scores, MR Tmax scores, and MR mismatch scores were 7, 3, and 2, respectively. MR DWI scores were inversely correlated with planimetric DWI volumes (ρ=−0.75, P<0.001; Figure 4). MR Tmax scores were also inversely correlated with planimetric Tmax+2 s volumes (ρ=−0.64, P<0.001; Figure 4). MR mismatch scores (MR DWI-MR Tmax) correlated with planimetric mismatch, calculated as Tmax+2 s−DWI volume (ρ=0.67, P<0.001; Figure 4).
A total of 26 patients had a standardized mismatch pattern. The ability of the MR mismatch score to predict this definition of mismatch is illustrated with the receiver-operator characteristic curve in Figure 5. Receiver-operator characteristic curve analysis indicated an MR mismatch score of ≥2 provided optimal sensitivity and specificity for prediction of mismatch by volume. An MR mismatch score of ≥2 predicted >20% mismatch by volume with a mean sensitivity of 78% (interrater range 72% to 84%) and specificity of 88% (interrater range 83% to 90%). The mean correct classification rate was 83% (interrater range 77% to 90%). Although the area under the receiver-operator characteristic curve of one of the inexperienced raters was slightly smaller than those of the other investigators, the differences were not significant (χ²=4.68, P=0.20). The optimal cut point MR mismatch score of ≥2 was the same for all 4 users regardless of experience (Figure 5).
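For one rater, sensitivity, specificity, and the correct classification rate at a given cut point follow from a simple tally against the volumetric reference. The sketch below shows that arithmetic; the function name is hypothetical and the example scores are made up, not study data.

```python
def screening_performance(mismatch_scores, volume_mismatch, cut_point=2):
    """Sensitivity, specificity, and correct classification rate of
    'MR mismatch score >= cut_point' against mismatch by volume.

    mismatch_scores -- MR mismatch score per patient (one rater)
    volume_mismatch -- True where the volumetric reference shows mismatch
    """
    tp = fp = tn = fn = 0
    for score, truth in zip(mismatch_scores, volume_mismatch):
        predicted = score >= cut_point
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    correct_rate = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, correct_rate


# Made-up scores for 6 hypothetical patients.
scores = [3, 2, 1, 0, 4, 1]
reference = [True, True, True, False, True, False]
sens, spec, rate = screening_performance(scores, reference)
```

Repeating the tally over a range of cut points gives the points of a receiver-operator characteristic curve.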
Weighted kappa scores indicated excellent interrater agreement between the experienced users for MR DWI, MR Tmax, and MR mismatch scores (Table). Inexperienced raters, however, had only a fair interrater agreement rate. Experienced raters agreed on mismatch classification, using MR mismatch scores, in 34 of 35 cases (weighted kappa=0.94). The agreement rate decreased to 28 of 35 cases between inexperienced raters (weighted kappa=0.49).
Although the mismatch hypothesis has not yet been tested conclusively, penumbral selection is being used to select patients for acute thrombolytic therapy.3,4 Rapid and reliable assessment of PWI-DWI mismatch is therefore an important goal. In this study, we have shown that ASPECT scores of PWI and DWI data can be used to derive a semiquantitative ordinal scale, which predicts PWI-DWI mismatch by volume with high sensitivity and specificity. This MR mismatch score may be calculated without the time-intensive investigator input required for planimetric analysis so that access to acute stroke therapies need not be delayed. This study has also confirmed that interobserver variability of mismatch volume assessment is significant, even when planimetric measurement techniques are used. The source of disagreement is primarily related to interpretation of the PWI data. Agreement is improved by application of a perfusion threshold, which serves to standardize the data in an objective manner.
Planimetric Assessment of Mismatch
Visual delineation of DWI lesions and PWI deficits is commonly used in acute stroke MRI research studies. We have shown that planimetric DWI volumes are quite consistent between observers, even those with relatively little experience. This is consistent with previous reports that interobserver variation of DWI volumes is <5%.16–18 A recent systematic evaluation indicates that the mean absolute difference in DWI planimetric volume measurements made by 2 raters is 2.4±4.7 mL.19 Conversely, these authors observed larger interrater differences in measured PWI volumes similar to our own (19.4±34.6 mL).19 The borders of perfusion deficits are subject to greater disagreement primarily due to the fact that the periphery of the PWI abnormality often contains tissue with heterogeneous oligemia, much of which is unlikely to be at risk of infarction (Figure 2).1 This can lead to significant variation in measured volumes and therefore mismatch assessment.
Despite interobserver disagreement, planimetric techniques are the most accurate method available for measuring volumes and should remain the standard used in reporting the results of MRI-based research studies. In the absence of validated quantitative perfusion measures, we suggest that a minimal threshold, such as +2 s, should be used to standardize all PWI time domain maps. Tmax+2 s was used as a standardized PWI measure in a recent observational study of MRI profiles and thrombolysis.20 This parameter will also be used to define mismatch in the primary EPITHET analysis.21
Semiquantitative Assessment of Mismatch: MR Mismatch Scores
Planimetric measurement techniques are presently reserved for in-depth “offline” analyses. Fully automated planimetric assessment tools may one day allow accurate volume calculations in the hyperacute setting, but this is not possible with standard clinical MRI software currently in use. Instead, most clinicians and diagnosticians make qualitative assessments of lesion size, which are potentially prone to error.5 It has been shown that semiquantitative assessment scales can improve recognition of significant patterns in acute stroke imaging.8,15
Our findings indicated that the MR mismatch scoring system predicted mismatch by volume with greater specificity than sensitivity. As many as 28% of patients with mismatch by volume (>20%) were assessed as having a nonmismatch pattern by raters using MR mismatch scores. This may not represent a disadvantage of using an ASPECTS-based system to assess acute stroke MRI images. On the contrary, because the optimal criteria for definition of significant mismatch have not been established, it is possible that the qualitative score provides a more accurate estimation of salvageable tissue than the arbitrary 20% volumetric definition. An MR mismatch score of 2 indicates that 2 MCA regions are hypoperfused, but not yet compromised. Although the planimetric measurements indicated that some patients had at least 20% mismatch by volume in the presence of MR mismatch scores of less than 2, these may not be ideal thrombolysis candidates, because the hypoperfused regions already had evidence of some tissue compromise. This is also consistent with another proposed advantage of the ASPECTS system, specifically that functionally important subcortical regions with smaller volumes are given the same weight as the larger cortical regions.8,22 It must be emphasized that the true significance of a 20% mismatch has not yet been established. Accordingly, although we have shown an MR mismatch score of ≥2 predicts 20% mismatch, this does not in itself imply these represent the optimal thrombolysis candidates.
The primary advantage of applying semiquantitative ordinal scales to stroke image analysis is improved interrater agreement over subjective binary assessments. ASPECT scores have previously been shown to standardize acute stroke noncontrast CT assessment7,15 and have been applied to CT perfusion and CT angiographic source images with good interrater reliability.10,23 The present investigation indicates that semiquantitative assessments of mismatch can reliably be made by different observers, but, as with CT, this improves with experience.24,25 Nonetheless, variability between our inexperienced raters was still superior to a previous report of purely qualitative mismatch assessments.5
In contrast to the relatively large interobserver variability of planimetric Tmax measurements, interrater agreement appears to be very similar for MR Tmax and MR DWI scores. This likely reflects the fact that a regional analysis does not necessitate absolute agreement. The source of disagreement between investigators is generally at the periphery of the PWI deficit; however, in the majority of cases, these areas are smaller than an entire ASPECTS region. Thus, although investigators will measure different planimetric volumes, they may often record the same MR mismatch scores.
The chief limitation of this study is the use of the same investigators in the volumetric and MR mismatch score portions of the investigation. Furthermore, volume measurements and MR mismatch scoring were completed in a sequential and nonrandomized fashion. We attempted to minimize the effect of prior experience with a substantial time interval between assessments as has been reported in previous studies.5 In addition, we have applied the mismatch scoring system to only one set of PWI maps (Tmax). It remains to be determined if the system is as effective with other PWI parameters, including time to peak, mean transit time, and relative CBF maps. Finally, this study lacks outcome data to assess the ability of MR mismatch scores to predict final infarction. This will be performed at the completion of the EPITHET study, an ongoing randomized, controlled trial of tissue plasminogen activator versus placebo in the 3- to 6-hour time window.
Sources of Funding
The Investigator-led EPITHET trial is supported by the National Health and Medical Research Council of Australia. Study drugs were provided by Boehringer Ingelheim.
- Received April 5, 2007.
- Accepted May 1, 2007.
Kidwell CS, Alger JR, Saver JL. Beyond mismatch: evolving paradigms in imaging the ischemic penumbra with multimodal magnetic resonance imaging. Stroke. 2003; 34: 2729–2735.
Hacke W, Albers G, Al-Rawi Y, Bogousslavsky J, Davalos A, Eliasziw M, Fischer M, Furlan A, Kaste M, Lees KR, Soehngen M, Warach S. The Desmoteplase in Acute Ischemic Stroke Trial (DIAS): a phase II MRI-based 9-hour window acute stroke thrombolysis trial with intravenous desmoteplase. Stroke. 2005; 36: 66–73.
Coutts SB, Simon JE, Tomanek AI, Barber PA, Chan J, Hudon ME, Mitchell JR, Frayne R, Eliasziw M, Buchan AM, Demchuk AM. Reliability of assessing percentage of diffusion–perfusion mismatch. Stroke. 2003; 34: 1681–1683.
Barber PA, Darby DG, Desmond PM, Gerraty RP, Yang Q, Li T, Jolley D, Donnan GA, Tress BM, Davis SM. Identification of major ischemic change. Diffusion-weighted imaging versus computed tomography. Stroke. 1999; 30: 2059–2065.
Pexman JHW, Barber PA, Hill MD, Sevick RJ, Demchuk AM, Hudon ME, Hu WY, Buchan AM. Use of the Alberta Stroke Program Early CT Score (ASPECTS) for assessing CT scans in patients with acute stroke. AJNR Am J Neuroradiol. 2001; 22: 1534–1542.
Barber PA, Hill MD, Eliasziw M, Demchuk AM, Pexman JH, Hudon ME, Tomanek A, Frayne R, Buchan AM. Imaging of the brain in acute ischaemic stroke: comparison of computed tomography and magnetic resonance diffusion-weighted imaging. J Neurol Neurosurg Psychiatry. 2005; 76: 1528–1533.
Echoplanar Imaging Thrombolysis Evaluation Trial (EPITHET) web site. Available at: www.astn.org.au/epithet. Accessed April 1, 2007.
Coutts SB, Demchuk AM, Barber PA, Hu WY, Simon JE, Buchan AM, Hill MD. Interobserver variation of ASPECTS in real time. Stroke. 2004; 35: e103–105.
Barber PA, Darby DG, Desmond PM, Yang Q, Gerraty RP, Jolley D, Donnan GA, Tress BM, Davis SM. Prediction of stroke outcome with echoplanar perfusion- and diffusion-weighted MRI. Neurology. 1998; 51: 418–426.
Luby M, Bykowski JL, Schellinger PD, Merino JG, Warach S. Intra- and interrater reliability of ischemic lesion volume measurements on diffusion-weighted, mean transit time and fluid-attenuated inversion recovery MRI. Stroke. 2006; 37: 2951–2956.
Albers GW, Wechsler L, Kemp S, Schlaug G, Skalabrin E, Bammer R, Kakuda W, Lansberg MG, Shuaib A, Coplin W, Hamilton S, Moseley M, Marks MP; for the DEFUSE Investigators. Magnetic resonance imaging profiles predict clinical response to early reperfusion: the diffusion and perfusion imaging evaluation for understanding stroke evolution (DEFUSE) study. Ann Neurol. 2006; 60: 508–517.
Butcher KS, Parsons M, MacGregor L, Barber PA, Chalk J, Bladin C, Levi C, Kimber T, Schultz D, Fink J, Tress B, Donnan G, Davis S; for the EPITHET Investigators. Refining the perfusion–diffusion mismatch hypothesis. Stroke. 2005; 36: 1153–1159.
The National Institute of Neurological Disorders and Stroke (NINDS) rt-PA Stroke Study Group. Effect of intravenous recombinant tissue plasminogen activator on ischemic stroke lesion size measured by computed tomography. Stroke. 2000; 31: 2912–2919.
Coutts SB, Lev MH, Eliasziw M, Roccatagliata L, Hill MD, Schwamm LH, Pexman JH, Koroshetz WJ, Hudon ME, Buchan AM, Gonzalez RG, Demchuk AM. ASPECTS on CTA source images versus unenhanced CT: added value in predicting final infarct extent and clinical outcome. Stroke. 2004; 35: 2472–2476.
Mak HKF, Yau KKW, Khong P-L, Ching ASC, Cheng P-W, Au-Yeung PKM, Pang PKM, Wong KCW, Chan BPL. Hypodensity of >1/3 middle cerebral artery territory versus Alberta Stroke Programme Early CT Score (ASPECTS): comparison of two methods of quantitative evaluation of early CT changes in hyperacute ischemic stroke in the community setting. Stroke. 2003; 34: 1194–1196.