A Cautionary Note
See related article, pages 1353–1358.
Drug development for acute stroke treatments is complex, very expensive, very time-consuming, and frequently disappointing. Surrogate outcome theory holds that, if surrogate outcomes were used in clinical trials, new therapies for stroke could be evaluated with smaller sample sizes than are required with traditional functional outcome measures. This would enable new therapies to be discarded, or confirmed as candidates for larger definitive clinical trials, more rapidly than at present.
This theory presupposes that surrogate outcomes work by removing the “noise” inherent in all clinically based assessments so that the therapeutic “signal” can be detected more clearly, thus requiring fewer patients to achieve a positive result. It also assumes that the effect of the treatment on the surrogate outcome profiles the therapeutic signal equally and uniformly across all key clinical characteristics (like stroke severity or age). Thus, it assumes that the surrogate mirrors both the beneficial effects and risks so that no systematic bias is inadvertently introduced that might skew the results away from the true therapeutic effect. Smaller samples should reduce costs, speed up drug assessments, and hence enable effective treatments to be found more quickly. It is therefore understandable why surrogates appear attractive, particularly for future trials in acute ischemic stroke, an area in which the discovery of effective new treatments has proved elusive.
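The link between "noise" and sample size can be illustrated with the standard normal-approximation formula for comparing two means. The sketch below is only an illustration under assumed values: the function name, the standard deviations, and the effect size are hypothetical and are not taken from any stroke trial.

```python
import math
from statistics import NormalDist


def patients_per_arm(sd, effect, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided comparison of two means.

    Uses the usual normal approximation n = 2 * ((z_alpha + z_beta) * sd / effect)^2.
    `sd` is the between-patient standard deviation of the outcome (the "noise")
    and `effect` is the true difference between arms (the "signal").
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# Hypothetical values: the same treatment effect measured on a noisy outcome
# and on an outcome with half the standard deviation.
noisy = patients_per_arm(sd=2.0, effect=1.0)
quiet = patients_per_arm(sd=1.0, effect=1.0)
print(noisy, quiet)
```

Under these assumptions, halving the standard deviation cuts the required sample roughly fourfold. The saving is real only if the surrogate is genuinely less noisy than the clinical measure and carries the same signal.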
Imaging has the potential to provide various surrogate outcome markers for acute ischemic stroke trials. Evidence of ischemic tissue injury is easy to see and appears simple to measure on MR diffusion imaging performed very early after onset; on fluid-attenuated inversion recovery at subacute times; and on T2-weighted imaging at later times. Recent thinking proposes that the amount of potentially salvageable penumbral tissue may be estimated from the difference between the perfusion and diffusion imaging lesions at presentation, and the extent of reperfused tissue at follow-up from the reduction in the extent of the perfusion lesion. Thus, the size of the initial ischemic lesion at presentation, the amount of growth by later time points, and the proportion of patients with, and the extent of, any penumbra (by cross-sectional area or volume) could all be determined in patients allocated to placebo and compared with patients allocated to active treatment.
However, this presupposes that all patients will have the required imaging outcome and that there is no interaction between survival (ie, being able to obtain the imaging outcome variable of interest, eg, final infarct size) and the treatment under evaluation. It depends on there being no interaction between the 2 main variables of interest (like final infarct volume and treatment allocation) and any other factor such as initial infarct size. It also requires that the measurement error for key imaging outcome measures be small, and smaller than that of clinical outcome measures, and, finally, that each imaging variable has been standardized. The measurement error for diffusion lesions is not trivial,1–3 and may actually inflate the sample size.2,3 The use of thresholds may improve reproducibility but not validity.1 The choice of perfusion metric or threshold still requires standardization4 because different perfusion metrics yield substantially different estimates of lesion volume and hence differing proportions of patients with penumbra.5 All that aside, is there evidence that imaging surrogate outcomes could be useful?
The Echoplanar Imaging Thrombolytic Evaluation Trial (EPITHET) aimed to determine whether the presence and extent of the ischemic penumbra in acute stroke, assessed using diffusion and perfusion imaging, would identify the patients most likely to respond to thrombolytic therapy.6 Using the EPITHET definition, 86% of patients had penumbra. The 90-day MR scan was required to assess final infarct extent, but because approximately 30% of patients did not have the 90-day scan, the missing scan results were imputed from the “last value carried forward.” The primary analysis showed no difference in geometric mean infarct growth between baseline diffusion-weighted imaging and the 90-day final T2-weighted imaging lesion (exponential of the mean log of relative growth, 1.24 with alteplase and 1.78 with placebo, Student t test, P=0.239) or in the median relative infarct growth (1.18 with alteplase, 1.79 with placebo, Wilcoxon’s test, P=0.054), or in functional outcome between patients allocated to alteplase and patients allocated to placebo. However, reperfusion was more common with alteplase than with placebo and was associated with less infarct growth (P=0.001), better neurological outcome (P<0.0001), and better functional outcome (P=0.010) than no reperfusion.6 There were too few patients without penumbral tissue to determine whether the effect of alteplase was materially different in patients who had, versus who did not have, penumbra.
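For readers unfamiliar with the summary statistics quoted above, the following sketch shows how a geometric mean of relative infarct growth (the exponential of the mean log ratio) and a "last value carried forward" imputation might be computed. It is not the EPITHET analysis code: the function names are hypothetical, the three patients are invented for illustration, and the exact imputation rule used in the trial is assumed rather than known.

```python
import math
from statistics import median


def last_value_carried_forward(volumes):
    """Return the most recent non-missing volume (None marks a missing scan)."""
    observed = [v for v in volumes if v is not None]
    if not observed:
        raise ValueError("no scan available for this patient")
    return observed[-1]


def geometric_mean(ratios):
    """Exponential of the mean log of the ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Invented lesion volumes in mL: (baseline, day 3 to 5, day 90).
patients = [
    (10.0, 15.0, 20.0),  # all scans present
    (20.0, 20.0, None),  # 90-day scan missing, so the day 3 to 5 value is used
    (8.0, 4.0, 2.0),     # lesion shrank
]

# Relative growth = final volume / baseline volume, after imputation.
relative_growth = [
    last_value_carried_forward([day3_5, day90]) / baseline
    for baseline, day3_5, day90 in patients
]

print(geometric_mean(relative_growth), median(relative_growth))
```

Note that a patient who dies before any follow-up scan has no value to carry forward, or has only the baseline value, which would be recorded as no growth at all. Either way, the imputation cannot represent the clinical outcome of that patient.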
In this edition of Stroke, the EPITHET authors further explore whether an earlier time point than 90 days could be used to assess the final infarct because this would help to overcome the problem of increasing numbers of missing scans at later times. Of 101 patients originally included in EPITHET, 91 were scanned at Days 3 to 5 and only 72 at 90 days.7 The authors found little difference in lesion size between 3- to 5-day and 90-day scans, and similar correlations among acute, 3- to 5-day, and 90-day lesion volumes and the 90-day National Institutes of Health Stroke Scale score. They also found that alteplase reduced infarct growth (change in infarct volume from baseline to 3- to 5-day diffusion-weighted imaging) at 3 to 5 days (P=0.03), in contrast to the negative results of the primary analysis when 90-day final infarct volume was used.6 They suggest that sample size for future stroke treatment trials could be reduced by using growth in MR diffusion lesion volume between baseline and 3 to 5 days as the primary outcome, rather than waiting to scan at 90 days and risking loss of outcome data through missing scans. Removing the requirement for 90-day assessment would also reduce costs and speed up trial completion. So what are the bear traps?
One danger is that by focusing on imaging surrogate outcomes, to which only patients who survive and can be scanned can contribute, important adverse clinical effects may be missed. For example, in the case of alteplase in EPITHET, almost twice as many patients who received alteplase did not have the 90-day scan (18 alteplase versus 10 placebo) and this was mostly because they had died (13 alteplase versus 7 placebo). Even at 3 to 5 days, the majority of the 10 patients who were not scanned were from the alteplase group and were unable to have the scan because they had died (6 alteplase versus one placebo). Thus, although the imaging surrogate picked up a positive signal for alteplase (less infarct growth by 3 to 5 days), it completely missed the negative clinical signal of increased death. Perhaps we need imaging surrogates for adverse effects, not just for the beneficial effects. One would hope that if in the future, new drugs for stroke are tested in trials with an imaging surrogate, this would be quickly followed by trials based on relevant clinical outcomes that were large enough to identify important benefits and exclude major hazards. Perhaps use of an imaging surrogate would be less likely to risk missing an important adverse effect in the case of drugs with few hazards, but drugs with few side effects tend to have less striking beneficial effects as well and therefore may be of limited value in acute ischemic stroke.
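The imbalance in missing scans can be tallied directly from the counts quoted above. The short sketch below uses only those counts; the variable and function names are illustrative, and no denominators (patients per arm) are assumed because they are not given here.

```python
# Counts quoted in this editorial.
missing_90_day_scan = {"alteplase": 18, "placebo": 10}
died_before_90_day_scan = {"alteplase": 13, "placebo": 7}
died_before_day_3_to_5_scan = {"alteplase": 6, "placebo": 1}


def alteplase_to_placebo_ratio(counts):
    """How many times more often the event occurred in the alteplase arm."""
    return counts["alteplase"] / counts["placebo"]


def share_explained_by_death(died, missing):
    """Fraction of missing scans in each arm that were missing because of death."""
    return {arm: died[arm] / missing[arm] for arm in missing}


missing_ratio = alteplase_to_placebo_ratio(missing_90_day_scan)
death_share = share_explained_by_death(died_before_90_day_scan, missing_90_day_scan)
early_death_ratio = alteplase_to_placebo_ratio(died_before_day_3_to_5_scan)

print(missing_ratio, death_share, early_death_ratio)
```

In both arms, most of the missing 90-day scans were missing because the patient had died, so the scans are not missing at random with respect to outcome, and the imbalance in deaths is already present by days 3 to 5.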
Are there other interactions that need to be considered regarding use of infarct growth or infarct volume as an imaging surrogate? The imbalance in lesion volume at baseline in EPITHET is a problem, although it was not conventionally significant. Increase in infarct volume is not simply due to an increase in the extent of tissue involved, but is also due to swelling of the affected tissue. The degree of swelling is variable, but all acute ischemic lesions swell to some extent.8 Swelling is maximal at approximately 3 to 5 days.9 Simply measuring volume does not differentiate an increase in infarct size due to a true increase in extent from one due to swelling without any increase in extent. The major determinant of infarct swelling is initial infarct size: large infarcts swell more than small ones.9 Therefore, any imbalance in lesion size at baseline between treatment groups, even one that is not conventionally significant, could introduce a systematic bias between treatment groups that is difficult to correct even with appropriate statistical adjustment. The group with larger baseline lesions will have more swelling and therefore apparently larger 3- to 5-day lesion volumes without there necessarily having been any increase in the actual lesion extent. On a similar note, lesions that are smaller to start with are likely to be smaller at 3 to 5 days simply because they would have to grow further than larger lesions to reach the same size.
In EPITHET, the baseline lesions were approximately 43% larger in the placebo group (20.51 mL) than in the alteplase group (14.37 mL, P=nonsignificant), equivalent to spherical diameters of 3.4 cm and 3.0 cm, respectively. This difference might not seem like much, and might be quite difficult to detect by eye, but remember that the majority of the volume of a lesion is in its outer layer. A small proportionate increase in diameter (eg, from 9 to 10 cm [11%]) equates to a much larger proportionate increase in volume (in this example, from 382 to 524 mL [37%]). Thus, the lesions in the EPITHET placebo group at 3 to 5 days would be expected to be larger than those in the alteplase group because, being larger to start with, they were going to swell more, regardless of whether they actually grew in extent. The greater swelling would proportionately increase their volume even further.
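The diameter and volume figures in this paragraph can be checked with the formula for the volume of a sphere, V = πd³/6. The sketch below assumes, purely for illustration, that each lesion is a perfect sphere; real infarcts are irregular, and the function names are hypothetical.

```python
import math


def sphere_volume_ml(diameter_cm):
    """Volume in mL (cubic cm) of a sphere with the given diameter in cm."""
    return math.pi * diameter_cm ** 3 / 6


def sphere_diameter_cm(volume_ml):
    """Diameter in cm of a sphere with the given volume in mL."""
    return (6 * volume_ml / math.pi) ** (1 / 3)


# Baseline lesion volumes reported for the two EPITHET groups.
placebo_diameter = sphere_diameter_cm(20.51)
alteplase_diameter = sphere_diameter_cm(14.37)

# The worked example: a sphere growing from 9 cm to 10 cm in diameter.
volume_at_9_cm = sphere_volume_ml(9)
volume_at_10_cm = sphere_volume_ml(10)
diameter_increase = 10 / 9 - 1                          # about 11%
volume_increase = volume_at_10_cm / volume_at_9_cm - 1  # (10/9)^3 - 1, about 37%

print(round(placebo_diameter, 1), round(alteplase_diameter, 1))
print(round(volume_at_9_cm), round(volume_at_10_cm))
print(round(100 * diameter_increase), round(100 * volume_increase))
```

Because volume scales with the cube of the diameter, a proportionate change in diameter is roughly tripled when expressed as a proportionate change in volume, for small changes.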
Are imaging surrogates likely to be better outcome predictors than clinical variables? The nonlinear association between the National Institutes of Health Stroke Scale and functional outcome (very steep in mild stroke, much less steep in severe stroke)10 explains why, in the case of diffusion imaging lesions, there is a weak independent association between lesion volume and functional outcome in severe strokes, but not in moderate to mild strokes in which the association is swamped by the overwhelmingly powerful association between National Institutes of Health Stroke Scale and functional outcome.11,12 Therefore, how close diffusion lesion volume comes to being a true surrogate outcome measure for functional outcome will depend on the case mix of patients included in the trial.
Thrombolysis increases the rate of reperfusion and reperfusion is highly beneficial,13 whether spontaneous (as occurs in approximately 20% of patients by 24 hours) or therapeutically induced.14 For a given size of infarct, reperfusion within the first 24 hours is also associated with less infarct swelling at 3 to 5 days than if there had been no reperfusion,9 a further reason why alteplase-allocated patients would have smaller infarcts at 3 to 5 days. Perhaps, in a situation such as this in which the main mechanism of action of the drug is known, measuring reperfusion would be a more direct surrogate for alteplase. Infarct volume has been widely used as the primary outcome measure in experimental studies of new stroke treatments, yet so many new pharmaceuticals coming through this route have failed to realize their promise in clinical trials, often because an unexpected side effect emerged. Surrogate outcomes may be useful if used cautiously and judiciously, but if so, we must not forget to look “outside the box” to see what else the treatment under assessment might be doing.
The opinions in this editorial are not necessarily those of the editors or of the American Heart Association.
Ay H, Arsava EM, Vangel M, Oner B, Zhu M, Wu O, Singhal A, Koroshetz WJ, Sorensen AG. Interexaminer difference in infarct volume measurements on MRI: a source of variance in stroke research. Stroke. 2008; 39: 1171–1176.
Rana AK, Wardlaw JM, Armitage PA, Bastin ME. Apparent diffusion coefficient (ADC) measurements may be more reliable and reproducible than lesion volume on diffusion-weighted images from patients with acute ischaemic stroke—implications for study design. Magn Reson Imaging. 2003; 21: 617–624.
Wintermark M, Albers GW, Alexandrov AV, Alger JR, Bammer R, Baron JC, Davis S, Demaerschalk BM, Derdeyn CP, Donnan GA, Eastwood JD, Fiebach JB, Fisher M, Furie KL, Goldmakher GV, Hacke W, Kidwell CS, Kloska SP, Köhrmann M, Koroshetz W, Lee TY, Lees KR, Lev MH, Liebeskind DS, Ostergaard L, Powers WJ, Provenzale J, Schellinger P, Silbergleit R, Sorensen AG, Wardlaw J, Wu O, Warach S. Acute stroke imaging research roadmap. Stroke. 2008; 39: 1621–1628.
Kane I, Carpenter T, Chappell F, Rivers C, Armitage P, Sandercock P, Wardlaw J. Comparison of 10 different magnetic resonance perfusion imaging processing methods in acute ischemic stroke. Effect on lesion size, proportion of patients with diffusion/perfusion mismatch, clinical scores, and radiologic outcomes. Stroke. 2007; 38: 3158–3164.
Davis SM, Donnan G, Parsons MW, Levi C, Butcher KS, Peeters A, Barber PA, Bladin C, De Silva DA, Byrnes G, Chalk JB, Fink JN, Kimber TE, Schultz D, Hand PJ, Frayne J, Hankey G, Muir K, Gerraty R, Tress BM, Desmond PM; EPITHET investigators. Effects of alteplase beyond 3 h after stroke in the Echoplanar Imaging Thrombolytic Evaluation Trial (EPITHET): a placebo-controlled randomised trial. Lancet Neurol. 2008; 7: 299–309.
Ebinger M, Christensen S, De Silva DA, Parsons MW, Levi CR, Butcher KS, Bladin CF, Barber PA, Donnan GA, Davis SM; for the EPITHET Investigators. Expediting MRI-based proof-of-concept stroke trials using an earlier imaging end point. Stroke. 2009; 40: 1353–1358.
Adams HP, Davis PH, Leira EC, Chang KC, Bendixen BH, Clarke WR, Woolson RF, Hansen MD. Baseline NIH Stroke Scale score strongly predicts outcome after stroke. A report of the Trial of Org 10172 in Acute Stroke Treatment (TOAST). Neurology. 1999; 53: 126–131.
Hand PJ, Wardlaw JM, Rivers CS, Armitage PA, Bastin ME, Lindley RI, Dennis MS. MR diffusion-weighted imaging and outcome prediction after ischemic stroke. Neurology. 2006; 66: 1159–1163.
Johnston KC, Wagner DP, Wang XQ, Newman GC, Thijs V, Sen S, Warach S; GAIN, Citicoline, and ASAP Investigators. Validation of an acute ischemic stroke model. Does diffusion-weighted imaging lesion volume offer a clinically significant improvement in prediction of outcome? Stroke. 2007; 38: 1820–1825.
Wardlaw JM, Murray V, Sandercock PAG. Thrombolysis for acute ischaemic stroke. An update of the Cochrane thrombolysis meta-analysis. Int J Stroke. 2008; 3 (suppl 1): 50.