Heterogeneity of Stroke Pathophysiology and Neuroprotective Clinical Trial Design
Background and Purpose— Tissue substrates for action of neuroprotective agents may be absent in a significant proportion of strokes. Pathophysiological heterogeneity is a possible contributor to negative neuroprotective trials.
Methods— Stroke subtypes and their individual outcomes in neuroprotective trial control populations were used to derive models incorporating accuracy of clinical classification and probability of an ischemic penumbra. With the use of treatment effect sizes from successful trials (predominantly of reperfusion therapies), sample sizes for neuroprotective trials were calculated. The potential influence of altered recruitment strategies was explored.
Results— The proportion of informative patients in 2 large neuroprotective trials was probably only 27% to 30%. Optimistically, this proportion may be 50%; pessimistically, it may be only 17%. These figures necessitate a sample size of 3700 to 4500 subjects per group; at best, 1800 to 2200 are needed per group with optimistic assumptions about treatment effect. Strategies to enhance the proportion with tissue substrate for neuroprotection could reduce sample size to 500 per group and simultaneously reduce the total number of patients screened compared with inclusive trials.
Conclusions— Population heterogeneity alone may be sufficient to explain negative neuroprotective trials: even in the largest trials to date, sample size was inadequate to detect effect sizes equivalent to those seen with thrombolysis, and the trials may have been severely underpowered. Reliable trials with inclusive entry criteria may be too large to be commercially feasible for novel compounds. Both sample size and total number of patients needing to be screened should be reduced by restricting entry to patients more likely to have a tissue target.
Clinical trials of neuroprotective therapies have to date been uniformly negative. The failure of trials to confirm clinical benefits implied by infarct volume reduction in animal models may result from pharmacological factors (irrelevance of the pharmacological target to human ischemia), clinical pharmacological factors (insufficient dose, inadequate treatment duration, or unfavorable pharmacokinetics), or inadequacies of trial design. Deficiencies of pharmacology and clinical pharmacology in neuroprotective drug development have undoubtedly contributed to the lack of trial success, and possible remedies have been considered.1 Potential problems with the statistical power of neuroprotective trials have been identified,2,3 but the impact of stroke heterogeneity on trial design has not been assessed.
Pathophysiological heterogeneity is of particular relevance to neuroprotection. There are 2 principal groups of patients in whom neuroprotective therapies are unlikely to be effective: (1) those lacking a biological substrate relevant to the mode of action of the drug—eg, N-methyl-d-aspartate antagonists influence neuronal cell body survival but probably have no effect on white-matter injury,4 and excess glutamate release may play a role in cortical infarction but not lacunar stroke5—and (2) those lacking an ischemic penumbra, which seems likely to be of restricted volume in most patients but with wide interindividual variation6 and is probably absent in some stroke types such as intracerebral hemorrhage7 and lacunes. It is possible that a third group, those who do not reperfuse, also lacks a biological substrate for neuroprotection. By concentrating on early histological outcome, animal studies have left uncertainty over whether many neuroprotective drugs effect lasting reduction in infarct volume, especially in the absence of reperfusion, or simply prolong viability until delayed reperfusion occurs.
This article explores the potential influence of the inclusion of patients lacking a tissue target for neuroprotective drugs in clinical trials, with implications for trial design.
Models exploring the influence of stroke heterogeneity on sample size requirements and the magnitude of treatment effects were developed through the use of data derived from published trials. The possible impact of changed recruitment strategies was also explored. Assumptions are detailed below.
Imaging data do not support the existence of a penumbra around a primary intracerebral hematoma (PICH).7 There is also no biological target for most neuroprotective agents in lacunar strokes, which are characterized by white-matter ischemia, result mostly from end-artery disease, and by definition have no collateral flow for either penumbral conditions or drug delivery. It is therefore assumed that patients with intracerebral hematoma or lacunar strokes are not amenable to neuroprotection.
Imaging studies emphasize the limited sensitivity of clinical diagnosis in the acute phase of stroke, particularly in distinguishing lacunar from partial middle cerebral artery (MCA) syndromes.8–10 Models therefore assume that a proportion of clinically diagnosed lacunar syndromes will in fact result from partial MCA occlusion and vice versa. Similar misclassification of large MCA syndromes that are subsequently found to be restricted is assumed on the basis of existing data.9
These assumptions were applied to different stroke subtypes in 3 models called realistic, optimistic, and pessimistic (Table 1). Diffusion-perfusion mismatch on MRI (DWI-PWI mismatch) was assumed to correspond to a penumbra for practical purposes and is documented in ≈70% of patients with MCA occlusion and 30% of patients with patent MCAs within 6 hours of onset.11 These rates have been applied to the clinical syndromes of complete MCA infarction (total anterior circulation syndromes [TACS] by the Oxfordshire Community Stroke Project [OCSP] classification12) and partial MCA infarction (partial anterior circulation syndromes [PACS] by OCSP).
Estimates of effect size expressed as relative risk reduction (RRR) and 95% confidence intervals (95% CIs) were derived from 3 positive stroke trials published to date: the National Institute of Neurological Disorders and Stroke Recombinant Tissue Plasminogen Activator (NINDS rtPA) trial,13 Prolyse in Acute Cerebral Thromboembolism II (PROACT II) trial,14 and Stroke Treatment With Ancrod Trial (STAT).15 These are detailed in Table 2. Data from NINDS part 2 only were usable for the dichotomous outcome using a Barthel Index (BI) <55/100 to signify poor outcome because part 1 data for this end point were not published. Therefore, CIs for this end point are probably broader in this model than is truly the case. Neuroprotective trials have generally defined death and disability as BI <60/100, whereas the 3 positive trials have defined poor outcome as BI <95/100. Data for both definitions were examined. Effect sizes for pooled data appear to be consistent, with an RRR of 12% (95% CI, 10 to 16) for BI <55 to 65 and 15% (95% CI, 12 to 19) for BI <95. In case of a detrimental influence of the smaller effect size in STAT, pooled effect sizes were also calculated for the 2 thrombolysis trials: these were an RRR of 9% (95% CI, 6 to 12) for BI <55 to 65 and 18% (95% CI, 14 to 24) for BI <95. An RRR of 10% to 20% therefore seems most plausible.
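As a minimal sketch of how an RRR and its 95% CI can be obtained from the event counts of a single trial, the following uses the standard log relative-risk approximation. The function name and the counts in the usage line are hypothetical, and this is not necessarily the pooling method used to produce the figures in Table 2.

```python
from math import exp, log, sqrt


def rrr_with_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Relative risk reduction with an approximate 95% CI.

    Uses the usual standard error of log(relative risk); assumes
    non-zero event counts in both arms.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    se_log_rr = sqrt(1 / events_treated - 1 / n_treated
                     + 1 / events_control - 1 / n_control)
    rr_low = exp(log(rr) - z * se_log_rr)
    rr_high = exp(log(rr) + z * se_log_rr)
    # RRR = 1 - RR, so the upper RR bound gives the lower RRR bound.
    return 1 - rr, 1 - rr_high, 1 - rr_low


# Hypothetical counts: 40/100 poor outcomes on treatment, 50/100 on placebo.
rrr, rrr_low, rrr_high = rrr_with_ci(40, 100, 50, 100)
print(f"RRR {rrr:.2f} (95% CI {rrr_low:.2f} to {rrr_high:.2f})")
```

With counts this small the interval is wide, which is one reason the effect sizes were estimated from pooled trial data.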
Sample size calculations assumed 90% power and 2-tailed significance at P=0.05.
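The text does not state which formula was used for these calculations. A minimal sketch, assuming the common normal-approximation formula for comparing two proportions, is given below; the function name and the control event rate of 0.5 are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(control_rate, rrr, power=0.90, alpha=0.05):
    """Patients per arm needed to detect a given RRR in a poor-outcome rate.

    Two-sided test at level alpha, equal allocation, normal approximation.
    """
    treated_rate = control_rate * (1.0 - rrr)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (control_rate + treated_rate) / 2.0
    numerator = (
        z_alpha * sqrt(2.0 * pooled * (1.0 - pooled))
        + z_beta * sqrt(control_rate * (1.0 - control_rate)
                        + treated_rate * (1.0 - treated_rate))
    ) ** 2
    return ceil(numerator / (control_rate - treated_rate) ** 2)


# Hypothetical control event rate of 0.5 across the plausible RRR range.
for rrr in (0.10, 0.15, 0.20):
    print(rrr, sample_size_per_group(0.5, rrr))
```

Because the required size scales roughly with the inverse square of the absolute difference in event rates, halving the RRR roughly quadruples the number of patients needed.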
Control Event Rates
Combined event rates in the control groups are given in Table 2. Event rates in stroke subtypes were derived from those in the Clomethiazole Acute Stroke Study (CLASS).8 Event rates in patients with confirmed proximal MCA occlusion were derived from the PROACT II trial.14
Alternative Recruitment Strategies
The effect of 4 different strategies was explored: (1) eliminating PICH by pretreatment CT, (2) restricting recruitment to complete MCA syndromes (TACS), (3) restricting recruitment to patients with DWI-PWI mismatch, and (4) restricting recruitment to patients with MCA occlusion confirmed on imaging. Strategy 1 removes 10% to 20% of patients from 6-hour window trials (Figure 1). Strategy 2 restricts recruitment to about one third of current trial populations, but up to 30% of patients may be misclassified and a further 23% may spontaneously recanalize.16 Strategy 3 restricts recruitment to 20% to 56% of patients with anterior circulation stroke confirmed to involve the MCA territory,11,16 probably 70% of those clinically diagnosed as such.16 Strategy 4 may restrict recruitment to 5% of patients if conventional angiography forms the basis for diagnosis but may include 40% to 60% of anterior circulation strokes (about two thirds of current trial populations) if alternative techniques such as MR angiography,16,17 CT angiography, and transcranial Doppler ultrasound are included.18 MCA proximal occlusion correlates closely with the presence of DWI-PWI mismatch and lesion volume expansion.11,17,19,20 Alternative recruitment strategies were explored through the use of the realistic model estimates for tissue targets and event rates modified to correspond to the scenarios described. The sample size for different RRRs was calculated for each, as was the proportion of screened patients likely to be eligible for each scenario using the study flow data from PROACT II (Figure 2).
Two large neuroprotective clinical trial populations were considered in each of the model situations: those in CLASS and in the Glycine Antagonist in Neuroprotection–International (GAIN-I) trial.21 The GAIN trial reported proportions of patients with intracerebral hemorrhage (18.6%) and lacunar stroke (18%) but did not specify cortical stroke subtypes by OCSP; therefore, 30% have been assumed to be PACS and 33% to be TACS.
The CLASS and GAIN-I trial populations are described in Table 3. In each trial, the proportion of potentially uninformative patients (those with no tissue target) based on the model assumptions is given. For optimistic, realistic, and pessimistic models, the proportions were 0.46, 0.36, and 0.21, respectively, in CLASS and 0.41, 0.32, and 0.19, respectively, in GAIN-I. The sample size estimates for different effect sizes are given in Table 4. The total number of patients enrolled in CLASS was 1360; the 2 GAIN trials enrolled 3391. The models indicate that GAIN at best may have been only 77% of the necessary size and CLASS at best 38%; at worst, these trials may have been only 4% and 2% of necessary size, respectively, and for the realistic model and RRR of 15%, still only 22% and 11%, respectively.
The influence of the proportion of informative patients in a trial on sample size is shown in Figure 3, with calculations for different effect sizes.
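The mechanism behind this relationship can be sketched as a simple dilution calculation: if only a fraction of patients can respond, the treatment effect observed across the whole trial shrinks accordingly. The function name and all input values below are hypothetical, and the sketch assumes a single event rate for each of the informative and uninformative groups rather than the subtype-specific rates used in the models.

```python
def diluted_effect(informative_fraction, rate_informative,
                   rate_uninformative, rrr_in_informative):
    """Trial-wide event rates when treatment only helps informative patients.

    Returns (control_rate, treated_rate, observed_rrr) for the whole trial.
    """
    f = informative_fraction
    control_rate = f * rate_informative + (1.0 - f) * rate_uninformative
    treated_rate = (f * rate_informative * (1.0 - rrr_in_informative)
                    + (1.0 - f) * rate_uninformative)
    observed_rrr = 1.0 - treated_rate / control_rate
    return control_rate, treated_rate, observed_rrr


# Hypothetical inputs: 30% informative, equal event rates of 0.5,
# and a true RRR of 20% among informative patients.
control, treated, observed = diluted_effect(0.30, 0.5, 0.5, 0.20)
print(f"observed RRR across the whole trial: {observed:.3f}")
```

When event rates are equal in both groups, the observed RRR is simply the true RRR multiplied by the informative fraction, so the required sample size grows roughly as the inverse square of that fraction.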
Alternative entry criteria limit eligibility and restrict recruitment to varying proportions of those screened (Table 5). Judging from PROACT II screening figures, up to 30% of patients may be eligible for a 6-hour window neuroprotective trial, but restriction to patients with DWI-PWI mismatch on MRI or MCA occlusion reduces this to 5% to 7% (Figure 2). The 30% figure represents a very optimistic result in the experience of most stroke centers. The number of patients needing to be screened for each patient eligible is therefore 3 for current inclusive entry criteria and increases to 15 to 18 for screening on the basis of imaging correlates of a penumbra. However, because the proportion of informative patients is higher for imaging-based trials, fewer patients must be screened for this type of trial than for conventional trials: ≈12 600 using conventional inclusion criteria and 20% RRR compared with 5100 for DWI-PWI mismatch.
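The trade-off between eligibility and sample size can be sketched with simple arithmetic: the total number screened is the total number randomized divided by the eligible fraction. The function name and the per-group sizes below are hypothetical illustrations, not values taken from Table 5.

```python
from math import ceil


def patients_to_screen(n_per_group, eligible_fraction, arms=2):
    """Total patients screened to randomize `arms` groups of n_per_group."""
    return ceil(arms * n_per_group / eligible_fraction)


# Hypothetical inclusive trial: large groups, 30% of screened patients eligible.
inclusive = patients_to_screen(n_per_group=1890, eligible_fraction=0.30)

# Hypothetical imaging-selected trial: small groups, 6% eligible.
restricted = patients_to_screen(n_per_group=150, eligible_fraction=0.06)

print(inclusive, restricted)
```

Although the restricted design screens far more patients per eligible subject, the much smaller randomized groups can leave the total screening burden lower.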
When the proportions of patients with BI <95 were used to define poor outcome (using event rates published for CLASS PICH and TACS patients and extrapolations for LACS [35% poor outcome] and PACS [60%]), there was a reduction in sample size requirements of ≈40%. Further exploration of the possible influence of different end-point dichotomies is shown in Figure 4. Because outcome event rates are similar for different end points (for example, in the placebo group in GAIN-I, 34% had BI >90, 28% had modified Rankin Scale score <2, and 26% had National Institutes of Health Stroke Scale score <2), there were no substantial changes in the results for different scales.
Inclusion of patients not affected by a treatment may have important implications for the outcome of clinical trials,22,23 influencing estimates of benefit and potentially rendering trials negative. It has been argued that stroke trials to date have suffered from overoptimism, with assumptions of large effect sizes leading to an all-inclusive approach in phase III trials that ignores fundamental differences in stroke mechanism.14,24 The PROACT II trial results14 supported the view that smaller trials in clearly defined and homogeneous stroke pathologies can yield answers that may be obscured by dilution in larger groups of patients who are not as well characterized.
Most neuroprotective trials to date have been powered to detect effects on the order of an absolute risk reduction of 10%. This strategy reflected a combination of anticipation that large reductions in histological infarct volume in animal models would translate into large functional effects in humans, extrapolation from small phase II trials that risk biasing effect size estimates upwards,2 and limited recruitment rates resulting from the inherent structural inadequacies of stroke care organization, as well as commercial considerations. The only data that permit estimation of the likely effect size of an efficacious treatment are derived from reperfusion therapies: the NINDS rtPA trial, PROACT II, and STAT. Event rates in the control groups of these trials are comparable to those in large neuroprotective studies, and the entry criteria are comparable, except for excluding hemorrhage by CT. Realistic estimates of potential treatment effect indicate an RRR of 10% to 20% for detrimental outcome (death and dependence), with an upper 95% CI limit of 24% for the 2 thrombolytic trials.
Assuming neuroprotection to have a more restricted target population than reperfusion therapies for the reasons outlined above, the sample size estimates derived from the models in this study indicate a requirement for ≈4000 patients per treatment arm to detect even large treatment effects (RRR of 20%) using current trial entry criteria and a typical trial end point. Even optimistic assumptions about treatment effect, accuracy of initial diagnosis, and penumbra yield a sample size of 1800 to 2200 per group. To detect a more modest treatment effect (RRR of 10%), the minimum sample size exceeds 5000 per group; more patients may need to be recruited than have taken part in all trials ever undertaken of calcium antagonists (≈7500) or glutamate antagonists (≈11 000). At best, even the GAIN trial program may have recruited only three quarters of the necessary subjects. Negative results from trials to date may thus potentially be explained entirely on the basis of stroke heterogeneity.
There may be important statistical advantages in using a definition that increases the proportion of patients categorized as having poor outcome, with reduction of sample size of ≈40% if BI <95 is used as an end point (Figure 4). Although this implies that GAIN-I may have had adequate sample size to detect a large treatment effect (RRR of 20%) in the realistic model, it would still be only 75% of the required size for a more modest 15% RRR, and CLASS remains too small to detect anything other than a large effect with optimistic assumptions. It remains unclear whether full or nearly full recovery is an appropriate end point for a neuroprotective treatment as opposed to reperfusion therapy. Although this end point would be valid if the assumption of proportional odds reduction is correct (ie, patients move across all categories of outcome in the same proportion in response to treatment), this remains unproven for neuroprotective agents, and there are biological reasons for uncertainty.
The proportion of informative patients should be enhanced by restricting trial time windows further (perhaps to 3 hours), excluding PICH with CT, instituting mandatory clinical criteria such as minimum severity on stroke scales or the presence of cortical features, and using MRI signatures of a penumbra. All of these strategies restrict recruitment rates, but sample size may be reduced to 400 per treatment arm to show benefit using conventional outcome measures. Despite needing to screen ≈18 patients per eligible subject using MRI criteria (6 times as many as with conventional criteria), the total number needing to be screened for a trial should in fact be far fewer than for inclusive entry criteria. Costs may be increased through the need to screen more ineligible patients with expensive imaging and the likely need to expand the number of participating centers and lengthen study duration. Although there are advantages in conducting very large, inclusive trials with appropriate subgroup analyses rather than very restricted trials, a more restrictive trial model may be more cost effective in early drug development.
There are inevitable caveats about the results of this modeling exercise. First, the assumption of a lack of effect in hemorrhage and lacunar syndromes is based on limited data and may be overly pessimistic. Reperfusion appears beneficial in patients classified as having small-vessel disease despite similar mechanistic arguments against it, and ultimately there remains uncertainty about the pathological basis for many lacunar infarcts. All MRI data are based on very small numbers of patients and almost certainly reflect a biased sample. Second, the numbers of patients with MRI evidence of penumbra may be less than assumed. A further concern is the restricted volume of the penumbra in MRI and PET studies25,26; if anatomically very limited, then even major salvage of target tissue may be without clinically detectable (or relevant) benefit. Third, the clinical classification and scales used are subject to error, and the BI in particular has limited discriminatory utility as an outcome measure. The OCSP classification, although not designed for acute use, categorizes patients in a biologically relevant manner by distinguishing large from small cortical strokes and lacunes, something not possible from total stroke scale scores, for instance. There are no published data with alternative categories, but the same modeling exercise could, for example, be applied on the basis of imaging findings. With respect to outcome, almost all trials have used a responder analysis based on the proportions in favorable versus unfavorable outcome categories, and the results from this model using the BI can be extrapolated to any other dichotomous outcome measure, including handicap scales such as the modified Rankin Scale or imaging-based biomarkers (eg, proportion with expansion of MRI lesion volume27) with little change in the implications for sample size, although there are some advantages in choosing a smaller rather than larger outcome category, as shown in Figure 4.
Finally, effect sizes derived from reperfusion trials may be too conservative because the net RRR reflects an aggregate of very favorable outcomes in patients who reperfuse early and less favorable outcomes in those who do not or those who encounter hemorrhagic complications. Reperfusion trials may in fact have a proportion of uninformative patients similar to that of neuroprotective trials. However, most patients in the trials used to estimate effect size were randomized within 3 hours, and effect sizes are likely to be greater than achievable with neuroprotectives, typically delivered 4 to 6 hours after onset. There was no significant heterogeneity between stroke subtypes with respect to benefit. Most importantly, had the true effect size of any of half a dozen neuroprotective agents been as great as or greater than that evident in reperfusion trials, it should have been seen in the trials, many of which had sample sizes many times greater than NINDS. In the absence of any data from neuroprotective trials themselves, these effect size estimates represent the best information from which to extrapolate.
In conclusion, heterogeneity of stroke populations recruited to typical neuroprotective trials may reduce substantially the likelihood of showing efficacy, even if a neuroprotective agent were to have an effect size equivalent to thrombolytics. Logical and feasible strategies to limit the variability in the patient populations recruited may enhance the ability of future trials to demonstrate efficacy.
Dr Muir is supported by Chest, Heart and Stroke, Scotland.
- Received November 14, 2001.
- Revision received January 7, 2002.
- Accepted January 22, 2002.
- Stroke Therapy Academic Industry Roundtable (STAIR). Recommendations for standards regarding preclinical neuroprotective and restorative drug development. Stroke. 1999; 30: 2752–2758.
- Samsa GP, Matchar DB. Have randomized controlled trials of neuroprotective drugs been underpowered? An illustration of three statistical principles. Stroke. 2001; 32: 669–674.
- Dorman PJ, Sandercock PA. Considerations in the design of clinical trials of neuroprotective therapy in acute stroke. Stroke. 1996; 27: 1507–1515.
- Castillo J, Davalos A, Lema M, Serena J, Noya M. Glutamate is a marker for cerebral ischemia in cortical but not deep infarcts. Cerebrovasc Dis. 1997; 7: 245–250.
- Heiss WD, Grond M, Thiel A, von Stockhausen HM, Rudolf J, Ghaemi M, Lottgen J, Stenzel C, Pawlik G. Tissue at risk of infarction rescued by early reperfusion: a positron emission tomography study in systemic recombinant tissue plasminogen activator thrombolysis of acute stroke. J Cereb Blood Flow Metab. 1998; 18: 1298–1307.
- Hirano T, Read SJ, Abbott DF, Sachinidis JI, Tochon-Danguy HJ, Egan GF, Bladin CF, Scott AM, McKay WJ, Donnan GA. No evidence of hypoxic tissue on 18F-fluoromisonidazole PET after intracerebral hemorrhage. Neurology. 1999; 53: 2179–2182.
- Wahlgren NG, Ranasinha KW, Rosolacci T, Franke CL, van Erven PM, Ashwood T, Claesson L. Clomethiazole Acute Stroke Study (CLASS): results of a randomized, controlled trial of clomethiazole versus placebo in 1360 acute stroke patients. Stroke. 1999; 30: 21–28.
- Lee LJ, Kidwell CS, Alger J, Starkman S, Saver JL. Impact on stroke subtype diagnosis of early diffusion-weighted magnetic resonance imaging and magnetic resonance angiography. Stroke. 2000; 31: 1081–1089.
- Toni D, Iweins F, von Kummer R, Busse O, Bogousslavsky J, Falcou A, Lesaffre E, Lenzi GL. Identification of lacunar infarcts before thrombolysis in the ECASS I study. Neurology. 2000; 54: 684–688.
- Barber PA, Davis SM, Darby DG, Desmond PM, Gerraty RP, Yang Q, Jolley D, Donnan GA, Tress BM. Absent middle cerebral artery flow predicts the presence and evolution of the ischemic penumbra. Neurology. 1999; 52: 1125–1132.
- Furlan A, Higashida R, Wechsler L, Gent M, Rowley H, Kase C, Pessin M, Ahuja A, Callahan F, Clark WM, Silver F, Rivera F. Intra-arterial prourokinase for acute ischemic stroke: the PROACT II study: a randomized controlled trial: Prolyse in Acute Cerebral Thromboembolism. JAMA. 1999; 282: 2003–2011.
- Rordorf G, Koroshetz WJ, Copen WA, Cramer SC, Schaefer PW, Budzik RF, Jr, Schwamm LH, Buonanno F, Sorensen AG, Gonzalez G. Regional ischemia and ischemic injury in patients with acute middle cerebral artery stroke as defined by early diffusion-weighted and perfusion-weighted MRI. Stroke. 1998; 29: 939–943.
- Wildermuth S, Knauth M, Brandt T, Winter R, Sartor K, Hacke W. Role of CT angiography in patient selection for thrombolytic therapy in acute hemispheric stroke. Stroke. 1998; 29: 935–938.
- Barber PA, Darby DG, Desmond PM, Yang Q, Gerraty RP, Jolley D, Donnan GA, Tress BM, Davis SM. Prediction of stroke outcome with echoplanar perfusion- and diffusion-weighted MRI. Neurology. 1998; 51: 418–426.
- Lees KR, Asplund K, Carolei A, Davis SM, Diener HC, Kaste M, Orgogozo JM, Whitehead J. Glycine antagonist (gavestinel) in neuroprotection (GAIN International) in patients with acute stroke: a randomised controlled trial: GAIN International Investigators. Lancet. 2000; 355: 1949–1954.
- Barnett HJ, Taylor DW, Eliasziw M, Fox AJ, Ferguson GG, Haynes RB, Rankin RN, Clagett GP, Hachinski VC, Sackett DL, Thorpe KE, Meldrum HE. Benefit of carotid endarterectomy in patients with symptomatic moderate or severe stenosis: North American Symptomatic Carotid Endarterectomy Trial Collaborators. N Engl J Med. 1998; 339: 1415–1425.
- Albers GW. Choice of endpoints in antiplatelet trials: which outcomes are most relevant to stroke patients? Neurology. 2000; 54: 1022–1028.
- Muir KW, Grosset DG. Neuroprotection for acute stroke: making clinical trials work. Stroke. 1999; 30: 180–182.
- Heiss WD, Thiel A, Grond M, Graf R. Which targets are relevant for therapy of acute ischemic stroke? Stroke. 1999; 30: 1486–1489.
- Oppenheim C, Grandin C, Samson Y, Smith A, Duprez T, Marsault C, Cosnard G. Is there an apparent diffusion coefficient threshold in predicting tissue viability in hyperacute stroke? Stroke. 2001; 32: 2486–2491.