(Stroke. 2005;36:2410.)
© 2005 American Heart Association, Inc.
Original Contributions
From the Department of Biostatistics, Bioinformatics & Epidemiology (Y.Y.P., B.C.T., R.W.), Medical University of South Carolina, Charleston, SC; Trout Centre at Irish Lake (D.L.S.), Canada; and the Department of Neurology and Health Evaluation Sciences (K.C.J.), University of Virginia.
Correspondence to Yuko Y. Palesch, PhD, Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, P.O. Box 250835, 135 Cannon St, Suite 303, Charleston, SC 29425. E-mail paleschy{at}musc.edu
Abstract
Methods To provide examples of the application of phase II methodology, we obtained primary outcome data for the active treatment group of 6 phase III ischemic stroke therapy trials. For each study, we estimated the sample size required for a multistage single-arm study using parameters specified in the original study. We then evaluated outcome data for that number of subjects, taken in order of enrollment date from the phase III treatment arm. We compared the proportion of favorable outcomes with prespecified stopping criteria derived from a single-arm phase II futility design. If the observed proportion of favorable outcomes was less than the stopping criterion, we declared the treatment not sufficiently effective to warrant further evaluation in phase III.
Results We identified 3 trials as futile in phase II; none of the 3 showed treatment efficacy in phase III. Of the 3 remaining trials, in which phase II did not show futility, one showed efficacy in phase III.
Conclusion Single-arm phase II futility studies have been underused in stroke research, but they provide a strategy for discarding treatments that are likely to be ineffective in phase III trials.
Key Words: clinical trials, phase II; ischemia; stroke
Introduction
Neuroprotective agents that proved efficacious in animal models of ischemic stroke have been tested extensively in phase III randomized trials.1 Kidwell et al examined publications through December 1999 and found 88 trials testing neuroprotective agents plus 26 testing combinations of neuroprotective and rheologic/antithrombotic agents.2 In all of these trials, many of them large and costly, the experimental treatments were considered futile.
Futility studies have been useful in the phase II evaluation of cancer treatments.3–6 The proportion of positive outcomes in a single treated group is compared with the minimally worthwhile proportion of success expected by the drug's proponents. We applied this strategy to therapeutic agents for ischemic stroke to determine whether phase II futility studies could have prevented some of the futile, large, and costly phase III trials that dominate this field.
Materials and Methods
Let p* denote the reference proportion of favorable outcomes in untreated (control) patients and δ the improvement in that proportion hypothesized for the treatment, as used in designing a phase III study. We use δ as the "minimally worthwhile improvement" in the proportion of favorable outcomes for the futility study. Using this conceptual approach, we design a single-arm phase II study in which all patients receive the investigational drug. The proportion of favorable outcomes from this study is compared with the minimally worthwhile proportion, p* + δ. If the proportion is too low, we would consider it futile to proceed to a phase III trial.
In formal statistical terms, the hypotheses in a single-arm phase II futility study are as follows:
$$H_0\colon\ p_{tx} \ge p^* + \delta \qquad\text{versus}\qquad H_A\colon\ p_{tx} < p^* + \delta$$
where ptx is the hypothesized proportion of treated subjects with a favorable outcome, and p* and δ are as defined previously. If we reject the null hypothesis that a "minimally worthwhile" improvement exists, we conclude that the benefit of the new treatment is less than we would want and that it is futile to proceed to further testing in a phase III trial. If we fail to reject the null hypothesis, we conclude that there is insufficient evidence of futility and that the treatment deserves further testing in a phase III trial to determine its efficacy.
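To make the decision rule concrete, the following Python sketch runs a single-stage exact binomial version of this test. It assumes a fixed sample with no interim looks (unlike a multistage design), and the values of p*, δ, the sample size, and the observed count are hypothetical; the function names are ours, not from the original study.

```python
from math import comb
from typing import Tuple


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def futility_test(successes: int, n: int, p_star: float, delta: float,
                  alpha: float = 0.10) -> Tuple[float, bool]:
    """One-sided exact test of H0: p_tx >= p* + delta vs HA: p_tx < p* + delta.

    Returns (p_value, futile). Small counts of favorable outcomes are the
    evidence against H0, so the p-value is the lower binomial tail at p* + delta.
    """
    p0 = p_star + delta  # minimally worthwhile proportion under H0
    p_value = binom_cdf(successes, n, p0)
    return p_value, p_value <= alpha


# Hypothetical example: 4 favorable outcomes among 20 treated patients.
p_value, futile = futility_test(successes=4, n=20, p_star=0.30, delta=0.10)
print(f"p-value = {p_value:.3f}, declare futile: {futile}")
```

Rejecting H0 here means declaring futility; failing to reject means the treatment would move on to phase III.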
As highlighted in Table 1, because the futility hypotheses point in the opposite direction from those of a traditional phase III randomized trial, the type I (α) and type II (β) error probabilities take on different meanings. In a futility analysis, a type I error is a false-negative conclusion: a potentially effective agent is declared futile and missed. We therefore want to keep α small. However, a futility trial is still a "pilot" study, so we select a level of α that does not require too large a sample size. We are less concerned about a type II (β) error, the false-positive conclusion that an ineffective treatment may be effective, because treatments not found futile in phase II would be tested further in phase III trials with smaller error probabilities, at the expense of larger sample sizes.
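The reversed roles of the two error probabilities can be checked numerically. The sketch below uses a hypothetical fixed-sample design, not any design used in this study: it picks the largest cutoff c such that declaring futility when at most c of n treated patients have a favorable outcome keeps the type I error (calling an effective drug futile) at or below α, then reports the resulting type II error (carrying an ineffective drug forward) at ptx = p*. The values n = 122, p* = 0.30, and δ = 0.10 are illustrative assumptions.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p); returns 0 for x < 0."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def futility_cutoff(n: int, p_star: float, delta: float, alpha: float = 0.10) -> int:
    """Largest c with P(X <= c | p_tx = p* + delta) <= alpha (-1 if none)."""
    p0 = p_star + delta
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return c


def error_rates(n: int, c: int, p_star: float, delta: float):
    # Type I: effective drug (p_tx = p* + delta) wrongly declared futile.
    type_i = binom_cdf(c, n, p_star + delta)
    # Type II: ineffective drug (p_tx = p*) not declared futile, so it goes on to phase III.
    type_ii = 1 - binom_cdf(c, n, p_star)
    return type_i, type_ii


n, p_star, delta = 122, 0.30, 0.10  # hypothetical design values
c = futility_cutoff(n, p_star, delta)
type_i, type_ii = error_rates(n, c, p_star, delta)
print(f"declare futility if favorable <= {c}; type I = {type_i:.3f}, type II = {type_ii:.3f}")
```

Because the binomial is discrete, the attained type I error sits somewhat below the nominal α and the type II error somewhat above its target.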
As in phase III trials, we can conduct interim analyses in single-arm phase II futility studies. The same adjustments for "multiple looks" at the interim data apply, with the type I error probability adjusted accordingly. Several authors have developed multistage designs for phase II studies that allow early stopping for futility.5–8 O'Brien and Fleming's strategy for multiple testing, commonly used in phase III trials, is equally applicable in phase II studies.8 Their stopping boundary spends only a small portion of the overall type I error probability at early analyses, so we are less likely to declare futility prematurely. As a result, the nominal α used for the final test at the end of phase II is close to the overall α level. Interim analyses are particularly relevant to phase II trials in stroke because sample sizes tend to be larger than in most phase II trials in cancer.
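The "small portion early" behavior can be seen in the Lan-DeMets spending function that approximates the O'Brien-Fleming boundary. This Python sketch assumes that approximation (the boundaries actually generated by EAST in this study may differ slightly) and reports only the cumulative α spent at one third, two thirds, and all of the information; converting these into nominal critical values for each analysis requires a recursive multivariate normal calculation that is not shown.

```python
from statistics import NormalDist


def obf_spending(t: float, alpha: float) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending function.

    Cumulative type I error spent by information fraction t (0 < t <= 1)
    for a one-sided test at overall level alpha: 2 - 2*Phi(z_{1-alpha/2} / sqrt(t)).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t**0.5))


alpha = 0.10  # one-sided overall level, as in the futility studies described here
for t in (1 / 3, 2 / 3, 1.0):
    print(f"information {t:.2f}: cumulative alpha spent = {obf_spending(t, alpha):.4f}")
```

Almost none of the 0.10 is spent at the first look, which is why the final nominal α stays close to the overall level.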
Methods
To illustrate the use of the single-arm phase II futility study design using interim analyses, we obtained data provided by primary investigators from a convenience sample of completed phase III randomized ischemic stroke treatment trials. For illustrative purposes, we selected these studies to include some negative phase III studies, a positive phase III study, and a negative phase III study with marginal statistical significance. We had no prior knowledge of the conclusions that might be drawn from our simulated single-arm futility studies of these trials.
For each study example, Table 2 summarizes the projected and actual sample sizes, main inclusion criteria, favorable outcomes, type I and type II error probabilities, hypothesized effect size δ, and the phase III trial results. Only outcome data from the active treatment arms of the phase III trials were evaluated in our hypothetical phase II studies.
In our phase II futility studies, we chose a one-sided α of 0.10 because we wanted to keep the required sample sizes small and were willing to tolerate a 10% chance of rejecting an effective treatment that could produce an improvement of δ, the magnitude expected in the original phase III trial on which we based the phase II design. We were willing to accept a greater chance of carrying an ineffective treatment forward to phase III testing and set β to 0.15 at ptx = p*.
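As a rough check on the sample sizes that a design program produces, a single-stage normal approximation for the one-sample futility test gives n = [z(1−α)·√(p0(1−p0)) + z(1−β)·√(p*(1−p*))]² / δ², with p0 = p* + δ. The Python sketch below assumes this approximation; a 3-stage design with O'Brien-Fleming boundaries needs somewhat more patients, and the example values p* = 0.30 and δ = 0.10 are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def single_stage_n(p_star: float, delta: float,
                   alpha: float = 0.10, beta: float = 0.15) -> int:
    """Normal-approximation sample size for H0: p_tx >= p* + delta vs
    HA: p_tx < p* + delta, one-sided level alpha, power 1 - beta at p_tx = p*."""
    p0, p1 = p_star + delta, p_star
    z_a = NormalDist().inv_cdf(1 - alpha)  # critical value under H0
    z_b = NormalDist().inv_cdf(1 - beta)   # quantile for the desired power
    n = (z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))) ** 2 / delta**2
    return ceil(n)


print(single_stage_n(p_star=0.30, delta=0.10))
```

Halving δ roughly quadruples n, which is why the choice of δ dominates the size of a futility study.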
For each simulated single-arm phase II futility study, we used EAST 3 software (Cytel Software Corp.) to estimate the required sample size and to generate the stopping criteria, that is, the thresholds required for taking the treatment to a phase III trial. We analyzed the data for futility after one third, two thirds, and all required patients had been enrolled, using the O'Brien and Fleming8 boundaries.
In each futility study, we listed the outcomes for treated patients in the corresponding phase III trial in chronologic order until we achieved our required phase II sample size. We calculated the cumulative proportion of favorable outcomes at each of the 3 analysis stages. We compared the proportion of favorable outcomes at each stage to the proportion of favorable outcomes corresponding to the threshold for each stage. If the observed proportion of favorable outcomes was less than or equal to the threshold, we rejected the null hypothesis and concluded further testing of the treatment was futile.
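The stagewise comparison just described can be sketched as a small function. In the study the thresholds came from EAST; the thresholds, analysis sizes, and outcome sequence in this Python sketch are hypothetical and only show the bookkeeping: cumulative proportions over patients in enrollment order, compared at each look with that look's threshold, stopping at the first look where the proportion is at or below it.

```python
from typing import Optional, Sequence


def monitor_futility(outcomes: Sequence[int], looks: Sequence[int],
                     thresholds: Sequence[float]) -> Optional[int]:
    """outcomes: 1 = favorable, 0 = unfavorable, ordered by enrollment date.
    looks: cumulative number of patients at each analysis.
    thresholds: stopping proportion for each analysis (hypothetical here).
    Returns the 1-based analysis at which futility is declared, or None."""
    for stage, (n_k, threshold) in enumerate(zip(looks, thresholds), start=1):
        if n_k > len(outcomes):
            raise ValueError("not enough patients for this analysis")
        observed = sum(outcomes[:n_k]) / n_k  # cumulative proportion favorable
        if observed <= threshold:
            return stage
    return None


# Hypothetical 3-look design: analyses at 20, 40, and 60 patients, with the
# strict early thresholds typical of O'Brien-Fleming-type boundaries.
looks = (20, 40, 60)
thresholds = (0.10, 0.25, 0.31)
outcomes = [1, 0, 0, 0, 0] * 12  # 20% favorable throughout
print(monitor_futility(outcomes, looks, thresholds))
```

With 20% favorable outcomes, the first look does not stop (0.20 > 0.10), but the second does (0.20 ≤ 0.25).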
Results
For the phase III fosphenytoin trial, the required sample size to detect the hypothesized δ for favorable outcome at 3 months using the modified Rankin Scale (mRS) score was approximately 600 subjects (Fosphenytoin Study presented by W. Pulsinelli at the 1999 American Academy of Neurology Meeting). After 4 years, 462 patients had been entered; the study was prematurely terminated for lack of efficacy (ordinal logistic regression analysis, P=0.87). For the single-arm phase II futility study, we defined favorable outcome as a dichotomized mRS score of 0 or 1 and used a δ considered appropriate for the phase III study (Poole RM, personal communication). Using this δ, our phase II single-arm study would have been terminated for futility at its first interim analysis after evaluating 19 patients (3% of the original projected phase III sample size and 4% of the number enrolled before the phase III trial was abandoned; see Figure, a).
The phase III ATLANTIS Part B trial of alteplase in acute ischemic stroke began in December 1993. Its primary favorable outcome was a National Institutes of Health Stroke Scale (NIHSS) score of 0 or 1 (indicating no neurologic functional deficit) 90 days after stroke symptom onset. ATLANTIS investigators sought a δ of 9% in favorable outcomes, requiring a sample size of 968 patients.9 The trial was terminated after enrolling 613 patients when the observed δ was only 2% (P=0.65). Our phase II single-arm study would have been terminated for futility at its final analysis after evaluating 169 patients (18% of the required sample size for phase III and 28% of the number enrolled in phase III before the trial was abandoned; see Figure, b).
For the phase III RANTTAS trial of tirilazad mesylate (1993 to 1994), the favorable outcome was a combination of 2 functional measures (Barthel Index ≥60 and Glasgow Outcome Scale ≤2). To detect a δ of 8% in this favorable outcome, the required sample size was 1130 patients. The trial was terminated for futility after randomizing 660 patients. Our phase II study would have been terminated for futility at its final analysis after evaluating 189 patients (17% of the phase III sample size and 29% of the number enrolled in phase III before termination; see Figure, c).
Analyses of our single-arm phase II studies in the other 3 cases (one testing a heparinoid in TOAST11,12 and 2 testing alteplase in the ECASS-II13 and NINDS tPA14 trials) indicated that we could not declare these treatments futile, so they would have gone on to testing in phase III (Figure, d through f). Two of these phase III trials (TOAST and ECASS-II) failed to demonstrate the hypothesized δ in favorable outcomes, but the third (NINDS tPA) demonstrated a worthwhile improvement across multiple rating scales.
The observed proportions of favorable outcomes in the futility studies were within 4% of the observed proportions of favorable outcomes in the actively treated groups of the respective phase III studies (pTMT in Table 2), with the exception of TOAST, in which the phase III result was 6% higher than the futility study result.
Discussion
One could take issue with the use of historical data as a reference for phase II futility studies. Temporal changes in other aspects of patient management, changing criteria for response assessment, variations in data quality, and variations in protocol adherence can distort estimates of the reference proportion (p*). The same difficulties apply to hypothesizing the control group proportion in phase III trials, although in a phase III trial one would still have a valid test of the hypothesis. In the phase II design, if p* is incorrect, one can erroneously conclude that an effective drug is futile or that an ineffective drug is potentially effective. If the hypothesized p* is in doubt because of changes over time in the natural history or other concerns, it may be worthwhile to include a second, small calibration group of placebo patients, not for a direct comparison between groups but to ascertain the validity of the hypothesized control group proportion.15 If the value observed in the calibration group is substantially different from the hypothesized p*, the phase II study may need to be redone using the calibration group's proportion as the new p*. We do not recommend or advocate the use of historical controls in a definitive phase III or phase IIB study, but historical control data could provide valuable information for determining p* at earlier stages of drug development, such as the phase II futility study discussed here.
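One simple way to judge whether a calibration group's result is "substantially different" from the hypothesized p* is to ask whether p* falls inside a confidence interval for the calibration group's proportion. This criterion is an assumption for illustration, not the procedure of Herson and Carter;15 the Python sketch uses a 95% Wilson score interval, and the calibration group sizes and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, conf: float = 0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    phat = successes / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def p_star_consistent(successes: int, n: int, p_star: float, conf: float = 0.95) -> bool:
    """True if the hypothesized p* lies inside the calibration group's interval."""
    low, high = wilson_interval(successes, n, conf)
    return low <= p_star <= high


# Hypothetical calibration groups of 40 placebo patients, hypothesized p* = 0.30.
print(p_star_consistent(20, 40, 0.30))  # 50% favorable: p* looks doubtful
print(p_star_consistent(13, 40, 0.30))  # 32.5% favorable: p* looks plausible
```

A small calibration group gives a wide interval, so this check can flag only large departures from p*.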
The observed pCTL proportions (shown in the last column of Table 2) in the 6 phase III trials we studied span a wide range of values (0.32 to 0.737). This heterogeneity might result from differences in eligibility criteria (especially different upper age limits and different intervals from symptom onset to treatment) and in the timing and specification of the primary outcome measures of the studies (NIHSS score, mRS, Barthel Index, and Glasgow Outcome Scale).
The proposed futility evaluation approach does not directly address toxicity, and toxicity may determine the feasibility of a phase III trial. Usually, toxicity is more directly addressed in the design of phase I trials. Thall et al16 have proposed an approach combining phase I and II trials, and their approach could be considered in future studies of new agents in which phase I data are not available.
Finally, in planning a single-arm futility study, the choice of p*, δ (or ptx), α, and β has a large impact on the determination of futility. In general, p* and δ (or ptx) may be estimated from the values used to estimate the sample size for a phase III trial (as we have done in our exercise). The error probabilities (α and β) should be chosen according to the investigators' level of comfort with the risk of false-negative or false-positive conclusions from the futility study. For example, an investigator may be willing to risk a higher type II error for a treatment that may prevent disability from hemorrhagic stroke, for which no cure is known to date, whereas a lower type II error may be required for an expensive, invasive, risky procedure for mild ischemic stroke, for which an alternative treatment exists. In our exercise, we chose values of δ that were clinically meaningful, the same as those used in the phase III trials from which we received the data. When designing a phase II futility study, investigators should choose a value as close as possible to the δ they would use in the future phase III trial to provide a reasonable test of the futility hypothesis.
The single-arm phase II futility design has been used recently in an NINDS-funded trial of intravenous and intra-arterial tPA treatment for ischemic stroke, the results of which have been published.17 That trial used data from the placebo arm of the NINDS tPA study to obtain p*. More recently, the design has been used in studies of patients with Parkinson disease.18
In summary, we have adapted the single-arm phase II futility study design commonly used in oncology to the evaluation of therapeutic agents in stroke. We found that single-arm phase II futility studies could have helped investigators avoid 3 large, expensive phase III randomized trials of treatments for ischemic stroke. Based on the reduction in sample size, this phase II strategy could permit the testing of a wider array of promising treatments at a fraction of the cost of taking all treatments directly to phase III trials.
Received February 9, 2005; revision received May 2, 2005; accepted May 12, 2005.
References
2. Kidwell CS, Liebeskind DS, Starkman S, Saver JL. Trends in acute ischemic stroke trials through the 20th century. Stroke. 2001; 32: 1349–1359.
3. Green S, Dahlberg S. Planned versus attained design in phase II clinical trials. Stat Med. 1992; 11: 853–862.
4. Herson J. Statistical aspects in the design and analysis of Phase II clinical trials. In: Buyse ME, Staquet MJ, Sylvester RJ, eds. Cancer Clinical Trials: Methods and Practice. Oxford: Oxford University Press; 1984.
5. Herson J. Predictive probability early termination plans for phase II clinical trials. Biometrics. 1979; 35: 775–783.
6. Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials. 1989; 10: 1–10.
7. Ensign LG, Gehan EA, Kamen DS, Thall PF. An optimal three-stage design for phase II clinical trials. Stat Med. 1994; 13: 1727–1736.
8. O'Brien PC, Fleming TR. A multiple testing procedure for clinical trials. Biometrics. 1979; 35: 549–556.
9. Clark WM, Wissman S, Albers GW, Jhamandas JH, Madden KP, Hamilton S. Recombinant tissue-type plasminogen activator (Alteplase) for ischemic stroke 3 to 5 hours after symptom onset. JAMA. 1999; 282: 2019–2026.
10. The RANTTAS Investigators. A randomized trial of tirilazad mesylate in patients with acute stroke (RANTTAS). Stroke. 1996; 27: 1453–1458.
11. The Publications Committee for the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) Investigators. Low molecular weight heparinoid, ORG 10172 (Danaparoid), and outcome after acute ischemic stroke: a randomized controlled trial. JAMA. 1998; 279: 1265–1272.
12. Adams HP, Woolson RF, Clarke WR, Davis PH, Bendixen BH, Love BB, Wasek PA, Grimsman KJ. Design of the Trial of Org 10172 in Acute Stroke Treatment (TOAST). Control Clin Trials. 1997; 18: 358–377.
13. Hacke W, Kaste M, Fieschi C, von Kummer R, Davalos A, Meier D, Larrue V, Bluhmki E, Davis S, Donnan G, Schneider D, Diez-Tejedor E, Trouillas P; for the Second European-Australasian Acute Stroke Study Investigators. Randomized double-blind placebo-controlled trial of thrombolytic therapy with intravenous alteplase in acute ischemic stroke (ECASS II). Lancet. 1998; 352: 1245–1251.
14. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995; 333: 1581–1587.
15. Herson J, Carter SK. Calibrated phase II clinical trials in oncology. Stat Med. 1986; 5: 441–447.
16. Thall PF, Cook JD. Dose-finding based on efficacy–toxicity trade-offs. Biometrics. 2004; 60: 685–693.
17. IMS Investigators. Interventional Management of Stroke (IMS) study. Stroke. 2004; 35: 904–911.
18. Elm JJ, Goetz CG, Ravina B, Shannon K, Wooten GF, Tanner CM, Palesch YY, Huang P, Guimaraes P, Kamp C, Tilley BC, Kieburtz K; NET-PD Investigators. A responsive outcome for Parkinson's disease neuroprotection futility studies. Ann Neurol. 2005; 57: 197–203.