Novel Methodologic Approaches to Phase I, II, and III Trials
Among the future directions of stroke research identified by the Stroke Progress Review Group was a call for improved trial design, conduct, and outcome assessment. This demand is underscored by recent trial findings. Of the >100 clinical trials in ischemic stroke published during the first decade of the 21st century, only 5 of 31 phase III trials demonstrated efficacy on the primary outcome.1 As the number of experimental therapies undergoing clinical investigation in stroke increases, there is a rising need for more efficient statistical designs to (1) determine the optimal dose, (2) identify interventions with therapeutic potential, and (3) evaluate clinically relevant treatment effects.
The correct selection of the optimal dose, via a thorough understanding of the dose–toxicity and dose–efficacy relationships, is critical for establishing the efficacy of a potential therapeutic agent.2 Traditional rule-based dose-escalation designs,3 such as the 3+3 design and its variations, have suboptimal statistical operating characteristics. The inefficiency of the 3+3 design stems from the fact that the decision to escalate or de-escalate is based solely on event data from the current dose, without considering event information available from neighboring doses. An alternative design, the Continual Reassessment Method,4 is an adaptive dose-finding algorithm in which escalation or de-escalation through the dose region is determined by continuous re-estimation of the dose–toxicity curve, and each cohort of subjects is treated at the dose currently believed to be the maximum tolerated dose. Although computationally more intensive than the 3+3 design, the Continual Reassessment Method and its variations use all available toxicity data in the estimation of the maximum tolerated dose and thus are statistically more efficient. Ethical considerations also favor the Continual Reassessment Method, because it typically treats fewer patients at subtherapeutic doses.
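To make the re-estimation step concrete, the following is a minimal sketch of one Continual Reassessment Method update. It is not the algorithm of any particular trial. The one-parameter power model, the normal prior and its standard deviation, the skeleton of prior toxicity guesses, the target toxicity rate of 0.25, and the function name `crm_next_dose` are all assumptions chosen for illustration.

```python
import math

def crm_next_dose(skeleton, doses_given, tox_observed, target=0.25, prior_sd=1.16):
    """Return (index of the recommended dose, posterior mean toxicity per dose).

    Assumed model: P(toxicity at dose i) = skeleton[i] ** exp(a),
    with prior a ~ Normal(0, prior_sd**2). All values are illustrative.
    """
    # Evenly spaced grid over the model parameter a for numerical integration.
    grid = [-4.0 + 8.0 * k / 400 for k in range(401)]
    weights = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2  # log prior, up to a constant
        # The likelihood uses every subject treated so far, at every dose.
        for d, y in zip(doses_given, tox_observed):
            p = skeleton[d] ** math.exp(a)
            log_w += math.log(p) if y else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    # Posterior mean toxicity probability at each dose level.
    post_tox = [
        sum(w * s ** math.exp(a) for w, a in zip(weights, grid)) / total
        for s in skeleton
    ]
    # Recommend the dose whose estimated toxicity is closest to the target.
    best = min(range(len(skeleton)), key=lambda i: abs(post_tox[i] - target))
    return best, post_tox

# Hypothetical example: three subjects at dose index 2, one toxicity observed.
skeleton = [0.05, 0.10, 0.25, 0.40, 0.60]
dose, post = crm_next_dose(skeleton, [2, 2, 2], [0, 0, 1])
print(dose, [round(p, 3) for p in post])
```

Unlike the 3+3 rule, the update pools toxicity outcomes from all doses through the shared parameter, which is the source of the efficiency gain described above. In practice, implementations typically add safety constraints, such as no skipping of untried doses, which this sketch omits.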
Once the appropriate dose has been determined through a phase I trial, the next step is to evaluate efficacy potential in a phase II trial. The traditional concurrently controlled phase II design, intended to simultaneously estimate treatment effect and assess variability, is often criticized as an underpowered phase III comparative clinical trial. Proposed alternative designs, such as the selection design and the futility design, are instead intended to weed out ineffective or mediocre therapies.
The objective of the futility design, which has been successfully implemented in stroke,5,6 is to discard treatments that do not show promise. Statistical hypotheses are stated such that the goal is to demonstrate that the intervention is futile. Failure to conclude futility would be considered evidence in favor of a need for a definitive phase III clinical trial. In the single-arm futility design,7 the experimental treatment arm is compared with a target response rate, π1, defined as the minimum proportion of successes in the treated group that would warrant further study. If the data support the conclusion that the true success proportion π is less than π1, the intervention is declared futile. Comparison of the experimental arm with a target response rate, which is fixed and has no variability, results in a smaller sample size than would be required for direct comparison with a concurrent control arm. The target response rate can be determined based on a clinically relevant treatment effect and historical control data.
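The single-arm futility test described above can be sketched as a one-sided exact binomial test of H0: π ≥ π1 against HA: π < π1. This is an illustration under stated assumptions, not the analysis plan of the cited trials. The function names, the one-sided significance level of 0.10, and the example numbers are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

def futility_declared(successes, n, pi1, alpha=0.10):
    """Reject H0: pi >= pi1 (declare futility) if so few successes are unlikely under pi1."""
    return binom_cdf(successes, n, pi1) <= alpha

def prob_declare_futile(n, pi1, true_pi, alpha=0.10):
    """Probability that the trial declares futility when the true success rate is true_pi."""
    return sum(
        binom_pmf(x, n, true_pi)
        for x in range(n + 1)
        if binom_cdf(x, n, pi1) <= alpha
    )

# Hypothetical design: 50 treated subjects, target response rate pi1 = 0.40.
print(futility_declared(12, 50, 0.40))
print(round(prob_declare_futile(50, 0.40, 0.25), 3))
```

Evaluating `prob_declare_futile` at `true_pi = pi1` gives the chance of wrongly discarding a treatment that just meets the target, and evaluating it at a lower rate gives the power to discard an ineffective one. These are the two operating characteristics that drive the choice of sample size.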
Concerns over the use of historical control data include temporal changes in outcome associated with improvements in patient management, as well as variations in eligibility criteria, protocol adherence, and primary outcome measures across clinical trials. If the historical control data are outdated and therefore no longer relevant, trial results may not reflect an accurate assessment of the futility of the experimental treatment. Inclusion of a small cohort of control subjects for calibration of the threshold value has been suggested.8 If this cohort is too small, its usefulness in terms of calibration is quite limited; if this cohort is too large, the trial begins to resemble an underpowered phase III.
The inclusion of a concurrent control arm in the futility design avoids the drawbacks of historical control data and allows for a direct comparison of treatment arms. The futility hypothesis is based on a direct comparison of the randomized treatment arms, such that futility would be declared if the absolute treatment effect in favor of the experimental treatment is less than δ, a prespecified clinically meaningful futility margin. Although the sample size is larger than that of the single-arm design, the concurrently controlled futility design is not an alternative to phase III efficacy testing: the objective remains to establish futility, rather than to demonstrate efficacy, of the active treatment. The National Institute of Neurological Disorders and Stroke-funded phase II trial of deferoxamine mesylate in intracerebral hemorrhage (Hi-Def in ICH [intracerebral hemorrhage], clinicaltrials.gov NCT01662895) uses this design.
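One way to express the concurrently controlled futility comparison is as a one-sided test of H0: πT − πC ≥ δ against HA: πT − πC < δ, where πT and πC are the success proportions in the treatment and control arms. The sketch below uses a large-sample Wald statistic. The choice of test, the significance level of 0.10, the function name, and the example counts are assumptions for illustration and are not taken from the Hi-Def protocol.

```python
from math import sqrt
from statistics import NormalDist

def futility_z_test(x_trt, n_trt, x_ctl, n_ctl, delta, alpha=0.10):
    """One-sided Wald test of H0: (pi_T - pi_C) >= delta.

    Returns (z, one-sided p-value, futility declared?).
    Assumes both observed proportions lie strictly between 0 and 1,
    so that the standard error is nonzero.
    """
    p_t, p_c = x_trt / n_trt, x_ctl / n_ctl
    se = sqrt(p_t * (1 - p_t) / n_trt + p_c * (1 - p_c) / n_ctl)
    z = (p_t - p_c - delta) / se
    p_value = NormalDist().cdf(z)  # small when the observed effect falls well below delta
    return z, p_value, p_value <= alpha

# Hypothetical data: 30/100 successes in each arm, futility margin delta = 0.12.
z, p, futile = futility_z_test(30, 100, 30, 100, delta=0.12)
print(round(z, 2), round(p, 3), futile)
```

Note the direction of the test: a small p-value is evidence that the effect is smaller than δ, so the treatment is discarded. Failing to reject leaves the treatment as a candidate for phase III testing but does not demonstrate efficacy.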
Selection designs9 can be used to prioritize candidate interventions, such that resources are allocated to the most promising candidates. In a selection design, the objective is to select the best among K interventions (or K interventions and a control) for further testing. Subjects are randomized to one of the K interventions, and the best intervention is defined as the one with the numerically, rather than statistically significantly, highest response rate. The sample size is chosen to ensure that, if the best treatment is superior by at least some margin D, it will be selected with high probability.
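The sample size logic of the selection design can be illustrated by computing the probability of correct selection directly. The sketch below assumes binary outcomes, independent arms of equal size, and a conservative convention in which ties count as a failure to select the best arm. The function names, response rates, and the 0.90 target are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_correct_selection(n, p_best, p_others):
    """P(best arm has strictly more successes than every other arm), n subjects per arm."""
    # strictly_below[j][x] = P(X_j < x) for each competing arm j.
    strictly_below = []
    for p in p_others:
        cdf, running = [0.0], 0.0
        for k in range(n):
            running += binom_pmf(k, n, p)
            cdf.append(running)
        strictly_below.append(cdf)
    total = 0.0
    for x in range(n + 1):
        beats_all = 1.0
        for cdf in strictly_below:
            beats_all *= cdf[x]
        total += binom_pmf(x, n, p_best) * beats_all
    return total

def smallest_n(p_best, p_others, target=0.90, n_max=300):
    """First per-arm sample size at which the selection probability reaches the target."""
    for n in range(1, n_max + 1):
        if prob_correct_selection(n, p_best, p_others) >= target:
            return n
    return None

# Hypothetical scenario: K = 3 arms, best arm superior by margin D = 0.15.
print(round(prob_correct_selection(50, 0.45, [0.30, 0.30]), 3))
print(smallest_n(0.45, [0.30, 0.30]))
```

Because selection is based on the numerically highest rate rather than on a significance test, the required sample size is typically much smaller than for a comparative trial with the same margin.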
The selection design can be combined with a futility or superiority test in a sequential 2-stage design, as described in the trial of Co-Q10 in ALS (amyotrophic lateral sclerosis).10 At the conclusion of stage 1, a treatment would be selected, and the statistical hypothesis tested at the end of stage 2. Inclusion of stage 1 subjects in the statistical hypothesis test has the advantage of using all available outcome data but introduces bias, which must be accounted for in the test statistic. If the stage 1 subjects are excluded from the stage 2 hypothesis test, the parameter estimate is unbiased, but the overall sample size is increased.
Adaptive designs promise increased flexibility of the trial to respond to accumulating information, a promise which has generated both enthusiasm and confusion. According to the Food and Drug Administration draft guidance,11 an adaptive design “includes… a prospectively planned opportunity for modification of one or more specified aspects of the study design and hypotheses based on analysis of data (usually interim data) from subjects in the study.” It is important to emphasize that the potential issues, and in what manner the trial will adapt to each, must be prespecified in the design stage to maintain trial validity.
Although certainly not novel, group sequential methods are adaptive according to this definition, in that they allow the trial to be stopped early, based on accumulating data, in the face of overwhelming efficacy or futility. Other valid mechanisms for adaptation in phase III trials include blinded sample size re-estimation12 and covariate adaptive randomization.13 Adaptations based on an unblinded assessment of interim data, including sample size re-estimation and response adaptive randomization, may be more enticing but may also be more controversial in the confirmatory setting. Early phase designs allow for more flexibility with regard to adaptation, and adaptive designs have gained greater acceptance in this exploratory setting. Whether exploratory or confirmatory, Bayesian or Frequentist, each trial must demonstrate that the statistical operating characteristics remain sound in the face of the chosen adaptation(s).
Even a well-designed trial of an effective therapy can fail based on an inappropriate primary outcome or primary analysis. The selection of valid, reliable, and efficient outcome measures and analytic strategies for stroke clinical trials is receiving much attention in the stroke literature.14 The modified Rankin Scale is an ordinal disability measure, which is commonly used as the primary efficacy outcome measure in stroke clinical trials. Traditional analysis focuses on dichotomization of the ordinal scale into a binary response, where success or failure is determined by comparing the result of each subject with a fixed threshold. Responder analysis (also referred to as the sliding dichotomy or the stratified dichotomy) tailors the threshold for success on the basis of the baseline prognosis of each subject.15 In the responder analysis setting, a mild stroke would have a more stringent definition of success than a severe stroke. This approach is thought to reflect more accurately the clinical perspective of outcome and is currently being implemented in the SHINE (Stroke Hyperglycemia Insulin Network Effort) trial (clinicaltrials.gov NCT01369069). Although these dichotomization approaches have the advantage of a relatively straightforward clinical interpretation, reduction of an ordinal measure to a binary outcome results in a loss of information.
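The difference between a fixed dichotomy and the responder analysis can be sketched as follows. The severity strata and cut points below are illustrative assumptions only: baseline severity is measured here by a hypothetical baseline NIHSS score, and neither the strata nor the function names are taken from this article or from a specific protocol.

```python
def fixed_dichotomy(mrs, threshold=1):
    """Success if the modified Rankin Scale score is at or below one fixed threshold."""
    return mrs <= threshold

def sliding_dichotomy(mrs, baseline_nihss):
    """Success threshold tailored to baseline severity (illustrative cut points)."""
    if baseline_nihss <= 7:       # mild stroke: most stringent definition of success
        cutoff = 0
    elif baseline_nihss <= 14:    # moderate stroke
        cutoff = 1
    else:                         # severe stroke: least stringent definition
        cutoff = 2
    return mrs <= cutoff

# Hypothetical subjects as (90-day mRS, baseline NIHSS) pairs.
subjects = [(1, 5), (1, 10), (2, 20), (3, 20)]
for mrs, nihss in subjects:
    print(mrs, nihss, fixed_dichotomy(mrs), sliding_dichotomy(mrs, nihss))
```

Under the fixed rule, a severely affected subject who reaches mRS 2 counts as a failure and a mildly affected subject who reaches mRS 1 counts as a success. The sliding rule reverses both classifications, which is the sense in which it is thought to track the clinical perspective more closely.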
Analytic approaches that maintain the ordinal nature of the scale, sometimes referred to as shift analysis,16 are statistically more powerful than dichotomized analyses; however, the clinical interpretation is also less intuitive. In addition, some of these ordinal approaches require distributional assumptions that may not be supported by the trial data. Ordinal regression, for example, requires the assumption of proportional odds, which means that the treatment odds ratio is the same at every possible cut point of the ordinal scale. Alternative methods must be carefully considered in the design stage; a clinical trial powered for ordinal regression may be underpowered for the logistic regression required if the proportional odds assumption is violated.
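A simple descriptive check of the proportional odds assumption is to compute the treatment odds ratio separately at each cut point of the scale and see whether the values are roughly constant. The sketch below is an informal diagnostic, not a formal test and not a substitute for fitting the ordinal model. The function names and the example counts are hypothetical.

```python
def cumulative_odds_ratios(trt, ctl):
    """Odds ratio (treatment vs control) of scoring at or below each cut point.

    trt and ctl hold counts or proportions per ordered category (e.g. mRS 0..6).
    Assumes no cut point leaves either arm with a cumulative proportion of 0 or 1.
    """
    n_t, n_c = sum(trt), sum(ctl)
    cum_t = cum_c = 0.0
    ratios = []
    for k in range(len(trt) - 1):
        cum_t += trt[k]
        cum_c += ctl[k]
        odds_t = cum_t / (n_t - cum_t)
        odds_c = cum_c / (n_c - cum_c)
        ratios.append(odds_t / odds_c)
    return ratios

def shift_by_odds_ratio(ctl_probs, odds_ratio):
    """Treatment-arm category probabilities that satisfy proportional odds exactly."""
    probs, prev_cum, cum_c = [], 0.0, 0.0
    for p in ctl_probs[:-1]:
        cum_c += p
        cum_t = odds_ratio * cum_c / (1.0 - cum_c + odds_ratio * cum_c)
        probs.append(cum_t - prev_cum)
        prev_cum = cum_t
    probs.append(1.0 - prev_cum)
    return probs

# Hypothetical mRS 0..6 counts per arm; widely varying ratios would cast doubt
# on the proportional odds assumption.
treatment = [20, 25, 20, 15, 10, 5, 5]
control = [10, 20, 20, 20, 15, 10, 5]
print([round(r, 2) for r in cumulative_odds_ratios(treatment, control)])
```

The helper `shift_by_odds_ratio` constructs the idealized case in which every cut point yields the same odds ratio, which can serve as a reference pattern when judging observed data or when simulating power under the assumption.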
The modified Rankin Scale reflects a global assessment of function, but a patient may consider many other life aspects in describing outcome after a stroke. Patient-reported outcomes, including the NeuroQOL17 and PROMIS18 tools, may provide important information regarding quality of life, cognition, and social functioning. Consideration of these varying aspects of global outcome may provide a more complete understanding of the evolution of stroke and allow a more sensitive estimate of treatment effect.
These are just a sampling of the novel approaches being considered, and in some cases implemented, in current stroke trials. Continued development of innovative trial designs, outcome assessments, and analytic approaches is an essential component of stroke research.
Dr Yeatts is the SDMC PI (Statistics and Data Management Center Principal Investigator) for the phase II trial of deferoxamine in ICH (Hi-Def; U01 NS074425). She is the unblinded statistician for the IMS 3 (U01 NS077304) and ProTECT (U01 NS062778; Bio-ProTECT R01 NS071867) phase III clinical trials. The other authors have no conflicts to report.
- Received November 5, 2012.
- Accepted March 4, 2013.
- © 2013 American Heart Association, Inc.
- Hong KS,
- Lee SJ,
- Hao Q,
- Liebeskind DS,
- Saver JL
- Fisher M
- 5. The IMS Trial Investigators. Combined intravenous and intra-arterial recanalization for acute ischemic stroke: the Interventional Management of Stroke study. Stroke. 2004;35:904–912.
- 6. The IMS II Trial Investigators. The Interventional Management of Stroke (IMS) II Study. Stroke. 2007;38:2127–2135.
- Palesch YY,
- Tilley BC,
- Sackett DL,
- Johnston KC,
- Woolson R
- 11. US Food and Drug Administration. Draft Guidance for Industry: Adaptive Design Clinical Trials for Drugs and Biologics. 2010; http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM201790.pdf Accessed October 31, 2012.
- Chow SC
- Chow SC,
- Chang M
- Chow SC
- Chow SC,
- Chang M
- Saver JL
- Saver JL