(Stroke. 2005;36:597.)
© 2005 American Heart Association, Inc.
Original Contributions
From the Division of Cardiovascular and Medical Sciences (F.B.Y., K.R.L.), Gardiner Institute, Western Infirmary, and the Robertson Centre for Biostatistics (C.J.W.), Boyd Orr Building, University of Glasgow, University Avenue, UK.
Correspondence to Fiona B. Young, Division of Cardiovascular and Medical Sciences, University of Glasgow, Gardiner Institute, Western Infirmary, Glasgow, G11 6NT, UK. E-mail Fby1w@clinmed.gla.ac.uk
| Abstract |
|---|
Methods: We used data from the Glycine Antagonist in Neuroprotection (GAIN) International trial to assess statistical power of various primary end points for intervention trials. We selected prognosis-adjusted cut points based on Barthel Index (BI) or Rankin Scale (RS) using a prognostic model, or assigned a fixed end point within subgroups of patients defined by their Oxford category or National Institutes of Health Stroke Scale (NIHSS) score. We simulated a treatment effect and estimated statistical power with standard formulae.
Results: Assignment of end points using a prognostic model for individual patients increased statistical power, when compared with assigning end points using only the Oxford classification. For the BI, power was increased from 60% to 88% (equivalent to a 49% reduction in sample size if power remains unchanged). With the RS end points, power was increased from 84% to 92% (or a 24% reduction in sample size). Versus a fixed end point for all patients, model-based methods increased power by 22 percentage points for BI ≥95 and 14 percentage points for RS ≤1 (effective sample size reductions 43% and 34%).
Conclusion: Prognosis-adjusted end points can increase statistical power compared with fixed end points. Assessment is based on realistic goals for individual patients and yet trial results remain generalizable.
Key Words: clinical trials; stroke, acute
| Introduction |
|---|
≥95) but may still consider being independent from essential care (BI ≥60) to be a good outcome. On the other hand, relatively mildly affected patients may only consider themselves to have recovered if they reach total independence. To account for the heterogeneity, the cut point on a recovery scale could be varied for individual patients. This would tailor end points to suit the prognosis or baseline characteristics of patients. Such variation of cut points in a trial would allow patients to be assessed on achievable goals, while ensuring that the trial remains generalizable. Allowing for patients to be assessed on different goals applies the principles of goal attainment scaling,2 which considers the outcome to be favorable if the patient achieves a prespecified objective.
A few trials have already used variable end points to take account of initial severity. The Stroke Treatment with Ancrod Trial (STAT)3 assessed whether patients achieved a BI score of at least 95 or had a score at 3 months at least equal to their prestroke value. The Abciximab in Emergent Stroke Treatment Trial (AbESTT; H.P. Adams, written communication, November 2002) used a dichotomized Rankin Scale (RS)4 secondary end point that split patients into 3 groups using the baseline National Institutes of Health Stroke Scale (NIHSS) score for each patient:5,6 RS 0 for baseline NIHSS of 0 to 7, RS ≤1 for baseline NIHSS of 8 to 14, and RS ≤2 for baseline NIHSS >14. Berge and Barer7 also suggested that it may be appropriate to use variable criteria for assessing patients with different stroke severity levels but did not define the criteria to be used.
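The NIHSS-based rule attributed to AbESTT above can be written as a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, and the thresholds are simply those quoted in the text (RS 0 for NIHSS 0 to 7, RS ≤1 for NIHSS 8 to 14, RS ≤2 for NIHSS >14).

```python
def abestt_rs_cut_point(baseline_nihss: int) -> int:
    """Return the highest RS grade that still counts as a favorable outcome.

    Thresholds follow the description in the text; the function name is
    hypothetical and not taken from any trial protocol.
    """
    if baseline_nihss < 0:
        raise ValueError("NIHSS score cannot be negative")
    if baseline_nihss <= 7:
        return 0   # mild stroke: only RS 0 is favorable
    if baseline_nihss <= 14:
        return 1   # moderate stroke: RS 0 or 1 is favorable
    return 2       # severe stroke: RS 0, 1, or 2 is favorable


def is_favorable(baseline_nihss: int, rs_at_3_months: int) -> bool:
    """Dichotomize a 3-month RS score against the severity-specific cut point."""
    return rs_at_3_months <= abestt_rs_cut_point(baseline_nihss)
```

Under this rule, a patient with a baseline NIHSS of 18 who reaches RS 2 is counted as a favorable outcome, whereas the same RS 2 in a patient with a baseline NIHSS of 5 is not.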
We have previously described the use of prognosis-adjusted end points for the BI and RS,8 where we divided patients into subgroups using the Oxford classification.9 We chose appropriate cut points for each Oxford category by picking a value on the RS and BI that fell close to the median value of the scale within that category, basing our estimates on placebo data from the Glycine Antagonist in Neuroprotection (GAIN) International trial.10 To enhance the selection of end points, we now consider the use of factors that are more closely related to baseline severity and prognosis, because assignment of cut points using an approach that incorporates a prognostic model may be more powerful.
We aimed to assess a range of prognosis-adjusted end points. We examined cut points for both the RS and BI that were assigned either using subgroups of patients or using prognostic models. These prognosis-adjusted end points were compared according to their statistical power estimated under likely trial circumstances and were also contrasted with power estimates for the traditional fixed end points of RS ≤1 and BI ≥95.
| Methods |
|---|
We developed all models using the GAIN International trial10 placebo data. There was no significant treatment effect observed in the GAIN trial, so the final prognostic models were tested on the GAIN investigative treatment group data to check for generalizability and accuracy.
Our prognostic models predicted the cumulative probability of a given patient being in each outcome category. Initially, we set the probability threshold at 50% (patients were assigned cut points that they had at least a 50% chance of achieving), but later we considered alternative thresholds to find the optimal cut point. For example, using a 3 category BI model with categories set to be 0 to 55, 60 to 90, and ≥95 and with a probability threshold of 50%, the highest outcome category would be predicted if a given patient had at least a probability of 50% of achieving a score of 95 or more. If the probability of achieving a score of 95 or more was <50% but the probability of obtaining a score of 60 or more was at least 50%, then the outcome category of the patient was predicted to be 60 to 90. If neither of these constraints was achieved, then the outcome was predicted as 0 to 55. We took the lower bound of these ranges for the cut point (ie, 60 for a patient predicted to lie between 60 and 90).
We also investigated alternative probability thresholds, ranging from 45% to 5%. Assessing patients on end points that are toward the favorable end of the outcome scale had been more powerful in a previous study,8 and hence if the probability threshold was reduced from 50%, then patients would be assessed on a cut point that would be more difficult to attain (movement of cut point toward the most favorable extreme of the scales).
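The assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name and argument layout are assumptions, and the cumulative probabilities are taken as given (in the study they would come from the fitted prognostic model).

```python
from typing import Sequence


def assign_cut_point(
    cut_points: Sequence[int],
    cumulative_probs: Sequence[float],
    threshold: float = 0.5,
) -> int:
    """Pick the most favorable cut point a patient is likely enough to reach.

    cut_points: lower bounds of the outcome categories, ordered from most to
        least favorable, e.g. [95, 60, 0] for the 3 category BI example.
    cumulative_probs: cumulative_probs[i] is the predicted probability of
        scoring at least cut_points[i] (hypothetical model output).
    threshold: minimum probability required to assign a cut point.
    """
    if len(cut_points) != len(cumulative_probs):
        raise ValueError("cut_points and cumulative_probs must align")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    for cut, prob in zip(cut_points, cumulative_probs):
        if prob >= threshold:
            return cut
    # No category met the threshold: fall back to the least favorable one.
    return cut_points[-1]


# Hypothetical patient: 35% chance of BI >= 95, 70% chance of BI >= 60.
bi_cut_points = [95, 60, 0]
patient_probs = [0.35, 0.70, 1.00]
cut_at_50 = assign_cut_point(bi_cut_points, patient_probs, threshold=0.50)
cut_at_25 = assign_cut_point(bi_cut_points, patient_probs, threshold=0.25)
```

For this hypothetical patient the 50% threshold assigns a cut point of 60, whereas a 25% threshold assigns 95, which illustrates how lowering the threshold moves patients toward stricter targets.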
Simulation of Treatment Effect and Estimation of Statistical Power
We simulated a treatment effect that assumed all patients would derive equal benefit from the treatment. In practice this is unlikely to be a valid assumption; however, our previous work has shown that more complex treatment patterns tend to alter the magnitude but not the direction of the conclusions. Our method consisted of applying treatment effects to the GAIN International placebo data. The treatment effect in terms of an odds ratio was estimated using the data from a previous simulation study.8 Each simulated clinical trial in the study consisted of 2 treatment groups. Patients entered into the simulated placebo groups were simply selected at random, with replacement, from the GAIN data. In contrast, patients entered into the simulated treatment groups were selected so as to have had a milder stroke on average than the placebo group (ie, they had lower baseline NIHSS scores). We assumed that this would confer a greater chance of favorable outcome at 90 days. We believe that this is the closest artificial treatment effect that we can generate to mimic a scenario in which a neuroprotectant or thrombolytic has an early effect to limit infarct extent.
The simulated clinical trials were used to estimate the difference between groups for each given treatment level in terms of an odds ratio. Treatment level could be defined as the difference in baseline NIHSS score that we had artificially generated (0, 1, 2, or 3 points). We then examined the 3-month outcomes of the patients and estimated differences between the active treatment and placebo groups for each end point. Bootstrap CIs12 for the odds ratios were constructed using 1000 replications. We calculated the statistical power of each end point and its 95% CI using standard formulae.13 We could then compare end points on the basis of this estimate of statistical power. We also calculated any potential reduction in sample size that we could introduce without reduction of power, assuming that we used the revised end point instead of a fixed end point that incorporated BI ≥95 or RS ≤1.
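The two calculations named above, a bootstrap CI for an odds ratio and a power estimate, can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure actually used: the text does not specify the bootstrap variant or the power formula, so a percentile bootstrap and a normal approximation on the log odds ratio (with a Woolf-type standard error) are assumed here, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def odds_ratio(treated, placebo):
    """Odds ratio for a favorable outcome; inputs are lists of 0/1 outcomes."""
    a = sum(treated)
    b = len(treated) - a
    c = sum(placebo)
    d = len(placebo) - c
    # A 0.5 continuity correction avoids division by zero in sparse resamples.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def bootstrap_or_ci(treated, placebo, replications=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the odds ratio (assumed variant)."""
    rng = random.Random(seed)
    estimates = sorted(
        odds_ratio(
            rng.choices(treated, k=len(treated)),
            rng.choices(placebo, k=len(placebo)),
        )
        for _ in range(replications)
    )
    lower_index = int((1 - level) / 2 * replications)
    upper_index = int((1 + level) / 2 * replications) - 1
    return estimates[lower_index], estimates[upper_index]


def power_from_odds_ratio(p_placebo, true_or, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test of the log odds ratio.

    Uses power = Phi(|ln OR| / SE - z), with SE built from expected cell
    counts; the small contribution of the opposite tail is ignored.
    """
    if not 0.0 < p_placebo < 1.0 or true_or <= 0.0:
        raise ValueError("need 0 < p_placebo < 1 and true_or > 0")
    odds_treated = true_or * p_placebo / (1 - p_placebo)
    p_treated = odds_treated / (1 + odds_treated)
    cells = (p_treated, 1 - p_treated, p_placebo, 1 - p_placebo)
    se = math.sqrt(sum(1 / (n_per_group * p) for p in cells))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(math.log(true_or)) / se - z_alpha)
```

With an odds ratio of 1 the approximation returns alpha/2, as expected when only one rejection tail is counted, and power rises with either the odds ratio or the number of patients per group.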
| Results |
|---|
Compared with using the Oxford classification to assign cut points to patients, only model BI2 increased the statistical power (Table 2). We found no further improvement in power through increasing the model complexity.
Rankin Scale End Points
From stepwise regression, we found baseline NIHSS, age, and worst leg score to be closely related to RS outcome. We called this model RS1, and a further model that included only baseline NIHSS and age was termed RS2.
On average, the RS patient specific end points had higher statistical power than the BI end points (Table 3). Subgrouping the patients by Oxford category delivered the lowest statistical power, whereas subgrouping by NIHSS produced the highest power. When we used a model instead of subgroups, we found that we achieved good power through the simple approach of RS2 that controlled only for age and baseline NIHSS.
Optimal Probability Threshold for Model-Based End Points
For the analyses above, we assigned target outcomes to patients such that there would be a 50% chance that they would attain the chosen recovery target. The statistical power of our model-based prognosis-adjusted end points was further improved when the probability thresholds were optimized (Table 4). For the BI model, greatest power was obtained with a probability threshold of 25%, whereas for the RS model the optimal probability threshold was 30%. This suggests that assessing patients on stricter criteria for recovery results in a more powerful end point. The power that we obtained by using the RS2 model at its optimal probability threshold exceeded that obtained when patients were subgrouped by NIHSS: 0.921 (95% CI: 0.913, 0.928) compared with 0.882 (95% CI: 0.871, 0.893).
Comparison of Prognosis-Adjusted End Points to Fixed End Points
We compared the prognosis-adjusted end points to what could be considered the best fixed end points (BI ≥95 or RS ≤1). With a treatment level of 2, the BI ≥95 and RS ≤1 fixed end points obtained statistical power of 0.657 (95% CI: 0.639, 0.676) and 0.784 (95% CI: 0.770, 0.800), respectively. These values are inferior to those obtained with the model-based or NIHSS subgroup-based prognosis-adjusted end points.
Comparison of End Points in Terms of a Relative Sample Size
Finally, we compared selected end points in terms of a relative sample size (Table 5). All of the model-based end points used the optimal probability thresholds discussed in the previous section. For the BI end points, the sample size could be reduced by 43% (95% CI: 41%, 45%) if the model BI2 was used instead of the BI ≥95 dichotomized end point. Using the BI2 model end point rather than the Oxford category end point would allow an effective sample size reduction of 50% (95% CI: 48%, 52%). For the RS end points, using the RS2 model to assign cut points could reduce the sample size by 34% (95% CI: 32%, 36%) compared with the RS ≤1 dichotomy. If the NIHSS subgroup end point was used instead of the RS ≤1 dichotomy, the sample size could be reduced by 24% (95% CI: 22%, 28%). Using the Oxford category to subgroup patients could result in reductions in sample size of 14% (95% CI: 11%, 17%).
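One common way to translate a gain in power into a relative sample size is shown below. It is offered as an assumption, since the text does not state the formula used: for a two-sided test at level alpha with a fixed trial size, the required sample size for a given end point is taken to scale with the square of (z for 1 − alpha/2 plus z for the power achieved). The function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_reduction(power_reference, power_new, alpha=0.05):
    """Fractional sample size saving from switching end points.

    Assumes both powers were estimated at the same trial size and that the
    required sample size scales as (z_{1-alpha/2} + z_{power})**2, so the new
    end point needs only a fraction of the patients to match the reference
    power.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    ratio = ((z_alpha + z(power_reference)) / (z_alpha + z(power_new))) ** 2
    return 1 - ratio


# Powers quoted in the text: fixed end points 0.657 (BI) and 0.784 (RS);
# model-based RS2 0.921; model-based BI taken as 0.657 + 0.22 = 0.877.
bi_saving = sample_size_reduction(0.657, 0.877)
rs_saving = sample_size_reduction(0.784, 0.921)
```

Under this assumption the savings come out at roughly 0.43 for the BI and 0.34 for the RS, which is in line with the reductions reported above.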
| Discussion |
|---|
Our model-based method performed better than the approach in which prognosis was identified from NIHSS or Oxford subgroups. Although the NIHSS gives a reasonable measure of initial severity, it disregards factors such as age that also influence prognosis. The Oxford classification gives an even more crude grading of severity and as a guide to prognosis it is more suitable for epidemiological purposes than for prediction of outcome in individuals.
Although our model-based approach was slightly more powerful than the NIHSS subgroup method used in AbESTT, the absolute advantage was marginal and the simplicity of the AbESTT method may outweigh this advantage. Such an end point will consequently be easier to understand in the context of clinical trials.
We used the placebo patients from the GAIN International trial.10 The GAIN trial showed no effect of gavestinel. It would be informative also to use data from other sources to validate the results. Our treatment effect is artificially generated. We hope that a successful thrombolytic or neuroprotectant would have an almost immediate effect in limiting infarct extent and thus initial severity, but there can be no guarantee that this would hold true. Only 1 pattern of treatment effect was considered: a fixed effect where all patients were assumed to improve by the same magnitude. Although little is known about the true effect of most stroke interventions, this pattern is unlikely to be clinically valid; however, our previous work8 has shown that more complex treatment patterns tend to alter the magnitude but not the direction of the conclusions. Other factors can also influence the statistical power of a trial, including patient selection, sample size, and time to treatment, and these should be considered.14,15 Elsewhere, we have also recently proposed that age and baseline NIHSS should be used together when considering eligibility for acute stroke studies.16
We have investigated a range of prognosis-adjusted end points and found that adjusting the cut points for patients depending on prognosis offers analytical power advantages over use of a single fixed end point. Our optimal method of assigning cut points used a prognostic model that considered age and baseline NIHSS, though simply subgrouping according to baseline NIHSS was more straightforward and almost as effective. Prognosis-adjusted end points allow patients to be assessed on realistic achievable goals, while allowing the clinical trial results to be generalizable. Maximizing trial power makes development of treatment for stroke more attainable and less expensive.
| Acknowledgments |
|---|
Received May 19, 2004; revision received September 20, 2004; accepted October 22, 2004.