Shift Analysis Versus Dichotomization of the Modified Rankin Scale Outcome Scores in the NINDS and ECASS-II Trials
Background and Purpose— The SAINT I trial that showed a significant benefit of the neuroprotectant NXY-059 used a novel outcome for acute ischemic stroke trials: a shift toward good functional outcome on the 7-category modified Rankin scale (mRS).
Methods— We used the Cochran-Mantel-Haenszel shift test to analyze the distribution of the 90-day mRS outcomes in the NINDS and ECASS-II databases and compared the results with a dichotomized mRS outcome by logistic regression (0 to 2 vs 3 to 6, or 0 to 1 vs 2 to 6). We also stratified each dataset based on National Institutes of Health Stroke Scale baseline severity.
Results— Each dataset showed a statistically significant shift in the 90-day mRS distributions favoring tissue plasminogen activator (odds ratio, 1.6 for NINDS, 1.3 for ECASS-II). For ECASS-II, larger shift effects appeared in National Institutes of Health Stroke Scale 0 to 6 and 16 to 40 strata. Similarly, the mRS 0 to 2 analysis but not mRS 0 to 1 found similar treatment effects in both datasets (odds ratio, 1.6 for NINDS, 1.5 for ECASS-II) and similar variations in the low and high strata in the ECASS-II trial. NINDS found no significant treatment effects across the strata. After removing the strata at the fringes, the shift test lost significance in both datasets.
Conclusions— Tissue plasminogen activator causes a beneficial shift toward wellness on the mRS in both the NINDS and ECASS-II trials, and ECASS-II would have been a positive trial according to the shift approach. However, the shift effect is not global for all treated patients and does not outperform the dichotomized 0 to 2 outcome. Patients with mild and severe deficits also shifted favorably on the mRS in the ECASS-II trial.
The SAINT I study of the neuroprotectant NXY-059 introduced a new primary outcome for acute ischemic stroke trials: a favorable shift toward better functional outcomes in the distribution of the modified Rankin scale (mRS) scores across the range of disabilities in the treated group versus placebo control group.1 In many prior acute stroke studies, the mRS was dichotomized as 0 to 1 versus 2 to 6, or 0 to 2 versus 3 to 6, to define treatment success versus failure. An analysis based on a small, uniform global shift on the mRS may outperform analysis based on a binary outcome when the therapeutic intervention produces beneficial transitions within these ranges.2 Another important rationale of the shift analysis is to detect clinically meaningful shifts or changes in the middle of the mRS spectrum, hence representing stroke outcome as a continuum rather than a binary outcome.
We have analyzed the distribution of the 90-day mRS outcomes in the NINDS3 and ECASS-II4 trials to determine the extent to which subjects shifted toward good functional outcomes, and we have compared the results with those from an analysis with binary mRS end points. Both of these studies administered 0.9 mg/kg IV tissue plasminogen activator (t-PA) either up to 3 (NINDS) or 6 (ECASS-II) hours after symptom onset.
The SAINT I trial stratified subjects at baseline by categories of the National Institutes of Health Stroke Scale (NIHSS) severity score.5 We selected 2 other studies, TOAST and ECASS-I, to obtain choices for categories of baseline severity based on different stratification of the NIHSS. These studies created strata by dividing the range of NIHSS scores into categories at selected cutpoints that distinguished mild, moderate, and severe levels of disability.
Reporting only the overall odds ratios (ORs) or overall differences between proportions (of positive results by treatment arm) may create the impression of uniform effects across the strata. However, the large sample sizes of the NINDS and ECASS-II studies facilitate exploratory analyses of the strata that may indicate where the treatment has its strongest and weakest effects. Hence, we have also explored stratum-specific results and report on the degree of uniformity in effect sizes.
Materials and Methods
Sources of Data
The NINDS t-PA Stroke Data Set data were obtained from the National Technical Information Service in Springfield, Va (www.ntis.gov). The ECASS-II data were obtained from the sponsor, Boehringer-Ingelheim. From each dataset, we used only 5 variables: treatment, 90-day mRS outcome, baseline NIHSS score, sex, and the time from stroke onset to treatment (hereafter abbreviated as onset to treatment time, OTT). Because the NINDS and ECASS-II trials focused on treatment within 6 hours, OTT was analyzed as a continuous variable.
Definitions of NIHSS Score Strata
As per SAINT I, we stratified each dataset on baseline severity. On the basis of the literature, we chose 2 different NIHSS stratification schemes: for TOAST,6 0 to 6, 7 to 15, and 16 to 40, and for ECASS-I,7 0 to 5, 6 to 10, 11 to 15, 16 to 20, and 21 to 40. For descriptive purposes, we refer hereafter to mild, moderate, and severe strokes. Patients with baseline NIHSS scores of 0 to 5 or 6 had “mild” strokes. Patients with baseline NIHSS scores of 7 to 22 had “moderate” strokes, and 23 to 40 indicated “severe” strokes.
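The two stratification schemes can be sketched in a few lines of Python. This is only an illustration of the cutpoints listed above; the function and constant names are hypothetical and are not taken from the trial databases or from the SAS programs used in the study.

```python
from bisect import bisect_left

# Upper bounds of each NIHSS stratum, as listed in the text.
TOAST_UPPER = [6, 15, 40]            # 0-6, 7-15, 16-40
ECASS1_UPPER = [5, 10, 15, 20, 40]   # 0-5, 6-10, 11-15, 16-20, 21-40


def nihss_stratum(nihss, upper_bounds):
    """Return a label such as '7-15' for a baseline NIHSS score."""
    if not 0 <= nihss <= upper_bounds[-1]:
        raise ValueError("NIHSS score out of range")
    # Index of the first stratum whose upper bound is >= the score.
    idx = bisect_left(upper_bounds, nihss)
    lower = 0 if idx == 0 else upper_bounds[idx - 1] + 1
    return f"{lower}-{upper_bounds[idx]}"


print(nihss_stratum(12, TOAST_UPPER))   # TOAST scheme
print(nihss_stratum(12, ECASS1_UPPER))  # ECASS-I scheme
```

The same score can therefore fall into strata of different width depending on the scheme, which is why the stratum-specific analyses were repeated under both.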
We used the Cochran-Mantel-Haenszel (CMH) test to analyze the distribution of the 90-day 7-category mRS outcome scale in the NINDS and ECASS-II databases. A logistic-regression analysis with dichotomized analyses of the mRS was also performed, where scores of 0 to 2 or 0 to 1 were defined as positive (treatment success) and scores of 3 to 6 or 2 to 6, respectively, were defined as negative (treatment failure). The particular form of the CMH test that was used was the van Elteren test, which is a direct extension of the 2-sample Wilcoxon rank-sum test.8 If the results are displayed as a cross-tabulation with 2 rows (1 for each treatment) and 7 columns (1 for each category of the mRS score), then the van Elteren test uses the modified ridit value as the column score.
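To make the mechanics of the stratified shift test concrete, the following Python sketch computes a van Elteren type statistic. It rests on several assumptions that are ours and not the paper's: records arrive as (stratum, treated, mRS) tuples, the modified ridit score is taken as the within-stratum midrank divided by the stratum size plus 1, and the statistic is referred to a chi-square distribution with 1 degree of freedom. It is not a reproduction of the PROC FREQ computation and may differ from it in detail.

```python
import math
from collections import defaultdict


def midranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def van_elteren(records):
    """records: iterable of (stratum, treated, mrs). Returns (Q, p)."""
    by_stratum = defaultdict(list)
    for stratum, treated, mrs in records:
        by_stratum[stratum].append((bool(treated), mrs))
    diff = var = 0.0
    for rows in by_stratum.values():
        n = len(rows)
        n1 = sum(1 for t, _ in rows if t)
        n0 = n - n1
        if n1 == 0 or n0 == 0:
            continue  # a stratum with one arm only carries no information
        # Modified ridit score: within-stratum midrank / (n + 1).
        scores = [r / (n + 1) for r in midranks([m for _, m in rows])]
        mean = sum(scores) / n
        observed = sum(s for s, (t, _) in zip(scores, rows) if t)
        diff += observed - n1 * mean
        var += n1 * n0 / (n * (n - 1)) * sum((s - mean) ** 2 for s in scores)
    if var == 0:
        return 0.0, 1.0
    q = diff ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return q, math.erfc(math.sqrt(q / 2))
```

With a single stratum the statistic reduces to a rank-sum comparison of the two arms; with several strata, each stratum contributes its own observed-minus-expected score sum and variance before the totals are combined.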
Covariates were added to these analyses in slightly different ways. For the CMH analyses, covariates entered the analysis as stratifying variables because this nonparametric method does not have the linear-regression structure that incorporates continuous covariates. In contrast, the logistic-regression analyses have this structure. Hence, the analyses focused on the effects on the variable, treatment, when adding only 1 categorical covariate, baseline severity of disease in terms of the NIHSS strata. With 1 covariate, the CMH and logistic analyses are directly comparable. With 2 or more covariates, they need not be, because the CMH test must include all interactions between the levels of each covariate, whereas logistic regression does not.
The CMH test produces a probability value but does not typically have an associated OR or effect size. We followed Lees et al1 (SAINT I) and computed the OR for treatment effects from PROC LOGISTIC in SAS but using the full range of the mRS as a 6-category polychotomous outcome (merging mRS scores of 5 with 6) instead of a binary outcome, such as merging scores of 0 to 2 for success and scores of 3 to 6 for failure. Hence, the comparison of the CMH test results with the logistic-regression results relies on another logistic-regression model to obtain an OR for the CMH test.
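As a rough illustration of the 6-category outcome, the sketch below merges mRS 5 with 6 and then computes the odds ratio at each cumulative cutpoint of the merged scale. This is a descriptive calculation only, not the fitted PROC LOGISTIC model: a proportional-odds model summarizes these cutpoint-specific ORs with one common value. The function names are hypothetical and the counts are invented for demonstration; they are not trial data.

```python
def merge_mrs(score):
    """Collapse the 7-category mRS (0-6) to 6 categories by merging 5 and 6."""
    if not 0 <= score <= 6:
        raise ValueError("mRS score must be between 0 and 6")
    return min(score, 5)


def cumulative_odds_ratios(treated_counts, control_counts):
    """Odds ratio for 'mRS <= k', treated versus control, at each cutpoint k.

    Both arguments are counts indexed by merged mRS category (0..5).
    Raises ZeroDivisionError if a cumulative cell is empty.
    """
    n_t, n_c = sum(treated_counts), sum(control_counts)
    cum_t = cum_c = 0
    ratios = []
    for k in range(len(treated_counts) - 1):
        cum_t += treated_counts[k]
        cum_c += control_counts[k]
        ratios.append((cum_t * (n_c - cum_c)) / ((n_t - cum_t) * cum_c))
    return ratios


# Invented counts, for illustration only (not from NINDS or ECASS-II).
treated = [20, 20, 20, 20, 10, 10]
control = [10, 20, 20, 20, 15, 15]
for k, odds_ratio in enumerate(cumulative_odds_ratios(treated, control)):
    print(f"mRS <= {k}: OR = {odds_ratio:.2f}")
```

If the cutpoint-specific ORs are roughly equal, a single common OR describes the shift well; if they vary widely, the common OR averages over heterogeneous effects.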
For randomized clinical trials such as NINDS and ECASS-II, the prospective risk ratio is more appropriate than the OR for describing treatment effects. In general, the risk ratio lies closer to 1.0 than the OR. The SAS procedures PROC FREQ and PROC LOGISTIC produce simple, familiar ORs, whereas for a binary outcome, regression-based estimates of relative risk can be obtained from the SAS procedure PROC GENMOD.9 For simplicity and for consistency with Lees et al,1 we have chosen to compare effect sizes in terms of ORs. However, if these studies were new clinical trials, we would instead have reported relative effect sizes.
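The relation between the two measures can be seen from a single 2-by-2 table. The short Python sketch below uses invented counts (not trial data) and a hypothetical helper name; it simply evaluates the textbook definitions of the risk ratio and the odds ratio.

```python
def risk_ratio_and_odds_ratio(success_t, n_t, success_c, n_c):
    """Return (risk ratio, odds ratio) for treated versus control."""
    risk_t = success_t / n_t
    risk_c = success_c / n_c
    rr = risk_t / risk_c
    # Cross-product form of the odds ratio.
    odds_ratio = (success_t * (n_c - success_c)) / ((n_t - success_t) * success_c)
    return rr, odds_ratio


# Invented example: 50 of 100 treated and 40 of 100 control patients succeed.
rr, odds_ratio = risk_ratio_and_odds_ratio(50, 100, 40, 100)
print(f"risk ratio = {rr:.2f}, odds ratio = {odds_ratio:.2f}")
```

In this example the risk ratio is 1.25 and the OR is 1.5, so the risk ratio lies closer to 1.0, as stated above. The gap widens as the outcome becomes more common.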
The series of analyses had a detailed structure. We analyzed each entire dataset, analyzed the subset of patients with baseline NIHSS scores from 7 to 22, and finally separately analyzed each stratum within each of the 2 stratification schemes. Thus, we chose to report only a selected part of this large volume of results.
Analyses of the entire dataset were organized as follows. Each shift analysis was carried out with the 7-category mRS outcome scale in parallel with the 2 binary-outcome logistic-regression analyses that respectively used 0 to 2 for success and 0 to 1 for success. Each analysis was done without covariates, controlling for baseline severity, and controlling for baseline severity and the interaction between treatment and baseline severity. In addition, the shift and binary analyses were carried out with the factors sex and OTT. To incorporate OTT into the nonparametric CMH test, the variable was converted into a 2-category variable (above and below the median OTT) and then added to the analysis as a stratifying variable. Much the same set of analyses was done on the subset of patients with moderate strokes, ie, with baseline NIHSS scores that ranged from 7 to 22, except that we did not test the extra factors sex and OTT.
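The conversion of OTT into a 2-category stratifying variable can be sketched as follows. The text does not say how values exactly equal to the median were assigned, so placing them in the lower category is an assumption of this sketch, as are the function name and the example times in minutes.

```python
from statistics import median


def split_at_median(ott_values):
    """Label each onset-to-treatment time relative to the sample median.

    Assumption: values equal to the median go to the 'at_or_below' group.
    """
    cut = median(ott_values)
    return ["at_or_below" if v <= cut else "above" for v in ott_values]


# Invented onset-to-treatment times in minutes, for illustration only.
times = [60, 90, 120, 150]
print(split_at_median(times))
```

The resulting labels can then be crossed with the NIHSS strata to form the cells over which a stratified test is computed.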
Finally, we analyzed each stratum within the TOAST and ECASS-I stratification schemes. Only the shift test with strata and the χ2 test were done. These stratum-specific analyses did not examine the effect of sex and OTT. The analyses were carried out in SAS version 9 with PROC FREQ to obtain the results for the van Elteren CMH test and PROC GENMOD and PROC LOGISTIC to obtain results for logistic-regression analyses.
Selected baseline characteristics and outcomes are summarized in Table 1, 1 panel for the NINDS study and 1 for the ECASS-II study. These profiles do not exactly match the published tables in the seminal articles on these studies, but neither do they differ in any substantial way. Following the analysis by Ingall et al,10 there may be minor differences between the dataset used to produce the original study and the archived datasets produced after the study.
Overall, each dataset showed a statistically significant shift in the 90-day mRS distributions favoring t-PA (Table 2, the Figure). The treatment effects were similar to the dichotomized outcome, mRS 0 to 2 (Table 3). However, the treatment effects differed between the shift and the mRS 0 to 1 outcome (Table 3). When stratified for NIHSS baseline severity score, the shift test was no longer significant in the NINDS trial, whereas it remained significant in the ECASS-II trial (Table 3).
For ECASS-II, larger shift effects appeared in the NIHSS strata 0 to 6 (P=0.05) and 16 to 40 (P=0.004) in contrast to stratum 7 to 15 (P=0.6) (Table 4). Similar treatment effects were seen at the ends of the NIHSS strata, when they were divided into finer categories (Table 5). In addition, similar effects were seen in the binary mRS 0 to 2 analysis (Table 4).
In contrast, for NINDS, none of the shift effects were significant in the NIHSS strata 0 to 6, 7 to 15, or 16 to 40 (Table 4). When the strata were divided further according to the ECASS-I strata, there were marginal probability values in categories 11 to 15 (P=0.08, Table 5). Removing the low and high strata, the CMH-stratified shift test was no longer significant in either dataset (Table 6). Repeated analyses with fewer and broader strata led to similar results. Logistic regression confirmed that the effect was not uniform across the strata.
We also investigated whether the shift or dichotomized approaches were affected by sex or OTT. We ran all models with all cutpoints on these variables and found no significance after controlling for baseline NIHSS severity scores (data not shown).
The van Elteren form of the shift test, used to analyze the overall distribution of the mRS outcome scores in the t-PA–treated versus placebo-treated groups, was positive in the NINDS and ECASS-II trials (Table 2, the Figure). These results indicate that patients treated with t-PA up to 6 hours after stroke onset shifted in a favorable direction toward a better health state compared with the placebo-treated patients. ECASS-II would have been a positive trial if the shift analysis had been included as a primary outcome. The results also suggest that other acute stroke trials might be reanalyzed with this particular shift test. Saver2 applied a novel shift analysis to the NINDS trial with the use of expert opinions from stroke physicians to predict shifts and also demonstrated that t-PA causes beneficial effects across the range of mRS scores compared with placebo-treated patients. Similar to the SAINT I trial, we did not predict shifts but used standard statistical software to test the overall population shift.
Our analysis, however, indicated that the shift test did not dramatically outperform the dichotomized analysis in the NINDS and ECASS-II databases when the dichotomization was 0 to 2 versus 3 to 6 (Table 3). The results from the shift test closely resembled the binary analysis when success was defined as achieving an mRS score of 0 to 2. The results for the mRS 0 to 1 logistic regression qualitatively differed from the results of the shift analysis (Table 3). In the ECASS-II database, the mRS dichotomization at 0 to 2 versus 3 to 6 was statistically significant, but the prespecified dichotomization at 0 to 1 versus 2 to 6 was not statistically significant. The data overall do not support the selective use of a shift over a dichotomized 0 to 2 analysis in future thrombolysis trials; rather, both outcomes may be appropriate trial end points for recanalization therapies in acute ischemic stroke.
Because of the large stratum sizes of patients with different degrees of stroke severity, we performed the van Elteren test within individual NIHSS baseline strata. The strata were based on the TOAST6 study and divided patients with mild (0–6), moderate (7–15), and severe (16–40) deficits. This analysis showed differences in effects (as opposed to 1 global effect) among the different strata, ie, t-PA did not affect all patients equally (Table 4). In both NINDS and ECASS-II, t-PA treatment appeared to have no effect within certain strata, evidence against a uniform global shift (Table 4). Reforming the NIHSS strata with different cutpoints according to the ECASS-I study (Table 5) did not alter these findings. The binary outcome showed a heterogeneous pattern similar to the shift analysis, further indicating that treatment effects were not uniform across the study groups in either trial. Rather than causing small, uniform incremental changes toward wellness in the overall study population, t-PA likely leads to large treatment effects, small effects, no changes, and detrimental effects.
Surprisingly, the effect sizes of the shift test in the ECASS-II trial were more pronounced at the fringes of the NIHSS strata (Tables 4 and 5). After excluding patients with mild and very severe strokes, the overall significance of the shift test was lost in the ECASS-II and NINDS databases (Tables 1 and 6). We had anticipated that the middle range of the NIHSS would show stronger shift effects compared with the ends, but instead, patients with mild and severe strokes had the more favorable odds. These findings agree with prior reports that t-PA–treated patients with NIHSS scores >20 improve more than placebo patients.11 Therefore, future acute stroke trials should not reflexively exclude patients outside the range of NIHSS baseline scores from 7 to 22, as advised by experts regarding recommendations on the design of future acute stroke trials.12
Random variation alone may have created these peculiarities, which warrant further exploration. The Breslow test, a formal test of the equality of the ORs among the strata, did not reject the null hypothesis of equal odds across the strata. These studies were not powered to detect such subgroup differences. However, the uneven effect sizes among the NIHSS strata persisted no matter how we divided the baseline NIHSS cutoffs. The high ORs in the extreme strata of the ECASS-II data were unexpected and suggest at the very least that future studies should examine whether t-PA leads to beneficial shifts on the mRS in patients with mild and severe strokes. The stratified analysis also yielded uneven results across the NIHSS strata in a comparison between the 2 trials. The differences between the stratum ORs within each of these large studies and between the studies support the need to do secondary, exploratory analyses. Exploratory analysis with and without stratification might indicate whether the strata help to explain the primary end point results. Differences in the strata imply focal effects within the study population that may warrant further investigation.
Others share our concern over the irregular pattern among the ORs. For example, the summary of the STAIR V meeting suggested checking that the odds conform to the proportional-odds assumption.13 The CMH test does not make this assumption.8 The assumption arises in a logistic-regression analysis with an outcome having 3 or more categories and imposes an ordinal structure on the categories.8 Future investigators intent on testing for a regular pattern should probably use logistic regression for this task.
Overall and within each stratum, we found little or no predictive power in the 2 factors, OTT and sex, after controlling for baseline severity. If a future study has a binary primary end point such as mRS 0 to 2 versus mRS 3 to 6, then one might include baseline severity and OTT as covariates. We advocate including baseline severity. The typical practice of restricting eligibility to NIHSS scores within a range such as 7 to 22 reflects this principle. Secondary end points might include the shift test with and without stratification by baseline severity. Stratifying also by OTT may add little to exploratory analysis, particularly if baseline severity captures “delay” effects, ie, an increase in severity during the hours before receiving treatment.
In conclusion, the mRS scores in t-PA–treated patients compared with placebo control support a therapeutic effect from thrombolysis and confirm that IV t-PA is beneficial in acute stroke when given within 6 hours of symptom onset. However, the uneven results across baseline severity strata undermine the efficiency of the shift analysis, and for this reason, it does not markedly outperform the binary-outcome tests. In other words, patients in these 2 trials were not making small improvements along the entire range of the mRS scale. It is possible that the shift approach might outperform the dichotomized outcome in studies where the therapeutic benefits and harm would be less dramatic, such as with neuroprotective agents. Neuroprotection is hypothesized to preserve some brain tissue in acute stroke, but thrombolysis has more potential for substantial benefits and possible cures (mRS 0, 1, or 2) because of recanalization of occluded vessels and rescue of larger amounts of ischemic brain. This hypothesis may explain why statistical significance was achieved on the shift analysis in the SAINT I trial, whereas there was no significance on most dichotomized outcomes. Therefore, a shift test may be more appropriate for neuroprotection trials, and it may be most useful where the effect of the intervention is uncertain (across the range of stroke severity), because a dichotomous approach may, by chance, miss an observed effect.
Source of Funding
This study was supported by a fellow to faculty transition award (to S.I.S.).
E.B. is an employee at Boehringer Ingelheim. W.H. has received honoraria for speaking at major symposia for Boehringer Ingelheim and for participating in DSMB and a steering committee for Boehringer Ingelheim. M.F. has served on the speaker’s bureau for Boehringer Ingelheim and has served as a consultant to Boehringer Ingelheim and Genentech. All other authors report no conflicts of interest.
- Received March 27, 2007.
- Accepted May 8, 2007.
Hacke W, Kaste M, Fieschi C, von Kummer R, Davalos A, Meier D, Larrue V, Bluhmki E, Davis S, Donnan G, Schneider D, Diez-Tejedor E, Trouillas P. Randomised double-blind placebo-controlled trial of thrombolytic therapy with intravenous alteplase in acute ischaemic stroke (ECASS II). Second European-Australasian Acute Stroke Study Investigators. Lancet. 1998; 352: 1245–1251.
Lees KR, Davalos A, Davis SM, Diener HC, Grotta J, Lyden P, Shuaib A, Ashwood T, Hardemark HG, Wasiewski W, Emeribe U, Zivin JA. Additional outcomes and subgroup analyses of NXY-059 for acute ischemic stroke in the SAINT I trial. Stroke. 2006; 37: 2970–2978.
Hacke W, Kaste M, Fieschi C, Toni D, Lesaffre E, von Kummer R, Boysen G, Bluhmki E, Hoxter G, Mahagne MH, et al. Intravenous thrombolysis with recombinant tissue plasminogen activator for acute hemispheric stroke. The European Cooperative Acute Stroke Study (ECASS). JAMA. 1995; 274: 1017–1025.
Stokes ME, Koch GG. Categorical Data Analysis Using the SAS System. Cary, NC: SAS Institute; 2000.
Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. Am J Epidemiol. 2005; 162: 199–200.
Ingall TJ, O’Fallon WM, Asplund K, Goldfrank LR, Hertzberg VS, Louis TA, Christianson TJ. Findings from the reanalysis of the NINDS tissue plasminogen activator for acute ischemic stroke treatment trial. Stroke. 2004; 35: 2418–2424.
Devuyst G, Bogousslavsky J. Recent progress in drug treatment for acute stroke. J Neurol Neurosurg Psychiatry. 1999; 67: 420–425.
Recommendations for clinical trial evaluation of acute stroke therapies. Stroke. 2001; 32: 1598–1606.
Fisher M, Hanley DF, Howard G, Jauch EC, Warach S. Recommendations from the STAIR V meeting on acute stroke trials, technology and outcomes. Stroke. 2007; 38: 245–248.