# Can We Improve the Statistical Analysis of Stroke Trials?

## Statistical Reanalysis of Functional Outcomes in Stroke Trials


## Abstract

** Background and Purpose—** Most large acute stroke trials have been neutral. Functional outcome is usually analyzed using a yes or no answer, eg, death or dependency versus independence. We assessed which statistical approaches are most efficient in analyzing outcomes from stroke trials.

** Methods—** Individual patient data from acute, rehabilitation and stroke unit trials studying the effects of interventions which alter functional outcome were assessed. Outcomes included modified Rankin Scale, Barthel Index, and “3 questions”. Data were analyzed using a variety of approaches which compare 2 treatment groups. The results for each statistical test for each trial were then compared.

** Results—** Data from 55 datasets were obtained (47 trials, 54 173 patients). The test results differed substantially: approaches which use the ordered nature of functional outcome data (ordinal logistic regression, *t* test, robust ranks test, bootstrapping the difference in mean rank) were more efficient statistically than those which collapse the data into 2 groups (χ^{2}; ANOVA, *P*<0.001). The findings were consistent across different types and sizes of trial and for the different measures of functional outcome.

** Conclusions—** When analyzing functional outcome from stroke trials, statistical tests which use the original ordered data are more efficient and more likely to yield reliable results. Suitable approaches included ordinal logistic regression, *t* test, and robust ranks test.

The management of patients with acute or recent stroke has benefited significantly from the results of randomized controlled trials and meta-analyses of these. For example, functional outcome is improved with alteplase, aspirin, management in a Stroke Unit, and community occupational therapy.^{1–7} In contrast, some studies were overtly negative, finding that treatment worsened outcome, eg, DCLHb, enlimomab, selfotel, or tirilazad.^{8–11} However, the majority of acute stroke trials have been neutral in spite of positive preclinical findings. The failure of these latter studies can be attributed to multiple causes, including the relevance of laboratory findings to clinical stroke,^{12} inadequate sample size,^{13} the choice of primary outcome, and its statistical analysis.

Measures of functional outcome such as the modified Rankin Scale (mRS),^{14} Barthel Index (BI)^{15} and “3-questions”^{16} are ordinal in nature: that is, they consist of ≥3 categories which have a natural ordering, eg, the mRS has 7 categories ranging from no symptoms to dead. It might then be expected that statistical analysis would preserve and use the data in this ordinal form. However, most published trials have used a “yes/no” (dichotomized) analysis of functional outcome, eg, combining categories within the mRS into 2 groups, such as “dead or dependent” (eg, mRS 3 to 6) and “independent” (mRS 0 to 2), and then comparing these between the treatment groups. Unfortunately, there is little agreement on where mRS data should be divided (eg, 0,1 versus 2 to 6,^{1} 0 to 2 versus 3 to 6,^{17} or 0 to 3 versus 4 to 6^{18}) and whether this matters.^{19} Furthermore, collapsing data in this way generally lowers statistical power, and therefore reduces the chance of finding a significant treatment effect, because information from many subjects is ignored. For example, patients who respond to treatment by achieving an mRS of 3 rather than 4, or 0 rather than 1, are not detected in an analysis comparing mRS 0 to 2 with 3 to 6.
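To make the loss of information concrete, here is a minimal Python sketch with invented counts (hypothetical, not from any trial): the treated arm differs from control only in that 10 patients sit at mRS 3 instead of mRS 4. The function names are illustrative, and the mean uses the category labels 0 to 6 as scores purely for demonstration.

```python
# Hypothetical counts of patients at mRS 0, 1, ..., 6 in each arm (100 each).
# Invented for illustration only; not data from any trial.
control = [10, 15, 20, 20, 15, 10, 10]
treated = [10, 15, 20, 30, 5, 10, 10]  # 10 patients moved from mRS 4 to mRS 3


def proportion_independent(counts, cut=2):
    """Share of patients with mRS <= cut (the 'independent' group)."""
    return sum(counts[: cut + 1]) / sum(counts)


def mean_score(counts):
    """Mean mRS, using the category labels 0..6 as scores."""
    return sum(score * n for score, n in enumerate(counts)) / sum(counts)


for label, counts in (("control", control), ("treated", treated)):
    print(label, proportion_independent(counts), mean_score(counts))
```

Both arms have 45% of patients at mRS 0 to 2, so the dichotomized comparison sees no difference at all, while the mean score falls from 2.85 to 2.75.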

Inadequacies in the statistical analysis of trials in acute stroke are apparent in 2 examples. First, the ECASS II trial of alteplase showed no treatment effect for its primary outcome (when comparing mRS 0,1 with mRS 2 to 6) but was positive when reanalyzed using the data collapsed in a different place (mRS 0 to 2 versus 3 to 6)^{20} or when analyzed using a “bootstrapping” technique (Figure 1).^{21} Second, 5 trials of tirilazad individually showed no treatment effect when analyzed using dichotomous outcomes,^{22–24} although a meta-analysis found that the intervention was associated with a worse outcome;^{25} post hoc analysis then suggested that one of these trials was negative^{24} (not neutral) when analyzed using a method which preserved the original ordered data (P.B., unpublished data, 2004).

We aimed to identify which statistical methods might optimize the analysis of data from functional outcome scales in stroke trials.

## Methods

### Identification of Trials

We sought individual patient data from randomized controlled trials assessing functional outcome after stroke for interventions which were either positive or negative according to the trial publication, or were included in a meta-analysis showing benefit or harm; neutral trials in a neutral meta-analysis were excluded. Published studies (full article or abstract) fulfilling these criteria were identified from electronic searches of the Cochrane Library (to the end of 2005). In each case, we invited the chief investigator to join the collaboration and share their data. In some cases where individual patient data could not be obtained, it was possible to reconstruct them from the original publication.
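Where a publication reports only the number of patients at each outcome score, individual records can be rebuilt by repeating each (arm, score) pair as many times as its count. A minimal sketch, assuming the counts are listed in scale order; `expand_counts` and the example table are hypothetical, not code or data from the collaboration.

```python
def expand_counts(counts_by_arm, scores=None):
    """Rebuild one (arm, score) record per patient from published counts.

    counts_by_arm maps an arm label to a list of patient numbers, one per
    outcome level in scale order. scores optionally gives the score for each
    level (default 0, 1, 2, ...).
    """
    records = []
    for arm, counts in counts_by_arm.items():
        values = list(scores) if scores is not None else list(range(len(counts)))
        if len(values) != len(counts):
            raise ValueError(f"{arm}: {len(counts)} counts but {len(values)} scores")
        for score, n in zip(values, counts):
            records.extend([(arm, score)] * n)
    return records


# Hypothetical mRS table (patients at mRS 0..6) for a 2-arm trial.
table = {"control": [5, 8, 12, 10, 9, 4, 7], "active": [7, 10, 12, 9, 7, 3, 5]}
records = expand_counts(table)
print(len(records))
```

For the BI, which is scored in steps of 5, passing `scores=range(0, 105, 5)` maps the 21 positions to their actual scores.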

### Trial Data

Demographic (age, gender), trial (setting, intervention, length of follow-up, result), patient severity, and functional outcome (BI, mRS, “3 question” scale [3Q, a derivative of the mRS], or another measure) data were collected for each trial. In factorial trials or those having >2 treatment groups, data were analyzed for each comparison of active therapy versus control. Where outcome data were scored at several time points (eg, 1, 3, and 6 months), the time point used for the primary outcome was included.

### Statistical Tests

We compared different statistical tests for assessing treatment effect. Some of these required the data to be collapsed into groups (such as the χ^{2} test), whereas others used the original ordinal data (such as the Wilcoxon test and the *t* test). Tests which dichotomized (“yes/no”) the data were applied several times, with the data collapsed at different cut points, eg, mRS 0,1 versus 2 to 6, 0 to 2 versus 3 to 6, and 0 to 5 versus 6. A description of the statistical tests used is given in the supplemental Appendix I, available online at http://stroke.ahajournals.org.
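Repeating a dichotomized analysis at several cut points can be sketched as a 2×2 Pearson χ^{2} test applied once per split. This is a minimal standard-library sketch, assuming counts per mRS level (0 to 6) for each arm and no continuity correction (as stated in supplemental Appendix I); the function names and default cuts (0,1 versus 2 to 6; 0 to 2 versus 3 to 6; 0 to 5 versus 6) are illustrative. The 1-df p-value uses the identity P(χ^{2} > x) = erfc(√(x/2)).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
        treated: a (good), b (poor)
        control: c (good), d (poor)
    Returns (statistic, two-sided p-value on 1 degree of freedom)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column: no information
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def dichotomized_tests(treated, control, cuts=(1, 2, 5)):
    """Collapse mRS counts (levels 0..6) at each cut, scores <= cut versus
    > cut, and run a 2x2 chi-squared test for every split."""
    results = {}
    for cut in cuts:
        a, b = sum(treated[: cut + 1]), sum(treated[cut + 1 :])
        c, d = sum(control[: cut + 1]), sum(control[cut + 1 :])
        results[cut] = chi2_2x2(a, b, c, d)
    return results


# Hypothetical counts at mRS 0..6 for each arm.
treated = [12, 14, 16, 18, 14, 12, 14]
control = [8, 12, 15, 18, 16, 14, 17]
for cut, (stat, p) in dichotomized_tests(treated, control).items():
    print(f"mRS 0-{cut} versus {cut + 1}-6: chi2 = {stat:.2f}, P = {p:.3f}")
```

Running the same 2 arms through several cuts shows how the p-value of a dichotomized analysis can depend on where the scale is divided.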

### Comparison of Statistical Tests

Each data set was analyzed using each statistical test. Within each trial, the results were then ranked, with the lowest rank given to the test which produced the most significant result (ie, the largest *z* score). A 2-way analysis of variance of these ranks was then used to determine which statistical tests had, on average, produced the lowest ranks. This allowed us to order the statistical tests by their efficiency in identifying treatment effects. We also counted how many statistically significant results (at the 5% level) each test produced.
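The ranking step can be sketched as follows, assuming one dictionary of *z* scores per comparison with every test present in each; tied *z* scores share the average rank. This computes only the mean rank per test: the 2-way ANOVA on the ranks is not reproduced, and the test names and scores are hypothetical.

```python
from statistics import mean


def rank_within_trial(z_by_test):
    """Rank tests within one trial: largest z score gets rank 1; tied z
    scores share the average of the ranks they occupy."""
    ordered = sorted(z_by_test, key=lambda test: -z_by_test[test])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and z_by_test[ordered[j + 1]] == z_by_test[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def mean_rank_by_test(trials):
    """trials: one {test_name: z_score} dict per comparison, same tests in each.
    Returns (mean rank, test_name) pairs, best (lowest) first."""
    per_test = {}
    for z_by_test in trials:
        for test, rank in rank_within_trial(z_by_test).items():
            per_test.setdefault(test, []).append(rank)
    return sorted((mean(ranks), test) for test, ranks in per_test.items())


# Hypothetical z scores for 3 tests in 2 trials.
trials = [
    {"ordinal_logistic": 2.5, "chi2_2x2": 1.0, "t_test": 2.5},
    {"ordinal_logistic": 3.0, "chi2_2x2": 0.5, "t_test": 2.0},
]
print(mean_rank_by_test(trials))
```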

To assess the validity and reliability of the results, a number of supplementary analyses were carried out. First, the comparison of statistical tests was repeated within subgroups of trials sharing similar characteristics; second, the statistical assumptions of the tests were assessed; and last, the sensitivity of the tests was explored to make sure treatment effects were only detected when they truly existed (the type 1 error rate). Technical details of these supplementary analyses can be found in the supplemental Appendix II, available online at http://stroke.ahajournals.org.

Analyses were carried out in SAS (version 8.2) and Stata (version 7), and significance was taken at *P*<0.05.

## Results

### Trial Characteristics

A total of 55 comparisons of active versus control treatment (54 173 patients) were included, comprising individual patient data from 38 trials and summary data extracted from the publications of a further 9 studies; 6 trials had 2 active treatment groups and 1 had 3 active groups, giving a further 8 comparisons (Figure 2). The data related to 34 acute stroke trials, 7 trials of rehabilitation (1164 patients), and 6 trials of stroke units (1399 patients). BI was used to measure functional outcome in 22 trials, the mRS in 18, the 3Q scale in 3, the Rivermead scale in 1, and the Nottingham ADL scale in 2 related trials; 1 trial used its own ordinal measure.^{26} Included trials studied the following interventions: abciximab (AbESTT); alteplase (ATLANTIS A & B, ECASS II, NINDS); aspirin (CAST, IST); atenolol (BEST); citicoline; DCLHb; ebselen; edaravone; enlimomab (EAST); factor VIIa; feeding (FOOD 3); nadroparin (FISS, FISS-TRIS); nimodipine (INWEST); occupational therapy (Corr, Gilbertson, Logan, TOTAL, Walker); physiotherapy (Young); pro-urokinase (PROACT II); selfotel (ASSIST); streptokinase (ASK, MAST-E, MAST-I); stroke unit (Dover, Helsinki, Kuopio, Nottingham, Orpington, Newcastle); and tirilazad (RANTTAS I & II, STIPAS, TESS I & II). Data relating to 16 trials or interventions which fulfilled the inclusion criteria were not made available.

The method of analyzing functional outcome used in the original trial publications varied considerably (see supplemental Appendix III, available online at http://stroke.ahajournals.org). Twenty-three (48.9%) trials assessed the treatment effect using a method which required the data to be collapsed into groups, eg, the χ^{2} test; 17 (36.2%) used a test based on comparing medians, and 4 (8.5%) used a test which compared means; the remaining 3 trials were unpublished, so their method of analysis is not known.

### Comparison of Statistical Tests

The statistical tests assessed differed significantly in the results they gave for each trial (2-way ANOVA, *P*<0.0001). The ordering of the tests showed that those which analyze the original ordinal data generally performed better than those which collapse the data into ≥2 groups. The most efficient tests included ordinal logistic regression, the *t* test, the robust ranks test, and bootstrapping the difference in mean rank (Table). The subgroup analyses showed the same ordering of tests irrespective of type of intervention (acute, rehabilitation, stroke unit), trial size, time from stroke onset to randomization, patient age, baseline severity, outcome measure, length of follow-up, and trial result (supplemental Appendix IV, available online at http://stroke.ahajournals.org).

When assessed by how many trials gave a statistically significant result, the tests which did not collapse the data into groups again outperformed the other approaches; for example, ordinal logistic regression (using raw data) gave a statistically significant result in 25.9% of trials, whereas the 2×2 χ^{2} test comparing death or poor outcome with an excellent outcome gave a significant result in only 9.3% of trials (Figure 3).

### Test Assumptions and Sensitivity

The statistical assumptions of the *t* test were not met for the majority of trials, and the assumptions of ordinal logistic regression failed for 8 of the 55 data sets; in contrast, the assumptions of the other tests were met. The sensitivity analysis showed that the top-performing statistical tests were not overly sensitive: statistically significant treatment effects were found only where they truly existed (see supplemental Appendix V, available online at http://stroke.ahajournals.org, for detailed results).

## Discussion

These results show that statistical approaches which analyze the original ordinal data for functional outcome are more efficient than those which work on preprocessed data that have been collapsed into ≥2 groups. Interestingly, this point was originally demonstrated mathematically by Shannon in 1948.^{27} In particular, ordinal logistic regression, *t* test, robust ranks test, and bootstrapping (the difference in mean rank) performed well and appear to be useful irrespective of the type of stroke trial, patient, or intervention. Although individual tests based on dichotomized data using χ^{2} analysis (eg, “dead/dependent” versus “independent”) were effective for some data sets, they performed poorly in many and therefore cannot be recommended as general solutions for analyzing stroke trials. From a historical perspective, it is quite possible that trials which collapsed the mRS or BI into 2 groups used a suboptimal analysis, and this may have contributed to falsely neutral findings in some past trials. For example, MAST-E^{28} and STIPAS^{24} were neutral as reported using dichotomous analysis but negative when assessed with ordinal approaches.

Several comments can be made about this study. First, it aimed to include data from all stroke trials assessing a beneficial or harmful intervention. Unfortunately, data were not made available for all identified trials; where possible, we reconstructed individual patient data from publications which reported patient numbers by outcome score. The missing data came from a variety of trial types (acute/rehabilitation/stroke unit), trial sizes, and functional outcome measures (mRS/BI), so it is unlikely that a systematic bias was introduced into the findings; however, the precision of the results may have been reduced by the missing trials. Second, we did not exhaustively search for all possible statistical tests relevant to the problem of analyzing ordered categorical data; instead, we focused on those approaches which are available in standard statistical textbooks and computer packages. Additionally, we could not include some tests used in recent trials, eg, patient-specific outcomes^{29} and the Cochran-Mantel-Haenszel test,^{30} because these require access to individual data for both baseline and outcome variables, and these data were not uniformly available. Third, some of the statistical assumptions underlying the more efficient tests were not met in all trials; for example, the *t* test assumes that data are normally distributed, whereas ordinal logistic regression assumes that any treatment effect is similar across outcome levels (“proportionality of odds”, ie, the odds of moving a treated patient from mRS 2 to 1 are similar to those of moving them from 5 to 4). Nevertheless, the robustness of these tests to deviations from their underlying assumptions means that they remain relevant for analyzing functional outcome data from stroke trials.

If alternative approaches to analyzing functional outcome data are to be used in the future, it is pertinent to ask how sample size should be calculated at the trial design stage. Historically, most calculations assumed that functional outcome would be dichotomized and analyzed using a χ^{2} test approach.^{13} Although future trials could continue to calculate sample size in the same way (and then gain extra power by analyzing their data using an ordinal approach), specific sample size calculations are available when data are to be analyzed using ordinal logistic regression^{31} or the *t* test. Ideally, the extra power gained by using an ordinal statistical approach should not be used to reduce sample size; stroke trials have been too small in the past, as shown in a recent meta-analysis,^{13} and this may also have contributed to the failure of some of them.
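As an illustration of an ordinal sample size calculation, the sketch below uses Whitehead's (1993) approximation for a proportional-odds comparison with 1:1 allocation; we assume, but have not confirmed, that this is the kind of method cited.^{31} It needs the anticipated category probabilities averaged over the 2 arms and a target common odds ratio. The *t* test version is the usual normal approximation for comparing 2 means. Function names and all planning values are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    """z_{1-alpha/2} + z_{power} for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_total_ordinal(mean_probs, odds_ratio, alpha=0.05, power=0.90):
    """Total N (1:1 allocation) for a proportional-odds comparison, using
    Whitehead's approximation
        N = 12 (z_{1-alpha/2} + z_{1-beta})^2 / ((log OR)^2 (1 - sum p_k^3)),
    where p_k are the anticipated category probabilities averaged over arms."""
    spread = 1 - sum(p ** 3 for p in mean_probs)
    return math.ceil(12 * _z_sum(alpha, power) ** 2 / (math.log(odds_ratio) ** 2 * spread))


def n_per_arm_t(delta, sd, alpha=0.05, power=0.90):
    """Per-arm n to detect a mean difference delta with common SD sd
    (normal approximation): n = 2 sd^2 (z_{1-alpha/2} + z_{1-beta})^2 / delta^2."""
    return math.ceil(2 * sd ** 2 * _z_sum(alpha, power) ** 2 / delta ** 2)


# Hypothetical planning values: 7 mRS levels, common odds ratio of 1.5.
print(n_total_ordinal([0.10, 0.15, 0.15, 0.15, 0.15, 0.10, 0.20], 1.5))
```

With only 2 categories the ordinal formula reduces to the usual approximation for a log odds ratio; spreading patients over more categories makes 1 − Σp_k^3 larger and so lowers the required N for the same odds ratio.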

A further issue with using a statistical test which analyzes ordered categorical data is how to report the results to patients, carers, clinicians, and health-policy makers. The results of dichotomous tests may be summarized easily as the proportion of patients who benefit (or suffer) with a treatment, eg, alteplase reduced absolute death or dependency (mRS >1) by 13% in the NINDS part 2 trial.^{1} In contrast, ordinal tests will need to be presented as the average absolute improvement in outcome, eg, alteplase improved the mRS by 1 (of 7) point and the BI by 22.5 (of 100) points. Alternatively, the common odds ratio and its confidence interval would be reported if ordinal logistic regression was used. In this respect, health consumers will need to decide what differences in mRS and BI are worthwhile, both clinically and in terms of health economics. In practice, it is reasonable to present the effect on functional outcome using both the absolute percentage change and the mean or median change in functional outcome score, and to show these data graphically (as in Figure 1).
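Reporting both kinds of summary side by side can be sketched as below, assuming individual scores per arm and that lower scores are better (as on the mRS; the comparison would need reversing for the BI). The function name, cut point, and example scores are hypothetical.

```python
from statistics import mean, median


def summarize_effect(control_scores, treated_scores, cut):
    """Dichotomous and ordinal summaries of the same comparison.

    Assumes lower scores are better (as on the mRS). 'Poor' means score > cut.
    Differences are treated minus control, so negative values favor treatment.
    """
    def pct_poor(scores):
        return 100 * sum(s > cut for s in scores) / len(scores)

    return {
        "abs_pct_change_poor": pct_poor(treated_scores) - pct_poor(control_scores),
        "mean_difference": mean(treated_scores) - mean(control_scores),
        "median_difference": median(treated_scores) - median(control_scores),
    }


# Hypothetical individual mRS scores.
print(summarize_effect([0, 1, 2, 3, 4], [0, 0, 1, 2, 3], cut=1))
```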

In summary, we suggest that ongoing and future trials should consider using statistical approaches which use the original ordered categorical data in the primary analysis of functional outcome measures. Such ordinal tests include ordinal logistic regression and the robust ranks test; the *t* test may also be used, although its assumptions were not met in the majority of trials.

## Acknowledgments

Secretariat and writing committee: Philip M.W. Bath (chief investigator, Nottingham, UK); Laura J. Gray (lead statistician, Nottingham, UK); Timothy Collier (statistical advisor, London, UK); Stuart Pocock (statistical advisor, London, UK). Statistical advisor: James Carpenter (Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London, UK).

The following contributors provided individual patient data from their trial, and commented on the draft manuscript: Abciximab: H. Adams (USA), E. Barnathan (USA), W. Hacke (Germany); ASK: G. Donnan (Australia); ASSIST 07 & 10: S. Davis (Australia); ATLANTIS A & B: G. Albers, S. Hamilton (USA); BEST Pilot & Main: D. Barer (UK); Citicoline 1, 7, 10, 18: A. Davalos (Spain); Corr: S. Corr (UK); Dover Stroke Unit: P. Langhorne (UK); DCLHb: P. Koudstaal, R. Saxena (Netherlands); Ebselen: T. Yamaguchi (Japan); ECASS II: W. Hacke, E. Bluhmki (Germany); Factor VII: S. Mayer (USA), K. Begtrup (Denmark); FISS: R. Kay (Hong Kong); FOOD 3: M. Dennis (UK); Gilbertson: L. Gilbertson (UK); INWEST: N.-G. Wahlgren, N. Ahmed (Sweden); IST: P. Sandercock (UK); Kuopio Stroke Unit: J. Sivenius (Finland); Logan: P. Logan (UK); MAST-I: L. Candelise (Italy), J. Wardlaw (UK); Newcastle Stroke Unit: H. Rodgers (UK); NINDS: J. Marler (USA); Parker: C. Parker (UK); Nottingham Stroke Unit: N. Lincoln, P. Berman (UK); RANTTAS I & II, STIPAS, TESS I & II: P. Bath (UK); Walker 1 & 2: M. Walker (UK); Young: J. Young, A. Forster (UK).

We thank the patients who took part in these studies, and the trialists who shared their data. The study was conceived, initiated, managed, analyzed, and interpreted independently of any pharmaceutical company. Each collaborator listed above commented on the draft manuscript.

**Sources of Funding**

L.J.G. is funded, in part, by BUPA Foundation and The Stroke Association (UK). P.M.W.B. is Stroke Association Professor of Stroke Medicine. The funding sources had no involvement in this project.

**Disclosures**

None.

- Received September 27, 2006.
- Revision received November 2, 2006.
- Accepted November 21, 2006.

## References

1.
2.
3. Walker M, Gladman J, Lincoln N, Siemonsma P, Whiteley T. Occupational therapy for stroke patients not admitted to hospital: a randomised controlled trial. The Lancet. 1999; 354: 278–280.
4. Chen ZM, Sandercock P, Pan HC, Counsell C, Collins R, Liu LS, Xie JX, Warlow C, Peto R; on behalf of the CAST and IST Collaborative Groups. Indications for early aspirin use in acute ischemic stroke: a combined analysis of 40 000 randomized patients from the Chinese Acute Stroke Trial and the International Stroke Trial. Stroke. 2000; 31: 1240–1249.
5. Stroke Unit Trialists’ Collaboration. Organised inpatient (stroke unit) care for stroke. The Cochrane Library. Oxford: Update Software; 2002.
6. The ATLANTIS, ECASS, and NINDS rt-PA Study Group Investigators. Association of outcome with early stroke treatment: pooled analysis of ATLANTIS, ECASS, and NINDS rt-PA stroke trials. The Lancet. 2004; 363: 768–813.
7. Walker MF, Leonardi-Bee J, Bath P, Langhorne P, Corr S, Drummond A, Gilbertson L, Gladman JRF, Jongbloed L, Parker C. An individual patient data meta-analysis of randomised controlled trials of community occupational therapy for stroke patients. Stroke. 2004; 35: 2226–2232.
8. Saxena R, Wijnhoud AD, Carton H, Hacke W, Kaste M, Przybelski RJ, Stern KN, Koudstaal PJ. Controlled safety study of a hemoglobin-based oxygen carrier, DCLHb, in acute ischemic stroke. Stroke. 1999; 30: 993–996.
9. Davis SM, Lees KR, Albers GW, Diener HC, Markabi S, Karlsson G, Norris J; for the ASSIST Investigators. Selfotel in acute ischemic stroke: possible neurotoxic effects of an NMDA antagonist. Stroke. 2000; 31: 347–354.
10. Enlimomab Acute Stroke Trial Investigators. Use of anti-ICAM-1 therapy in ischemic stroke: results of the enlimomab acute stroke trial. Neurology. 2001; 57: 1428–1434.
11. Bath PMW, Blecic S, Bogousslavsky J, Boysen G, Davis S, Diez-Tejedor E, Ferro JM, Gommans J, Hacke W, Indredavik B, Norrving B, Orgogozo JM, Ringelstein EB, Sacchetti ML, Iddenden R, Bath FJ, Musch BC, Brosse DM, Naberhuis-Stehouwer SA. Tirilazad mesylate in acute ischemic stroke: a systematic review. Stroke. 2000; 31: 2257–2265.
12.
13. Weaver CS, Leonardi-Bee J, Bath-Hexall FJ, Bath PMW. Sample size calculations in acute stroke trials: a systematic review of their reporting, characteristics, and relationship with outcome. Stroke. 2004; 35: 1216–1224.
14.
15. Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Maryland State Medical Journal. 1965: 61–65.
16.
17.
18. Horn J, de Haan R, Vermeulen M, Limburg M. Very early nimodipine use in stroke (VENUS): a randomised, double-blind, placebo-controlled trial. Stroke. 2001; 32: 461–465.
19. Wardlaw J, Sandercock P, Warlow C, Lindley RI. Trials of thrombolysis in acute ischemic stroke: does the choice of primary outcome measure really matter? Stroke. 2000; 31: 1133–1135.
20. Hacke W, Kaste M, Fieschi C, von Kummer R, Davalos A, Meier D, Larrue V, Bluhmki E, Davis S, Donnan G, Schneider D, Diez-Tejedor E, Trouillas P. Randomised double-blind placebo-controlled trial of thrombolytic therapy with intravenous alteplase in acute ischaemic stroke (ECASS II). Lancet. 1998; 352: 1245–1251.
21.
22. Haley EC. High-dose tirilazad for acute stroke (RANTTAS II). Stroke. 1998; 29: 1256–1257.
23. The RANTTAS Investigators. A randomized trial of tirilazad mesylate in patients with acute stroke (RANTTAS). Stroke. 1996; 27: 1453–1458.
24. The STIPAS Investigators. Safety study of tirilazad mesylate in patients with acute ischemic stroke (STIPAS). Stroke. 1994; 25: 418–423.
25. The Tirilazad International Steering Committee. Tirilazad for acute ischaemic stroke (Cochrane Review). Oxford: Update Software; 2002.
26. Barer DH, Cruickshank JM, Ebrahim SB, Mitchell JR. Low dose beta blockade in acute stroke (“BEST” trial): an evaluation. BMJ. 1988; 296: 737–741.
27.
28.
29.
30.
31.

# OAST Supplemental Appendix I: Statistical Tests Compared (see Table I)

### Included Tests

Univariate statistical approaches for analyzing dichotomous and ordinal data comprised tests based on χ^{2}, ordinal, and bootstrap approaches.^{1–3} Sixteen statistical approaches were assessed:

1. χ^{2} 2×2 test: death or poor outcome versus good outcome (BI <60 versus 60 to 100, mRS 3 to 6 versus 0 to 2, 3Q 1/2 versus 3/4)
2. χ^{2} 2×2 test: death or poor outcome versus excellent outcome (BI <95 versus 95/100, mRS 2 to 6 versus 0/1, 3Q 1 to 3 versus 4)
3. χ^{2} 2×2 test: death versus alive
4. χ^{2} 2×3 test (unordered data): death versus poor versus good outcome
5. χ^{2} 2×4 test (unordered data): death versus poor outcome versus good outcome versus excellent outcome
6. Cochran-Armitage trend test (ordered data with 3 levels): death versus poor versus good outcome
7. Cochran-Armitage trend test (ordered data with 4 levels): death versus poor versus good versus excellent outcome
8. ordinal logistic regression (raw data)
9. ordinal logistic regression (3 levels)
10. ordinal logistic regression (4 levels)
11. median test
12. Wilcoxon/Mann-Whitney *U* test (adjusted for ties)
13. robust ranks test (RRT^{4})
14. Kolmogorov-Smirnov test
15. *t* test (unpooled variances)
16. bootstrap of difference in mean rank (with 3×3000 cycles^{5,6})

χ^{2} tests were performed without continuity correction because most trials enrolled >100 patients.
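Tests (6) and (7) can be sketched as follows, using the classical Armitage form of the trend statistic with equally spaced scores by default. Some software uses a slightly different variance (differing by a factor close to 1, such as N/(N − 1)), so results may not match a given package exactly; the function name and argument layout are assumptions for illustration.

```python
import math


def cochran_armitage(treated, control, scores=None):
    """Cochran-Armitage test for trend (classical Armitage form).

    treated, control: counts per ordered outcome level (same length).
    scores: a number for each level; defaults to 0, 1, 2, ...
    Returns (z, two-sided p). Positive z means the treated arm lies at
    higher scores than expected if outcome were unrelated to arm.
    """
    if scores is None:
        scores = range(len(treated))
    scores = list(scores)
    col = [t + c for t, c in zip(treated, control)]
    n = sum(col)
    p_bar = sum(treated) / n  # overall share of patients in the treated arm
    numerator = sum(s * (t - m * p_bar) for s, t, m in zip(scores, treated, col))
    s1 = sum(m * s for m, s in zip(col, scores))
    s2 = sum(m * s * s for m, s in zip(col, scores))
    var = p_bar * (1 - p_bar) * (s2 - s1 ** 2 / n)
    if var == 0:  # all patients at one level or in one arm
        return 0.0, 1.0
    z = numerator / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts for death / poor / good / excellent (4 ordered levels).
print(cochran_armitage([20, 25, 30, 25], [25, 30, 25, 20]))
```

With only 2 levels, the square of this statistic equals the uncorrected 2×2 Pearson χ^{2}.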

### Excluded Tests

Three nonparametric tests were excluded on methodological grounds: the Wald-Wolfowitz runs test, the Siegel-Tukey test, and the Cramér-von Mises 2-sample test.^{2}

### Statistical Detail for Nonstandard Tests

***RRT***

The RRT is an alternative to the Wilcoxon test. It tests whether the median of one group is equal to that of the other but, unlike the Wilcoxon test, it does not assume that the 2 groups have identically shaped distributions, ie, it makes no assumption about the variances of the 2 groups.^{3,4}
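A minimal sketch of the RRT, assuming it is the Fligner-Policello robust rank order test in the form given by Siegel and Castellan: each observation's "placement" is the number of observations in the other group below it, with ties counted as one half. Sorting plus binary search keeps it fast enough for trials with thousands of patients; the function names are illustrative.

```python
import math
from bisect import bisect_left, bisect_right


def _placements(a, b):
    """For each value in a, the number of values in b below it; ties count 1/2."""
    sb = sorted(b)
    out = []
    for value in a:
        below = bisect_left(sb, value)
        ties = bisect_right(sb, value) - below
        out.append(below + 0.5 * ties)
    return out


def robust_rank_test(x, y):
    """Fligner-Policello robust rank order test, two-sided normal approximation.

    Returns (U, p). Positive U means x tends to take larger values than y.
    """
    p = _placements(x, y)
    q = _placements(y, x)
    p_bar, q_bar = sum(p) / len(p), sum(q) / len(q)
    v1 = sum((pi - p_bar) ** 2 for pi in p)
    v2 = sum((qj - q_bar) ** 2 for qj in q)
    denom = 2 * math.sqrt(v1 + v2 + p_bar * q_bar)
    if denom == 0:  # complete separation: the statistic is unbounded
        return math.copysign(math.inf, sum(p) - sum(q)), 0.0
    u = (sum(p) - sum(q)) / denom
    return u, math.erfc(abs(u) / math.sqrt(2))


# Hypothetical mRS scores in 2 small groups.
print(robust_rank_test([0, 1, 1, 2, 3, 3, 4], [1, 2, 3, 3, 4, 5, 6]))
```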

***Bootstrapping***

Bootstrapping is a computationally intensive method which involves repeatedly resampling, with replacement, from the observed sample. Its main advantage over more traditional methods is that it does not require assumptions about the form of the distribution of the data. In this report we bootstrap the difference in mean rank; the procedure is outlined below:^{5}

1. Take a dataset which contains *N* observations.
2. Draw a sample with replacement of size *N* (using replacement means that some of the original observations may appear in the new sample more than once and some not at all).
3. Estimate the parameter of interest (here the difference in mean rank) and store the result.
4. Repeat steps 2 and 3 many times; here we use 3 sets of 3000, as used in the ECASS II trial.^{6}
5. Compare the distribution of the stored results to the actual point estimate from the original dataset.
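The steps above can be sketched in Python, assuming records of (arm, score) with arm coded 0 (control) or 1 (treated), midranks for ties, and a 95% percentile interval as one way of carrying out step 5 (an assumption; the text does not specify how the comparison was made). Resamples that happen to miss an arm are redrawn. Names and the seed are illustrative; the 3 sets of 3000 could be mimicked by calling the function with 3 different seeds.

```python
import random


def midranks(values):
    """Ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def diff_mean_rank(records):
    """Mean rank of arm 1 minus mean rank of arm 0, ranking all scores together."""
    ranks = midranks([score for _, score in records])
    r1 = [r for (arm, _), r in zip(records, ranks) if arm == 1]
    r0 = [r for (arm, _), r in zip(records, ranks) if arm == 0]
    return sum(r1) / len(r1) - sum(r0) / len(r0)


def bootstrap_diff_mean_rank(records, n_boot=3000, seed=1):
    rng = random.Random(seed)
    estimate = diff_mean_rank(records)  # step 1: statistic on the original data
    stored = []
    while len(stored) < n_boot:  # step 4: repeat steps 2 and 3
        sample = rng.choices(records, k=len(records))  # step 2: resample N
        if {arm for arm, _ in sample} != {0, 1}:
            continue  # a resample missing an arm has no defined difference
        stored.append(diff_mean_rank(sample))  # step 3: store the estimate
    stored.sort()  # step 5 (assumed): percentile interval around the estimate
    lower = stored[int(0.025 * n_boot)]
    upper = stored[min(n_boot - 1, int(0.975 * n_boot))]
    return estimate, (lower, upper)


# Hypothetical records: arm 0 = control, arm 1 = treated; scores on the mRS.
records = [(0, s) for s in [3, 4, 4, 5, 6, 2, 3]] + [(1, s) for s in [1, 2, 2, 3, 4, 0, 3]]
print(bootstrap_diff_mean_rank(records, n_boot=3000, seed=1))
```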

***Ordinal Logistic Regression***

Ordinal logistic regression can be used when the dependent variable is ordered categorical. It is similar to logistic regression, but instead of modeling a single binary split it models all cumulative splits of the outcome simultaneously. The number of splits equals the number of ordered categories minus one. For example, if the mRS were the dependent variable of interest, the model would compare the following splits, indexed by *j*:

0 versus 1,2,3,4,5,6

0,1 versus 2,3,4,5,6

0,1,2 versus 3,4,5,6

0,1,2,3 versus 4,5,6

0,1,2,3,4 versus 5,6

0,1,2,3,4,5 versus 6

Ordinal logistic regression provides one overall estimate for each covariate in the model, not one for each cut point. This assumes that the odds ratio is the same whichever cut is taken. So, for example, the odds ratio for the treatment effect would be interpreted as the odds of being in category *j* or above, for all choices of *j*, comparing treatment 1 with treatment 0.^{7}
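The 6 mRS splits listed above can be examined directly from grouped counts. This sketch does not fit the model; it computes the odds ratio at each split so one can see what a single common odds ratio assumes. The function name and orientation are assumptions: it uses the odds of a score at or below cut *j*, so on the mRS an odds ratio >1 favors the treated arm (the reciprocal of the "category *j* or above" orientation in the text).

```python
def cumulative_odds_ratios(treated, control):
    """Odds ratio at each cut j of an ordered scale (counts per level, in order).

    For cut j the odds are P(score <= j) / P(score > j); the ratio is treated
    odds over control odds. With 7 mRS levels this gives 6 ratios, one per
    split. Assumes no split leaves an empty side in either arm.
    """
    ratios = []
    for j in range(len(treated) - 1):
        t_low, t_high = sum(treated[: j + 1]), sum(treated[j + 1 :])
        c_low, c_high = sum(control[: j + 1]), sum(control[j + 1 :])
        ratios.append((t_low / t_high) / (c_low / c_high))
    return ratios


# Hypothetical counts at mRS 0..6.
treated = [12, 14, 16, 18, 14, 12, 14]
control = [8, 12, 15, 18, 16, 14, 17]
print([round(r, 2) for r in cumulative_odds_ratios(treated, control)])
```

If the proportional odds assumption holds, these ratios should be roughly equal, and the fitted model summarizes them with one common odds ratio.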

## References

1. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall; 1991.
2. Conover WJ. Practical Nonparametric Statistics. 2nd ed. New York: John Wiley & Sons; 1971.
3. Siegel S, Castellan NJ. Nonparametric Statistics for the Behavioral Sciences. 2nd ed. Singapore: McGraw-Hill; 1988: 1–399.
4.
5. Efron B, Tibshirani RJ. An introduction to the bootstrap. In: Cox DR, et al, eds. Monographs on Statistics and Applied Probability. Vol. 57. New York: Chapman & Hall; 1993: 1–436.
6.
7.

# OAST Supplemental Appendix II: Supplementary Analyses

### Subgroup Analysis

Subgroup analyses were performed by assessing the efficiency of the different tests for differing trial characteristics: type of intervention (acute drug treatment, rehabilitation, stroke unit); trial size (<500, ≥500 participants); time between randomization and stroke onset (≤6, >6 hours); patient age (median ≤70, >70 years); baseline severity (control group death rate adjusted for length of follow-up, ≤median (0.05), >median); outcome measure (BI, mRS, 3Q); length of follow-up (≤3 months, >3 months); and trial result (positive, negative).

### Statistical Assumptions

The principal statistical assumptions underlying the tests which performed well were assessed to ensure that their use was appropriate for stroke trial data. Assumptions included: ordinal logistic regression—proportionality of odds across response categories (ie, the magnitude of improvement or hazard, with a treatment, would be similar irrespective of baseline severity, age etc); *t* test—normal distribution of outcome scores (the use of the unpooled *t* test means that homogeneity of variances between the treatment groups was not a necessary assumption); RRT—independence of treatment groups.^{1,2}

### Type 1 Error Rate

It is conceivable that an overly sensitive statistical test might find significance in a trial when no real difference existed, a type 1 error. We assessed the type 1 error rate for the 3 most efficient statistical tests, using data from 3 representative trials, each using 1 of the 3 measures of functional outcome (BI: RANTTAS,^{3} mRS: NINDS,^{4} 3Q: IST^{5}). From these we generated 1000 data sets, using random sampling with replacement, in which any treatment difference could have occurred only by chance. A test that holds the nominal 5% type 1 error rate would be expected to give a significant result in around 50 of the 1000 data sets.
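The null resampling idea can be sketched as follows, assuming both arms are drawn with replacement from one pooled set of outcome scores, so that the null hypothesis holds by construction. For brevity this uses only the unpooled *t* statistic with a normal reference distribution; the function names, arm size, and seed are illustrative, and the collaboration's exact resampling scheme is not reproduced here.

```python
import math
import random
from statistics import NormalDist, mean, variance


def unpooled_z(a, b):
    """Difference in means over its unpooled standard error (the Welch t
    statistic, referred here to the normal distribution for large samples)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se


def type1_error_rate(pooled_scores, n_per_arm, n_sims=1000, alpha=0.05, seed=1):
    """Share of null data sets declared significant at level alpha.

    Both arms are drawn with replacement from the same pooled outcomes, so
    any difference between them arises by chance alone."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        a = rng.choices(pooled_scores, k=n_per_arm)
        b = rng.choices(pooled_scores, k=n_per_arm)
        if abs(unpooled_z(a, b)) > crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical pooled mRS scores from a single trial.
pooled = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6] * 30
print(type1_error_rate(pooled, n_per_arm=150))
```

A rate near 0.05 (about 50 of 1000 data sets) indicates that the test holds its nominal level.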

## References

1. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.
2. Siegel S, Castellan NJ. Nonparametric statistics for the behavioral sciences. 2nd ed. Singapore: McGraw-Hill; 1988: 1–399.
3. The RANTTAS Investigators. A randomized trial of tirilazad mesylate in patients with acute stroke (RANTTAS). Stroke. 1996; 27: 1453–1458.
4.
5.

# OAST Supplemental Appendix III: Trial Data (see Tables II and III)

### Excluded Trials

Unavailable data: aptiganel, Day hospital, DIAS, ECASS I, Glycine, Hyperbaric oxygen, LUB-INT-9, Norwegian, Orpington (1993 & 1995), Ronning, Sulter and Goteborg stroke unit trials, PROACT I, STAT, ZK200775.

## References

- 3.↵
Abciximab Emergent Stroke Treatment Trial (AbESTT) Investigators. Emergency treatment of Abciximab for treatment of patients with acute ischemic stroke. Results of a randomized phase 2 trial. Stroke
*.*2005; 36: 880–890. - 3.↵
- 3.↵
Davis SM, et al. Selfotel in acute ischemic stroke: Possible neurotoxic effects of an NMDA antagonist. Stroke
*.*2000; 31: 347–354. - 3.↵
Clark WM, et al. The rtPA (alteplase) 0–6 hour acute stroke trial, part A (A0276g): Results of a double-blind, placebo-controlled, multicenter study. Stroke
*.*2000; 31: 811–816. - 3.↵
- 3.↵
Barer DH, et al. Low dose beta blockade in acute stroke (“BEST” trial): an evaluation. BMJ
*.*1988; 296: 737–741. - 3.↵
- 3.↵
Clark WM, Warach SJ, Pettigrew LC. A randomised dose-response trial of citicoline in acute ischemic stroke patients. Neurology
*.*1977; 49: 671–678. - 3.↵
Clark WM, et al. A randomised efficacy trial of citicoline in patients with acute ischaemic stroke. Stroke
*.*1999; 30: 2592–2597. - 3.↵
- 3.↵
Clark WM, et al. A phase III randomized efficacy trial of 2000mg citicoline in acute ischemic stroke patients. Neurology
*.*2001; 57: 1595–1602. - 3.↵
Saxena R, et al. Controlled safety study of a hemoglobin-based oxygen carrier, DCLHb, in acute ischemic stroke. Stroke
*.*1999; 30: 993–996. - 3.↵
Enlimomab Acute Stroke Trial Investigators. Use of anti-ICAM-1 therapy in ischemic stroke. Neurology
*.*2001; 57: 1428–1434. - 3.↵
Yamaguchi T, et al. Ebselen in acute ischemic stroke. Stroke
*.*1998; 29: 12–17. - 3.↵
- 3.↵
- 3.↵
- 3.↵
- 3.↵
Wong KS, et al. A randomized controlled study of low molecular weight heparin versus aspirin for the treatment of acute ischaemic stroke in patients with large artery occlusive disease. Presented at the 14th European Stroke Conference. 2005. Bologna, Italy.
- 3.↵
- 3.↵
- 3.↵
- 3.↵
- 3.↵
- 3.↵
Morris AD, et al. A pilot study of streptokinase for acute cerebral infarction. Quarterly Journal of Medicine
*.*1995; 88: 727–731. - 3.↵
- 3.↵
- 3.↵
- The RANTTAS Investigators. A randomized trial of tirilazad mesylate in patients with acute stroke (RANTTAS). Stroke. 1996;27:1453–1458.
- Haley EC. High-dose tirilazad for acute stroke (RANTTAS II). Stroke. 1998.
- The STIPAS Investigators. Safety study of tirilazad mesylate in patients with acute ischemic stroke (STIPAS). Stroke. 1994;25:418–423.
- Peters GR, et al. Safety and efficacy of 6 mg/kg/day tirilazad mesylate in patients with acute ischemic stroke (TESS study). Stroke. 1996;27:195.
- Orgogozo JM. TESS II. Unpublished work; 1995.
- Corr S. Occupational therapy for stroke patients after hospital discharge: a randomised controlled trial. Clinical Rehabilitation. 1995;9:291–296.
- Gilbertson L, et al. Domiciliary occupational therapy for patients with stroke discharged from hospital: randomised controlled trial. BMJ. 2000;320:603–606.
- Logan PA, et al. A randomised controlled trial of enhanced social service occupational therapy for stroke patients. Clinical Rehabilitation. 1997;11:107–113.
- Parker CJ, et al. A multicentre randomized controlled trial of leisure therapy and conventional occupational therapy after stroke. Clinical Rehabilitation. 2001;15:42–52.
- Walker M, Drummond A, Lincoln N. Evaluation of dressing practice for stroke patients after discharge from hospital: a crossover design study. Clinical Rehabilitation. 1996;10:25–23.
- Walker M, et al. Occupational therapy for stroke patients not admitted to hospital: a randomised controlled trial. The Lancet. 1999;354:278–280.
- Young JB, Forster A. The Bradford community stroke trial: results at six months. BMJ. 1992;304:1085–1092.
- Stevens RS, Ambler NR. The Dover Stroke Rehabilitation Unit: a randomised controlled trial of stroke management. In: Rose FC, ed. Advances in Stroke Therapy. New York: Raven Press; 1982:257–261.
- Kaste M, Palomaki H, Sarna S. Where and how should elderly stroke patients be treated? A randomized trial. Stroke. 1995;26:249–253.
- Sivenius J, et al. The significance of intensity of rehabilitation of stroke: a controlled trial. Stroke. 1985;16:928–931.
- Aitken PD, et al. General medical or geriatric unit care for acute stroke? A controlled trial (Abstract). Age and Ageing. 1993;22(suppl 2):4–5.

### OAST Supplemental Appendix IV: Results (see Table IV)

### OAST Supplemental Appendix V: Results

#### Type 1 Error Rate

Analysis of 1000 resampled random datasets from the 3 trials^{1–3} found no evidence of an increased type 1 error rate for ordinal logistic regression; the numbers of “positive” datasets were BI 39/1000 (*P*=0.96), mRS 57/1000 (*P*=0.17), and 3Q 56/1000 (*P*=0.21). Similar results were found for the *t* test and the RRT.
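
The reported *P* values can be checked against a simple null model. If a test holds its nominal 5% significance level, the number of “positive” results among 1000 null datasets should follow a Binomial(1000, 0.05) distribution with a mean of 50, and the quoted *P* values look consistent with a one-sided exact binomial test for an excess over that rate. The sketch below assumes that reading (the text does not name the test used), treats “positive” as *P*<0.05 (also an assumption), and uses hypothetical helper names; it needs only the Python standard library.

```python
from math import exp, lgamma, log


def binomial_log_pmf(k: int, n: int, p: float) -> float:
    """Log probability of exactly k successes in n Bernoulli(p) trials."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binomial_upper_tail(x: int, n: int, p: float) -> float:
    """One-sided P value: P(X >= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(exp(binomial_log_pmf(k, n, p)) for k in range(x, n + 1))


# Assumption: a resampled dataset is "positive" when the test gives P < 0.05,
# so about 50 of 1000 null datasets should be positive by chance alone.
ALPHA = 0.05
N_DATASETS = 1000
observed = {"BI": 39, "mRS": 57, "3Q": 56}

for outcome, positives in observed.items():
    p_value = binomial_upper_tail(positives, N_DATASETS, ALPHA)
    print(f"{outcome}: {positives}/{N_DATASETS} positive, one-sided P = {p_value:.2f}")
```

Only a count well above 50 would point to an inflated type 1 error rate; 39/1000 lies below the expected value, which is why its one-sided *P* value is close to 1.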

#### Test Assumptions

For ordinal logistic regression, the proportional odds assumption, assessed with a likelihood ratio test comparing the multinomial logistic model with the ordinal logistic regression model, was not met (*P*<0.05) in 8 of the 55 datasets (ASK, *P*=0.001; ASSIST 07, *P*=0.002; ATLANTIS A, *P*=0.01; citicoline 10, *P*=0.004; FOOD 3, *P*=0.04; MAST-I, *P*=0.003; Orpington Domiciliary care, *P*=0.02; Orpington Team, *P*=0.02). The normality assumption required for the *t* test did not hold for any of the datasets. In contrast, the assumption of the RRT was met in all cases, and the bootstrap approach is assumption-free.
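
The proportional odds assumption says that a single odds ratio describes the treatment effect at every cut-point of the ordered scale. A quick, informal way to see what that means (this is not the likelihood ratio test used above) is to dichotomise the scale at each cut-point in turn and compare the resulting log odds ratios; under proportional odds they should be roughly equal. The function name, the best-to-worst category ordering, and the 0.5 continuity correction are our assumptions, and the example counts are hypothetical.

```python
from math import log
from typing import Sequence


def cumulative_log_odds_ratios(control: Sequence[int], treated: Sequence[int]) -> list[float]:
    """Treated-vs-control log odds ratio of an outcome at or better than each cut-point.

    `control` and `treated` hold counts per outcome category, ordered from best
    to worst (e.g. mRS 0, 1, ..., 6). A K-category scale has K - 1 cut-points.
    Positive values favour the treated group.
    """
    if len(control) != len(treated) or len(control) < 2:
        raise ValueError("need the same number (>= 2) of categories in both groups")
    n_control, n_treated = sum(control), sum(treated)
    ratios = []
    cum_control = cum_treated = 0
    for j in range(len(control) - 1):
        cum_control += control[j]
        cum_treated += treated[j]
        # Assumption: add 0.5 to each cell so empty cells give finite, nonzero odds.
        odds_control = (cum_control + 0.5) / (n_control - cum_control + 0.5)
        odds_treated = (cum_treated + 0.5) / (n_treated - cum_treated + 0.5)
        ratios.append(log(odds_treated / odds_control))
    return ratios


if __name__ == "__main__":
    # Hypothetical counts for a 4-category outcome, not data from any trial.
    control = [10, 20, 30, 40]
    treated = [20, 25, 30, 25]
    for cut, lor in enumerate(cumulative_log_odds_ratios(control, treated), start=1):
        print(f"categories 1..{cut} vs rest: log OR = {lor:.2f}")
```

Cut-point log odds ratios that differ widely, or change sign, hint at non-proportional odds; with sparse categories these estimates are noisy, so a formal test such as the likelihood ratio comparison described above remains the better guide.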

The Optimising Analysis of Stroke Trials (OAST) Collaboration. Can We Improve the Statistical Analysis of Stroke Trials? Stroke. 2007;38:1911–1915. Originally published May 29, 2007. http://dx.doi.org/10.1161/STROKEAHA.106.474080