Use of Ordinal Outcomes in Vascular Prevention Trials
Comparison With Binary Outcomes in Published Trials

Abstract
Background and Purpose— Vascular prevention trials mostly count “yes/no” (binary) outcome events, eg, stroke/no stroke. Analysis of ordered categorical vascular events (eg, fatal stroke/nonfatal stroke/no stroke) is clinically relevant and could be more powerful statistically. Although this is not a novel idea in the statistical community, ordinal outcomes have not been applied to stroke prevention trials in the past.
Methods— Summary data on stroke, myocardial infarction, combined vascular events, and bleeding were obtained by treatment group from published vascular prevention trials. Data were analyzed using 10 statistical approaches which allow comparison of 2 ordinal or binary treatment groups. The results for each statistical test for each trial were then compared using Friedman 2-way analysis of variance with multiple comparison procedures.
Results— Across 85 trials (335 305 subjects) the test results differed substantially so that approaches which used the ordinal nature of stroke events (fatal/nonfatal/no stroke) were more efficient than those which combined the data to form 2 groups (P<0.0001). The most efficient tests were bootstrapping the difference in mean rank, Mann–Whitney U test, and ordinal logistic regression; 4- and 5-level data were more efficient still. Similar findings were obtained for myocardial infarction, combined vascular outcomes, and bleeding. The findings were consistent across different types, designs and sizes of trial, and for the different types of intervention.
Conclusions— When analyzing vascular events from prevention trials, statistical tests which use ordered categorical data are more efficient and are more likely to yield reliable results than binary tests. This approach gives additional information on treatment effects by severity of event and will allow trials to be smaller.
Major advances have been made in the primary and secondary prevention of stroke with effective strategies based on lifestyle modification, antithrombotic agents, blood pressure and cholesterol lowering, and carotid endarterectomy. In parallel, the absolute risk of recurrence has fallen dramatically over time; in stroke trials, this is apparent as a decrease in the control event rate, eg, 10.8% in the Canadian American Ticlopidine Study (CATS) in 19892 and 3.4% in Perindopril protection against recurrent stroke study (PROGRESS) in 2001.3 This trend is likely to continue as new and effective interventions are added. Because absolute event rates are a key component in sample size calculations for binary (“yes/no” event) outcomes, low rates equate to larger trials.4 An additional pressure in performing trials is that their number has increased as new prophylactic strategies are tested, eg, antiplatelets (thromboxane synthase inhibitors), anticoagulants (thrombin/factor Xa inhibitors), and carotid interventions (stenting, treatment of asymptomatic stenosis). The combination of more and larger trials means it is becoming increasingly difficult to find sufficient patients to enroll into new studies.
New strategies are required to bring trial sample sizes down and to maximize the potential to demonstrate benefit. In the past, composite outcomes of vascular death, nonfatal stroke, and nonfatal myocardial infarction (MI) have been used, in part to increase the number of events. This approach can be extended to include further events in the composite such as hospitalization, silent brain infarcts (as identified by MRI), or by counting all vascular events rather than just the first one.5 However, the use of composite outcomes has been criticized.6 An alternative approach is to analyze vascular prevention trials in a way which does not lose clinically relevant data. Most studies compare binary (stroke/no stroke) event rates between the treatment and control group. However, stroke or MI events may be fatal or nonfatal, so trichotomous outcomes (fatal event/nonfatal event/no event) can be analyzed. This approach can be extended to 4 (fatal stroke/severe nonfatal stroke/mild stroke/no stroke) or 5 (fatal stroke/severe nonfatal stroke/mild stroke/transient ischemic attack [TIA]/no event) levels. Similar ordered categorical outcomes can be developed for MI, composite vascular outcomes, and bleeding, as well as other vascular events, such as heart failure. The analysis of such ordered categorical (ordinal) events is usually more efficient statistically (because data on severity are not lost) thereby offering the potential for reducing trial sample size while maximizing the potential to find small clinically relevant treatment benefits.7 Such polytomization of events assumes that the ordering of events is meaningful, ie, that fatal vascular events are considered more severe than nonfatal ones. If so, ordinal outcomes may be more informative to patients, carers, healthcare professionals, and government than binary outcomes.
We report a comparison of the relative efficiencies of using and analyzing binary and polytomous outcomes from vascular prophylaxis trials. Although the use of ordinal statistical approaches is well defined in the methodological literature, its use for designing and analyzing vascular prevention trials is entirely novel.
Methods
Identification of Trials
We sought summary patient data from randomized controlled trials assessing primary or secondary vascular prevention, ie, preventing first or recurrent events respectively, which were either positive or negative according to the trial publication, or were included in a meta analysis showing benefit or harm; neutral trials in a neutral meta-analysis were excluded, an approach which follows our previous study in acute stroke trials.7 We included vascular trials involving nonstroke patients and those measuring nonstroke outcomes because stroke patients suffer subsequent nonstroke vascular events, and those with other vascular conditions can go on to have a stroke. Taking this approach means the findings are generalizable across the field of vascular medicine. Published studies fulfilling these criteria were identified from electronic searches of the Cochrane Library and included studies of antithrombotic, BP or lipid lowering therapy, carotid endarterectomy, and hormone replacement therapy. Trials were excluded if they were neutral and related to a neutral intervention (as determined from a published meta-analysis) or did not include adequate ordered categorical information for at least one vascular outcome.
Trial Data
The numbers of subjects at the end of follow-up having a stroke (fatal, nonfatal, severe nonfatal, mild, TIA), MI (fatal, nonfatal), composite vascular event (fatal stroke or MI, nonfatal stroke or MI), and bleeding (major, minor, no bleeding) were obtained, where available, for each treatment group (active, control) from the primary trial publication. In factorial trials or those having more than two treatment groups,8 data were analyzed for each active comparison versus control. Data were assessed by intention-to-treat where possible.
Statistical Tests
We compared different statistical tests for assessing treatment effect.9–14 Some of these required the ordinal data to be combined into two groups (eg, Pearson’s Chi-square test), whereas others used the raw ordered categorical data (eg, Mann–Whitney U test, unpooled t test, bootstrapping the mean rank, ordinal logistic regression [also known as the proportional odds regression]). A description of the statistical tests used is given in the supplemental Appendix I, available online at http://stroke.ahajournals.org.
Comparison and Ordering of Statistical Tests
Each data set was analyzed using each statistical test. The results were then ordered within each trial and given a rank, with the lowest rank given to the test which produced the smallest probability value within that trial. A 2-way analysis of variance test (Friedman with adjustment for ties15; ANOVA) was then performed to assess which statistical test produced the lowest ranks (ie, the most statistically significant values). Duncan multiple range test was used to assess the ordering of tests and determine where significant differences between tests were present. We also assessed how many statistically significant (at 5%) results each test found.
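As an illustration only, the within-trial ranking described above can be sketched in Python. The P values, the three test names, and the helper `midranks` are invented for this sketch and are not taken from any included trial. The Friedman statistic is shown without the adjustment for ties used in the study, and its P value (from a chi-square distribution with k-1 degrees of freedom) is not computed here.

```python
from statistics import mean

def midranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Hypothetical P values: one row per trial, one column per statistical test.
tests = ["chi2_2x2", "mann_whitney", "ordinal_logistic"]
p_values = [
    [0.040, 0.010, 0.012],
    [0.300, 0.210, 0.190],
    [0.060, 0.020, 0.020],
    [0.500, 0.450, 0.400],
]

# Rank the tests within each trial (lowest rank = smallest P value).
rank_rows = [midranks(row) for row in p_values]
mean_rank = {t: mean(r[c] for r in rank_rows) for c, t in enumerate(tests)}

# Friedman statistic on the ranks (no tie correction in this sketch).
n, k = len(rank_rows), len(tests)
col_sums = [sum(r[c] for r in rank_rows) for c in range(k)]
friedman = 12 / (n * k * (k + 1)) * sum(s * s for s in col_sums) - 3 * n * (k + 1)

print(mean_rank, friedman)
```

A lower mean rank marks a test that tended to give the smaller P value across trials, which is the sense of "more efficient" used in this report.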
To assess the validity and reliability of the results found, a number of supplementary analyses were carried out. First, the comparison of statistical tests was repeated within subgroups of trials sharing similar characteristics to assess whether particular types of trials suited different statistical approaches; second, the statistical assumptions of the tests were assessed; and third, the sensitivity (type 1 error) of the tests was assessed. Technical details of these supplementary analyses can be found in the supplemental Appendix II.
Analyses were carried out in SAS (version 8.2) and Stata (version 7); significance was taken at P<0.05 for analyses of trials and P<0.01 for ANOVA.
Results
Trials
Of 243 identified trials, 101 (416 020 subjects) were included, comprising 35 primary and 66 secondary prevention studies (supplemental Table I). One hundred forty-two trials were excluded, mostly because their published data did not distinguish between fatal and nonfatal vascular events so that 3-level data could not be calculated (supplemental Table II).
Table I. Appendix 3: Included Trials
Table II. Excluded Trials
Stroke
The trials variably included intracerebral hemorrhage within the outcome of stroke. The results of the statistical tests differed significantly with 3-level data (fatal stroke/nonfatal stroke/no stroke; 85 trials, 335 305 subjects; ANOVA P<0.0001); ordinal analyses ranked above binary approaches (Tables 1 and 2; Figure 1) with the Mann–Whitney U test, bootstrapping (difference in mean rank), and ordinal logistic regression significantly better than the other methods (supplemental Figure I). Similar results were seen for the other stroke outcome assessments: 4-level (fatal stroke/severe nonfatal stroke/mild stroke/no stroke), 4-level including TIA (fatal stroke/nonfatal stroke/TIA/no stroke or TIA), and 5-level (fatal stroke/severe nonfatal stroke/mild stroke/TIA/no stroke or TIA; each ANOVA P<0.0001; Table 2). Although the absolute ordering of the tests varied for these polytomous outcomes, ordinal tests always performed better than binary ones (Table 2). Six trials gave sufficient data to compare qualitatively 3-, 4-, and 5-level stroke data; 4-level data (with TIA included as an event) and 5-level data (including TIA) appeared to be the most efficient approaches. When assessed by how many trials were statistically significant (positive or negative but not neutral), those tests which did not collapse the data into groups again out-performed other approaches; for example the Mann–Whitney U test gave a statistically significant result in 44% of trials in comparison with the Pearson’s χ² 2×3 test at 32% (Figure 1).
Table 1. Assessment of 10 Statistical Approaches for Analyzing Stroke as a 3-Level Event (Fatal/Nonfatal/No Stroke) in 85 Vascular Prevention Trials
Analysis by 2-way ANOVA (P<0.0001) on the ranked data (1 to 10 with 1 “best”); comparison of tests by Duncan’s multiple range test—those tests joined by the same band are not significantly different from each other at P<0.01.
Table 2. Ranking of Statistical Tests (1 to 10 With 1 “Best”) for Measure of Stroke (3, 4, and 5-Levels), Myocardial Infarction (3-Level), Composite Vascular Outcome (3-Level), and Bleeding (3-Level)
Figure 1. The number of significant trials (positive or negative but not neutral, P<0.05) for each statistical test for 3-level stroke (fatal, nonfatal, no stroke).
Figure I. Ordering of statistical tests for stroke (3, 4 and 5-level), myocardial infarction (3-level), vascular events (3-level) and bleeding (3-level). Ordinal tests (Mann–Whitney U test, ordinal logistic regression) were superior (lower rank) to dichotomous tests.
Myocardial Infarction
Fifty-eight trials (232 515 subjects) were included. The analyses differed significantly for a 3-level outcome (fatal MI/nonfatal MI/no MI; P<0.0001), with ordinal approaches performing better than binary (Table 2).
Composite Vascular Event
Forty-three trials (204 108 subjects) gave data for a 3-level composite vascular outcome (fatal stroke or MI/nonfatal stroke or MI/no stroke or MI). Ordinal tests performed best (P<0.0001) with the Mann–Whitney U test, bootstrapping (the difference in mean rank) and ordinal logistic regression ranking highest (Table 2).
Bleeding
Fifteen trials (26 215 patients) were identified as including information on bleeding at three levels: major bleeding, minor bleeding, no bleeding. Definitions of bleeding differed between trials. Once again, ordinal analytic approaches ranked highest (Table 2).
Sensitivity Analysis and Test Assumptions
The ordering of statistical tests, with ordinal more efficient than binary, was maintained for all subgroups of trials irrespective of type of prevention and treatment, average age of patients, trial size and length of follow-up, risk of death or stroke, and time from index event (Table 3). When considering the 19 trials (27 datasets) with a high event rate (>10% overall), ordinal tests remained most efficient. Published hazard ratios (which take into account the time to event, as derived from the Cox proportional hazards model) for stroke were available for 36 trials; a comparison of the 11 statistical tests, including Cox results, revealed bootstrapping, Mann–Whitney U, and ordinal logistic regression to be as good if not slightly superior to the Cox model (Duncan multiple range test).
Table 3. Ranking of Statistical Tests (1 to 10 With 1 ‘Best’) for 3-Level Stroke (Fatal, Nonfatal, No Stroke) in Subgroups of Vascular Prevention Trials
The statistical assumptions for ordinal logistic regression were not violated (P>0.05) in 79 of 85 trials with 3-level stroke data; no violations were present for 11 trials with 5-level stroke data (supplemental Appendix III). The sensitivity analysis showed that the top performing statistical tests (ordinal logistic regression, Mann–Whitney U test) were not overly sensitive, and statistically significant treatment effects were only found where they are likely to be present (supplemental Appendix III). Using ordinal logistic regression, the odds ratios were similar for different strata of severity for 3-level, 4-level, and 5-level data (supplemental Table III).
Table III. Odds Ratios for Example Trials for Different Outcome Levels
Discussion
Improvements in secondary prevention are leading to falling event rates in clinical trials. This means that future vascular prevention trials will need to be longer and, with an increasing number of new interventions, the availability of subjects is becoming limited. Thus, new approaches to trial design and analysis are needed to help reduce sample size. This study has shown that it is feasible to create 3-level ordered categorical outcomes for stroke, MI, a composite vascular event (fatal stroke and MI/nonfatal stroke and MI), and bleeding. Analysis reveals that, in general, statistical approaches which use ordinal data are more efficient than conventional binary tests based on “event/no event.” A further increase in efficiency comes from using 4-level or 5-level data for stroke (with or without TIA). Ordering vascular events by severity has both biological and clinical meaning. Fatal events are clearly the most extreme health state whereas a severe stroke (normally defined as a stroke resulting in dependency on others) is a disaster for the patient, their carer, and society, for both clinical and economic reasons. A mild stroke leaves the patient independent, even if residual impairment remains, and those who are younger can often return to work.
The most efficient statistical tests were those which examined ordinal data, including ordinal logistic regression, the Mann–Whitney U test, and bootstrapping the mean rank. In addition to improving statistical efficiency, the use of ordered categorical outcomes gives information on the ability of an intervention to reduce the severity of an event, not just the number of events. Ordinal logistic regression allows both estimation (with confidence intervals) and inclusion of baseline prognostic covariates in analyses. However, it assumes that any treatment effect is similar across outcome levels, ie, the odds of moving a treated patient from fatal to severe nonfatal stroke are similar to those for moving from TIA to no event (“proportionality of odds”). This assumption requires justification because it is neither widely recognized nor obvious in most published vascular trial data. First, it is biologically plausible to suggest that prophylactic interventions will reduce severity as well as the total number of events. Since the development of atherosclerosis and increases in thrombosis, coagulation and inflammation are not binary events in nature, and their magnitude is a determinant of the severity of clinical vascular events, it is reasonable to expect that interventions will move patients from fatal to severe, severe to mild, and mild to no events. If this assumption (of proportional odds) is not met, an alternative ordinal model could be considered.16
Second, there is existing published evidence that interventions do alter severity: simvastatin reduced the risk of stroke of different severities by similar risk reductions in the Heart Protection Study (HPS),17 hormone replacement therapy increased both stroke and its severity in the Women’s Estrogen for Stroke Trial (WEST),18 and antiplatelet agents reduced both fatal and nonfatal vascular events in the Antithrombotic Trialists’ (ATT) Collaboration meta analysis.19 The apparent failure of most vascular prevention trials to show individual effects on death or severe events is largely because they were not powered to assess these specific and, therefore, relatively uncommon events. Third, the odds reduction at each outcome level appeared to be relatively constant when individual trials were assessed (Figure 2); formal statistical assessment using the likelihood ratio test indicated that “proportionality of odds” was present in most cases (although this test is known to be conservative; Appendix 6). Last, using ordinal statistical tests was more powerful than binary approaches, the central finding of this study. Although this is not a novel idea in the statistical community,20 ordinal outcomes have not been applied to vascular prevention trials in the past. In this context, it is worth noting that ordinal logistic regression is relatively robust to deviations in its assumptions even if they are not met in a particular trial. Another efficient ordinal test is the Mann–Whitney U test, which is widely available in statistical packages and can produce a point estimate (median difference between groups) with confidence intervals. The major assumption of the test is that the treatment groups should be independent, and this is met here. The final efficient statistical approach was bootstrapping the mean rank; this approach is computer intensive13 and its application and the interpretation of results are not well appreciated by clinicians, although it is free of assumptions.
Figure 2. Odds ratios across trial (by ordinal logistic regression) and by individual outcome levels for 4 trials to illustrate the assumption of proportionality of odds.
The conventional approach to analyzing vascular prevention trials is to perform time to event analyses, as visualized using Kaplan–Meier curves and analyzed with Cox regression. When the frequency of events is high, analyses based on time-to-event are more efficient than those using frequencies (as analyzed using logistic regression). However, the frequency of vascular events in most primary and secondary prevention trials running over 3 to 5 years is relatively low; recent vascular prevention trials have tended to report annualized stroke rates of 2% to 4%.21,22 Logistic and Cox models give similar results when the overall event frequency is less than 10%.23,24 Where the frequency of events is higher, ordinal data may be analyzed by time to event.25,26 In the current dataset, the Cox model was slightly less efficient than bootstrapping, Mann–Whitney U, and ordinal logistic regression.
In this study, we have focused on assessing stroke as the primary outcome rather than using a composite vascular outcome (fatal vascular event, nonfatal stroke, and MI). Stroke was of interest since it has been used in several prevention trials, eg, the European Stroke Prevention Study-II (ESPS-II) and PROGRESS,3,27 and 4- or 5-level data (including TIA) may be created. Nevertheless, ordered categorical outcomes may also be created for composite outcomes (fatal stroke or MI/nonfatal stroke or MI/no event) as well as other events such as MI or bleeding. Our results suggest that the use and analysis of polytomous outcomes would benefit trials assessing any of these vascular outcomes, and it is likely that the approach would work for others such as heart failure and venous thromboembolism; we are currently assessing this.
Using ordered categorical data will mean that results will need to be reported differently. The results of binary tests are summarized easily as the proportion of patients who benefit (or suffer) with a treatment, eg, oral anticoagulation reduced absolute stroke recurrence by 1.46% (odds ratio 0.75, P=0.036) in the Anticoagulants in the Secondary Prevention of Events in Coronary Thrombosis (ASPECT) trial.28 In contrast, ordinal tests will need to be presented as the average absolute improvement in outcome, eg, anticoagulation reduced stroke recurrence and its severity with an odds ratio of 0.60 (or reduced the mean severity by 0.5 points, P=0.013) on a 5-level scale.28 In this respect, health consumers will need to decide what odds ratio or difference in events is worthwhile, both clinically and in terms of health economics. In reality, it is reasonable to present the primary result using the odds ratio (or median change in event severity) and to give the absolute percentage change calculated from the binary outcome as a secondary measure. Further, a visual presentation of the data can be displayed as the percentage of patients within each category by treatment group (data from the North American Symptomatic Carotid Endarterectomy Trial [NASCET], Figure 3).
Figure 3. Example 4-level ordinal data from NASCET1 of carotid endarterectomy (CEA).
Just as sample size calculations exist for trials using dichotomised analyses,4 analogous approaches exist for ordinal tests.29 Because ordinal analyses are more powerful statistically, trial size may be reduced for a given power of say 90%; eg, sample size falls by 15% to 24% as the number of outcome categories increases from 3 to 7.29 This reduction is worthwhile and would reduce competition between trials for patients, and lower trial costs and complexity. Taking the Hypertension in Elderly Patients (HEP) trial30 as an example (and assuming significance=0.05 and power=0.9), the sample size is reduced by 48% from 1556 for a binary outcome of stroke/no stroke to 810 for a 3-level stroke outcome as calculated using the method of Whitehead;29 this is further reduced to 772 with a 5 level stroke outcome.
A number of caveats must be made about this study. First, a majority of identified trials could not be included because they did not publish adequate information on vascular events. As data were missing for a variety of trial types (primary, secondary prevention), sizes, and outcome measures (stroke/MI/vascular/bleeding) it is unlikely that a systematic bias was introduced into the findings; however, the precision of the results will have been attenuated by the missing data. Future trial publications should give this information, including vital status for the main vascular outcomes, so that ordered outcome categories can be calculated. Second, we did not use all possible statistical tests relevant to the problem of analyzing ordered categorical data; instead, we focused on those approaches which are readily available in statistical textbooks11 and computer packages.
In summary, we suggest that vascular prevention trials should consider using statistical approaches that use the inherent ordered categorical data present within vascular outcome events. The resulting trials could be smaller (with savings in patient numbers, numbers of centers, and study cost and complexity) and would allow the effect of interventions on severity, as well as on the absolute number of events, to be highlighted. Appropriate tests include ordinal logistic regression, the Mann–Whitney U test, and bootstrapping the mean rank.
Acknowledgments
Sources of Funding
The Division of Stroke Medicine (University of Nottingham) receives core funding from The Stroke Association (UK). P.B. is Stroke Association Professor of Stroke Medicine. L.G. receives funding from the Medical Research Council. The funding sources had no involvement in the project.
Disclosures
P.W.B. participated in some of the included trials.
- Received November 12, 2007.
- Accepted March 19, 2008.
References
- ↵
- ↵
Gent M, Blakely JA, Easton JD, Ellis DJ, Hachinski VC, Harbison JW, Panak E, Roberts RS, Sicurella J, Turpie AGG, Group TC. The Canadian American Ticlopidine study (CATS) in thromboembolic stroke. Lancet. 1989: 1215–1220.
- ↵
- ↵
Weaver CS, Leonardi-Bee J, Bath-Hexall FJ, Bath PMW. Sample size calculations in acute stroke trials: A systematic review of their reporting, characteristics, and relationship with outcome. Stroke. 2004; 35: 1216–1224.
- ↵
Schrader J, Luders S, Kulschewski A, Hammersen F, Plate K, Berger J, Zidek W, Dominiak P, Diener H-C, for the MOSES study group. Morbidity and mortality after stroke, eprosartan compared with nitrendipine for secondary prevention. Principal results of a prospective randomised controlled study (MOSES). Stroke. 2005; 36: 1218–1226.
- ↵
Ferreira-González I, Permanyer-Miralda G, Domingo-Salvany A, Busse JW, Heels-Ansdell D, Montori VM, Akl EA, Bryant DM, Alonso-Coello P, Alonso J, Worster A, Upadhye S, Jaeschke R, Schunemann HJ, Pacheco-Huergo V, Wu P, Mills EJ, Guyatt GH. Problems with the use of composite end points in cardiovascular trials: Systematic review of randomised controlled trials. BMJ. 2007; 334: 786.
- ↵
The Optimising Analysis of Stroke Trials (OAST) Collaboration. Can we improve the statistical analysis of stroke trials? Statistical re-analysis of functional outcomes in stroke trials. Stroke. 2007; 38: 1911–1915.
- ↵
UK-TIA Study Group. The United Kingdom Transient Ischaemic Attack (UK-TIA) aspirin trial: Final results. J Neurol Neurosurg Psych. 1991; 54: 1044–1054.
- ↵
Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall; 1991.
- ↵
Conover WJ. Practical Nonparametric Statistics. New York: John Wiley & Sons; 1971.
- ↵
Siegel S, Castellan NJ. Nonparametric Statistics for the Behavioral Sciences. Singapore: McGraw-Hill; 1988.
- ↵
- ↵
Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.
- ↵
- ↵
Hollander M, Wolfe DA. Nonparametric Statistical Methods. New York: John Wiley & Sons Inc; 1999.
- ↵
Stokes ME, Davis CS, Koch GG. Categorical Data Analysis Using SAS. Cary, NC: SAS Institute; 1995.
- ↵
- ↵
- ↵
Antithrombotic Trialists Collaboration. Collaborative meta-analysis of randomised trials of antiplatelet therapy for prevention of death, myocardial infarction, and stroke in high risk patients. BMJ. 2002; 324: 71–86.
- ↵
- ↵
Bhatt DL, Fox KAA, Hacke W, Berger PB, Black HR, Boden WE, Cacoub P, Cohen EA, Creager MA, Easton JD, Flather MD, Haffner SM, Hamm CW, Hankey GJ, Johnston SC, Mak K-H, Mas J-L, Montalescot G, Pearson TA, Steg PG, Steinhubl SR, Weber MA, Brennan DM, Fabry-Ribaudo L, Booth J, Topol EJ, for the CHARISMA Investigators. Clopidogrel and aspirin versus aspirin alone for the prevention of atherothrombotic events. N Engl J Med. 2006; 354: 1706–1717.
- ↵
- ↵
- ↵
- ↵
Schatzkin AR, Cupples LA, Heeren T, Morelock S, Kannel WB. Sudden death in the Framingham heart study. Differences in the incidence and risk factors by sex and coronary disease status. Am J Epidemiol. 1984; 120: 888–899.
- ↵
- ↵
- ↵
- ↵
- ↵
Coope J, Warrender TS. Randomised trial of treatment of hypertension in elderly patients in primary care. BMJ. 1986; 293: 1145–1151.
Supplemental Appendix I: Statistical Tests Compared
Included Tests
Univariate statistical approaches for analyzing dichotomous and ordinal data comprised tests based on Pearson’s Chi-square, ordinal, and bootstrap approaches.1,2 Ten statistical approaches were assessed:
(1) Pearson’s Chi-square 2×2 test (stroke versus no stroke); (2) Pearson’s Chi-square 2×2 test (death versus alive); (3) Pearson’s Chi-square 2×3 test for unordered data (fatal stroke versus nonfatal stroke versus no stroke); (4) Cochran-Armitage trend test; (5) ordinal logistic regression; (6) median test; (7) Wilcoxon/Mann–Whitney U test (adjusted for ties); (8) robust rank test (RRT)3; (9) t test; (10) bootstrap of difference in mean rank (with 3×3000 cycles).4,5 Pearson’s Chi-square tests were performed without continuity correction because most trials enrolled more than 100 patients.
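To make the contrast between approaches (1) and (7) concrete, the following Python sketch applies both to one set of invented 3-level counts. It is not the SAS or Stata code used for this report. The function names, the counts, and the use of a normal approximation for the Mann–Whitney U test are assumptions of the sketch.

```python
import math

def chi2_2x2_p(a, b, c, d):
    """Pearson chi-square P value (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def mann_whitney_p(x_counts, y_counts):
    """Two-sided Mann-Whitney P value (normal approximation, adjusted for ties)
    from category counts ordered from worst to best outcome."""
    n1, n2 = sum(x_counts), sum(y_counts)
    n = n1 + n2
    rank_sum_x, tie_term, below = 0.0, 0.0, 0
    for cx, cy in zip(x_counts, y_counts):
        t = cx + cy                                # size of this tied block
        rank_sum_x += cx * (below + (t + 1) / 2)   # midrank of the block
        tie_term += t ** 3 - t
        below += t
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: [fatal stroke, nonfatal stroke, no stroke].
control = [30, 90, 880]
active = [15, 80, 905]

# Binary analysis collapses fatal and nonfatal stroke into "stroke".
p_binary = chi2_2x2_p(sum(control[:2]), control[2], sum(active[:2]), active[2])
# Ordinal analysis keeps all three levels.
p_ordinal = mann_whitney_p(control, active)

print(f"binary chi-square P = {p_binary:.3f}, Mann-Whitney P = {p_ordinal:.3f}")
```

With these invented counts the ordinal test returns the slightly smaller P value; that is a property of the example and not a result from any included trial.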
Statistical Detail for Nonstandard Tests
Robust Rank Test
The Robust rank test3 is an alternative to the Wilcoxon test; it tests whether the median of one group is equal to another, but unlike the Wilcoxon test it does not assume that the distributions of the two groups are equal, ie, it makes no assumptions about the variance of the two groups.
Bootstrapping
Bootstrapping is a computationally intensive method which involves resampling data from a given sample. The main advantage of bootstrapping over more traditional methods is that it does not make assumptions about the distribution of the data. In this report we bootstrap the difference in mean rank; the procedure for doing this is outlined below:
1. Take a dataset, which contains n observations.
2. Draw a sample with replacement of size n (using replacement means that some of the original observations may appear in the new sample more than once and some not at all).
3. Estimate the parameter of interest (here the difference in mean rank) and store the result.
4. Repeat steps 2 and 3 many times; here we use 3 sets of 3,000 as used in the ECASS II trial.5
5. Compare the distribution of the stored results to the actual point estimate from the original dataset.
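The procedure can be sketched in Python as follows. This is a minimal illustration, not the code used for this report: the counts are invented, each treatment group is resampled separately (an assumption), and the stored results are summarized with a simple percentile interval (also an assumption). The names `mean_rank_diff`, `resample`, and `bootstrap_ci` are hypothetical.

```python
import random
from collections import Counter

def mean_rank_diff(x_counts, y_counts):
    """Mean midrank of group x minus that of group y, from ordered category counts."""
    below, sum_x, sum_y = 0, 0.0, 0.0
    for cx, cy in zip(x_counts, y_counts):
        t = cx + cy
        mid = below + (t + 1) / 2  # midrank shared by everyone in this category
        sum_x += cx * mid
        sum_y += cy * mid
        below += t
    return sum_x / sum(x_counts) - sum_y / sum(y_counts)

def resample(counts, rng):
    """Draw sum(counts) observations with replacement; return the new category counts."""
    k = len(counts)
    draws = Counter(rng.choices(range(k), weights=counts, k=sum(counts)))
    return [draws.get(i, 0) for i in range(k)]

def bootstrap_ci(x_counts, y_counts, reps=3000, seed=1):
    """Percentile 95% interval for the difference in mean rank."""
    rng = random.Random(seed)
    stats = sorted(
        mean_rank_diff(resample(x_counts, rng), resample(y_counts, rng))
        for _ in range(reps)
    )
    return stats[int(0.025 * reps)], stats[int(0.975 * reps) - 1]

# Hypothetical counts: [fatal stroke, nonfatal stroke, no stroke].
control = [9, 27, 264]
active = [4, 24, 272]

observed = mean_rank_diff(control, active)
lo, hi = bootstrap_ci(control, active, reps=2000)
print(f"observed difference {observed:.2f}, 95% interval {lo:.2f} to {hi:.2f}")
```

An interval that excludes zero would be read as evidence of a treatment difference. Categories are ordered from worst to best here, so a negative difference means the control group tends toward worse outcomes.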
Ordinal Logistic Regression
Ordinal logistic regression (also called proportional odds regression)6 can be used when the dependent variable is ordered categorical. It is similar to logistic regression but it simultaneously estimates multiple end points instead of just one. The number of end points it estimates is equal to the number of ordered categories minus one. For example, if the modified Rankin Scale (mRS, scored 0 to 6) were the dependent variable of interest, it would make the following comparisons, one for each cut point j:
0 versus 1, 2, 3, 4, 5, 6
0, 1 versus 2, 3, 4, 5, 6
0, 1, 2 versus 3, 4, 5, 6
0, 1, 2, 3 versus 4, 5, 6
0, 1, 2, 3, 4 versus 5, 6
0, 1, 2, 3, 4, 5 versus 6
Ordinal logistic regression provides one overall estimate for each covariate in the model and not one for each cut point. This assumes that the overall odds ratio is constant no matter which cut is taken. So, for example the odds ratio for the treatment effect would be interpreted as the odds of being in category j or above for all choices of j comparing treatment 1 to treatment 0.
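One informal way to look at the proportional odds assumption is to compute the ordinary odds ratio at each cut point and see whether the values are similar. The Python sketch below does this for invented 3-level counts. It does not fit an ordinal logistic regression model, and the function name `cutpoint_odds_ratios` is hypothetical.

```python
def cutpoint_odds_ratios(treated, control):
    """Odds ratio (treated versus control) of falling in the worse categories,
    computed separately at each cut point. Counts run from worst to best outcome."""
    ratios = []
    for cut in range(1, len(treated)):
        t_worse, t_better = sum(treated[:cut]), sum(treated[cut:])
        c_worse, c_better = sum(control[:cut]), sum(control[cut:])
        ratios.append((t_worse / t_better) / (c_worse / c_better))
    return ratios

# Hypothetical counts: [fatal stroke, nonfatal stroke, no stroke].
treated = [23, 70, 907]
control = [30, 90, 880]

# First value: fatal versus (nonfatal or none); second: any stroke versus none.
example_ors = cutpoint_odds_ratios(treated, control)
print([round(r, 2) for r in example_ors])
```

If the cut-point odds ratios are close to one another, a single common odds ratio from the proportional odds model is a reasonable summary; widely differing values would suggest considering an alternative ordinal model.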
References
- ↵
Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall; 1991.
- ↵
Conover WJ. Practical Nonparametric Statistics. New York: John Wiley & Sons; 1971.
- ↵
- ↵
Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall; 1993.
- ↵
- ↵
Agresti A. Analysis of Ordinal Categorical Data. New York: John Wiley & Sons; 1984.
Supplemental Appendix II: Supplementary Analyses
Subgroup Analysis
Subgroup analyses were performed by assessing the efficiency of the different tests for differing trial characteristics: type of prevention (primary, secondary); type of treatment (anticoagulants, antiplatelets, antihypertensives, lipid lowering, carotid endarterectomy, hormone replacement therapy); patient age (≤65, >65 years); trial size (<2520, ≥2250 participants); length of follow up (≤36 months, >36 months); baseline severity (control group death rate adjusted for length of follow up, ≤median [0.2], >median [0.2]); time from index event (≤87 days, >87 days).
Statistical Assumptions
The principal statistical assumptions underlying the tests which performed well were assessed to ensure that their use was appropriate for stroke trial data. Assumptions included: ordinal logistic regression—proportionality of odds across response categories (ie, the magnitude of improvement or hazard, with a treatment, would be similar irrespective of baseline severity, age etc); Mann–Whitney U—independence of groups.
Type I Error
When assessing the statistical power of a particular test, it is also important to ensure that the test maintains an acceptable proportion of type I errors (false positives). A type I error occurs when a statistical test produces a significant result when in truth no treatment difference exists. If a test adheres to the nominal proportion of type I errors then, under repeated sampling from a population in which the null hypothesis of no treatment effect is true, we would expect to see a significant result (P<0.05) on 5% of occasions at the 5% significance level.
We assessed the proportion of type I errors for the three most efficient statistical tests, using data from five representative trials. From each of these we generated 1000 data sets, using random sampling with replacement, in which any treatment difference could have occurred only by chance. For a test maintaining an acceptable proportion of type I errors, we would expect to see a significant result in around 50 of the 1000 data sets.
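A check of this kind can be sketched as below. The details are assumptions made for illustration rather than the procedure used in the paper: both arms are drawn with replacement from the pooled outcome distribution (so the null hypothesis holds by construction), the test is the Mann–Whitney U test with a tie-corrected normal approximation, and the counts in the usage note are hypothetical.

```python
import math
import random

def mann_whitney_p(counts_a, counts_b):
    """Two-sided P value (tie-corrected normal approximation) for two groups
    given as counts per ordered category, lowest category first."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    rank_sum_a, below, tie_term = 0.0, 0, 0
    for a, b in zip(counts_a, counts_b):
        t = a + b
        rank_sum_a += a * (below + (t + 1) / 2)  # midrank of this category
        tie_term += t**3 - t
        below += t
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # all observations tied: no evidence of a difference
    z = (rank_sum_a - n_a * (n + 1) / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))

def count_false_positives(counts_a, counts_b, n_sets=1000, alpha=0.05, seed=0):
    """Number of null data sets (out of n_sets) giving P < alpha."""
    rng = random.Random(seed)
    k = len(counts_a)
    pooled = [a + b for a, b in zip(counts_a, counts_b)]
    hits = 0
    for _ in range(n_sets):
        arms = []
        for size in (sum(counts_a), sum(counts_b)):
            draw = rng.choices(range(k), weights=pooled, k=size)
            arms.append([draw.count(c) for c in range(k)])
        if mann_whitney_p(*arms) < alpha:
            hits += 1
    return hits
```

For example, `count_false_positives([425, 50, 25], [450, 35, 15])` returns the number of significant results among 1000 simulated null data sets, which should be in the region of 50 for a test that holds its nominal 5% level.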
Supplemental Appendix III: Results
Type I Error
Analysis of 1000 randomly resampled data sets from each of 5 representative trials did not find any evidence of an increased proportion of type I errors for ordinal logistic regression (SPAF-2, positive data sets n=54/1000, P=0.301; ESPS-2, n=56, P=0.212; HOPE, n=56, P=0.213; HPS, n=46, P=0.744; NASCET, n=47, P=0.69)5 or for the Mann–Whitney U test (SPAF-2, n=21, P>0.99; ESPS-2, n=30, P=0.99; HOPE, n=17, P>0.99; HPS, n=26, P>0.99; NASCET, n=18, P>0.99).
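The text does not state which test produced these P values. One plausible reading, shown here purely as an assumption, is a one-sided exact binomial test of whether the number of positive data sets exceeds what a 5% error rate would give; the function name below is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n=1000, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    significant results among n null data sets at a true error rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Counts of positive data sets as reported for ordinal logistic regression.
for trial, positives in [("SPAF-2", 54), ("ESPS-2", 56), ("HPS", 46)]:
    print(trial, round(binomial_upper_tail(positives), 3))
```

A large P value under this reading means the observed count gives no evidence that the type I error rate exceeds 5%.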
Test Assumptions
When assessing ordinal logistic regression, the assumption of proportionality of odds (likelihood ratio test comparing the multinomial logistic model to the ordinal logistic regression model) was not met (P<0.05) in 6 of the 85 data sets (ASPECT P=0.04,6 TPT-I P=0.0002,7 TPT-II P=0.03, HOPE P=0.02,3 ANBP2 P=0.04,8 WEST P=0.059). The same analysis was repeated on the 5-way stroke data, and the assumption of proportionality of odds was met for all 11 trials included in this part. In contrast, the assumption of the Mann–Whitney U test was met in all cases while the bootstrap approach is assumption free.
References
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
The Medical Research Council’s General Practice Research Framework. Thrombosis prevention trial: Randomised trial of low-intensity oral anticoagulation with warfarin and low-dose aspirin in the primary prevention of ischaemic heart disease in men at increased risk. Lancet. 1998; 351: 233–241.
- ↵
- ↵
Viscoli CM, Brass LM, Kernan WN, Sarrel PM, Horwitz RI. Estrogen after ischemic stroke: Effect of estrogen replacement on risk of recurrent stroke and death in the women’s estrogen for stroke trial (WEST). Stroke. 2001; 32: 329.
Use of Ordinal Outcomes in Vascular Prevention Trials. Philip M.W. Bath, Chamila Geeganage, Laura J. Gray, Timothy Collier and Stuart Pocock. Stroke. 2008;39:2817-2823, originally published September 29, 2008. https://doi.org/10.1161/STROKEAHA.107.509893