Use of a Global Test for Multiple Outcomes in Stroke Trials With Application to the National Institute of Neurological Disorders and Stroke t-PA Stroke Trial
Background The National Institute of Neurological Disorders and Stroke (NINDS) held a workshop on statistical approaches to analysis of acute stroke trials that have multiple prespecified outcomes. An objective was to plan for statistical analysis of the NINDS t-PA Stroke Trial, a randomized, double-blind, placebo-controlled trial of recombinant tissue plasminogen activator (rt-PA) for patients with acute ischemic stroke. Treatment success was defined as a “consistent and persuasive difference” in the proportion of patients achieving favorable outcomes on the Barthel Index, Modified Rankin Scale, Glasgow Outcome Scale, and National Institutes of Health Stroke Scale. The Data and Safety Monitoring Committee for the trial recommended this outcome because the committee did not believe that a positive result for a single outcome would provide sufficient evidence of efficacy.
Summary of Comment Workshop participants accepted the global test as a viable approach to testing the primary trial hypothesis. Clinician participants advocated categorizing outcomes as favorable/unfavorable, which they considered more clinically meaningful than continuous outcomes for evaluating a drug with potentially serious side effects. They agreed that a global test was appropriate for ischemic stroke because no single outcome is accepted as the standard. Hypothetical, special-case examples illustrate that highly correlated outcomes diminish the power of the global test. NINDS t-PA Stroke Trial data demonstrate the clinical interpretability of the global test.
Conclusions Workshop participants concluded that a global statistic should be used to test the trial's primary hypothesis accompanied by secondary tests of individual outcomes. Workshop participants recommended familiarizing the clinical/scientific community with the global approach.
On November 5, 1993, the NINDS organized a workshop to discuss the statistical approaches to analysis of acute stroke trials that have multiple prespecified outcomes. At the time of the workshop the NINDS t-PA Stroke Trial was not yet completed, and results were not available. Workshop participants are listed in “Appendix 1.” The workshop (1) provided a forum for discussion among biostatisticians, clinical investigators, and pharmaceutical company and regulatory representatives and (2) led to a consensus on the approach to the statistical analysis of the NINDS t-PA Stroke Trial.1 Background on the NINDS t-PA Stroke Trial, an overview of the chosen statistical methodology, and a summary of the workshop discussion and conclusions are presented below.
The NINDS t-PA Stroke Trial
The NINDS t-PA Stroke Trial was a randomized, double-blind, placebo-controlled trial of rt-PA as a treatment for patients with acute ischemic stroke.1 Patients with stroke were enrolled within 180 minutes of stroke onset, nearly half within 90 minutes or less. The trial was performed in two parts. Part I assessed rt-PA clinical activity at 24 hours after stroke onset. Part II assessed efficacy at 90 days after stroke in a separate group of patients. Trial sites and principal investigators are listed in “Appendix 2.”
For stroke trials, no one measure of disability (eg, Barthel Index,2 Modified Rankin Scale,3 4 Glasgow Outcome Scale,5 NIH Stroke Scale6 7) describes all dimensions of recovery for a stroke patient at 90 days. Clinical investigators in the trial believed that a small difference in means in a scale such as the Barthel Index would not be convincing clinical evidence of treatment benefit at 90 days, particularly because pilot studies of thrombolytic therapy for stroke had demonstrated associated risk of intracerebral hemorrhage.8 9 As the 90-day outcome, the clinical investigators recommended dichotomizing the Barthel Index into favorable (scores of 95 to 100) and unfavorable (scores <95) outcomes. The dichotomy also eliminated the necessity of assigning arbitrary scale scores to patients who died before 3 months. Near the completion of Part I of the trial, the Data and Safety Monitoring Committee reviewed Part I data and the clinical investigators' recommendations. The committee considered data on the four clinical outcome scales measured in Part I and considered each outcome scale as a dichotomy. The dichotomy was chosen to classify patients as having either (1) minimal/no disability or (2) any other more severe degree of disability. The committee then defined efficacy in Part II as a consistent and persuasive difference in the proportion of patients achieving favorable outcomes as measured by four outcomes: the Barthel Index, Modified Rankin Scale, Glasgow Outcome Scale, and NIH Stroke Scale.
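The Barthel Index dichotomy described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and treating a patient with no recorded score (represented here as `None`) as unfavorable is an assumption made for the sketch.

```python
from typing import Optional


def barthel_favorable(score: Optional[int]) -> bool:
    """Return True for a favorable outcome (Barthel Index of 95 to 100)."""
    # Assumption for this sketch: no recorded score counts as unfavorable.
    if score is None:
        return False
    return 95 <= score <= 100


# Hypothetical scores for four patients; None marks a patient without a score.
scores = [100, 95, 60, None]
flags = [barthel_favorable(s) for s in scores]
print(flags)
```

With a dichotomy of this kind, no arbitrary numeric score has to be assigned to a patient who cannot be scored.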
This outcome was specified in the protocol for Part II. The committee recommended the latter outcome because they did not believe that a positive result in a single outcome would be sufficient (given that there is no standard single outcome) and they believed that requiring a statistically significant result on all outcomes would be too stringent. Also, they were concerned that inconsistent results among the four outcomes would be difficult to interpret.
To compare treatment groups, the NINDS t-PA Stroke Trial Coordinating Center proposed the use of a global statistical test. This test allows an overall assessment of treatment efficacy for a combination of correlated outcomes and appeared appropriate to test the hypothesis as proposed by the Data and Safety Monitoring Committee.
Overview of Global Tests
Global tests provide a solution to assessment of treatment efficacy when no single outcome is sufficient. Global tests are useful when the outcome, in this case clinical recovery from stroke, is difficult to measure and a combination of correlated outcomes (each measuring recovery from stroke) would be informative.
Classic approaches to the analysis of multiple outcomes provide unsatisfactory solutions.10 11 One classic approach, Hotelling's T2, a multivariate analogue of Student's t test,12 tests whether treatment groups differ in any way with respect to multiple outcomes. The test is nonspecific in that it does not test for a favorable treatment. If, for example, treatment A were superior to treatment B for outcome 1 but treatment B were superior to treatment A for outcome 2, Hotelling's T2 could reject the null hypothesis that the two treatments had equal efficacy, even though one is not uniformly superior to the other.
Another classic approach is to use the Bonferroni13 correction. This approach may lack power for alternatives in which most measures of efficacy are improved but no single measure is overwhelmingly improved,10 14 and it lacks power with highly correlated outcomes, as expected for stroke. With four outcomes, results would be considered statistically significant if at least one of the computed probability values is less than .0125 (ie, .05/4). However, significance for a single outcome would not meet the Data and Safety Monitoring Committee criteria of consistency, and it is not clear what combination of probability values would.
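A small sketch may make the Bonferroni threshold concrete. The probability values below are hypothetical and are not taken from any trial; the function name is likewise only illustrative.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return (any outcome significant, per-outcome threshold)."""
    threshold = alpha / len(p_values)  # .05 / 4 = .0125 for four outcomes
    return any(p < threshold for p in p_values), threshold


# Hypothetical case: every outcome is moderately improved, none overwhelmingly.
p_values = [0.03, 0.04, 0.02, 0.03]
significant, threshold = bonferroni_significant(p_values)
print(significant, threshold)
```

In this hypothetical case all four probability values fall below .05, yet none falls below .0125, so the Bonferroni approach would not declare a difference.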
Recently, several approaches have been designed to have high power to determine whether one treatment is better than another with respect to multiple outcomes. These statistical approaches to global testing include a nonparametric global test for continuous data,14 parametric approaches to global testing of continuous data or binary data,10 14 and a more generalized approach to global testing for binary data.15 16 17 Generally, all tests except some of those in the report of Legler et al17 assume a common intervention effect, implying that a treatment has the same effect on all outcome measures. When a common intervention effect is assumed, the power to detect differences between treatment groups decreases if the assumption is not met. For example, when the assumption is not met and some outcomes for the treatment group show an increase in benefit and other outcomes show a decrease, it would be more difficult to demonstrate treatment benefit. The decrease in power in the presence of conflicting results is a desirable characteristic with respect to the assessment of benefit in the NINDS t-PA Stroke Trial, which requires consistent and persuasive evidence among the four measures. In general, to understand the conclusion of a global test, the underlying assumptions must be assessed. In the presence of doubts about the assumptions beforehand, alternate approaches should be considered.
Of particular interest for the NINDS t-PA Stroke Trial are the methods of Lefkopoulou et al,16 who generalized the work of Pocock et al10 specifically for binary tests. With the use of GEE18 to take the correlations among outcomes into account, and assuming a common intervention effect, a Wald type of test statistic19 can be computed. An SAS macro was used for computation. Details about the GEE approach are provided in “Appendix 3.”
Under the assumption of a common intervention effect, global tests of multiple outcomes, including the parametric and nonparametric analytical approaches, have power greater than or equal to the test of one of those outcomes.10 Clinical investigators at the workshop believed that if rt-PA was effective, the drug would benefit the patient as a whole and that the four different measures of outcome would be highly correlated. For these reasons, clinical investigators believed the assumption of a common intervention effect to be valid for the NINDS t-PA Stroke Trial; that is, rt-PA would be expected to show benefits of similar magnitude for all four outcome measures.
The GEE approach has the additional advantage of providing an odds ratio and its 95% confidence interval. The odds ratio is familiar to clinicians and parallels their usual methods of weighing risks. In a two-group comparison, the global odds ratio gives an odds ratio for a favorable outcome based on the four outcome measures overall. For example, a global odds ratio of 2.0, produced by the GEE approach, would imply that the odds of a favorable outcome in the treatment group are two times the odds of a favorable outcome in the control group. An odds ratio of 1.0 would imply that the odds of a favorable outcome were the same in both groups. The global test statistic also satisfied the requirement of the Data and Safety Monitoring Committee to show a consistent and persuasive difference in the proportion of patients achieving favorable outcomes on the four scales.
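To illustrate what an odds ratio of 2.0 means in terms of proportions, the following sketch computes an odds ratio from two proportions of favorable outcomes. The proportions are hypothetical and the function name is an assumption for illustration.

```python
def odds_ratio(p_treatment, p_control):
    """Odds ratio for a favorable outcome, treatment relative to control."""
    odds_treatment = p_treatment / (1 - p_treatment)
    odds_control = p_control / (1 - p_control)
    return odds_treatment / odds_control


# Hypothetical proportions: 40% favorable with treatment, 25% with control.
example_or = odds_ratio(0.40, 0.25)
print(example_or)
```

Here the odds are 2/3 versus 1/3, so the odds ratio is 2.0 even though the proportion of favorable outcomes is not doubled.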
Interpreting Tests of Single Outcomes
In addition to the global comparison of treatment groups, clinicians are usually interested in differences in each single outcome alone. In the presence of a statistically significant global test, three approaches to interpreting tests of a single outcome were discussed. Lehmacher et al20 developed a “step-down” procedure. Suppose the overall significance level is .05. Perform the global test on k outcomes. If it is statistically significant, all subsets of k−1 end points are tested, again with a global test with each test performed at the .05 level. If a subset of k−1 end points does not differ (at the .05 level), conclude that the treatments do not differ with respect to each of those k−1 outcomes. If, however, a test on k−1 end points is significant, continue with all subsets of (k−2) end points. Once a set of end points does not differ, no further tests are done on those end points. The procedure is continued until it is determined whether the treatments differ with respect to each outcome. Because the procedure steps down from k to (k−1) to (k−2) ... end points, it is called a step-down procedure. The procedure has the property that even though many hypothesis tests are performed at the .05 significance level, the overall significance level of .05 is preserved in the sense that the probability of rejecting any of the null hypotheses is maintained at .05. The procedure of Lehmacher et al has the property that the global test might indicate overall significance, but the treatments may not differ significantly with respect to any of the individual end points. If this procedure was used, it would be possible to obtain a statistically significant improvement on the four outcomes overall but not on any one of the measures.
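The logic of the step-down procedure can be sketched as follows. This is a sketch under stated assumptions, not the procedure's reference implementation: it uses the closed-testing form (an outcome is declared different only if every subset of outcomes containing it is significant), which should lead to the same decisions as stepping down through the subsets, and it substitutes a simple mean-of-z global test with one common correlation `rho` for whatever global test is actually used. All names and z scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def mean_z_significant(z_scores, rho, z_crit=1.96):
    """Stand-in global test: mean z compared with an equal-correlation critical value."""
    k = len(z_scores)
    z_bar = sum(z_scores) / k
    return z_bar > z_crit * sqrt((1 + (k - 1) * rho) / k)


def step_down_closed(z_by_outcome, rho):
    """Declare an outcome different only if all subsets containing it are significant."""
    names = list(z_by_outcome)
    differs = {}
    for name in names:
        others = [n for n in names if n != name]
        differs[name] = all(
            mean_z_significant([z_by_outcome[n] for n in (name,) + extra], rho)
            for r in range(len(others) + 1)
            for extra in combinations(others, r)
        )
    return differs


# Hypothetical z scores; the global test is significant but no single outcome is.
weak = {"Barthel": 0.9, "Rankin": 0.9, "Glasgow": 0.9, "NIHSS": 1.6}
print(mean_z_significant(list(weak.values()), rho=0.0), step_down_closed(weak, rho=0.0))

# Hypothetical z scores; three outcomes are individually convincing.
strong = {"Barthel": 2.5, "Rankin": 2.6, "Glasgow": 2.4, "NIHSS": 1.5}
print(step_down_closed(strong, rho=0.6))
```

The first hypothetical case reproduces the property noted above: overall significance without a significant difference on any individual end point.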
Another approach when the global test is significant is to test each outcome at the same α-level (.05) as used for the global test. This approach would avoid all of the intermediate tests that would be performed with the step-down procedure and is less stringent. We have protection against falsely rejecting the global null hypothesis at the stated level of α, but no adjustments are made for the fact that the global test is followed by four individual tests of significance. That is, in the NINDS t-PA Stroke Trial we required that the global test be statistically significant at α=.05 before proceeding to the four individual tests. If the global test had failed at α=.05, then none of the individual tests would have been performed. For simplicity of interpretation, this latter approach was chosen for the NINDS t-PA Stroke Trial.
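The simpler two-stage rule can be written as a short sketch. The function name and all probability values are hypothetical; the point is only that the individual tests are examined, each at the same α, when and only when the global test is significant.

```python
def gatekept_individual_tests(global_p, individual_p, alpha=0.05):
    """Return per-outcome significance, or None if the global test fails."""
    if global_p >= alpha:
        return None  # individual tests are not performed
    return {name: p < alpha for name, p in individual_p.items()}


# Hypothetical probability values for illustration only.
individual = {"Barthel": 0.03, "Rankin": 0.02, "Glasgow": 0.04, "NIHSS": 0.08}
print(gatekept_individual_tests(0.01, individual))
print(gatekept_individual_tests(0.20, individual))
```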
If a trial is designed to have power for the global test alone, the trial may have limited power to detect differences in individual end points. The sample size of the NINDS t-PA Stroke Trial was based on having adequate power to detect a difference in the proportion of patients achieving improvement in the categorized Barthel Index, Rankin Scale, Glasgow Outcome Scale, or NIH Stroke Scale alone.
Workshop participants were concerned that the high correlation between scales such as the Glasgow Outcome Scale and Modified Rankin Scale would adversely affect the global test. As a special case when all outcomes occur at the same rate, all outcomes have equal pairwise correlations, and there are two treatment groups, the test statistics derived from GEE would reduce to the simpler test statistic proposed by Pocock et al.10 For this special case, the test statistic is calculated as the mean of the z statistics for the separate tests of the individual outcomes. This test statistic, z̄, was compared with a critical value calculated as

$$\bar{z} > 1.96\sqrt{[1+(k-1)\rho]/k}$$

where k is number of outcomes, ρ is the equal correlation between all pairs of outcome measures, and 1.96 is the traditional critical value for a z score to be significant at the .05 level.
This special case was used to illustrate the effect of correlation on the global test statistic. If the four outcome measures used in the NINDS t-PA Stroke Trial were completely uncorrelated, ρ=0.0, then to achieve statistical significance, the global test statistic, z̄, must exceed a critical value of

$$1.96\sqrt{[1+(4-1)(0.0)]/4}=0.98$$

For example, if three of the z scores were 0.9 and one was 1.6 (individually not statistically significant), the average would be 1.08 and would be greater than 0.98, and the global z score would be statistically significant.
If we introduce correlation among the outcomes and assume all correlations (ρ) are equal to 0.6, then the global test statistic, z̄, must be greater than

$$1.96\sqrt{[1+(4-1)(0.6)]/4}=1.64$$

Given the same z scores as before, the global test statistic (1.08) would not be statistically significant. Thus, by incorporating the correlation between outcomes, high correlation is penalized. If the four outcomes were perfectly correlated (ρ=1.0), the global test statistic, z̄, must be greater than

$$1.96\sqrt{[1+(4-1)(1.0)]/4}=1.96$$

Thus, perfectly correlated outcomes would be judged against the usual critical value of 1.96. Nothing would be gained by having four outcomes, and the power would equal the power of a single test. This example illustrates that using multiple outcomes can be more powerful than using a single outcome, but the gain decreases as correlations between outcomes increase. It should be noted that in the general case, when the correlations between the outcomes differ, the probability value is not as easily calculated and requires computation, as described in “Appendix 3.”
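The critical values in this special case are easy to reproduce. The sketch below assumes the equal-correlation setting described in the text; the function names are illustrative only.

```python
from math import sqrt


def global_critical_value(k, rho, z_crit=1.96):
    """Critical value for the mean of k z statistics with common correlation rho."""
    return z_crit * sqrt((1 + (k - 1) * rho) / k)


def mean_z(z_scores):
    """Global test statistic in the special case: the mean of the z statistics."""
    return sum(z_scores) / len(z_scores)


z_scores = [0.9, 0.9, 0.9, 1.6]  # the z scores used in the example in the text
for rho in (0.0, 0.6, 1.0):
    crit = global_critical_value(4, rho)
    print(rho, round(crit, 2), mean_z(z_scores) > crit)
```

The same four z scores pass the threshold when ρ=0.0 and fail it when ρ=0.6 or ρ=1.0, which is the penalty for correlation described above.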
Data from the Nicardipine Pilot Study21 were analyzed as an example of the application of global testing. Outcomes for the Nicardipine Pilot Study are presented in Table 1. Although treatment and placebo groups were not established, two groups were defined by time from stroke onset. All patients entering the study were analyzed. Patients with missing data were considered to have unfavorable outcomes. Because of the small sample size, there was low power to detect differences, even with the global test. Thus, although the data are useful for illustration, no conclusions about the Nicardipine Pilot Study can be drawn.
Table 1 shows the agreement between pairs of outcomes and the φ-coefficients22 representing the correlations between binary variables. The lowest agreement and correlation were between the Barthel Index and NIH Stroke Scale. Table 2 shows the odds ratios for individual tests of each outcome, computed with logistic regression, and for the global test, computed with GEE. An odds ratio of 1.0 implies no difference between groups. The odds ratios for the Barthel Index and Placement assessment (see Table 1 for definition) suggested that treating patients in less than 6 hours from stroke onset could be less beneficial than treating at 6 hours or later. The odds ratio was 0.48 for the Barthel Index and 0.83 for Placement assessment; neither odds ratio was statistically significant. The Performance assessment (see Table 1 for definition) and the NIH Stroke Scale results suggested that treatment within 6 hours of stroke onset could be beneficial. The odds ratio was 1.39 for Performance assessment and 2.56 for the NIH Stroke Scale; again the odds ratios were not statistically significant. The global odds ratio of 0.96 confirmed that a consistent difference between those treated in less than 6 hours and those treated at 6 hours or later could not be detected. The odds ratio 0.96 was very close to 1.0, and the probability value was .95. When we removed the Barthel Index as an outcome, the global odds ratio increased to 1.49, a value greater than 1.0, but again the result, as for the univariate analysis, was not statistically significant (P=.54). Because the assumption of a common intervention effect was violated (two of the scales favored the <6-hour group and two favored the ≥6-hour group), there is a decrease in power of the global test. In this example, the global test is not more powerful than each of the individual tests.
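Agreement and the φ-coefficient for a pair of binary outcomes can be computed from the 2×2 table of joint results. The sketch below uses the standard formula for φ; the data are hypothetical and are not the values in Table 1.

```python
from math import sqrt


def agreement_and_phi(a, b):
    """Return (proportion of agreement, phi coefficient) for two binary outcomes."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    n = n11 + n10 + n01 + n00
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when an outcome never varies")
    return (n11 + n00) / n, (n11 * n00 - n10 * n01) / denom


# Hypothetical favorable (1) / unfavorable (0) results for eight patients.
barthel = [1, 1, 1, 0, 0, 0, 1, 0]
nihss = [1, 1, 0, 0, 0, 1, 1, 0]
print(agreement_and_phi(barthel, nihss))
```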
Data from Part II of the NINDS t-PA Stroke Trial1 illustrate the application of global testing. Outcome scales (or measures) for the trial are listed in Table 3. Any patients with missing data were considered to have unfavorable outcomes for those outcomes on which data were missing.
Table 3 shows the agreement between pairs of outcomes and the φ-coefficients22 representing the correlations between binary variables. The lowest agreement and correlation were between the Barthel Index and NIH Stroke Scale. Table 4 shows the odds ratios for individual tests of each outcome, computed with the Mantel-Haenszel approach, and the global test, computed with GEE. We concluded on the basis of the global test that treatment with rt-PA was beneficial. In addition, treatment significantly improved outcome on each of the four outcome measures. In the NINDS t-PA Stroke Trial, the results were unusually strong. The global test may have even more use in interpretation for trials with less resounding results.
A number of other ways to combine the four outcomes might have been considered. In Table 5, we defined a “favorable outcome” in several ways, such as having a positive result on at least one of the four outcome measures. If any of these approaches had been chosen as the single primary outcome measure, we would have concluded a beneficial effect of treatment with t-PA. The NINDS t-PA Stroke Trial investigators believed that requiring statistically significant results on all four stroke scales would be too stringent. The choice of one of the less stringent end points (positive results on at least two or three of the scales) could have been considered arbitrary. In any case, the analyses in Table 5 confirm the results of the global test.
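A composite definition of this kind, favorable on at least a chosen number of the four scales, can be sketched as below. The function name and the patient record are hypothetical, and counting a missing result (`None`) as unfavorable is an assumption made for this sketch.

```python
def composite_favorable(outcomes, at_least):
    """True if the patient is favorable on at least `at_least` of the scales."""
    # Assumption for this sketch: a missing result (None) counts as unfavorable.
    return sum(1 for value in outcomes.values() if value is True) >= at_least


# Hypothetical patient: favorable on two scales, unfavorable on one, one missing.
patient = {"Barthel": True, "Rankin": True, "Glasgow": False, "NIHSS": None}
print([composite_favorable(patient, m) for m in (1, 2, 3, 4)])
```

Each choice of the threshold gives a different single binary end point, which is why selecting one of them could be seen as arbitrary.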
Conclusions of the Workshop
Participants agreed that some implications of global testing needed further consideration. For example, consideration of the wording of drug labels would be needed if statistical significance was achieved for the global test but not for the individual outcomes. Another concern was the need to gain acceptance among the clinical/scientific community for this approach. Global testing has been rarely used in clinical trials.
Participants supported secondary testing of individual outcomes with an overall .05 level as a guideline to assess the significance of each single outcome measure because of simplicity in interpretation and because the approach was not as conservative as the Bonferroni approach. Participants agreed that presentation of the test data for each single outcome measure was essential to assist clinicians in the interpretation of a global test.
Workshop participants concluded that for the NINDS t-PA Stroke Trial, a “consistent and persuasive” difference should be a statistically significant result at the α=.05 level from a global test for binary outcomes accompanied by secondary tests of individual outcomes, with .05 as a guide to interpretation of the global test results. Workshop participants recommended that clinicians and the statistical community be familiarized with the global approach to testing through presentations at scientific meetings and through publication of the workshop recommendations.
After the 1993 workshop, the global approach planned for the NINDS t-PA Stroke Trial was presented at the XVIIth International Biometric Conference, Hamilton, Ontario, Canada, August 8-12, 1994; the International Business Communications Ischemic Stroke Conference, December 2, 1994 (reference 23); and the Society for Clinical Trials, Seattle, Wash, April 30-May 3, 1995. A presentation to the Food and Drug Administration occurred January 25, 1994.
T. Brott, MD, University of Cincinnati; R. Dachman, MD, Food and Drug Administration; A. Dodge, Genentech Inc; J.H. Ellenberg, PhD, NINDS; J. Feeney, MD, Drug Evaluation and Research, Food and Drug Administration; J. Froehlich, MD, Genentech, Inc; N.L. Geller, PhD, Office of Biostatistics Research, National Heart, Lung, and Blood Institute; J.C. Grotta, MD, University of Texas at Houston Medical School; G. Gupta, MD, Food and Drug Administration; S.M. Hedges, PhD, Kunitz and Associates, Inc; K. Higgins, Neuropharmacological Drug Evaluation and Research, Food and Drug Administration; R.G. Katz, MD, Neuropharmacological Drug Evaluation and Research, Food and Drug Administration; A. Kingman, PhD, National Institute of Dental Research; J. Legler, ScD, Department of Child and Family Research, National Institute of Child Health and Human Development; M. Lu, PhD, Division of Biostatistics and Research Epidemiology, Henry Ford Health Sciences Center; P.D. Lyden, MD, Stroke Center, University of California at San Diego; J. Marler, MD, Division of Stroke and Trauma, NINDS; L.M. Ryan, PhD, Dana-Farber Cancer Institute; J.P. Siegel, MD, FACP, Food and Drug Administration; B.C. Tilley, PhD, Division of Biostatistics and Research, Henry Ford Health Sciences Center; M.D. Walker, MD, NINDS; M. Walton, MD, Food and Drug Administration; F. Wang-Clow, ScD, Genentech Inc; K. Weiss, MD, Food and Drug Administration.
NINDS t-PA Stroke Trial Principal Investigators
Clinical Centers Listed in Order of Patient Recruitment
T. Brott, MD, Principal Investigator, University of Cincinnati; P. Lyden, MD, Principal Investigator, University of California at San Diego; J.C. Grotta, MD, Principal Investigator, University of Texas at Houston Medical School; T.G. Kwiatkowski, MD, and S.H. Horowitz, MD, Principal Investigators, Long Island Jewish Medical Center; S.R. Levine, MD, Principal Investigator, Henry Ford Hospital and Health Sciences Center; M. Frankel, MD, and B.C. Mackay, MD, Principal Investigators, Emory University School of Medicine; E.C. Haley, MD, Principal Investigator, University of Virginia Medical Center; M. Meyer, MD, and K. Gaines, MD, Principal Investigators, University of Tennessee Medical Center.
B.C. Tilley, PhD, Division Head, Biostatistics and Research Epidemiology, Henry Ford Health Sciences Center.
Data and Safety Monitoring Committee
J.D. Easton, MD, Brown University, Rhode Island Hospital; J.M. Hallenbeck, MD, NINDS; G. Lan, PhD, George Washington University; J.D. Marsh, MD, Wayne State University; M.D. Walker, MD, NINDS.
J.R. Marler, MD, NINDS.
Global Tests for Binary Outcomes
Global tests based on multivariate models provide a useful approach to the analysis of multiple correlated outcomes.10 14 15 16 17 Consider an experiment involving two treatment groups in which observations on k=4 binary variables are recorded for each subject. The statistical tests described here have been extended to more than two groups, to an arbitrary number of binary outcomes, and to outcomes that are clustered (eg, observations on families of different sizes), but we consider the simplest case here.
Let Yijk represent the kth response (k=1, 2, 3, 4) for the jth subject (j=1, 2, ..., ni) in the ith group (i=0, 1). For the stroke trial we examine the four stroke outcome measures for each patient in the two groups, so that i=0 for the placebo group and i=1 for the t-PA group.
The observation vectors for each subject are independent, each with a mean vector μi and variance of Yijk=φ μijk (1−μijk), where φ allows for overdispersion. We also allow for the outcomes within a subject to be correlated.
For multiple binary outcomes, we assume a linear logistic model for the probabilities of a favorable outcome. Thus, the model for the mean E(Yijk)=μik is

$$\operatorname{logit}(\mu_{ik}) = \alpha_k + \beta x_i$$

where xi is the treatment group indicator (0 for placebo, 1 for t-PA), αk allows for a different baseline favorable outcome occurrence rate for each of the four stroke scales, and β is the common intervention effect coefficient for each of the four stroke outcome measures. Although it is not necessary to assume that there is one β for all of the measures, this is an efficient approach in practice and has been found to be valid in a wide range of cases.17
As in Lefkopoulou and Ryan,15 we use the statistical method of GEE to obtain a Wald-type test that simultaneously tests the null hypotheses that the four outcome measures are equal in the two treatment groups. The methodology also allows us to estimate exp(β), the odds ratio of a favorable outcome (based on the four measures) when treated with rt-PA relative to placebo, and to obtain 95% (or other) confidence limits for this odds ratio.
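Once a common-effect coefficient and its standard error are available, the Wald-type summary is a short calculation. The sketch below does not fit the GEE model; it assumes an estimate and a standard error have already been obtained, and the numbers used are hypothetical rather than results from the trial.

```python
from math import erfc, exp, log, sqrt


def wald_summary(beta_hat, se, z_crit=1.96):
    """Wald z, two-sided P value, odds ratio, and confidence limits for exp(beta)."""
    z = beta_hat / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return {
        "z": z,
        "p": p_two_sided,
        "odds_ratio": exp(beta_hat),
        "ci": (exp(beta_hat - z_crit * se), exp(beta_hat + z_crit * se)),
    }


# Hypothetical estimate: log odds ratio of log(2) with standard error 0.25.
summary = wald_summary(log(2), 0.25)
print(summary)
```

The confidence limits are computed on the log odds scale and then exponentiated, which is why the interval is not symmetric around the odds ratio.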
The SAS macro (GEE version 2.02) may be obtained from U. Grömping, Fachbereich Statistik, Universität Dortmund, 44221 Dortmund, Germany.
Selected Abbreviations and Acronyms
GEE = generalized estimating equations
NIH = National Institutes of Health
NINDS = National Institute of Neurological Disorders and Stroke
(r)t-PA = (recombinant) tissue plasminogen activator
This study was supported by the NINDS (NO1-NS-02382, NO1-NS-02374, NO1-NS-02377, NO1-NS-02381, NO1-NS-02379, NO1-NS-02373, NO1-NS-02378, NO1-NS-02376, and NO1-NS-02380). The authors wish to thank Lucy Debczak for her assistance in preparing the manuscript.
Reprint requests to Barbara C. Tilley, PhD, Henry Ford Health Sciences Center, Division of Biostatistics and Research Epidemiology, 1 Ford Place, 3E, Detroit, MI 48202.
A complete list of the participants in this research study appears at the end of this article.
- Received February 13, 1996.
- Revision received July 16, 1996.
- Accepted July 16, 1996.
- Copyright © 1996 by American Heart Association
Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J. February 1965:61-65.
Van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJA, van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke. 1988;19:604-607.
Brott T, Adams HP, Olinger CP. Measurements of acute cerebral infarction: a clinical examination scale. Stroke. 1989;20:864-870.
Adams RJ, Meador KS, Sethi KD, Grotta JC, Thomson DS. Graded neurologic scale for use in acute hemispheric stroke treatment protocols. Stroke. 1987;18:665-669.
Brott T, Haley EC, Levy DE, Barsan W, Broderick J, Sheppard G, Spilker J, Kongable G, Reed R, Marler J. Urgent therapy for stroke, part I: pilot study of tissue plasminogen activator administered within 90 minutes. Stroke. 1992;23:632-640.
Haley EC, Levy DE, Brott TG. Urgent therapy for stroke, part II: pilot study of tissue plasminogen activator administered 91-180 minutes from onset. Stroke. 1992;23:641-645.
Miller RG. Simultaneous Statistical Inference. 2nd ed. New York, NY: Springer-Verlag; 1981.
Rao CR. Linear Statistical Inference and Its Applications. 2nd ed. New York, NY: John Wiley & Sons, Inc; 1973.
Rosenbaum D, Zabramski J, Frey J, Yatso F, Marler J, Septzler R, Grotta J. Early treatment of ischemic stroke with a calcium antagonist. Stroke. 1991;22:437-441.
Cohen J, Cohen P. Bivariate Correlation and Regression: Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1983:39.
Tilley B, Divine G. Analytical approaches in stroke clinical trials. In: Grotta J, Miller L, Buchan A, eds. Ischemic Stroke: Recent Advances in Understanding and Therapy. Southborough, Mass; International Business Communications Inc; 1995:144-159.