(Stroke. 2002;33:466.)
© 2002 American Heart Association, Inc.
Original Contributions
From the Departments of Neurology (K.C.J., E.C.H.) and Health Evaluation Sciences (K.C.J., D.P.W., A.F.C.), University of Virginia Health System, Charlottesville, Va.
Correspondence to Karen C. Johnston, MD, University of Virginia Health System, Department of Neurology, #800394, Charlottesville, VA 22908. E-mail kj4v@virginia.edu
| Abstract |
|---|
Methods Clinical information (National Institutes of Health Stroke Scale) and imaging information (CT infarct volume), measured at 1 week from 201 patients from the Randomized Trial of Tirilazad Mesylate in Acute Stroke (RANTTAS) study, were used in a multivariable logistic regression analysis to predict excellent and devastating 3-month outcome. The combined models were compared with the infarct volume models and the clinical models. Discrimination, calibration, and change in global model chi-square were assessed.
Results The combined models and the models using clinical information alone had areas under the receiver operating characteristic curve that did not differ significantly (P=0.092 to 0.4), ranging from 0.83 to 0.95. The imaging-alone models performed less well (P<0.005), with areas under the receiver operating characteristic curve ranging from 0.70 to 0.80.
Conclusions The National Institutes of Health Stroke Scale at 1 week is highly predictive of 3-month outcome in ischemic stroke patients. The addition of 1-week infarct volume does not improve the accuracy of the predictive model.
Key Words: cerebral ischemia; models, statistical; prognosis; stroke outcome
| Introduction |
|---|
Cranial CT infarct volume in stroke patients is one of the imaging measures that has been considered as a potential surrogate outcome in stroke clinical trials. Recently its value as an early measure has been questioned as a result of the weakness of the association between infarct volume and standard clinical outcome measures.6 Clinical information, including measures of neurological function and stroke severity, in combination with CT infarct volume has not been evaluated as a potential early outcome measure.
The purpose of this analysis was to determine whether the combination of clinical information and infarct volume information measured at 1 week after acute stroke will predict 3-month clinical outcome better than either clinical or imaging information alone in the participants of an acute stroke clinical trial.
| Subjects and Methods |
|---|
There were also numerous early measures captured in the RANTTAS trial at 7 to 10 days (±1 day) from stroke onset. These included the NIHSS and head CT infarct volume. A prospective random sample of half of all fully eligible patients was selected at study entry to submit 7- to 10-day head CT scans for volumetric measurement of infarct size. Noncontrast CTs were done using a standardized protocol that specified 5-mm slices. Infarct volume was calculated centrally using planimetric techniques by an investigator blinded to the treatment group and the clinical characteristics of the patient.11 A total of 556 fully eligible patients were enrolled in the trial, and 256 fully eligible patients with CT and clinical measures from the RANTTAS trial were included in this study. Because treatment with tirilazad mesylate did not have an effect on outcome,7 the 2 treatment groups were combined for this analysis.
Model Variables
Independent variables for the predictive models were prespecified and included the 7- to 10-day NIHSS score as the clinical predictor and the 7- to 10-day head CT infarct volume as the imaging predictor.
The dependent variables for the models were also prespecified. These were the 3 commonly used stroke clinical trial outcome measures, collected at 3 months: the NIHSS, the Barthel Index (BI), and the Glasgow Outcome Scale (GOS). Each of these is a well-established and reliable measure of various aspects of clinical outcome.8–10 Each of the 3 outcome scales was prospectively dichotomized based on clinically relevant and literature-supported thresholds.7,12–14 Each outcome measure was dichotomized twice. The first dichotomization identified excellent outcome versus everything else, where excellent outcome reflected full or nearly full recovery. The second dichotomization identified devastating outcome versus everything else, where devastating outcome reflected nursing home level disability or death. The definitions of the dichotomizations are shown in Table 1.
Missing Variables
Subjects were excluded from the analysis if they were missing the outcome variable required by the model (NIHSS, n=35; BI, n=27; GOS, n=27). In addition, if either the 7- to 10-day NIHSS score (n=17) or CT infarct volume (n=3) was missing, the subject was eliminated from the analysis. For 21 subjects, the 1-week infarct volume was measured outside the window of 7 to 10 days (±1 day) and was still used as the infarct volume. These included 10 patients who were imaged before the window (the earliest on day 3) and 8 patients imaged after the window, all but 1 within 30 days. The remaining patients had incomplete data because the exact timing of the imaging was unknown. This resulted in a total of 201 subjects for all the NIHSS analyses and 206 for all the BI and GOS analyses reported below.
Analysis
We used multivariable logistic regression analysis to estimate predictive models. The variable selection and the analysis were designed to avoid overfitting the models,12,15 including limiting the number of prespecified predictor variables. We estimated 6 distinct models using this data set. We used restricted cubic splines15,16 with 3 knots (10th, 50th, and 90th percentiles) for the independent variables of NIHSS and CT infarct volume to allow nonlinearity in the models. A combined model (including the clinical and imaging predictors) was developed for each of the 3 outcome measures (NIHSS, BI, GOS) at each of the 2 levels of outcome (excellent outcome, devastating outcome). For each combined model, a clinical-variable-only model and an imaging-variable-only model were also evaluated for comparison. Table 1 lists the different models. The change in model explanatory power was evaluated by testing the change in global model chi-square.
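The spline step can be illustrated with a minimal sketch. It assumes the truncated-power form of the restricted cubic spline basis, which is one common parameterization; the source does not state which form was used, and `rcs_basis` is a hypothetical helper name. With 3 knots, each predictor contributes a linear column and one nonlinear column, that is, 2 degrees of freedom.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated-power form, assumed here).

    Returns a matrix whose first column is x and whose remaining
    len(knots) - 2 columns are the nonlinear terms. The fitted curve is
    constrained to be linear below the first and above the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)

    def pos3(u):
        # Cubed positive part: (u)_+^3
        return np.clip(u, 0.0, None) ** 3

    denom = t[-1] - t[-2]
    cols = [x]
    for j in range(len(t) - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / denom
                + pos3(x - t[-1]) * (t[-2] - t[j]) / denom)
        cols.append(term)
    return np.column_stack(cols)

# Illustrative (made-up) scores; knots at the 10th, 50th, 90th percentiles.
scores = np.array([0, 1, 2, 3, 4, 6, 8, 11, 15, 22], dtype=float)
knots = np.percentile(scores, [10, 50, 90])
design = rcs_basis(scores, knots)  # shape (10, 2): linear + 1 nonlinear term
```

The resulting columns would be passed to a logistic regression routine in place of the raw predictor.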
Secondary Analyses
A number of secondary analyses were conducted to assess whether the primary results were confounded by other variables or treatment bias. To evaluate the role of age as a potential confounder, we compared the 2-variable combined models with a 3-variable combined model including age as an additional clinical predictor. Sensitivity analyses were also conducted to evaluate both the effect of excluding those with zero infarct volume and excluding those who received tirilazad mesylate in the original trial.
Model performance among the combined models (clinical and imaging predictor), clinical models (clinical predictor alone), and imaging models (imaging predictor alone) was assessed using the area under the receiver operating characteristic curve (ROC curve) as our measure of discrimination. The ROC curve is a plot of sensitivity versus 1 - specificity, that is, the true-positive rate versus the false-positive rate. The area under the ROC curve reflects the model's ability to discriminate between those with excellent outcome and all other outcomes, or between those with devastating outcome and all other outcomes. An area under the ROC curve of 1 is perfect discrimination, and an area of 0.5 reflects discrimination that is no better than random chance.
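As a small illustration of this measure, the area under the ROC curve equals the probability that a randomly chosen subject with the outcome receives a higher predicted score than a randomly chosen subject without it, counting ties as one half. The sketch below computes it that way; `roc_auc` is a hypothetical helper name and the data are made up.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    y = np.asarray(y_true, dtype=bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both outcome classes are required")
    # Count positive/negative pairs ranked correctly; ties count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# 3 of the 4 positive/negative pairs are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```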
Calibration curves were used to assess model calibration and are a plot of predicted probability of outcome versus actual outcome. In a calibration graph, the 45-degree line (ideal line or line of identity) represents perfect calibration, where each predicted probability of an outcome exactly matches the actual probability of an outcome. The closer the model calibration curve is to the ideal line, the better the calibration.
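A binned version of this comparison can be sketched as follows. This is a simplification and an assumption: the source does not say how its calibration curves were estimated, and `calibration_table` is a hypothetical helper. Subjects are sorted by predicted probability, split into groups, and the mean prediction in each group is compared with the observed outcome frequency.

```python
import numpy as np

def calibration_table(y_true, p_pred, n_bins=5):
    """Return (mean predicted, observed frequency, group size) per group."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    order = np.argsort(p, kind="stable")
    rows = []
    for idx in np.array_split(order, n_bins):
        if idx.size == 0:
            continue  # more bins than subjects
        rows.append((float(p[idx].mean()), float(y[idx].mean()), int(idx.size)))
    return rows

# Perfectly calibrated points lie on the 45-degree line:
# mean predicted probability equals observed frequency in every group.
rows = calibration_table([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_bins=2)
```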
Each model was internally validated using bootstrap validation techniques.17 This method of internal validation assesses how accurately the models will predict outcome in a new similar sample of stroke patients. Resampling occurred 100 times for each bootstrap validation. All discrimination and calibration data presented are bootstrap corrected (bias corrected). All modeling analyses were done using Splus 4.5 software (MathSoft Inc.).
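One widely used form of bootstrap bias correction estimates the "optimism" of the apparent performance. The sketch below assumes that form; the source does not spell out its exact procedure, and `fit` and `score` are hypothetical callables standing in for the model-fitting routine and the performance measure (for example, the area under the ROC curve).

```python
import numpy as np

def optimism_corrected(X, y, fit, score, n_boot=100, seed=0):
    """Apparent performance minus the mean bootstrap optimism.

    fit(X, y) -> model; score(model, X, y) -> float (both assumed).
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = score(fit(X, y), X, y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        model = fit(X[idx], y[idx])
        # Performance in the resample minus performance in the original data
        optimism.append(score(model, X[idx], y[idx]) - score(model, X, y))
    return apparent - float(np.mean(optimism))
```

With a discrimination measure such as the area under the ROC curve, a practical version would also need to handle resamples that happen to contain only one outcome class.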
Simple Spearman correlations were also calculated between each of the predictor variables and each of the outcomes, as well as between the 2 predictor variables. The partial predictive power of each independent variable was assessed using plots of each of the independent variables versus the predicted probability of outcome using the combined models.
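A Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. Ties matter here because many subjects had an infarct volume of zero. The sketch below uses hypothetical helper names and made-up inputs.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    ranks = np.empty(a.size)
    i = 0
    while i < a.size:
        j = i
        while j + 1 < a.size and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        # Positions i..j (0-based) are tied; assign their mean 1-based rank.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```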
| Results |
|---|
The values measured for early outcome predictor variables, the 3-month outcome variables, and the dichotomized outcome frequencies are listed in Table 3. The bootstrap-corrected (bias-corrected) areas under the ROC curve for the combined models, the clinical-only models, and the imaging-only models are shown graphically in Figure 1. The combined models for each of the excellent and devastating outcomes performed almost exactly as did the models based on clinical information alone (P value for difference ranged from 0.092 to 0.4). Areas under the ROC curve ranged from 0.83 to 0.94 for the combined models and from 0.84 to 0.95 for the models using clinical information alone. The imaging-alone models did not perform as well as the other models (P value for differences <0.005) and had areas under the ROC curve that ranged from 0.70 to 0.80. Model discrimination did not differ for the secondary analyses including the addition of age or the exclusion of zero infarct volume (data not shown). For the analysis excluding those who received tirilazad mesylate, the imaging-alone models did not perform as well as the other models for 5 of the 6 outcomes; for the sixth, the imaging-alone model was indistinguishable from the other models (data not shown).
Calibration curves for 2 of the combined models are demonstrated in Figure 2 (top and bottom). These represent the best calibration (top) and the worst calibration (bottom) among the 6 models. The hatched line reflects perfect calibration, and the solid line is the bootstrap corrected (bias corrected) calibration curve for the given model. The excellent outcome as determined by the BI combined model has a calibration curve that is nearly superimposed on the ideal line, suggesting excellent calibration. The calibration curve for a devastating outcome as determined by the NIHSS combined model deviates more from the ideal but still represents very good calibration. The calibration curves for the other 4 combined models are not shown but resemble the calibration curve in Figure 2 (top).
The role of each of the predictor variables (clinical and imaging) in the combined model was also examined as demonstrated in Figure 3 (top and bottom). Figure 3 (top) demonstrates the role of 7- to 10-day NIHSS score in predicting the probability of outcome as determined by the combined model. Figure 3 (bottom) demonstrates the role of the 7- to 10-day infarct volume by head CT in predicting the probabilities of outcome by the combined model.
Spearman correlations between the predictor variables and the outcome variables were also calculated to determine the strength of these relationships. Table 4 shows that the Spearman correlations with outcome were consistently higher for the NIHSS score than for infarct volume. The correlation between the 2 predictor variables was 0.64.
| Discussion |
|---|
Previous data have shown that infarct volume is related to stroke outcome,6,11 which is consistent with our data. It is uncertain why infarct volume adds relatively little to the prediction of 3-month outcome in this stroke population. Lesion location may confound the relationship between infarct size and clinical outcome, because 2 different infarcts of the same size in different locations could have very different functional expression. This may be magnified when the clinical measure is able to capture this functional difference, but the imaging variable is not. The large number of subjects with no infarct volume detected in this data set (71) also suggests that a potential lack of CT scan sensitivity may limit the use of this imaging technique as an early outcome measure. One-week head CT may be missing small infarcts because of lack of sensitivity, posterior fossa infarcts due to artifact, or other infarcts due to fogging effect.6 More sensitive imaging techniques, such as MRI, have been shown to have a stronger relationship with clinical outcome.3 Because both infarct volume and NIHSS measure some degree of stroke severity, it may also be that the NIHSS captures more complete information as it relates to 3-month outcome than does CT, although they clearly do not measure the exact same thing. Other potentially confounding variables such as previous brain injury or disability, medical complications, and differences in therapy may also play a role.
Patient age did not seem to be a confounding variable, in that the addition of age in the secondary analysis added little to the models' predictive ability. Previous data have repeatedly suggested that age is an independent predictor of 3-month outcome.12,14,18–20 The inability of age to add to the predictive ability of the model may reflect the fact that the influence of age on outcome is already captured by the 1-week NIHSS score. For example, the age effect may relate to more medical complications that have surfaced by 1 week but are then captured by the NIHSS. Medical complications have been demonstrated to be related to death but not so clearly related to 3-month disability.21 Other potential confounders such as prestroke disability, history of diabetes, previous stroke history, and stroke subtype were not analyzed. These have all been shown to have relationships to outcome and could be confounding our results.12
There are several limitations of this study. First, the models were tested only in the data sets from which they were derived. Although the bootstrap internal validation strongly suggests that the models were not overfitted to the data set and are likely to perform as well in a similar population, these models have not been externally validated with independent data. Our strict regression modeling technique followed published guidelines15: it used a limited number of prespecified predictor variables, allowed nonlinearity of the predictor variables, and applied internal validation to obtain a bias-corrected estimate of the models' performance. All of these increase the likelihood that the models will perform equally well in a similar independent data set. The rule of 10 requires at least 10 least frequent outcomes for each degree of freedom used in the model. The model for devastating outcome as measured by the NIHSS used 4 degrees of freedom but had only 28 least frequent outcomes (as shown in Table 3); it was the only model that violated the rule of 10, with fewer than 10 outcomes for each degree of freedom. This model did not validate as well internally and is less likely than the other 5 models to perform as well in another data set. The overall modest size of the data set also limits our ability to identify relationships. In a larger data set, other variables could be added to the model to potentially improve the prediction, but in this data set, we were limited to relatively few variables.
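The rule-of-10 check reduces to simple arithmetic, sketched below with hypothetical function names. The 4 degrees of freedom are taken from the text; they are consistent with 2 predictors each modeled by a 3-knot spline, assuming each such spline uses 2 degrees of freedom.

```python
def outcomes_per_df(least_frequent_outcomes, degrees_of_freedom):
    """Number of least frequent outcomes available per model degree of freedom."""
    return least_frequent_outcomes / degrees_of_freedom

def satisfies_rule_of_10(least_frequent_outcomes, degrees_of_freedom):
    """True when there are at least 10 least frequent outcomes per degree of freedom."""
    return outcomes_per_df(least_frequent_outcomes, degrees_of_freedom) >= 10

# Devastating outcome by NIHSS: 28 least frequent outcomes, 4 degrees of freedom.
ratio = outcomes_per_df(28, 4)        # 7.0, below the threshold of 10
ok = satisfies_rule_of_10(28, 4)      # False
```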
Another limitation of this analysis is the loss of approximately 50 subjects because of missing data. Although the age and baseline stroke severity of the subjects with missing data were the same as those of the population we used (data not shown), there is always the possibility that this has biased our sample. The fact that our baseline data and infarct volume correlations resemble those found in other published analyses also argues against a significant bias.11
A third potential limitation is the large number of zero infarct volume subjects (71), which could have resulted in a biased result. However, when we re-estimated clinical, imaging, and combined models on the 135 subjects with a measurable infarct volume, we obtained the same results. ROC areas from the clinical models were almost identical to those from the combined models, and ROC areas from the imaging models were much lower. The large number of participants without any measurable infarct volume, however, does raise the question of the stroke severity of this clinical trial population. These results may not be generalizable to other, more severely injured stroke populations. The use of more sensitive imaging measures, such as MRI, may result in fewer subjects with no measurable infarct volume.
Although the original RANTTAS trial demonstrated that tirilazad mesylate was not effective in changing the outcomes of these stroke patients, we conducted a second sensitivity analysis to assess whether our primary results were biased by an unsuspected effect of the experimental drug. The discrimination of the clinical models and the combined models was very similar, and the imaging models performed less well in 5 of the 6 models and were the same in the sixth. These data argue against a bias due to drug effect.
A valid early outcome measure that is easily and reliably obtained, inexpensive, and highly predictive of 3-month outcome in stroke clinical trials could be very valuable. Such a measure could potentially reduce the follow-up time required for patients, the cost of trials, and the number of patients lost to follow-up in trials. At a minimum, a highly predictive regression model could be used to estimate 3-month outcome for those lost to follow-up at 3 months, allowing them to be included in the data analysis. In general, when patients in a clinical trial are lost to follow-up, there are 3 analysis options: (1) those subjects can be dropped from the analysis, (2) the last observation can be carried forward, or (3) a prediction model can be used to estimate the likely outcome.
If there is an independently established and valid prediction model with good explanatory power, the prediction option should provide the best estimate of the treatment effect. This is because it provides the least biased estimate of the missing data. Using predictions, one could conduct the analysis with all randomized cases and maintain intention to treat unaffected by potentially biased losses to follow-up and with more accuracy than carrying forward the last observation. Because an area under the ROC curve of >0.8 is generally accepted as a strong enough relationship to make individual predictions,12 these data are encouraging that individual predictions with such a model may be acceptably accurate. If valid, these models could be used to impute 3-month outcome for participants lost to follow-up in stroke clinical trials.
Although our analysis used 1-week CT infarct volume, which added little to the prediction, further study of other imaging information, such as that obtained by MRI, may improve the prediction of long-term outcomes after acute ischemic stroke.
Infarct volume measured at 1 week adds little to NIHSS score measured at 1 week as an early outcome measure for predicting 3-month excellent and devastating stroke outcome as measured by NIHSS, BI, and GOS. If these results are proven valid, it would be difficult to justify the expense and patient inconvenience of CT imaging 1 week after acute stroke if the purpose of the scan is to obtain an early indicator of 3-month clinical outcome.
| Acknowledgments |
|---|
| Footnotes |
|---|
This work was presented, in part, at the 2nd Neurology Outcomes Research Conference of the American Neurological Association, Boston, Mass, October 15, 2000.
Received September 7, 2001; revision received October 30, 2001; accepted November 14, 2001.
| References |
|---|
2. Warach S, Boska M, Welch KMA. Pitfalls and potential of clinical diffusion-weighted MR imaging in acute stroke. Stroke. 1997; 28: 481–482.
3. Lovblad KO, Baird AE, Schlaug G, Benfield A, Siewert B, Voetsch B, Connor A, Burzynski C, Edelman RR, Warach S. Ischemic lesion volumes in acute stroke by diffusion-weighted magnetic resonance imaging correlate with clinical outcome. Ann Neurol. 1997; 42: 164–170.
4. Warach S, Moseley M, Johnston K, Adams H, Zivin J. Diffusion weighted imaging: ready for prime time? Plenary Session Presented at the 23rd International Joint Conference on Stroke and Cerebral Circulation, February 5, 1998, Orlando, Fla.
5. Thijs BN, Lansberg MG, Beaulieu C, Marks MP, Moseley ME, Albers GW. Is early ischemic lesion volume on diffusion-weighted imaging an independent predictor of stroke outcome? A multivariable analysis. Stroke. 2000; 31: 2597–2602.
6. Saver JL, Johnston KC, Homer D, Wityk R, Koroshetz W, Truskowski LL, Haley EC. Infarct volume as a surrogate or auxiliary outcome measure in ischemic stroke clinical trials. Stroke. 1999; 30: 293–298.
7. The RANTTAS Investigators. A randomized trial of tirilazad mesylate in patients with acute stroke (RANTTAS). Stroke. 1996; 27: 1453–1458.
8. Mahoney FT, Barthel DW. Functional evaluation: Barthel Index. Md State Med J. 1965; 14: 61–65.
9. Jennett B, Bond M. Assessment of outcome after severe brain damage. Lancet. 1975; 1: 480–484.
10. Lyden P, Brott T, Tilley B, Welch KM, Mascha EJ, Levine S, Haley EC, Grotta J, Marler J. Improved reliability of the NIH Stroke Scale using video training. NINDS TPA stroke study group. Stroke. 1994; 25: 2220–2226.
11. Brott T, Marler JR, Olinger CP, Adams HP Jr, Tomsick T, Barsan WG, Biller J, Eberle R, Hertzberg V, Walker M. Measurements of acute cerebral infarction: lesion size by computed tomography. Stroke. 1989; 20: 871–875.
12. Johnston KC, Connors AF, Wagner DP, Knaus WA, Wang X, Haley EC Jr. A predictive risk model for outcomes of ischemic stroke. Stroke. 2000; 31: 448–455.
13. The National Institute of Neurological Disorders, and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995; 333: 1581–1587.
14. The NINDS t-PA Stroke Study Group. Generalized efficacy of t-PA for acute stroke. Subgroup analysis of the NINDS t-PA stroke trial. Stroke. 1997; 28: 2119–2125.
15. Harrell FE, Lee KL, Mark DB. Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996; 15: 367–387.
16. Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med. 1989; 8: 551–561.
17. Efron B, Gong G. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat. 1983; 37: 36–48.
18. Lefkovits J, Davis SM, Rossiter SC, Kilpatrick CJ, Hopper JL, Green R, Tress BM. Acute stroke outcome: effects of stroke type and risk factors. Aust N Z J Med. 1992; 22: 30–35.
19. Censori B, Camerlingo M, Casto L, Ferraro B, Gazzaniga GC, Cesana B, Mamoli A. Prognostic factors in first-ever stroke in the carotid artery territory seen within 6 hours after onset. Stroke. 1993; 24: 532–535.
20. Fiorelli M, Alpérovitch A, Argentino C, Satchetti ML, Toni D, Sette G, Cavalletti C, Gori MC, Fieschi C. Prediction of long-term outcome in the early hours following acute ischemic stroke. Arch Neurol. 1995; 52: 250–255.
21. Johnston KC, Jiang YL, Lyden PD, Hanson SK, Feasby TE, Adams RJ, Faught RE Jr, Haley EC Jr. Medical and neurological complications of ischemic stroke: experience from the RANTTAS trial. Stroke. 1998; 29: 447–453.