Predicting Outcome After Acute and Subacute Stroke
Development and Validation of New Prognostic Models
Background and Purpose— Statistical models to predict the outcome of patients with acute and subacute stroke could have several uses, but no adequate models exist. We therefore developed and validated new models.
Methods— Regression models to predict survival to 30 days after stroke and survival in a nondisabled state at 6 months were developed, following established guidelines, on 530 patients from a stroke incidence study. Three models were produced for each outcome with progressively more detailed sets of predictor variables collected within 30 days of stroke onset. The models were externally validated and compared on 2 independent cohorts of stroke patients (538 and 1330 patients) by calculating the area under receiver operating characteristic curves (AUC) and by plotting calibration graphs.
Results— Models that included only 6 simple variables (age, living alone, independence in activities of daily living before the stroke, the verbal component of the Glasgow Coma Scale, arm power, ability to walk) generally performed as well as more complex models in both validation cohorts (AUC 0.84 to 0.88). They had good calibration but were overoptimistic in patients with the highest predicted probabilities of being independent. There were no differences in AUCs between patients seen within 48 hours of stroke onset and those seen later; between ischemic and hemorrhagic strokes; and between those with and without a previous stroke.
Conclusions— The simple models performed well enough to be used for epidemiological purposes such as stratification in trials or correction for case mix. However, clinicians should be cautious about using these models, especially in hyperacute stroke, to influence individual patient management until they have been further evaluated. Further research is required to test whether additional information from brain imaging improves predictive accuracy.
Accurate prognostic models for patients with acute and subacute stroke would have several important uses. They could guide patient management1,2 (eg, patients with a good prognosis could be spared potentially risky treatments such as thrombolysis); allow more reliable information to be given to patients and their relatives2; improve planning of rehabilitation and discharge for patients3; correct for case mix when different cohorts of patients are compared (eg, in hospital “league” tables)2,4; and be used to analyze the results of randomized trials or meta-analyses according to baseline severity.5,6 Unfortunately, none of the existing models have been adequately developed, and very few have been properly validated.7 The accuracy of a model’s predictions must be externally validated in at least one (preferably several) independent cohort of patients (a test cohort) that was not used to generate the model.1,2,8 We developed new prognostic models according to established guidelines1–3,8,9 and then validated them in 2 independent cohorts of stroke patients.
Methods
Description of Training Data Set
The models were generated from patients in the Oxfordshire Community Stroke Project (OCSP), a community-based incidence study in the United Kingdom (1981–1986) of first-ever stroke of any pathological type or site.10 Patients were assessed as soon as possible after the stroke by a study neurologist (median delay, 4 days) and followed up at 1, 6, and 12 months and yearly thereafter to determine their level of disability. There were no losses to follow-up.11 For those who died, the date of death was obtained from general practitioner or hospital notes. Forty-five percent of patients were not admitted to the hospital, those admitted were not managed on a stroke unit, and few patients were treated with antithrombotic agents.
We wished to maximize the generalizability of the models and therefore included patients with any stroke in the OCSP who were first assessed within 30 days of stroke onset. Previous investigators have shown that a single model can be applicable to patients seen at any time up to 30 days from stroke onset.12 The models were developed on 530 of the original 675 patients: 54 patients were excluded because they were not assessed by a study neurologist (50 died before they could be assessed), 58 were seen after 30 days, and 33 had a subarachnoid hemorrhage (SAH).
Definition of Outcomes
We were interested in 2 important but simple outcomes: survival at 30 days and survival in an independent state (Oxford Handicap Scale score <3) at 6 months.3,13,14 Models for outcomes at 1 year are available from the authors on request. Outcomes were available for all 530 patients and were assessed blind to the predictor variables.
Data Reduction and Definitions of Predictor Variables
Predictor variables must be easy to collect (to minimize missing data), clinically relevant, and reliable.1,3,8,9,15 The number of variables in multiple regression analyses must also be carefully controlled.9,16 Entering too few variables means that important predictors may be omitted, while entering too many can result in overfitting (a type I error, in which false-positive predictors are erroneously included in the model); underfitting (a type II error, in which important variables are omitted from the final model); and paradoxical fitting (a type III error, in which a variable that, in truth, has a positive association with the outcome is found to have a negative association).16,17 The risk of these problems increases as the ratio of outcome events to the number of predictor variables becomes smaller (the events per variable [EPV] ratio, in which, for binary outcomes, the number of events is the count of the less frequent outcome). The risk of error is especially high with EPVs <10.17–19
In the OCSP, 131 baseline variables were initially collected, whereas the optimal number to ensure an EPV of ≥10 was approximately 25. We reduced the number of variables by excluding (1) those with too many missing data (ie, on >50 patients) because this suggested that they were not easy to collect; (2) those with too little data (predictive factor present in <10 patients); (3) those that we considered unlikely to be associated with the outcome (eg, whether the onset occurred during sleep); and (4) those with questionable reliability20 (eg, sensory assessments). Data from CT scans were excluded because 186 patients (35%) did not have a scan or were scanned >2 weeks after their stroke, and the early generation scans available provided limited data compared with modern CT. Several of the remaining variables were combined to make composite variables; this resulted in a final set of 39 variables (Appendix). This number was higher than the 25 required to maintain the EPV at ≥10, but we decided against further data reduction because all the remaining variables appeared potentially clinically relevant. Most variables were dichotomized (normal versus abnormal) to keep the numbers of variables small, to improve the reliability of collecting the variables, and for clinical simplicity. Age was retained as a continuous variable, but systolic blood pressure, hemoglobin concentration, platelet count, and glucose concentration were split into 3 groups (see Appendix) because their association with outcome was not linear. The definitions of all variables and the coding of any data that were missing or deemed not assessable (eg, because the patient was unconscious, confused, or dysphasic) are available from the authors on request. The final models were not sensitive to changing the coding of these missing data.
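The EPV arithmetic behind the data reduction can be sketched in a few lines. The event count below is an illustrative, rounded figure inferred from the paper's statement that an EPV of ≥10 allowed approximately 25 variables, not an exact OCSP number:

```python
def max_predictors(n_events, min_epv=10):
    """Largest candidate-variable count that keeps events per variable >= min_epv."""
    return n_events // min_epv

def epv(n_events, n_predictors):
    """Events-per-variable ratio; for a binary outcome, use the rarer outcome's count."""
    return n_events / n_predictors

# Roughly 250 outcome events support about 25 candidate variables at an EPV of 10,
# while entering all 39 retained variables would push the EPV well below 10.
print(max_predictors(250))      # 25
print(round(epv(250, 39), 1))   # 6.4
```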
Grouping of Variables
We classified the final variables into 3 groups (Appendix): set 1 variables were simple variables that could be collected easily on all patients; set 2 variables were variables that would only be available after a more detailed history and examination; and set 3 variables were the results of investigations. All variables were collected prospectively and without knowledge of the patient’s outcome.
Statistical Techniques for Generating the Models
The models to predict being alive and independent at 6 months were developed with the use of forward stepwise multiple logistic regression.21 Survival was predicted with Cox proportional hazards regression analysis21,22 so that a single model could be applied to different time points up to 1 year. For both the logistic regression and Cox models, the probability for entry of a variable was set at 0.05 and for removal at 0.10. For each outcome, 3 models were produced: the first model entered only the simple variables from set 1; the next entered all variables from both sets 1 and 2; and the final models entered all variables from sets 1, 2, and 3. We expected that the addition of more detailed information would produce more accurate predictions, although they might be less practical to use in some situations. For each model we checked the statistical assumptions of linearity (for age)2,23 and proportionality21 and looked for interactions between age, sex, and prior disability and the other variables in the model by using a pooled interaction test.2 Analyses were performed with the use of the SPSS package (version 6.1 for Windows).
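Forward stepwise selection of this kind can be sketched in pure Python. This is an illustrative reimplementation, not the SPSS routine used in the study: a candidate enters when its likelihood-ratio statistic against the current model exceeds 3.84 (approximately P=0.05 on 1 df), the backward-removal step at P=0.10 is omitted for brevity, and the toy data and variable names are invented:

```python
import math

def fit_logistic(rows, y, lr=0.05, iters=10000):
    """Fit a logistic regression by gradient ascent (beta[0] is the intercept).
    rows: list of feature lists, possibly empty lists for the intercept-only model."""
    n, k = len(y), len(rows[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))))
            grad[0] += yi - p
            for j, v in enumerate(x):
                grad[j + 1] += (yi - p) * v
        for j in range(k + 1):
            beta[j] += lr * grad[j] / n
    return beta

def log_lik(rows, y, beta):
    """Binomial log-likelihood of a fitted model."""
    ll = 0.0
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def forward_stepwise(variables, y, entry_chi2=3.84):
    """At each step add the candidate whose likelihood-ratio statistic versus the
    current model is largest, stopping when none exceeds entry_chi2 (~P=0.05, 1 df)."""
    selected, rows = [], [[] for _ in y]
    ll_cur = log_lik(rows, y, fit_logistic(rows, y))
    while True:
        best, best_lr = None, 0.0
        for name, col in variables.items():
            if name in selected:
                continue
            trial = [r + [col[i]] for i, r in enumerate(rows)]
            lr_stat = 2 * (log_lik(trial, y, fit_logistic(trial, y)) - ll_cur)
            if lr_stat > best_lr:
                best, best_lr = name, lr_stat
        if best is None or best_lr < entry_chi2:
            return selected
        selected.append(best)
        rows = [r + [variables[best][i]] for i, r in enumerate(rows)]
        ll_cur = log_lik(rows, y, fit_logistic(rows, y))

# Invented toy data: x1 is strongly predictive of the outcome, x2 is pure noise
y  = [1] * 16 + [0] * 4 + [1] * 4 + [0] * 16
x1 = [1] * 20 + [0] * 20
x2 = [i % 2 for i in range(40)]
print(forward_stepwise({"x1": x1, "x2": x2}, y))  # ['x1']
```

Only the informative variable survives the entry criterion; the noise variable's likelihood-ratio statistic stays near zero and it is never entered.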
Validation of Models
We had access to 3 independent prospective cohorts (test cohorts) of stroke patients that had been established to collect data similar to those of the OCSP. Two of these cohorts were community based. SEPIVAC (Italian initials for Epidemiological Study of Incidence of Acute Cerebrovascular Disease) collected data on first-ever strokes from a region in Italy (1986–1989),24 while the Perth Community Stroke Study collected data on all strokes in Perth, Australia (1989–1990).25 The median time from stroke onset to assessment in both cohorts was 4 to 5 days. These cohorts had relatively small numbers of eligible patients (310 in SEPIVAC, 228 in Perth), and therefore we combined them to make 1 community-based cohort that was restricted to patients with a first-ever stroke (not SAH) who were assessed by a study neurologist within 30 days of onset. The other test cohort, the Lothian Stroke Register (LSR), was hospital based and included stroke patients (both inpatients and outpatients, first and recurrent strokes excluding SAH) from 1 hospital in Edinburgh who were prospectively registered, examined, and followed up from 1990 onward.26 For validation, we again restricted this cohort to those seen within 30 days of onset.
We plotted calibration curves of the proportion of patients in each test cohort who actually had a good outcome against the proportion predicted by the model (in deciles of predicted probability). We assessed discrimination by calculating the area under a receiver operating characteristic (ROC) curve (AUC) of sensitivity versus 1 minus specificity.2,27–29 An area of 1 implies a test with perfect sensitivity and specificity, while an area of 0.5 implies that the model’s predictions are no better than chance. The AUC for each model was calculated with the trapezoidal rule.27,29
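The trapezoidal AUC computation can be sketched as follows; the predicted probabilities and outcomes are hypothetical, chosen so the answer can be checked by hand (8 of 9 positive-negative pairs are correctly ordered, so AUC = 8/9):

```python
def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) pairs, sweeping the decision threshold high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        pts.append((fp / neg, tp / pos))
    return pts

def auc_trapezoid(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(scores, labels)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical predicted probabilities and observed outcomes (1 = good outcome)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]
print(auc_trapezoid(scores, labels))  # 8/9 ≈ 0.889
```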
Because each outcome had 3 models (1 developed from set 1 variables, 1 from set 1 and set 2 variables, and 1 from all variables), the best model for each outcome in each test cohort was defined as the model with the largest AUC29,30 or, if there were no statistically significant differences in areas, the model with the simplest variables. Since we made multiple comparisons between different ROC curves, we chose a more rigorous definition of statistical significance (2P≤0.01). For the logistic regression model for alive and independent, we also produced variance/covariance matrices so that confidence intervals could be produced for each prediction.
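How a variance/covariance matrix yields a confidence interval for an individual prediction can be sketched as follows: the standard error of the linear predictor is computed from the matrix, the interval is formed on the linear-predictor scale, and both limits are transformed through the logistic function. The coefficients and covariance values here are entirely hypothetical:

```python
import math

def predict_with_ci(x, beta, cov, z=1.96):
    """Point prediction and ~95% CI for a logistic model. x starts with 1 for the
    intercept; cov is the variance/covariance matrix of the coefficients."""
    lp = sum(b * v for b, v in zip(beta, x))
    var = sum(x[i] * cov[i][j] * x[j]
              for i in range(len(x)) for j in range(len(x)))
    se = math.sqrt(var)
    logistic = lambda t: 1.0 / (1.0 + math.exp(-t))
    return logistic(lp), (logistic(lp - z * se), logistic(lp + z * se))

# Hypothetical coefficients and covariance matrix, for illustration only
beta = [-1.2, 0.8]                      # intercept, one binary predictor
cov = [[0.04, -0.02], [-0.02, 0.09]]
p, (low, high) = predict_with_ci([1, 1], beta, cov)
print(round(p, 3), round(low, 3), round(high, 3))
```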
Using the best model (as defined above) in the hospital test cohort (LSR), we determined whether the AUC (ie, accuracy of the predictions) for alive and independent at 6 months varied according to the following: when the clinical data were collected (≤48 or >48 hours since stroke onset); the pathological type of stroke (definitely hemorrhagic confirmed by CT scanning within 14 days of onset versus the rest); whether the patients were seen as inpatients or outpatients; and whether or not the patients had had a previous stroke. These subgroup analyses were restricted to the LSR test cohort because this cohort had the most data.
Results
The characteristics and outcomes of the patients in the OCSP are given in Table 1 (only variables that appeared in any of the models are shown), along with the percentage of patients with data that were not assessable in the training cohort and those variables that were independent predictors in each model. Analyses showed that the models did not violate any of the necessary statistical assumptions. The simplest model for alive and independent at 6 months only included 6 variables (age, living alone, prior disability, the verbal component of the Glasgow Coma Scale, arm power, and ability to walk), and these same variables also appeared in most other models. We therefore generated 1 additional survival model using only these 6 variables and compared this with the original set 1 survival model that included 11 variables.
The characteristics of patients in the 2 test cohorts are shown in Table 1. These were broadly similar to the OCSP cohort, although the OCSP included patients with slightly less severe strokes than the community test cohort, and patients in the hospital test cohort were 5 years younger on average. Fewer patients from the community test cohort were alive and independent at 6 months compared with the OCSP.
The AUCs for each model in each test cohort are shown in Table 2, and the ROC curves for the hospital test cohort are shown in Figure 1. In both test cohorts, the models using only set 1 variables generally did not have significantly smaller AUCs than the more complex models (indeed, they often gave slightly larger areas). The only exception was alive and independent in the hospital test cohort, where the set 1+2 model was marginally better (2P=0.01) than the set 1 model. Similarly, the survival model with only 6 simple variables was as good as the set 1 model with 11 variables. The exact models using only the 6 simple variables are given in Table 3.
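Because Table 3 is not reproduced here, the sketch below uses invented coefficients and variable names purely to show how a 6-variable logistic model of this kind converts a patient's characteristics into a predicted probability; it is not the published model:

```python
import math

# Hypothetical coefficients only -- the actual values are those in Table 3.
COEFS = {
    "intercept":          3.0,
    "age_years":         -0.04,  # continuous; all others are 0/1 indicators
    "lives_alone":       -0.5,
    "independent_before": 1.0,
    "gcs_verbal_normal":  1.2,
    "can_lift_arms":      0.9,
    "can_walk":           1.1,
}

def prob_alive_independent(patient):
    """Probability of being alive and independent at 6 months under the sketch
    model: the logistic transform of the linear predictor."""
    lp = COEFS["intercept"] + sum(COEFS[k] * v for k, v in patient.items())
    return 1.0 / (1.0 + math.exp(-lp))

mild = {"age_years": 70, "lives_alone": 0, "independent_before": 1,
        "gcs_verbal_normal": 1, "can_lift_arms": 1, "can_walk": 1}
severe = {"age_years": 80, "lives_alone": 1, "independent_before": 0,
          "gcs_verbal_normal": 0, "can_lift_arms": 0, "can_walk": 0}
print(round(prob_alive_independent(mild), 2),
      round(prob_alive_independent(severe), 2))  # 0.99 0.33
```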
The predictions of the simplest model for each outcome were reasonably well calibrated against the actual outcomes (Figure 2). The model for alive and independent overestimated the proportion who were independent in those with the highest predicted probabilities (>0.7) of good outcome. The reason for this was unclear.
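The decile-based calibration check can be sketched as follows, using hypothetical data in which the top bin is deliberately overoptimistic to mirror the pattern described above:

```python
def calibration_bins(pred, outcome, n_bins=10):
    """Mean predicted probability vs observed proportion of good outcomes within
    bins of ranked predicted probability (the paper used deciles)."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    size = len(pred) / n_bins
    bins = []
    for b in range(n_bins):
        idx = order[int(b * size):int((b + 1) * size)]
        if not idx:
            continue
        bins.append((sum(pred[i] for i in idx) / len(idx),
                     sum(outcome[i] for i in idx) / len(idx)))
    return bins

# Hypothetical data in 3 bins; the top bin predicts 0.9 but only 0.7 do well
pred = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
outcome = [1] + [0] * 9 + [1] * 5 + [0] * 5 + [1] * 7 + [0] * 3
for mean_pred, observed in calibration_bins(pred, outcome, n_bins=3):
    print(round(mean_pred, 2), round(observed, 2))
```

A well-calibrated model keeps each pair close to the diagonal; the gap in the highest bin is exactly the kind of overoptimism reported for the alive-and-independent model.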
The comparisons of the predictions of the best model (set 1+2) for alive and independent at 6 months in different subgroups of stroke patients in the LSR are shown in Table 4. There were no clear differences in the predictions between the various subgroups except that the discrimination was poorer for patients first seen as outpatients than for inpatients. The actual prognosis in outpatients was, as expected, much better than in the inpatients: >99% of outpatients were alive at 30 days, and 82% were alive and independent compared with 86% and 37%, respectively, in the inpatients. Outpatients were also seen much later after their stroke onset (median time, 11 days) than inpatients (median time, 2 days). Similar results for all these subgroup analyses were found with the simple set 1 models and if survival alone was analyzed.
Discussion
These new models have several strengths. They were developed from high-quality data following established guidelines and have good predictive accuracy in 2 independent cohorts from different settings, countries, and time periods. The accuracy in the LSR is particularly reassuring because this was a much more recent cohort and therefore patients in this cohort were often treated with early administration of aspirin in a stroke unit. Because the models were developed on a community-based sample of patients with any type of stroke, they are more likely to be generalizable to other cohorts. We have demonstrated validity in those with both ischemic and hemorrhagic strokes (although our definition of hemorrhagic may have missed some small hemorrhages in patients not scanned within the first few days), in those with first-ever and recurrent stroke, and in those seen within 48 hours of stroke onset and those seen later. However, although there was a nonsignificant trend for better predictions in those seen within 48 hours, very few patients were seen within 6 hours of onset, and therefore the accuracy of the models in those seen in the hyperacute phase of stroke is unknown. The models may also be less applicable to very young patients (only 5% to 10% of patients in the training and test cohorts were aged <50 years), and they have poor discrimination in outpatients. The LSR outpatients were highly selected patients with a very good outcome, and the models were not able to predict the small numbers of outpatients who did poorly.
There were several other limitations of this study. The numbers of patients and outcomes on which the models were generated, although much larger than most comparable studies,7 were still relatively small, and many different regression analyses were performed. This may have resulted in false-positive variables in some models and in important variables being excluded from others because of lack of power.31 One hundred twelve patients were excluded from the training data set, but these patients were equally divided between those with severe strokes (ie, who died early before they could be seen) and those with milder strokes (ie, survivors who were first seen >30 days after their stroke). Few of these patients were admitted to the hospital (they died too quickly or had very minor strokes), and therefore most would also have been excluded from hospital-based studies, except in countries where almost all subjects with suspected stroke or transient ischemic attack are admitted. Several potentially important variables were not included in the models. Urinary incontinence has been shown in several studies to be an important predictor of poor outcome,32,33 but this information was not collected for most patients in the OCSP because it predated these studies. Moreover, detailed brain imaging was not available for many patients in the OCSP, and therefore we were not able to analyze the predictive value of certain radiological variables on outcome (eg, size of infarct, presence of hemorrhage). However, this means that the models can be used in any healthcare system regardless of whether or not there is access to CT or MRI scanning.
We have demonstrated that the simplest models that included only 6 clinical variables usually gave predictions that were as good as (if not better than) models using more complicated, less easily collected variables. Although we had expected that including more detailed variables would improve the models’ performance, other researchers have also found that simple models predict as accurately as more complex ones.34–36 This may be because more complex variables are more difficult to measure reliably or because increasing the number of variables leads to overfitting. The 6 simple variables do have some face validity since they cover age, social circumstances, prior disability, and stroke severity in terms of both upper and lower limb function, reduced level of consciousness, cognition, and speech. An abnormal verbal component of the Glasgow Coma Scale may also indicate confusion and dysphasia as well as impaired consciousness, which may explain why this component was selected rather than the motor or eye components of the score. We have shown that the 6 variables can be collected reliably, with interobserver κ values all >0.65, indicating excellent agreement.37
The simplicity of these models will make them useful for epidemiological studies. We have already used them to correct for differences in case mix when outcomes of stroke patients from different hospitals were compared,38 to assess whether certain other variables were additional independent predictors of prognosis,39 and to stratify patients by baseline prognosis in trials.40,41
The use of models to influence the management of individual patients is more complex. The models themselves are simple to use, either programmed into hand-held computers or presented as nomograms. However, they do not predict specific outcomes that may be more relevant in clinical management (eg, whether the patient will regain useful speech or be able to walk). To justify clinical use, the models’ predictions would need to be better than physicians’ informal predictions and be shown to improve patient care and outcomes. False-positive and false-negative predictions could harm patients (eg, a patient falsely predicted to do poorly might be given a hazardous treatment or denied an effective one). Hence, we cannot recommend the models’ use in clinical practice until they have been evaluated in clinical trials.15
We believe that we have produced useful, straightforward, and valid models to predict outcome in most acute stroke patients in any inpatient clinical setting. Because the models depend on only 6 easily assessed and robust clinical variables, it takes just minutes to categorize individuals or groups of patients with differing prognosis. The models will be available interactively on www.dcn.ed.ac.uk/models. In the future, we plan to try to validate these models in other cohorts (eg, those seen within hours of stroke onset), to compare them with clinical predictions and other models, and to assess whether they can be improved by adding other variables (eg, urinary continence or imaging data).
Appendix
Variables Entered Into the Multiple Regression Models
Set 1: Simple Clinical Variables
Age, sex, living alone, employed, independent before stroke, history of hypertension, myocardial infarct, diabetes, malignancy, examined within 2 days, systolic blood pressure >160 mm Hg, systolic blood pressure <120 mm Hg, Glasgow Coma Scale eye/motor/verbal scores, able to lift arms, able to lift legs, able to walk.
Set 2: More Detailed Clinical Variables
Current smoker, previous transient ischemic attack, peripheral vascular disease, apoplectic onset, cervical bruit, cardiac disease, dysphasia, cognitive deficit (eg, neglect, dyspraxia, or visuospatial problems), visual field defect, gaze palsy, brain stem function, proprioception.
Set 3: Investigation Results (Initial Results After Stroke)
High hemoglobin level (men >17 g/dL, women >16 g/dL), anemia (men <13 g/dL, women <11 g/dL), platelet count >400×10⁹/L, platelet count <150×10⁹/L, urea <7 mmol/L, glucose >11 mmol/L, glucose <7 mmol/L, any atrial fibrillation, abnormal cardiac rhythm.
Examination findings were obtained at initial assessment.
Acknowledgments
This study was supported by a Wellcome Trust training fellowship in clinical epidemiology (Dr C. Counsell) and by a Wellcome Trust project grant (M. McDowall). We thank the patients included in the various cohorts and the staff who collected the initial data; the principal investigators of the SEPIVAC and Perth studies who allowed us to use their data to validate the models (Dr Stephano Ricci for SEPIVAC, Dr Edward Stewart-Wynne, Dr Konrad Jamrozik, and Dr Craig Anderson for the Perth study); Robyn Broadhurst, who extracted the data from the Perth study to send to us; and Jim Slattery and Dave Signorini for statistical help and guidance.
- Received August 21, 2001.
- Revision received December 5, 2001.
- Accepted December 21, 2001.
- Kwakkel G, Wagenaar RC, Kollen BJ, Lankhorst GJ. Predicting disability in stroke: a critical review of the literature. Age Ageing. 1996; 25: 479–489.
- Sharp SJ, Thompson SG, Altman DG. The relation between treatment benefit and underlying risk in meta-analysis. BMJ. 1996; 313: 735–738.
- Bamford J, Sandercock P, Dennis M, Warlow C, Jones L, McPherson K, Vessey M, Fowler G, Molyneux A, Hughes T, et al. A prospective study of acute cerebrovascular disease in the community: the Oxfordshire Community Stroke Project 1981–86, I: methodology, demography and incident cases of first-ever stroke. J Neurol Neurosurg Psychiatry. 1988; 51: 1373–1380.
- Dennis MS, Burn JPS, Sandercock PAG, Bamford JM, Wade DT, Warlow CP. Long-term survival after first-ever stroke: the Oxfordshire Community Stroke Project. Stroke. 1993; 24: 796–800.
- Prescott RJ, Garraway WM, Akhtar AJ. Predicting functional outcome following acute stroke using a standard clinical examination. Stroke. 1982; 13: 641–647.
- Dombovy ML, Basford JR, Whisnant JP, Bergstralh EJ. Disability and use of rehabilitation services following stroke in Rochester, Minnesota 1975–1979. Stroke. 1987; 18: 830–836.
- Wolfe CDA, Taub NA, Woodrow EJ, Burney PGJ. Assessment of scales of disability and handicap for stroke patients. Stroke. 1991; 22: 1242–1244.
- Wyatt JC, Altman DG. Prognostic models: clinically useful or quickly forgotten? BMJ. 1995; 311: 1539–1541.
- Lindley RI, Warlow CP, Wardlaw JM, Dennis MS, Slattery J, Sandercock PAG. Interobserver reliability of a clinical classification of acute cerebral infarction. Stroke. 1993; 24: 1801–1804.
- Altman DG. Analysis of survival times. In: Altman DG, ed. Practical Statistics for Medical Research. London, UK: Chapman & Hall; 1993: 365–395.
- Hosmer DW, Lemeshow S. Applied Logistic Regression. New York, NY: John Wiley & Sons; 1989: 82–134.
- Ricci S, Celani MG, La Rosa F, Vitali R, Duca E, Ferraguzzi R, Paolotti M, Seppoloni D, Caputo N, Chiurulla C, et al. SEPIVAC: a community-based study of stroke incidence in Umbria, Italy. J Neurol Neurosurg Psychiatry. 1991; 54: 695–698.
- Davenport RJ. The Evaluation of Novel Services: An Example From Stroke Medicine [dissertation]. Nottingham, UK: University of Nottingham; 1996.
- Volinsky CT, Madigan D, Raftery AE, Kronmal RA. Bayesian model averaging in proportional hazards models: assessing the risk of stroke. Appl Stat. 1997; 46: 433–448.
- Wade DT, Wood VA, Hewer RL. Recovery after stroke: the first 3 months. J Neurol Neurosurg Psychiatry. 1985; 48: 7–13.
- Jongbloed L. Prediction of function after stroke: a critical review. Stroke. 1986; 17: 765–776.
- Weingarten S, Bolus R, Riedinger MS, Maldonado L, Stein S, Ellrodt AG. The principle of parsimony: Glasgow Coma Scale score predicts mortality as well as the APACHE II score for stroke patients. Stroke. 1990; 21: 1280–1282.
- Counsell C. The Prediction of Outcome in Patients With Acute Stroke [dissertation]. Cambridge, UK: University of Cambridge; 1998.
- Weir N, Dennis M, on behalf of the Scottish Stroke Outcomes Study Group. Towards a national system for monitoring the quality of hospital-based stroke services. Stroke. 2001; 32: 1415–1421.
- Wardlaw JM, Lewis SC, Dennis MD, Counsell C, McDowall M. Is visible infarction on computed tomography associated with an adverse prognosis in acute ischemic stroke? Stroke. 1998; 29: 1315–1319.
- The International Stroke Trials Collaboration. The FOOD trial (Feeding or Ordinary Diet) [online protocol]. Available at: http://www.dcn.ed.ac.uk/food/. Accessed August 1, 2001.
- The International Stroke Trials Collaboration. The Third International Stroke Trial (IST 3) [online protocol]. Available at: http://www.dcn.ed.ac.uk/ist3/. Accessed August 1, 2001.