Comparison of Neurological Scales and Scoring Systems for Acute Stroke Prognosis
Background and Purpose: Clinical trials routinely use stroke scales to compare baseline characteristics of treatment groups. It is unclear which stroke scale provides the most prognostic information. This often leads to collection of multiple scales in clinical trials. We aimed to determine which of several commonly used scales best predicted outcome.
Methods: A single observer scored consecutive admissions to an acute stroke unit on the National Institutes of Health Stroke Scale (NIHSS), the Canadian Neurological Scale, and the Middle Cerebral Artery Neurological Score. Guy's prognostic score was determined from clinical data. Outcome at 2, 3, 6, and 12 months was categorized as good (alive at home) or poor (alive in care or dead). Predictive accuracy of the variables was compared by receiver operating characteristic curves and stepwise logistic regression.
Results: Of the 408 patients studied, 373 had confirmed acute stroke and completed follow-up. The three stroke rating scales each predicted 3-month outcome with an accuracy of .79 or greater. The NIHSS provided the most prognostic information: sensitivity to poor outcome, .71 (95% confidence interval [CI], .64 to .79); specificity, .90 (95% CI, .86 to .94); and overall accuracy, .83 (95% CI, .79 to .87). Logistic regression showed that the NIHSS added significantly to the predictive value of all other scores. No score added useful information to the NIHSS. A cut point of 13 on the NIHSS best predicted 3-month outcome.
Conclusions: Baseline NIHSS best predicts 3-month outcome. The Canadian Neurological Scale and Middle Cerebral Artery Neurological Score also perform well. Baseline assessments in clinical trials only need to include a single stroke rating scale.
Stroke scales exist principally as a result of clinical trials, and their existence reflects the heterogeneity of stroke patients and attendant difficulties in reliably assessing outcome with respect to disability or neurological deficit. Scales seek to quantify different aspects of function within the framework of the World Health Organization hierarchy of impairment, disability, and handicap.1 Since the introduction of the Mathew scale2 in 1972, there has been a steadily increasing number of scales that seek to quantify neurological impairment. These impairment scales involve scoring various modalities of neurological function for an individual and then summing the scores to provide an index of neurological status. These scales were developed for a variety of reasons, including monitoring neurological status for deterioration3 and adjusting final outcome for initial severity of stroke in clinical trials.4 Although the purpose of many of these scales has not been explicit, their primary uses have been (1) to compare the baseline stroke severity of patient groups and (2) to quantify neurological recovery over time. In effect, impairment scales have often been used to predict outcome despite not having been designed for this purpose. Baseline measurements on the Canadian Neurological Scale (CNS) predict functional outcome 6 months after stroke.5 Acute scores on the National Institutes of Health Stroke Scale (NIHSS) correlate with both CT infarct volume6 at 7 to 10 days after stroke and functional outcome4 at 3 months. Stroke assessment scales should not, however, be used as a measure of functional outcome itself, since impairment scales only partly explain functional health.7
Several multivariate scoring systems have been developed with the sole aim of predicting outcome. A recent evaluation8 compared five multivariate prognostic scoring systems with simple univariate predictors of outcome, such as level of consciousness, and concluded that multivariate scoring systems, when applied outside the context of their development, fare no better than simple predictors.
We sought to determine the best statistical model for predicting outcome at 3 months after stroke using baseline measures on three stroke scales (the Middle Cerebral Artery Neurological Score [MCANS],9,10 CNS,3 and NIHSS4) and a specifically designed prognostic score (Guy's prognostic score11).
Subjects and Methods
The admission criteria and protocol of the acute stroke unit (ASU) of the Western Infirmary, Glasgow, are described in detail elsewhere.12 Briefly, all patients within a well-defined geographic region suffering a new focal or global neurological deficit are admitted, regardless of age or severity of neurological deficit. CT or MRI is performed routinely within 72 hours of admission. Details of each patient's risk factors, presenting complaints, neurological examination, results of investigations, and final diagnosis are prospectively recorded and transferred to a computerized database. The patients included in this study represent a series of consecutive admissions to our unit. Patients whose symptoms were found to be caused by a condition other than stroke were excluded from the analysis.
A single experienced observer (K.W.M.) assessed each of the patients within 72 hours of admission according to the NIHSS, CNS, and MCANS. The originally described version of the NIHSS was used for all patients. Guy's prognostic score was derived for each patient with information from the ASU clinical database.
Outcome follow-up was by record linkage13 to death records from the Registrar General of Scotland and to hospital discharge records to obtain information on medical events after stroke. This technique has been validated previously in an epidemiological study of hypertension14 and has also been used for end point monitoring in a large clinical trial.15 The method of record linkage is a reliable one; however, admissions to private hospitals or institutions outside Scotland are not detected. Outcome was categorized as alive at home, alive in care, or dead at 2, 3, 6, and 12 months after stroke. Outcome at 3 months was chosen as the outcome measure for the subsequent multivariate analysis. This practical outcome measure is a marker for 3-month functional outcome, an end point commonly used in trials of therapeutic agents in acute stroke.16
Throughout the analysis, nonparametric methods were used, since stroke scales provide ordinal level data that are not normally distributed. Correlations between the stroke scales and Guy's score were expressed with the use of Spearman's rank correlation coefficient. Kruskal-Wallis tests were used to investigate differences in median scores between patients in the alive at home, alive in care, and dead outcome groups. Receiver operating characteristic (ROC) curves17 were used to assess the usefulness of the individual scores in predicting whether outcome was poor (alive in care or dead) or good (alive at home).
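As an illustration of the rank-based correlation described above, the following minimal Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which handles the ties that are common in stroke scale totals. The function names and the scores are hypothetical and are not taken from the study data. The example assumes only that the NIHSS rises with severity whereas the CNS falls, so a negative coefficient between the two is expected.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical baseline scores for six patients (illustration only).
nihss_scores = [3, 7, 7, 12, 18, 25]
cns_scores = [11.0, 9.5, 10.0, 7.0, 4.5, 1.5]
rho = spearman(nihss_scores, cns_scores)
print(f"Spearman rho = {rho:.2f}")
```

Ranking before correlating makes the coefficient insensitive to the arbitrary spacing of scale points, which is the reason for preferring it with ordinal data.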
Stepwise logistic regression18 was used to assess which subset of the variables best predicted good or poor outcome as defined above. This sequential procedure first includes the best predictor variable, then the next best predictor variable, and so on until no significant variables remain outside the model. Logistic regression estimates the probability of poor outcome for each patient, and by choosing a cutoff probability, we may then predict whether patients will have a good or poor outcome. These predictions may be compared with the true outcomes to obtain the sensitivity and specificity of the procedure for identifying patients who will have a poor outcome. A cutoff probability of .5 was chosen. We performed the statistical analysis using MINITAB and BMDP on a PC and Splus on a UNIX workstation. The logistic regression analysis was repeated after the exclusion of patients with clinical signs of posterior circulation stroke, since the CNS was designed for use in carotid territory stroke with motor signs and the MCANS for use in middle cerebral artery strokes. This avoids a biased comparison of the stroke assessment scales since the NIHSS is the only one to include signs indicative of vertebrobasilar stroke, such as ataxia.
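The step from a fitted logistic model to a good or poor prediction can be sketched as follows. This is a minimal illustration in Python, not the BMDP procedure used in the study: the intercept and slope are invented values chosen for the example, and the function names are hypothetical. The sketch shows that, for a model with a single score, a cutoff probability of .5 corresponds to a single cut point on the score itself.

```python
import math

def poor_outcome_probability(score, intercept, slope):
    """Logistic model: P(poor) = 1 / (1 + exp(-(intercept + slope * score)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

def equivalent_score_cutoff(intercept, slope, cutoff_probability=0.5):
    """Score at which the modelled probability equals the cutoff probability."""
    logit = math.log(cutoff_probability / (1.0 - cutoff_probability))
    return (logit - intercept) / slope

def sensitivity_specificity(predicted_poor, actual_poor):
    """Sensitivity and specificity for identifying patients with poor outcome."""
    pairs = list(zip(predicted_poor, actual_poor))
    tp = sum(1 for pred, act in pairs if pred and act)
    fn = sum(1 for pred, act in pairs if not pred and act)
    tn = sum(1 for pred, act in pairs if not pred and not act)
    fp = sum(1 for pred, act in pairs if pred and not act)
    return tp / (tp + fn), tn / (tn + fp)

# Invented coefficients and data, for illustration only (not the study's fit).
INTERCEPT, SLOPE = -3.25, 0.26
scores = [3, 8, 12, 13, 15, 20]
actual_poor = [False, False, True, False, True, True]

predicted_poor = [
    poor_outcome_probability(s, INTERCEPT, SLOPE) >= 0.5 for s in scores
]
cut = equivalent_score_cutoff(INTERCEPT, SLOPE)
sens, spec = sensitivity_specificity(predicted_poor, actual_poor)
print(f"score cut point = {cut:.1f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these invented coefficients the cut point falls between 12 and 13, so every integer score of 13 or more is classified as poor outcome. A different cutoff probability would shift the cut point and trade sensitivity against specificity.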
Results
Four hundred eight patients were included in this study. A nonstroke diagnosis was reached in 29 patients (6 old stroke, 6 seizure activity, 5 tumor, 4 nonorganic, 8 others), who were therefore excluded from the analysis. Primary intracerebral hemorrhage was diagnosed in 43 patients (11% of strokes) and hemorrhagic infarct in 2 patients (1%). The remaining 334 patients (88%) were diagnosed as having ischemic stroke. The low proportion of hemorrhagic infarction may be due to early CT scanning of patients in our ASU, often within 24 hours of onset. At this stage hemorrhagic transformation is less likely to be observed. Outcome data were unavailable in 6 patients because no matching records were identified by record linkage. Data on 373 patients were therefore available for analysis. The median age was 69 years (range, 22 to 96 years; interquartile range, 59 to 77 years). There were 191 male patients (51%).
Table 1 shows the placement of patients at 2, 3, 6, and 12 months after admission to the ASU. Table 2 shows pairwise Spearman rank correlation coefficients among the four scores being tested. Table 3 gives the median of each score according to the 3-month outcome grouping. For each score, a Kruskal-Wallis ANOVA showed highly significant differences in median baseline score between outcome groups (P<.001 in each case). The Figure presents ROC curves for prediction of 3-month poor outcome for each of the numerical scores. A comparison of the predictive power of variables can be made by assessing which curve approaches the top left corner of the plot most closely. Guy's prognostic score appears to be a weaker predictor of outcome than the stroke scales, of which the NIHSS seems narrowly to be the best.
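The ROC comparison can be made concrete with a short Python sketch. It builds the ROC points for one score, computes the area under the curve by the trapezoidal rule, and reports the cut point whose ROC point lies nearest the top left corner. The function names and the ten scores are hypothetical and serve only as an illustration. The sketch assumes that higher scores indicate greater severity, as on the NIHSS; scales that run the other way, such as the CNS, would need to be negated first.

```python
import math

def roc_points(scores, poor):
    """One (cut, false positive rate, sensitivity) triple per distinct score.

    A patient is predicted to have a poor outcome when score >= cut.
    """
    n_poor = sum(poor)
    n_good = len(poor) - n_poor
    points = []
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, p in zip(scores, poor) if p and s >= cut)
        fp = sum(1 for s, p in zip(scores, poor) if not p and s >= cut)
        points.append((cut, fp / n_good, tp / n_poor))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    xy = [(0.0, 0.0)] + [(fpr, tpr) for _, fpr, tpr in points]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))

def closest_to_top_left(points):
    """Cut point whose ROC point is nearest to (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))[0]

# Hypothetical baseline scores and 3-month outcomes (illustration only).
scores = [2, 4, 5, 7, 9, 12, 14, 16, 18, 22]
poor = [False, False, False, False, False, True, False, True, True, True]

points = roc_points(scores, poor)
print(f"AUC = {auc(points):.3f}, nearest-corner cut point = {closest_to_top_left(points)}")
```

The nearest-corner rule is only one way to read a cut point from a ROC curve. The cut point of 13 reported in this study was derived from the logistic model with a cutoff probability of .5, not from this rule.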
In the stepwise logistic regression model, the NIHSS (P<.0001) was the first variable to be included. This model, in which only the NIHSS was used, gave a sensitivity to poor outcome of .71 (95% CI, .64 to .79), a specificity of .90 (95% CI, .86 to .94), a positive predictive value of .82 (95% CI, .75 to .89), and an overall predictive accuracy of .83 (95% CI, .79 to .87). Guy's prognostic score was the next variable to be added to the model. However, although this variable was statistically significant (P=.0016), the number of correct predictions decreased slightly (sensitivity, .70 [95% CI, .62 to .77]; specificity, .89 [95% CI, .85 to .93]; positive predictive value, .80 [95% CI, .73 to .87]; and overall predictive accuracy, .82 [95% CI, .78 to .86]). None of the other variables (CNS, MCANS) was significant. Prediction according to the model based on NIHSS score alone is equivalent to choosing a cutoff of 13 on the baseline NIHSS and predicting all patients scoring 13 or more as having a poor outcome. Despite its statistical significance, the model in which the NIHSS alone was used did not give substantially greater accuracy than those in which the CNS or MCANS alone was used.
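For readers who wish to reproduce summary measures of this kind from a 2x2 table of predicted against observed outcome, a minimal Python sketch follows. The counts are invented for illustration and are not the study's data. The interval uses the normal approximation to the binomial distribution, which is an assumption, since the method used for the confidence intervals reported here is not stated. The function names are hypothetical.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def diagnostic_summary(tp, fn, fp, tn):
    """Summary measures for prediction of poor outcome from 2x2 counts.

    tp: poor outcome, predicted poor    fn: poor outcome, predicted good
    fp: good outcome, predicted poor    tn: good outcome, predicted good
    """
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
        "positive predictive value": proportion_ci(tp, tp + fp),
        "overall accuracy": proportion_ci(tp + tn, tp + fn + fp + tn),
    }

# Invented counts, for illustration only.
summary = diagnostic_summary(tp=70, fn=30, fp=20, tn=180)
for name, (estimate, lower, upper) in summary.items():
    print(f"{name}: {estimate:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

The normal approximation becomes unreliable for proportions near 0 or 1 or for small groups, where an exact or score-based interval would be preferable.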
To further investigate the apparent superiority of the NIHSS over the CNS, MCANS, and Guy's prognostic score, each variable in turn was forced into the model. Stepwise logistic regression was then used to test whether any of the other variables significantly improved the fit of the model. Table 4 shows the results of this additional modeling. In each case the NIHSS was found to add extra predictive information to the variable that was initially forced into the model. However, these more complex models did not result in better predictive accuracy than the model in which only the NIHSS was used.
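The idea of forcing one variable into the model and then asking whether a second adds information can be sketched as a comparison of nested logistic models. The Python code below is a minimal illustration under stated assumptions: it uses a likelihood ratio test on 1 degree of freedom, which may differ from the entry criterion applied by the BMDP stepwise procedure, and the data, the second predictor, and all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def log_likelihood(beta, rows, y):
    total = 0.0
    for row, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, row))
        # yi * eta - log(1 + exp(eta)), written to avoid overflow
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def fit_logistic(rows, y, max_iter=50):
    """Maximum-likelihood fit by Newton-Raphson with step halving."""
    k = len(rows[0])
    beta = [0.0] * k
    ll = log_likelihood(beta, rows, y)
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # information matrix
        for row, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, row))
            p = 0.5 * (1.0 + math.tanh(eta / 2.0))  # stable logistic function
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = solve(info, grad)
        scale = 1.0
        while True:  # halve the step until the log-likelihood does not fall
            candidate = [b + scale * s for b, s in zip(beta, step)]
            candidate_ll = log_likelihood(candidate, rows, y)
            if candidate_ll >= ll or scale < 1e-8:
                break
            scale /= 2.0
        gain = candidate_ll - ll
        beta, ll = candidate, candidate_ll
        if gain < 1e-10:
            break
    return beta, ll

def likelihood_ratio_test(ll_reduced, ll_full):
    """Statistic and P value for one added variable (chi-square, 1 df)."""
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical data: NIHSS, a second (dichotomized) predictor, poor outcome.
nihss = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 22, 24]
other = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
poor = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]

forced = [[1.0, o] for o in other]                    # intercept + forced variable
both = [[1.0, o, n] for o, n in zip(other, nihss)]    # ... plus the NIHSS
_, ll_forced = fit_logistic(forced, poor)
beta_both, ll_both = fit_logistic(both, poor)
stat, p_value = likelihood_ratio_test(ll_forced, ll_both)
print(f"LR statistic = {stat:.2f}, P = {p_value:.3f}")
```

A statistically significant improvement in fit does not guarantee more correct classifications at a fixed cutoff probability, which is the distinction the results above draw between added predictive information and predictive accuracy.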
After the exclusion of patients with posterior circulation events, the results of the stepwise logistic regression modeling were identical to those in which all patients were considered. The NIHSS was the best predictor of outcome, providing extra predictive information over the other scores. Overall accuracy of the individual scales, except for Guy's score, was slightly lower than in the entire patient group (Table 5).
Discussion
Outcome at 3 months as assessed by simple and clinically relevant criteria is best predicted by the baseline NIHSS score. A score of 13 discriminates patients likely to be independent from those likely to be dependent with good predictive value. This score represents approximately 50% of the practical maximum score on this version of the NIHSS. Little additional information is added by Guy's prognostic score or the other acute impairment scales, although all baseline measures correlate well with outcome.
None of the acute impairment scales has sought to assess disability, and none was designed to provide any indication of prognosis. The use of stroke scales in clinical trials to measure either baseline severity or progress has therefore hitherto been based on the assumption that the features assessed by a scale are of relevance to disability. In contrast, scales designed to predict outcome have not been used widely in clinical trials.
Most impairment scales, including the MCANS, CNS, the Scandinavian Stroke Scale,19 and the European Stroke Scale,20 are weighted very heavily toward motor function in the hemiparetic limbs, with minor additional scores for language function, level of consciousness, or hemianopia. If anything, there has been a tendency to increase the relative importance of the motor score and to specify more complex assessments with each new scale. The NIHSS is constructed differently, in that each test item is graded, but no significant weighting is given to limb function, and many additional items, such as ataxia, sensory loss, or visuospatial perception, are also included. However, scoring these additional aspects of neurological function may be a mixed blessing since they are often untestable in aphasic or comatose patients. Total scores cannot be reliably compared between the NIHSS and other scales,21 and in clinical use the NIHSS total score has a practical ceiling well below its theoretical upper limit because of nonscoring of untestable items. Patients with language disorders are particularly likely to produce scores of poor comparability between the NIHSS and MCANS or CNS.
Given these differences between stroke scales, why should the NIHSS provide a better prediction of 3-month outcome than either the MCANS or CNS? Unlike the MCANS or CNS, the NIHSS is not weighted in an arbitrary fashion toward motor function of limbs, but since all tested items are graded more or less equally, it rather reflects the overall degree of neurological deficit. While limb strength is certainly an important determinant of functional recovery from stroke, our results suggest that the MCANS and CNS perhaps place unnecessary emphasis on assessment of the degree of weakness over other neurological features. An even more heavily motor-weighted scale such as the European Stroke Scale is clearly open to similar criticism. These differences will also be determined by the chosen outcome measure since many of the disability scales in current use are similarly heavily weighted toward motor function (notably the Barthel Index22). Simple and robust outcome criteria were chosen for this study since these are of greater importance to patients and caregivers than minor differences in the abstract numbers of a rating scale. Our inclusion of patients “alive in care” in the poor outcome group reflects the situation in Scotland, where more severely disabled patients are cared for in the hospital rather than at home.
If several rating scales for prediction of outcome are available, sensitivity and specificity to poor outcome are only two of the criteria by which the scales may be compared. If there is no substantial difference in the predictive accuracy of a number of rating scales, then simplicity of use becomes important. The NIHSS requires scoring of a greater number of aspects of neurological function than the CNS or MCANS. However, video training in the use of the NIHSS is available, which provides a standard for the use of the scale and improves interobserver reliability.23
Our results suggest that when baseline comparison of treatment groups in a clinical trial is required, the NIHSS is sufficiently accurate and for many trials should be routine. The CNS, MCANS, and Guy's prognostic score offer no useful additional information. The NIHSS remains accurate even when patients with vertebrobasilar stroke are not considered, a fact relevant to the many clinical trials that limit inclusions to patients with hemispheric strokes only.
Selected Abbreviations and Acronyms
ASU = acute stroke unit
CNS = Canadian Neurological Scale
MCANS = Middle Cerebral Artery Neurological Score
NIHSS = National Institutes of Health Stroke Scale
ROC = receiver operating characteristic
Acknowledgments
This study was supported by the Wellcome Trust Prize Studentship Scheme (C.J.W.). Thanks are due to Pauline McBride for assistance in collating patient records.
- Received March 28, 1996.
- Revision received July 4, 1996.
- Accepted July 5, 1996.
- Copyright © 1996 by American Heart Association
References
World Health Organization. International Classification of Impairments, Disabilities and Handicaps. Geneva, Switzerland: World Health Organization; 1980.
Cote R, Hachinski VC, Shurvell BL, Norris JW, Wolfson C. The Canadian Neurological Scale: a preliminary study in acute stroke. Stroke. 1986;17:731-737.
Brott TG, Adams HP, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg V, Rorick M, Moomaw CJ, Walker M. Measurements of acute cerebral infarction: a clinical examination scale. Stroke. 1989;20:864-870.
Cote R, Battista RN, Wolfson C, Boucher J, Adam J, Hachinski V. The Canadian Neurological Scale: validation and reliability assessment. Neurology. 1989;39:638-643.
Brott TG, Marler JR, Olinger CP, Adams HP, Tomsick TA, Barsan WG, Biller J, Eberle R, Hertzberg V, Walker M. Measurements of acute cerebral infarction: lesion size by computed tomography. Stroke. 1989;20:871-875.
De Haan R, Horn J, Limburg M, Vandermeulen J, Bossuyt P. A comparison of five stroke scales with measures of disability, handicap, and quality of life. Stroke. 1993;24:1178-1181.
Gladman JRF, Harwood DMJ, Barer DH. Predicting the outcome of acute stroke: prospective evaluation of 5 multivariate models and comparison with simple methods. J Neurol Neurosurg Psychiatry. 1992;55:347-351.
Orgogozo JM, Capildeo R, Anagnostou CN, Juge O, Pere JJ, Dartigues JF, Steiner TJ, Yotis A, Rose FC. Development of a neurological score for the clinical evaluation of sylvian infarctions. Presse Med. 1983;12:3039-3044.
Orgogozo JM, Dartigues JF. Clinical trials in acute brain infarction: the question of assessment criteria. In: Battistini N, Fiorani P, Courbier R, Plum F, Fieschi C, eds. Acute Brain Ischaemia: Medical and Surgical Therapy. New York, NY: Raven Press; 1986:282-289.
Allen CMC. Predicting the outcome of acute stroke: a prognostic score. J Neurol Neurosurg Psychiatry. 1984;47:475-480.
Morris AD, Grosset DG, Squire IB, Lees KR, Bone I, Reid JL. The experiences of an acute stroke unit - implications for multicentre acute stroke trials. J Neurol Neurosurg Psychiatry. 1993;56:352-355.
Kendrick S, Clarke J. The Scottish record linkage system. Health Bull (Edinb). 1993;51:1-15.
Altman DG. Diagnostic tests. In: Altman DG, ed. Practical Statistics for Medical Research. London, UK: Chapman & Hall; 1991:415-418.
Engelman L. Stepwise logistic regression. In: Dixon WJ, Brown MB, Engelman L, Jennrich RI, eds. BMDP Statistical Software Manual. Berkeley, Calif: University of California Press; 1990:1013-1046.
Scandinavian Stroke Study Group. Multicenter trial of hemodilution in ischemic stroke: background and study protocol. Stroke. 1985;16:885-890.
Hantson L, De Weerdt W, De Keyser J, Diener HC, Franke C, Palm R, Van Orshoven M, Schoonderwalt H, De Klippel N, Herroelen L, Feys H. The European Stroke Scale. Stroke. 1994;25:2215-2219.
Muir KW, Grosset DG, Lees KR. Interconversion of stroke scales: implications for therapeutic trials. Stroke. 1994;25:1366-1370.
Lyden PD, Brott TG, Tilley B, Welch KMA, Mascha EJ, Levine S, Haley EC, Grotta JC, Marler JR. Improved reliability of the NIH Stroke Scale using video training. Stroke. 1994;25:2220-2226.