Unified Neurological Stroke Scale Is Valid in Ischemic and Hemorrhagic Stroke
Background and Purpose The growing interest in testing new therapeutic agents for acute brain injury has led to increased use of stroke scales. The reliability and validity of these measures need to be examined more completely. We used structural equation modeling, a technique that merges the analytic procedures of factor analysis and multiple regression, to examine the reliability and construct validity of the Middle Cerebral Artery Neurological Scale and the Scandinavian Neurological Stroke Scale used together as the Unified Neurological Stroke Scale. We also analyzed the predictive validity, sensitivity, and specificity of the scales for mortality and functional outcome.
Methods We prospectively studied 84 consecutive patients admitted to a neurology/neurosurgery intensive care unit with intracerebral hemorrhage (n=30), subarachnoid hemorrhage (n=15), ischemic stroke (n=15), and traumatic brain injury (n=24). Patients were evaluated within 24 hours of admission and at 48-hour intervals until intensive care unit discharge. A total of 386 assessments were obtained. The Functional Independence Measure was administered by telephone 3 months after hospital discharge.
Results High levels of reliability and construct validity were observed for the majority of the Unified Stroke Scale items. Facial palsy and eye movement items had the lowest reliability and validity. Both the Middle Cerebral Artery and Scandinavian Scales were significant predictors of outcome. Sensitivity and specificity varied by diagnosis. Predictive validity for functional outcome was better in the ischemic and hemorrhagic stroke groups than in the traumatic brain injury and subarachnoid hemorrhage groups.
Conclusions The Unified Stroke Scale demonstrates reliability and construct and predictive validity, and its use is supported in ischemic and hemorrhagic stroke. Structural equation modeling is an appropriate technique for use with scales of this type.
Clinical assessment of neurological injury or illness uses a qualitative description of a patient’s neurological deficit. Although this approach is useful in guiding clinical decision making for individual patients, it lacks the precision needed for clinical studies. In response to the need for more precise measurement, a number of quantitative stroke scales have been developed.1 2 3 These scales are used in clinical studies for stratification of patient groups, quantification of changes in neurological status, and prediction of outcome.4
Stroke scales are now routinely used in the testing of new agents for treatment of acute brain injury. Despite increased reliance on such scales, few demonstrate the reliability and validity necessary for use in clinical trials.5 For these scales to be useful in clinical investigation, their reliability and validity must be assessed.6
The Unified Neurological Stroke Scale7 is composed of the Neurological Scale for Middle Cerebral Artery Infarction (MCANS)8 9 and the Scandinavian Neurological Stroke Scale (SNSS).10 The Unified Neurological Stroke Scale uses a scoring format that permits generation of a score for one or both of the stroke scales. Orgogozo et al7 suggest that the use of the Unified Neurological Stroke Scale may provide answers to questions of sensitivity and specificity and the ability of the items individually or in combination to predict mortality, neurological outcome, and future function.
To answer these questions, an assessment of the validity of each scale is necessary. Although there are many different statistical methods for evaluating the validity of a scale, structural equation modeling (SEM) is a method that merges the analytic procedures of multiple regression and factor analysis for the determination of construct validity.11 SEM estimates both the extent to which the scale items “fit” the theoretical constructs the scale was designed to evaluate and the consistency of each item, that is, the degree of random error associated with it. This approach is particularly appropriate when the scale items are ordinal, as is the case with many clinical measures.
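For readers unfamiliar with SEM, the measurement part of such a model is usually written in LISREL-style notation as x = Λξ + δ. The single-item version below is a generic sketch, not the authors' specified model. It assumes that each item j loads on one construct ξ, that constructs are scaled to unit variance, and that errors are uncorrelated with the constructs. The loading λ_j expresses how well the item “fits” its construct, and θ_j is the random error variance; ρ_j is the item's reliability:

```latex
\begin{aligned}
x_j &= \lambda_j\,\xi_{c(j)} + \delta_j,
\qquad \operatorname{Var}(\xi_{c(j)}) = 1,
\qquad \theta_j = \operatorname{Var}(\delta_j),\\
\rho_j &= \frac{\lambda_j^{2}}{\lambda_j^{2} + \theta_j}.
\end{aligned}
```

When the items are standardized, Var(x_j) = 1, so ρ_j = λ_j² and θ_j = 1 − λ_j².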
In addition, although stroke scales were originally developed to quantify neurological deficits after ischemic stroke, a broader application may be possible. Because quantification of neurological status is an important aspect of investigations of the clinical efficacy of therapeutic agents in all populations, it is important to evaluate the utility of these scales in a wider population. If the scales are robust, they will demonstrate internal consistency over a range of diagnostic groups, and they could then be useful in the study of hemorrhagic stroke, subarachnoid hemorrhage, or cranial trauma resulting in acute focal neurological deficits.
Our prospective study focused on patients admitted to a combined neurology/neurosurgery intensive care unit. Our goals were (1) to determine the reliability, construct validity, and predictive validity of the MCANS and SNSS using SEM and logistic regression and (2) to assess the applicability of the MCANS and SNSS to ischemic stroke, intracerebral hemorrhage (ICH), subarachnoid hemorrhage (SAH), and traumatic brain injury (TBI).
Subjects and Methods
All patients admitted to a neurology/neurosurgery ICU from November 1, 1993, to February 1, 1994, were screened for possible enrollment in the study. The 16-bed unit admits approximately 1000 patients per year. Four diagnostic groups of patients were studied. The ICH group (n=30) consisted of all patients with CT evidence of acute supratentorial parenchymal hemorrhage and a diminished level of consciousness. The SAH group (n=15) consisted of patients presenting with acute SAH demonstrated on CT. All patients presenting to our institution with ICH or SAH are routinely admitted to the neurology/neurosurgery ICU and thus were screened for enrollment in the study. The ischemic stroke (CVA) group (n=15) consisted of patients admitted to the ICU with acute onset of a neurological deficit referable to the supratentorial compartment and without CT evidence of hemorrhage. The indications for ICU admission were a diminished level of consciousness or, in the majority of cases (73%), potential cardiac instability. Patients later determined to have had an infratentorial event on the basis of clinical or radiographic findings were not included in the study. The TBI group (n=24) consisted of all patients with head injury presenting to our institution with a diminished level of consciousness. Many of these patients were admitted after surgical intervention.
The items for the MCANS and SNSS scales are presented in Table 1. The 10 MCANS items reflect 21 ordinal degrees of severity. The MCANS items are summed to create a total score ranging from 0 to 100. This score was used to determine the predictive validity of the scale. The SNSS yields three separate scores: prognostic (SNSS-PRG), long-term (SNSS-LT), and a total score. The prognostic score includes consciousness, gaze palsy, and limb strength. The long-term score items evaluate limb strength, dysphasia, facial palsy, orientation, and gait. Each of these scores was included in the analyses.
The scales were administered within 24 hours of admission to the ICU and then at 48-hour intervals until ICU discharge. A total of 386 assessments were obtained; the mean number of assessments per patient was about five. The assessments were performed by three occupational therapy graduate students working under the supervision of the medical director of the ICU (M.N.D.). Before the study began, the raters achieved interrater reliability of .92 or greater on all scales. Acute Physiology and Chronic Health Evaluation (APACHE II)12 scores at admission were obtained retrospectively from patient charts as an index of severity of illness. A single nurse conducted a follow-up telephone interview 3 months after hospital discharge. Information obtained during the interview included outcome rated as (1) death, (2) nursing home or custodial care, (3) home-dependent, or (4) independent. The telephone version of the Functional Independence Measure (FIM)13 was used to evaluate functional performance. This ordinal scale rates 18 activities of daily living on the basis of the amount of assistance required. Responses to the 18 items are used to create a motor score (13 items) and a cognitive score (5 items).14 Both the categorical outcome rating and the two FIM scores were used to determine the predictive validity of the MCANS and SNSS.
Preliminary estimates of reliability for each scale were computed by diagnostic group using Cronbach’s α.15 Reliability and construct validity for the total sample were evaluated by SEM using LISREL 8.16 In this method, a model consisting of one or more theoretical constructs is specified before data analysis. The constructs define the properties being measured by the scale (eg, motor function, consciousness). Before the initial analyses, three latent constructs were hypothesized for each stroke scale: consciousness, upper body function, and lower body motor function. Each scale item was then assigned to a particular construct, and all items were loaded into the model simultaneously to determine their individual contributions to the constructs. If a model did not fit the data adequately, alternative models generated with the LISREL 8 program were tested. The final models, showing the associations between the scale items and the constructs, are presented in the Figure.
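Cronbach’s α is computed from the item variances and the variance of the total score: α = k/(k − 1) × (1 − Σσ²ᵢ/σ²ₜₒₜₐₗ) for a scale with k items. The following is a minimal Python sketch of that textbook formula. The function name `cronbach_alpha`, the row-per-assessment data layout, and the sample scores are hypothetical illustrations, not study data or the authors' software. The sketch treats the ordinal item codes as interval numbers, which is the limitation that makes these coefficients preliminary estimates.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per assessment,
    one column per scale item). Item codes are treated as interval
    numbers, an assumption that ordinal stroke-scale items violate."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*scores))  # transpose rows into item columns
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical item scores for three assessments on a two-item scale.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Replacing `variance` with `pvariance` throughout gives the same α, because the n − 1 factors cancel in the ratio.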
Because all of the Unified Neurological Stroke Scale items were ordinal, polychoric correlations and asymptotic covariance matrices were computed. LISREL 8 estimated the model and tested the significance of its fit by the weighted least-squares method.17 Five goodness-of-fit statistics are generated as part of the overall model estimation process: χ², root-mean-square error of approximation, adjusted goodness-of-fit index, comparative fit index, and incremental fit index. Each of these indices emphasizes different aspects of model fit.18 19 20 Significance on all five indices was required for the acceptance of a model.
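Of the five fit statistics, the root-mean-square error of approximation is the easiest to reproduce by hand from the model χ², its degrees of freedom, and the sample size. The sketch below uses one common point-estimate form, √(max(χ² − df, 0) / (df(N − 1))). The function name and input values are hypothetical. The text does not report the χ² values, and it does not say whether N counts patients or the 386 assessments. Browne and Cudeck suggested that values of about .05 or less indicate close fit and values up to about .08 indicate reasonable fit.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of the root-mean-square error of approximation
    from a model chi-square, its degrees of freedom, and sample size n.
    Uses the common (n - 1) form; a negative excess chi-square is
    truncated to zero, so any model with chi-square <= df gets 0."""
    if df <= 0 or n <= 1:
        raise ValueError("need df > 0 and n > 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical values: chi-square of 30 on 10 df with n = 101.
print(round(rmsea(30.0, 10, 101), 3))  # 0.141
```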
Logistic regression equations were computed for each Unified Neurological Stroke Scale score to evaluate how well each score obtained at ICU admission predicted survival (alive versus dead). These analyses also provided sensitivity and specificity estimates for each scale by diagnostic group and for the total sample.
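The survival analysis combines a logistic model with a classification cutoff. The following minimal sketch rests on three assumptions that the text does not specify: a single admission score is the only predictor, the cutoff is 0.5, and survival is the positive class. The functions `predicted_survival` and `sensitivity_specificity`, the coefficients, and the five patients are all hypothetical.

```python
import math


def predicted_survival(score, b0, b1):
    """Logistic-model probability of survival for an admission score.
    b0 and b1 are placeholder coefficients, not the study's estimates."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))


def sensitivity_specificity(probabilities, survived, cutoff=0.5):
    """ASSUMPTION: survival is the positive class and 0.5 is the cutoff.
    A patient is 'predicted alive' when the probability reaches the
    cutoff; predictions are then compared with observed survival."""
    predicted = [p >= cutoff for p in probabilities]
    pairs = list(zip(predicted, survived))
    tp = sum(1 for p, s in pairs if p and s)          # predicted alive, alive
    fn = sum(1 for p, s in pairs if not p and s)      # predicted dead, alive
    tn = sum(1 for p, s in pairs if not p and not s)  # predicted dead, died
    fp = sum(1 for p, s in pairs if p and not s)      # predicted alive, died
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and observed outcomes for five patients.
sens, spec = sensitivity_specificity([0.9, 0.8, 0.3, 0.6, 0.2],
                                     [True, True, True, False, False])
print(round(sens, 3), round(spec, 3))  # 0.667 0.5
```

Under this coding, specificity is the share of deaths that the model correctly predicts.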
Stepwise multiple regression analyses were used to determine the ability of each scale to predict postdischarge FIM motor and cognitive scores. A value of P<.05 was required for statistical significance.
Demographic characteristics, disease severity at the time of ICU admission, and outcome data are presented for each diagnostic group in Table 2. As expected, the TBI patients were significantly younger on average than those in the other three groups. The severity of illness based on Glasgow Coma Scale scores did not differ significantly across the diagnostic groups. The mean APACHE II-based probability of death, a predictor of in-hospital mortality in critically ill patients, also did not differ among groups. It ranged from 12% in the CVA group to 19% in the SAH group, indicating an intermediate risk of mortality.12
Twenty-three percent of the total patient sample was unavailable for follow-up; the majority of these patients were in the TBI group. The 3-month mortality rate for the total sample was 23%. The rate differed across diagnostic groups (χ²=32.04, P<.0001), with the lowest rate (6%) in CVA patients and the highest (36%) in ICH patients.
The scores from the three stroke scales are presented by diagnostic group in Table 3. They did not differ significantly across the diagnostic groups. Since the diagnostic groups did not differ significantly with respect to each stroke scale score or severity of illness, they were combined for logistic and multiple regression analyses used to calculate the overall predictive validity of the measures. Subsequent analyses examined each diagnostic group separately.
Cronbach’s α, a measure of internal consistency, was .65 for the MCANS and .59 for the SNSS in the sample as a whole. The α coefficients for the MCANS were .79 (CVA), .76 (ICH), .75 (SAH), and .76 (TBI). The SNSS coefficients were lower in all diagnostic groups: .53 (CVA), .56 (ICH), .59 (SAH), and .58 (TBI). Because the scale items are ordinal, these coefficients were interpreted as suggestive rather than definitive estimates of reliability. SEM has the advantage of accommodating ordinal data for the determination of both reliability and validity. The final models for each scale were significant on all five fit indices.
The initial SNSS models were not statistically significant. Two items, gait and facial palsy, had highly skewed distributions. Eighty-one percent of the patients were bedridden and scored 0 on the gait item. The facial palsy item has only two categories, and 91% of the patients were assigned to the paralysis or marked paresis category. To improve the precision of the SNSS, the gait and facial palsy items were therefore deleted, and the model was simplified to two constructs: consciousness and motor function. The repeat analysis produced a final model with significant fit indices. The eye movement item had the lowest reliability (.31) and validity (.56). The reliability and validity coefficients for all scale items are presented in Table 4.
All MCANS items except facial palsy and eye movements showed high levels of reliability and validity. The facial palsy item had the lowest reliability (.01) and validity (.12); yet when it was removed from the model, the MCANS equation was no longer statistically significant. Similarly, the eye movement item is weak, particularly compared with other items associated with the same construct. The results for the eye movement item are similar for both scales, although the coefficients are slightly higher for the MCANS. The results of the SEM analysis for the MCANS and SNSS are shown in the Figure.
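The coefficient pairs reported above are .31 and .56 for the SNSS eye movement item and .01 and .12 for the MCANS facial palsy item. In both cases the reliability equals the square of the validity coefficient. That pattern is what one would expect if validity is the item's standardized loading on its construct and reliability is the item's squared multiple correlation. The text does not state these definitions, so the check below rests on that assumption. The function names are hypothetical.

```python
def item_reliability(standardized_loading):
    """ASSUMPTION: reliability = squared standardized loading, i.e. the
    share of the item's variance explained by its construct."""
    return standardized_loading ** 2


def error_variance(standardized_loading):
    """Standardized measurement-error variance left over for the item."""
    return 1.0 - item_reliability(standardized_loading)


# Validity coefficients as reported in the text, read as standardized loadings.
reported = {"SNSS eye movement": 0.56, "MCANS facial palsy": 0.12}
for item, loading in reported.items():
    print(item, round(item_reliability(loading), 2), round(error_variance(loading), 2))
# SNSS eye movement 0.31 0.69
# MCANS facial palsy 0.01 0.99
```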
The admission MCANS, SNSS-LT, and SNSS-PRG scores were all significant predictors of mortality for the total sample (Table 5). Sensitivity was high (>90%) for all three measures; however, the SNSS-PRG score produced the best combination of sensitivity and specificity.
The CVA group had only two patients who died. The combination of small sample size and low mortality made it impossible to estimate a regression equation for this group. For SAH and TBI patients, the MCANS significantly predicted mortality. In both groups, specificity was higher than sensitivity. The results for the SNSS-PRG score were similar to the MCANS findings. The SNSS-LT scores were predictive of mortality only for SAH patients.
To examine the ability of the scales to predict functional outcome, linear regressions using FIM scores were computed for the combined sample and by diagnosis (Table 6). The three scales were significant predictors of the cognitive FIM scores for the combined sample. When the diagnostic groups were analyzed separately, only the MCANS scores of ICH patients significantly predicted the cognitive FIM. Predictive performance was better for the motor components of function. For the total sample, all three scales achieved a statistically significant R². The MCANS significantly predicted motor FIM scores for the CVA and ICH groups, and a similar pattern was seen with the SNSS-PRG score. The SNSS-LT score had the best predictive validity for the motor performance of ICH patients. The scales were unable to predict the motor or cognitive function of TBI patients. Because a large number of TBI patients were lost to follow-up, these results must be interpreted with caution.
The proliferation of clinical trials for treatment of acute stroke has resulted in the introduction of a number of stroke scales. These scales were all designed to provide a system for quantifying neurological deficits. However, although there is general agreement that such scales should meet the clinimetric criteria21 of reliability and validity, no single scale has been recognized as having completely achieved this goal.3 In the present study, we examined the reliability and validity of the Unified Neurological Stroke Scale in four types of acute neurological injury: ischemic stroke (CVA), ICH, SAH, and TBI. We did so for two reasons. The first was to assess the reliability of the scales more rigorously. According to Asplund,6 internal consistency is the most important clinimetric criterion for the development of a credible and universally acceptable neurological scoring system. Ideally, the items of a robust scale perform consistently even when the patient population is variable.22
The second reason for applying these scales to a heterogeneous population was to determine their utility in other types of acute brain injury. Therapeutic trials of new agents are not limited to ischemic stroke patients, and studies in other populations have the same need for quantification of neurological deficits. A scale that retains its reliability across different samples of patients, testing environments, and testers is considered to have generalizability.23 The results of this study demonstrate that the MCANS and SNSS have reliability and validity even when a heterogeneous sample is used.
The internal consistency of the MCANS and SNSS is comparable to that of other stroke scales.24 25 26 The differences in internal consistency between the MCANS and SNSS may be due in part to differences in the number of items and in the number of possible response categories for each item. The MCANS has 10 items, nine of which have only two or three response options, while the SNSS has nine items, seven of which offer four response categories. Because internal consistency is a function of the number of items on a scale and the mean correlation between items, longer scales with fewer choices per item will generally have higher coefficients.27
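Standardized α, α = k r̄ / (1 + (k − 1) r̄), shows how internal consistency depends on the number of items k and the mean inter-item correlation r̄. Below is a minimal sketch with a hypothetical function name and an assumed r̄. The formula captures only the item-count and mean-correlation part of the argument. It does not model the number of response categories per item.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha (Spearman-Brown form) for k items whose
    average inter-item correlation is mean_r."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical mean inter-item correlation of .20, held fixed.
for k in (9, 10):
    print(k, round(standardized_alpha(k, 0.20), 2))
# 9 0.69
# 10 0.71
```

With r̄ held at a hypothetical .20, nine items give about .69 and ten give about .71.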
Poor reliability in the assessment of facial palsy and eye movement has been reported for other stroke scales as well. The facial movement item of the European Stroke Scale had the lowest weighted κ of the 14 items on the scale.25 Several authors26 28 29 have also reported poor interrater agreement for the facial palsy and gaze items of the National Institutes of Health scale. Lyden et al26 suggest that it is difficult for raters to distinguish between mild, moderate, and severe facial weakness. Because of poor interrater reliability, Orgogozo et al9 reduced the number of scoring categories for facial palsy from three to two while the scale was in development. Scoring might be made more reliable if clearly defined criteria were delineated. The MCANS and SNSS upper and lower extremity function, consciousness, and orientation items were robust, with reliability coefficients ranging from .78 to 1.00. These findings are consistent with the results for similar items on other stroke scales.26
We believe that SEM is a more precise technique for estimating the reliability of ordinal items. All stroke scales are composed of ordinally ranked items. Statistical techniques such as Cronbach’s α, which other investigators have used as an index of reliability, assume continuous or interval measurement. Applying them to ordinal items may therefore lead to overestimation or underestimation of the reliability coefficients.30
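The polychoric correlations mentioned in the Methods rest on a threshold model. The following sketch states that standard assumption; the notation is ours, not the authors'. Each ordinal item x_j with m_j categories is treated as a coarsened version of an unobserved standard normal variable x_j*, which is cut at thresholds τ:

```latex
x_j = k \quad \text{if} \quad \tau_{j,k} < x_j^{*} \le \tau_{j,k+1},
\qquad k = 0, 1, \dots, m_j - 1,
\qquad \tau_{j,0} = -\infty,\ \tau_{j,m_j} = +\infty .
```

The polychoric correlation between two items is the correlation between their underlying variables x_j* and x_l* under bivariate normality, estimated from the cross-tabulation of the two items. Cronbach’s α, by contrast, treats the category codes 0, 1, 2, … themselves as interval scores.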
The criterion validity, construct validity, sensitivity, and specificity of stroke scales are often not examined.5 Construct validity is the most difficult to establish because the constructs are not “real” (that is, they are not directly observable); they exist as abstract representations of the theoretical structure that guides the construction (or specification) of individual items.23 A scale can represent several constructs as long as the items are clearly associated with the theoretical constructs and the scoring reflects the multidimensional structure of the scale.15 Because the items of a stroke scale quantify the standard neurological examination, the theoretical constructs can be identified and the construct validity of the scale can be tested. We have demonstrated that the MCANS and SNSS are composed of two theoretically and statistically distinct constructs.
Criterion (predictive) validity was measured in two ways in this study. First, the ability to predict mortality was determined, and sensitivity and specificity were computed. Because the logistic models predicted survival, specificity here reflects the proportion of deaths that were correctly predicted. In the case of mortality, specificity is more important than sensitivity, because it may be in the best interest of an investigator to exclude from a clinical trial patients who are likely to die. For that purpose, predicting survival in a patient who dies is a more problematic error than predicting death in a patient who survives.23 Overall, the SNSS-PRG performed better than the SNSS-LT and MCANS when mortality in the total sample was analyzed. Within the different diagnostic groups, the differences in sensitivity and specificity between the prognostic and long-term scores were minimal. Given this finding, it is difficult to justify calculating both the prognostic and long-term scores. DeHaan et al31 reached similar conclusions: they compared the two SNSS scores with measures of disability and quality of life and found that the two scores yielded identical correlation coefficients with the Barthel Index. The results for the TBI patients should be interpreted with caution. Thirty-eight percent of this group was lost to follow-up, so we cannot accurately account for the mortality of this group.
Predictive validity for functional outcome was also examined with the FIM. For the total sample, the scales significantly predicted both postdischarge cognitive and motor performance. Different patterns were observed across the four diagnostic groups, and the scales were more predictive of function in groups with localized rather than diffuse brain injury. The ability of the MCANS and SNSS-PRG to predict long-term motor function was best in the CVA and ICH patients. These patients are more likely to have isolated focal brain injury, whereas SAH and TBI may involve brain stem injury or both focal and diffuse damage. This suggests that the scales will be of limited utility for predicting outcome in patients with SAH or TBI. This finding is consistent with the original intent of the scales, which were designed to quantify the deficits of hemispheric stroke.
The poor predictive power of the scales for cognitive function implies that factors not captured by these scales have an important impact on long-term cognitive function. It is unreasonable to expect high levels of predictive power from scores obtained at the earliest stages of recovery. The findings of this study suggest that the stroke scales do measure the contributions of neurological status to functional performance. Even so, more elaborate predictive models that include sociodemographic information and measures of prior disability, treatment, and environmental support systems should improve predictive power.
In conclusion, the results of this study support the use of the MCANS and SNSS in trials of ischemic and hemorrhagic stroke but not in trials of SAH or TBI. The scales achieved acceptable levels of reliability and demonstrated both construct and predictive validity. Given the many concerns about the psychometric and clinimetric status of existing stroke assessments,31 the use of new statistical methods is essential6 for the development of appropriate clinical measures.
Selected Abbreviations and Acronyms
FIM = Functional Independence Measure
ICU = intensive care unit
MCANS = Neurological Scale for Middle Cerebral Artery Infarction
SEM = structural equation modeling
SNSS = Scandinavian Neurological Stroke Scale
TBI = traumatic brain injury
The authors wish to acknowledge the contributions of Michelle Gettinger, MSOT, Karen Graves, MSOT, and Heidi Nemeth, MSOT, in collecting the stroke scale assessments; Mary Sauer, MSN, for conducting the follow-up FIM evaluations; and Stacy Darin for data entry and preparation of the manuscript. The authors also wish to thank Barnes Hospital Neuroclinical Service Line and Carolyn M. Baum, PhD, OTR, Director of the Program in Occupational Therapy, for their generous support of this project.
- Received February 2, 1995.
- Revision received June 1, 1995.
- Accepted June 20, 1995.
- Copyright © 1995 by American Heart Association
Adams RJ, Nichols FT, Thompson WO. Neurological assessment in acute stroke: issues in the use of rating scales. In: Amery W, Bousser MG, Rose FC, eds. Clinical Trial Methodology in Stroke. London, England: Baillière Tindall; 1989:54-63.
Candelise L. Stroke scores and scales. Cerebrovasc Dis. 1992;2(suppl 1):239-247.
Boysen G, Lindenstrom E. Stroke scale comparisons. Stroke. 1993;25:1885-1886.
Lyden PD, Lau GT. A critical appraisal of stroke evaluation and rating scales. Stroke. 1991;22:1345-1352.
Asplund K. Clinimetrics in stroke research. Stroke. 1987;18:528-530.
Orgogozo JM, Asplund K, Boysen G. A unified form for neurological scoring of hemispheric stroke with motor impairment. Stroke. 1992;23:1678-1679.
Orgogozo JM, Dartigues JF. Clinical trials in acute brain infarction: the question of assessment criteria. In: Battistini N, ed. Acute Brain Ischemia: Medical and Surgical Therapy. New York, NY: Raven Press Publishers; 1986:282-289.
Orgogozo JM, Dartigues JF. Methodology of clinical trials in acute cerebral ischemia. Cerebrovasc Dis. 1991;1(suppl 1):100-111.
Scandinavian Stroke Study Group. Multicenter trial of hemodilution in ischemic stroke: background and study protocol. Stroke. 1985;16:885-890.
Ecob R, Cuttance P. An overview of structural equation modeling. In: Cuttance P, Ecob R, eds. Structural Equation Modeling by Example. Cambridge, UK: Cambridge University Press; 1987:10-24.
Granger CV, Hamilton BB, Keith RA, Zielezny M, Sherwin FS. Advances in functional assessment for medical rehabilitation. Top Geriatr Rehabil. 1986;1:59-74.
Nunnally JC. Psychometric Theory. New York, NY: McGraw-Hill; 1979.
Joreskog KG, Sorbom D. LISREL 8: a guide to program and applications. Chicago, Ill: Scientific Software; 1993.
Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, eds. Testing Structural Equation Models. Newbury Park, Calif: Sage Publications; 1993:136-162.
Joreskog KG. Testing structural equation models. In: Bollen KA, Long JS, eds. Testing Structural Equation Models. Newbury Park, Calif: Sage Publications; 1993:294-336.
Feinstein AR. Clinimetrics. New Haven, Conn: Yale University Press; 1987.
Carmines EG, Zeller RA. Reliability and Validity Assessment. Newbury Park, Calif: Sage Publications; 1991.
Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. Norwalk, Conn: Appleton & Lange; 1993.
Cote R, Battista RN, Wolfson C, Boucher J, Adam J, Hachinski V. The Canadian Neurological Scale: validation and reliability assessment. Neurology. 1989;39:638-643.
Hantson L, De Weerdt W, De Keyser J, Diener HC, Franke C, Palm R, Van Orshoven M, Schoonderwaldt H, De Klippel N, Herroelen L, Feys H. The European Stroke Scale. Stroke. 1994;25:2215-2219.
Lyden P, Brott T, Tilley B, Welch KMA, Mascha EJ, Levine S, Haley EC, Grotta J, Marler J. Improved reliability of the NIH Stroke Scale using video training. Stroke. 1994;25:2220-2225.
Bollen KA. Measurement models: the relation between latent and observed variables. In: Structural Equations With Latent Variables. New York, NY: John Wiley & Sons; 1989:179-223.
DeHaan RJ, Horn J, Limburg M, van der Meulen J, Bossuyt P. A comparison of five stroke scales with measures of disability, handicap, and quality of life. Stroke. 1993;24:1178-1181.
van Gijn J. Measurement of outcome in stroke prevention trials. Cerebrovasc Dis. 1992;2(suppl 1):23-34.