(Stroke. 1995;26:1852-1858.)
© 1995 American Heart Association, Inc.
From the Program in Occupational Therapy (D.F.E.), the George Warren Brown School of Social Work (Y.-W.C.), and the Department of Neurology (D.F.E., M.N.D.), Neurology/Neurosurgery Intensive Care Unit (M.N.D.), Washington University School of Medicine, St Louis, Mo.
Correspondence to Dorothy F. Edwards, PhD, Program in Occupational Therapy, Box 8505, Washington University, St Louis, MO 63110. E-mail DorothyE@OT-Link.WUSTL.EDU.
| Abstract |
|---|
Methods We prospectively studied 84 consecutive patients admitted to a neurology/neurosurgery intensive care unit with intracerebral hemorrhage (n=30), subarachnoid hemorrhage (n=15), ischemic stroke (n=15), and traumatic brain injury (n=24). Patients were evaluated within 24 hours of admission and at 48-hour intervals until intensive care unit discharge. A total of 386 assessments were obtained. The Functional Independence Measure was administered by telephone 3 months after hospital discharge.
Results High levels of reliability and construct validity were observed for the majority of the Unified Stroke Scale items. Facial palsy and eye movement items had the lowest reliability and validity. Both the Middle Cerebral Artery and Scandinavian Scales were significant predictors of outcome. Sensitivity and specificity varied by diagnosis. Predictive validity of functional outcome was best in groups with ischemic and hemorrhagic stroke rather than traumatic brain injury and subarachnoid hemorrhage.
Conclusions The Unified Stroke Scale demonstrates reliability and construct and predictive validity, and its use is supported in ischemic and hemorrhagic stroke. Structural equation modeling is an appropriate technique for use with scales of this type.
Key Words: cerebral ischemia; intracerebral hemorrhage; stroke assessment; stroke outcome; subarachnoid hemorrhage
| Introduction |
|---|
Stroke scales are now routinely used in the testing of new agents for treatment of acute brain injury. Despite increased reliance on such scales, few demonstrate the reliability and validity necessary for use in clinical trials.5 For these scales to be useful in clinical investigation, their reliability and validity must be assessed.6
The Unified Neurological Stroke Scale7 is composed of the Neurological Scale for Middle Cerebral Artery Infarction (MCANS)8 9 and the Scandinavian Neurological Stroke Scale (SNSS).10 The Unified Neurological Stroke Scale uses a scoring format that permits generation of a score for one or both of the stroke scales. Orgogozo et al7 suggest that the use of the Unified Neurological Stroke Scale may provide answers to questions of sensitivity and specificity and the ability of the items individually or in combination to predict mortality, neurological outcome, and future function.
To answer these questions, an assessment of the validity of each scale is necessary. Although there are many different statistical methods for evaluating the validity of a scale, structural equation modeling (SEM) is a method that merges the analytic procedures of multiple regression and factor analysis for the determination of construct validity.11 SEM permits measurement of the extent to which the scale items "fit" the theoretical constructs that the scale was designed to evaluate and the consistency or the degree of random error associated with each item. This approach is particularly appropriate when the scale items are ordinal, as is the case with many clinical measures.
In addition, although stroke scales were originally developed and used to quantify neurological deficits after an ischemic stroke, a broader application may be possible. Since quantification of neurological status is an important aspect of investigations of the clinical efficacy of therapeutic agents in all populations, it is important to evaluate the utility of these scales in a wider population. If the scales are robust, they will demonstrate internal consistency over a range of diagnostic groups. Then the scales could be useful in the study of hemorrhagic stroke, subarachnoid hemorrhage, or cranial trauma resulting in acute focal neurological deficits.
Our prospective study focused on patients admitted to a combined neurology/neurosurgery intensive care unit. Our goals were (1) to determine the reliability, construct validity, and predictive validity of the MCANS and SNSS using SEM and logistic regression and (2) to assess the applicability of the MCANS and SNSS to ischemic stroke, intracerebral hemorrhage (ICH), subarachnoid hemorrhage (SAH), and traumatic brain injury (TBI).
| Subjects and Methods |
|---|
The items for the MCANS and SNSS scales are presented in Table 1. The 10 MCANS items reflect 21 ordinal degrees of severity. The MCANS items are summed to create a total score ranging from 0 to 100. This score was used to determine the predictive validity of the scale. The SNSS yields three separate scores: prognostic (SNSS-PRG), long-term (SNSS-LT), and a total score. The prognostic score includes consciousness, gaze palsy, and limb strength. The long-term score items evaluate limb strength, dysphasia, facial palsy, orientation, and gait. Each of these scores was included in the analyses.
The scales were administered within 24 hours of admission to the intensive care unit (ICU) and then at 48-hour intervals until ICU discharge. A total of 386 assessments were obtained; the mean number of assessments per patient was five. The assessments were performed by three occupational therapy graduate students working under the supervision of the medical director of the ICU (M.N.D.). Interrater reliability of .92 or greater was achieved on all scales before beginning the study. Scores from the Acute Physiology and Chronic Health Evaluation (APACHE II)12 at admission were retrospectively obtained from patient charts as an index of severity of illness. A follow-up telephone interview was conducted by one nurse 3 months after hospital discharge. Information obtained during the interview included outcome rated as (1) death, (2) nursing home or custodial care, (3) home-dependent, and (4) independent. The telephone version of the Functional Independence Measure (FIM)13 was used to evaluate functional performance. This ordinal scale rates 18 activities of daily living on the basis of the amount of assistance required. Responses to the 18 items are used to create a motor score (13 items) and a cognitive score (5 items).14 Both the categorical outcome rating and the two FIM scores were used to determine the predictive validity of the MCANS and SNSS.
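As an illustration of how the two FIM subscores described above could be derived from the 18 item ratings, the following is a minimal sketch. It assumes that the 13 motor items are listed first and the 5 cognitive items last, and that each item is rated from 1 to 7; neither detail is stated in this article, and the function name is hypothetical.

```python
# Minimal sketch of FIM subscore construction.
# Assumptions (not stated in the article): motor items come first,
# cognitive items last, and each item is rated on a 1-7 scale.
MOTOR_ITEM_COUNT = 13
COGNITIVE_ITEM_COUNT = 5


def fim_subscores(item_ratings):
    """Return (motor_score, cognitive_score) from 18 ordinal item ratings."""
    expected = MOTOR_ITEM_COUNT + COGNITIVE_ITEM_COUNT
    if len(item_ratings) != expected:
        raise ValueError(f"expected {expected} item ratings, got {len(item_ratings)}")
    if any(rating < 1 or rating > 7 for rating in item_ratings):
        raise ValueError("each item rating must be between 1 and 7")
    motor_score = sum(item_ratings[:MOTOR_ITEM_COUNT])
    cognitive_score = sum(item_ratings[MOTOR_ITEM_COUNT:])
    return motor_score, cognitive_score


# Hypothetical patient: independent on every motor item, moderate
# assistance on every cognitive item.
motor, cognitive = fim_subscores([7] * 13 + [4] * 5)
```

Under these assumptions the motor score ranges from 13 to 91 and the cognitive score from 5 to 35.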
Preliminary estimates of reliability for each scale were computed by diagnostic group using Cronbach's α.15 Reliability and construct validity for the total sample were evaluated by means of an SEM using LISREL 8.16 With this method, a model consisting of one or more theoretical constructs is developed before data analysis. The constructs are definitions of the properties being measured by the scale (ie, motor function, consciousness). Before the initial analyses, three latent constructs were hypothesized for each stroke scale: consciousness, upper body function, and lower body motor function. Each scale item was then assigned to a particular construct. The scale items were simultaneously loaded into the model to determine their individual contribution to the construct. If the model was not statistically significant, the LISREL 8 program generated alternative models to be tested. The final models showing the association between the scale items and the constructs are shown in the Figure.
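The internal consistency coefficient used for the preliminary reliability estimates can be sketched as follows. This is an illustrative implementation of the standard Cronbach's α formula, not the software used in the study, and the ratings shown are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a 2-D array (rows = assessments, columns = items)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                               # number of scale items
    item_variances = scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of summed score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


# Hypothetical ordinal ratings: 5 assessments of a 3-item scale.
example_ratings = np.array([
    [2, 2, 1],
    [0, 1, 0],
    [1, 1, 1],
    [2, 1, 2],
    [0, 0, 0],
])
alpha = cronbach_alpha(example_ratings)
```

The formula treats the item ratings as interval data, which is the reason coefficients computed this way on ordinal items are best read as approximate.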
Since all of the Unified Neurological Stroke Scale items were ordinal, polychoric correlations and asymptotic covariance matrices were computed. The weighted least-squares method17 was used by the LISREL 8 program to estimate the model and test the significance of the fit. Five goodness-of-fit statistics are generated as part of the overall model estimation process: χ², root-mean-square error of approximation, adjusted goodness of fit index, comparative fit index, and incremental fit index. Each of these indices emphasizes different aspects of model fit.18 19 20 Significance on all five indices was required for the acceptance of a model.
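Two of the fit indices named above can be computed directly from the model χ² statistic. The sketch below uses the commonly cited definitions of the root-mean-square error of approximation and the comparative fit index; whether LISREL 8 used exactly these forms is an assumption, and the numeric inputs are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root-mean-square error of approximation from the model chi-square,
    its degrees of freedom, and the sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to a baseline (independence) model."""
    model_misfit = max(chi2_model - df_model, 0.0)
    baseline_misfit = max(chi2_baseline - df_baseline, model_misfit)
    if baseline_misfit == 0.0:
        return 1.0
    return 1.0 - model_misfit / baseline_misfit


# Hypothetical values, not taken from the study.
example_rmsea = rmsea(chi2=60.0, df=20, n=101)
example_cfi = cfi(chi2_model=60.0, df_model=20,
                  chi2_baseline=420.0, df_baseline=20)
```

Lower values of the first index and higher values of the second indicate better fit, which is one reason the indices are usually considered together.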
Logistic regression equations were computed for each Unified Neurological Stroke Scale score to evaluate how well each scale at the time of ICU admission predicted survival (alive versus dead). These analyses also provided sensitivity and specificity estimates for each scale by diagnostic group and for the total sample.
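The step from a fitted logistic equation to sensitivity and specificity estimates can be sketched as follows. The coefficients, the 0.5 probability cutoff, and the scores are hypothetical illustrations, not values from this study; death is treated as the "positive" outcome.

```python
import math


def predicted_probability(score, intercept, slope):
    """Predicted probability of death from a one-predictor logistic equation."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def sensitivity_specificity(actual_died, predicted_died):
    """Sensitivity and specificity with death as the positive outcome."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(actual_died, predicted_died):
        if actual and predicted:
            tp += 1          # death correctly predicted
        elif actual and not predicted:
            fn += 1          # survival predicted, patient died
        elif not actual and not predicted:
            tn += 1          # survival correctly predicted
        else:
            fp += 1          # death predicted, patient survived
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical admission scores, outcomes, and coefficients.
scores = [20, 35, 60, 80, 90]
died = [True, True, False, False, False]
intercept, slope = 3.0, -0.06        # assumed values for illustration
predictions = [predicted_probability(s, intercept, slope) >= 0.5 for s in scores]
sens, spec = sensitivity_specificity(died, predictions)
```

Changing the probability cutoff trades sensitivity against specificity, so any reported pair of values depends on the cutoff chosen.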
Stepwise multiple regression analyses were used to determine the ability of each scale to predict postdischarge FIM motor and cognitive scores. A value of P<.05 was required for statistical significance.
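The proportion of variance in a FIM score explained by a scale score can be illustrated with a single-predictor least-squares fit. This is a deliberate simplification of the stepwise multiple regression used in the study, and the data are hypothetical.

```python
import numpy as np


def r_squared(predictor, outcome):
    """R-squared of a one-predictor ordinary least-squares regression."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(outcome, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)            # fit y = slope*x + intercept
    fitted = slope * x + intercept
    residual_ss = np.sum((y - fitted) ** 2)           # unexplained variation
    total_ss = np.sum((y - y.mean()) ** 2)            # total variation
    return 1.0 - residual_ss / total_ss


# Hypothetical admission scale scores and 3-month motor FIM scores.
admission_scores = [30, 45, 60, 75, 90]
motor_fim_scores = [20, 40, 35, 70, 85]
explained = r_squared(admission_scores, motor_fim_scores)
```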
| Results |
|---|
Twenty-three percent of the total patient sample was unavailable for follow-up; the majority of these patients were in the TBI group. The 3-month mortality rate for the total sample was 23%. The rate differed across diagnostic groups (χ²=32.04, P<.0001), with the lowest rate (6%) in ischemic stroke (CVA) patients and the highest mortality (36%) in ICH patients.
The scores from the three stroke scales are presented by diagnostic group in Table 3. They did not differ significantly across the diagnostic groups. Since the diagnostic groups did not differ significantly with respect to each stroke scale score or severity of illness, they were combined for logistic and multiple regression analyses used to calculate the overall predictive validity of the measures. Subsequent analyses examined each diagnostic group separately.
Cronbach's α, a measure of internal consistency, for the MCANS and SNSS was .65 and .59, respectively, for the sample as a whole. The α coefficients for the MCANS were .79 (CVA), .76 (ICH), .75 (SAH), and .76 (TBI). The SNSS coefficients were lower across all diagnostic groups: .53 (CVA), .56 (ICH), .59 (SAH), and .58 (TBI). Since the scale items are ordinal, these coefficients were interpreted as suggestive but not definitive estimates of reliability. SEM offers the advantage of accommodating ordinal data for determination of both reliability and validity. The final models for each scale were significant on all five fit indices.
The initial SNSS models were not statistically significant. Two items, gait and facial palsy, had highly skewed distributions. Eighty-one percent of the patients were bedridden and were scored 0 on the gait item. The facial palsy item has only two categories; 91% of the patients were assigned to the paralysis or marked paresis group. Therefore, in an attempt to improve the precision of the SNSS, the gait and facial palsy items were deleted, and the model was simplified to two constructs: consciousness and motor function. The repeat analysis produced a final model with significant fit indices. The eye movement item had the lowest reliability (.31) and validity (.56). The reliability and validity coefficients for all scale items are presented in Table 4.
All MCANS items except for facial palsy and eye movements show high levels of reliability and validity. The facial palsy item had the lowest levels of reliability (.01) and validity (.12), and yet, when it was removed from the model, the MCANS equation was no longer statistically significant. Similarly, the eye movement item is weak, particularly when compared with other items associated with the same construct. The results for the eye movement item are similar for both scales, although the coefficients are slightly higher for the MCANS. The results of the SEM analysis for the MCANS and SNSS are shown in the Figure.
The admission MCANS, SNSS-LT, and SNSS-PRG scores were all significant predictors of mortality for the total sample (Table 5). Sensitivity was high (eg, >90%) for all three measures; however, the SNSS-PRG score produced the best combination of sensitivity and specificity.
The CVA group had only two patients who died. The combination of the small sample size and low mortality makes it impossible to estimate a regression equation for this group. For SAH and TBI patients, the MCANS was able to significantly predict mortality. In both groups, specificity was higher than sensitivity. The results for the SNSS-PRG score were similar to the MCANS findings. The SNSS-LT scores were predictive of mortality only for SAH patients.
To examine the ability of the scales to predict functional outcome, linear regressions using FIM scores were computed for the combined sample and by diagnosis (Table 6). The three scales were significant predictors of the cognitive FIM scores for the combined sample. When the diagnostic groups were analyzed separately, only the MCANS scores of ICH patients were significant predictors of the cognitive FIM. The predictive performance was better when the motor components of function were assessed. For the total sample, all three scales achieved statistically significant R² values. The MCANS reliably predicted motor FIM scores for the CVA and ICH groups. A similar pattern was seen with the SNSS-PRG score. The SNSS-LT score had the best predictive validity for the motor performance of ICH patients. The scales were unable to predict the motor or cognitive function of TBI patients. Since a large number of TBI patients were lost to follow-up, these results must be interpreted with caution.
| Discussion |
|---|
The second reason for applying these scales to a heterogeneous population was to determine the utility of the scales in other types of acute brain injury. Therapeutic trials of new agents are not limited to ischemic stroke patients, and studies in other populations have the same need for quantification of neurological deficits. A scale that retains its reliability across different samples of patients, testing environments, and testers is considered to have generalizability.23 The results of this study demonstrate that the MCANS and SNSS have reliability and validity even when a heterogeneous sample is used.
The internal consistency of the MCANS and SNSS is comparable to that of other stroke scales.24 25 26 The differences in internal consistency between the MCANS and SNSS may be due in part to the differences in numbers of items and the numbers of possible response categories for each item. The MCANS has 10 items, nine of which have only two or three response options, while the SNSS has nine items, with seven of the nine items offering four response categories. Since internal consistency is a function of the number of items on a scale and the mean correlation between the items, longer scales with fewer choices per item will generally have higher coefficients.27
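The dependence of internal consistency on scale length and mean inter-item correlation can be made concrete with the standardized form of α. The sketch below illustrates only the effect of the number of items; it does not model the number of response categories, and the correlation value is hypothetical.

```python
def standardized_alpha(item_count, mean_inter_item_correlation):
    """Standardized alpha from the number of items and their mean correlation."""
    k = item_count
    r = mean_inter_item_correlation
    return (k * r) / (1.0 + (k - 1) * r)


# Same hypothetical mean inter-item correlation, two scale lengths.
short_scale = standardized_alpha(5, 0.2)
long_scale = standardized_alpha(10, 0.2)
```

With the mean correlation held fixed, doubling the number of items raises the coefficient, so part of a difference in α between two scales can reflect length alone.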
The poor reliability of the assessment of facial palsy and eye movement has been reported for other stroke scales as well. The facial movement item of the European Stroke Scale had a weighted κ that was the lowest of the 14 items on the scale.25 Several authors26 28 29 have also reported poor interrater agreement for facial palsy and gaze items of the National Institutes of Health scale. Lyden et al26 suggest that it is difficult for raters to distinguish between mild, moderate, and severe facial weakness. Orgogozo et al9 reduced the number of scoring categories on facial palsy from three to two when the scale was in development because of poor interrater reliability. Perhaps the scoring could be made more reliable if clearly defined criteria were delineated. The MCANS and SNSS upper and lower extremity function, consciousness, and orientation items were robust (reliability coefficients ranging from 1.00 to .78). The findings are consistent with the results for similar items on other stroke scales.26
We believe that SEM is a more precise technique for reliability estimation. All stroke scales are composed of items ranked ordinally. The use of statistical techniques such as Cronbach's α, which other investigators have used as an index of reliability, relies on the assumption of continuous or interval measurement. This may lead to an overestimation or underestimation of the reliability coefficients.30
Criterion and construct validity and specificity and sensitivity of stroke scales are often not examined.5 Construct validity is the most difficult to establish because the constructs are not "real" (that is, they are not directly observable) and exist as abstract representations of the theoretical structure that guides construction (or specification) of individual items.23 A scale can represent several constructs as long as the items are clearly associated with the theoretical constructs and the scoring reflects the multidimensional structure of the scale.15 Since the items of a stroke scale are used to quantify the standard neurological examination, the theoretical constructs can be identified, and the construct validity of the scale can be tested. We have demonstrated that the MCANS and SNSS are composed of two theoretically and statistically distinct constructs.
Criterion, or predictive validity, was measured in two ways in this study. First, the ability to predict mortality was determined, and sensitivity and specificity were computed. In the case of mortality, specificity is more important than sensitivity in that it may be in the best interest of an investigator to exclude patients likely to die from a clinical trial; false-negatives, or the prediction of survival in patients who die, are more problematic than false-positives (patients who are predicted to die but survive).23 Overall, the SNSS-PRG performed better than the SNSS-LT and MCANS when the total sample mortality was analyzed. Within the different diagnostic groups, the differences in sensitivity and specificity between the prognostic and long-term scores were minimal. The results for the TBI patients should be interpreted with caution. Thirty-eight percent of this group was lost to follow-up; therefore, we cannot accurately account for the mortality of this group. It is difficult to justify the calculation of both the prognostic and long-term scores on the basis of this finding. Similar conclusions were reached by DeHaan et al,31 who compared the two SNSS scores with measures of disability and quality of life and found that the two scores yielded identical correlation coefficients with the Barthel Index.
Predictive validity of functional outcome was also examined with the FIM. When analyzing the total sample, the scales are able to reliably predict both postdischarge cognitive and motor performance. Different patterns were observed across the four diagnostic groups. The scales were more predictive of function in groups with localized rather than diffuse brain injury. The ability of the MCANS and SNSS-PRG to predict long-term motor function was best with the CVA and ICH patients. These patients are more likely to develop isolated focal brain injury, whereas in SAH and TBI there may be brain stem injury or both focal and diffuse damage. This indicates that these scales will be of limited utility for prediction of outcome in patients with these diagnoses. This finding is consistent with the original intent of the scales.
The poor predictive power of the scales for cognitive function implies that factors not captured by these scales have important impact on long-term cognitive function. It is unreasonable to expect high levels of predictive power from scores obtained at the earliest stages of recovery. Although the findings of this study suggest that the stroke scales do measure the contributions of neurological status to functional performance, more elaborated predictive models that include sociodemographic information and measures of prior disability, treatment, and environmental support systems will improve our predictive power.
In conclusion, the results of this study support the use of the MCANS and SNSS in trials of ischemic and hemorrhagic stroke but not SAH or TBI. The scales have achieved acceptable levels of reliability and demonstrate both construct and predictive validity. Considering the many concerns about the psychometric and clinometric status of the existing stroke assessments,31 the use of new statistical methods is essential6 for the development of appropriate clinical measures.
| Acknowledgments |
|---|
Received February 2, 1995; revision received June 1, 1995; accepted June 20, 1995.