(Stroke. 2001;32:656.)
© 2001 American Heart Association, Inc.
Original Contributions
From the Department of Medicine (Neurology) (C.D.B., D.C.C.J., L.B.G.), Duke Center for Cerebrovascular Disease (C.D.B., D.C.C.J., L.B.G.), Center for Clinical Health Policy Research (L.B.G.), Duke University and Durham Veterans Affairs Medical Center (C.D.B., D.C.C.J., L.B.G.), Durham, NC.
Correspondence to Larry B. Goldstein, MD, Director, Duke Center for Cerebrovascular Disease, Department of Medicine (Neurology), PO Box 3651, Durham, NC 27710. E-mail golds004{at}mc.duke.edu
| Abstract |
|---|
Methods: Randomly selected records of patients with ischemic stroke admitted to an AMC (n=20) and community hospitals with (CH1, n=19) and without (CH2, n=20) acute neurological consultative services were reviewed. NIHSS and CNS scores were assigned independently by 2 neurologists using published algorithms. Interrater reliability of the scores was determined with the intraclass correlation coefficient, and the numbers of missing items were tabulated.
Results: The intraclass correlation coefficients for the NIHSS and CNS, respectively, were 0.93 (95% CI, 0.82 to 1.00) and 0.97 (95% CI, 0.90 to 1.00) for the AMC, 0.89 (95% CI, 0.75 to 1.00) and 0.88 (95% CI, 0.73 to 1.00) for the CH1, and 0.48 (95% CI, 0.26 to 0.70) and 0.78 (95% CI, 0.60 to 0.96) for the CH2. More NIHSS items were missing at the CH2 (62%) than at the AMC (27%) and the CH1 (23%; P=0.0001). In comparison, 33%, 0%, and 8% of CNS items were missing from records from CH2, AMC, and CH1, respectively (P=0.0001).
Conclusions: The levels of interrater agreement were almost perfect for retrospectively assigned NIHSS and CNS scores for patients initially evaluated by a neurologist at both an AMC and a CH. Levels of agreement for the CNS were substantial at a CH2, but interrater agreement for the NIHSS was only moderate in this setting. The proportions of missing items are higher for the NIHSS than the CNS in each setting, particularly limiting its application in the hospital without acute neurological consultative services.
Key Words: cerebral infarction; quality of health care; stroke assessment
| Introduction |
|---|
The retrospective application of these scales has only been assessed in limited settings thus far. For example, the NIHSS was found to be both reliable and valid when applied retrospectively in a study of patients enrolled in clinical trials who had been prospectively assessed.4 A retrospective algorithm developed to apply the NIHSS on the basis of data extracted from patients' medical records in an academic hospital setting also appeared to be reliable and valid.5 The reliability and validity of the CNS were established in a similar setting.6 However, in comparison to the CNS, the NIHSS requires detailed neurological evaluations that may not be reflected in all patient records. Even when retrospectively applied in an academic medical center (AMC), only 1 record provided information that permitted completion of all items of the NIHSS.5
Only a minority of stroke patients are admitted to AMCs. The majority are cared for by non-neurologists and are admitted to community hospitals.8 However, neither the NIHSS nor the CNS has been used retrospectively in these settings. Because of the detail necessary for NIHSS scoring, we hypothesized that retrospective assessment of the CNS would be more reliable than the NIHSS when using records from community hospitals in which evaluations were performed by non-neurologists. The aim of the present study was to assess the reliability of the published retrospective algorithms and to document the proportions of missing items for the NIHSS and CNS in stroke patients admitted to an AMC in comparison to community hospitals with and without acute neurological consultative services.
| Subjects and Methods |
|---|
The initial neurological examination documented in the admission notes was preferentially used for retrospective assessment of stroke severity. Discharge summaries were used only when the admission note was not available and the admission neurological examination was adequately documented (n=2). Data abstraction was performed independently by 2 neurologists who were certified in prospective administration of the NIHSS but were not blinded to the source of the hospital records. The NIHSS and the CNS scores were assigned using published algorithms.5 6 Missing items from the NIHSS and the CNS were scored as normal.5 6 Scores for individual scale items, the total scores, and the number of missing items for each scale were recorded.
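The scoring rule described above (missing items scored as normal, with the number of missing items recorded) can be sketched as follows. This is a minimal illustration, not the published algorithm: the function name, the item names, and the example record are hypothetical, and the single `normal_value` of 0 assumes NIHSS-style scoring in which 0 is normal for every item.

```python
from typing import Dict, Optional, Tuple


def score_with_missing_as_normal(
    items: Dict[str, Optional[int]], normal_value: int = 0
) -> Tuple[int, int, float]:
    """Return (total score, number of missing items, proportion missing).

    `items` maps an item name to its abstracted score, or None when the
    record does not document that part of the examination.
    `normal_value` is the score given to a missing item (assumption: 0,
    the normal score for an NIHSS item).
    """
    total = 0
    n_missing = 0
    for score in items.values():
        if score is None:
            n_missing += 1
            total += normal_value  # missing item counted as normal
        else:
            total += score
    proportion_missing = n_missing / len(items) if items else 0.0
    return total, n_missing, proportion_missing


# Hypothetical abstraction of one record (illustrative subset of items).
record = {
    "loc": 0,
    "motor_arm": 2,
    "motor_leg": 1,
    "visual_fields": None,  # not documented
    "dysarthria": None,     # not documented
    "extinction": None,     # not documented
}
total, n_missing, proportion_missing = score_with_missing_as_normal(record)
print(total, n_missing, proportion_missing)
```

Because an undocumented deficit contributes nothing to the NIHSS total, the retrospective total is best read as a lower bound when many items are missing. On the CNS, where higher scores indicate less deficit, the normal value would differ by item, so a single default would not apply.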
Interrater reliability was assessed with the intraclass correlation coefficient (ICC). The ICC is a measure of the total variance of the sample, which includes the differences among reviewers, the differences among subjects, and the unexplained residual variance. The ICC is maximized if the variance caused by differences among subjects is high, relative to the variance caused by differences among reviewers and residual variance.9 Weighted κ scores were calculated for individual items of the NIHSS and CNS for each hospital. For reference, both the ICC and weighted κ scores may be interpreted according to the following guidelines: chance (0), poor (0 to 0.19), fair (0.20 to 0.39), moderate (0.40 to 0.59), substantial (0.60 to 0.79), and almost perfect agreement (0.80 to 1.0). Statistical analysis was performed using the SAS statistical software package (SAS). Kruskal-Wallis nonparametric ANOVA statistics were used to compare scores among hospitals. The protocol was reviewed and exempted by the Institutional Review Board at each hospital.
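As an illustration of the variance decomposition and the interpretation guidelines described above, the following sketch computes an ICC from a subjects-by-raters table and maps a coefficient to an agreement category. The text does not state which ICC form was used; the two-way random-effects, single-rater form shown here is an assumption, and the function names and example ratings are hypothetical.

```python
from typing import Sequence


def icc_two_way_random(ratings: Sequence[Sequence[float]]) -> float:
    """ICC for n subjects (rows) each scored by the same k raters (columns).

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_residual = ss_total - ss_subjects - ss_raters
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    denom = ms_subjects + (k - 1) * ms_residual + k * (ms_raters - ms_residual) / n
    if denom == 0:
        return float("nan")  # no variance at all: ICC is undefined
    return (ms_subjects - ms_residual) / denom


def interpret_agreement(value: float) -> str:
    """Map an ICC or kappa to the categories quoted in the text.

    Assumption: values at or below 0 are labeled "chance".
    """
    if value <= 0:
        return "chance"
    if value < 0.20:
        return "poor"
    if value < 0.40:
        return "fair"
    if value < 0.60:
        return "moderate"
    if value < 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical total scores assigned by 2 raters to 5 records.
example = [(4, 5), (8, 8), (12, 11), (2, 2), (15, 16)]
print(interpret_agreement(icc_two_way_random(example)))

# Coefficients reported in the abstract for the NIHSS at AMC, CH1, and CH2.
for icc in (0.93, 0.89, 0.48):
    print(icc, interpret_agreement(icc))
```

The ICC rises when between-subject variance dominates rater and residual variance, which is why a sample with a narrow range of stroke severity can yield a lower coefficient even when raters rarely disagree by much.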
| Results |
|---|
χ² test, P=0.90). The median and range of NIHSS and CNS total scores are given in Table 1.
The medians, ranges, and proportion of missing items for each scale in each setting are shown in Table 3. CH2 had significantly more missing items for both the NIHSS and the CNS than the AMC or CH1 (Kruskal-Wallis χ²=70.6, P=0.0001, and χ²=52.2, P=0.0001, for NIHSS and CNS, respectively). The proportions of missing items from the NIHSS and the CNS by setting are given in Figures 1 and 2, respectively.
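For readers unfamiliar with the Kruskal-Wallis comparison reported above, the sketch below computes the tie-corrected H statistic, which is referred to a chi-square distribution with (number of groups minus 1) degrees of freedom. The analysis in the study was run in SAS; this pure-Python version is only an illustration, the function names are hypothetical, and the per-record counts of missing items are invented for the example.

```python
import math
from collections import Counter
from typing import Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least 2 non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    ranks = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else float("nan")


def p_value_three_groups(h: float) -> float:
    """Large-sample P value for exactly 3 groups (chi-square with 2 df)."""
    return math.exp(-h / 2)


# Hypothetical numbers of missing NIHSS items per record at three hospitals.
amc = [2, 3, 4, 3, 5]
ch1 = [1, 3, 2, 4, 3]
ch2 = [7, 9, 8, 10, 8]
h = kruskal_wallis_h([amc, ch1, ch2])
print(round(h, 2), p_value_three_groups(h))
```

The closed-form P value applies only to three groups, as in this study's AMC, CH1, and CH2 comparison; other group counts need a general chi-square tail function.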
Interobserver agreement for individual retrospective NIHSS items is shown in Table 4. For the AMC, substantial (κ range, 0.6 to 0.8) or almost perfect (κ>0.8) agreement was found for all items of the NIHSS, except for visual fields (κ=0.47). At CH1, 9 of 13 items showed substantial or better agreement, but all 3 level of consciousness (LOC) subscores and gaze assessments resulted in only fair or moderate agreement. At CH2, only 4 of 13 items had substantial or better agreement. In addition, extinction was missing from 100% of the records at CH2; therefore, the κ score for this item could not be assessed. Visual field assessment resulted in chance agreement.
Interobserver agreement for individual retrospective CNS items is given in Table 5. For the AMC, all items had almost perfect agreement, with the exception of orientation assessment, which was moderate. At CH1, all items had substantial or better agreement except for LOC and orientation assessments, which were fair to moderate. At CH2, all items had substantial or better agreement except for distal arm and leg assessments, which were only fair.
| Discussion |
|---|
The retrospective NIHSS algorithm was developed using typical hospital discharge summaries of stroke patients.5 However, the retrospective application of the NIHSS is dependent on documentation of the neurological examination at admission. The retrospective algorithm was developed and assessed with patients admitted to an AMC and was validated in comparison to prospective scores. In this setting, missing items from the retrospective score were frequently normal when correlated with the prospective score.5 This assumption may be appropriate when neurologists evaluate acute stroke patients, but, as shown in the present study, non-neurologists may not routinely document neurological examinations in the same detail as neurologists. We found that the retrospective interpretation of ambiguously documented examination findings was difficult, leading to poorer reliability of the retrospective NIHSS. The retrospective scoring algorithm may need to be revised to address this problem.
The proportion of missing items was higher for both scales at the CH2 than for the AMC or the CH1 (Table 3). Over one half of the NIHSS items were missing from each record at CH2, and some individual records were lacking 90% of the items for this scale. According to the retrospective algorithms, missing items are scored as normal. This could lead to an underestimation of the magnitude and uncertainty as to the nature of the stroke-related deficit in settings with a high proportion of missing items. The high proportion of missing retrospective NIHSS items limits its application for outcome studies, particularly when acute neurological consultation is not available. In addition, the items most commonly documented at CH2 included motor deficit of the arm (60%) and leg (60%) and LOC (56%), effectively reducing the NIHSS score to the items captured by the CNS.
Consistent with Williams et al,5 individual NIHSS items most commonly missing from patient records were assessments of dysarthria, visual fields, and neglect/extinction. Assessments of motor arm deficit, visual fields, and aphasia were particularly unreliable at CH2. Items such as hemianopia and extinction may not be documented because those features of the examination may not affect clinical decision-making at admission. However, both hemianopia and neglect/inattention are predictors of functional independence and outcome in stroke patients.1 10 11 Therefore, inconsistencies in documentation of these items in the retrospective NIHSS may provide misleading assessments of stroke prognosis and outcome.
The reliability of the individual NIHSS items was better at the AMC than in either of the community hospitals (Table 4). This result may be due to a higher level of documentation in a teaching hospital than in a community hospital setting.5 Furthermore, when the examination was documented by a non-neurologist in the community hospital, the raters disagreed more frequently on which items were missing. This affected scoring (missing NIHSS items scored as normal), and therefore estimates of the weighted κ scores were statistically unstable, with large standard errors and wide confidence limits. In addition, with items such as visual fields, the simple agreement was high (95%), but the weighted κ was low (0). This paradox has been well-described and occurs when the marginal or column values are unbalanced, leading to a high expected proportion as a result of chance.12 13 Because the numerator portion of the κ formula is equal to the observed minus the expected proportion caused by chance, the resulting κ score is low. These κ scores should therefore be interpreted with caution.
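The paradox can be reproduced with a small worked example. The sketch below computes the observed agreement, the agreement expected by chance, and κ for two raters. The function name and the ratings are hypothetical and were chosen only to give 95% simple agreement with unbalanced marginals. Unweighted κ is used, which equals weighted κ for an item with only two categories.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohen_kappa(
    rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]
) -> Tuple[float, float, float]:
    """Return (observed agreement, chance-expected agreement, kappa)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty set")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    if expected == 1.0:
        return observed, expected, float("nan")  # kappa is undefined
    return observed, expected, (observed - expected) / (1 - expected)


# Hypothetical item scored on 20 records (0 = normal, 1 = abnormal).
# Rater A finds one abnormal record; rater B scores every record as normal.
rater_a = [0] * 19 + [1]
rater_b = [0] * 20
print(cohen_kappa(rater_a, rater_b))  # observed 0.95, expected 0.95, kappa 0.0
```

When both raters score every record as normal, the expected agreement reaches 1 and κ is undefined. That may be why κ could not be assessed for an item such as extinction, which was missing from every CH2 record and therefore scored as normal throughout.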
This study has several limitations. First, most patients had only mild to moderate strokes (median NIHSS, 4.5 to 8), reflecting the patient populations at the participating hospitals. Reliability may differ in settings with patients with a wider range of deficits. Second, neither the NIHSS nor the CNS scores were assessed prospectively (gold standard) as a method of validation. However, the purpose of this study was to assess the comparative reliability of the NIHSS and the CNS and not to revalidate the retrospective algorithm. Third, some bias cannot be excluded because the abstractors were not blinded to the source of the hospital records. However, it is very difficult to effectively blind the assessments because of obvious differences in the ways in which examinations were documented in the medical record. Finally, abstractors for the present study were neurologists certified in the prospective use of the NIHSS. In practice, abstractors should also be certified in retrospective application of either algorithm if they are to be used in clinical studies.
Unlike the comprehensive neurological assessment provided by the NIHSS, the CNS focuses on LOC and motor deficits. As a result, the NIHSS is more frequently used in prospective clinical trials to give a fuller assessment of stroke-related impairments. Both scales have significant prognostic value;7 however, the increased comprehensiveness of the NIHSS must be balanced against the greater reliability of the CNS as applied retrospectively, especially in studies relying on assessments documented in the medical record by non-neurologists. The impairments captured by the CNS correlate well with disability measurements of activities of daily living such as the Barthel Index,14 a common outcome scale used in both prospective15 and retrospective16 stroke studies. Therefore, the CNS provides a reliable and clinically meaningful retrospective assessment of initial stroke severity that can be applied in a variety of clinical settings.
A reliable and valid assessment of stroke severity is a critical covariate for the analysis and interpretation of outcome studies.1 The reliability of the retrospective NIHSS and CNS scoring was acceptable in both AMC and CH settings in which neurologists performed and documented the initial evaluations. However, the high proportion of missing items limits the application of the retrospective NIHSS in a setting in which a neurologist did not perform the evaluation at the time of hospital admission. The use of the CNS may result in a more reliable assessment of stroke severity than the NIHSS for retrospective outcome studies that include community hospitals without acute neurologic consultative services.
| Acknowledgments |
|---|
Received August 31, 2000; revision received November 23, 2000; accepted December 6, 2000.