Modified National Institutes of Health Stroke Scale Can Be Estimated From Medical Records
Background and Purpose— The 15-item National Institutes of Health Stroke Scale (NIHSS) is a quantitative measure of stroke-related neurological deficit with established reliability and validity for use in clinical research. An abridged 11-item modified NIHSS (mNIHSS) has been described that simplifies or eliminates redundant and less reliable items. We aimed to determine whether the mNIHSS could be accurately abstracted from medical records to facilitate retrospective research.
Methods— We selected 39 patient records for which NIHSS scores were formally measured. Handwritten notes from medical records were abstracted, and NIHSS item scores were estimated by 5 raters blinded to actual scores. Estimated scores were compared among raters and with actual measured scores.
Results— Interrater reliability for total NIHSS on admission and discharge was excellent, with intraclass correlation coefficients (ICCs) of 0.85 and 0.79, respectively. However, ICCs for 2 items (facial palsy and dysarthria) were poor (<0.40). Interrater reliability for total mNIHSS was slightly greater, with ICCs of 0.87 and 0.89 on admission and discharge, respectively. None of the 11 mNIHSS items had poor reliability, 4 were moderate (ICC, 0.40 to 0.75), and 7 were excellent (ICC >0.75). Sixty-two percent of estimated total NIHSS scores were within 2 points of actual scores and 91% were within 5 points, whereas 70% of estimated total mNIHSS scores were within 2 points and 95% were within 5 points.
Conclusions— The mNIHSS can be estimated from medical records with a high degree of reliability and validity. In retrospective assessment of stroke severity, the mNIHSS performs better than the standard NIHSS and may be easier to use because it has fewer and simpler items.
Quantitative measures of stroke-related deficit are necessary for clinical research to account for baseline severity and to analyze outcomes. Retrospective studies often precede prospective cohort or randomized clinical trials to provide preliminary data and generate hypotheses. However, retrospective stroke studies that rely on chart review must transform qualitative clinical narratives of the neurological examination into quantitative measures for formal statistical analyses.
The National Institutes of Health Stroke Scale (NIHSS) is a quantitative measure of stroke-related neurological deficit that was developed for prospective clinical research. It includes key aspects of the neurological examination: level of consciousness, speech and language function, neglect, visual fields, eye movements, facial symmetry, motor strength, sensation, and coordination.1–3 The scale has proven intrarater and interrater reliability and has predictive validity for stroke outcome.1–4 It has gained widespread use as a measure of initial and final neurological deficit in acute stroke trials. Subsequent research has demonstrated that the NIHSS can also be estimated from medical records with a high degree of reliability and validity, although accuracy may be better in academic and teaching hospitals than in community hospitals.5–7
A modified version of the NIHSS (mNIHSS) was recently developed that is simpler and easier to use than the standard NIHSS.8,9 Redundant items and those with lower reliability were deleted: level of consciousness (item 1a), facial palsy (item 4), limb ataxia (item 7), and dysarthria (item 10). The sensory item (item 8) was collapsed from 3 to 2 choices to improve its reliability. The mNIHSS has been shown to be clinimetrically similar to the standard NIHSS, with excellent reliability and validity in prospective studies, and appeared to offer increased power for statistical analysis.8,9
We hypothesized that the mNIHSS score could be abstracted from medical records with a degree of reliability and validity that was comparable to or better than retrospective estimation of the standard NIHSS.
Subjects and Methods
This study was performed at an academic university hospital after review and approval by our institutional review board.
Patients were selected for this study if they had suffered an acute ischemic stroke leading to enrollment in an experimental stroke protocol. To be included, patients must have had a formal measurement of the NIHSS score (as part of the experimental stroke protocol) at the time of admission. This information was collected prospectively during 3 separate clinical trials, providing 39 patients for analysis. The medical monitors for these clinical trials permitted the use of their case report forms to determine the actual NIHSS item and total scores for this study. For each eligible patient, the handwritten notes from the days of admission and discharge were photocopied, edited to remove any reference to the actual measured NIHSS scores, and then photocopied again for distribution to the raters. The raters estimated each component of the NIHSS for each patient on admission and discharge and calculated the total score. The admission and discharge notes were collated so that raters could make inferences for a given patient about the discharge score based in part on the admission score, even if documentation was incomplete in the discharge note. This approach to handling poor or missing information was chosen because it approximates the actual process of retrospective chart abstraction. Actual and estimated mNIHSS scores were calculated by deletion of the 4 standard NIHSS items described above and by dichotomization of sensory (item 8) scores as 0=normal and 1=abnormal.
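The derivation of the mNIHSS from standard NIHSS item scores is mechanical and can be sketched in code. The following is a minimal illustration, not part of the study protocol; the function name and the dict-based representation of item scores are assumptions. It drops items 1a, 4, 7, and 10 and dichotomizes the sensory item (8).

```python
def mnihss_from_nihss(items):
    """Compute the mNIHSS total from standard NIHSS item scores.

    items: dict mapping NIHSS item labels (e.g. "1a", "5b") to integer scores.
    Items 1a (level of consciousness), 4 (facial palsy), 7 (limb ataxia),
    and 10 (dysarthria) are deleted; the sensory item (8) is collapsed
    to 0 = normal, 1 = abnormal.
    """
    dropped = {"1a", "4", "7", "10"}
    total = 0
    for label, score in items.items():
        if label in dropped:
            continue
        if label == "8":
            total += 0 if score == 0 else 1
        else:
            total += score
    return total
```

For example, a patient scored 1 on item 1a, 2 on item 4, 1 on item 7, 1 on item 10, and 2 on item 8 would lose those 5 points entirely and have the sensory contribution reduced from 2 to 1 in the mNIHSS total.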
Five raters of varying levels of experience reviewed the records: a stroke specialist (attending physician), a stroke fellow, a senior neurology resident, a junior neurology resident, and a nurse coordinator. All had previously been trained and certified in administration of the NIHSS. All were blinded to the actual NIHSS scores and to each other’s ratings.
Statistical analysis was performed with Stata version 7.0 (Stata Corp). Scores were compared among raters, and interrater reliability was determined by calculation of an intraclass correlation coefficient (ICC) through analysis of variance. The ICC reflects the proportion of the total variance that is due to the “true” variance among patients and is calculated as σs²/(σs² + σr² + σe²), where σs² is the variance component for subjects, σr² is the variance component for the raters, and σe² is the variance component for residual error.10 The ICC can be interpreted like a weighted κ statistic: an ICC of 1 indicates perfect reliability, and an ICC >0.75 is generally considered to represent excellent reliability.11 Pairwise comparisons between raters were also assessed with the ICC. Criterion validity was determined by comparison of estimated scores with actual scores as recorded on the case report forms of the clinical trials in which these patients had been enrolled.
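The variance-components calculation behind this ICC can be reproduced from a two-way ANOVA on a subjects × raters score matrix. The sketch below is illustrative only; it is not the Stata code used in the study, and the function name and pure-Python layout are our own. It estimates σs², σr², and σe² from the ANOVA mean squares and returns σs²/(σs² + σr² + σe²).

```python
def icc_two_way(data):
    """Two-way random-effects ICC from a subjects x raters score matrix.

    data[i][j] is the score given to subject i by rater j.
    Variance components are estimated from ANOVA mean squares:
      sigma_s^2 = (MS_subjects - MS_error) / k
      sigma_r^2 = (MS_raters  - MS_error) / n
      sigma_e^2 =  MS_error
    """
    n = len(data)          # number of subjects
    k = len(data[0])       # number of raters
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    rater_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    ss_s = k * sum((m - grand) ** 2 for m in subj_means)
    ss_r = n * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((data[i][j] - grand) ** 2
                 for i in range(n) for j in range(k))
    ss_e = ss_tot - ss_s - ss_r

    ms_s = ss_s / (n - 1)
    ms_r = ss_r / (k - 1)
    ms_e = ss_e / ((n - 1) * (k - 1))

    # Negative variance-component estimates are truncated at zero.
    var_s = max((ms_s - ms_e) / k, 0.0)
    var_r = max((ms_r - ms_e) / n, 0.0)
    var_e = ms_e
    return var_s / (var_s + var_r + var_e)
```

When raters agree perfectly, σr² and σe² vanish and the ICC is 1; disagreement among raters inflates the denominator and pulls the ICC toward 0.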
Results
Actual NIHSS item and total scores were available on admission for 39 patients; of these 39, 30 had formal NIHSS measurements on discharge, for a total of 69 patient records. Notes were written by at least 1 neurologist (attending or resident physician) for all patients on admission and for 28 of the 30 patients on discharge. Complete or nearly complete neurological examinations were documented in 38 of 39 admission notes but in only 15 of 30 discharge notes.
Actual NIHSS total scores ranged from 0 to 23, with a median of 9; estimated standard NIHSS total scores ranged from 0 to 21, with a median of 9. Interrater reliability for estimated total standard NIHSS on admission and discharge was excellent, with ICCs of 0.85 and 0.79, respectively. Agreement between pairs of raters was also excellent, with ICCs ranging from 0.81 to 0.94. ICCs for individual items on admission are summarized in the Table. Similar results were found for discharge (data not shown). ICCs for 2 items were poor (<0.40), 6 were moderate (ICC, 0.40 to 0.75), and 7 were excellent (ICC >0.75).
Actual mNIHSS total scores ranged from 0 to 21, with a median of 6, whereas estimated mNIHSS total scores ranged from 0 to 18, with a median of 6. Interrater reliability for total mNIHSS was slightly greater than for the standard NIHSS, with ICCs of 0.87 and 0.89 on admission and discharge, respectively. Agreement between pairs of raters was also excellent, with ICCs ranging from 0.84 to 0.94. None of the 11 mNIHSS items had poor reliability, 4 were moderate (ICC, 0.40 to 0.75), and 7 were excellent (ICC >0.75).
Estimated scores were compared with actual scores. Sixty-two percent of estimated total NIHSS scores were within 2 points of actual scores and 91% were within 5 points, whereas 70% of estimated total mNIHSS scores were within 2 points and 95% were within 5 points.
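The criterion-validity comparison reduces to counting how many estimated totals fall within a given tolerance of the measured totals. A minimal sketch (the function name and list-based inputs are assumptions for illustration):

```python
def pct_within(estimated, actual, tolerance):
    """Percentage of estimated scores within `tolerance` points of actual scores."""
    hits = sum(1 for e, a in zip(estimated, actual) if abs(e - a) <= tolerance)
    return 100.0 * hits / len(actual)
```

Applying this with tolerances of 2 and 5 points to the paired estimated and actual totals yields the percentages reported above.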
Discussion
The mNIHSS appears to offer a number of advantages over the standard NIHSS.8,9 It is simpler and shorter yet provides comparable reliability and validity. Our analysis suggests that the mNIHSS can be retrospectively estimated from the review of medical records with a high degree of reliability and validity, comparable to and perhaps slightly better than the standard NIHSS.
Given the inherent difficulties of chart review, estimation of the mNIHSS seems to be more “user friendly” than for the standard NIHSS and should therefore make this task less onerous in retrospective research. Observational retrospective cohort and case-control studies cannot substitute for carefully designed prospective trials but are often important for hypothesis generation and for situations in which randomized clinical trials and prospective cohorts are not feasible. The findings from the present study will be most useful for those retrospective studies in which information about stroke-related neurological deficits must be abstracted qualitatively and then transformed into a quantitative format for analysis.
These data were obtained from a single academic university setting in which the vast majority of patients were evaluated by neurologists. A recent study confirmed the reliability of chart abstraction of the NIHSS in academic settings and in community hospitals with acute neurological consultative services available, but this method proved much less reliable at community hospitals without such services. The simpler but less comprehensive and less commonly used Canadian Neurological Scale12,13 was comparable to the standard NIHSS in the first 2 hospital types but clearly outperformed the standard NIHSS in the community hospitals without acute neurological consultation.5 Further studies may determine whether the mNIHSS offers an improvement in this setting by eliminating and simplifying items that are characterized by poor interobserver reliability and may be lacking in medical records written by non-neurologists.
Dr Kasner was supported by NIH/NINDS grant K23-NS02147.
- Received August 7, 2002.
- Accepted August 29, 2002.
1. Brott TG, Adams HP, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg V, Rorick M, Moomaw CJ, Walker M. Measurements of acute cerebral infarction: a clinical examination scale. Stroke. 1989; 20: 864–870.
2. Goldstein LB, Samsa GP. Reliability of the National Institutes of Health Stroke Scale: extension to non-neurologists in the context of a clinical trial. Stroke. 1997; 28: 307–310.
3. Lyden P, Brott T, Tilley B, Welch KM, Mascha EJ, Levine S, Haley EC, Grotta J, Marler J. Improved reliability of the NIH Stroke Scale using video training. Stroke. 1994; 25: 2220–2226.
4. Muir KW, Weir CJ, Murray GD, Povey C, Lees KR. Comparison of neurological scales and scoring systems for acute stroke prognosis. Stroke. 1996; 27: 1817–1820.
5. Bushnell CD, Johnston DCC, Goldstein LB. Retrospective assessment of initial stroke severity: comparison of the NIH Stroke Scale and the Canadian Neurological Scale. Stroke. 2001; 32: 656–660.
6. Kasner SE, Chalela JC, Luciano JM, Cucchiara BL, Raps EC, McGarvey ML, Conroy MB, Localio AR. Reliability and validity of estimating the NIH Stroke Scale from medical records. Stroke. 1999; 30: 1534–1537.
7. Williams LS, Yilmaz EY, Lopez-Yunez AM. Retrospective assessment of initial stroke severity with the NIH Stroke Scale. Stroke. 2000; 31: 858–862.
8. Lyden PD, Lu M, Levine SR, Brott TG, Broderick J. A modified National Institutes of Health Stroke Scale for use in stroke clinical trials: preliminary reliability and validity. Stroke. 2001; 32: 1310–1317.
9. Meyer BC, Hemmen TM, Jackson CM, Lyden PD. Modified National Institutes of Health Stroke Scale for use in stroke clinical trials: prospective reliability and validity. Stroke. 2002; 33: 1261–1266.
10. Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. 2nd ed. Oxford, UK: Oxford University Press; 1995: 111–112.
11. Fleiss JL. Statistical Methods for Rates and Proportions. New York, NY: Wiley & Sons; 1981: 218.
12. Cote R, Battista R, Wolfson C, Boucher J, Adam J, Hachinski V. The Canadian Neurological Scale: validation and reliability assessment. Neurology. 1989; 39: 638–643.
13. Cote R, Hachinski V, Shurvell B, Norris J, Wolfson C. The Canadian Neurological Scale: a preliminary study in acute stroke. Stroke. 1986; 17: 731–737.