Categorizing Stroke Prognosis Using Different Stroke Scales
Background and Purpose— Stroke severity and dependency are often categorized to allow stratification for randomization or analysis. However, there is uncertainty whether the categorizations used for different stroke scales are equivalent. We investigated the amount of information retained by categorizing severity and dependency, and whether the currently used cut-offs are equivalent across different stroke scales.
Methods— Stroke severity and dependency have been categorized as mild, moderate, or severe. We studied 2 acute stroke unit cohorts, measuring Scandinavian Stroke Scale (SSS), modified Rankin Scale (mRS), Barthel Index (BI), and modified National Institutes of Health Stroke Scale (mNIHSS). Receiver operating characteristic (ROC) curves were examined to determine the ability of full and categorized scales to predict death and dependency. A weighted kappa analysis assessed agreement between the categorized scales.
Results— When scales are categorized, the area under the ROC curve is significantly reduced; however, the differences are small and may not be practically important. BI, mRS, and SSS all have excellent agreement with each other when categorized, whereas mNIHSS has substantial agreement with mRS and BI.
Conclusions— Little predictive information is lost when stroke scales are categorized. There is substantial to almost perfect agreement among categorized scales. Therefore the use and categorization of a variety of stroke severity or dependency scales is acceptable in analyses.
Stroke is a heterogeneous condition which can resolve within a few hours or be rapidly fatal. When measuring severity, further variation may be introduced by the use of different assessment scales.
Stroke severity or dependency categories are often used to stratify randomization or analysis. However, there is uncertainty whether the categorizations used for different stroke scales are equivalent. Although discrepancies between severity or dependency classifications in the different stroke scales would not influence estimates of treatment effect, they may bias the assessment of interactions between severity and treatment. We investigated the amount of information retained by categorizing severity or dependency, and whether the currently used cut-offs are equivalent across different stroke scales.
We studied 2 hospital-based cohorts of consecutive, unselected acute stroke admissions. The first cohort, Barber,1 comprised 733 patients and recorded baseline SSS, mRS, and BI;2 outcomes (death, and death or dependency defined as mRS >2) were assessed at 1 month. The second cohort, Sellars,3 comprised 412 patients and recorded the mNIHSS4 in addition to baseline mRS and BI; the same outcomes were assessed 3 months after stroke.
The Table shows an example (from the Stroke Unit Trialists’ Collaboration5) of stroke scale categories for acute stroke severity. The cut-off values were chosen in a manner similar to the minimum P value approach.6 Three categories were chosen because a larger number of strata may leave some strata with few patients and hence incomplete randomization blocks. Conversely, dichotomizing severity may discard too much information.
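Because the cut-offs in the Table are not reproduced in this text, the sketch below takes them as parameters; it only illustrates how a score is mapped to 3 ordered categories. The function name `categorize` and the threshold values in the usage lines are hypothetical placeholders, not the published cut-offs. The scales run in different directions: higher mRS and mNIHSS scores indicate more severe stroke, whereas higher SSS and BI scores indicate less severe stroke.

```python
from typing import Literal

Category = Literal["mild", "moderate", "severe"]


def categorize(score: float, mild_cut: float, severe_cut: float,
               higher_is_worse: bool) -> Category:
    """Map a stroke scale score to mild, moderate, or severe.

    mild_cut and severe_cut are hypothetical placeholder thresholds.
    Higher-is-worse scale: score <= mild_cut is mild, score >= severe_cut is severe.
    Higher-is-better scale: score >= mild_cut is mild, score <= severe_cut is severe.
    """
    if not higher_is_worse:
        # Negate everything so that "larger = more severe" holds for every scale.
        score, mild_cut, severe_cut = -score, -mild_cut, -severe_cut
    if mild_cut >= severe_cut:
        raise ValueError("mild and severe cut-offs overlap or are reversed")
    if score <= mild_cut:
        return "mild"
    if score >= severe_cut:
        return "severe"
    return "moderate"


# Hypothetical thresholds, for illustration only.
print(categorize(1, mild_cut=2, severe_cut=4, higher_is_worse=True))     # mild
print(categorize(10, mild_cut=15, severe_cut=5, higher_is_worse=False))  # moderate
```

Negating the score and both cut-offs for higher-is-better scales lets a single comparison rule serve all 4 scales.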
ROC curves were used to assess the ability of each stroke scale to predict outcome, with areas compared using nonparametric methods.7 Agreement between the categorizations of different scales was assessed using weighted kappa,8 with exact agreement given a weight of 1 and disagreement by 1 category (adjacent) or 2 categories (disparate) given weights of 0.5 and 0, respectively.9
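One standard nonparametric estimate of the area under the ROC curve is the Mann-Whitney statistic: the proportion of (poor outcome, good outcome) patient pairs in which the patient with the poor outcome had the more severe baseline score, with tied pairs counted as one half. The minimal sketch below assumes scores are oriented so that a larger value means more severe stroke (negate SSS or BI first); the function name and the toy data are invented for illustration. Tie handling matters because a categorized scale takes only 3 values, so many pairs are tied.

```python
def auc_mann_whitney(scores, outcomes):
    """Nonparametric AUC: P(case score > non-case score), ties counted as 1/2.

    scores:   baseline severity, oriented so that larger = more severe
    outcomes: 1 for the poor outcome (e.g. death), 0 otherwise
    """
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the outcome")
    total = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                total += 1.0
            elif c == n:
                total += 0.5  # a tied pair contributes half
    return total / (len(cases) * len(controls))


# Invented toy data: a 6-point score and the same patients in 3 categories.
outcome = [0, 0, 0, 1, 1, 1]
full = [1, 2, 3, 4, 5, 6]
categorized = [0, 0, 1, 1, 2, 2]
print(auc_mann_whitney(full, outcome))         # 1.0
print(auc_mann_whitney(categorized, outcome))  # 0.944... (8.5 / 9)
```

In this toy example, collapsing the score into 3 categories creates a tie between a case and a non-case, which lowers the area slightly; this is the kind of information loss the study quantifies.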
Of 733 patients in the first cohort, 665 (91%) had mRS, BI, and SSS recorded within 3 days of admission. In the second cohort (412 patients), 405 (98%) had mRS, BI, and mNIHSS recorded by day 5. In both cohorts, death and dependency were recorded for all patients.
The distribution of severity or dependency categories was consistent across scales within cohorts and between cohorts: mild (54% to 57%), moderate (22% to 31%), and severe (15% to 21%).
Full Versus Categorized Scales
The ROC curves for both cohorts show moderate to high accuracy in predicting outcome for both the full and categorized scales. Figures 1 and 2 show little difference between the full and categorized scales for the outcome of death, whereas the categorized scales have slightly lower predictive accuracy for death or dependency. The full scales are also less accurate for this outcome than for death.
Comparison of Stroke Scales
For the Barber1 cohort (Figure 1), most of the significant differences in area were for the death or dependency outcome. For both the full and categorized scales, the mRS had a significantly larger area under the curve than the SSS and BI, and the BI had a larger area than the SSS, suggesting that the mRS has the best predictive accuracy at 1 month after stroke. However, all differences were small, with narrow confidence intervals. Differences in predictive accuracy between stroke scales were less apparent for the 3-month outcome (Figure 2).
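Because every scale was measured on the same patients, the areas being compared are correlated. A widely used nonparametric test for this situation is the method of DeLong and colleagues; the reference list here is incomplete, so treating it as the method cited in reference 7 is an assumption. The sketch below applies it to 2 scores for the same patients (for example, a full scale and its categorized version, or mRS against SSS), assumes both are oriented so that larger means more severe, and uses invented toy data; the function names are hypothetical.

```python
import math


def _psi(case_score, control_score):
    # Mann-Whitney kernel: 1 if correctly ordered, 1/2 if tied, 0 otherwise.
    if case_score > control_score:
        return 1.0
    return 0.5 if case_score == control_score else 0.0


def _placements(scores, outcomes):
    """Per-patient structural components (DeLong's V10 for cases, V01 for controls)."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    v10 = [sum(_psi(c, n) for n in controls) / len(controls) for c in cases]
    v01 = [sum(_psi(c, n) for c in cases) / len(cases) for n in controls]
    return v10, v01


def _cov(a, b):
    # Sample covariance with an (n - 1) denominator.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(scores_a, scores_b, outcomes):
    """Compare 2 correlated AUCs on the same patients; returns (auc_a, auc_b, z, p)."""
    if not len(scores_a) == len(scores_b) == len(outcomes):
        raise ValueError("all inputs must describe the same patients")
    m = sum(1 for y in outcomes if y)  # patients with the outcome
    n = len(outcomes) - m              # patients without it
    if m < 2 or n < 2:
        raise ValueError("need at least 2 patients with and 2 without the outcome")
    v10a, v01a = _placements(scores_a, outcomes)
    v10b, v01b = _placements(scores_b, outcomes)
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    var_diff = (
        (_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
        + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n
    )
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p


# Invented toy data: the same 6 patients on a full and a categorized scale.
outcome = [0, 0, 0, 1, 1, 1]
full = [1, 2, 3, 4, 5, 6]
categorized = [0, 0, 1, 1, 2, 2]
print(delong_paired(full, categorized, outcome))
```

On this toy data the categorized version has the smaller area (about 0.94 against 1.0), but with 6 patients the difference is far from significant (p about 0.48). With several hundred patients, much smaller differences can reach significance, consistent with the distinction drawn here between statistical and practical importance.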
By subjective guidelines for interpreting weighted kappa,10 BI, mRS, and SSS all showed excellent agreement with one another when categorized (BI/mRS weighted kappa 0.85, percentage agreement 88%; BI/SSS 0.80, 84%; mRS/SSS 0.77, 82%), whereas the mNIHSS showed substantial agreement with the mRS and BI (mNIHSS/mRS 0.66, 73%; mNIHSS/BI 0.66, 74%).
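The weighting scheme described in the Methods (1 for exact agreement, 0.5 for adjacent categories, 0 for disparate categories) corresponds to linear weights w = 1 - |i - j|/(k - 1) with k = 3 categories. A minimal sketch of weighted kappa and percentage agreement for 2 categorized scales, assuming categories are coded 0 (mild), 1 (moderate), and 2 (severe) and using invented ratings and hypothetical function names, is:

```python
def weighted_kappa(cat_a, cat_b, k=3):
    """Weighted kappa with linear weights (1, 0.5, 0 when k = 3).

    cat_a, cat_b: category codes 0..k-1 for the same patients on 2 scales.
    """
    n = len(cat_a)
    if n == 0 or n != len(cat_b):
        raise ValueError("need 2 equal-length, non-empty lists")
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(cat_a, cat_b):
        counts[a][b] += 1
    weight = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    row = [sum(counts[i]) / n for i in range(k)]                       # scale A margins
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]  # scale B margins
    observed = sum(weight[i][j] * counts[i][j] / n for i in range(k) for j in range(k))
    expected = sum(weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


def percent_agreement(cat_a, cat_b):
    # Percentage of patients placed in exactly the same category by both scales.
    return 100 * sum(a == b for a, b in zip(cat_a, cat_b)) / len(cat_a)


a = [0, 0, 1, 2]  # invented categories on scale A
b = [0, 1, 1, 2]  # the same patients on scale B
print(weighted_kappa(a, b))     # 0.714... (= 5/7)
print(percent_agreement(a, b))  # 75.0
```

Percentage agreement counts only exact matches, whereas weighted kappa also gives partial credit for adjacent categories and corrects for the agreement expected by chance from the marginal distributions.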
Weighted kappa values within Oxfordshire Community Stroke Project (OCSP) classification categories were broadly consistent with those for the entire dataset: BI/mRS weighted kappa 0.77 to 0.80, percentage agreement 73% to 88%; BI/SSS 0.65 to 0.80, 78% to 90%; mRS/SSS 0.62 to 0.84, 76% to 93%; mNIHSS/mRS 0.39 to 0.60, 72% to 76%; mNIHSS/BI 0.21 to 0.63, 68% to 78%.
The SSS, mRS, BI, and mNIHSS all have moderate to high predictive accuracy. When the scales are categorized, the reduction in area under the ROC curve, although statistically significant, is small and may be of little practical importance: little predictive information is lost.
When the prognostic accuracy of the stroke scales was compared at 1-month follow-up, the mRS predicted death or dependency better than the SSS or the BI. Because dependency was defined using the mRS, baseline mRS would be expected to be a better predictor. Nevertheless, the differences between scales, whether full or categorized, were small and of little practical importance. Weighted kappa analysis of the categorized scales showed substantial to almost perfect agreement among scales, and the results were broadly consistent within OCSP classifications. The mNIHSS is poor at detecting the symptoms and signs of the posterior circulation syndrome. However, in the other OCSP categories, mNIHSS agreement was consistently and substantially higher (mNIHSS/mRS 0.51 to 0.60; mNIHSS/BI 0.48 to 0.63).
The 2 cohorts comprised several hundred unselected patients with a broad case mix. There were few missing data, limiting the possibility of bias. Each cohort had several severity measurements, allowing us to test 4 commonly used stroke scales for equivalence between full and categorized versions. The different follow-up times of the 2 cohorts showed that the results are not restricted to a single duration of follow-up. However, it would be of interest to examine whether these results can be replicated in other cohorts.
This study indicates that categorizing a stroke scale does not substantially reduce its predictive ability, and that scales categorized in this way are broadly equivalent. Although prognostic accuracy is lower for longer follow-up, it is not further reduced by categorization of severity. These findings emphasize that stratifying randomization in acute stroke clinical trials by severity can be a pragmatic approach that retains much of the prognostic information contained in the corresponding full assessment scale.
Sources of Funding
L.G. is supported by a Chest Heart and Stroke Scotland research studentship.
- Received May 12, 2009.
- Revision received June 9, 2009.
- Accepted June 29, 2009.
De Haan R, Horn J, Limburg M, Van Der Meulen J, Bossuyt P. A comparison of five stroke scales with measures of disability, handicap, and quality of life. Stroke. 1993; 24: 1178–1181.
Sellars C, Bowie L, Bagg J, Sweeney MP, Miller H, Tilston J, Langhorne P, Stott DJ. Risk factors for chest infection in acute stroke: a prospective cohort study. Stroke. 2007; 38: 2284–2291.
Lyden PD, Lu M, Levine SR, Brott TG, Broderick J. A modified National Institutes of Health Stroke Scale for use in stroke clinical trials: preliminary reliability and validity. Stroke. 2001; 32: 1310–1316.
Stroke Unit Trialists’ Collaboration. Organised inpatient (stroke unit) care for stroke. Cochrane Database of Systematic Reviews. 2007.
Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using “optimal” cut points in the evaluation of prognostic factors. J Natl Cancer Inst. 1994; 86: 829–835.
Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol. 1971; 11: 101–109.