Accuracy and Clinical Usefulness of Intracerebral Hemorrhage Grading Scores
A Direct Comparison in a UK Population
Background and Purpose—Various grading scores to predict survival after intracerebral hemorrhage (ICH) have been described. We aimed to test the accuracy and clinical usefulness of 3 well-known scores (original ICH score, modified ICH score, and ICH grading scale) in a large unselected cohort of typical ICH patients.
Methods—A total of 1364 ICH cases were referred to our center from January 1, 2008, to October 17, 2010. Clinical details were prospectively recorded, and the first computed tomography brain scan was retrospectively reviewed to determine ICH volume and location and to identify intraventricular hemorrhage. The original ICH, ICH grading scale, and modified ICH score were calculated. Receiver operating characteristic and decision curves for 30-day mortality were generated.
Results—A total of 1175 patients were included in the final analysis. All 3 scores and the Glasgow Coma Scale (GCS) divided cases into groups with highly significant differences in mortality. The area under the receiver operating characteristic curve was very similar for original ICH (0.861), ICH grading scale (0.874), and GCS (0.872), but was less for modified ICH score (0.824). Age was much less predictive (0.565). Combining GCS with age, log ICH volume, and intraventricular hemorrhage to derive a multifactorial risk of death at 30 days significantly increased the area under the receiver operating characteristic curve (0.897). All scores and GCS demonstrated a similar net benefit for threshold probabilities of 10% to 95%. Above 95%, the net benefit of GCS became inferior to the prognostic scores.
Conclusions—Although existing grading scores are highly predictive of 30-day mortality, GCS alone was as predictive in our cohort, but age was not.
Various prognostic scoring systems have been devised and tested to predict survival after intracerebral hemorrhage (ICH) with the intention of improving prediction of prognosis, but none are used routinely in clinical practice. They differ in the factors included, their complexity, and ease of use. There has been concern that prediction of a poor outcome using these scores may lead to inappropriate withdrawal or limitation of care very early after ICH, and thus, these predictions of poor outcome may become self-fulfilling prophecies.1 Because ICH has a 30-day mortality of ≈40%,2 it is, nonetheless, important to provide patients and their families with a personalized assessment of the likelihood of survival after ICH and to do so with reasonable accuracy. Physicians often make an informal assessment of their patient’s likelihood of survival on the basis of their own personal experience, and this assessment may be inaccurate. Prognostic scores on the basis of large data sets may allow physicians to prognosticate more objectively. Whether aggressive supportive care can improve this predicted longer term outcome is currently unclear.3
Level of consciousness and hematoma volume at admission are the most consistent outcome predictors, and grading scores combining these variables with other independent outcome predictors (including the original ICH [oICH] score,4 the modified ICH score [mICH],5 and the ICH grading scale [ICH-GS])6 show the best predictive values.7 These grading scores have been validated previously using measures of discrimination and calibration,7–9 but it remains unclear whether these scores are useful in clinical practice.
The aim of our study was to test existing scores in a large cohort representative of typical ICH patients in hospital-based stroke care in the United Kingdom using measures of discrimination, calibration, and decision curve analysis (DCA).10 We also compare these grading scores with commonly used clinical characteristics that physicians may informally rely on to inform their assessment of prognosis.
The neurosurgical department, Salford Royal NHS Foundation Trust, Salford, United Kingdom, serves the 2.6 million population of Greater Manchester and receives referrals from 14 hospitals throughout the region. On the basis of known incidence,2 the ≈450 acute ICH cases referred annually would represent ≈75% of incident ICH cases in our population. Details of every referral were prospectively recorded by the duty neurosurgeon in an electronic database, including the date and time of the referral, demographic details, clinical findings (including the Glasgow Coma Scale [GCS] score at time of referral), investigation findings, and the agreed management plan. Each case was reviewed within 24 hours by the senior neurosurgeon and neuroradiologist on call. The exact time of symptom onset was not recorded for all cases in the database. For analysis of survival, the date and time of the first computed tomography (CT) brain scan was used as an alternative to onset and is very likely to have been performed within 24 hours of symptom onset in the vast majority of cases. Time from symptom onset to first CT brain scan was recorded for 924 (67.7%) cases, and the median time was 6 hours (interquartile range, 2–18 hours).
National Health Service (NHS) Research Ethics Committee approval was obtained for our study. We identified 1364 patients referred between January 1, 2008, and October 17, 2010, whose diagnosis had been recorded as ICH, and a study investigator and neurosurgeon (K.A.) reviewed each case for study inclusion. Cases were included in the analysis if the first CT brain scan after onset could be obtained for review. If there was a clear history of a major head injury before presentation, patients were assumed to have sustained a traumatic ICH and were excluded. Cases in which the diagnosis was of hemorrhage into other intracranial compartments without ICH, no hemorrhage at all, or hemorrhagic transformation of an infarct were excluded.
The first CT brain imaging study after onset of symptoms was retrospectively reviewed by a study investigator (K.A.) who recorded ICH location as either deep or lobar and supratentorial or infratentorial. Deep ICH was defined as involving deep brain structures, including the basal ganglia, thalamus, vermis, and brain stem. Lobar ICH was defined as involving the cerebral or cerebellar lobes without involvement of deep structures. Where blood was evident in both compartments, the compartment containing the majority of the blood was recorded. Hematoma volume was calculated using the ABC/2 method, as previously described,11 and intraventricular hemorrhage was recorded. The oICH score and ICH-GS were derived for each patient as described in their original publications.4–6 The mICH score was calculated for all cases in the data set, although it was devised for use in basal ganglia ICH only. Although the mICH score takes hydrocephalus into account, we excluded this because we felt that the presence or absence of this CT finding is subjective and has been shown to have poor interobserver agreement in this setting.12 The survival status and date of death of nonsurvivors were obtained on October 21, 2011, via the Medical Research Information Service, Southport, United Kingdom, allowing a minimum follow-up period of 370 days for all cases. Data collection and image analysis were conducted before collection of survival data and were thus blinded to outcome.
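The volume estimate and one of the grading scores described above can be sketched in code. This is a minimal illustration, not the study's own implementation. The function names are hypothetical. The oICH component weights are given as recalled from the original ICH score publication (reference 4) and should be checked against that source before any use.

```python
def abc2_volume_ml(a_cm, b_cm, c_cm):
    """Approximate hematoma volume in mL using the ABC/2 method.

    a_cm: greatest hemorrhage diameter on the CT slice with the largest bleed
    b_cm: diameter perpendicular to A on the same slice
    c_cm: vertical extent (number of slices with hemorrhage x slice thickness)
    """
    if min(a_cm, b_cm, c_cm) < 0:
        raise ValueError("dimensions must be non-negative")
    return a_cm * b_cm * c_cm / 2.0


def oich_score(gcs, volume_ml, ivh, infratentorial, age):
    """Original ICH score (0 to 6); weights assumed from the original publication."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    score = 0
    # Level of consciousness: GCS 3-4 scores 2, GCS 5-12 scores 1, GCS 13-15 scores 0
    if gcs <= 4:
        score += 2
    elif gcs <= 12:
        score += 1
    # One point each for the remaining components
    score += 1 if volume_ml >= 30 else 0   # ICH volume of 30 mL or more
    score += 1 if ivh else 0               # intraventricular hemorrhage present
    score += 1 if infratentorial else 0    # infratentorial origin
    score += 1 if age >= 80 else 0         # age 80 years or older
    return score


# Example: a 4 x 3 x 5 cm hematoma in an 82-year-old with GCS 10 and IVH
volume = abc2_volume_ml(4, 3, 5)
print(volume, oich_score(10, volume, True, False, 82))
```

The ICH-GS and mICH score use different cut points and are not reproduced here.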
Cases with incomplete data were excluded. The χ2 test was used for comparisons of categorical variables, and the t test or the Wilcoxon rank sum test for continuous variables, according to their distribution. Kaplan–Meier survival curves were produced for the study population divided by prognostic categories and compared using the log-rank test. To aid clarity, GCS was divided into 7 categories for the Kaplan–Meier curve only. Receiver operating characteristic (ROC) curves were generated for each prognostic measure, and the area under the ROC curve (AuROC) was calculated. Confidence intervals were calculated according to the binomial exact formula. To assess calibration, observed 30-day (oICH score, ICH-GS) and 6-month (mICH) mortality and 95% confidence intervals were derived from the Kaplan–Meier analysis and plotted against predicted mortality as published for each derivation cohort.4–6 A similar analysis was performed to assess the calibration of the GCS against an independent published ICH cohort.13
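For readers less familiar with the AuROC, it can be read as the probability that a randomly chosen nonsurvivor has a higher risk score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration with made-up values, not the SPSS or R routine used in the study, and the function name is hypothetical.

```python
def auroc(scores, died):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form).

    scores: risk scores, where HIGHER means higher predicted risk of death
    died:   matching outcomes, truthy if the patient died by 30 days
    """
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for s_dead in dead:
        for s_alive in alive:
            if s_dead > s_alive:
                concordant += 1.0      # correctly ordered pair
            elif s_dead == s_alive:
                concordant += 0.5      # tie counts as half
    return concordant / (len(dead) * len(alive))


# Illustrative values only. A lower GCS means a worse prognosis, so the GCS
# is negated to make higher values indicate higher risk.
gcs = [3, 15, 8, 14, 6, 13]
died = [1, 0, 1, 0, 0, 1]
print(auroc([-g for g in gcs], died))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of this size but slower than rank-based implementations.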
Logistic regression was performed using GCS, age, intraventricular hemorrhage, and the natural logarithm of ICH volume to derive a predicted risk of death at 30 days for each case and an ROC curve for this was then derived. All data were expressed as median and interquartile range unless otherwise stated.
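The form of such a model can be sketched as follows. The coefficients below are hypothetical placeholders chosen only to show the structure of the calculation. They are not the fitted values from this study, which are not reported in the text, and the function and variable names are likewise assumptions.

```python
import math

# HYPOTHETICAL coefficients for illustration only; NOT the study's fitted values.
ILLUSTRATIVE_COEFS = {
    "intercept": 1.0,
    "gcs": -0.30,        # higher GCS lowers the log-odds of death
    "age": 0.02,         # per year of age
    "ivh": 0.50,         # intraventricular hemorrhage present
    "log_volume": 0.60,  # per unit of ln(ICH volume in mL)
}


def predicted_30_day_risk(gcs, age, ivh, volume_ml, coefs=ILLUSTRATIVE_COEFS):
    """Predicted probability of death at 30 days from a logistic model."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive to take its logarithm")
    linear_predictor = (
        coefs["intercept"]
        + coefs["gcs"] * gcs
        + coefs["age"] * age
        + coefs["ivh"] * (1 if ivh else 0)
        + coefs["log_volume"] * math.log(volume_ml)
    )
    # The logistic function maps the linear predictor onto a 0-1 probability
    return 1.0 / (1.0 + math.exp(-linear_predictor))


print(predicted_30_day_risk(gcs=4, age=70, ivh=True, volume_ml=60.0))
print(predicted_30_day_risk(gcs=15, age=70, ivh=False, volume_ml=5.0))
```

The predicted probabilities from a fitted model of this form are what would be passed to the ROC analysis as the risk score.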
DCA (see Methods in the online-only Data Supplement) was used to compare the clinical usefulness of the ICH scores and GCS across the full range of threshold mortality risk probabilities.10 All statistical analyses were performed in SPSS 16.0 for Windows (SPSS Inc) and the R statistical package (www.r-project.org).
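The quantity plotted in a decision curve is the net benefit at each threshold probability. A minimal sketch of the standard net benefit calculation is given below, assuming each case has a predicted risk between 0 and 1. The function names and the example values are illustrative and are not taken from the study or its Data Supplement.

```python
def net_benefit(predicted_risk, died, threshold):
    """Net benefit of classifying cases as high risk when risk >= threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(died)
    true_pos = sum(1 for r, d in zip(predicted_risk, died) if r >= threshold and d)
    false_pos = sum(1 for r, d in zip(predicted_risk, died) if r >= threshold and not d)
    weight = threshold / (1.0 - threshold)  # relative harm of a false positive
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(died, threshold):
    """Reference strategy: every case is classified as high risk."""
    return net_benefit([1.0] * len(died), died, threshold)


# The "treat none" reference strategy has a net benefit of 0 at every threshold.
risks = [0.9, 0.8, 0.2, 0.1]   # illustrative predicted risks
died = [1, 1, 0, 0]            # illustrative outcomes
for pt in (0.1, 0.5, 0.9):
    print(pt, net_benefit(risks, died, pt), net_benefit_treat_all(died, pt))
```

A score is considered useful at a given threshold when its net benefit exceeds both reference strategies.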
All 1364 cases with a recorded diagnosis of ICH in the neurosurgical referral database for the period of the study were reviewed. After 176 exclusions (Figure 1; Table I in the online-only Data Supplement), 1188 cases were submitted to Medical Research Information Service on October 21, 2011. Thirteen could not be traced, leaving 1175 in the final analysis. Clinical and imaging characteristics at referral are outlined in Table 1. Case fatality rate was 25.4% (n=298) at 3 days, 41.1% (n=483) at 30 days, 49.4% (n=581) at 6 months, and 52.5% (n=617) at 1 year. The distribution of cases across the range of possible grading scale scores is shown in Table II in the online-only Data Supplement. All 3 prognostic scores and GCS (split into 7 categories) divided cases into groups with highly statistically significant differences in mortality (Figure 2; P<0.0001 for all, log-rank test).
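The cohort flow and case fatality rates reported above can be checked with a few lines of arithmetic. All counts are taken from the text. The variable names are illustrative.

```python
referred = 1364    # cases with a recorded diagnosis of ICH
excluded = 176     # exclusions after case review
untraced = 13      # cases that could not be traced

submitted = referred - excluded   # cases submitted for survival status
analysed = submitted - untraced   # cases in the final analysis

# Cumulative deaths at each time point, as reported
deaths = {"3 days": 298, "30 days": 483, "6 months": 581, "1 year": 617}

# Case fatality rate as a percentage of the final analysis cohort
case_fatality = {t: round(100 * n / analysed, 1) for t, n in deaths.items()}

print(submitted, analysed)
print(case_fatality)
```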
Discrimination and Calibration
Discrimination of the prognostic scores for 30-day mortality, as determined by AuROC, showed that the mICH score was inferior to the ICH-GS (P<0.0001) and oICH score (P<0.0001; Table 2). However, when the analysis was restricted to deep supratentorial ICH, the discrimination of the oICH score (AuROC, 0.856; 95% confidence interval, 0.822–0.885; P=0.1135) and the ICH-GS (AuROC, 0.860; 95% confidence interval, 0.826–0.889; P=0.0977) was lower and was similar to that of the mICH score.
When performance of the individual components of the scores was tested (age, ICH volume, and GCS), the GCS alone performed as well as the ICH-GS (P=0.9891) and oICH score (P=0.1427). ICH volume was less discriminatory, whereas age was only weakly predictive of death at 30 days. Logistic regression was performed, including GCS, age, log ICH volume, and intraventricular hemorrhage, as predictive factors to derive a multifactorial risk of death at 30 days. The overall performance of this combined risk was better than the GCS and the prognostic scores (P<0.0001, for all comparisons). The 3 components of the GCS were tested individually and were less discriminative than the total score (P<0.0001, for all comparisons). The calibration of the prognostic scores for 30-day mortality (oICH score and ICH-GS) and 6-month mortality (mICH score) between our cohort and the cohorts in which they were derived is shown in Figure 3. Our cohort had similar survival to that predicted from the oICH cohort, higher survival than the ICH-GS cohort, and markedly lower survival than most of the mICH cohort.
Decision Curve Analysis
The relative performance of the scores and GCS in DCA (Figure 4) was similar to that in the AuROC analysis. The net benefit for all 3 scores and GCS surpasses the strategies of treat all and treat none between threshold probabilities of 10% and 95%. Between threshold probabilities of 20% and 90%, ICH-GS and GCS have a similar net benefit, are slightly superior to the oICH score, and are clearly better than the mICH score. Only above a threshold probability of 95% does the net benefit of GCS become inferior to that of the prognostic scores.
We have shown that 3 previously described grading scores for ICH are highly predictive for 30-day mortality when applied to our cohort of patients. In addition to being much larger than the study populations in which these scores were first described,4–6 our study population was drawn from patients in a different country and healthcare system, thus adding support to the external validity of these grading scores as prognostic scores. No substantive difference in the performance of the 3 scores was found except for lower performance of the mICH score when applied to all cases. However, when it was used in the subgroup of patients with basal ganglia ICH, its performance was similar to those of the other scores. It is of note that the GCS alone, which contributes to all 3 scores, was as good at predicting 30-day mortality as the prognostic scores. The GCS has the advantage of being simpler to use, because it does not require analysis of brain imaging and measurement of hematoma volume followed by the calculation of an additional score. Combining other factors that are included in the score and have previously been shown to predict prognosis did add to the predictive power of GCS, but this can largely be explained by overfitting because the model was tested in the same cohort from which it was derived. Our finding that age alone was a comparatively weak predictor of survival is an important reminder to clinicians not to provide undue weight to this factor when considering likelihood of survival for individual patients.
Thirty-day survival by oICH score was very similar between our cohort and the derivation cohort, suggesting this score is well calibrated to our population. For the ICH-GS, our cohort has a higher 30-day survival relative to the derivation cohort across the range of scores. The derivation cohort for the mICH score included highly selected patients enrolled in a randomized trial of surgery, and this is reflected in the marked differences in outcome relative to our cohort. It is also well recognized that mortality is significantly lower after ICH in Japan,2 making comparisons between Asian and non-Asian populations difficult.
We used DCA to confirm the clinical usefulness of the scores and GCS. By reducing a curve to a single value, it is possible for AuROC analysis to mask important differences in sensitivity versus specificity across the full range of prognosis.10 Only at the extreme end of the scale, for patients with <5% survival probability, did the leading models diverge. In our cohort, only the lowest possible GCS score (3) confers a prognosis worse than 5% survival, whereas both the ICH-GS (12, 13) and the oICH (5, 6) contain 2 divisions within this range. This may favor using the ICH-GS over the GCS in situations in which distinguishing between risks of death >95% is important.
The strengths of our study include our large sample size, the prospective nature of the data collection (except analysis of imaging), and the blinding of image analysis and prognostic scores determination to survival status. ICH grading scores are not routinely used for clinical care at our center and were determined retrospectively for this study, thus preventing them from influencing care decisions and, hence, prognosis. To the best of our knowledge, this is the largest population in which ICH prognostic scores have been tested to date. We were able to ascertain the outcome status of the vast majority of the patients with otherwise complete data (only 13 of 1188 [1%] could not be traced), so very few patients were lost to follow-up. Finally, many previous studies have applied prognostic scores to historical data recorded as part of clinical research studies conducted for other reasons, leading to a sample of patients that may not be representative of typical ICH patients. In our study, prognostic scores were calculated from data collected as part of routine clinical care, and thus, our findings more accurately reflect the performance of these scores across a wide range of ICH patients. GCS was likely to have been measured by nonspecialists in most cases, but it is important to note that those referring patients to the neurosurgical team and providing the GCS were usually the physicians primarily responsible for the patient’s acute care. As our aim was to determine the validity of these scores for routine clinical use, the GCS as determined by the physician responsible for the patient’s acute care is of the most relevance to this.
Our study does have limitations. First, we excluded 107 patients from the analysis because of incomplete data, and 83 of the 107 were excluded because we were unable to access their initial CT brain scans. This may introduce some bias, but this is unlikely to have altered the overall conclusions. Second, our patients were included in this study because they had been referred to our neurosurgical service. Although we believe that ≈75% of all ICHs in our population are referred to the neurosurgical service, this is likely to have introduced some selection bias. For example, some older patients may not have been referred as evidenced by the slightly lower mean age of our patients, relative to a UK population-based study14 (71 versus 76, respectively). This may make the results of our study less applicable to the very old. Third, we do not know the functional outcomes for survivors in our study. Survival and functional outcome are of great importance to patients and their families after ICH, and the ability of these scores to predict good functional outcome needs to be tested in additional large studies. Fourth, we used the GCS at the time of referral to neurosurgery in our analysis because our aim was to assess our ability to predict survival at the point of referral to neurosurgery, when decisions are still being made about patient management. GCS at this time point may have greater prognostic significance than at initial presentation because it would be expected that a minority of patients will have already had early neurological decline. Finally, we were unable to test some newer ICH prognostic scores because some of the factors that inform these scores were not recorded in our data set (eg, National Institutes of Health Stroke Scale Score). However, in a small cohort study, there was little additional benefit to be gained by using these newer scores,8 relative to the scores we were able to test in our study.
Our data support the use of ICH grading scores as prognostic scores to provide patients and families with an estimate of the likelihood of survival after ICH. Such scores may also provide a useful tool in the design and conduct of acute intervention studies in ICH as a means of stratification. There is little to discriminate between the scores, but given that the GCS alone is simpler than any of the published scores, performs at least as well as all 3 ICH scores in this study, and removes the need to measure the hematoma volume, we would advocate its use in estimating 30-day mortality in clinical practice.
Physicians face an uncertain situation in managing patients immediately after ICH, in which they must balance whether to offer aggressive supportive care with the importance of providing dignified end-of-life care to a patient who may be very likely to die in hospital, with or without treatment. Although physicians should be cautious of using this prognostic information to make important management decisions, we feel that accurate prognostic information should not be denied to patients and their families simply because our ability to improve survival by aggressive supportive care is unclear. The GCS represents the simplest way to estimate survival after ICH.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.113.001009/-/DC1.
- Received January 31, 2013.
- Accepted April 5, 2013.
- © 2013 American Heart Association, Inc.
- Hemphill JC III, Bonovich DC, Besmertis L, Manley GT, Johnston SC
- Ruiz-Sandoval JL, Chiquete E, Romero-Vargas S, Padilla-Martínez JJ, González-Cornejo S
- Vickers AJ, Elkin EB
- Kothari RU, Brott T, Broderick JP, Barsan WG, Sauerbeck LR, Zuccarello M, et al