Symptomatic Intracranial Hemorrhage After Stroke Thrombolysis
Comparison of Prediction Scores
Background and Purpose—Several prognostic scores have been developed to predict the risk of symptomatic intracranial hemorrhage (sICH) after ischemic stroke thrombolysis. We compared the performance of these scores in a multicenter cohort.
Methods—We merged prospectively collected data on consecutive patients with ischemic stroke who received intravenous thrombolysis in 7 stroke centers. We identified and evaluated 6 scores that can provide an estimate of the risk of sICH in hyperacute settings: MSS (Multicenter Stroke Survey); HAT (Hemorrhage After Thrombolysis); SEDAN (blood sugar, early infarct signs, [hyper]dense cerebral artery sign, age, NIH Stroke Scale); GRASPS (glucose at presentation, race [Asian], age, sex [male], systolic blood pressure at presentation, and severity of stroke at presentation [NIH Stroke Scale]); SITS (Safe Implementation of Thrombolysis in Stroke); and SPAN (stroke prognostication using age and NIH Stroke Scale)-100 positive index. We included only patients with available variables for all scores. We calculated the area under the receiver operating characteristic curve (AUC-ROC) and performed logistic regression and the Hosmer–Lemeshow test.
Results—The final cohort comprised 3012 eligible patients, of whom 221 (7.3%) had sICH per National Institute of Neurological Disorders and Stroke, 141 (4.7%) per European Cooperative Acute Stroke Study II, and 86 (2.9%) per Safe Implementation of Thrombolysis in Stroke criteria. The performance of the scores assessed with AUC-ROC for predicting European Cooperative Acute Stroke Study II sICH was: MSS, 0.63 (95% confidence interval, 0.58–0.68); HAT, 0.65 (0.60–0.70); SEDAN, 0.70 (0.66–0.73); GRASPS, 0.67 (0.62–0.72); SITS, 0.64 (0.59–0.69); and SPAN-100 positive index, 0.56 (0.50–0.61). SEDAN had significantly higher AUC-ROC values compared with all other scores, except for GRASPS where the difference was nonsignificant. SPAN-100 performed significantly worse compared with other scores. The discriminative ranking of the scores was the same for the National Institute of Neurological Disorders and Stroke, and Safe Implementation of Thrombolysis in Stroke definitions, with SEDAN performing best, GRASPS second, and SPAN-100 worst.
Conclusions—SPAN-100 had the worst predictive power, and SEDAN consistently had the highest predictive power. However, none of the scores performed better than moderately.
Intravenous thrombolysis (IVT), the only approved clot-busting medical treatment for ischemic stroke, is not without complications. Fear of symptomatic intracranial hemorrhage (sICH), which can worsen patients' outcomes, remains one of the major reasons for withholding the therapy.1 The number needed to treat with IVT to cause 1 fatal sICH is 36.5, and the number needed to cause any worsening of outcome (≥1 grade on the modified Rankin Scale) ranges from 29.7 to 40.1.2 Several scoring systems have been developed to predict the risk of sICH.3–8 Ideally, a prediction score would identify patients at very high risk of post-thrombolysis sICH. We aimed to compare the performance of existing risk prediction scores in a large multicenter cohort.
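Assuming the standard definition of the number needed to harm (NNH) as the reciprocal of the absolute risk increase (ARI), the figures quoted above translate into excess event rates as follows. This is a back-of-envelope reading of the numbers; the cited source's exact derivation may differ.

```latex
\mathrm{NNH} = \frac{1}{\mathrm{ARI}}
\;\Longrightarrow\;
\mathrm{ARI}_{\text{fatal sICH}} = \frac{1}{36.5} \approx 0.027 = 2.7\%,
\qquad
\mathrm{ARI}_{\text{worse mRS}} \in \left[\tfrac{1}{40.1},\ \tfrac{1}{29.7}\right] \approx [2.5\%,\ 3.4\%]
```

In words: roughly 2.7 additional fatal sICH per 100 patients treated, under the stated assumption.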
Patients and Methods
The current analysis comprises data from 7 centers. The study was approved by the relevant authorities in each participating center according to local requirements. In the coordinating center (Helsinki), the study was approved as a quality registry and did not require review by the ethics board. All patients were prospectively included in the database. Data on individual consecutive patients receiving IVT within a 4.5-hour time window for acute ischemic stroke were collected using a standardized form with predefined variables. For the sICH definitions, radiological ICH categorization (hemorrhagic infarction 1 and 2, parenchymal hemorrhage 1 and 2) was collected and used to prospectively assign sICH for the current study by 1 of the study authors in a blinded fashion. Data from all centers were compiled in the coordinating center, where the pooled analyses were performed. The baseline population comprised 3543 patients. We excluded 531 patients lacking data necessary to calculate ≥1 of the scores; hence, the final cohort included 3012 patients with ischemic stroke in the anterior or posterior circulation. None of the patients underwent an endovascular procedure. None of the patients in the current analysis was included in the derivation cohort of any of the scores/indices.
Selection Criteria for sICH Risk Scores
First, we considered only scores and indices, not regression models, because our aim was to evaluate tools suitable for quick bedside calculation without depending on potentially time-consuming, computer-based systems. Second, we considered only scores and indices based on parameters available shortly after admission, before administration of thrombolysis. Finally, we also included scores primarily developed to predict another outcome, such as final disability level, if they had previously been tested and shown to predict the risk of symptomatic hemorrhage after IVT.
Statistical Analysis
We tested the ability of the scores/indices to predict the risk of post-thrombolytic sICH according to the criteria of the European Cooperative Acute Stroke Study II (ECASS-II) trial, the National Institute of Neurological Disorders and Stroke (NINDS) study, and the Safe Implementation of Thrombolysis in Stroke (SITS) registry.9–11 Discrimination was assessed with the c-statistic, that is, the area under the receiver operating characteristic curve (AUC-ROC). Logistic regression was used to estimate the odds ratio of sICH per 1-point increase in each score for all 3 sICH criteria. Calibration was assessed with the Hosmer–Lemeshow test, a goodness-of-fit test for logistic regression models that is frequently used to evaluate the calibration of risk prediction models. The test assesses whether the observed event rates match the expected event rates in subgroups of the study population. When the expected and observed event rates are similar (and hence the P value of the Hosmer–Lemeshow test is high), the model is considered well calibrated. The analyses were performed with IBM SPSS 21 (IBM Corp, Armonk, NY), SigmaPlot 11.0 (Systat Software, Inc, Chicago, IL), and Confidence Interval Analysis (version 2.1.2; Trevor Bryant, University of Southampton). A 2-tailed P value <0.05 was considered statistically significant.
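The Hosmer–Lemeshow statistic described above can be sketched in a few lines. This is a minimal illustration, not the SPSS implementation used in the study. It assumes that patients are sorted by predicted risk and split into `groups` near-equal-sized groups (SPSS's handling of tied predictions may differ). It also uses the exact chi-square tail formula that holds for an even number of degrees of freedom (groups minus 2). The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail P(X > x) for a chi-square variable with an even number of df.

    Uses the exact identity sf = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Return (HL statistic, P value) for 0/1 outcomes y and predicted risks p."""
    if groups % 2 or groups < 4:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(p, y))  # order patients by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed sICH cases in group
        expected = sum(pi for pi, _ in chunk)  # sum of predicted risks in group
        mean_p = expected / n_g
        denom = n_g * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A high P value means that the observed and expected counts are close, which indicates good calibration. A binary index such as SPAN-100 yields only 2 distinct predicted risks, so risk-ordered grouping of this kind is not meaningful for it. This is consistent with the text's note that the test could not be calculated for SPAN-100.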
Results
We identified 6 scores/indices that fulfilled our selection criteria: MSS (published in 2008),3 HAT (2008),4 SEDAN (2012),5 GRASPS (2012),6 SITS (2012),7 and SPAN-100 positive index (2013)8 (Table 1).
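The sketch below shows how one of these bedside scores is calculated: it computes SEDAN from the 5 components named in the abstract. The point cutoffs are taken from the original SEDAN publication as we understand it and are an assumption here:

- glucose 8.1–12.0 mmol/L = 1 point, >12.0 mmol/L = 2 points
- early infarct signs = 1
- hyperdense cerebral artery sign = 1
- age >75 years = 1
- NIHSS ≥10 = 1
- range 0–6

Verify these cutoffs against Table 1 before any use. The function name is hypothetical.

```python
def sedan_score(glucose_mmol_l, early_infarct_signs, dense_artery_sign,
                age_years, nihss):
    """Hypothetical helper: SEDAN points (0-6); cutoffs are an assumption, see Table 1."""
    points = 0
    # S: blood sugar on admission (assumed cutoffs 8.0 and 12.0 mmol/L)
    if glucose_mmol_l > 12.0:
        points += 2
    elif glucose_mmol_l > 8.0:
        points += 1
    # E: early infarct signs on admission CT
    points += int(bool(early_infarct_signs))
    # D: (hyper)dense cerebral artery sign
    points += int(bool(dense_artery_sign))
    # A: age above 75 years
    points += int(age_years > 75)
    # N: NIHSS of 10 or more
    points += int(nihss >= 10)
    return points


print(sedan_score(13.0, True, True, 80, 15))  # 6, the maximum score
```

Each component adds a small integer, which is what makes such scores practical at the bedside without a computer.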
The merged cohort (n=3012) is described in Table 2. The excluded patients (15%) did not differ from the included patients in demographics and baseline characteristics (data not shown). The numbers of included patients per center were: Basel, Switzerland (n=680); Helsinki, Finland (n=452); Lausanne, Switzerland (n=540); Lille, France (n=273); Melbourne, Australia (n=322); St Gallen, Switzerland (n=156), and Tampere, Finland (n=589). In the whole data set, we observed 141 (4.7%; 95% confidence interval [CI], 4.0–5.5) cases of sICH according to ECASS-II criteria, 221 (7.3%; 95% CI, 6.5–8.3) cases according to NINDS criteria, and 86 (2.9%; 95% CI, 2.3–3.5) cases according to SITS criteria. Center-specific frequencies of sICH ranged from 3.1% to 9.3% (ECASS-II criteria), from 4.4% to 9.9% (NINDS), and from 2.2% to 5.1% (SITS).
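The reported 95% CIs can be checked arithmetically. The text does not name the interval method. The sketch below assumes the Wilson score interval, which reproduces all 3 reported intervals to 1 decimal place. It also confirms that the per-center counts add up to the final cohort and that 531 excluded patients out of 3543 corresponds to 15%.

```python
import math


def wilson_ci(events, n, z=1.96):
    """Wilson score 95% CI for a proportion (assumed method, see lead-in)."""
    p = events / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


centers = {"Basel": 680, "Helsinki": 452, "Lausanne": 540, "Lille": 273,
           "Melbourne": 322, "St Gallen": 156, "Tampere": 589}
N = sum(centers.values())        # patients in the final cohort
excluded_pct = 100 * 531 / 3543  # share of the baseline population excluded

for label, events in [("ECASS-II", 141), ("NINDS", 221), ("SITS", 86)]:
    lo, hi = wilson_ci(events, N)
    print(f"{label}: {100 * events / N:.1f}% ({100 * lo:.1f}-{100 * hi:.1f})")
```

Run as is, this prints 4.7% (4.0-5.5), 7.3% (6.5-8.3), and 2.9% (2.3-3.5), matching the values in the text.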
Frequencies of sICH according to the 3 criteria per point increase of the scores are outlined in Figures 1–3 and Figure I in the online-only Data Supplement. In the logistic regression analysis, all scores were associated with sICH according to all 3 criteria (Table I in the online-only Data Supplement). The Hosmer–Lemeshow test showed the worst model fit for GRASPS for ECASS-II sICH. Because SPAN-100 is a binary index, the test could not be calculated for it.
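The odds ratio per point increase is exp(β), where β is the score's coefficient in a logistic regression of sICH on that score. Here we assume that each score was entered on its own. The study used SPSS. The from-scratch Newton–Raphson fit below is only a minimal sketch of the same model. The data in the usage example are hypothetical, chosen so that the answer can be checked by hand.

```python
import math


def fit_logistic_1d(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of log-likelihood, intercept
            g1 += (yi - p) * xi   # gradient of log-likelihood, slope
            h00 += w              # information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # inverse(X'WX) @ gradient
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical check: with a 0/1 predictor the fitted OR equals the 2x2 cross-product.
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90
b0, b1 = fit_logistic_1d(x, y)
print(round(math.exp(b1), 3))  # 3.857 = (30 * 90) / (70 * 10)
```

In the study setting, `x` would hold each patient's score (for example, SEDAN points) and `y` the sICH indicator under 1 of the 3 definitions. exp(b1) is then the odds ratio per additional point.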
Score comparisons by means of AUC-ROCs are presented separately for each sICH definition (Table 3). SEDAN had the highest absolute values of AUC-ROC in all analyses, and except for the comparison with GRASPS, these differences were statistically significant. SPAN-100 positive index had the lowest AUC-ROC values in all comparisons.
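The text reports which AUC-ROC differences were statistically significant but does not name the test used. Because all scores are computed in the same patients, their AUCs are correlated. A common choice for such paired comparisons is the nonparametric method of DeLong and colleagues, and the sketch below implements that method as an assumption, not as the authors' procedure. Each AUC is computed as the Mann–Whitney probability that a randomly chosen patient with sICH has a higher score than a randomly chosen patient without sICH, with ties counted as one half. The function names and example data are hypothetical.

```python
import math


def _psi(case_score, control_score):
    """Mann-Whitney kernel: 1 if the case ranks higher, 0.5 for a tie, else 0."""
    if case_score > control_score:
        return 1.0
    return 0.5 if case_score == control_score else 0.0


def delong_paired(score_a, score_b, y):
    """Compare 2 correlated AUCs; return (auc_a, auc_b, z, two-sided P)."""
    cases = [i for i, yi in enumerate(y) if yi == 1]
    controls = [j for j, yj in enumerate(y) if yj == 0]
    m, n = len(cases), len(controls)

    def components(s):
        # Per-case and per-control placement values (DeLong structural components)
        v10 = [sum(_psi(s[i], s[j]) for j in controls) / n for i in cases]
        v01 = [sum(_psi(s[i], s[j]) for i in cases) / m for j in controls]
        return v10, v01

    a10, a01 = components(score_a)
    b10, b01 = components(score_b)
    auc_a, auc_b = sum(a10) / m, sum(b10) / m

    def cov(u, v, mu, mv):
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

    var_diff = (
        (cov(a10, a10, auc_a, auc_a) + cov(b10, b10, auc_b, auc_b)
         - 2 * cov(a10, b10, auc_a, auc_b)) / m
        + (cov(a01, a01, auc_a, auc_a) + cov(b01, b01, auc_b, auc_b)
           - 2 * cov(a01, b01, auc_a, auc_b)) / n
    )
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return auc_a, auc_b, z, p_value


y = [1, 1, 1, 1, 0, 0, 0, 0]
score_a = [8, 7, 6, 3, 5, 4, 2, 1]
score_b = [8, 2, 6, 3, 5, 4, 7, 1]
print(delong_paired(score_a, score_b, y)[:2])  # (0.875, 0.5625)
```

The pairwise kernel makes this O(cases × controls) per score. That is adequate for a cohort of a few thousand patients, but identical score vectors give zero variance, and the function then raises ZeroDivisionError.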
Discussion
With comprehensive data from several dedicated stroke centers, we had a unique opportunity to perform a head-to-head comparison of the existing sICH prediction scores. In general, SPAN-100 showed poor predictive power, and all other scores showed moderate predictive power. SEDAN consistently had the highest nominal predictive performance, and its advantage was statistically significant in all comparisons except the one with GRASPS, which had the second-highest AUC-ROC values. In pairwise comparisons, the differences between GRASPS and the other scores were frequently nonsignificant.
We observed rather low frequencies of post-thrombolytic sICH in the current merged cohort, with considerable intercenter differences. This contributes to the relatively low risk of sICH, even with the worst scores, compared with the original reports (in addition, except perhaps for MSS and HAT, relatively few patients scored the highest points). Nonetheless, the crucial point is that the relative risk of high-risk versus low-risk patients remained similar. For example, in the original report,5 a patient with a SEDAN score of 5 had an almost 4-fold higher risk of sICH (33.3%) than a patient with a SEDAN score of 2 (8.5%), and a >20-fold higher risk than a patient with a SEDAN score of 0 (1.4%). In our cohort, these relative risks were of similar magnitude, 4 and 18, respectively.
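The relative risks quoted for the original SEDAN report follow directly from the absolute sICH risks given there:

```latex
\frac{33.3\%}{8.5\%} \approx 3.9 \quad (\text{SEDAN } 5 \text{ vs } 2),
\qquad
\frac{33.3\%}{1.4\%} \approx 23.8 \quad (\text{SEDAN } 5 \text{ vs } 0)
```

These ratios correspond to the "almost 4-fold" and ">20-fold" figures. The absolute percentages behind the corresponding ratios in the current cohort (4 and 18) are not given in the text, so those ratios cannot be recomputed here.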
Taken together, the scores consist of parameters related to (1) underlying parenchymal injury and microangiopathy (age, history of hypertension and diabetes mellitus, and blood glucose as a marker of a history of diabetes mellitus), (2) degree of acute parenchymal injury (CT findings and, to a certain extent, the National Institutes of Health Stroke Scale [NIHSS] score and onset-to-treatment time), (3) the coagulation process (platelet count, use of antiplatelet agents, and perhaps patient weight, which determines the alteplase dose), (4) physical factors (systolic blood pressure), and (5) sex and ethnicity. The modest differences in AUC-ROC values among the scores reflect the fact that most scores include similar components. Age, NIHSS, and baseline glucose level are the most common (Tables 1 and 3). The scores differ mainly in the relative weight given to individual components. SPAN-100 (consisting of age and NIHSS) had rather low AUC-ROC values for all sICH criteria (0.55–0.56), compared with 0.73 per NINDS criteria in the original report.8 One potential explanation is that SPAN-100 was postulated rather than derived from a specific cohort. In addition, it was validated in a rather small cohort of 312 patients treated with IVT in the NINDS trial. Furthermore, treatment in the present study was, on average, somewhat later than in the NINDS trial, in which half of the patients were treated within 90 minutes and all within 3 hours of symptom onset. Although age and baseline stroke severity are major components of all the scores, our data show that other parameters also matter. Imaging parameters (necessary for calculating HAT and SEDAN) seem to improve the performance of outcome prediction scores.12 Although assessing them requires training, we think this is readily achievable with continuous education in centers delivering IVT.
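SPAN-100 is simple enough to state in code. The text says only that it consists of age and NIHSS. The positivity rule used below (age in years plus NIHSS score of at least 100) is taken from the original SPAN-100 description and should be checked against Table 1. Because the index is binary, its ROC curve has a single corner point, and its AUC-ROC reduces to the mean of its sensitivity and specificity. This may partly explain why a dichotomized index discriminates less well than graded scores. The function names are hypothetical.

```python
def span100_positive(age_years, nihss):
    """Hypothetical helper: SPAN-100 positive if age + NIHSS >= 100 (assumed rule)."""
    return age_years + nihss >= 100


def binary_index_auc(sensitivity, specificity):
    """AUC-ROC of a binary index: trapezoids through (0,0), (1-spec, sens), (1,1)."""
    return (sensitivity + specificity) / 2


print(span100_positive(85, 15), span100_positive(80, 19))  # True False
```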
Interestingly, platelet count was included in only 1 score (MSS), and adding it did not, for example, improve the SEDAN model (data not shown). One possible explanation is that the vast majority of patients in its derivation cohort had similar platelet counts within the physiological range.
Differences in the performance of the scores may also reflect the fact that the scores were derived to predict particular definitions of sICH. For example, SITS had higher AUC-ROC values for sICH per SITS criteria than per ECASS-II or NINDS criteria (Table 3). SEDAN had the highest AUC-ROC value for sICH per the ECASS-II definition, for which it was developed. GRASPS, in contrast, had almost identical AUC-ROC values for each definition. The number of component items also influences the performance of the scores. Scores derived from larger data sets, such as GRASPS and SITS, had the statistical power to identify more significant predictors, but this also made the scores somewhat complex (in the case of GRASPS, requiring computer-based platforms). Potential points in GRASPS range from 45 to 101, which is most probably also the reason for the considerable oscillation in sICH frequencies with increasing score (Figures 1–3 and Figure I in the online-only Data Supplement). Also, MSS and GRASPS were derived from cohorts treated within a 3-hour time window, whereas SEDAN and SITS were derived from cohorts treated within a 4.5-hour time window. SPAN-100 was not derived from any cohort, and HAT was based on a literature search of sICH predictors in studies including patients treated within 3-hour and 6-hour time windows. All of these factors could have influenced the results.
We did not include the iScore (derived for socioeconomic purposes, such as helping with discharge planning and allowing policymakers to compare facilities)13 in our analysis because we aimed to compare only scores that can be applied shortly after the patient's admission, before IVT. The authors of the iScore acknowledged that it may take several hours to obtain all the parameters necessary to calculate the score. Indeed, stroke cause, 1 of the components of the iScore, is unlikely to be ascertained within a short time after admission.
Sung et al14 recently compared sICH scores. However, that study was rather small (n=548) and excluded SEDAN because imaging data were unavailable in their database. In that study, the odds ratios from regression models and the AUC-ROC values were similar to those in our study. Moreover, the most important findings of that study (SPAN-100 had the worst predictive value, and HAT and MSS performed better than in the original validation cohort)15 were in line with our observations. The only difference was the worse performance of GRASPS in Sung et al's study.14 This was perhaps because that study included exclusively Asian patients, and Asian race was identified as a strong predictor of sICH in the GRASPS model. In addition, the tissue plasminogen activator dose in that study ranged from 0.6 to 0.9 mg/kg, in accordance with the Taiwan guidelines. Another study validated SEDAN in a large data set from the SITS cohort.16 The authors of that study acknowledged some intercohort differences in baseline characteristics and a relatively large proportion (20%) of patients with unavailable parameters to calculate the score. Also, the proportion of sICH per ECASS-II definition in the SITS cohort (5.1%) was similar to that in the current merged cohort but lower than that reported in the original SEDAN cohort (7.0%). A central imaging read would probably be important. With >750 SITS centers, it could play a role but is also challenging from a logistic and financial point of view. Furthermore, our study provides a head-to-head comparison of all existing scores.
One limitation of the current study is that the database comprises almost exclusively white patients. Fifteen percent of the patients were excluded because they lacked parameters for ≥1 of the scores, but they did not differ from the included patients in demographics and baseline characteristics. Each center performed its own imaging reads. However, our study has a large sample size, with patients from several centers operating under different conditions. Unfortunately, we could not analyze the scores separately in men and women, or in patients treated <180 minutes versus 180 to 270 minutes after onset, because sex and treatment delay are components of some of the scores. SEDAN and SITS are the only scores that were derived and validated in patients treated in the extended thrombolysis time window of 4.5 hours.
Although SEDAN consistently had the highest absolute AUC-ROC values in all analyses, its performance was only moderate. The same was true for all other scores except SPAN-100, which performed poorly. At the moment, we must acknowledge the limitations of post-thrombolysis hemorrhage prediction scores. We do not have the data to support withholding thrombolysis, a treatment with proven benefit, from patients at high risk of sICH. We think that in the future, sICH prediction scores may play a role in personalized medicine and could be used together with scores predicting functional outcome after IVT.17 Patients at very high risk of sICH would most probably benefit from intensive monitoring (including blood pressure and blood glucose) and heightened staff alertness after IVT administration. Although this approach is not currently evidence based, some centers may prefer to forgo IVT and proceed directly to intra-arterial procedures in patients with a high likelihood of poor outcome combined with a high risk of sICH after IVT. Another possible use of the scores would be to select patients for randomized trials of add-on therapies to reduce post-thrombolytic sICH. Naturally, such scores can be only one part of the decision-making process, complementary to clinical, laboratory, and radiological findings. Nonetheless, we are only taking the initial steps in this area, and prediction scores will likely gain considerable power and precision over time. CT imaging may gradually be replaced by MRI; early biomarkers measurable with quick bedside laboratory methods may become available; and individual genetic profiles will probably reveal predispositions to hemorrhage. A sophisticated prediction score will probably include more detailed items, yet it should still allow quick calculation.
Sources of Funding
This study was supported by the Helsinki University Central Hospital (HUCH) governmental subsidiary (EVO) funds for clinical research. The authors received funding from the HUCH (D.S., A.M., T.T.), the Biomedicum Helsinki Foundation (A.M.), the Sigrid Juselius Foundation (A.M., T.T.), the Finnish Medical Foundation (A.M.), the Stroke-[Hirnschlag]-Fund Basel (D.J.S.), the Swiss National Science Foundation No. 33CM30-124119 (S.T.E.), and Australian National Health and Medical Research Council Centre for Research Excellence grant 1001216 (A.M., S.M.D.). The supporting sources had no involvement in the design, analyses, or interpretation of the study.
Disclosures
Dr Michel received research grants from the Swiss Cardiology Foundation (significant) and Cardiomet CHUV (significant), speaker fees from Boehringer-Ingelheim (modest), and advisory board compensation from Boehringer-Ingelheim (modest). Dr Davis received travel grants from EVER Neuropharma (modest) and Sanofi (modest), as well as speaker fees from Boehringer-Ingelheim (modest). Dr Tatlisumak received honoraria from Boehringer-Ingelheim (modest) and advisory board compensation from Boehringer-Ingelheim (modest) and H Lundbeck A/S (modest). The other authors have no conflicts to report.
Guest Editor for this article was Tatjana Rundek, MD, PhD.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.113.003806/-/DC1.
Received October 9, 2013; revision received December 4, 2013; accepted December 17, 2013.
© 2014 American Heart Association, Inc.
References
Saver JL.
Menon BK, Saver JL, Prabhakaran S, Reeves M, Liang L, Olson DM, et al.
Mazya M, Egido JA, Ford GA, Lees KR, Mikulik R, Toni D, et al.
Turc G, Apoil M, Naggara O, Calvet D, Lamy C, Tataru AM, et al.
Saposnik G, Kapral MK, Liu Y, Hall R, O’Donnell M, Raptis S, et al.
Sung SF, Chen SC, Lin HJ, Chen YW, Tseng MC, Chen CH.
Mazya MV, Bovi P, Castillo J, Jatuzis D, Kobayashi A, Wahlgren N, et al.