Utility of Outcome Measures After Treatment for Intracranial Aneurysms
A Prospective Trial Involving 520 Patients
Background and Purpose— As different intracranial aneurysm treatments are compared in upcoming trials, complete characterization of patient outcomes will be critical. Currently, graded scales such as the Glasgow Outcome Score (GOS) or the Rankin scale are commonly used. Our objective was to test the utility of different outcome instruments in patients after aneurysm treatment.
Methods— A prospective trial comparing 6 outcome instruments at 3 to 12 months after aneurysm treatment: the Rankin and GOS, the Barthel Index (activities of daily living), the National Institutes of Health Stroke Score (NIHSS) (neurological examination), the Short Form-36 (subjective experience of recovery), and the Mini-Mental Status Examination (MMSE) (cognitive recovery). All tests were administered to each patient at the same time by an independent grader. The Spearman correlation coefficient was calculated between instruments (with 1 representing complete correlation).
Results— In 4 years, 520 patients with 618 ruptured or unruptured aneurysms were enrolled: 385 surviving patients were tested. The resulting scores showed a wide distribution for the MMSE and the SF-36, but almost no variability for the Barthel Index and NIHSS. Correlations between scores were poor: 0.15 when the GOS was compared with the MMSE; 0.27 compared with the SF-36. Many patients given the highest GOS or Rankin scores showed significant cognitive deficits.
Conclusions— These data indicate that a single graded scale does not address all aspects of recovery after aneurysm treatment, particularly cognitive dysfunction and the patient’s perception of health. The implications of these findings are discussed.
The field of cerebrovascular disease is in a phase in which different intracranial aneurysm treatment modalities are being compared in practice and in clinical trials. The first landmark study was the International Subarachnoid Aneurysm Trial (ISAT), published in Lancet in 2002.1 A North American trial is currently being planned. For all future efforts, accurate and complete characterization of patient outcomes will be critical.2
In neurosurgery, the most widely used outcome instruments are graded scales such as the Glasgow Outcome Score (GOS)3 and the modified Rankin Scale. For the GOS, the patient is graded from 1 to 5, with 1 representing death and 5 good recovery. The GOS has been recommended as the key outcome instrument in clinical trials involving head injury.4 The modified Rankin scale has become the most commonly used endpoint for clinical trials involving stroke.5 The Rankin scale was the primary measure used in the ISAT trial.1 Compared with the GOS, the Rankin scale offers more categories of disability and also takes symptoms into account.
These graded scores allow outcome assessment without the need for a detailed neurological or psychological assessment. However, several limitations have been noted. Inter-rater reliability generally has been high (up to 95%), but has been reported to be as low as 50%.6,7 Some reports have also noted that these scales are insensitive to subtle differences in cognitive recovery or the patient’s experience of health, and that they give disproportionate weight to physical disability relative to cognitive and behavioral impairment. After subarachnoid hemorrhage, major cognitive deficits have been noted with detailed neuropsychological testing, even in patients who appeared to be “normal.”7
In addition to the Rankin and GOS, there are other outcome instruments with accepted rates of reliability and validity. These tests can measure different aspects of a patient’s status: the subjective perception of health (Short Form-36 [SF-36]),8,9 cognitive function (Mini-Mental Status Examination [MMSE]),10 functional independence (Barthel Index),11 and the neurological examination (National Institutes of Health Stroke Score [NIHSS]).12,13 The tests can be completed in a few minutes and do not need to be administered by highly trained personnel such as a neuropsychologist.
With this study, the usefulness of different outcome instruments was tested in patients after aneurysm treatment. Our hypothesis was that patients given the same GOS or Rankin score would show quantifiable differences in outcome when a comprehensive assessment was performed. The study design was a prospective trial involving consecutive patients treated for intracranial aneurysm. At follow-up, 6 outcome measures were simultaneously administered to each patient, allowing direct comparisons between the results. Any differences would be the result of the test itself, and not a change in the patient’s condition.
Materials and Methods
In July 1998, we began a prospective 4-year study on consecutive patients treated for subarachnoid hemorrhage and intracranial aneurysm by 1 cerebrovascular team. Between July 1998 and July 2002, 638 patients were treated by the first author at the University of Texas Health Science Center in Houston. The inclusion criteria were as follows: (1) patients with saccular aneurysms, ruptured or unruptured; (2) patients with subarachnoid hemorrhage (not intracranial hemorrhage); and (3) no associated cerebrovascular anomalies such as an arteriovenous malformation.
A total of 520 patients, harboring 618 aneurysms, met the inclusion criteria. Excluded were 118 patients for the following reasons: 25 patients had aneurysms associated with arteriovenous malformations; 6 patients presented with a stroke; 15 patients had dissecting, not saccular, aneurysms; 34 patients had an intraparenchymal aneurysm rupture; 2 patients bled from a tumor or cavernous angioma; and 36 patients were transferred after treatment had already been attempted. Because this trial involved follow-up testing only, with no changes in intervention, cost, or risk, no patient declined to participate.
Thirty-six patients had subarachnoid hemorrhage but no aneurysm (unknown cause). Forty-two patients, all with grade 5 status, had intracranial aneurysms that were not repaired. Therefore, 442 of these patients had 526 intracranial aneurysms that were treated. These were repaired by 483 total procedures: 437 operations for clipping (482 aneurysms clipped) and 46 procedures for coiling (44 aneurysms coiled). Three patients had aneurysms that were coiled after an attempted clipping; 2 patients had aneurysms that were clipped after attempted coiling. Fourteen patients with successful coil embolization required subsequent clipping for aneurysm recanalization. The demographic characteristics for the 520 enrolled patients are noted in Table 1.
A total of 122 patients died: 104 during the initial hospitalization and 18 at an outside facility during the follow-up period. Mortality was 24.1% and correlated with Hunt–Hess grade. In grade 0 patients, mortality was 1.5%.
Of the 520 enrolled patients, only 13 were lost to follow-up (2.5%); 398 survivors were eligible for testing, and 385 patients completed the tests 3 to 12 months after the subarachnoid hemorrhage or initial procedure (96.7%). The average follow-up period was 4.7 months. All 6 outcome measures (the GOS, the Rankin scale, the Barthel Index, the NIHSS, the MMSE, and the SF-36) were administered in the same session by 1 of 2 independent graders (G.V.G. or C.L.H.).
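The patient counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the figures stated in the text; the variable names are illustrative, and the survivor count is derived here as enrolled patients minus deaths.

```python
# Cohort bookkeeping using counts quoted in the text (names are illustrative).
treated_total = 638
exclusions = {
    "aneurysm associated with arteriovenous malformation": 25,
    "presented with a stroke": 6,
    "dissecting (not saccular) aneurysm": 15,
    "intraparenchymal aneurysm rupture": 34,
    "bled from a tumor or cavernous angioma": 2,
    "transferred after attempted treatment": 36,
}
excluded = sum(exclusions.values())
enrolled = treated_total - excluded

# Enrolled patients whose aneurysms were actually repaired
no_aneurysm = 36
not_repaired = 42
patients_treated = enrolled - no_aneurysm - not_repaired

# Procedures and aneurysms, clipping plus coiling
procedures = 437 + 46
aneurysms_treated = 482 + 44

# Follow-up: survivors are derived as enrolled minus deaths
deaths = 104 + 18
survivors = enrolled - deaths
lost_to_follow_up = 13
tested = survivors - lost_to_follow_up

print(f"excluded={excluded}, enrolled={enrolled}")
print(f"patients treated={patients_treated}, procedures={procedures}, "
      f"aneurysms treated={aneurysms_treated}")
print(f"survivors={survivors}, tested={tested} "
      f"({100 * tested / survivors:.1f}% of survivors)")
print(f"lost to follow-up: {100 * lost_to_follow_up / enrolled:.1f}% of enrolled")
```

The derived values agree with the reported 118 exclusions, 520 enrolled patients, 442 treated patients, 483 procedures, 526 treated aneurysms, and a 96.7% testing rate among survivors.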
The data were analyzed in 3 ways. First, the distribution of the scores was noted for variability. The usefulness of a test would be limited if most of the patients achieved the same outcome score.
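One way to quantify this kind of variability is to report, for each instrument, the range, the standard deviation, and the share of patients at the most common score. The sketch below is an illustration only: the function name `summarize` and the example scores are hypothetical and are not study data.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(scores):
    """Return spread statistics that expose ceiling or floor effects."""
    counts = Counter(scores)
    modal_score, modal_n = counts.most_common(1)[0]
    return {
        "n": len(scores),
        "mean": mean(scores),
        "sd": stdev(scores),                  # sample standard deviation
        "range": (min(scores), max(scores)),
        "modal_score": modal_score,
        "modal_share": modal_n / len(scores),  # fraction at the most common score
    }


# Hypothetical scores for 10 patients (NOT data from this study)
graded_scale = [5, 5, 5, 5, 5, 5, 5, 5, 5, 4]            # e.g. a 1-to-5 graded scale
cognitive_test = [22, 25, 27, 28, 29, 30, 24, 26, 30, 19]  # e.g. a 0-to-30 test

for name, scores in [("graded", graded_scale), ("cognitive", cognitive_test)]:
    s = summarize(scores)
    print(f"{name}: range={s['range']}, sd={s['sd']:.2f}, "
          f"share at modal score={s['modal_share']:.0%}")
```

A modal share close to 100% signals that the instrument cannot separate most patients from one another, which is the limitation described above.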
Second, patients were categorized by Hunt–Hess grade and the outcomes compared using the different instruments. One of the concerns regarding the routine use of the MMSE and SF-36 is that the higher levels of variability associated with these tests may make statistical comparisons difficult. If no statistically significant differences could be discerned between patients presenting with a grade 1 versus a grade 4 bleed by a particular test, that test would have little usefulness in determining differences in outcomes from alternative treatment modalities. Because these data are not normally distributed, the nonparametric Kruskal–Wallis test was used for statistical comparison, with significance set at P<0.05.
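For readers unfamiliar with the Kruskal–Wallis test, the following minimal sketch computes the tie-corrected H statistic from pooled ranks. It is an illustration under stated assumptions, not the analysis code used in this study: the helper names and the example scores are hypothetical, and the closed-form P value shown applies only to exactly 3 groups (2 degrees of freedom) under the chi-square approximation, which is rough for small samples. A statistics package would normally be used in practice.

```python
from collections import Counter
from math import exp


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of score lists."""
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical outcome scores for 3 presentation grades (NOT study data)
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
p = exp(-h / 2)  # chi-square survival function, valid for 2 degrees of freedom only
print(f"H = {h:.2f}, approximate P = {p:.4f}")
```

With fully separated groups, as in this toy example, H is large and the approximate P value falls below the 0.05 threshold; identical groups give H = 0.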
Third, the null hypothesis that different outcome scores would give the same result was tested. Spearman correlation coefficients were calculated between different scores (for example, the Rankin with the MMSE). A coefficient of 1 would indicate perfect correlation, whereas a coefficient of 0 would indicate no association. A coefficient of −1 would indicate perfect inverse correlation, which can occur because the “best” outcome for the GOS gets the highest score, whereas the best score for the Rankin is 0. We believed that a coefficient >0.7 or <−0.7 would reasonably indicate high correlation.
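The Spearman coefficient is the Pearson correlation computed on ranks, with tied scores sharing their average rank. The sketch below illustrates the calculation and the 0.7 threshold described above; the function names and the paired scores are hypothetical and are not study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired, same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one score is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired scores for 5 patients (NOT study data).
# GOS: 5 is best. Rankin: 0 is best, so agreement appears as a negative rho.
gos = [5, 5, 4, 3, 2]
rankin = [0, 1, 2, 3, 4]
rho = spearman_rho(gos, rankin)
print(f"rho = {rho:.3f}, high correlation: {abs(rho) > 0.7}")
```

Because the two scales run in opposite directions, the absolute value of the coefficient is compared with the threshold.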
Results
At time of discharge, the majority of patients had the best GOS outcome of 5 (71.7%), but a significant number had GOS scores of 2 to 4 (29.3%). By the time of the follow-up visit, all but 11.3% of patients had achieved the highest score, indicating continuing recovery. Only 65.2% of patients achieved the best Rankin score of 0, but another 21.7% of patients had a score of 1, indicating symptoms only. Overall, 87.9% of surviving patients showed no disability on the Rankin scale. The usefulness of an outcome measure may be limited when almost 90% of survivors eventually achieve the same score.
The distribution of scores was even more skewed for the Barthel Index and the NIHSS. Only 3.9% of patients did not achieve the highest Barthel score. For the NIHSS, 91.9% achieved the best possible score (Table 2).
The most variable outcomes were noted with the MMSE and the SF-36 (Table 2). Although the number of possible scores was the largest for these 2 tests, the distribution was much wider as well; both the range and standard deviation (SD) for these scores were the highest. For example, the average SF-36 score was 58.1, but the range was from 17.8 to 78.8 and the SD was 13.6. By contrast, the average SD for the GOS was 0.36.
When the patients were divided by Hunt–Hess grades on presentation and the outcome scores compared, significant differences were noted for the GOS, Rankin scale, MMSE, and SF-36 (Table 3). For patients with a grade 1 or 2 subarachnoid hemorrhage, the Rankin score averaged 0.37. For patients with grade 4 subarachnoid hemorrhage, the average score was 1.0 (P<0.0001). Similarly, patients with grade 0 subarachnoid hemorrhage had an average SF-36 score of 61. Patients with a grade 4 subarachnoid hemorrhage averaged 55.4 (P=0.003).
Two measures did not show significantly different outcome scores (with P=0.058 and P=0.117). Comparing grade 4 patients to grade 2 patients, the average NIHSS was 0.12 and 0.12, respectively. For the Barthel Index, the average scores were 99.3 and 99.1, respectively.
When each score was compared directly with the results from the other tests (performed on the same patient), no correlation reached a coefficient >0.7 or <−0.7 (Table 4). The strongest correlation was between the Rankin and the SF-36, at −0.609. The remaining coefficients were between −0.5 and 0.5, with the weakest at 0.15 (GOS compared with the MMSE). These data indicate that for a given Rankin score, there was significant variability in the SF-36 scores, even though the coefficient between these 2 tests was the strongest. This is graphically noted in the Figure, in which SF-36 scores are plotted against the Rankin score. Note that even for patients who scored a 0, or complete recovery without any symptoms, there was a wide range in the SF-36 scores. In addition, many patients with the same SF-36 score had different Rankin scores ranging from 0 to 4.
Discussion
To our knowledge, this is the first study to directly compare 6 outcome instruments in a large number of patients after intracranial aneurysm treatment. There are several strengths to this study. All data were gathered prospectively, the follow-up tests were given by 1 of 2 independent graders, and very few patients were lost to follow-up. Because all 6 tests were administered to the same patient at the same time, meaningful conclusions could be drawn about the similarities, differences, and usefulness of each instrument. Each test measured a different aspect of recovery, was administered by ancillary staff, and has been widely accepted. Each required no more than a few minutes to complete. All of these are important considerations if the tests are to be used in future trials.
We report 3 main findings in this article. First, neither the Barthel Index nor the NIHSS was found to be useful for this patient population. Most survivors achieved the maximum possible score at 3 months. Even patients presenting with grade 4 SAH achieved outcome scores similar to those with unruptured aneurysms.
Second, we found that the Rankin, GOS, MMSE, and the SF-36 showed statistically significant differences in outcome for patients with different presentations. Patients with a mild subarachnoid hemorrhage had outcome scores comparable to those with unruptured aneurysms, whereas those with more severe bleeds showed progressive decreases in the average score. This is an important consideration for the use of the MMSE and SF-36. Despite the increased variability of these scores, statistically significant differences could be noted with a reasonable number of patients.
The main finding of this article is that graded scores such as the Rankin and GOS can have poor correlation with other outcome instruments such as the MMSE and the SF-36. These data clearly indicate that patients with significant cognitive dysfunction or a subjective decrease in the quality of life can be given the highest score on a graded scale.
The exclusive use of graded scales in this patient population results in the majority of the surviving patients achieving a “good” outcome score. This means that direct comparisons will be made on a minority of patients. For example, the primary end point in the ISAT trial was death or disability after subarachnoid hemorrhage (Rankin scores between 3 and 6).1 Patients with this outcome represented 23.7% of the endovascular group and 30.6% of the surgical group. In such situations, the clinical status of the majority of patients may not have been completely characterized. Based on these data, we believe that adding the MMSE and the SF-36 will measure aspects of recovery that a simple graded scale does not. Consideration of the cognitive status and the patient’s own perception of health will add considerably to data on death and disability.
This work was supported by grant k23 RR00194-01 from the National Institutes of Health (D.H.K.).
- Received July 22, 2004.
- Revision received October 19, 2004.
- Accepted December 7, 2004.
References
Qureshi AI, Hutson AD, Harbaugh RE, Stieg PE, Hopkins LN. Methods and design considerations for randomized clinical trials evaluating surgical or endovascular treatments for cerebrovascular diseases. Neurosurgery. 2004; 54: 248–267.
Duncan PW, Jorgensen HS, Wade DT. Outcome measures in acute stroke trials: a systematic review and some recommendations to improve practice. Stroke. 2000; 31: 1429–1438.
Kreiter KT, Copeland D, Bernardini GL, Bates JE, Peery S, Claassen J, Du E, Stern Y, Connolly ES, Mayer SA. Predictors of cognitive dysfunction after subarachnoid hemorrhage. Stroke. 2002; 33: 200–209.
McDowell I, Newell C. Measuring Health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press; 1996.
Collin C, Wade DT, Davies S. The Barthel ADL Index: a reliability study. Int Disabil Stud. 1988; 10: 61–63.
Brott T, Adams HP, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Herzberg V, Rorick M, Moomaw CJ, Walker M. Measurements of acute cerebral infarction: a clinical examination scale. Stroke. 1989; 20: 864–870.
Goldstein LB, Samsa GP. Reliability of the National Institutes of Health Stroke Scale: extension to non-neurologists in the context of a clinical trial. Stroke. 1997; 28: 307–310.