A Discriminative Prediction Model of Neurological Outcome for Patients Undergoing Surgery of Brain Arteriovenous Malformations
Background and Purpose— To develop and validate a discriminative model for predicting neurological morbidity after brain arteriovenous malformation (bAVM) surgery.
Methods— Of 233 consecutive, prospectively enrolled patients undergoing bAVM surgery, the first 175 were used to derive, and the last 58 to validate, the prediction model. Demographic and angiographic factors were related to modified Rankin Scale scores assigned before, within 72 hours, at 7 days and at ≥1 year after surgery to seek predictors of postoperative neurological deficits (modified Rankin Scale score ≥3). These factors included nidus size, eloquence, venous drainage, diffuseness, white matter configuration, arterial perforator supply and associated aneurysms.
Results— Brain eloquence, diffuse nidus and deep venous drainage were significant predictors of early disabling neurological deficits (odds ratios of 4.33, 3.49 and 2.38, respectively). The rounded odds ratios form a weighted 9-point prediction model (maximum scores for eloquence+diffuseness+deep drainage=4+3+2). The score discriminated the probability of experiencing both early (first week) and permanent (≥1 year) disabling neurological deficits as follows: 0 to 2: 1.8%, 3 to 5: 17.4%, 6 to 7: 31.6%, >7: 52.9% for early and 0 to 2: 1.8%, 3 to 5: 4.4%, 6 to 7: 18.4%, >7: 32.4% for permanently disabling outcomes. The discrimination of the model (ROC area) was 0.80, with 2.8% optimism. Validation in the second patient cohort showed good performance at risk stratification.
Conclusions— Relative weights assigned to brain eloquence, diffuse nidus morphology and deep venous drainage of a bAVM provide a simple and discriminative prediction model for neurological outcome after bAVM surgery.
Brain arteriovenous malformations (bAVMs) confer a lifelong risk of hemorrhagic stroke on affected individuals.1 They may also cause seizures, headaches, and progressive neurological deficits.1,2 BAVMs have diverse distributions, morphologies and angioarchitectural associations, including brain aneurysms3; thus, their clinical and morphological heterogeneity requires that treatment be individualized. When surgery is considered, grading schemes are used to help predict the associated risks.4–6 The most frequently used is the Spetzler-Martin (SM) grading system,4 in which ordinal scores are assigned to 3 variables: bAVM size, eloquence of location, and venous drainage pattern. Such grading systems facilitate the stratification of bAVM characteristics between patients and the comparison of clinical outcomes between institutions.4–8 However, their simplicity comes at the price of reduced discrimination between grades in the prediction of postoperative neurological outcomes.8 For example, different patients harboring an SM grade III bAVM may be deemed to have different surgical risks owing to features the grading system does not account for, and this has led some authors to propose adjustments to the SM scale.8
Our goal was to develop and prospectively validate a prediction model for neurological outcome after bAVM surgery that is both simple and discriminative. Thus, in addition to the SM variables above, we studied the influence of additional angioarchitectural features of bAVMs believed to affect surgical morbidity.7–12 From this analysis, we generated a simple, discriminative model for predicting neurological outcome after bAVM surgery and validated it both internally and prospectively.
Materials and Methods
The University of Toronto Brain AVM Study Group maintains an ongoing, prospective database of demographic, clinical, morphological and treatment data on all bAVM patients seen at our institution. Between 1990 and 2005, 233 of a total of 1058 bAVM patients underwent surgery at our hospital. We used the first 175 consecutive patients (derivation cohort) to develop and internally validate the model, and data from the subsequent 58 consecutive patients (validation cohort) to validate it prospectively. This 3:1 size ratio between the derivation and validation cohorts is generally accepted as appropriate for prospective validation of a model.13
For model derivation, all morphological bAVM features were assessed from magnetic resonance images (MRI) and conventional cerebral angiograms by a neuroradiologist (K.G.T.) blinded to all patient data. Three variables were taken from the SM system: nidus size, eloquence of the surrounding brain and deep venous drainage. The additional variables were diffuse nidus morphology, deep white matter configuration, deep arterial supply and associated aneurysms. All predictors were recorded as binary variables. Size was dichotomized into <3 cm or ≥3 cm based on MRI measurements (only 2 patients with bAVMs >6 cm underwent surgery). Interobserver variability for key measures was determined between 2 neuroradiologists (K.G.T., R.A.W.) and between a neuroradiologist (K.G.T.) and a neurosurgeon (M.T.).
For scoring consistency, the physicians assessing the bAVM angioarchitectural variables were required to use only the criteria contained in the following operational definitions:
Eloquence: as in the SM system,4 a nidus located in sensorimotor, language or visual cortex, thalamus, hypothalamus, internal capsule, brain stem, cerebellar peduncles or deep cerebellar nuclei.
Deep drainage: any drainage of the nidus into the deep venous system.
Deep arterial supply: arteries penetrating the bAVM through the brain parenchyma rather than from the cortical surface.
Deep white matter configuration: a nidus located in the deep white matter tracts that does not exhibit the classic wedge-shaped, corticoventricular configuration.
Associated aneurysms: prenidal, intranidal or postnidal aneurysms. Aneurysms on arteries remote from the bAVM were excluded from analysis.
Diffuse nidus morphology: a nidus in which intervening brain parenchyma was visualized by MRI or was anticipated on cerebral angiography (Figure 1). This contrasts with a compact nidus, which exhibits clearly defined margins and no intervening brain parenchyma (Figure 2).
The primary outcome measures were the occurrence of new neurological deficits (within 72 hours of surgery) and early disabling neurological outcomes (at 7 days). A secondary outcome measure was a permanently disabling neurological outcome at 1 year or more. New neurological deficits were defined as any new motor (eg, monoparesis, hemiplegia), speech (eg, dysphasia) or visual deficit. Early and permanently disabling neurological outcomes were scored using the modified Rankin Scale (mRS) and the Glasgow Outcome Scale (GOS). They were defined as a GOS ≤3 or an increase in mRS of ≥3 points when the preoperative mRS was 0, of ≥2 points when the preoperative mRS was 1, or of ≥1 point when the preoperative mRS was 2, 3, 4 or 5. Pre-existing neurological disabilities were accounted for by recording the change between pre- and postoperative mRS. However, neurological morbidity incurred from previous treatments (ie, embolization or radiation) was incorporated into the reported neurological outcomes. All outcomes were evaluated by a physician blinded to the patient's imaging and prior clinical data. For patients lost to follow-up, the last recorded neurological examination was taken as the long-term outcome. The last follow-up was in person (86%), by telephone (12%) or via the patient's family physician (2%).
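The disabling-outcome rule depends on both the preoperative mRS and the size of the postoperative rise. The following Python sketch is illustrative only: the function names and the rejection of out-of-range scores are assumptions, not part of the study protocol. It applies each scale's criterion separately, since the Results report disabling outcomes by GOS and by mRS as separate counts.

```python
# Hypothetical helper functions; thresholds follow the outcome definitions in the Methods.

# Minimum postoperative rise in mRS that counts as disabling, keyed by preoperative mRS.
MRS_RISE_THRESHOLD = {0: 3, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1}


def disabling_by_mrs(pre_mrs: int, post_mrs: int) -> bool:
    """Return True if the rise in mRS meets the threshold for the preoperative score."""
    if pre_mrs not in MRS_RISE_THRESHOLD or not 0 <= post_mrs <= 6:
        raise ValueError("expected preoperative mRS 0-5 and postoperative mRS 0-6")
    return post_mrs - pre_mrs >= MRS_RISE_THRESHOLD[pre_mrs]


def disabling_by_gos(post_gos: int) -> bool:
    """Return True if the Glasgow Outcome Scale score is 3 or lower."""
    if not 1 <= post_gos <= 5:
        raise ValueError("expected a GOS between 1 and 5")
    return post_gos <= 3


print(disabling_by_mrs(0, 3))  # True: a rise of 3 from a baseline of 0
print(disabling_by_mrs(1, 2))  # False: a rise of 1 is below the threshold of 2 for baseline 1
print(disabling_by_mrs(2, 3))  # True: a rise of 1 suffices from a baseline of 2
print(disabling_by_gos(4))     # False
```

For every preoperative score, meeting the mRS threshold implies a postoperative mRS of at least 3, which matches the mRS ≥3 criterion quoted in the Abstract.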
Unless otherwise indicated, analyses focused on clinical outcomes scored by mRS. Univariable analysis was used to determine the significance of each variable in predicting a new neurological deficit, an early disabling outcome and a permanently disabling neurological outcome. All variables related to early disabling neurological outcome in univariable screens (P<0.10) were entered into a forward, stepwise logistic regression model with entry and elimination thresholds set at P<0.01. The odds ratio for each retained predictor was rounded to the nearest integer to create a weighting that reflects the variable's relative importance.14 From these weights we derived a model with 4 risk strata for predicting neurological outcome after surgery.
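As an arithmetic illustration of the weighting step, the following Python sketch takes the adjusted odds ratios reported in the Abstract, rounds each one to an integer weight, and enumerates every total the resulting score can take. The variable names are hypothetical.

```python
from itertools import product

# Adjusted odds ratios for early disabling outcome, as reported for the 3 retained predictors.
odds_ratios = {"eloquence": 4.33, "diffuse_nidus": 3.49, "deep_venous_drainage": 2.38}

# Round each odds ratio to the nearest integer to obtain its point weight.
weights = {name: round(ratio) for name, ratio in odds_ratios.items()}
print(weights)  # {'eloquence': 4, 'diffuse_nidus': 3, 'deep_venous_drainage': 2}

# Enumerate every combination of present/absent predictors to list the achievable totals.
totals = sorted(
    {
        sum(w for w, present in zip(weights.values(), flags) if present)
        for flags in product([False, True], repeat=len(weights))
    }
)
print(totals)  # [0, 2, 3, 4, 5, 6, 7, 9]
```

Totals of 1 and 8 cannot occur. The ">7" stratum therefore contains only patients scoring 9, that is, those with all 3 features.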
Prediction Model Evaluation
Discrimination is the model's ability to separate patients who will incur neurological morbidity after surgery from those who will not. It was determined by calculating the area under the receiver operating characteristic curve (ROC area) for each outcome (new neurological deficit, early disabling outcome and permanently disabling outcome). An ROC area of 0.5 indicates no discrimination, whereas an ROC area of 1.0 indicates perfect discrimination. Calibration was assessed by comparing predicted with observed outcomes and by computing the Hosmer-Lemeshow goodness-of-fit statistic.
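The ROC area has an equivalent probabilistic reading. It is the probability that a randomly chosen patient who incurred the outcome has a higher score than a randomly chosen patient who did not, with ties counted as one half. The Python sketch below computes it that way. The scores and outcomes are made-up toy values rather than study data, and the function name is hypothetical.

```python
from collections.abc import Sequence


def roc_area(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance (ROC area): P(score of a case > score of a non-case), ties count 1/2."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Toy example: risk scores for 6 hypothetical patients and whether each had the outcome.
toy_scores = [0, 2, 3, 5, 7, 9]
toy_outcomes = [0, 0, 1, 0, 1, 1]
print(round(roc_area(toy_scores, toy_outcomes), 3))  # 0.889 (8 of 9 case/non-case pairs concordant)
```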
Prospective validation was performed by applying the model to 58 subsequently treated consecutive patients and observing the trend in the model’s predictive accuracy for each neurological outcome.
Patient and bAVM Characteristics
Demographic and bAVM characteristics of the patient cohort are presented in Table 1. The age and sex distributions in the derivation and validation cohorts are consistent with previous reports.1 Of the 233 patients (derivation cohort n=175, validation cohort n=58), 38.6% underwent at least 1 embolization as part of their treatment, 5.1% had radiation and 4.3% had both embolization and radiation in addition to surgery. In selected cases (1.7%), surgery was performed on patients with progressive neurological deficits (Table 1). The proportions of left- and right-sided bAVMs, and of bAVMs in eloquent brain, were comparable in the derivation and validation cohorts.
Interobserver agreement (κ statistic) for key angioarchitectural variables, SM grade and the proposed model risk category was measured both between 2 neuroradiologists (K.G.T., R.A.W.) and between a neuroradiologist (K.G.T.) and a neurosurgeon (M.T.; Table 2). Notably, SM grade and our proposed score showed similar degrees of agreement, with κ values consistent with previous reports.15,16 Moreover, agreement on nidus diffuseness, a significant parameter in our model (see below), was as good as or better than that for the other parameters, including the SM variables. By convention, interobserver agreement is considered poor for κ between 0 and 0.20, fair from 0.21 to 0.40, moderate from 0.41 to 0.60, substantial from 0.61 to 0.80, and near perfect from 0.81 to 1.00.17
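κ compares the observed agreement between 2 raters with the agreement expected by chance from each rater's marginal frequencies. The Python sketch below assumes unweighted Cohen's κ, which the text does not specify. It applies the verbal bands quoted above to hypothetical diffuseness ratings; the study's actual κ values are those in Table 2, and the function names are illustrative.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: agreement between 2 raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa: float) -> str:
    """Map kappa onto the verbal bands quoted above (negative values are lumped into 'poor')."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "near perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical diffuse (1) / compact (0) ratings of 10 bAVMs by 2 readers.
reader_1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
reader_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
k = cohens_kappa(reader_1, reader_2)
print(round(k, 3), interpret_kappa(k))  # 0.583 moderate
```

Here the readers agree on 8 of 10 lesions (0.80), but chance alone would produce 0.52 agreement, so κ=(0.80−0.52)/(1−0.52)≈0.58.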
Sex, age and preoperative embolization had no effect on the occurrence of new neurological deficits or of early or permanently disabling neurological outcomes (supplemental Table Ia, available online at http://stroke.ahajournals.org). These variables were therefore excluded from further multivariable analysis.
New Neurological Deficits (within 72 hours)
A new neurological deficit was detected in 69 of 175 patients in the derivation cohort (39.4%). All angioarchitectural variables, except for associated aneurysms, were related to new neurological deficits on univariable analyses (supplemental Table Ib), but on multivariable modeling only eloquent brain location and diffuse nidus morphology were predictive (Table 3).
Early Disabling Neurological Outcome (within 7 days)
Early disabling outcomes occurred in 24 (13.7%) or 39 (22.3%) of 175 patients as assessed using GOS and mRS, respectively, suggesting a higher sensitivity of the mRS in scoring neurological morbidity. mRS scores were used in all subsequent tests. In univariable analyses, bAVM size, eloquence, deep venous drainage, diffuse nidus and an arterial perforator supply were significant (supplemental Table Ib). On multivariable analysis, the number of independent predictors was reduced to only 3: eloquent nidus location, diffuse nidus morphology, and deep venous drainage (Table 3).
Permanently Disabling Neurological Outcomes (≥1 year)
These occurred in 10 (5.7%) or 21 (12.0%) of 175 patients as assessed using GOS and mRS, respectively. Thirty-day mortality was 1.5% (n=2): one death was attributable to a postoperative hemorrhage and the other to a pulmonary embolus. One-year mortality was 2.9% (6 patients). Of the 4 additional deaths, 1 was attributable to a brain hemorrhage and 3 to medical complications of pre-existing neurological disabilities. Additional details are provided in supplemental Table II (available online at http://stroke.ahajournals.org).
The prediction model was derived using data for early disabling neurological outcomes, because too few patients experienced permanently disabling outcomes to provide the requisite statistical power. It was built from the 3 significant predictor variables in the multivariable analysis: eloquence, diffuse nidus and deep venous drainage. The derived odds ratios (Table 3), rounded to the nearest integer, form a 9-point stratified risk score in which each predictor contributes according to its relative weight (eloquence=4, diffuse nidus=3, deep venous drainage=2). When applied to the derivation cohort, this 9-point score discriminates the probability of incurring an early disabling neurological outcome as follows: Low Risk (0 to 2 points)=1.8%, Moderate Risk (3 to 5 points)=17.4%, High Risk (6 to 7 points)=31.6%, Very High Risk (>7 points)=52.9% (Table 4a). In comparison, applying the SM system to the same data produced the following: SM grade I: 2.1%, II: 9.4%, III: 17.3%, IV: 39.1%. There were no grade V patients.
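To make the scoring rule concrete, here is a minimal Python sketch of how a patient would be scored and assigned to a risk stratum. The function and variable names are hypothetical. The weights, stratum cut-points and derivation-cohort rates come from the text. The rates are observed proportions in the derivation cohort, not individually validated probabilities.

```python
# Hypothetical implementation of the 9-point score described above.
WEIGHTS = {"eloquence": 4, "diffuse_nidus": 3, "deep_venous_drainage": 2}

# Observed early disabling outcome rates by stratum in the derivation cohort (Table 4a).
DERIVATION_EARLY_DISABLING_RATE = {
    "low": 0.018, "moderate": 0.174, "high": 0.316, "very high": 0.529,
}


def risk_score(eloquent: bool, diffuse: bool, deep_drainage: bool) -> int:
    """Sum the integer weights of the features present (range 0-9)."""
    return (WEIGHTS["eloquence"] * int(eloquent)
            + WEIGHTS["diffuse_nidus"] * int(diffuse)
            + WEIGHTS["deep_venous_drainage"] * int(deep_drainage))


def risk_stratum(score: int) -> str:
    """Map a total score onto the 4 risk strata: 0-2, 3-5, 6-7, >7."""
    if score <= 2:
        return "low"
    if score <= 5:
        return "moderate"
    if score <= 7:
        return "high"
    return "very high"


# Example: an eloquent, compact bAVM with deep venous drainage.
score = risk_score(eloquent=True, diffuse=False, deep_drainage=True)
stratum = risk_stratum(score)
print(score, stratum, DERIVATION_EARLY_DISABLING_RATE[stratum])  # 6 high 0.316
```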
Prediction of New Deficits and Permanently Disabling Outcomes
When the 9-point model was applied to predicting new neurological deficits (NND) and permanently disabling neurological outcomes (PD), it retained its ability to stratify patients into 4 risk categories (Table 4a), further demonstrating its utility. By contrast, applying the SM system to the same PD outcome suggested a loss of discrimination in predicting disabilities in the lower grades (SM grade I: 2.1%, II: 5.7%, III: 1.9%, IV: 21.7%).
Measures of Discrimination
The discrimination of the model (ROC area; Methods) was determined to test its ability to correctly classify patients according to whether they would incur neurological morbidity. The model performed well for predicting early disabling neurological outcome (ED), with an ROC area of 0.80 using mRS and 0.82 using GOS. For comparison, the SM system had ROC areas of 0.76 (mRS) and 0.74 (GOS) in our derivation cohort. The discrimination of our proposed model for predicting NND and PD was also high, with ROC areas of 0.75 and 0.79, respectively, using mRS. By comparison, the ROC area for PD with the SM system was 0.69, suggesting a loss of discrimination in predicting permanent deficits. An ROC area of ≥0.70 is considered clinically useful.18 For example, a well-respected model for predicting adverse outcomes after coronary artery bypass grafting had an ROC area of 0.73.19
Internal Model Validation
Optimism is the degree to which a prediction model overstates its predictive ability when it is evaluated on the data used to derive it. Optimism was estimated by calculating the ROC area in each of 1000 bootstrap resamples of the data set and taking the difference between the ROC area of the derivation data set and the mean ROC area of all bootstrap samples. Optimism was 2.8%. Calibration of the model for predicting early disabling outcomes (Table 4a) showed that it slightly overestimated the risk in the lowest and highest strata and underestimated it in the 2 intermediate categories. The Hosmer-Lemeshow statistic was 3.69 (P=0.45), indicating satisfactory goodness-of-fit.
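The optimism estimate can be sketched in Python under an explicit assumption. The code follows a literal reading of the procedure described above: the fixed score is re-evaluated on each bootstrap resample, and optimism is the apparent ROC area minus the mean bootstrap ROC area, returned as an absolute difference. It is not necessarily the authors' exact implementation. It also differs from the more common approach of refitting the model in every resample and testing it on the original data. The function names and toy data are hypothetical.

```python
import random
from statistics import mean


def roc_area(scores, outcomes):
    """ROC area as a concordance probability; ties count one half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need patients with and without the outcome")
    pairs = sum(1.0 if c > n else 0.5 if c == n else 0.0
                for c in cases for n in non_cases)
    return pairs / (len(cases) * len(non_cases))


def bootstrap_optimism(scores, outcomes, n_boot=1000, seed=0):
    """Apparent ROC area minus the mean ROC area over bootstrap resamples (0/1 outcomes)."""
    rng = random.Random(seed)
    n = len(scores)
    apparent = roc_area(scores, outcomes)  # raises if only one outcome class is present
    boot_areas = []
    while len(boot_areas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking either outcome class
            boot_areas.append(roc_area([scores[i] for i in idx], ys))
    return apparent - mean(boot_areas)


# Toy data: scores and outcomes for 12 hypothetical patients (not study data).
toy_scores = [0, 0, 2, 2, 3, 4, 5, 6, 7, 7, 9, 9]
toy_outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1]
print(round(bootstrap_optimism(toy_scores, toy_outcomes), 4))
```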
Prospective Model Validation
After model derivation, prospective validation was undertaken in the subsequent 58 consecutive bAVM patients. An NND was detected in 20 of 58 patients (34.5% versus 39.4% in the derivation cohort). Early disabling and permanently disabling outcomes occurred in 10 patients (17.2% versus 22.3% in the derivation cohort) and 2 patients (3.4% versus 12.0% in the derivation cohort), respectively. When applied prospectively to the validation cohort, the model retained its ability to stratify patients into defined risk categories for the incidence of NNDs (0 to 2: 15.6%, 3 to 5: 53.3%, 6 to 7: 60.0%, >7: 100%), ED outcome (0 to 2: 9.3%, 3 to 5: 26.6%, 6 to 7: 30.0%, >7: 0.0%) and PD outcome (0 to 2: 0.0%, 3 to 5: 0.0%, 6 to 7: 20.0%, >7: 0.0%) (Table 4b). The model slightly underestimated the incidence of new neurological deficits and early poor outcomes in the validation cohort, but it still showed increasing morbidity with increasing risk category.
A prediction model must balance simplicity and clinical utility. Such utility requires the model to be sufficiently discriminative to separate different types of bAVMs into meaningful risk categories. For the SM system, its authors reported permanently disabling neurological outcomes 6 months postoperatively of 0% for grades I–III, and of 21% and 17% for grades IV and V, respectively.5 In another study, bAVM patients graded by the SM system had a disabling neurological outcome at hospital discharge of 8%, 17%, 11%, 19% and 67% for grades I–V, respectively.7 At 1 year, these figures were 8%, 6%, 4%, 4% and 33% for grades I–V. Thus, the SM system includes parameters that have predictive value for outcome,4,5,7,11,20 but it stratifies patients into essentially 2 categories: low- and high-risk groups. Our data reflect a similar experience: patients in SM grades I–III had a low risk, and those in SM grade IV a high risk, of incurring a PD outcome. Our work indicates that discrimination may be enhanced by incorporating 3 new aspects. The first is recognition of nidus diffuseness as a key predictor variable. The second is assignment of relative weights to key predictor variables according to their importance. The third is grading of outcome using the mRS, a scale commonly used to measure stroke outcomes that has high statistical power.21
The concept of nidus diffuseness has been discussed previously,9,10,12,15,16 but it has not been analyzed in detail or included in a surgical scoring system. Chin et al,10 studying children, correlated the angiographic appearance of diffuse bAVMs with histopathological specimens and demonstrated normal neurons and white matter interspersed between AVM vessels. This morphology was associated with higher surgical morbidity.10 Here, we show that assessing diffuseness is practical when a strict operational definition is applied (a nidus containing intervening brain parenchyma; Methods). Moreover, both neuroradiologists and neurosurgeons interpret this variable with a substantial degree of agreement (κ=0.62 and 0.67, respectively; Table 2), comparable to that for the SM variables (Table 2).
An arterial perforator supply has been described as an additional challenge to bAVM surgery. In 1 study, patients with bAVMs fed primarily by the middle cerebral artery with a lenticulostriate supply had a worse neurological outcome than those without.20 Though our univariable analysis suggested a relationship between an arterial perforator supply and a worse outcome, this was not significant in the multivariable analysis. Deep arterial supply may interact with other variables and be accounted for by its association with these other predictors, such as deep venous drainage. A similar explanation may apply to deep white matter configuration.
Nidus size (≥3 cm) was not predictive of an early disabling neurological deficit on multivariable testing, whereas it was important in other classification schemes.4,5,7 As the bAVM size increases, it is more likely to impinge on eloquent brain, have deep venous drainage and have a diffuse component. Thus, these 3 independent predictors likely account for the surgical risks associated with larger bAVMs. However, we cannot exclude the possibility that very large nidus size (>6 cm) could be independently related to surgical risk because only 2 patients in our study had such bAVMs.
Lastly, bAVMs with associated aneurysms have been reported to have a higher hemorrhage rate,3,22,23 and this has motivated some surgeons to operate in order to remove the AVM. Because associated aneurysms were unrelated to adverse surgical outcomes, this factor need not influence treatment decisions in bAVM patients unless, of course, the aneurysm is the cause of a brain hemorrhage.
The overall clinical results highlight the safety of surgical extirpation of appropriately selected bAVMs. Although the absolute risks associated with each risk stratum in our model may be most applicable to neurosurgical centers with substantial experience in bAVM surgery, we believe that the proposed model is generalizable. Prospective application of the model to the validation cohort demonstrated satisfactory stratification for neurological outcome after surgery in a second cohort. This model will still require validation at another institution; nevertheless, our results provide a simple, discriminative risk stratification scale that has been both internally and prospectively validated. The proposed model is meant to aid physicians who are contemplating surgical resection in the treatment of bAVM patients.
J.S. is a recipient of a Heart and Stroke Foundation of Canada and Merck Frosst Canada research fellowship. M.M. is a recipient of an Ontario Heart and Stroke Foundation John D. Schultz Science Student Scholarship award. M.T. is a Clinician-Scientist (Phase II) of the Canadian Institutes of Health Research. The authors wish to acknowledge Dr Fran Cook, Harvard School of Public Health, for his assistance in reviewing the manuscript.
- Received February 25, 2006.
- Revision received March 25, 2006.
- Accepted March 28, 2006.
Al-Yamani M, TerBrugge K, Willinsky R, Montanera W, Tymianski M, Wallace MC. Palliative embolisation of brain arteriovenous malformations presenting with progressive neurological deficit. Interventional Neuroradiology. 2000; 6: 177–183.
Hartmann A, Stapf C, Hofmeister C, Mohr JP, Sciacca RR, Stein BM, Faulstich A, Mast H. Determinants of neurological outcome after surgery for brain arteriovenous malformation. Stroke. 2000; 31: 2361–2364.
Batjer HH. (Comments). Neurosurgery. 1992; 31: 869.
Martin NA, Vinters HV. Arteriovenous Malformations. In: Carter LP, Spetzler RF, Hamilton MG, eds. Neurological Surgery. New York, NY: McGraw-Hill; 1995; 875–903.
Spetzler RF, Anson JA. (Comments). Neurosurgery. 1992; 31: 868.
Baggish AL, Siebert U, Lainchbury JG, Cameron R, Anwaruddin S, Chen A, Krauser DG, Tung R, Brown DF, Richards AM, Januzzi JL. A validated clinical and biochemical score for the diagnosis of acute heart failure: The ProBNP investigation of dyspnea in the emergency department (PRIDE) acute heart failure score. Am Heart J. 2006; 151: 48–54.
Al-Shahi R, Pal N, Lewis SC, Bhattacharya JJ, Sellar RJ, Warlow CP. Observer agreement in the angiographic assessment of arteriovenous malformations of the brain. Stroke. 2002; 33: 1501–1508.
Swets JA. Measuring the accuracy of diagnostic systems. Science. 1988; 240: 1285–1293.
Young FB, Lees KR, Weir CJ. Strengthening acute stroke trials through optimal use of disability endpoints. Stroke. 2003; 34: 2676–2680.
Da Costa LB, Wallace MC, TerBrugge K, Willinsky R, Tymianski M. A single-center, prospective analysis of the natural history of hemorrhage from brain arteriovenous malformations with or without associated aneurysms. Neurosurgery. 2005; 57: 396–397. Abstract.