Patient Phenotypes Associated With Outcomes After Aneurysmal Subarachnoid Hemorrhage
A Principal Component Analysis
Background and Purpose—Predictors of outcome after aneurysmal subarachnoid hemorrhage have been determined previously through hypothesis-driven methods that often exclude putative covariates and require a priori knowledge of potential confounders. Here, we apply a data-driven approach, principal component analysis, to identify baseline patient phenotypes that may predict neurological outcomes.
Methods—Principal component analysis was performed on 120 subjects enrolled in a prospective randomized trial of clazosentan for the prevention of angiographic vasospasm. Correlation matrices were created using a combination of Pearson, polyserial, and polychoric regressions among 46 variables. Scores of significant components (with eigenvalues >1) were included in multivariate logistic regression models with incidence of severe angiographic vasospasm, delayed ischemic neurological deficit, and long-term outcome as outcomes of interest.
Results—Sixteen significant principal components accounting for 74.6% of the variance were identified. A single component dominated by the patients’ initial hemodynamic status, World Federation of Neurosurgical Societies score, neurological injury, and initial neutrophil/leukocyte counts was significantly associated with poor outcome. Two additional components were associated with angiographic vasospasm, of which one was also associated with delayed ischemic neurological deficit. The first was dominated by the aneurysm-securing procedure, subarachnoid clot clearance, and intracerebral hemorrhage, whereas the second had high contributions from markers of anemia and albumin levels.
Conclusions—Principal component analysis, a data-driven approach, identified patient phenotypes that are associated with worse neurological outcomes. Such data reduction methods may provide a better approximation of unique patient phenotypes and may inform clinical care as well as patient recruitment into clinical trials.
- coronary vasospasm
- outcome assessment (health care)
- principal component analysis
- subarachnoid hemorrhage
Patients with ruptured intracranial aneurysms represent a heterogeneous population. Considerable effort has been devoted to identifying clinical features on presentation that may predict a protracted subsequent disease course, namely the incidence of angiographic vasospasm, delayed ischemic neurological deficit (DIND), and poor long-term neurological outcomes.1 Increasingly, studies are using multivariate statistical models to evaluate features that may be independently associated with outcomes. Although useful, such approaches are limited by the need for a priori hypotheses on potential associations, exclusion of potentially useful variables because of restrictions in model building, and inability to include highly collinear covariates in the same model. Importantly, the intrinsic correlations among variables are also rarely studied, prompting questions as to whether uncorrelated linear combinations of individual predictors may more accurately reflect specific patient phenotypes.
Attempts to systematically identify, summarize, and prioritize the vast amount of heterogeneity in clinical presentations would facilitate understanding of the disease process and provide insight into specific phenotypes associated with a protracted disease course. Principal component analysis (PCA) offers an attractive alternative to hypothesis-driven models of outcome prediction. PCA is a dimensionality reduction technique using a linear transformation applied on multidimensional data. The original dimensions (input variables) are transformed into a new coordinate system in which the new coordinate axes (orthogonal principal components) contain the greatest variance.
The current study attempts to categorize multiple variables collected on presentation after subarachnoid hemorrhage (SAH) into clinically meaningful patient phenotypes that are associated with subsequent clinical course. Using PCA, we transform 46 variables from a large database derived from the Clazosentan to Overcome Neurological iSChemia and Infarction Occurring after Subarachnoid hemorrhage (CONSCIOUS-1) trial into uncorrelated linear combinations (orthogonal principal components). The association between principal component scores derived from this data-driven approach and angiographic vasospasm, DIND, and neurological outcome was then evaluated.
We performed a post hoc analysis of 413 subjects enrolled in the CONSCIOUS-1 trial, a prospective, randomized, double-blinded phase IIb trial evaluating the efficacy of clazosentan in preventing angiographic vasospasm.2 Subjects were enrolled between January 2005 and March 2006. The methods and results have been previously published.2 From this database, a subset of 120 subjects with complete data for all variables analyzed was extracted.
All patients recruited to the CONSCIOUS-1 trial had SAH confirmed on computed tomography. Historical data collected included the subjects’ ages and sex and whether they had a history of hypertension or nicotine use. The initial systolic blood pressure, mean arterial pressure, and heart rate were also collected. The severity of the subjects’ presenting symptoms was classified on the basis of the World Federation of Neurosurgical Societies (WFNS) scale.3 The patients underwent microsurgical clipping or endovascular coiling to treat the ruptured aneurysm. The choice of procedure was at the discretion of the treating physician.
All patients underwent computed tomography on presentation to the respective institutions. The subarachnoid clot burden was quantified using the Hijdra scale,4 which evaluates the amount of clot in 10 fissures and cisterns using a scoring system as follows: 0 (no blood), 1 (small amount of blood), 2 (moderately filled with blood), or 3 (completely filled with blood), for a total score ranging from 0 to 30. The change in Hijdra score between the baseline computed tomography and one performed after the aneurysm-securing procedure was defined as clot clearance. Intraventricular hemorrhage was quantified using a modification of the Graeb score,5,6 whereby a score of 0 (no blood), 1 (sedimentation, <25% filled), 2 (moderately filled), or 3 (completely filled) was given to each ventricle, for a maximum possible score of 12. The frequency of intracerebral hemorrhage and subdural hematoma was also documented. Hydrocephalus was evaluated as the ventriculocranial ratio, the ratio of the width of the frontal horns of the lateral ventricles at the level of the foramen of Monro to the distance between the inner tables of the skull on the same computed tomographic scan slice.7
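The scoring rules described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the assumption that four ventricles are scored is inferred from the stated maximum of 12, and the sign convention for clot clearance (baseline minus follow-up) is an assumption that the text does not specify.

```python
def _sum_grades(grades, expected_count):
    """Validate a list of 0-3 grades and return their sum."""
    if len(grades) != expected_count:
        raise ValueError(f"expected {expected_count} grades, got {len(grades)}")
    if any(g not in (0, 1, 2, 3) for g in grades):
        raise ValueError("each grade must be 0, 1, 2, or 3")
    return sum(grades)


def hijdra_score(grades):
    """Subarachnoid clot burden: 10 fissures/cisterns, each 0-3 (total 0-30)."""
    return _sum_grades(grades, 10)


def modified_graeb_score(grades):
    """Intraventricular clot burden: assumed 4 ventricles, each 0-3 (total 0-12)."""
    return _sum_grades(grades, 4)


def clot_clearance(baseline_grades, follow_up_grades):
    """Change in Hijdra score; assumed here to be baseline minus follow-up."""
    return hijdra_score(baseline_grades) - hijdra_score(follow_up_grades)


# Hypothetical patient: grades on the baseline and post-procedure scans.
baseline = [3, 3, 2, 2, 2, 1, 1, 1, 0, 0]
follow_up = [2, 2, 1, 1, 1, 1, 0, 0, 0, 0]

print(hijdra_score(baseline))              # 15
print(clot_clearance(baseline, follow_up))  # 7
print(modified_graeb_score([1, 2, 0, 3]))   # 6
```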
All subjects also underwent catheter digital subtraction angiography within 48 hours after aneurysm rupture. The aneurysm location and aneurysm size were evaluated using digital subtraction angiography.
Numerous laboratory investigations were also recorded on presentation. These included a complete blood count, extended electrolytes, renal and liver function tests, and markers of coagulation. Calcium concentrations were corrected for albumin levels, where corrected calcium was defined as follows: measured calcium [mmol/L] + 0.02 × (40 − serum albumin [g/L]), where 40 represents the average albumin concentration in g/L. The remainder of the laboratory tests were recorded and analyzed as continuous variables in their respective standard units.
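As a worked example of the albumin correction, the following minimal Python sketch applies the formula to one set of hypothetical laboratory values. The function name and the example values are illustrative and are not taken from the trial data.

```python
def corrected_calcium(measured_calcium_mmol_l, albumin_g_l, reference_albumin_g_l=40.0):
    """Albumin-corrected calcium in mmol/L.

    corrected = measured + 0.02 * (reference albumin - serum albumin)
    """
    return measured_calcium_mmol_l + 0.02 * (reference_albumin_g_l - albumin_g_l)


# Hypothetical patient: measured calcium 2.10 mmol/L, serum albumin 30 g/L.
# The correction adds 0.02 * (40 - 30) = 0.20 mmol/L.
example = corrected_calcium(2.10, 30.0)
print(round(example, 2))  # 2.3
```

When serum albumin equals the reference value of 40 g/L, the correction term vanishes and the measured value is returned unchanged.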
Principal Component Analysis
To perform PCA, the data set (with no missing entries) was organized into an n-by-m matrix, with n rows indicating observations (subjects) and m columns representing 46 baseline variables (dimensions). Because the range of numeric values of the different variables differed considerably, data centering was performed such that each of the dimensions had a mean of 0. To center the dimensions representing laboratory values, we derived a z score representing the deviation of the tests from normal ranges published by the Canadian Medical Council. This approach is more desirable than averaging within the sample distribution because it places greater weight on values that deviate significantly from normal ranges. To center the means of other dimensions (such as subarachnoid and intraventricular clot burden), a z score was derived from the sample distribution.
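The two centering strategies can be contrasted with a short sketch. The text does not state how the z score was derived from a published normal range, so the version below makes a loud assumption: the midpoint of the range is treated as the mean and one quarter of its width as the SD (ie, the range is taken to span roughly ±2 SD). The reference limits and sample values are illustrative, not those used in the study.

```python
from statistics import mean, stdev


def reference_range_z(value, lower, upper):
    """z score relative to a normal range.

    ASSUMPTION: range midpoint = mean, range width / 4 = SD.
    The study's exact derivation is not described.
    """
    center = (lower + upper) / 2
    spread = (upper - lower) / 4
    return (value - center) / spread


def sample_z(values):
    """z scores relative to the sample's own mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Laboratory value scored against an illustrative normal range (120-160 g/L).
hemoglobin_z = reference_range_z(110.0, 120.0, 160.0)

# Non-laboratory dimension (eg, Hijdra score) scored against the sample itself.
hijdra_z = sample_z([4, 12, 18, 25, 9, 30])

print(hemoglobin_z)  # -3.0
```

Under this assumption, a cohort whose laboratory values are shifted away from normal keeps a nonzero mean z score, which is how the reference-range approach places more weight on abnormal values than within-sample centering would.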
An m-by-m correlation matrix was then calculated from the centered data set. Data scaling by standardizing the covariance matrix was performed to allow the variables to have unit variances. This prevents certain features from dominating the analysis because of their large numeric values. The heterogeneous correlation matrix consisted of Pearson product–moment correlations between numeric variables, polyserial correlations between numeric and ordinal variables, and polychoric correlations between ordinal variables.8,9 The matrix was calculated using the hetcor function of the polycor package of R statistical software.
The correlation matrix underwent eigenvalue decomposition to obtain its eigenvectors and corresponding eigenvalues. The latter represents the amount of variance (or correlation because of the standardized input structure) captured by the various principal components, whereas the former represents the contribution of each original dimension to the principal component. The eigenvectors were then sorted by variances (eigenvalues) in decreasing order. Principal components with an eigenvalue >1 were considered significant. Finally, the original data set was transformed using the eigenvectors as weighting coefficients to obtain principal component scores.
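The decomposition steps described above can be sketched with NumPy. This is not the authors' MATLAB code: it uses synthetic data, and it substitutes a plain Pearson correlation matrix for the heterogeneous (Pearson, polyserial, polychoric) matrix used in the study. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the n-by-m data matrix: 120 subjects and 6 variables,
# driven by two hidden factors (three variables each) plus noise.
n_subjects = 120
factors = rng.standard_normal((n_subjects, 2))
X = np.repeat(factors, 3, axis=1) + 0.5 * rng.standard_normal((n_subjects, 6))

# Center and scale each column to mean 0 and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# m-by-m correlation matrix (Pearson only in this sketch).
R = np.corrcoef(Z, rowvar=False)

# Eigenvalue decomposition of the symmetric correlation matrix.
eigvals, eigvecs = np.linalg.eigh(R)

# Sort components by variance (eigenvalue) in decreasing order.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Retain components with an eigenvalue > 1.
keep = eigvals > 1.0

# Principal component scores: data weighted by the retained eigenvectors.
scores = Z @ eigvecs[:, keep]

# Fraction of total variance captured by each component.
explained = eigvals / eigvals.sum()
print(int(keep.sum()), np.round(explained[keep].sum(), 3))
```

Because the eigenvectors are orthogonal, the resulting score columns are uncorrelated with one another, and the variance of each score column equals its eigenvalue.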
To illustrate the relationship between the variables in the space of selected components, biplots were created. These plots simultaneously show the loadings (the contributions of each original dimension to the respective principal component) and the principal component scores. The former are shown as a vector and the latter are shown as points on the Cartesian plane. The vectors of variables with low loadings are drawn in gray, and those contributing strongly to the components are drawn in red, with increasing line width indicative of a greater contribution. PCA and biplot plotting were performed using custom code in MATLAB software (Natick, MA).
Outcomes and Statistical Analysis
The scores of significant principal components (with an eigenvalue >1) were included as independent variables in a multivariate logistic regression. Three primary outcomes were considered as dependent variables in separate logistic regression models. The first outcome of interest was the incidence of moderate or severe angiographic vasospasm, which was defined as >50% change in the diameter of large, proximal vessels between the baseline digital subtraction angiography and one performed 7 to 11 days after SAH.
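To make the modeling step concrete, the sketch below fits a logistic regression of a binary outcome on component scores by Newton-Raphson, then converts coefficients to odds ratios with Wald 95% confidence intervals. The data are simulated, the function name is hypothetical, and the source does not state which fitting routine or interval method was used, so this should be read as one plausible implementation rather than the study's.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns coefficients (intercept first) and their standard errors.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        gradient = X1.T @ (y - p)
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return beta, se


rng = np.random.default_rng(1)

# Simulated component scores for 120 subjects and 3 retained components.
scores = rng.standard_normal((120, 3))

# Simulated binary outcome (eg, poor outcome yes/no) with made-up effects.
true_beta = np.array([-1.0, 0.5, 0.0, -0.4])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + scores @ true_beta[1:])))
y = (rng.random(120) < p_true).astype(float)

beta, se = fit_logistic(scores, y)

# Odds ratio per 1-unit increase in each component score, with Wald 95% CI.
odds_ratio = np.exp(beta[1:])
ci_lower = np.exp(beta[1:] - 1.96 * se[1:])
ci_upper = np.exp(beta[1:] + 1.96 * se[1:])
for k in range(3):
    print(f"component {k + 1}: OR {odds_ratio[k]:.2f} "
          f"(95% CI {ci_lower[k]:.2f}-{ci_upper[k]:.2f})")
```

Because the component scores are uncorrelated by construction, they can all enter the same model without the collinearity problems that arise when the original variables are used directly.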
The second outcome of interest was the incidence of DIND, which was defined by the investigators as angiographic vasospasm on catheter angiography or transcranial Doppler ultrasound associated with neurological worsening lasting >2 hours after exclusion of other causes. Neurological worsening was defined as a decline of ≥2 points in the modified Glasgow Coma Scale or an increase of 2 points in the abbreviated National Institutes of Health Stroke Scale.10
The third dependent variable of interest was poor long-term neurological outcome, defined as a modified Rankin Scale score >3 measured at 3 months after SAH.11 This corresponds to a long-term outcome worse than moderate disability.
Of 120 subjects included in the analysis, 32% were men. The subjects had a mean age (±SD) of 51 (±11) years. The majority of patients presented with a good neurological grade (WFNS I–III; 74%), and nearly half of the subjects underwent microsurgical clipping, whereas the other half underwent endovascular coiling of the ruptured aneurysm. A complete list of the subjects’ demographic data is presented in Table 1.
Principal Component Analysis
After eigenvalue decomposition of the 46 original dimensions, 16 principal components were found to be significant (with an eigenvalue >1). These components explained 74.6% of the variance in the data. The 5 principal components explaining the greatest data variance are presented in Table 2.
The first principal component, accounting for 11.7% of the variance, was dominated by indicators of anemia (hemoglobin, hematocrit, and erythrocyte count) and albumin/protein levels. The second component, accounting for 7.6% of the variance, reflected the severity of the patients’ presentation. Variables that predominated in this component included hemodynamic status (systolic blood pressure, mean arterial pressure, and heart rate), WFNS score, extent of neurological injury (SAH and intraventricular hemorrhage burden based on the Hijdra and modified Graeb scores, respectively), as well as history of hypertension and leukocyte and neutrophil counts. The third component, accounting for 7.2% of the variance, profiled the patients’ baseline medical status. Variables that predominated in this component included the patients’ age, sex, renal and liver functions, coagulation status, and ventriculocranial ratio. The fourth component, explaining 5.8% of the variance, received strong contributions from hemodynamic status on presentation and liver function tests. Finally, the fifth component was dominated largely by the method of aneurysm treatment (microsurgical clipping versus coiling), WFNS score, subarachnoid clot clearance, intracerebral hemorrhage, and liver function tests. This component accounted for 5.4% of the data variance. A biplot of the first and second components (those explaining the greatest variance) is shown in the Figure (A), demonstrating the similarity in loading weights among correlated variables in the respective components.
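The reported percentages can be related back to eigenvalues with simple arithmetic. For a correlation matrix of 46 standardized variables the eigenvalues sum to 46, so each eigenvalue is approximately the variance fraction multiplied by 46. The sketch below is a back-calculation from the rounded percentages in the text, not output from the study's own decomposition.

```python
N_VARIABLES = 46  # dimensions entered into the PCA

# Percentage of variance explained by the first 5 components (from the text).
percent_explained = [11.7, 7.6, 7.2, 5.8, 5.4]

# Approximate eigenvalue of each component: fraction of variance * trace (46).
approx_eigenvalues = [p / 100 * N_VARIABLES for p in percent_explained]

# Cumulative variance captured by these 5 components.
cumulative_percent = sum(percent_explained)

# An eigenvalue of exactly 1 corresponds to this share of the total variance.
threshold_percent = 100 / N_VARIABLES

print([round(e, 2) for e in approx_eigenvalues])  # [5.38, 3.5, 3.31, 2.67, 2.48]
print(round(cumulative_percent, 1))                # 37.7
print(round(threshold_percent, 2))                 # 2.17
```

In other words, the eigenvalue >1 criterion retains any component that explains more variance than a single standardized variable would on its own, which for 46 variables is about 2.2% of the total.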
Association With Angiographic Vasospasm, DIND, and Neurological Outcome
The 16 significant principal components were included in multivariate logistic regression models with angiographic vasospasm, DIND, and neurological outcome on the basis of the modified Rankin Scale score as the dependent variables. The second component was most strongly associated with the incidence of moderate or severe angiographic vasospasm (Table I in the online-only Data Supplement; odds ratio [OR], 0.73; 95% confidence interval [CI], 0.58–0.91; P=0.0045), followed by the fifth and first components (OR, 0.69; 95% CI, 0.52–0.90; P=0.0072 and OR, 1.14; 95% CI, 1.01–1.3; P=0.036, respectively). Furthermore, the fifth (OR, 1.70; 95% CI, 1.20–2.42; P=0.003) and first (OR, 0.83; 95% CI, 0.71–0.97; P=0.02) components were also significantly associated with the incidence of DIND (Table II in the online-only Data Supplement). Finally, only the second component was associated with poor neurological outcome on multivariate analysis (Table III in the online-only Data Supplement; OR, 1.68; 95% CI, 1.25–2.24; P=0.00049). Biplots of components associated with angiographic vasospasm, DIND, and clinical outcomes are shown in the Figure (B and C).
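A reported odds ratio and confidence interval can be converted back to a coefficient, standard error, and approximate P value. The sketch below assumes the intervals are Wald intervals on the log-odds scale, which the text does not state, and the function name is hypothetical. Applied to the second component and poor outcome (OR, 1.68; 95% CI, 1.25–2.24), it gives a two-sided P value of roughly 0.0005, in line with the reported value.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_odds_ratio(odds_ratio, ci_lower, ci_upper):
    """Back-calculate beta, SE, z, and two-sided P from an OR and its 95% CI.

    ASSUMPTION: the interval is a Wald interval, exp(beta +/- 1.96 * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return beta, se, z, p_value


# Second component vs poor neurological outcome, as reported in the text.
beta, se, z, p_value = wald_from_odds_ratio(1.68, 1.25, 2.24)
print(f"beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, P={p_value:.5f}")
```

Because the published values are rounded, back-calculated P values for the other components will match the reported ones only approximately.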
The clinical and radiographic features of SAH are phenotypically heterogeneous. Model-based analyses of clinical data from patients with aneurysmal SAH are limited by the need to formulate a priori hypotheses on intrinsic correlations present in the data. PCA represents an alternative data-driven approach that is capable of linearly transforming highly correlated multidimensional data into orthogonal, meaningful components, which more accurately represent patient phenotypes. This statistical approach is based on the notion that presentations of SAH may be grouped into distinct categories that may be associated with varying clinical courses and prognoses. Although increasingly used in genome-wide analyses to dissociate the contributions of multiple genetic factors to a single pathological phenotype (ie, a tumor)12 or to identify population migration patterns on the basis of genomic structure,13 PCA is largely underused in clinical trial data sets.
A single principal component, accounting for 7.6% of the variance in the data, was significantly associated with neurological outcome on multivariate analysis. This component was largely dominated by variables that describe the severity of the neurological insult, including the patient’s initial hemodynamic status, WFNS score, and radiographic neurological injury (SAH and intraventricular hemorrhage scores). These have all been reported previously to affect outcomes after aneurysmal SAH.14 Interestingly, the leukocyte and neutrophil levels are also correlated with the other dimensions forming this component. It has been suggested that early elevations in circulating neutrophils after SAH may be a potential biomarker for SAH outcome.15 It has also been shown previously using the CONSCIOUS-1 database that systemic inflammatory response syndrome after SAH is associated with a poor outcome.16 Systemic inflammatory response syndrome was defined as 2 of the following 4 variables: hypothermia/fever, tachycardia, tachypnea, and leukocytosis or leukopenia. To our knowledge, the current study is the first to demonstrate a significant association between circulating neutrophil levels and patient phenotypes associated with poor neurological outcome in a large clinical cohort using data-driven methodology.
Three principal components were associated with the incidence of moderate or severe angiographic vasospasm. The most significantly associated component comprised the same patient phenotype associated with poor neurological outcome. Many of the same features, including hemodynamic status,17 WFNS grade,14 and radiographic injury,14 have been correlated previously with vasospasm. It is noteworthy that neutrophils are also increasingly understood to mediate early microvascular injury after SAH,18 and immunomodulation of these cells’ activity has been suggested as a strategy to prevent angiographic vasospasm.19 The importance of neutrophils in this component provides evidence of an association with angiographic vasospasm in a clinical cohort.
The second component associated with angiographic vasospasm, which accounts for 5.4% of the variance in the data, is largely dominated by the method of aneurysm treatment, clot clearance, intracerebral hemorrhage, and liver transaminases. Interestingly, this patient phenotype was also associated significantly with DIND. A growing body of work has demonstrated consistently that microsurgical clipping of aneurysms14,20,21 and reduced clot clearance22 are associated with greater incidence of angiographic vasospasm. Interestingly, transaminase levels were also correlated with other dimensions in this component. Although we cannot exclude the possibility of a confounding effect (eg, from the use of statin therapy), transaminases have been implicated recently in the incidence of neurovascular diseases, such as intracerebral hemorrhage,23 which also contributed to this principal component. Our findings suggest that the biological significance of the association between aminotransferase levels and the development of angiographic vasospasm and DIND merits further study.
The final component, which was significantly, albeit less strongly, associated with angiographic vasospasm and DIND and accounted for 11.6% of the variance in the data, comprised exclusively laboratory values. This principal component was dominated by hemoglobin, hematocrit, erythrocyte count, albumin, and protein levels. DIND may be affected by several factors, including cerebral blood flow and oxygen delivery.24 After SAH, more than half of patients develop anemia.25 Previous studies have consistently linked anemia or large hemoglobin reductions to infarction, death, and dependency,26–28 as well as delayed ischemia.26
An important observation in the current study is that multiple previously identified independent predictors of patient outcomes, in fact, represent a single patient phenotype. Furthermore, this phenotype was dissociable from other phenotypes that were associated with angiographic vasospasm and DIND but not long-term outcomes. This is of particular interest because a direct causal relationship between angiographic vasospasm, DIND, and long-term outcomes has been questioned by recent clinical evidence.2,29 Furthermore, the same clinical variables contributed differently to unique principal components, emphasizing that the interplay between the different variables, which is often ignored in traditional statistical models, is important for understanding patient phenotypes. This is analogous to insights gleaned by applying PCA to whole-genome sequencing data, whereby neoplastic conditions may be subclassified on the basis of the varying contributions of different combinations of genetic abnormalities to the pathological phenotype.12
Our results are limited by the exclusion of patients because of incomplete data and the inclusion of only 46 dimensions for eigenvalue decomposition. PCA is a robust procedure that can perform data reduction on n-dimensional data sets. We also did not attempt to identify independent predictors of outcomes using traditional model-driven methods.14,30 We have, however, shown that the current data-driven approach is effective in identifying patient phenotypes associated with a protracted disease course. These findings may be used to better select patients for clinical trials or to allocate resources in the clinical setting. As a model-agnostic method, PCA can also serve as an exploratory, hypothesis-generating exercise, as demonstrated by the discovery of numerous putative associations between outcomes and biochemical profiles that have been described only recently.
Through a data-driven PCA approach, we identified primary patient phenotypes that are associated with angiographic vasospasm, DIND, and poor neurological outcome. These may be useful when evaluating patients who present with aneurysmal SAH and may inform more accurate inclusion criteria into clinical trials. We have also identified various putative associations that merit further study and characterization.
Sources of Funding
Actelion Pharmaceuticals Ltd was the sponsor of the CONSCIOUS-1 trial; the company provided the trial data set but had no role in this analysis or the development of the article. The data analysis and writing are the work of the authors.
Dr Macdonald receives grant support from the Physicians Services Incorporated Foundation, Brain Aneurysm Foundation, Canadian Institutes for Health Research, and the Heart and Stroke Foundation of Canada, and is Chief Scientific Officer of Edge Therapeutics Inc.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.113.003078/-/DC1.
- Received September 30, 2013.
- Accepted December 4, 2013.
- © 2014 American Heart Association, Inc.
- Macdonald RL, Kassell NF, Mayer S, Ruefenacht D, Schmiedek P, Weidauer S, et al
- Teasdale GM, Drake CG, Hunt W, Kassell N, Sano K, Pertuiset B, et al
- Hijdra A, Brouwers PJ, Vermeulen M, van Gijn J
- Kotz S, Johnson N
- Drasgow F
- Brott T, Adams HP Jr., Olinger CP, Marler JR, Barsan WG, Biller J, et al
- Farrell B, Godwin J, Richards S, Warlow C
- Northcott PA, Korshunov A, Witt H, Hielscher T, Eberhart CG, Mack S, et al
- Ibrahim GM, Macdonald RL
- Koivisto T, Vanninen R, Hurskainen H, Saari T, Hernesniemi J, Vapalahti M
- Kim HC, Kang DR, Nam CM, Hur NW, Shim JS, Jee SH, et al
- Ibrahim GM, Weidauer S, Vatter H, Raabe A, Macdonald RL