Gene Expression Profiling of Blood for the Prediction of Ischemic Stroke
Background and Purpose—A blood-based biomarker of acute ischemic stroke would be of significant value in clinical practice. This study aimed to (1) replicate in a larger cohort our previous study using gene expression profiling to predict ischemic stroke; and (2) refine prediction of ischemic stroke by including control groups relevant to ischemic stroke.
Methods—Patients with ischemic stroke (n=70, 199 samples) were compared with control subjects who were healthy (n=38), had vascular risk factors (n=52), and who had myocardial infarction (n=17). Whole blood was drawn ≤3 hours, 5 hours, and 24 hours after stroke onset and from control subjects. RNA was processed on whole genome microarrays. Genes differentially expressed in ischemic stroke were identified and analyzed for predictive ability to discriminate stroke from control subjects.
Results—The 29 probe sets previously reported predicted a new set of ischemic strokes with 93.5% sensitivity and 89.5% specificity. Sixty- and 46-probe sets differentiated control groups from 3-hour and 24-hour ischemic stroke samples, respectively. A 97-probe set correctly classified 86% of ischemic strokes (3 hour+24 hour), 84% of healthy subjects, 96% of vascular risk factor subjects, and 75% with myocardial infarction.
Conclusions—This study replicated our previously reported gene expression profile in a larger cohort and identified additional genes that discriminate ischemic stroke from relevant control groups. This multigene approach shows potential for a point-of-care test in acute ischemic stroke.
Stroke is a leading cause of adult death and disability.1,2 The diagnosis of ischemic stroke (IS) is made with clinical assessment in combination with brain imaging. However, the diagnosis is not always straightforward, particularly in the acute setting where an accurate, inexpensive, and rapid diagnosis is critical to optimally treat patients.
Extensive efforts have been directed toward identifying blood-based biomarkers for IS. More than 58 proteins and 7 panels of proteins have been described as biomarkers of IS.3–5 RNA expression profiles in the blood have also been described in IS.6–8 We previously reported a 29-probe set expression profile predictive of IS.6 This profile required validation in a second cohort, which the current study provides. We also describe a 97-probe set expression profile that differentiates IS from control subjects who are healthy, have vascular risk factors, or have myocardial infarction. These profiles represent further refinement of gene expression as a diagnostic tool in patients with acute IS, which could be used to aid in the diagnosis of stroke in the context of clinical information and evaluation.
Materials and Methods
The study had 2 objectives: (1) to demonstrate that the previously identified 29 probes distinguish IS from healthy control subjects6 in a new cohort; and (2) to identify additional genes that discriminate IS from vascular risk factor (Sex, Age and Variation in Vascular functionalitY [SAVVY]) control subjects and myocardial infarction (MI) control subjects. Whole blood was drawn from patients with IS (n=70, 199 samples) at ≤3, 5, and 24 hours (3-hour IS, 5-hour IS, 24-hour IS) as part of the Combined approach to Lysis utilizing Eptifibatide And Recombinant tissue-type plasminogen activator (CLEAR) trial9 (NCT00250991 at www.ClinicalTrials.gov). IS subjects were treated with recombinant tissue plasminogen activator with or without eptifibatide after the 3-hour blood sample was obtained. Control subjects included healthy subjects (n=38), subjects with acute MI (n=17), and subjects with at least 1 cardiovascular risk factor (hypertension, diabetes mellitus, hyperlipidemia, or tobacco smoking) recruited from the SAVVY study (n=52). The Institutional Review Board at each site approved the study, and each patient provided informed consent. Blood samples were collected in PAXgene tubes (PreAnalytiX). Isolated RNA was processed using Ovation Whole Blood reagents (Nugen Technologies, San Carlos, Calif) and hybridized onto Affymetrix Genome U133 Plus 2 GeneChips (Affymetrix, Santa Clara, Calif). Data were normalized using Robust Multichip Averaging10 and our internal-gene normalization approach.11
The predictive ability of the 29 previously identified genes was determined using the k-nearest neighbor algorithm in PAM (Prediction Analysis of Microarrays).12 IS and healthy subjects were randomly split in half, stratified by group and time point (for the IS samples), into a training set used to develop the prediction algorithm and an independent test (validation) set used to evaluate its accuracy.
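The stratified random split described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `stratified_half_split`, the toy sample records, and the random seed are all assumptions introduced for the example.

```python
import random
from collections import defaultdict

def stratified_half_split(samples, seed=0):
    """Split samples in half within each (group, time_point) stratum.

    `samples` is a list of (sample_id, group, time_point) tuples.
    Returns (training, test) lists of sample ids.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample_id, group, time_point in samples:
        strata[(group, time_point)].append(sample_id)

    training, test = [], []
    for ids in strata.values():
        rng.shuffle(ids)
        half = (len(ids) + 1) // 2  # an odd stratum gives its extra sample to training
        training.extend(ids[:half])
        test.extend(ids[half:])
    return training, test

# Hypothetical toy cohort: 38 healthy subjects and 6 IS patients sampled at 3 time points.
samples = [(f"H{i}", "healthy", None) for i in range(38)]
samples += [(f"IS{i}_{t}h", "IS", t) for i in range(6) for t in (3, 5, 24)]
training, test = stratified_half_split(samples)
```

Splitting within each stratum keeps the proportion of healthy subjects and of each IS time point equal in the two halves. A real analysis would also need to decide whether samples from the same patient may fall on both sides of the split; the text does not specify this.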
To identify genes able to discriminate between IS and all control groups, an analysis of covariance adjusted for age, gender, and microarray batch effect was used. Genes significant on the analysis of covariance models were input into PAM where their number was further reduced using the nearest shrunken centroids algorithm (see Supplemental Materials for details; available at http://stroke.ahajournals.org). The ability of the identified genes to predict IS from control subjects was assessed (1) using 10-fold crossvalidation (CV); and (2) in a second (independent) test (validation) set using several prediction algorithms (k-nearest neighbor, support vector machine [SVM], linear discriminant analysis, and quadratic discriminant analysis). Only the 3-hour IS (not treated) and 24-hour IS samples were analyzed for Objective 2 because they were considered most clinically relevant. See Supplemental Materials and Methods (Supplemental Figure I and Supplemental text) for details of the prediction and CV analyses for Objectives 1 and 2.
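The 10-fold CV procedure mentioned above can be illustrated with a short Python sketch. It is an assumption-laden example rather than the pipeline used in the study: a plain nearest-centroid classifier stands in for PAM's nearest shrunken centroids (no shrinkage or gene selection is performed), and the data, function names, and seeds are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def nearest_centroid_fit(rows, labels):
    """Return {label: per-feature mean vector} for the training rows."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def nearest_centroid_predict(centroids, row):
    """Assign the label whose centroid has the smallest squared distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

def cross_validated_accuracy(rows, labels, k=10, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [l for i, l in enumerate(labels) if i not in held_out]
        centroids = nearest_centroid_fit(train_rows, train_labels)
        correct += sum(
            nearest_centroid_predict(centroids, rows[i]) == labels[i] for i in fold
        )
    return correct / len(rows)

# Hypothetical, well-separated two-probe data: 20 "IS" and 20 "control" samples.
rng = random.Random(1)
rows = [[rng.gauss(2.0, 0.3), rng.gauss(-2.0, 0.3)] for _ in range(20)]
rows += [[rng.gauss(-2.0, 0.3), rng.gauss(2.0, 0.3)] for _ in range(20)]
labels = ["IS"] * 20 + ["control"] * 20
accuracy = cross_validated_accuracy(rows, labels)
```

Each sample is predicted exactly once by a model that never saw it during training, which is why CV is attractive when groups are small. The sketch omits the gene selection step; the text here does not say whether selection was repeated within each fold.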
Results
Demographic information is presented in Table 1 (Objective 1) and Table 2 (Objective 2). Age was significantly different between IS and control groups (P<0.05; Tables 1 and 2). Gender was significantly different (P<0.05) between IS and healthy subjects in the Tang et al study6 and the current study (Table 1) as well as between IS and vascular risk factor (SAVVY) control subjects from the current study (Table 2). Race was significantly different between IS compared with healthy and MI control subjects (Table 2). Hypertension and diabetes were not significantly different between the groups.
Replication of Tang et al IS Predictors in a Larger Cohort
Because the array processing protocols differed between the study by Tang et al6 and the current study, the following analyses were performed: (1) the prediction algorithm was retrained on the first random half of the new samples (training set) and the performance of the 29 probe sets evaluated in the second half (test/validation set); and (2) the samples used in the Tang et al study6 and the current study were internal gene-normalized. Overall, 92.9% sensitivity for IS and 94.7% specificity for healthy control subjects were achieved, with high test set probabilities (Figure 1; Table 3). These results are similar to the performance of these predictors on the previously published patients,6 who were classified with 88.9% sensitivity for IS and 100% specificity for healthy control subjects (Table 3). In addition, for comparison with the previous study,6 Robust Multichip Averaging normalization and CV (as used in the previous study6) were performed on our complete set of IS and healthy samples. Similar results were obtained (Supplemental Table I and Supplemental Figure II).
Refinement of Prediction of IS Against Several Different Control Groups
Differentiation of IS Patients From Control Subjects
Predictive gene expression signatures were derived individually for each comparison. To discriminate the 3-hour IS group from the healthy (training set), MI (CV set, due to small sample size for MI), and SAVVY (training set) control groups, the PAM classification algorithm derived 17, 31, and 22 predictor probe sets/genes, respectively. Putting these genes into PAM to predict the class of the subjects in the test groups yielded 87.9/94.7%, 98.5/82.4%, and 100/96.2% sensitivity/specificity for 3-hour IS compared with healthy, MI, and SAVVY control samples, respectively (Supplemental Figures III, IV, and V, respectively).
To discriminate the 24-hour IS group from the healthy (training set), MI (CV set, due to small sample size for MI), and SAVVY (training set) control groups, the PAM classification algorithm derived 20, 19, and 9 predictor probe sets/genes, respectively. Putting these genes into PAM to predict the class of the subjects in the test groups yielded 90.9/94.7%, 93.9/88.2%, and 97/100% sensitivity/specificity for 24-hour IS compared with healthy, MI, and SAVVY control samples, respectively (Supplemental Figures VI, VII, and VIII, respectively).
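For readers less familiar with the sensitivity/specificity pairs reported above, the following Python sketch shows how such a pair is computed from classification counts. The counts are hypothetical: they were chosen only because they reproduce the 87.9/94.7% pair quoted for 3-hour IS versus healthy subjects, and the actual confusion matrices are not given in the text.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of true IS samples that the classifier labels as IS."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of true control samples that the classifier labels as control."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts (assumed for illustration, not taken from the study).
sens = sensitivity(true_positives=29, false_negatives=4)  # 29 of 33 IS samples
spec = specificity(true_negatives=18, false_positives=1)  # 18 of 19 healthy subjects
print(f"{100 * sens:.1f}% / {100 * spec:.1f}%")  # prints "87.9% / 94.7%"
```

With test groups of this size, a single misclassified sample shifts the reported percentage by several points, which is worth keeping in mind when comparing the values across control groups.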
Prediction Accuracy of 3-Hour IS Predictors on 3-Hour IS, Healthy, MI, and SAVVY Subjects
Combining the lists of the 3-hour predictors from the individual comparison analyses yielded 60 unique probe sets representing 56 annotated genes. Their prediction probability using PAM on the test set is presented in Figure 2A. The percent correctly predicted samples from PAM as well as the best performing prediction model (SVM) are presented in Table 4. Overall (normalized) accuracy was 91.2%. With SVM, the sensitivity was 94% and specificities were 96% for SAVVY, 88% for MI, and 68% for healthy. Analysis in PAM produced lower sensitivity for IS but higher specificity for healthy subjects compared with SVM (Table 4). In addition to the split sample analysis, we performed a 10-fold CV, which is a preferred method for developing and evaluating prediction algorithms for small sample sizes. This produced the expected better prediction results (Supplemental Table II; Supplemental Figure IX).
Prediction Accuracy of 24-Hour IS Predictors on 24-Hour IS, Healthy, MI, and SAVVY Subjects
Combining the lists of the 24-hour predictors from the individual comparison analyses yielded 46 unique probe sets representing 32 annotated genes. Their prediction probability using PAM on the test set is presented in Figure 2B. The percent correctly predicted samples from PAM as well as SVM (best performing prediction model) are presented in Table 4. Overall (normalized) accuracy was 89.2%. With SVM, the sensitivity was 94% and specificities were 96% for SAVVY, 50% for MI, and 84% for healthy subjects. Better results were again obtained using a 10-fold CV (Supplemental Table II; Supplemental Figure IXB).
Prediction Accuracy of Combined 3-Hour and 24-Hour IS Predictors on 3-Hour and 24-Hour IS, Healthy, MI, and SAVVY Subjects
Combining the lists of the 3-hour and 24-hour predictors from the individual comparison analyses yielded 97 unique probe sets representing 79 annotated genes. Their prediction probability using PAM on the test set is presented in Figure 2C. The percent correctly predicted samples from PAM and SVM (best performing prediction model) are presented in Table 4. Overall (normalized) accuracy was 91.2%. With SVM, the sensitivity was 95% and specificities were 96% for SAVVY, 75% for MI, and 68% for healthy subjects. Analysis in PAM produced lower sensitivity for IS but higher specificity for healthy subjects compared with SVM (Table 4). Similarly, due to the small sample numbers of MI subjects, 10-fold CV was performed, which yielded somewhat better results (Supplemental Table II; Supplemental Figure IXC).
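The combined lists are set unions of the individual comparison lists, so the unique totals are smaller than the sums of the list sizes. A minimal Python sketch follows; the probe-set identifiers are placeholders invented for the example, whereas the counts in the arithmetic at the end are the ones reported in the text.

```python
def combine_probe_sets(*probe_lists):
    """Return the sorted union of several probe-set ID lists."""
    combined = set()
    for probes in probe_lists:
        combined.update(probes)
    return sorted(combined)

# Hypothetical probe-set IDs (placeholders, not the study's predictors).
vs_healthy = ["probe_a", "probe_b", "probe_c"]
vs_mi = ["probe_b", "probe_d"]
vs_savvy = ["probe_c", "probe_e"]
combined = combine_probe_sets(vs_healthy, vs_mi, vs_savvy)
# 7 list entries collapse to 5 unique probe sets.

# Counts reported in the text: list sizes per comparison and the unique totals.
three_hour_overlap = (17 + 31 + 22) - 60        # 10 repeated entries
twenty_four_hour_overlap = (20 + 19 + 9) - 46   # 2 repeated entries
shared_between_time_points = (60 + 46) - 97     # 9 probe sets in both lists
```

The arithmetic implies that 9 probe sets appear in both the 3-hour and the 24-hour predictor lists, and that several probe sets distinguish IS from more than one control group at the same time point.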
Main Biological Function of Biomarkers Described
Using Ingenuity Pathway Analysis software (see Supplemental Materials), the coagulation system was the only statistically overrepresented biofunction in the combined 97-probe set list of 3-hour and 24-hour IS predictors. The coagulation genes included coagulation factor V (proaccelerin, labile factor) and thrombomodulin. GO annotations and the complete list of predictors are presented in Supplemental Table III. Less stringent criteria yielded large numbers of genes with many more regulated pathways (not shown).
Discussion
Diagnosis of IS is based on clinical impression combined with brain imaging. However, in the acute setting, brain imaging is not always readily accessible, and clinical evaluation by persons experienced in stroke is not always readily available. In such patients, a blood test could be of use to diagnose IS. Several protein biomarkers have been associated with IS, but in the acute setting, these have not yet shown sufficient sensitivity or specificity to be clinically useful.3–5 In this study, we show that gene expression profiles could be used as biomarkers of IS, replicate our previous findings, and refine the gene expression signature of IS by including more relevant control groups.
We previously reported a 29-probe set profile that distinguished IS from healthy control subjects.6 When this profile was used to predict a larger cohort of patients in this study, it distinguished IS from healthy subjects with a sensitivity of 92.9% and specificity of 94.7%. This is important in that it represents a validation of the concept that gene expression profiles can identify patients with stroke. Replication of gene expression profiles has been a challenge in the field, in large part due to false discovery associated with performing multiple comparisons. Robust biological responses and careful analyses made it possible to validate this 29-probe set profile in this study.
To obtain more biologically useful predictors of IS, we identified gene profiles that distinguish IS from patients with vascular risk factors and MI. Using the individual group comparisons, we distinguished IS from the vascular risk factor group with >95% sensitivity and specificity, and IS from MI with >90% sensitivity and >80% specificity. Biologically, this suggests at least some differences in the immune responses to infarction in the brain and heart.
The 3-hour time point was a focus of most comparisons because this represents the critical time when decisions are made regarding acute therapy such as thrombolysis. Thus, for the development of a point-of-care test, this time period is when gene expression profiles could be of greatest use. With the 60-probe set signature, at the 3-hour time point, we achieved correct classification rates of 85% to 94%, 92% to 96%, 88%, and 68% to 84% for IS, vascular risk factor, MI, and healthy control subjects, respectively. These are approaching clinically useful ranges.
Although RNA profiles were the focus in this study, the identified genes could be used as a guide in the evaluation of protein biomarkers for IS. Genes for factor V and thrombomodulin were both identified as differentially expressed in IS compared with control subjects. Both of these molecules have also been identified as proteins associated with IS.6,8,13 Many of the other genes we identified have not yet been studied but may represent potential candidates for the development of protein biomarker profiles.
The goal of this study was not to identify all differentially expressed genes between IS and control subjects, but rather to identify sets of genes whose patterns of expression may be useful for stroke diagnosis. As a result, these analyses have excluded large numbers of differentially expressed genes that are biologically relevant in IS. These will be the subject of future studies. Limitations of this study include (1) the lack of stroke “mimics” in the control groups; (2) the lack of validation by quantitative reverse transcription–polymerase chain reaction, which would likely be used for clinical applications; (3) the confounding treatment effects in the 5-hour and 24-hour blood samples from patients with IS; (4) the omission of race from the models because of different distributions, with zero subjects in some of the race categories; (5) confounding by age, which we tried to address by including age in the analysis of covariance models and by selecting control groups with age distributions close to that of the patients with IS; and finally, (6) the finding that an analysis of variance for all of the groups combined yielded a significant number of regulated genes that were nevertheless not as predictive. This likely occurred because the PAM derivation of the training set of genes was not optimal, whereas individual group comparisons yielded more predictive genes. In the end, statistical validation was achieved by using our training set of genes to predict an independent test set of samples.
Acknowledgments
We thank the investigators of the Specialized Program for Translational Research in Acute Stroke (SPOTRIAS) Stroke Network involved in the CLEAR trial at the University of Cincinnati for supplying blood samples for analysis. We appreciate the support of the MIND Institute, the Genomics and Expression Resource at the MIND Institute, and the University of California Davis Department of Neurology.
Sources of Funding
This study was supported by National Institutes of Health/National Institute of Neurological Disorders and Stroke grants NS056302 (F.R.S.), PO21040N635110 (J.P.B.), and the American Heart Association Bugher Foundation (F.R.S.). G.J. is a fellow of the Canadian Institutes of Health Research (CIHR). H.X., B.A., and Y.T. are Bugher Fellows. This publication was also made possible by Grant Number UL1 RR024146 from the National Center for Research Resources to the CTSC at the University of California Davis (B.S., G.J.). Its contents are the responsibility of the authors and do not represent the official view of the National Center for Research Resources or the National Institutes of Health.
Disclosures
J.B. is Principal Investigator of the National Institute of Neurological Disorders and Stroke (NINDS)-funded Interventional Management of Stroke (IMS) III Trial, UC SPOTRIAS Center (includes NINDS-funded CLEAR-ER and STOP-IT Clinical Trials), NINDS-funded Familial Intracranial Aneurysm (FIA) Study, and NINDS-funded T-32 Cerebrovascular Fellowship Training Program for Cerebrovascular Disease; is coinvestigator of NINDS-funded Genetic and Environmental Risk Factors for Hemorrhagic Stroke, NINDS-funded “Comparison of Hemorrhagic and Ischemic Strokes Among Blacks and Whites,” and NINDS-funded Insulin Resistance Intervention after Stroke (IRIS) Trial, Carotid Revascularization Endarterectomy Versus Stenting Trial (CREST), Carotid Occlusion Surgery Study (COSS), and SWISS Studies, Genentech Inc. (supplier of alteplase for NINDS-funded CLEAR-ER and IMS III trials). EKOS Corporation supplies catheter devices for ongoing IMS III clinical trials. Concentric Inc supplies devices for the IMS III trial. Johnson and Johnson supplies catheters for the IMS III Study. Schering Plough supplies drug for the NINDS-funded CLEAR-ER Trial. J.B. received honoraria for speaking fees from Oakstone Medical Publishing (honorarium received 3/11/10); and consulting fees for participation in stroke advisory board (by PhotoThera on 2/25/10 and by Genentech Inc. on 4/24/10); consulting fees and honoraria are placed in an educational/research stroke fund in the Department of Neurology. A.P. received a research grant from the NINDS for CLEAR-ER Study and received other research support from Genentech and Schering Plough for the study drug.
The online-only Data Supplement is available at http://stroke.ahajournals.org/cgi/content/full/STROKEAHA.110.588335/DC1.
- Received April 27, 2010.
- Revision received June 28, 2010.
- Accepted July 29, 2010.
Thom T, Haase N, Rosamond W, Howard VJ, Rumsfeld J, Manolio T, Zheng ZJ, Flegal K, O'Donnell C, Kittner S, Lloyd-Jones D, Goff DC Jr, Hong Y, Adams R, Friday G, Furie K, Gorelick P, Kissela B, Marler J, Meigs J, Roger V, Sidney S, Sorlie P, Steinberger J, Wasserthiel-Smoller S, Wilson M, Wolf P. Heart disease and stroke statistics—2006 update: a report from the American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Circulation. 2006; 113: e85–e151.
World Health Organization. The Atlas of Heart Disease and Stroke. Geneva: WHO; 2005.
Whiteley W, Tseng MC, Sandercock P. Blood biomarkers in the diagnosis of ischemic stroke: a systematic review. Stroke. 2008; 39: 2902–2909.
Foerch C, Montaner J, Furie KL, Ning MM, Lo EH. Invited article: searching for oracles? Blood biomarkers in acute stroke. Neurology. 2009; 73: 393–399.
Tang Y, Xu H, Du X, Lit L, Walker W, Lu A, Ran R, Gregg JP, Reilly M, Pancioli A, Khoury JC, Sauerbeck LR, Carrozzella JA, Spilker J, Clark J, Wagner KR, Jauch EC, Chang DJ, Verro P, Broderick JP, Sharp FR. Gene expression in blood changes rapidly in neutrophils and monocytes after ischemic stroke in humans: a microarray study. J Cereb Blood Flow Metab. 2006; 26: 1089–1102.
Xu H, Tang Y, Liu DZ, Ran R, Ander BP, Apperson M, Liu XS, Khoury JC, Gregg JP, Pancioli A, Jauch EC, Wagner KR, Verro P, Broderick JP, Sharp FR. Gene expression in peripheral blood differs after cardioembolic compared with large vessel atherosclerotic stroke: biomarkers for the etiology of ischemic stroke. J Cereb Blood Flow Metab. 2008; 28: 1320–1328.
Moore DF, Li H, Jeffries N, Wright V, Cooper RA Jr, Elkahloun A, Gelderman MP, Zudaire E, Blevins G, Yu H, Goldin E, Baird AE. Using peripheral blood mononuclear cells to determine a gene expression profile of acute ischemic stroke: a pilot investigation. Circulation. 2005; 111: 212–221.
Pancioli AM, Broderick J, Brott T, Tomsick T, Khoury J, Bean J, del Zoppo G, Kleindorfer D, Woo D, Khatri P, Castaldo J, Frey J, Gebel J Jr, Kasner S, Kidwell C, Kwiatkowski T, Libman R, Mackenzie R, Scott P, Starkman S, Thurman RJ. The combined approach to lysis utilizing eptifibatide and rt-PA in acute ischemic stroke: the CLEAR stroke trial. Stroke. 2008; 39: 3268–3276.
Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003; 19: 185–193.
Stamova BS, Apperson M, Walker WL, Tian Y, Xu H, Adamczy P, Zhan X, Liu DZ, Ander BP, Liao IH, Gregg JP, Turner RJ, Jickling G, Lit L, Sharp FR. Identification and validation of suitable endogenous reference genes for gene expression studies in human peripheral blood. BMC Med Genomics. 2009; 2: 49.
Tibshirani R, Hastie T, Narasimhan B, Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A. 2002; 99: 6567–6572.