Genetic Heritability of Ischemic Stroke and the Contribution of Previously Reported Candidate Gene and Genomewide Associations
Background and Purpose—The contribution of genetics to stroke risk, and whether this differs for different stroke subtypes, remains uncertain. Genomewide complex trait analysis allows heritability to be assessed from genomewide association study (GWAS) data. Previous candidate gene studies have identified many associations with stroke, but whether these are important requires replication in large independent data sets. GWAS data sets provide a powerful resource to perform replication studies.
Methods—We applied genomewide complex trait analysis to a GWAS data set of 3752 ischemic strokes and 5972 controls and determined heritability for all ischemic stroke and the most common subtypes: large-vessel disease, small-vessel disease, and cardioembolic stroke. By systematic review we identified previous candidate gene and GWAS associations with stroke and previous GWAS associations with related cardiovascular phenotypes (myocardial infarction, atrial fibrillation, and carotid intima-media thickness). Fifty associations were identified.
Results—For all ischemic stroke, heritability was 37.9%. Heritability varied markedly by stroke subtype, being 40.3% for large-vessel disease and 32.6% for cardioembolic stroke but lower for small-vessel disease (16.1%). No previously reported candidate gene was significant after rigorous correction for multiple testing. In contrast, 3 loci from related cardiovascular GWAS studies were significant: PHACTR1 in large-vessel disease (P=2.63e−6), PITX2 in cardioembolic stroke (P=4.78e−8), and ZFHX3 in cardioembolic stroke (P=5.50e−7).
Conclusions—There is substantial heritability for ischemic stroke, but this varies for different stroke subtypes. Previous candidate gene associations contribute little to this heritability, but GWAS studies in related cardiovascular phenotypes are identifying robust associations. The heritability data, and data from GWAS, suggest detecting additional associations will depend on careful stroke subtyping.
Stroke is a common cause of death and the major cause of adult chronic disability. Conventional risk factors such as hypertension fail to explain all stroke risk. The most robust data for a genetic component to disease, defined as the proportion of the variation in phenotype explained by genotype, so-called genetic heritability, come from twin studies. Such studies suggest stroke is more common in monozygotic than dizygotic twins, but the number of stroke cases in these prospective studies is small and therefore the estimates of heritability have wide CIs.1,2 Much more data are available from family history studies, which show family history of stroke is a risk factor for stroke, supporting a role for genetic risk factors.3 However, such an association could also be accounted for by shared early life environmental risk factors.
A further complicating factor is that stroke is a clinical syndrome resulting from a number of different disease processes. Approximately 80% of stroke is ischemic, whereas 20% is due to primary hemorrhage. Ischemic stroke itself has a number of subtypes with the most common being large-vessel atherosclerotic stroke (LVD), small-vessel disease (SVD), and cardioembolism (CE). It has been suggested that genetic predisposition may differ for these subtypes,4 and of note, most monogenic forms of stroke predispose to individual stroke subtypes.4 However, whether this is also the case for the majority of common ischemic stroke has been debated. This has direct relevance in deciding whether to look for genetic variants in larger, but perhaps less well-phenotyped samples or smaller but better phenotyped cohorts.
Although twin studies can obtain estimates of heritability, and differentiate environmental and genetic risk, performing such studies in stroke is challenging. In twin cohorts the number of stroke cases is small. Furthermore, in studies performed to date, subtyping is not available, and therefore no estimates of heritability for specific stroke subtypes are available.1,2 Recently, a method has been developed for determining the genetic component of disease heritability in complex diseases such as stroke using genotype data derived from genomewide association studies (GWAS). Genomewide complex trait analysis (GCTA)5 allows estimation of heritability by assessing the proportion of variation in case–control status explained by genotyped single nucleotide polymorphisms (SNPs). This tool estimates the variance explained by all the SNPs entered against a phenotypic trait rather than by individual SNPs. An advantage of this technique is that it can be applied to large, well-phenotyped case–control data sets, allowing estimates of heritability to be obtained for specific stroke subtypes.
To date 2 main approaches have been used in studies trying to identify the underlying molecular genetic basis of multifactorial ischemic stroke: the candidate gene and GWAS methods. Until recently, almost all studies have been candidate gene association studies in which the frequency of a variant in a known gene is compared between stroke cases and controls.6 A large number of such studies have been performed, but these have produced inconclusive results. The reasons for this have been explored in detail and include small sample sizes, failure to replicate, and in some cases lack of stroke subtyping.7,8 Meta-analyses of candidate gene associations have suggested a number of these associations represent real findings.9 However, these conclusions may be confounded by publication bias,10 and whether these associations really do contribute to stroke risk requires confirmation in large prospective studies with statistical correction for multiple comparisons.
More recently the GWAS approach has transformed the genetics of complex diseases.11 In GWAS, a large number of variants, which can exceed a million, and are randomly distributed throughout the genome, are genotyped. This approach does not depend on a prior hypothesis and therefore novel associations can be detected. Combined with large sample sizes, primary replication of findings, and rigorous statistical approaches, the GWAS approach has identified novel associations in a large number of complex diseases. Using this approach, novel associations have been identified with cardiovascular phenotypes related to stroke, namely coronary artery disease and atrial fibrillation, a few of which have been tested for and replicated in LVD12 and CE13,14 stroke, respectively. In addition, the GWAS approach has identified other novel associations with related cardiovascular phenotypes, which have not been tested for replication in stroke.
In this study we used a large case–control data set of >3500 cases and 5700 controls from the recent Wellcome Trust Case Control Consortium 2 (WTCCC2) ischemic stroke GWAS15 to estimate the heritability of ischemic stroke and its subtypes using the GCTA method. We then used the same data set to determine how many of the previously reported candidate gene associations with ischemic stroke replicated in this data set. Lastly we determined whether other recently reported GWAS associations with phenotypes related to stroke such as coronary artery disease and atrial fibrillation were also risk factors for all ischemic stroke or for specific stroke subtypes.
Our aim was to investigate whether there were heritability differences between ischemic stroke subtypes and whether previous candidate gene studies were informative in understanding the genetic architecture of ischemic stroke, particularly when considering ischemic stroke subtypes individually.
We used data from the WTCCC2 ischemic stroke GWAS.15 A total of 2343 cases of imaging confirmed ischemic stroke from the United Kingdom and 1197 cases from Germany were compared with 5397 UK and 824 German controls. All cases and control subjects were white of European ancestry. All cases were extensively phenotyped with brain imaging in 100%, imaging of extracerebral vessels in 100%, electrocardiography in 100%, and echocardiography in 43%. Stroke subtyping was performed using the Trial of ORG 10172 in Acute Stroke Treatment system.16 Cases with other specified causes, including monogenic causes of stroke, were excluded. Mean (SD) age was 69.9 (13.4) and 69.0 (14.1) years in the UK and German data sets, respectively. The number of cases with the major subtypes LVD, CE, and SVD were 494, 451, and 467 in the UK cohort and 352, 334, and 110 in the German cohorts, respectively. Full details of case and control populations, genotyping, quality control, and analysis have been published15 and are summarized in the online-only Data Supplement methods. Cases were genotyped using an Illumina 660W array and control subjects using an Illumina 1 million array, which covered those SNPs included on the 660W array.
Heritability Estimates Using GCTA
WTCCC2 raw genotypes were called using GENCALL and data then went through quality control as previously described.15 Further stringent quality control filters were then applied to this data set as follows: SNPs with a minor allele frequency <1% were excluded as were cases and controls showing deviation from Hardy-Weinberg equilibrium of P=0.0001. Quality control filters of 1% individual missingness, 0.01% genotyping missingness, and estimated relatedness (using GCTA) > 0.125 were also applied. The centers were combined using PLINK, leaving a common subset of 307 110 SNPs passing quality control in both data sets, which were subsequently used for GCTA analysis. Principal components were calculated using EIGENSTRAT on a linkage disequilibrium-pruned subset of the data set. The first 20 principal components were then used as covariates in the heritability analysis to reduce genomic inflation (λ) as a consequence of population stratification. Although genomic inflation was minimal when 2 principal components were included (λ=1.06), failure to account for population stratification adequately can lead to overinflated heritability estimates using the GCTA tool. Therefore, 20 principal components were included to rule out any overestimation of heritability due to population structure.
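As an illustration of the per-SNP filters described above, the following minimal Python sketch computes a minor allele frequency and a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium from genotype counts. The function names, the input layout, and the reading of the thresholds as "keep SNPs with minor allele frequency of at least 1% and Hardy-Weinberg P of at least 0.0001" are assumptions made for illustration only; the study itself applied its filters with PLINK and GCTA.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic SNPs) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(n_aa, n_ab, n_bb, maf_min=0.01, hwe_p_min=1e-4):
    """Hypothetical filter: both thresholds are illustrative assumptions."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min)
```

A SNP with counts (25, 50, 25) sits exactly at Hardy-Weinberg proportions and passes, whereas one with counts (90, 0, 10) has no heterozygotes and is rejected by the equilibrium test.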
GCTA analysis was performed on the phenotypes of all ischemic stroke, LVD, SVD, and CE to obtain heritability estimates. We also permuted random assignment of case and control status 100 times for each analysis to obtain the null distribution between cases and controls as a comparison.
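The permutation step can be sketched as follows. This is a generic outline, not the pipeline used in the study: `permutation_null`, `mean_and_sd`, and the toy statistic are hypothetical names, and in practice the statistic would be a full GCTA heritability estimate recomputed on each set of shuffled case and control labels.

```python
import random

def permutation_null(labels, statistic, n_permutations=100, seed=0):
    """Recompute `statistic` after randomly reassigning case/control labels."""
    rng = random.Random(seed)
    shuffled = list(labels)  # copy, so the caller's labels are left untouched
    null_values = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_values.append(statistic(shuffled))
    return null_values

def mean_and_sd(values):
    """Mean and sample standard deviation of the null estimates."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance ** 0.5

# Toy data (assumption): one numeric score per subject, 8 cases and 12 controls.
scores = [0.1 * i for i in range(20)]
labels = [1] * 8 + [0] * 12

def toy_statistic(permuted_labels):
    """Stand-in for a heritability estimate: case mean minus control mean."""
    cases = [s for s, l in zip(scores, permuted_labels) if l == 1]
    controls = [s for s, l in zip(scores, permuted_labels) if l == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

null_estimates = permutation_null(labels, toy_statistic)
null_mean, null_sd = mean_and_sd(null_estimates)
```

Shuffling preserves the number of cases and controls, so each permuted data set has the same case fraction as the observed one.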
Identification of Previous Associations With Stroke and Related Cardiovascular Phenotypes
Published Association With Ischemic Stroke
For selection of candidate gene associations with ischemic stroke we undertook a systematic review of the published literature using PubMed on August 16, 2011, using the search terms “Stroke” and “Genetics.” The search was limited to adult humans and studies reported in the English language. Abstracts were reviewed by 2 independent reviewers who identified associations meeting inclusion criteria. Inclusion criteria were any of the following: (1) a positive meta-analysis of >1000 cases for a single gene; (2) a reported genomewide significant association (threshold of 1e−8); (3) a reported association with subsequent replication, either in the same or multiple papers for the same gene (identified by name) or genetic variant (identified by rs identifier or amino acid change), in which both studies were >400 cases or where the initial study was >1000 cases and the replication was of any size. Mutations associated with monogenic stroke such as NOTCH3 causing cerebral autosomal-dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) were excluded. Studies were included regardless of ethnicity and stroke subtype. Variants initially in GWAS studies of related cardiovascular phenotypes but which were subsequently found to be associated with ischemic stroke (eg, PITX2, ZFHX3, 9p21 locus) were considered as GWAS hits from related cardiovascular phenotypes.
Published Associations With Related Cardiovascular Phenotypes
We included associations detected with atrial fibrillation, coronary artery disease, and myocardial infarction. In addition, we included studies of the intermediate stroke phenotype carotid intima-media thickness. We identified previously reported loci in these phenotypes identified by GWAS at a statistical threshold of 1e−8 as reported on a database of GWAS associations (at www.genome.gov/gwastudies) in August 2011. The search terms identified 3222 publications. From these we identified 32 genes from previous candidate gene association studies in ischemic stroke, which met our inclusion criteria, and a further 22 loci from related cardiovascular phenotype GWAS studies. Four loci were common to both cohorts, resulting in 50 loci in total. These are shown in Table 1. The full list of genes examined, specific SNPs, and window boundaries are shown in online-only Data Supplement Table I. All loci were examined for association in all ischemic stroke, LVD, SVD, and CE stroke.
Replication of Identified Loci in the WTCCC2 Data Set
All analyses were conducted on meta-analyzed UK and German data. UK and German results were combined in a meta-analysis using METAL under a fixed effects inverse variance model weighted on the inverse of the square of the SE to produce a final probability value per SNP tested. Separate analyses were performed for all ischemic stroke and the subtypes LVD, CE, and SVD.
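For readers unfamiliar with the fixed effects inverse variance model, the calculation for a single SNP can be sketched in Python as below. The function name and the example effect sizes are hypothetical; the study used METAL, and this sketch only reproduces the standard weighting by 1/SE² with a two-sided normal P value.

```python
import math

def inverse_variance_meta(betas, standard_errors):
    """Fixed effects meta-analysis of per-cohort effect estimates for one SNP."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value

# Hypothetical log odds ratios and standard errors for two cohorts.
beta, se, z, p = inverse_variance_meta([0.2, 0.2], [0.1, 0.1])
```

With two equally precise cohorts the pooled standard error shrinks by a factor of √2, which is why a signal that is modest in each cohort can become more significant after combination.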
Where a specific rs# identifier or unambiguous position was identified in the literature, we established a 100-kb window central to that location and examined all directly genotyped variants within that window for significance. Where >1 variant had been identified as significant, we took the most significantly associated variant as the central point for establishment of the window. Additionally, where a gene name has been associated with the variant reported as significant, we established a 100-kb window around the gene boundaries as listed within HAPMAP release 28 and examined all directly genotyped variants within these boundaries. We report the single most significant variant from our GWAS data from either window in this study.
The only exception is for the PITX2 locus associated with atrial fibrillation. For this locus the most significantly associated variants lie 200 kb distal to the gene PITX2 and as such a window of 200 kb was used to capture both the gene and associated loci together with the intervening DNA sequence. Details of all regions and loci windows are contained in the online-only Data Supplement.
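The window-based lookup described above can be sketched as follows. Several details are assumptions: the 100-kb SNP window is taken as 50 kb either side of the reported position, the gene window is taken as the gene boundaries plus a flank on each side, all variants are assumed to lie on the same chromosome, and the function names and variant identifiers are hypothetical.

```python
def snp_window(position, width=100_000):
    """Window of total size `width` centered on a reported variant."""
    half = width // 2
    return (max(0, position - half), position + half)

def gene_window(gene_start, gene_end, flank=50_000):
    """Gene boundaries extended by `flank` on each side (flank is an assumption)."""
    return (max(0, gene_start - flank), gene_end + flank)

def most_significant_variant(variants, windows):
    """Return the (id, position, p_value) tuple with the smallest P value
    among variants that fall inside any of the given windows, or None."""
    in_window = [
        v for v in variants
        if any(start <= v[1] <= end for start, end in windows)
    ]
    return min(in_window, key=lambda v: v[2]) if in_window else None

# Hypothetical genotyped variants on one chromosome.
variants = [
    ("variant_1", 1_000_000, 0.2),
    ("variant_2", 1_040_000, 0.003),
    ("variant_3", 1_200_000, 1e-6),
]
best_snp_only = most_significant_variant(variants, [snp_window(1_000_000)])
best_either = most_significant_variant(
    variants, [snp_window(1_000_000), gene_window(1_150_000, 1_180_000)]
)
```

Because the single best variant is taken from either window, adding the gene window can only keep or lower the reported P value for a locus, which is one reason the number of variants examined matters for multiple-testing correction.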
Correction for Multiple Testing
We initially assessed significance using a Bonferroni correction for the total number of genes investigated (n=50, P=0.001). This does not take into account the number of variants examined at each locus and therefore we also applied the Nyholt method17 with the modification proposed by Li.18 This method takes account of the linkage disequilibrium structure within regions to determine the number of tests that need to be corrected for. Applying this method to each locus independently, we summed the Nyholt values to establish a global threshold for significance. In total we examined 2872 variants, which using the Nyholt correction method was calculated to be the equivalent of 1496 independent tests, giving a threshold of 3.34e−5.
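The two thresholds can be reproduced with simple arithmetic, assuming a family-wise α of 0.05: 0.05/50 = 0.001 and 0.05/1496 ≈ 3.34e−5. The Python sketch below also shows one way an effective number of tests might be derived from the eigenvalues of a locus correlation matrix, assuming the cited modification is the Li and Ji eigenvalue formula; the function names are hypothetical and the eigenvalues would have to be supplied from the linkage disequilibrium matrix of each locus.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for n_tests independent tests."""
    return alpha / n_tests

def li_ji_effective_tests(eigenvalues):
    """Effective number of tests from correlation-matrix eigenvalues.
    Assumed formula: sum of I(|x| >= 1) + (|x| - floor(|x|))."""
    meff = 0.0
    for value in eigenvalues:
        x = abs(value)
        meff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return meff

def global_threshold(per_locus_eigenvalues, alpha=0.05):
    """Sum the effective tests over loci, then apply a Bonferroni correction."""
    total = sum(li_ji_effective_tests(e) for e in per_locus_eigenvalues)
    return alpha / total

gene_level = bonferroni_threshold(50)       # 50 loci
variant_level = bonferroni_threshold(1496)  # 1496 effective tests reported in the text
```

Three uncorrelated variants have eigenvalues of 1, 1, and 1 and count as three tests, whereas strongly correlated variants concentrate the variance in one large eigenvalue and count as fewer.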
Analysis of QQ Plots
To gain an estimate of how much of the genetic contribution to stroke was accounted for by both the stroke candidate gene loci and the related cardiovascular GWAS associations, we plotted QQ plots for all variants from within 100-kb gene and SNP windows around significant candidate gene and GWAS SNP loci. These were plotted using R (http://cran.ma.imperial.ac.uk/) and the package ggplot2 and the qqman script available from http://dL.dropbox.com/u/66281/0_Permanent/qqman.r.
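The coordinates of such a QQ plot can be computed as in the Python sketch below, which pairs each observed −log10(P) with the value expected under a uniform null distribution. The function name and the plotting position (rank − 0.5)/n are assumptions; the study used R with ggplot2 and the qqman script, whose exact plotting positions may differ.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, most significant first.
    Assumes every P value is greater than 0 and at most 1."""
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n  # expected quantile under the null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points

# Hypothetical P values for variants within the locus windows.
points = qq_points([0.9, 0.001, 0.4])
```

Points lying above the diagonal, where observed exceeds expected, indicate more small P values than chance would produce.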
Heritability Estimates for Ischemic Stroke and Its Subtypes
The heritability estimates are shown in Table 2. For all ischemic stroke it was estimated at 37.9% (Table 2). It was at a similar level for LVD (40.3%) and CE (32.6%) but lower for SVD (16.1%). In contrast, estimates under the null hypothesis from permuting 100 rounds of randomly assigning case–control status were 1.46% (SD 2.27%) for all ischemic stroke, 2.92% (SD 3.87%) for LVD, 2.18% (SD 3.12%) for CE, and 2.34% (SD 3.46%) for SVD.
Contribution of Previously Reported Associations With Stroke and Cardiovascular Phenotypes to This Heritability
Candidate Gene Associations
Of the 32 previously reported candidate genes associated with ischemic stroke, 4 were associated using a Bonferroni correction for the number of genes examined of P=0.001: ALOX5AP with CE, APOA (LPA) with SVD, fibrinogen with all ischemic stroke, and paraoxonase-1 with SVD. Applying the more conservative Nyholt correction, no association met significance. See Table 3 for full results.
GWAS Associations of Related Cardiovascular Phenotypes
Of the 18 novel genes identified from GWAS of related cardiovascular phenotypes, 3 showed association with specific stroke subtypes at the modified Nyholt threshold: PHACTR1 in LVD (P=2.63e−6), PITX2 in CE stroke (P=4.78e−8), and ZFHX3 in CE stroke (P=5.50e−7). The population-attributable risk for the lead SNPs in PHACTR1, PITX2, and ZFHX3 was 3.6%, 5.1%, and 3.3%, respectively. The previously reported association with the chromosome 9p21 locus and LVD was significant using the Bonferroni correction based on the number of genes but not significant at the Nyholt-corrected level (P=0.001372). No loci were significant in all ischemic stroke or SVD.
QQ Plot Analysis
Figure A shows the variants identified within 100 kb of candidate gene loci; Figure B shows the variants identified within 100 kb of GWAS significant associations in related cardiovascular phenotypes. The lack of deviation from the expected distribution for the ischemic stroke candidate gene loci (Figure A) suggests there is no strong effect on all ischemic stroke risk from these loci. In contrast for related cardiovascular phenotypes, the deviation from the expected distribution as shown in Figure B suggests evidence of shared genetic factors between these phenotypes and ischemic stroke.
Using a novel method of assessing the genetic risk of stroke, our findings suggest a moderate heritability for stroke but that the contribution varies by stroke subtype, being higher for LVD and CE stroke than for SVD. This finding is in line with previous family history studies on ischemic stroke subtypes, showing a greater risk associated with LVD and CE stroke than SVD stroke. This may in part be due to the heterogeneity of SVD as a distinct phenotypic subtype. The GCTA approach offers a novel way of estimating heritability, however, by considering all genetic information from a GWAS experiment rather than estimated heritability from familial studies. As highlighted in the introduction, previous data from twin and family history studies investigating the heritability of ischemic stroke have significant limitations, but these data from GCTA suggest genetic factors are important in stroke risk. When using this approach, we controlled for population stratification by including the first 20 principal components as covariates in our analysis, because population stratification can lead to falsely high heritability estimates.
Many studies have suggested candidate gene associations with ischemic stroke, but these are potentially subject to publication bias. In this large data set, with appropriate correction for multiple testing, we found little evidence for these candidate gene associations being important in stroke risk. Our systematic review of the literature identified 32 associations with ischemic stroke from candidate gene studies. Using a typical “candidate gene correction” based on the number of genes investigated, 4 associations were significant (ALOX5AP with CE, APOA [LPA] with SVD, fibrinogen with all ischemic stroke, and paraoxonase-1 with SVD). These merit examination in even larger data sets. However, when applying a more stringent correction taking into account the number of SNPs examined, all were nonsignificant. Consistent with this, our QQ plots suggested there was no strong effect on ischemic stroke risk from the aggregated candidate gene SNPs.
These results are in contrast to those from attempts to replicate loci identified by GWAS studies of related cardiovascular phenotypes. Of these, 3 showed association at the more stringent Nyholt threshold: PHACTR1 in LVD and PITX2 and ZFHX3 in CE stroke. The 2 associations with CE stroke were first identified in atrial fibrillation14,19 and were subsequently associated with ischemic stroke,13,14 and a very limited portion of the WTCCC2 cohort used here was also used in replication of 3 SNPs in this finding. The PHACTR1 association was first reported in myocardial infarction20 and therefore might be expected to be associated with large artery stroke, although this has not previously been confirmed in stroke.
We also examined a number of other previously reported GWAS associations with related phenotypes and intermediate phenotypes for stroke. Although none was significantly associated in our population, the QQ plot for the accumulation of these variants deviated markedly from the null in all ischemic stroke, suggesting that these variants in combination do make a significant contribution to ischemic stroke risk and may be individually significant in larger sample sizes.
Three previous GWAS studies have found novel associations with ischemic stroke. We were unable to replicate the association of NINJ2 with all ischemic stroke,21 consistent with a similar failure to replicate in a larger case–control study.22 We were also unable to replicate an association with PRKCH, which has been associated with SVD in a Japanese population.23 The individual SNP implicated (rs2230500) is of very low frequency in white populations but we were unable to identify any association in the region surrounding the implicated SNP. The third association with a region on 7p21.1 encompassing the HDAC9 gene was reported for the first time in the WTCCC2 population15 and therefore we could not attempt independent replication in this data set.
Although we carried out this replication in a large cohort of >3000 cases, which is much larger than most previous stroke genetic studies, we still had only moderate power to detect associations, particularly with stroke subtypes. With 3548 cases and 5972 control subjects, we are able to detect an OR of ≥1.24. This is within the range of candidate gene associations reported in large-scale meta-analyses of up to 18 000 cases and 58 000 controls.9 However, the power is lower when considering subtypes, where we are powered to detect ORs in the region of ≥1.43. It is possible that a number of these variants may predispose to specific stroke subtypes when examined in larger populations. Another potential limitation of the study is that whether an association is detected depends on the method of correction for multiple testing used, and the best approach is uncertain. For this reason we presented data both with a conventional approach controlling for the number of genes studied and a more stringent approach correcting for the number of variants studied.
It should be noted that the GCTA tool uses only GWAS data in its heritability estimates. This is not all genetic variation in the genome, but rather covers all directly genotyped SNPs and those in linkage disequilibrium with the genotyped SNPs. This will miss heritability due to rare variants not typed in the GWAS array. In addition, GCTA estimates narrow-sense heritability, because it does not capture gene–environment interactions or epistatic (gene–gene) interactions, which are encompassed by broad-sense heritability. Any estimate of heritability has limitations, however, with pedigree estimates potentially biased by shared environmental effects. As a consequence, estimates based on genetic information and estimates based on familial information will differ because they are using different measures to arrive at estimates of heritability. Neither should be considered more accurate than the other.
We could find no previously published reports of heritability estimated by GCTA in ischemic stroke. However, a recent study has used GCTA to estimate narrow-sense heritability for cardiovascular risk factors using data from the Atherosclerosis Risk in Communities (ARIC) study. It found heritability estimates to be 34% for body mass index, 28% for waist–hip ratio, 33% for fasting glucose, 23% for fasting insulin, 47% for fasting triglyceride, 48% for fasting high-density lipoprotein, and 30% for systolic blood pressure.24
In conclusion, our results suggest most associations from previous candidate gene studies are likely to confer little or very low risk in stroke pathogenesis. In contrast, associations derived from GWAS studies of related cardiovascular phenotypes appear to be more replicable. Associations identified to date appear to be with specific stroke subtypes. Our data suggest that there is substantial heritability for ischemic stroke but that this varies markedly for different stroke subtypes. This suggests that GWAS studies are likely to identify further associations with ischemic stroke but that their success will depend on careful subtyping and adequate power within each individual stroke subtype.
We acknowledge use of the British 1958 Birth Cohort DNA collection, funded by Medical Research Council grant G0000934 and Wellcome Trust grant 068545/Z/02, and of the UK National Blood Service controls funded by the Wellcome Trust. A full list of WTCCC2 authors is provided in the supplementary material.
Sources of Funding
The principal funding for this study was provided by the Wellcome Trust as part of the Wellcome Trust Case Control Consortium 2 project (085475/B/08/Z and 085475/Z/08/Z).
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.112.665760/-/DC1.
- Received May 25, 2012.
- Revision received August 2, 2012.
- Accepted August 27, 2012.
- © 2012 American Heart Association, Inc.
- Brass LM, Isaacsohn JL, Merikangas KR, Robinette CD
- Bak S, Gaist D, Sindrup SH, Skytthe A, Christensen K
- Jerrard-Dunne P, Cloud G, Hassan A, Markus HS
- Dichgans M, Markus HS
- Lin B, Clyne M, Walsh M, Gomez O, Yu W, Gwinn M, et al
- Adams HP, Bendixen BH, Kappelle LJ, Biller J, Love BB, Gordon DL, et al