Genetic Overlap Between Diagnostic Subtypes of Ischemic Stroke
Background and Purpose—Despite moderate heritability, the phenotypic heterogeneity of ischemic stroke has hampered gene discovery, motivating analyses of diagnostic subtypes with reduced sample sizes. We assessed evidence for a shared genetic basis among the 3 major subtypes: large artery atherosclerosis (LAA), cardioembolism, and small vessel disease (SVD), to inform potential cross-subtype analyses.
Methods—Analyses used genome-wide summary data for 12 389 ischemic stroke cases (including 2167 LAA, 2405 cardioembolism, and 1854 SVD) and 62 004 controls from the Metastroke consortium. For 4561 cases and 7094 controls, individual-level genotype data were also available. Genetic correlations between subtypes were estimated using linear mixed models and polygenic profile scores. Meta-analysis of a combined LAA–SVD phenotype (4021 cases and 51 976 controls) was performed to identify shared risk alleles.
Results—High genetic correlation was identified between LAA and SVD using linear mixed models (rg=0.96, SE=0.47, P=9×10−4) and profile scores (rg=0.72; 95% confidence interval, 0.52–0.93). Between LAA and cardioembolism and SVD and cardioembolism, correlation was moderate using linear mixed models but not significantly different from zero for profile scoring. Joint meta-analysis of LAA and SVD identified strong association (P=1×10−7) for single nucleotide polymorphisms near the opioid receptor μ1 (OPRM1) gene.
Conclusions—Our results suggest that LAA and SVD, which have been hitherto treated as genetically distinct, may share a substantial genetic component. Combined analyses of LAA and SVD may increase power to identify small-effect alleles influencing shared pathophysiological processes.
Ischemic stroke (IS) is a complex disease influenced by numerous clinical, genetic, and lifestyle risk factors. Although conventional factors, such as hypertension, dyslipidemia, diabetes mellitus, and smoking, are well established, genetic factors contribute up to 30% to 40% of risk1 and are poorly understood. Despite recent advances in high-throughput genotyping, gene discovery for IS has progressed slowly because of the technical nature of case ascertainment and the etiologic heterogeneity of the IS diagnosis. The latter produces pathophysiological differences and implies genetic differences between patients, complicating efforts to identify susceptibility genes.
To assist diagnosis and clinical management, schemes have been developed to categorize IS into diagnostic subtypes.2,3 The major subtypes are large artery atherosclerosis (LAA), cardioembolism, and small vessel (lacunar) disease (SVD). By exploiting these phenotypically more homogeneous categorizations, genome-wide association studies (GWAS) have identified several genetic associations specific to individual subtypes.4–7 In contrast, only 2 genome-wide significant associations have been identified for broadly defined IS,8,9 despite ≈5-fold larger sample sizes. Power of association studies is a balance between sample size and the (unknown) effect sizes of risk loci, with estimable effect size depending on genetic homogeneity. To date, GWAS of IS have focused on individual subtypes, which reduces sample size and may reduce power at some loci if there is a shared genetic basis between subtypes.10
This study aimed to estimate genetic correlations between the 3 major IS subtypes using individual-level GWAS data and meta-analysis summary statistics from the International Stroke Genetics Consortium6,7 and Metastroke.5 Genetic correlations were estimated using 2 different methods: linear mixed models (LMMs)11 and polygenic profile scoring.12
The Metastroke study included 15 individual studies contributing 12 389 total ischemic stroke cases and 62 004 controls of European ancestry (Table I in the online-only Data Supplement). Details of these 15 studies, including genotyping, phenotyping, and participants’ demographic characteristics, have been described previously.5 Cases and controls did not overlap between studies and were confirmed as unrelated using genotypic data. Stroke subtyping was performed using the TOAST system,2 identifying 2167 cases with LAA, 2405 with cardioembolism, and 1854 with SVD; the remainder had other, undetermined, or cryptogenic pathogenesis. Each study conducted genotype imputation using either HapMap Phase 2 or 1000 Genomes reference panels, fitted additive logistic regression models for all single nucleotide polymorphisms (SNPs), and provided regression summary statistics for IS and its subtypes (if available). Individual-level GWAS data were also available for 3 of the largest Metastroke cohorts: 2 from the Wellcome Trust Case Control Consortium 2 Study (United Kingdom and Munich)6 and the Australian Stroke Genetics Collaborative,7 which were genotyped using Illumina arrays with similar content. All studies were approved by appropriate ethics committees and participants provided written informed consent.
Linear Mixed Modeling
Genotype data for the 3 samples with individual-level data were combined to yield a single data set using the software PLINK.13 Stringent quality control removed SNPs that were not directly genotyped in all samples or that had >0.5% missing data, Hardy–Weinberg P value <0.05, minor allele frequency <1%, or differential missingness (P<0.05) between samples. We excluded samples with >1% missing data and one individual from each pair with an absolute value of genome-wide similarity >0.05.14 Principal components of ancestry were calculated in the pooled sample after 3 iterations of principal components analysis with outlier removal (>5 standard deviations from the mean on PC1–5).15
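The SNP-level exclusion criteria above can be written as a simple filter. The following is a minimal Python sketch for illustration only; it is not the PLINK workflow used in the study, and the record fields (`genotyped_in_all`, `missing_rate`, `hwe_p`, `maf`, `diff_missing_p`) are hypothetical names, not PLINK output columns.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality-control summary (hypothetical field names)."""
    snp_id: str
    genotyped_in_all: bool   # directly genotyped in all 3 samples
    missing_rate: float      # proportion of missing genotype calls
    hwe_p: float             # Hardy-Weinberg test P value
    maf: float               # minor allele frequency
    diff_missing_p: float    # P value for differential missingness between samples


def passes_qc(snp: SnpQc) -> bool:
    """Apply the thresholds stated in the text; a SNP failing any one is removed."""
    return (
        snp.genotyped_in_all
        and snp.missing_rate <= 0.005   # removed if >0.5% missing data
        and snp.hwe_p >= 0.05           # removed if Hardy-Weinberg P < 0.05
        and snp.maf >= 0.01             # removed if minor allele frequency < 1%
        and snp.diff_missing_p >= 0.05  # removed if differential missingness P < 0.05
    )


def filter_snps(snps):
    """Return only the SNPs that pass every criterion."""
    return [s for s in snps if passes_qc(s)]
```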
Heritability within and genetic correlations (rg) between subtypes were estimated using LMMs,14 adjusting for 20 principal components. Likelihood ratio statistics were used to test whether estimates were significantly different from zero. Heritability estimates were transformed to the liability scale assuming 2% lifetime prevalence for IS, to which the 3 major subtypes each contribute ≈20% (total 60%), equating to 0.4% prevalence (20%×2%) for each subtype.5,16 The remaining 0.8% prevalence (40%×2%) was assumed to reflect other stroke types.
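The prevalence arithmetic and the liability-scale transformation can be illustrated as follows. This is a sketch under stated assumptions: the text does not give the transformation formula, so the code uses a widely used liability-threshold conversion for ascertained case-control samples, which may differ in detail from the implementation behind the reported estimates. The function and parameter names are illustrative.

```python
from statistics import NormalDist


def subtype_prevalence(overall_prevalence, subtype_share):
    """Prevalence of a subtype as a share of overall IS prevalence."""
    return overall_prevalence * subtype_share


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Convert observed-scale heritability to the liability scale.

    Assumed formula (not stated in the source):
        h2_liab = h2_obs * [K(1-K)]^2 / (z^2 * P(1-P))
    K = population prevalence, P = proportion of cases in the sample,
    z = standard normal density at the liability threshold for K.
    """
    normal = NormalDist()
    threshold = normal.inv_cdf(1.0 - prevalence)
    z = normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Prevalence assumed in the text: 20% of a 2% lifetime IS prevalence per subtype.
per_subtype = subtype_prevalence(0.02, 0.20)   # about 0.004, i.e. 0.4%
other_types = subtype_prevalence(0.02, 0.40)   # about 0.008, i.e. 0.8%
```

Because the conversion depends on both the assumed prevalence and the proportion of cases in the sample, liability-scale estimates change if either assumption changes.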
Polygenic Profile Scoring
The profile scoring approach uses SNP association statistics from a given phenotype to build a linear predictor and tests this for association with the same or a different phenotype in independent data. To facilitate interpretation of cross-subtype analyses, we first assessed association of profile scores within IS and its 3 subtypes. For each, this was performed using leave-one-out validation for the 3 target data sets with individual-level data. In turn, each of the 3 was set aside and a GWAS discovery meta-analysis was performed using all other Metastroke data sets. SNPs with data from ≥5 Metastroke studies were retained and pruned for linkage disequilibrium (r2>0.2 within 1Mb) using PLINK’s clump algorithm,13 which preferentially retains the most associated SNP in a linkage disequilibrium region. From the pruned set, we extracted subsets passing 10 graded significance thresholds (PT=1×10−4, 1×10−3, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1). For subsets passing each threshold, PLINK’s score function was used to calculate profile scores for individuals in the left-out data set. These scores represent an average risk allele burden across all SNPs in the score, with weights assigned as the log odds ratio from the discovery meta-analysis.
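The thresholding and scoring steps can be sketched in a few lines of Python. This is an illustration of the idea rather than a reimplementation of PLINK: linkage disequilibrium pruning is assumed to have been done already, missing genotypes are handled by simply skipping the SNP, and the input dictionaries and function names are hypothetical.

```python
def select_snps(discovery, p_threshold):
    """Keep SNPs whose discovery P value is below the threshold PT.

    discovery maps snp_id -> (log_odds_ratio, p_value) from the
    discovery meta-analysis (already pruned for linkage disequilibrium).
    Returns snp_id -> log odds ratio, used as the score weight.
    """
    return {snp: beta for snp, (beta, p) in discovery.items() if p < p_threshold}


def profile_score(allele_counts, weights):
    """Average weighted allele burden for one individual.

    allele_counts maps snp_id -> number of copies (0, 1, or 2) of the
    allele to which the log odds ratio refers. SNPs absent from
    allele_counts are skipped (an assumption made for this sketch).
    """
    used = [snp for snp in weights if snp in allele_counts]
    if not used:
        return 0.0
    return sum(weights[snp] * allele_counts[snp] for snp in used) / len(used)


# The 10 graded thresholds listed in the text.
THRESHOLDS = [1e-4, 1e-3, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1]
```

In use, one score per threshold would be computed for each individual in the left-out target data set and then tested for association with case/control status.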
Associations of profile scores with stroke subtypes were assessed by logistic regression adjusted for 3 ancestry principal components. Variance explained by the score was computed as the difference in Nagelkerke’s pseudo-R2 between the model including the profile score and principal components and that including only principal components. Results for the 3 target data sets were combined via random-effects meta-analysis to estimate overall significance. Overall variance explained was estimated as the sample-size weighted mean of target data set–specific pseudo-R2 estimates.
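The variance-explained calculation can be sketched from model log-likelihoods. The code below assumes the standard definition of Nagelkerke's pseudo-R2 and takes log-likelihoods as given, since the fitting software is not specified in the text; function names are illustrative.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's pseudo-R2 from log-likelihoods of the intercept-only
    (null) model and the fitted model, for n individuals."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


def score_variance_explained(loglik_null, loglik_pcs, loglik_pcs_score, n):
    """Pseudo-R2 of (principal components + profile score) minus
    pseudo-R2 of principal components only."""
    full = nagelkerke_r2(loglik_null, loglik_pcs_score, n)
    reduced = nagelkerke_r2(loglik_null, loglik_pcs, n)
    return full - reduced


def sample_size_weighted_mean(values, sample_sizes):
    """Combine target data set-specific estimates, weighting by sample size."""
    total = sum(sample_sizes)
    return sum(v * n for v, n in zip(values, sample_sizes)) / total
```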
To assess polygenic sharing between subtypes, we used the same approach, with different subtypes alternately specified as discovery and target traits. There was no sample overlap between discovery and target analyses. Using profile score results, genetic correlations (rg) were estimated using a quantitative genetics framework.12 At α=0.05, we had 98% power to detect polygenic scores explaining ≥0.2% of variance in case/control status for any target subtype, 81% to 83% power to detect scores explaining ≥0.1% of variance (varying by subtype), and 52% to 54% power for scores explaining ≥0.05% of variance.12
Joint Meta-Analysis of LAA and SVD
Joint, fixed-effects meta-analysis of allelic effects for LAA and SVD was performed using Metal17 for 2167 LAA cases, 1854 SVD cases (4021 total cases), and 51 976 controls from 12 studies (Table I in the online-only Data Supplement). To control type 1 error caused by overlapping controls for LAA and SVD within cohorts, a covariance correction was applied.18 Power to detect associated SNPs was calculated19 assuming an additive model, perfect linkage disequilibrium between risk and marker alleles, and a significance level of α=5×10−8. For a genetic risk ratio of 1.2, we had 37%, 89%, and 98% to 99% power to identify risk alleles with frequency 0.1, 0.2, and 0.3 to 0.5, respectively. For a true risk ratio of 1.1, power was low, ranging from 0.2% to 10% across allele frequencies.
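To show why overlapping controls matter, the sketch below combines two effect estimates for one SNP by inverse-variance weighting while allowing for a known correlation `r` between them. It is a generic generalized least squares combination, not the Metal implementation or the specific covariance correction cited in the text; `r` is taken as an input rather than derived from sample counts.

```python
import math


def combine_two_estimates(beta1, se1, beta2, se2, r=0.0):
    """Fixed-effects combination of two correlated log odds ratios.

    r is the assumed correlation between the two estimates (for example,
    induced by shared controls). With r = 0 this reduces to the usual
    inverse-variance weighted meta-analysis.
    Returns (combined beta, combined SE, two-sided P value).
    """
    cov = r * se1 * se2
    denom = se1 ** 2 + se2 ** 2 - 2.0 * cov
    w1 = (se2 ** 2 - cov) / denom
    w2 = (se1 ** 2 - cov) / denom
    beta = w1 * beta1 + w2 * beta2
    variance = (se1 ** 2 * se2 ** 2 - cov ** 2) / denom
    se = math.sqrt(variance)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Case counts entering the joint analysis, as stated in the text.
total_cases = 2167 + 1854
```

With positive `r`, the combined standard error is larger than under independence, so ignoring the overlap would overstate significance.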
Linear Mixed Models
After stringent quality control, individual-level genotype data were available for 4561 IS cases and 7094 controls (Table) at 345 336 directly genotyped SNPs. Using LMMs, the estimated proportion of variance in case–control status explained by the SNPs (h2SNP) was significant for all stroke traits (Table II in the online-only Data Supplement). Higher and more significant values were estimated for IS (h2SNP=0.18; P=1×10−14), LAA (h2SNP=0.19; P=2×10−5), and cardioembolism (h2SNP=0.24; P=2×10−6), whereas the estimate for SVD was lower (h2SNP=0.10; P=0.04).
To estimate the genetic correlation (rg) between subtype cases, controls were randomly allocated to one of the 2 subtype groups in each analysis. This allocation and estimation process was repeated 10 times and the mean and standard deviation of parameter estimates and mean standard errors (SE) derived (Table III in the online-only Data Supplement). The rg value was highest and significantly different from zero between LAA and SVD at 0.96 (SD=0.059; P=9×10−4), although the large standard error (0.47) indicates low precision. Reduced, but nominally significant correlation was observed between LAA and cardioembolism (rg=0.39, SE=0.21, P=0.024) and between cardioembolism and SVD (rg=0.64, SE=0.40, P=0.017).
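The repeated random allocation of controls can be sketched as below. The equal split and the use of the sample standard deviation are assumptions made for illustration, and `estimate_rg` is a hypothetical callable standing in for the LMM fit, which is not reproduced here.

```python
import random
import statistics


def split_controls(control_ids, rng):
    """Randomly allocate controls to two groups (assumed equal split)."""
    shuffled = list(control_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def repeat_estimation(control_ids, estimate_rg, n_repeats=10, seed=0):
    """Repeat allocation and estimation, then summarize the estimates.

    estimate_rg(group_a, group_b) is a placeholder for fitting the
    bivariate model with each control group paired with one subtype.
    Returns (mean, standard deviation) across repeats.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        group_a, group_b = split_controls(control_ids, rng)
        estimates.append(estimate_rg(group_a, group_b))
    return statistics.mean(estimates), statistics.stdev(estimates)
```

The standard deviation across repeats reflects only variability due to the allocation of controls, which is why it can be much smaller than the standard error of the estimate itself.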
Polygenic Profile Scoring
Although LMM analyses were restricted to samples with individual-level data, profile scoring could use Metastroke samples with summary statistics for discovery meta-analyses. In analyses within traits, profile scores for IS, LAA, and cardioembolism showed strong association with the same trait in independent target cohorts (Tables IV–VI and Figure I in the online-only Data Supplement). For IS, strong association was observed across most of the discovery P value distribution with maximum association observed for PT<1 (Pscore=1.1×10−8), typical of relatively small discovery samples.20 There was little effect size heterogeneity across target cohorts. For LAA, maximum association (Pscore=1.7×10−8) was observed for PT<0.05 with no heterogeneity. For cardioembolism, maximum association (Pscore=2×10−4) was observed for predictors including SNPs reaching PT<0.001 and PT<0.01, with no heterogeneity. For all 3 subtypes, profile scores explained a small proportion of observed case–control variance, being highest for LAA (pseudo-R2=0.45%) and lowest for SVD (pseudo-R2=0.05%). We also note that SVD-based scores did not associate with SVD in target cohorts and many showed effect heterogeneity between studies (Table VII in the online-only Data Supplement).
Analyses between stroke subtypes detected significant polygenic sharing between LAA and SVD (Tables VIII and IX and Figure II in the online-only Data Supplement). The majority of SVD-based scores were associated with LAA, with no heterogeneity. The highest association was observed for a score including ≈36 000 SNPs reaching discovery PT<0.1 (Pscore=2×10−4), which explained an estimated 0.19% of observed LAA case/control variance. In the reverse analysis, 3 LAA-based profile scores were associated with SVD at P<0.05, for example, the score including ≈20 000 SNPs reaching PT<0.05 (Pscore=0.032, R2=0.08%). In analyses of the other 2 subtype pairs (LAA and cardioembolism, cardioembolism and SVD), no coassociation of profile scores was observed (Tables X–XIII and Figures III and IV in the online-only Data Supplement).
Using profile score results within LAA, the estimated proportion of LAA variance in liability explained by the score most strongly associated with LAA and SVD (PT=0.05) was 12.8% (Table V in the online-only Data Supplement). SVD-based scores did not associate with SVD in target samples, but the score most significantly associated with LAA (PT=0.1) explained 0.8% of SVD liability variance (Table VII in the online-only Data Supplement). Using these estimates and the observed cross-trait association results, the estimated genetic correlation12 between LAA and SVD was rg=0.72, which was significantly different from zero (95% confidence interval [CI], 0.52–0.93). The SNP-based correlation was not significantly different from zero for LAA and cardioembolism (rg=0.13; 95% CI, 0–0.56) or for cardioembolism and SVD (rg=0.64; 95% CI, 0–0.92).
Quantitative Bias Analysis: Subtype Misclassification
Bias analysis was performed to assess the extent to which the genetic correlation (rg) between LAA and SVD could result from subtype misclassification21 (see Methods and Table XIV in the online-only Data Supplement). Allowing for rates of subtype misclassification consistent with reported values of inter-rater reliability,22 rg was still significantly different from zero. Assuming all misclassified LAA cases were truly SVD and vice versa, the estimate was rg=0.63 (95% CI, 0.34–0.74). Assuming all misclassified cases were neither LAA nor SVD, the estimate was rg=0.75 (95% CI, 0.43–0.98). This suggests robustness of the observed genetic correlation to likely levels of subtype misclassification.
GWAS Meta-Analysis of LAA and SVD
Given evidence for shared common variants between LAA and SVD, joint meta-analysis of LAA and SVD was performed (Figures V and VI in the online-only Data Supplement). Although no SNPs reached genome-wide significance (P<5×10−8), suggestive association (P=1×10−7 at rs17084671; P=2×10−7 at rs6938958 and rs7763080) was observed for a cluster of SNPs at chromosome 6q25.2 (Table XV in the online-only Data Supplement), ≈100 kb upstream of the opioid receptor μ1 (OPRM1) gene.
GWAS of IS have revealed the importance of diagnostic subtype classifications. However, exclusive reliance on discrete subtypes reduces sample size and assumes an absence of risk alleles influencing multiple subtypes. Using some of the largest extant GWAS collections and 2 different analytic approaches, this study suggests the presence of extensive genetic overlap between large artery atherosclerotic and small vessel ischemic stroke. We estimated that the genetic correlation between these subtypes exceeds 0.7, although larger samples are needed to improve the precision of this estimate.
We were careful to eliminate potential sources of bias in our analyses. In individual-level data, European ancestry was strictly defined and principal components of ancestry included as covariates. We also checked that positive score effects were present in multiple target studies and not driven by a single study. Misclassification of TOAST subtypes was considered an important potential source of error because MRI, which increases diagnostic accuracy particularly for SVD, was only used for subtyping ≈50% of all cases.5 However, sensitivity analyses suggested that typical rates of misclassification would have minimal effects on the estimated correlation.
Although SVD-based profile scores did not show significant association with SVD, they associated with LAA in the cross-trait analysis. In the complementary analysis, which used the more powerful LAA discovery sample, LAA-based profile scores showed significant association with both LAA and SVD. This does not imply a lack of consistency or the absence of a polygenic architecture for SVD. Indeed, both our LMM analysis and a previous one1 detected significant SNP-based heritability for SVD. Profile score analyses are influenced both by sample size and by phenotypic homogeneity of discovery and target traits. SVD had the lowest case numbers and is also phenotypically heterogeneous. In profile analyses conducted exclusively within SVD, these factors will reduce both the accuracy of polygenic predictors and statistical power in target samples.
The proportion of observed LAA and SVD variance explained by SVD-based and LAA-based profile scores (pseudo-R2) was 0.2% and 0.08%, respectively. Although these values are low, this does not mean that the true genetic overlap is small. By combining sampling errors in effect estimates across all SNPs in the score, profile scoring produces estimates of explained variance typically lower than true values,20,23 but which will increase as sample size increases. Pseudo-R2 measures for binary traits can be difficult to interpret because they can depend on ascertainment, that is, the proportion of cases in the sample.24 Profile scoring results were used to estimate liability-scale variance explained and genetic correlations using theory that accounts for sample size and ascertainment. For example, within LAA, the pseudo-R2 for the maximum profile score was 0.48%, but the estimated LAA variance explained by the score, adjusted for sample size and ascertainment, was 12.8%. Thus, although estimates of observed cross-trait variance explained are small, they can signify a higher genetic correlation. When genetic correlation has been estimated from the same data set, the results from the profile score and LMM agree well.25 Here, the use of the profile score method allowed the use of a larger sample via data sets for which only association summary statistics were available.
The SNPs most strongly associated with the joint LAA-SVD trait were near the OPRM1 gene, alleles within which have previously shown suggestive association with coronary heart disease (P=5×10−6),26 which has an atherosclerotic pathogenesis. Estimated genetic correlation between LAA and SVD is also consistent with an atherosclerotic pathogenesis in the majority of LAA and a subset of SVD cases. The primary pathophysiological mechanism for LAA is presumed to be atherosclerosis of the large cerebral arteries.2 For SVD, pathological and imaging studies suggest the presence of significant disease heterogeneity, with 2 major underlying vascular pathologies being hypothesized.27,28 The first involves localized atherosclerosis of the larger perforating arteries, typically resulting in a larger, isolated lacunar infarct. The second involves diffuse, nonatherosclerotic arteriopathy of the smaller perforating arteries, associated with multiple, smaller infarcts and often coexistent radiological leukoaraiosis.28 Earlier risk factor analyses suggested that conventional atherosclerotic factors were more common in the isolated lacunar infarct subtype. This subgroup could thus account for the genetic overlap between the broader SVD category and LAA.
Our analyses suggest that LAA and SVD, which have hitherto been considered genetically distinct, may have a shared genetic pathogenesis. Further investigation of the genetic relationship between ischemic stroke subtypes is merited. Although recent GWAS have identified several subtype-specific genetic associations, the pace of discovery has been constrained by small numbers for individual subtypes. If there exist small-effect variants influencing multiple subtypes, joint subtype analyses will offer higher power to identify these and may also identify biological mechanisms shared by these traditionally distinct clinical diagnoses.
Sources of Funding
E.G. Holliday was supported by the Australian Heart Foundation and National Stroke Foundation (100071). N.R. Wray was supported by the Australian National Health and Medical Research Council (NHMRC, 613602). H.S. Markus is supported by a National Institute for Health Research (NIHR) Senior Investigator award. H.S. Markus and S. Bevan are supported by the Cambridge University Hospital National Institute for Health Research Biomedical Research Centre (NIHR BRC). J. Rosand and B.B. Worrall received research funding from the National Institutes of Health. Acknowledgments of funding for the individual Metastroke studies are detailed in the original publication by Traylor et al.5
Dr Rosand has a consulting relationship with Boehringer Ingelheim. Dr Worrall is an Associate Editor of the American Academy of Neurology (AAN) journal Neurology. All the other authors report no conflict.
Guest Editor for this article was Jeffrey L. Saver, MD.
* Drs Attia and Wray jointly directed this work.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.114.007930/-/DC1.
- Received October 27, 2014.
- Revision received December 17, 2014.
- Accepted December 19, 2014.
- © 2015 American Heart Association, Inc.
- Bevan S, Traylor M, Adib-Samii P, Malik R, Paul NL, Jackson C, et al.
- Adams HP Jr, Bendixen BH, Kappelle LJ, Biller J, Love BB, Gordon DL, et al.
- Traylor M, Farrall M, Holliday EG, Sudlow C, Hopewell JC, Cheng YC, et al.
- Holliday EG, Maguire JM, Evans TJ, Koblar SA, Jannes J, Sturm JW, et al.
- Kilarski LL, Achterberg S, Devan WJ, Traylor M, Malik R, Lindgren A, et al.
- Willer CJ, Li Y, Abecasis GR.
- Purcell S, Cherny SS, Sham PC.
- Meschia JF, Barrett KM, Chukwudelunzu F, Brown WM, Case LD, Kissela BM, et al.
- Boiten J, Lodder J, Kessels F.