Whole Genome Approaches in Ischemic Stroke
Background and Purpose— The field of ischemic stroke genetics is moving beyond candidate gene studies into the realm of genomewide association studies. Such studies have resulted in discoveries in diverse, complex disorders.
Methods— The author conducted an informal qualitative review of peer-reviewed medical literature.
Results— The power of genomewide association studies to confirm prior associations and establish new ones is illustrated by recent work focusing on type 2 diabetes mellitus. A pilot genomewide association study of ischemic stroke failed to identify a single gene of major effect.
Conclusions— Follow-up studies with substantially greater statistical power are essential and are being planned by the Wellcome Trust and others.
Genomewide association studies have recently led to pivotal discoveries in a host of common disorders, including age-related macular degeneration, inflammatory bowel disease, and type 2 diabetes mellitus (T2D). Gene chip technology makes it possible to genotype 500 000 to 1 million single-nucleotide polymorphisms (SNPs) in large numbers of disease-affected patients and unaffected control subjects. Investigators can now screen for regions within the human genome that associate with disease without having to hypothesize which variants of which genes are related to disease. This testing for genetic associations in a hypothesis-neutral way represents a striking departure from the common approaches of the past in which candidate genes were chosen a priori on the basis of assumed biological relevance. Hundreds of such candidate gene association studies have been done in ischemic stroke for various genes, including those encoding factor V, prothrombin, angiotensin-converting enzyme, and methylenetetrahydrofolate reductase. Individually, these studies generally lack the statistical power to exclude modest effects. Several meta-analyses have been done to increase statistical power.1,2 However, such analyses may magnify publication bias and are incapable of identifying novel genes.
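Testing hundreds of thousands of SNPs without a prior hypothesis carries a multiple-testing cost, which is one reason genomewide studies need large samples. The short Python sketch below illustrates the arithmetic. It assumes a Bonferroni correction and a familywise error rate of 0.05; neither is specified in the studies discussed here, and the function names are illustrative only.

```python
# Sketch of the multiple-testing burden in a genomewide scan.
# Assumptions (not taken from the reviewed studies): Bonferroni correction,
# familywise error rate of 0.05, and tests treated as independent.

def bonferroni_threshold(n_tests, familywise_alpha=0.05):
    """Per-SNP P value threshold that keeps the familywise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return familywise_alpha / n_tests


def expected_false_positives(n_tests, per_test_alpha=0.05):
    """Expected number of null SNPs passing an uncorrected per-test threshold."""
    return n_tests * per_test_alpha


for n_snps in (500_000, 1_000_000):
    print(n_snps,
          bonferroni_threshold(n_snps),
          expected_false_positives(n_snps))
```

Under these assumptions, an uncorrected threshold of 0.05 applied to 500 000 SNPs would be expected to flag roughly 25 000 SNPs by chance alone, whereas the corrected per-SNP threshold for 1 million SNPs is on the order of 5 × 10⁻⁸.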
Definitive genomewide association studies in ischemic and hemorrhagic stroke have yet to be completed. However, recent successes in other complex disorders illustrate the potential of the technique to yield important discoveries. Several large, well-executed genomewide studies have been done in T2D. In the interest of brevity, I focus on the design and findings of one study in particular, the Diabetes Genetics Initiative.3 The study was conducted in 2 stages. The first, so-called discovery stage involved 1464 patients with T2D and 1467 control subjects from Finland and Sweden. Investigators attempted to genotype a total of 500 568 SNPs, but only 386 731 SNPs survived rigorous quality control. The second, confirmatory stage did not require genomewide genotyping. A total of 10 850 additional subjects, both with and without T2D, from Sweden, Poland, and the United States, were genotyped for 107 SNPs. In addition to T2D, a series of secondary phenotypes were constructed, most of which were biometric data like body mass index and laboratory measures like insulinogenic index.
The Diabetes Genetics Initiative combined its data for analysis with data from the Wellcome Trust Case Control Consortium (WTCCC) and the Finland–United States Investigation of NIDDM (Non-Insulin-Dependent Diabetes Mellitus) genetics studies, yielding a pooled sample size of more than 32 000 subjects. The result was the confident identification of associations between several SNPs and T2D. Some findings were confirmatory; others were novel. Investigators modeled the sample sizes that would have been required, in retrospect, to discover associations between SNPs and T2D. They assumed 80% power, an α of 0.05, the observed pooled odds ratios (ORs), equal numbers of cases and control subjects, and a 10% prevalence of T2D. For the 8 SNPs that were modeled, the estimated sample sizes ranged from approximately 800 to 7900 (median estimate, 4250).
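The kind of retrospective sample size modeling described above can be approximated with a standard two-proportion formula applied to allele frequencies. The sketch below is not the method used by the Diabetes Genetics Initiative investigators, whose exact model is not described here. It assumes an allelic (per-chromosome) test, equal numbers of cases and control subjects, and independent alleles within individuals, and it ignores disease prevalence. The control allele frequency and OR in the example call are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def cases_needed(control_freq, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of control subjects)
    needed to detect an allelic OR with a two-sided test of two proportions.

    Simplifying assumptions: each subject contributes 2 independent alleles,
    and disease prevalence is not modeled.
    """
    if odds_ratio == 1:
        raise ValueError("an OR of 1 cannot be detected at any sample size")
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2)  # 2 alleles per subject


# Hypothetical inputs: control allele frequency 0.30, OR 1.15.
print(cases_needed(0.30, 1.15))
# Same inputs at a stricter, genomewide-style threshold (also an assumption).
print(cases_needed(0.30, 1.15, alpha=5e-8))
```

The required sample grows rapidly as the OR approaches 1 or as α is tightened, which is consistent with the wide range of estimates reported for the 8 modeled SNPs.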
The first foray into genomewide association studies in ischemic stroke was a case–control study of approximately 500 subjects.4 Cases were recruited in the 5-center prospective inception cohort study known as the Ischemic Stroke Genetics Study (ISGS).5 All had neurologist-diagnosed recent ischemic stroke. All cases were centrally adjudicated for subtype of stroke using the Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria. Control subjects were neurologically normal adult men and women who had been previously genotyped. After adjustment for conventional stroke risk factors, no SNP attained clear genomewide significance. After attempting to harvest low-hanging fruit, investigators returned with an empty bushel. Nonetheless, the data set has proved useful. In a focused analysis, the genomewide data set showed that the recently discovered cardiovascular risk factor locus on chromosome 9p21 also appeared to be a risk factor for ischemic stroke.6 The data set has been made publicly available through dbGaP to the extent allowable by the informed consent process (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap).
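For readers unfamiliar with how a single SNP is tested in a case–control design, the following Python sketch computes an unadjusted allelic OR and a 1-degree-of-freedom chi-square test from a 2×2 table of allele counts. It is a simplified illustration, not the ISGS analysis, which adjusted for conventional stroke risk factors. The counts are hypothetical, and the genomewide threshold mentioned in the comments is an assumption.

```python
from math import erfc, sqrt


def allelic_test(case_minor, case_major, control_minor, control_major):
    """Return (odds ratio, chi-square statistic, P value) for a 2x2 allele table.

    Uses the Pearson chi-square test with 1 degree of freedom and no
    continuity correction. All four counts must be positive.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    if min(a, b, c, d) <= 0:
        raise ValueError("all allele counts must be positive")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts for 250 cases and 250 control subjects
# (2 alleles per subject), not data from any study discussed here.
odds_ratio, chi2, p_value = allelic_test(220, 280, 200, 300)
print(odds_ratio, chi2, p_value)

# An assumed genomewide threshold; a modest OR in about 500 subjects
# falls far short of it.
print(p_value < 5e-8)
```

With counts of this size, an OR near 1.2 produces a P value nowhere near a genomewide threshold, which illustrates why a pilot study of roughly 500 subjects can detect only genes of major effect.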
Assuming that ischemic stroke is no less genetically heterogeneous than T2D and that the effects of individual SNPs in imparting stroke risk are no greater than those in T2D, future stroke genetics studies will need samples from several thousand patients. The Wellcome Trust is funding a multistage genomewide association study in ischemic stroke as a collaboration between the International Stroke Genetics Consortium and the WTCCC-2 under the direction of Dr Hugh Markus. An initial genomewide screen will be performed on 4189 stroke cases, which will be compared with an equal number of previously genotyped control subjects from the WTCCC and the German KORA-gen population-based biobank. A replication stage will be done using 4432 cases from 2 additional UK and European populations. A third and final replication will be performed in 2873 cases and 2743 control subjects from 4 US populations.
There is intense interest among US investigators in conducting a similarly large genomewide association study. In preparation, investigators have pooled data on individuals who have donated DNA. Samples are potentially available from a Phase 3 clinical trial, population- and hospital-based observational studies, and family-based studies. The Figure shows the age distribution of cases and control subjects among the potentially available samples. Along with identifying sources of genetic material to test, investigators have worked together to create a uniform set of core common data elements across diverse studies in the hope of expediting large collaborative efforts (Table). These core data elements are expected to be richly supplemented in future efforts with additional clinimetric and radiographic phenotypic data to the extent feasible.
I thank Hugh Markus, MA, BMBCh, DM, FRCP, for details regarding the planned WTCCC genomewide association study.
Sources of Funding
J.F.M. receives funding from an unrestricted grant from the National Institute of Neurological Disorders and Stroke for SWISS (the Siblings with Ischemic Stroke Study) (R01 NS39987).
Presented in part at the 26th Princeton Conference on Cerebrovascular Disease, Houston, Texas, March 28, 2008.
Received July 30, 2008.
Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research, Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, Hughes TE, Groop L, Altshuler D, Almgren P, Florez JC, Meyer J, Ardlie K, Bengtsson Boström K, Isomaa B, Lettre G, Lindblad U, Lyon HN, Melander O, Newton-Cheh C, Nilsson P, Orho-Melander M, Rastam L, Speliotes EK, Taskinen MR, Tuomi T, Guiducci C, Berglund A, Carlson J, Gianniny L, Hackett R, Hall L, Holmkvist J, Laurila E, Sjogren M, Sterner M, Surti A, Svensson M, Svensson M, Tewhey R, Blumenstiel B, Parkin M, Defelice M, Barry R, Brodeur W, Camarata J, Chia N, Fava M, Gibbons J, Handsaker B, Healy C, Nguyen K, Gates C, Sougnez C, Gage D, Nizzari M, Gabriel SB, Chirn GW, Ma Q, Parikh H, Richardson D, Ricke D, Purcell S. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007; 316: 1331–1336.
Matarin M, Brown WM, Scholz S, Simon-Sanchez J, Fung HC, Hernandez D, Gibbs JR, De Vrieze FW, Crews C, Britton A, Langefeld CD, Brott TG, Brown RD Jr, Worrall BB, Frankel M, Silliman S, Case LD, Singleton A, Hardy JA, Rich SS, Meschia JF. A genome-wide genotyping study in patients with ischaemic stroke: initial analysis and data release. Lancet Neurol. 2007; 6: 414–420.
Matarin M, Brown WM, Singleton A, Hardy JA, Meschia JF; ISGS Investigators. Whole genome analyses suggest ischemic stroke and heart disease share an association with polymorphisms on chromosome 9p21. Stroke. 2008; 39: 1586–1589.