(Stroke. 2001;32:1239.)
© 2001 American Heart Association, Inc.
Editorial
From the Department of Neurology, Northwestern University Medical School, Chicago, Illinois.
Correspondence to Mark J. Alberts, MD, Director, Stroke Program, Department of Neurology, Northwestern University Medical School, 710 N Lake Shore Dr, Room 1122, Chicago, IL 60611. E-mail m-alberts{at}northwestern.edu
Key Words: genetics, stroke
Two articles were published in February 2001 that will have a significant impact on our understanding of human development, the pathogenesis of many human diseases, and the discovery of new therapies for many disorders.1 2 These articles deal with the mapping of the human genome. Two different entities, one a publicly traded company (Celera) and the other the Human Genome Project (HGP, sponsored and funded by NIH), published somewhat different versions of the human genome. The HGP began in 1990 (although extensive sequencing of the human genome began in 1995) and cost approximately $3 billion, while the Celera effort began in 1998.1 2 The HGP involved multiple laboratories in the United States and abroad. The 2 projects produced maps that differ from each other in terms of completeness, order of some genetic markers, and the ability to search the database for specific DNA sequences. A comparison of some features of both projects is in Table 1.
The challenge of sequencing the 3 billion base pairs of the human genome required the development of unique tools and approaches. Celera constructed a facility capable of high-throughput sequencing at a rate of 175 000 reads per day and conducted sequencing 24 hours a day, 7 days a week. The HGP divided the sequencing task among several large laboratories with demonstrated expertise in large-scale DNA sequencing. The strategy employed by the HGP focused on subcloning the human genome into bacterial artificial chromosomes (BAC), which were then sequenced and properly arranged.1 Each BAC could hold an insert of 150 000 bases on average. Celera used a shotgun whole genome approach to sequencing, which involved generating many small, random fragments of DNA for sequencing.2 After the sequence was determined, advanced computational algorithms combined with publicly available mapping and sequence information were used to generate a final sequence map. For both projects, the genome was sequenced approximately 5 times to minimize errors and eliminate gaps in the final map.
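The scale of the task can be illustrated with a rough calculation. The sketch below takes the genome size, the approximately 5-fold coverage, and the Celera throughput quoted above, and combines them with an assumed read length of 500 bases. That read length is an illustrative assumption, not a figure from either project's report, so the result should be read as an order-of-magnitude estimate only.

```python
import math

GENOME_BP = 3_000_000_000   # genome size quoted in the text
COVERAGE = 5                # approximate fold coverage quoted in the text
READS_PER_DAY = 175_000     # Celera throughput quoted in the text
ASSUMED_READ_BP = 500       # ASSUMPTION: illustrative read length, not from the editorial


def reads_needed(genome_bp: int, coverage: int, read_bp: int) -> int:
    """Number of reads whose combined length equals coverage x genome size."""
    return math.ceil(genome_bp * coverage / read_bp)


def days_needed(n_reads: int, reads_per_day: int) -> int:
    """Whole days of round-the-clock sequencing needed to produce n_reads."""
    return math.ceil(n_reads / reads_per_day)


total_reads = reads_needed(GENOME_BP, COVERAGE, ASSUMED_READ_BP)
total_days = days_needed(total_reads, READS_PER_DAY)

print(f"Reads needed: {total_reads:,}")
print(f"Days at {READS_PER_DAY:,} reads/day: {total_days}")
```

Under these assumptions, the raw sequencing alone amounts to tens of millions of reads and several months of continuous operation, before any assembly is attempted.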
The completed map shows that only 1.1% of the genome codes for exons; 24% is intronic and 75% is intergenic DNA. The average human gene spans 27 000 to 29 000 bases of DNA and consists of 7 to 9 exons. The mean coding sequence for a human gene is 1340 base pairs. Between 20% and 40% of human genes have just 1 or 2 exons, while some have more than 20 exons. The gene with the most exons is titin, which has 234 exons. Genes are not evenly distributed throughout the human genome. Some chromosomes (17, 19, and 22) have a relative abundance of genes, while others (chromosomes 4, 13, 18, and X) have an apparent paucity of genes.
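These figures can be cross-checked against one another. The sketch below is a back-of-the-envelope calculation using only the numbers quoted above; it assumes, as a simplification, that all exonic DNA is coding sequence, which is not strictly true because exons also contain untranslated regions. The function name is hypothetical and chosen for illustration.

```python
GENOME_BP = 3_000_000_000   # genome size quoted in the text
EXON_FRACTION = 0.011       # 1.1% of the genome is exonic (from the text)
MEAN_CODING_BP = 1340       # mean coding sequence per gene (from the text)


def implied_gene_count(genome_bp: int, exon_fraction: float, mean_coding_bp: int) -> float:
    """Crude gene count: total exonic bases divided by mean coding length.

    SIMPLIFICATION: treats every exonic base as coding sequence.
    """
    return genome_bp * exon_fraction / mean_coding_bp


exonic_bp = GENOME_BP * EXON_FRACTION
implied_genes = implied_gene_count(GENOME_BP, EXON_FRACTION, MEAN_CODING_BP)

print(f"Exonic DNA: about {exonic_bp / 1e6:.0f} million bases")
print(f"Implied gene count: about {implied_genes:,.0f}")
```

The implied count is in the tens of thousands, the same order of magnitude as the gene counts reported by the two projects and far below the earlier estimate of 100 000 genes.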
Each map produced several unexpected findings. Prior studies had estimated that humans have approximately 100 000 unique genes, and it was expected that these genome mapping projects would identify essentially all of these genes. Such was not the case, however. Results from the genome mapping projects have shown that humans have between 26 500 and 39 000 genes, compared with about 18 000 genes in a worm and 13 000 genes in a fly. This has implications for understanding evolutionary and developmental genetics. It may indicate that gene function and control, rather than simply having more genes producing more proteins, explain the obvious differences between humans and other species.
The relatively small number of genes does not indicate a similarly small number of proteins. A single gene can undergo alternative splicing, thereby producing different proteins. Studies of messenger RNA have shown that there may be an average of 2.6 to 3.2 distinct transcripts per gene. Parts of separate genes can combine to form unique proteins. Proteins can undergo extensive posttranslational modifications that can change their function and activity. Studies comparing the human genome with other vertebrate genomes have shown an increase in genes related to neuronal function, developmental regulation, hemostasis, and immunologic function in the human genome.
A question sometimes asked is "Whose DNA was used for the sequencing?" The Celera group initially enrolled 21 donors, of whom 5 were selected for the DNA sequencing studies. The 5 individuals consisted of 3 females and 2 males: 2 Caucasians, 1 Asian-Chinese, 1 African-American, and 1 Hispanic. The DNA was obtained from whole blood in all cases, and additional DNA was obtained from semen samples. For the HGP, samples were obtained from a group of volunteers. However, all identifying information was removed from each sample; therefore, there is no information about the gender or ethnic/racial background of the donor samples sequenced by the HGP. For both the Celera project and the HGP, an Institutional Review Board reviewed the donation protocol, and in all cases the donors signed informed consent forms.
In addition to identifying all human genes, the mapping projects have also produced a detailed map of thousands of genetic markers spaced fairly evenly throughout the genome.2 3 There are various types of markers that have been mapped as part of these projects, the most abundant of which are the single nucleotide polymorphisms, also referred to as SNPs. The human genome sequencing projects have identified approximately 2.1 million SNPs.2 4 These markers are important for performing genetic linkage studies, which are one powerful method for determining exactly where on a chromosome a specific disease locus may reside. The availability of a map with numerous markers that are accurately mapped in terms of physical location provides researchers with a valuable tool for determining with high precision the location of a disease-causing gene. The researcher can then examine the genes that are close to the linked markers and determine whether one or more are good candidate genes for the disease under study. Since all of the genes have now been identified and mapped, the identification of appropriate candidate genes has been simplified greatly. This general approach has been used to identify several dozen pathogenic genes using the publicly available draft genome.
Another by-product of the genome sequencing effort is the identification of new targets for conventional drug therapy. Examples include the cloning of the β-secretase gene/protein that cleaves the amyloid precursor protein, 2 new dopamine receptors, a new serotonin receptor, 2 chains of the tubulin gene, and a glycine receptor.1 Using this knowledge to develop new medications that act on various cell receptors or extracellular proteins has been easier and more productive than direct gene therapy. One reason is that it is much more difficult to transport a specific gene into specific cells and have it act at the proper location in a complex genome, compared with "simply" blocking or activating a specific cell-surface receptor. Many of the most common and efficacious medications now in use act on cellular receptors or extracellular proteins, while few act on intracellular pathways or specific genes.
Identifying all of the human genes is but one step in understanding disease pathogenesis. We must now learn how these new proteins function (and malfunction) in novel and complex metabolic pathways. While the human genome sequence gives us a road map, much work remains in understanding where the road leads and what detours may exist. We are only beginning to understand the complex, multigenic disorders that are responsible for some of the most common maladies in society such as atherosclerosis, hypertension, diabetes, psychiatric disorders, and stroke.5 Correlating the thousands of individual genomic variations with some of these disorders will be extremely challenging due to the large number of variables within an individual and across entire populations.
In a separate but related project, a study by Stefansson and colleagues from deCode in Iceland reports linkage and cloning of a unique stroke gene. The gene, on chromosome 5, appears to be important for endothelial function, although more details have not yet been published.6 7 One surprising finding is that the gene appears to be related to all types of ischemic stroke but is not linked to other types of vascular disease such as myocardial infarction or peripheral arterial disease. Perhaps this reflects some unique aspect of the vessels in and around the brain. Another possibility is that due to some unique genetic features of the Icelandic population, the results may not be generally applicable for all stroke types or in all populations. Further studies will be needed to address these issues.
The identification of a putative "stroke gene" is quite different from many recent studies of stroke genetics. To date, many genetic studies of stroke have dealt with specific polymorphisms that are overrepresented or underrepresented in the disease state compared with controls.8 There have been dozens of studies of disease-related polymorphisms, yet few have any functional significance in terms of gene function and few have been linked to a disease locus using rigorous genetic linkage techniques. One exception is the apolipoprotein E gene, in which specific genotypes have been strongly associated with cerebral amyloid angiopathy, cerebral hemorrhage, and recovery of function after a stroke.9 10 11 In general, caution is urged in interpreting such polymorphism studies, since past experience has shown that in many cases they do not identify causative genes.
There are examples of specific mutations that cause hereditary forms of stroke. A classic example is the CADASIL syndrome, where specific mutations in the NOTCH3 gene cause the disease.12 Mutations in the KRIT1 gene cause some forms of familial cavernous hemangiomas.13 Moyamoya disease has recently been linked to several genetic loci, although a specific gene or mutation has not yet been identified.14 15 Research is ongoing to define specific disease-causing genes and mutations for other common disorders such as familial aneurysmal subarachnoid hemorrhage, fibromuscular dysplasia, and lupus anticoagulants.
In summary, sequencing the human genome along with the mapping and characterization of thousands of genes is a dramatic step toward unraveling the many genetic disorders that cause familial cerebrovascular disease. The cloning of a putative stroke gene that is responsible for cerebral infarction is another major advance in this area. Much work is still needed to discern how these mutated proteins lead to the disease phenotype and how to treat or prevent strokes in susceptible individuals. These developments may also make it possible to provide genetic counseling for at-risk individuals. Considering the rapidly changing knowledge in this area, neurologists and other healthcare professionals should keep abreast of new developments, since these may directly affect their clinical practices and their patients.
Footnotes
The opinions expressed in this editorial are not necessarily those of the editors or of the American Heart Association.
References
1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860-921.
2. Venter J, Adams M, Myers E, Li P, Mural R, Sutton G, et al. The sequence of the human genome. Science. 2001;291:1304-1351.
3. The International Human Genome Mapping Consortium. A physical map of the human genome. Nature. 2001;409:934-941.
4. The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001;409:928-933.
5. Peltonen L, McKusick V. Dissecting human disease in the postgenomic era. Science. 2001;291:1224-1229.
6. Gretarsdottir S, Sveinbjornsdottir S, Jonsson H, Jakobsson J, Einarsdottir E, Agnarsson U, et al. A major susceptibility locus for common stroke identified on chromosome 5q21. In press.
7. Stefansson K. Identification of a gene for common stroke. In: Program and abstracts of the 26th International Stroke Conference; February 14-16, 2001; Fort Lauderdale, Fla.
8. Hassan A, Markus H. Genetics and ischemic stroke. Brain. 2000;123:1784-1812.
9. McCarron MO, Nicoll JA. Apolipoprotein E genotype and cerebral amyloid angiopathy-related hemorrhage. Ann N Y Acad Sci. 2000;903:176-179.
10. O'Donnell HC, Rosand J, Knudsen KA, Furie KL, Segal AZ, Chiu RI, Ikeda D, Greenberg SM. Apolipoprotein E genotype and the risk of recurrent lobar intracerebral hemorrhage. N Engl J Med. 2000;342:240-245.
11. McCarron MO, Hoffmann KL, DeLong DM, Gray L, Saunders AM, Alberts MJ. Intracerebral hemorrhage outcome: apolipoprotein E genotype, hematoma, and edema volumes. Neurology. 1999;53:2176-2179.
12. Joutel A, Dodick DD, Parisi JE, Cecillon M, Tournier-Lasserve E, Bousser MG. De novo mutation in the Notch3 gene causing CADASIL. Ann Neurol. 2000;47:388-391.
13. Laberge-le Couteulx S, Jung H, Labauge P, Houtteville JP, Lescoat C, Cecillon M, Marechal E, Joutel A, Bach JF, Tournier-Lasserve E. Truncating mutations in CCM1, encoding KRIT1, cause hereditary cavernous angiomas. Nat Genet. 1999;23:189-193.
14. Yamauchi T, Tada M, Houkin K, Tanaka T, Nakamura Y, Kuroda S, Abe H, Inoue T, Ikezaki K, Matsushima T, Fukui M. Linkage of familial moyamoya disease (spontaneous occlusion of the circle of Willis) to chromosome 17q25. Stroke. 2000;31:930-935.
15. Inoue TK, Ikezaki K, Sasazuki T, Matsushima T, Fukui M. Linkage analysis of moyamoya disease on chromosome 6. J Child Neurol. 2000;15:179-182.