Recommendations From the International Stroke Genetics Consortium, Part 2
Biological Sample Collection and Storage
The revolution in human genetics, catalyzed by the sequencing of the human genome in 2003 and the development of genome-wide genotyping technologies, has led to the identification of >2000 trait-associated genetic variants. Because most of these variants individually have small effects on disease risk, successful gene discovery efforts have required large sample sizes (thousands, tens of thousands, or hundreds of thousands of cases and controls) to achieve sufficient study power. Amassing such sample sizes has depended on international collaboration on a scale never before seen in human genetics or even in clinical research. Disease-specific consortia bringing together many individual sites and collaborators have now evolved for many major diseases. Each consortium has faced ≥2 fundamental questions: how to assemble a study sample of sufficient size, homogeneity, and phenotypic quality, and how to retain and analyze biological samples from enrolled subjects, sometimes repeatedly over several years.
The International Stroke Genetics Consortium Biorepository
The International Stroke Genetics Consortium (ISGC) has addressed these 2 challenges by taking into account the nature of the disease: stroke occurs suddenly, carries high case fatality (≤251 per 100 000; 5.9 million of 53 million reported deaths in 2010),1,2 and develops through multiple biological mechanisms. The stroke community consists of clinicians and investigators from across the world who practice in a range of environments with varied resources and constraints. The ISGC has therefore developed processes that allow maximum flexibility while maintaining standardization, reliability, and quality. A central tenet of the consortium is transparency and trust, without which large-scale collections and studies would not be possible. The growing collection of stroke samples created at the Massachusetts General Hospital under the auspices of the ISGC includes physical samples from ≈5000 cases and 4500 controls. These come from >20 institutions in Europe, North America, South America, Australia, and Asia. The collection provides a state-of-the-science biorepository that ensures reliable stroke clinical data and samples are available for future research.
Here, we outline the processes and infrastructure necessary for individual sites to collect and contribute samples to large-scale genetic efforts. Throughout this article, a central site refers to the coordinating site that houses the samples and data contained within a biorepository, whereas a contributing site is defined as a site that enrolls research subjects, collects samples and data, and sends samples to a central biorepository. Our companion article describes standardized phenotypic data collection, population selection, demographic information, stroke subtyping, neuroimaging standards, outcome definitions, and pertinent ethical considerations.2a
Several approaches are available to establish a biorepository for large-scale collaborative genetic studies, each with specific advantages and challenges. A genetic biorepository includes physical samples, genotyping data, or both, for a large set of individuals with a phenotype of interest, along with control subjects.
Electronic Medical Record Data With Associated Sample Collections
This approach is cost-effective because it uses data already collected for routine purposes (eg, billing) and discarded clinical samples for genetic analysis. Challenges include local ethical and institutional limitations. The approach also needs a high-quality electronic medical record system and carefully constructed algorithms to characterize phenotypes of interest accurately. Primary examples of this type of collection are the Vanderbilt University–led BioVU3 and the Electronic Medical Records and Genomics (eMERGE) network.4 eMERGE was organized and funded by the National Human Genome Research Institute to combine DNA biobanking with electronic medical record–based phenotyping.
Assembly of a Biorepository From Existing Research Collections
This approach involves centrally assembling samples that were collected elsewhere before the collaboration began. Its primary advantage is low cost. It takes advantage of existing collections that were not originally assembled expressly for genetic studies but contain samples from which DNA can be extracted. Challenges include the inability to modify sample or data collection and handling at local sites. Ethical considerations also arise when data and samples were collected some time ago and the new uses differ from those originally approved by local ethics committees. Studies that have used this strategy include the ISGC-Wellcome Trust Case Control Consortium 2 Genome-Wide Association Study,5 the National Institute of Neurological Disorders and Stroke Ischemic Stroke Genetics Network,6 and the ISGC's genome-wide association study of intracerebral hemorrhage.7
Centrally Coordinated Prospective Study
Advantages of this approach include central control over the entire process, resulting in the least possible heterogeneity in quality control, data, and samples. Its major challenge is high cost. Nested case–control studies are an appealing strategy for keeping the advantages of a cohort design while reducing costs. In this design, 1 or several outcomes are ascertained over time in the complete cohort. Exposure (here, genotype) is ascertained only in each case and in a given number of controls drawn from cohort members still at risk when that case occurs, which minimizes genotyping costs. Another drawback of this approach is the time needed for incident cases to accumulate. For example, with 500 000 participants, the UK Biobank8 had 7000 prevalent strokes at baseline, presently has 3000 participants with stroke, and expects 20 000 stroke cases by 2027.
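The control-sampling step of a nested case–control design can be sketched as risk-set sampling. In this minimal illustration, each case is matched to a fixed number of controls drawn at random from participants still under follow-up and stroke-free at the case's event time, and only those individuals are flagged for genotyping. The `Participant` fields, the number of controls per case, and the function names are hypothetical choices for this sketch, not a procedure described by the ISGC or the UK Biobank.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Participant:
    pid: str
    event_time: Optional[float]  # time of incident stroke; None if no event during follow-up
    exit_time: float             # end of follow-up (event, death, or censoring)


def nested_case_control(cohort: List[Participant], controls_per_case: int = 2,
                        seed: int = 0) -> List[Tuple[str, List[str]]]:
    """Risk-set sampling: each case is matched to controls still at risk at its event time."""
    rng = random.Random(seed)
    cases = sorted((p for p in cohort if p.event_time is not None),
                   key=lambda p: p.event_time)
    matched_sets = []
    for case in cases:
        t = case.event_time
        # Still under follow-up and stroke-free at time t (people who become cases later are eligible).
        risk_set = [p for p in cohort
                    if p.pid != case.pid
                    and p.exit_time >= t
                    and (p.event_time is None or p.event_time > t)]
        chosen = rng.sample(risk_set, min(controls_per_case, len(risk_set)))
        matched_sets.append((case.pid, [c.pid for c in chosen]))
    return matched_sets


def ids_to_genotype(matched_sets: List[Tuple[str, List[str]]]) -> Set[str]:
    """Only cases and their sampled controls need to be genotyped."""
    ids: Set[str] = set()
    for case_id, control_ids in matched_sets:
        ids.add(case_id)
        ids.update(control_ids)
    return ids
```

Participants who leave follow-up before any case occurs are never sampled. Genotyping is therefore limited to the union of the matched sets rather than the whole cohort.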
Retaining genotyping data allows subjects to be included in future analyses. In silico data are, thus, often transferred from a repository for multiple different collaborative studies, provided each study falls within the scope of the permissions granted for use of the samples. In silico sharing of genetic data can involve either summary statistics or individual-level genotype data. Sharing of summary statistics is easier, and approval from human studies committees is generally faster to obtain than approvals required for the sharing of individual-level data. However, the precision and sophistication of analyses that can be completed using summary statistics are limited compared with what can be achieved with pooled individual-level data. The ISGC’s METASTROKE collaboration9–11 has used pooled summary statistics to great effect. Through the sharing of individual-level data, however, the Psychiatric Genomics Consortium12,13 has made far more substantial advances.
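As an illustration of what pooling summary statistics involves, the sketch below uses inverse-variance weighted fixed-effect meta-analysis. This is one common way of combining per-study effect estimates and their standard errors for a single variant. It is offered as a generic example and is not a description of the specific methods used by METASTROKE. The function name and inputs are hypothetical. Analyses that need individual-level data, such as joint modeling of covariates across studies, cannot be reproduced from these inputs alone.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis of per-study effect estimates."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]     # each study is weighted by 1/SE^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided P value under a standard normal
    return beta, se, p
```

For example, 2 studies reporting log odds ratios of 0.2 and 0.0, each with a standard error of 0.1, pool to an estimate of 0.1 with a standard error of about 0.071.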
Essential Biorepository Components
Study Protocol and Regulatory Approvals
Before participating in a central biorepository, contributing sites must confirm that they have obtained the regulatory approvals required by their home institutions. They must also confirm that they can provide the documentation required by the central site. Requirements vary widely by country and by institution. Many regulatory approvals are relevant to a biorepository protocol. The most important is approval for sharing samples and data with outside institutions, specifically with the central site of the biorepository.
If the contributing site’s consent form does not clearly address sample and data sharing, documentation of a regulatory amendment granting approval may need to be sent to the central site. Because sharing among all biorepository contributors will allow for greater scientific discovery, the contributing site should also consider amending its protocol to allow samples and data to be sent to and analyzed by any member of a given consortium.
The above recommendations are based on the ISGC experience at Massachusetts General Hospital, but other institutions report different experiences when seeking approval for sample sharing (Table). Although each site may differ, certain requirements are common.

First, some type of approval must be granted before transfer. This may be either review of the study protocol or research proposal by the contributing institution's ethics committee, or completion of a material transfer agreement or an institution-specific transfer certificate.

Second, shared samples and data must often be deidentified. Less widespread requirements include registration of transferred samples with the Department of Health (Taiwan) and reconsent of subjects when the research proposal differs from the original consent (Utah).

Finally, sites funded by the National Institutes of Health (NIH) must provide a certification. It must state that submitting all NIH-funded data to the NIH database of Genotypes and Phenotypes (dbGaP) repository, and sharing those data for research purposes, is consistent with the informed consent of the study participants from whom the data were obtained. Sites anticipating NIH funding must complete a verification letter before sending any samples or data to the biorepository. The letter must be signed by both the regulatory committee chair and the principal investigator and certified by the responsible Institutional Official. An overview of current NIH policies on data sharing in genome-wide association studies, including links to a draft NIH genomic data sharing policy, can be found on the NIH Web site.14,15 An overview of dbGaP policies can be found on the dbGaP Web site.16
Because of the variety of required approvals, we recommend that any site expecting to share samples outside its home institution include language on sample sharing in its regulatory protocols and consent forms as early in the study as possible. In addition, study investigators should review institution-specific requirements to ensure that these approvals are met. An ongoing dialogue between contributing sites and the central biorepository is important because requirements may change over time and protocol or sample use modifications may become necessary.17 Sample informed consent and protocol summary forms are provided (Figures I and II in the online-only Data Supplement).
Uniform Phenotype Data Collection
When enrolling subjects or collecting samples for analysis, it is important to collect the items in an agreed minimum phenotype data set, coded in a standardized format. Samples from different institutions can then be used together within a biorepository. A standardized minimum data set, with optional additional layers of detail, avoids haphazard, nonuniform collection. To avoid the need for interpretation and data cleaning at the central biorepository, sites should always refer to a common variable and coding definition sheet when reporting or sharing data. ISGC recommendations for accurate, clear, and uniform phenotyping protocols are described in the companion ISGC article.2a
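Referring to a common variable and coding definition sheet can be sketched as a shared codebook plus per-site translation tables. Each incoming record is mapped to the standard codes, and values that do not fit are reported back to the contributing site rather than silently cleaned at the central site. The variable names, codes, and the `LUN` translation table below are hypothetical examples, not the ISGC minimum data set.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical shared codebook: standard variable name -> allowed standard codes.
CODEBOOK: Dict[str, set] = {
    "sex": {"M", "F"},
    "hypertension": {0, 1, 9},   # illustrative coding: 0 = no, 1 = yes, 9 = unknown
}

# Hypothetical per-site translations from local values to the standard codes.
SITE_MAPS: Dict[str, Dict[str, Dict[Any, Any]]] = {
    "LUN": {"sex": {"male": "M", "female": "F"},
            "hypertension": {"yes": 1, "no": 0, "unknown": 9}},
}


def harmonize(site: str, record: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Map one site's record onto the shared codebook and report values that do not fit."""
    mapping = SITE_MAPS.get(site, {})
    coded: Dict[str, Any] = {}
    problems: List[str] = []
    for var, allowed in CODEBOOK.items():
        raw = record.get(var)
        value = mapping.get(var, {}).get(raw, raw)   # untranslated values pass through unchanged
        if value in allowed:
            coded[var] = value
        else:
            problems.append(f"{var}: {raw!r} is not a valid code")
    return coded, problems
```

A site that already reports in the standard codes needs no translation table. A missing variable is reported as a problem instead of being guessed.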
Standardized Sample Collection and Processing
We recommend that all sites follow a standardized sample collection and processing procedure. This helps ensure that collected samples are of high quality and useful in future experiments. Samples can be shipped as either whole blood or extracted DNA.
The DNA concentration of each sample may vary based on several factors, including time to initial processing, white blood cell concentration, and volume of the blood draw. A reliable value for the DNA concentration of a sample is needed to assess its suitability for inclusion in genetic studies. Contributing sites with the resources to extract DNA should agree on a standard procedure with the central site. The procedure should cover the specific extraction kit, desired sample concentration and volume, quality control measures to be conducted by the contributing and central sites, the frequency of shipments, and the format for sample shipment.
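The arithmetic behind a concentration-and-volume check is simple: total DNA yield (ng) equals concentration (ng/μL) multiplied by volume (μL). The sketch below accepts a sample only if both its concentration and its total yield meet minimum values. The threshold numbers are placeholders invented for illustration. Real cutoffs would be agreed between the contributing and central sites and would depend on the downstream genotyping platform.

```python
# Hypothetical acceptance thresholds; real values would be agreed between the
# contributing and central sites and depend on the downstream genotyping platform.
MIN_CONC_NG_PER_UL = 20.0
MIN_TOTAL_NG = 1000.0


def total_dna_ng(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA yield (ng) = concentration (ng/μL) x volume (μL)."""
    return conc_ng_per_ul * volume_ul


def passes_qc(conc_ng_per_ul: float, volume_ul: float,
              min_conc: float = MIN_CONC_NG_PER_UL,
              min_total: float = MIN_TOTAL_NG) -> bool:
    """A sample qualifies only if it is concentrated enough and holds enough total DNA."""
    return conc_ng_per_ul >= min_conc and total_dna_ng(conc_ng_per_ul, volume_ul) >= min_total
```

With these placeholder thresholds, a 50 ng/μL sample in 100 μL holds 5000 ng and passes. A 25 ng/μL sample in 30 μL holds only 750 ng and fails on total yield despite an adequate concentration.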
Implementation of protocols for sample collection and processing has helped facilitate a streamlined workflow between contributing and central sites for the ISGC, while increasing efficiency as the size of the biorepository has expanded. Protocols describe DNA extraction, DNA quantification, DNA storage and sample organization, database and bioinformatic recording of sample-associated information, receipt of samples and data from contributing sites, and shipment of requested samples and data to collaborators (Figure III in the online-only Data Supplement).
After contributing sites have evaluated available resources and agreed with the central site on methods for sample and data collection and shipment, they can enroll subjects and send samples and data to the biorepository. If a contributing site chooses to send unprocessed whole blood samples for DNA extraction at the central site, the samples should be shipped as soon as possible, ideally on the day they are collected, so that they can be processed quickly. When ready to send whole blood samples, the contributing site should notify the central site, stating how many samples will be sent and providing the affiliated data. The central site confirms, and the contributing site then packs and sends the samples and provides the tracking information. The central site confirms receipt of the samples when they arrive and addresses any issues at that time. Sites can also collect and freeze whole blood samples for batch shipment to the central site. This is not ideal, however, because each freeze–thaw cycle reduces the maximum achievable DNA concentration for each sample.
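The notification sequence for a whole blood shipment can be modeled as a small state machine, so that a shipment cannot be marked as received before it was confirmed and sent. In this sketch, the stages are notified, confirmed, shipped (with tracking information), and received. The state names, the `Shipment` class, and the rule that tracking information is mandatory at the shipping step are hypothetical choices, not part of the ISGC protocol.

```python
from enum import Enum
from typing import List, Optional


class ShipmentState(Enum):
    NOTIFIED = "notified"    # contributing site announced sample count and sent affiliated data
    CONFIRMED = "confirmed"  # central site agreed to receive the shipment
    SHIPPED = "shipped"      # samples packed and sent; tracking information provided
    RECEIVED = "received"    # central site confirmed receipt


# Each state may only advance to the next step of the protocol.
ALLOWED = {
    ShipmentState.NOTIFIED: {ShipmentState.CONFIRMED},
    ShipmentState.CONFIRMED: {ShipmentState.SHIPPED},
    ShipmentState.SHIPPED: {ShipmentState.RECEIVED},
    ShipmentState.RECEIVED: set(),
}


class Shipment:
    def __init__(self, site: str, n_samples: int) -> None:
        self.site = site
        self.n_samples = n_samples
        self.state = ShipmentState.NOTIFIED
        self.tracking: Optional[str] = None
        self.history: List[ShipmentState] = [ShipmentState.NOTIFIED]

    def advance(self, new_state: ShipmentState, tracking: Optional[str] = None) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        if new_state is ShipmentState.SHIPPED:
            if not tracking:
                raise ValueError("tracking information is required when shipping")
            self.tracking = tracking
        self.state = new_state
        self.history.append(new_state)
```

Keeping the history alongside the current state gives the central site a record of each step for later auditing.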
Contributing sites that choose to send extracted DNA samples are encouraged to agree with the central site on a shipment schedule (eg, biannual, quarterly). They should also agree on desired sample parameters (eg, concentration, volume, tube type, and labeling format).

Before shipment, samples should be transferred into the specified containers at the agreed-on specifications and quantified. Each sample's plate location should be recorded along with its concentration, volume, and phenotypic information. The contributing site may then notify the central site of the shipment and provide the completed plate layout for review. Accurate sample labeling is important at this stage to ensure correct linkage between samples and associated phenotypic information.

When the central and contributing sites agree that all necessary samples and information are present, the contributing site packs the samples according to the International Air Transport Association dangerous goods regulations.18 Enough dry ice should be used to keep the samples frozen until arrival. Some shipments may be subject to specific customs regulations and declarations, as well as export or import control laws.

Once received by the central site, samples are quantified and stored until needed. All relevant data are kept by the central site in a specifically designed, secure database. After samples have been genotyped and their data analyzed, genetic data may be returned to the contributing site for internal use (Figure). These considerations are presented in the context of DNA sample processing and handling. The same broad principles and procedures extend, with modifications as needed, to any biological sample type.
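Recording each sample's plate location together with its concentration and volume can be sketched for a standard 96-well plate (rows A to H, columns 1 to 12). This sketch assumes samples are filled column by column and rejects duplicate sample IDs, since a duplicate would break the link between samples and phenotypic information. The fill order, the well-name format, and the record fields are assumptions for illustration. The actual layout is whatever the contributing and central sites agree on.

```python
from string import ascii_uppercase
from typing import Dict, List, Tuple

ROWS = ascii_uppercase[:8]   # A-H
COLS = range(1, 13)          # 1-12


def well_names(column_major: bool = True) -> List[str]:
    """Well names for a 96-well plate, e.g. A01, B01, ... (column-major by default)."""
    if column_major:
        return [f"{r}{c:02d}" for c in COLS for r in ROWS]
    return [f"{r}{c:02d}" for r in ROWS for c in COLS]


def plate_layout(samples: List[Tuple[str, float, float]], plate_id: str) -> List[Dict[str, object]]:
    """samples: (sample_id, concentration in ng/μL, volume in μL) tuples, in plating order."""
    wells = well_names()
    if len(samples) > len(wells):
        raise ValueError("more samples than wells on one plate")
    ids = [sid for sid, _, _ in samples]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate sample IDs on plate")
    return [
        {"plate": plate_id, "well": well, "sample_id": sid,
         "conc_ng_per_ul": conc, "volume_ul": vol}
        for well, (sid, conc, vol) in zip(wells, samples)
    ]
```

The resulting rows can be exported as the completed plate layout that the contributing site sends for review before shipping.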
Database and Bioinformatics
After sample collection methods are determined, secure storage of both phenotypic and sample information must be arranged. The US Health Insurance Portability and Accountability Act restricts the sharing of protected health information between the contributing site and the central site, so all samples and data should be deidentified before shipment. Deidentification requires that data be stripped of common identifiers, but a link between the deidentified data set and the fully identified data set may be retained at the home institution. Contributing sites are encouraged to keep this link at their institution because further phenotypic data may be requested in the future. This link should never be stored at the repository. Off-site backup storage may also be desirable as an additional layer of security.
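The split described above can be sketched as follows. Direct identifiers are removed from each record before sharing, a study ID is assigned, and the study-ID-to-record link is returned separately so that it stays at the contributing site and is never shipped. The identifier field names and the `mrn` key are hypothetical. The listed fields are only an illustrative subset: the identifiers that must actually be removed depend on the applicable regulations and the site's ethics committee.

```python
from typing import Any, Dict, List, Tuple

# Illustrative subset of direct identifiers; the full list depends on the applicable
# regulations and the contributing site's ethics committee.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth", "address", "phone"}


def deidentify(records: List[Dict[str, Any]], site_prefix: str, start: int = 1
               ) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """Split records into a shareable, deidentified set and a link table kept locally.

    `start` lets numbering continue across enrollment batches so study IDs never repeat.
    Each record is assumed to carry a local medical record number under the key "mrn".
    """
    shareable: List[Dict[str, Any]] = []
    link: Dict[str, str] = {}   # study ID -> medical record number; stays at the home institution
    for i, rec in enumerate(records, start=start):
        study_id = f"{site_prefix}_{i:04d}"
        link[study_id] = rec["mrn"]
        clean = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        clean["study_id"] = study_id
        shareable.append(clean)
    return shareable, link
```

Only the first return value is sent to the central site. The link table is what allows the contributing site to answer later requests for additional phenotypic data.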
When organizing biological sample data, we have found the following categories useful: collaborating site name, individual ID, alternative site ID, aliquot ID, sample type, concentration (ng/μL), volume (μL), container type, freezer location, sample degradation status, shipment history, and date received. Storing sample data can raise several complications. These include linking multiple sample aliquots to 1 individual, recording multiple sample types (DNA, plasma, etc.), ensuring that no aliquot ID is duplicated, and keeping concentrations, volumes, and shipping histories up to date. To address these complications, the central site may use a standard naming convention and storage system.
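The categories listed above map naturally onto a record type, and the complications map onto a few registry rules. Aliquot IDs must be unique, all aliquots of one individual must be retrievable together, and volumes and shipment histories must be updated when material leaves the freezer. The in-memory sketch below illustrates those rules. The class and field names are hypothetical. Degradation status is simplified to a yes/no flag, and a production biorepository would use a secure database rather than a Python dictionary.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Aliquot:
    site: str
    individual_id: str
    alt_site_id: str              # contributing site's own deidentified ID
    aliquot_id: str
    sample_type: str              # e.g. "DNA", "plasma"
    conc_ng_per_ul: float
    volume_ul: float
    container: str
    freezer_location: str
    degraded: bool                # simplified degradation status
    date_received: date
    shipment_history: List[str] = field(default_factory=list)


class Registry:
    """In-memory sketch of a sample registry keyed by aliquot ID."""

    def __init__(self) -> None:
        self._aliquots: Dict[str, Aliquot] = {}

    def add(self, aliquot: Aliquot) -> None:
        if aliquot.aliquot_id in self._aliquots:
            raise ValueError(f"duplicate aliquot ID: {aliquot.aliquot_id}")
        self._aliquots[aliquot.aliquot_id] = aliquot

    def aliquots_for(self, individual_id: str) -> List[Aliquot]:
        """All aliquots (of any sample type) linked to one individual."""
        return [a for a in self._aliquots.values() if a.individual_id == individual_id]

    def record_shipment(self, aliquot_id: str, volume_ul: float, destination: str) -> None:
        """Deduct the shipped volume and append the destination to the shipment history."""
        a = self._aliquots[aliquot_id]
        if volume_ul > a.volume_ul:
            raise ValueError("requested volume exceeds remaining volume")
        a.volume_ul -= volume_ul
        a.shipment_history.append(destination)
```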
The last, but perhaps most important, aspect of databases and bioinformatics is sample coding. All individual IDs and sample aliquots should be coded uniformly, with no possibility of duplication. The ISGC biorepository has handled this by adding prefixes and suffixes to sample names, using the first 3 letters of the contributing site's name as a prefix. For example, BOS_0001 denotes an individual from Boston, whereas LUN_0001 denotes an individual from Lund, Sweden. With this prefix system, multiple institutions can contribute samples with the same ID numbers without causing duplication in the database.

However, a single individual ID may have multiple aliquot IDs. To address this, a suffix distinguishes the aliquots and separates sample types. In the suffix system used by the ISGC biorepository, a letter identifies the sample type (D=DNA, P=plasma, C=cerebrospinal fluid, S=serum, and R=RNA), and a number identifies the sample aliquot. For example, if individual BOS_0001 has 3 DNA sample aliquots, their IDs will be BOS_0001_D1, BOS_0001_D2, and BOS_0001_D3.

The central site should code each site's samples in a uniform yet unique manner. The contributing site's original deidentified ID should also be stored in an alternative ID field. This ensures a permanent link between the sample information stored at the central site and the sample information at the contributing site, should a need to relink them arise.
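The prefix-and-suffix convention can be expressed as a pair of generator functions and a validating parser. The sample-type letters below are those listed above. The 4-digit zero-padded individual number is inferred from the BOS_0001 examples and is an assumption, as are the function names.

```python
import re
from typing import List, Tuple

# Sample-type letters used in the ISGC suffix system.
SAMPLE_TYPE_CODES = {"DNA": "D", "plasma": "P", "cerebrospinal fluid": "C",
                     "serum": "S", "RNA": "R"}

# PREFIX_NNNN_<type letter><aliquot number>; 4-digit padding is assumed from the examples.
ALIQUOT_PATTERN = re.compile(r"^([A-Z]{3})_(\d{4,})_([DPCSR])(\d+)$")


def individual_id(site_prefix: str, number: int) -> str:
    """Individual ID such as BOS_0001: 3-letter site prefix plus a zero-padded number."""
    return f"{site_prefix.upper()}_{number:04d}"


def aliquot_ids(ind_id: str, sample_type: str, n: int) -> List[str]:
    """IDs for n aliquots of one sample type, e.g. BOS_0001_D1 ... BOS_0001_Dn."""
    code = SAMPLE_TYPE_CODES[sample_type]
    return [f"{ind_id}_{code}{k}" for k in range(1, n + 1)]


def parse_aliquot_id(aid: str) -> Tuple[str, int, str, int]:
    """Split an aliquot ID into (site prefix, individual number, type letter, aliquot number)."""
    m = ALIQUOT_PATTERN.match(aid)
    if m is None:
        raise ValueError(f"malformed aliquot ID: {aid}")
    prefix, num, code, k = m.groups()
    return prefix, int(num), code, int(k)
```

For example, `aliquot_ids(individual_id("BOS", 1), "DNA", 3)` reproduces BOS_0001_D1, BOS_0001_D2, and BOS_0001_D3. Running every incoming ID through `parse_aliquot_id` catches malformed labels before they enter the database.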
Laboratory resources are essential for the success of any biorepository. Before determining what equipment is needed, the central site must determine what services it aims to provide. As a basic example, storing samples requires only a −80°C freezer and a freezer alarm system with emergency backup power. Sample processing and aliquoting, by contrast, may require robotics and various other laboratory tools. The central site should therefore address the following questions with its collaborating sites at the outset, before accepting samples:
1. How many samples will be sent, and what are their storage requirements?
   a. What types of samples are being stored? Must they be stored in specific container types? Do they require storage at different temperatures?
2. Prestorage: Will any manipulation be done to the samples?
   a. This may include DNA extraction, sample aliquoting, DNA degradation assessment, or sample dilution.
3. Poststorage: Will the samples be manipulated or altered before they are distributed?
   a. The receiving site may have specific sample requirements, including particular storage containers, plate layouts, and minimum or maximum concentrations or volumes. If the central site plans to complete projects in-house, it should also acquire the necessary equipment, along with laboratory space to house it.
Once these items have been addressed, the central site should acquire the resources needed for its biorepository. The ISGC biorepository based at Massachusetts General Hospital currently has the following tools to support the ISGC samples and ongoing genotyping projects:
- two −80°C freezers
- 2 separate alarm notification systems
- departmental backup freezer support
- small and large bench centrifuges
- automated 8-armed and multichannel robotic liquid handling systems
- DNA extraction kits
- multichannel and single-channel pipettes
- degradation gel cassettes, bases, and imager
- access to a department stock room that carries a variety of disposable equipment (plates, pipette tips, gloves, etc.)
Of great importance to this equipment's operation is the availability of the central site's staff. Acting as the administrative and laboratory support for a biorepository requires substantial time and effort from personnel. The central site should assess how much time it will take to run a biorepository successfully and whether trained staff are available to fulfill these responsibilities. For large-scale repositories, robotic automated sample handling may be an alternative to manual sample preparation. Despite the initial expense, it may improve efficiency, reduce costs, and reduce human error in the long term.
The final, perhaps most important, resource in a successful biorepository is a trusting, transparent relationship among collaborators. The whole process of study design, sample collection, linkage to phenotypic data, genetic analysis, and result dissemination cannot be achieved without a willing and able network of coinvestigators who strive toward a common goal. The ISGC as a whole has worked to foster a sense of camaraderie throughout the group, allowing for transparency and trust to form the basis of its operations.
We have detailed the considerations necessary for investigators seeking to participate in a large research biorepository. Current and future ISGC members have the option to retain samples locally or send them to a larger central repository. We encourage all investigators with an interest in enrolling research subjects for stroke genetic studies to follow the procedures we have recommended. We hope the ISGC’s experience is useful for enhancing collaborative studies across a range of disease areas.
Sources of Funding
This work was funded by National Institutes of Health/National Institute of Neurological Disorders and Stroke R01 NS059727 and U01NS069208 (J. Rosand). R. Lemmens is a senior clinical investigator of Fonds Wetenschappelijk Onderzoek Flanders. A. Lindgren received funding from Lund University, Region Skåne, Swedish Heart and Lung Foundation, the Freemasons Lodge of Instruction Eos in Lund, King Gustav V and Queen Victoria's Foundation, and the Swedish Stroke Association.
Drs Majersik and Kittner were supported by research grants, National Institutes of Health. Dr Anderson was supported by research grants, National Institutes of Health/National Institute of Neurological Disorders and Stroke, American Brain Foundation, Massachusetts General Hospital Institute for Heart, Vascular, and Stroke Care. Dr Fernandez-Cadenas was supported by the Miguel Servet program from the Spanish Ministry of Health, Carlos III Institute (CP12/03298). Dr Tatlisumak was supported by research grants, Helsinki University Central Hospital Research Funds and the Sigrid Juselius Foundation. Dr Rosand was supported by research grants, National Institutes of Health; consultant, Boehringer Ingelheim. The other authors report no conflicts.
The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.
Guest Editor for this article was Jeffrey L. Saver, MD.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.114.006851/-/DC1.
- Received July 22, 2014.
- Revision received July 22, 2014.
- Accepted September 4, 2014.
- © 2014 American Heart Association, Inc.
- Majersik JJ, Cole JW, Golledge J, Rost NS, Chan Y-FY, Gurol ME, et al.
- 3. Vanderbilt University Medical Center. BioVU: Vanderbilt DNA Databank. http://www.vanderbilt.edu/oor/cores/biovu-vanderbilt-dna-databank/. Accessed January 16, 2014.
- 4. Vanderbilt University Medical Center. The eMERGE Network: electronic medical records & genomics. http://emerge.mc.vanderbilt.edu/. Accessed January 16, 2014.
- Bellenguez C, Bevan S, Gschwendtner A, Spencer CC, et al.
- Meschia JF, Arnett DK, Ay H, Brown RD Jr, Benavente OR, Cole JW, et al.
- 8. UK Biobank. About UK Biobank. http://www.ukbiobank.ac.uk/about-biobank-uk/. Accessed January 16, 2014.
- 14. National Institutes of Health. Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS). http://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html. Accessed January 16, 2014.
- 15. National Institutes of Health. Genomic Data sharing (GDS): NIH Genomic Data Sharing Policy. http://gds.nih.gov/03policy2.html. Accessed September 16, 2014.
- 16. National Institutes of Health. dbGaP Overview. http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/about.cgi. Accessed January 16, 2014.
- Henderson GE, Edwards TP, Cadigan RJ, Davis AM, Zimmer C, Conlon I, et al.
- 18. International Air Transport Association. IATA Dangerous Goods Regulations (DGR). 55th ed. Geneva, Switzerland: International Air Transport Association; 2014.