EVS Variants Track Settings

Description

The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein-coding regions of the human genome across diverse, richly phenotyped populations. The project shares these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders. The current data release (ESP6500SI-V2-SSA137), available through the Exome Variant Server (EVS) website, is taken from 6,503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data.

Display Conventions

In "dense" mode, a vertical line is drawn at the position of each variant. In "pack" and "full" modes, in addition to the vertical line, a label to the left shows the reference allele first and the variant alleles below it (A = red, C = blue, G = green, T = magenta, indels = black). Hovering the pointer over any variant displays the number of occurrences of each allele in the Exome Sequencing Project's database. Clicking on any variant displays the full details of that variant, along with links to the ESP and dbSNP databases where available.
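The color convention above can be sketched as a small lookup. This is only an illustration of the legend, not the browser's actual rendering code; the names `ALLELE_COLORS` and `allele_color` are hypothetical, and treating every allele that is not a single A, C, G or T as an indel is an assumption.

```python
# Legend from the track description: one color per single-base allele.
ALLELE_COLORS = {"A": "red", "C": "blue", "G": "green", "T": "magenta"}


def allele_color(allele: str) -> str:
    """Return the display color for an allele label.

    Assumption: any allele that is not a single A/C/G/T base
    (multi-base strings, "-", etc.) is drawn as an indel in black.
    """
    return ALLELE_COLORS.get(allele.upper(), "black")


if __name__ == "__main__":
    for label in ["A", "c", "G", "T", "AT", "-"]:
        print(label, allele_color(label))
```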

Methods

Sequences were aligned to NCBI build 37 human genome reference using BWA. PCR duplicates were removed using Picard. Alignments were recalibrated using GATK. Lane-level indel realignments and base alignment quality (BAQ) adjustments were applied.

All data were simultaneously analyzed for exome SNP variants at the University of Michigan (by the Abecasis Laboratory). SNPs were called using a two-step approach. First, genotype likelihood files (GLFs) were generated using samtools pileup on individual BAM files. Next, glfMultiples, a multi-sample variant caller, was used to generate initial SNP calls. Details of the likelihood model implemented in glfMultiples are given in Li, et al., 2011 (in the section entitled "Identifying Potential Polymorphic Sites"). The Michigan SNP calling pipeline is available at: http://genome.sph.umich.edu/wiki/UMAKE. For male samples, this pipeline makes diploid calls in the pseudo-autosomal regions of the X chromosome and haploid calls for the rest of the chromosome. Female samples have diploid calls for all regions on the X chromosome. SNPs were filtered by a machine-learning technique called support vector machine (SVM) classification (for a detailed description, see Filter Status).

Small INDEL variants were analyzed at the Broad Institute (by the Genome Sequencing and Analysis group) using the GATK variation discovery pipeline, following the guidelines in the GATK best practices v4. More specifically, each BAM was reduced to create a Reduced BAM. INDELs were then discovered by analyzing all samples simultaneously with the GATK UnifiedGenotyper and subsequently filtered by the GATK Variant Quality Score Recalibration (VQSR) filtering model, again following the v4 best practices. The INDEL genotypes for the X and Y chromosomes were adjusted to be consistent with the samples' genders. Female samples have diploid calls for all regions on the X chromosome. Male samples have diploid calls for the pseudo-autosomal regions on the X chromosome and haploid calls for the rest of the X chromosome and for the Y chromosome. However, the INDEL calls for the ESP data are preliminary and, at this point, not as robust as the SNP calls. Users are advised to keep this difference in mind when applying the ESP data to research studies.
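The ploidy rules described for the sex chromosomes can be summarized in a short sketch. This is not the code used by either calling pipeline. The function name `expected_ploidy` is hypothetical, the pseudo-autosomal coordinates are the commonly quoted GRCh37 chrX boundaries and are an assumption (the text above does not give them), and returning 0 for the Y chromosome in female samples is likewise an assumption.

```python
# Assumed GRCh37 chrX pseudo-autosomal regions (1-based, inclusive).
# These boundaries are NOT stated in the track description.
PAR_X_GRCH37 = [(60001, 2699520), (154931044, 155260560)]


def in_par_x(pos: int) -> bool:
    """True if a chrX position falls inside an assumed pseudo-autosomal region."""
    return any(start <= pos <= end for start, end in PAR_X_GRCH37)


def expected_ploidy(sex: str, chrom: str, pos: int) -> int:
    """Return the expected number of alleles in a genotype call.

    sex   -- "female" or "male"
    chrom -- chromosome name, with or without a "chr" prefix
    pos   -- 1-based position on the chromosome
    """
    if sex not in ("female", "male"):
        raise ValueError(f"unknown sex: {sex!r}")
    name = chrom[3:] if chrom.startswith("chr") else chrom

    if name == "X":
        if sex == "female":
            return 2                         # diploid everywhere on X
        return 2 if in_par_x(pos) else 1     # male: diploid only in PAR
    if name == "Y":
        # Male: haploid on Y, per the description.
        # Female: no Y call expected (assumption, not stated in the text).
        return 1 if sex == "male" else 0
    return 2                                 # autosomes are diploid


if __name__ == "__main__":
    print(expected_ploidy("male", "chrX", 1000000))    # inside assumed PAR1
    print(expected_ploidy("male", "chrX", 50000000))   # outside the PARs
    print(expected_ploidy("female", "X", 50000000))
```

A check like this could be used to flag genotypes in a downloaded call set whose allele count disagrees with the sample's recorded sex.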

All SNPs and INDELs were further annotated by SeattleSeqAnnotation137. The variant annotations at the coding-DNA and protein levels mostly follow HGVS notation.

The SNP calls are included in the release of dbSNP build 138. The full dataset is described in Fu, et al., 2013, and a subset of the data (i.e., 2,500 exomes) was published by the ESP Population Genetics and Statistical Analysis Working Group in Tennessen, et al., 2012.
