Description
This track shows 84.7 million single nucleotide polymorphisms (SNPs),
3.6 million short insertions/deletions (indels), and 60,000 structural variants
discovered by the 1000 Genomes Project through its Phase 3 sequencing of
2,504 genomes from 26 populations worldwide.
The variant genotypes have been phased by the 1000 Genomes Project
(i.e., the two alleles of each diploid genotype have been assigned to two
haplotypes, one inherited from each parent).
This extra information enables a clustering of independent haplotypes
by local similarity for display.
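As a minimal sketch of what phasing makes possible, the Python below splits one sample's phased genotypes into two haplotype allele lists. It assumes VCF-style genotype strings such as `0|1`, where `|` marks a phased call, 0 is the reference allele, and 1 or higher is an alternate allele. The function name `split_phased_genotypes` and the sample data are hypothetical, and missing or haploid calls are not handled.

```python
def split_phased_genotypes(genotypes):
    """Split one sample's phased diploid genotypes into two haplotypes.

    `genotypes` holds one VCF-style GT string per variant, e.g. "0|1".
    Returns two lists of integer allele indexes, one per haplotype.
    """
    hap_a, hap_b = [], []
    for gt in genotypes:
        if "|" not in gt:
            # An unphased call ("0/1") cannot be assigned to a haplotype.
            raise ValueError(f"genotype {gt!r} is not phased")
        left, right = gt.split("|")
        hap_a.append(int(left))
        hap_b.append(int(right))
    return hap_a, hap_b


# Hypothetical genotypes for one sample at four consecutive variants.
sample_gts = ["0|1", "1|1", "0|0", "2|0"]
print(split_phased_genotypes(sample_gts))  # ([0, 1, 0, 2], [1, 1, 0, 0])
```

Each of the two returned lists can then be treated as an independent row when haplotypes are compared or clustered.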
Display Conventions
In "dense" mode, a vertical line is drawn at the position of each
variant.
In "pack" mode, since these variants have been phased, the
display shows a clustering of haplotypes in the viewed range, sorted
by similarity of alleles weighted by proximity to a central variant.
The clustering view can highlight local patterns of linkage.
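The idea of sorting haplotypes by similarity weighted by proximity to a central variant can be illustrated with a small sketch. The exact weighting and clustering method used for the display are not given here, so the inverse-distance weight, the average-linkage merging, and the names `weighted_distance` and `cluster_order` are all assumptions made for illustration.

```python
from itertools import combinations


def weighted_distance(hap_x, hap_y, center):
    """Sum of allele mismatches, each down-weighted by distance from `center`.

    Assumed weight: 1 / (1 + |i - center|), where i is the variant index.
    """
    return sum(
        1.0 / (1 + abs(i - center))
        for i, (a, b) in enumerate(zip(hap_x, hap_y))
        if a != b
    )


def cluster_order(haplotypes, center):
    """Return haplotype indexes in the leaf order of a naive agglomerative tree.

    At each step the two clusters with the smallest average pairwise
    distance are merged (average linkage, an assumption).
    """
    if not haplotypes:
        return []
    clusters = [[i] for i in range(len(haplotypes))]

    def linkage(c1, c2):
        total = sum(
            weighted_distance(haplotypes[i], haplotypes[j], center)
            for i in c1
            for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0]


# Four hypothetical haplotypes over three variants; variant 1 is central.
haps = [[0, 1, 0], [1, 0, 1], [0, 1, 0], [1, 0, 0]]
print(cluster_order(haps, center=1))  # [0, 2, 1, 3]
```

In this toy input, haplotypes 0 and 2 are identical and end up adjacent, while 1 and 3 differ only at a variant far from the center and are grouped next. The quadratic search over cluster pairs is for clarity only and would be too slow for thousands of haplotypes.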
In the clustering display, each sample's phased diploid genotype is split
into two independent haplotypes.
Each haplotype is placed in a horizontal row of pixels; when the number of
haplotypes exceeds the number of vertical pixels for the track, multiple
haplotypes fall in the same pixel row and pixels are averaged across haplotypes.
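The averaging of haplotypes that share a pixel row can be sketched as follows. This is an illustrative assumption about how the averaging might work, not the display code itself: the function name `pixel_column`, the integer row mapping, and the choice to report the fraction of non-reference alleles per row are all hypothetical.

```python
def pixel_column(alleles, n_rows):
    """Average one variant's alleles into at most `n_rows` pixel values.

    `alleles` lists the allele carried by each haplotype, top to bottom
    (0 = reference). Each returned value is the fraction of haplotypes in
    that pixel row that carry a non-reference allele.
    """
    n_haps = len(alleles)
    if n_haps <= n_rows:
        # Enough rows for every haplotype: no averaging is needed.
        return [1.0 if allele != 0 else 0.0 for allele in alleles]
    non_ref = [0] * n_rows
    counts = [0] * n_rows
    for i, allele in enumerate(alleles):
        row = i * n_rows // n_haps  # haplotypes are spread evenly over rows
        counts[row] += 1
        if allele != 0:
            non_ref[row] += 1
    return [n / c for n, c in zip(non_ref, counts)]


# Six hypothetical haplotypes squeezed into three pixel rows.
print(pixel_column([0, 1, 1, 1, 0, 0], 3))  # [0.5, 1.0, 0.0]
```

A row value between 0 and 1 corresponds to an intermediate shade, which is why densely packed displays can show gray rather than a strictly two-tone pattern.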
Each variant is a vertical bar with white (invisible) representing the reference allele
and black representing the non-reference allele(s).
Tick marks are drawn at the top and bottom of each variant's vertical bar
to make the bar more visible when most alleles are reference alleles.
The vertical bar for the central variant used in clustering is outlined in purple.
In order to avoid long compute times, the range of alleles used in clustering
may be limited; alleles used in clustering have purple tick marks at the
top and bottom.
The clustering tree is displayed to the left of the main image.
It does not represent relatedness of individuals; it simply shows the arrangement
of local haplotypes by similarity. When a rightmost branch is purple, it means
that all haplotypes in that branch are identical, at least within the range of
variants used in clustering.
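To make the notion of identical haplotypes within the clustering range concrete, here is a small sketch that groups haplotypes whose alleles match exactly over a chosen range of variants. The function name `identical_groups` and the half-open index range are hypothetical choices for illustration.

```python
from collections import defaultdict


def identical_groups(haplotypes, start, end):
    """Group haplotype indexes whose alleles match across variants [start, end).

    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for index, hap in enumerate(haplotypes):
        groups[tuple(hap[start:end])].append(index)
    return [members for members in groups.values() if len(members) > 1]


# Four hypothetical haplotypes over four variants.
haps = [[0, 1, 0, 1], [0, 1, 0, 0], [1, 1, 0, 1], [0, 1, 0, 1]]
print(identical_groups(haps, 0, 3))  # [[0, 1, 3]]
print(identical_groups(haps, 0, 4))  # [[0, 3]]
```

The second call shows why the qualifier about the clustering range matters: haplotype 1 matches haplotypes 0 and 3 over the first three variants but differs once the fourth variant is included.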
Methods
The genomes of 2,504 individuals were sequenced using both whole-genome sequencing
(mean depth = 7.4x) and targeted exome sequencing (mean depth = 65.7x).
Quoting the Phase 3 publication (1000 Genomes Project Consortium, 2015):
"In contrast to earlier phases of the project, we expanded analysis
beyond bi-allelic events to include multi-allelic SNPs, indels, and a
diverse set of structural variants (SVs). An overview of the sample
collection, data generation, data processing, and analysis is given in
Extended Data Fig. 1. Variant discovery used an ensemble of 24
sequence analysis tools (Supplementary Table 2), and machine-learning
classifiers to separate high-quality variants from potential false
positives, balancing sensitivity and specificity. Construction of
haplotypes started with estimation of long-range phased haplotypes
using array genotypes for project participants and, where available,
their first degree relatives; continued with the addition of high
confidence bi-allelic variants that were analysed jointly to improve
these haplotypes; and concluded with the placement of multi-allelic
and structural variants onto the haplotype scaffold one at a time."
Credits
Thanks to the 1000 Genomes Project for making these data freely available.
References
1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO,
Marchini JL, McCarthy S, McVean GA et al.
A global reference for human genetic variation.
Nature. 2015 Oct 1;526(7571):68-74.
PMID: 26432245