Note: Most variants in this track were lifted over from hg18 (see Methods).
Description
This track displays variant base calls from the publicly released genome
sequences of several individuals:
- 5 sub-Saharan African genomes sequenced by Penn State University:
- !Gubi (KB1)
- G/aq'o (NB1)
- !Ai (MD8)
- D#kgao (TK1)
- Archbishop Desmond Tutu (ABTutu)
- 6 individuals from the 1000 Genomes Project:
- a CEU daughter and parents (NA12878, NA12891, NA12892)
- a YRI daughter and parents (NA19240, NA19238, NA19239)
- 69 non-diseased individuals sequenced by Complete Genomics:
- a YRI daughter and parents (NA19240, NA19238, NA19239)
- a PUR trio (HG00731, HG00732, HG00733)
- a 17-member, 3-generation CEPH pedigree
(Pedigree 1463: NA12877, NA12878, NA12879, NA12880, NA12881, NA12882, NA12883, NA12884,
NA12885, NA12886, NA12887, NA12888, NA12889, NA12890, NA12891, NA12892, NA12893)
- a diversity panel representing unrelated individuals from ten different populations:
- ASW (NA19700, NA19701, NA19703, NA19704, NA19834)
- CEU (NA06985, NA06994, NA07357, NA10851, NA12004)
- CHB (NA18526, NA18537, NA18555, NA18558)
- GIH (NA20845, NA20846, NA20847, NA20850)
- JPT (NA18940, NA18942, NA18947, NA18956)
- LWK (NA19017, NA19020, NA19025, NA19026)
- MKK (NA21732, NA21733, NA21737, NA21767)
- MXL (NA19735, NA19648, NA19649, NA19669, NA19670)
- TSI (NA20502, NA20509, NA20510, NA20511)
- YRI (NA18501, NA18502, NA18504, NA18505, NA18508, NA18517, NA19129)
- 5 individuals from the Personal Genome Project:
- George Church (NA20431)
- Misha Angrist (NA21677)
- Rosalynn Gill (NA21833)
- Henry Louis Gates Sr.
- Henry Louis Gates Jr.
- and independently published genomes:
- Craig Venter
- James Watson
- Anonymous Yoruba individual NA18507
- Anonymous Han Chinese individual (YH, YanHuang Project)
- Seong-Jin Kim (SJK)
- Anonymous Korean individual (AK1)
- Stephen Quake
- Anonymous Irish male
- Marjolein Kriek
- Gregory Lucier
- Extinct Palaeo-Eskimo Saqqaq individual
Note: The Khoisan languages are characterized by click consonants, which are
written with additional symbols. The ! is an alveolar click, / is a dental click,
and # is a palatal click (Le Roux and White, 2004).
Display Conventions and Configuration
In the genome browser, when viewing the forward strand of the
reference genome (the normal case), the displayed alleles are relative
to the forward strand. When viewing the reverse strand of the
reference genome ("reverse" button), the displayed alleles
are reverse-complemented to match the reverse strand.
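The strand convention can be illustrated with a short Python sketch. It assumes alleles are stored as forward-strand base strings and that the reverse view simply reverse-complements each allele; the function names are hypothetical and are not taken from the browser's code. For a single base the reverse complement is just the complement, but for multi-base alleles the order is reversed as well.

```python
# Complement table for DNA bases; characters not listed (e.g. "-") pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(allele: str) -> str:
    """Reverse-complement one allele string."""
    return allele.translate(_COMPLEMENT)[::-1]


def displayed_alleles(alleles: list[str], reverse_strand_view: bool) -> list[str]:
    """Alleles are stored relative to the forward strand; flip them only for the reverse view."""
    if not reverse_strand_view:
        return list(alleles)
    return [reverse_complement(a) for a in alleles]


print(displayed_alleles(["T", "G"], reverse_strand_view=True))  # ['A', 'C']
print(reverse_complement("ACGGT"))  # ACCGT
```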
When read frequency data are available, they are displayed in the mouseover
text (e.g., "T:8 G:3" means that 8 reads contained a T and 3 reads contained
a G at that base position) and box colors are used to show the proportion of
alleles.
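As a sketch of how the mouseover counts relate to allele proportions, the Python below parses text in the "T:8 G:3" form and converts it to fractions of reads. It assumes the text is whitespace-separated "allele:count" tokens, as in the example; the exact color scale used for the boxes is not described here, so the sketch stops at the proportions. Function names are hypothetical.

```python
def parse_read_counts(text: str) -> dict[str, int]:
    """Parse mouseover text such as "T:8 G:3" into {allele: read count}."""
    counts: dict[str, int] = {}
    for token in text.split():
        allele, _, n = token.partition(":")
        counts[allele] = int(n)
    return counts


def allele_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of reads supporting each allele, i.e. the proportions the box colors summarize."""
    total = sum(counts.values())
    if total == 0:
        return {allele: 0.0 for allele in counts}
    return {allele: n / total for allele, n in counts.items()}


counts = parse_read_counts("T:8 G:3")
print(counts)  # {'T': 8, 'G': 3}
print({a: round(p, 3) for a, p in allele_proportions(counts).items()})  # {'T': 0.727, 'G': 0.273}
```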
On the details page for each variant, the alleles are given for the
forward strand of the reference genome. Frequency data are shown when
available.
Methods
Variants from Complete Genomics and Marjolein Kriek were mapped to the
Feb. 2009 (GRCh37/hg19) human genome assembly, so they required no remapping.
Variants for all other individuals were originally mapped to the Mar. 2006
(NCBI36/hg18) human genome assembly. Their locations were translated into GRCh37/hg19
coordinates using the liftOver program and the mapping file
hg18ToHg19.over.chain.gz. Homozygous matches to the
GRCh37/hg19 reference were removed.
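The two steps above, coordinate translation through a chain file and removal of homozygous reference matches, can be sketched in Python. This is a simplified illustration, not the liftOver program: it parses UCSC chain-format text, handles only single 0-based positions on "+"-strand chains, and returns the first aligned block containing the position (liftOver itself also handles ranges, minus-strand chains, and match-ratio thresholds). The chain, the calls, and the dictionary of hg19 reference bases are toy, hypothetical data.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    t_name: str  # chromosome in the old assembly (here hg18)
    q_name: str  # chromosome in the new assembly (here hg19)
    blocks: list[tuple[int, int, int]] = field(default_factory=list)  # (t_start, q_start, size), 0-based


def parse_chains(lines) -> list[Chain]:
    """Parse UCSC chain-format lines, keeping only chains on the '+' query strand."""
    chains: list[Chain] = []
    current = None
    t_pos = q_pos = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_pos, q_pos = int(fields[5]), int(fields[10])
            current = Chain(fields[2], fields[7]) if fields[9] == "+" else None
            if current is not None:
                chains.append(current)
            continue
        size = int(fields[0])
        if current is not None:
            current.blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:  # "size dt dq": gaps follow this aligned block
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return chains


def lift_point(chains: list[Chain], chrom: str, pos: int):
    """Return (new_chrom, new_pos) for a 0-based position, or None if it is unaligned."""
    for chain in chains:  # assumes chains are ordered best-first
        if chain.t_name != chrom:
            continue
        for t_start, q_start, size in chain.blocks:
            if t_start <= pos < t_start + size:
                return chain.q_name, q_start + (pos - t_start)
    return None


def drop_homozygous_reference(variants, ref_bases):
    """Remove calls whose two alleles both equal the new reference base at that position."""
    return [
        (chrom, pos, gt)
        for chrom, pos, gt in variants
        if not (gt[0] == gt[1] == ref_bases.get((chrom, pos)))
    ]


# Toy chain: two aligned blocks (30 bp and 25 bp) separated by a 5 bp / 7 bp gap.
chain_text = """chain 1000 chr1 1000 + 100 160 chr1 1000 + 200 262 1
30 5 7
25
"""
chains = parse_chains(chain_text.splitlines())
hg18_calls = [("chr1", 110, ("A", "A")), ("chr1", 132, ("C", "T")), ("chr1", 140, ("A", "G"))]
lifted = []
for chrom, pos, gt in hg18_calls:
    hit = lift_point(chains, chrom, pos)
    if hit is not None:  # position 132 falls in the gap and cannot be lifted
        lifted.append((*hit, gt))
hg19_ref = {("chr1", 210): "A", ("chr1", 242): "A"}  # hypothetical hg19 reference bases
print(drop_homozygous_reference(lifted, hg19_ref))  # [('chr1', 242, ('A', 'G'))]
```

Positions that fall in chain gaps have no counterpart in the new assembly, which is why such variants can be lost during liftover.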
Sources
KB1, NB1, MD8, TK1, ABTutu
(Penn State)
(Schuster et al.)
SNPs are from the allSNPs.txt file, which can be downloaded
from Galaxy. The indels are also
available for download from Galaxy.
CEU trio NA12878, NA12891, NA12892; YRI trio NA19240, NA19238, NA19239
(1000 Genomes Project, March 2010 release)
(1000 Genomes)
The variants shown are from the 1000 Genomes Project's March 2010
release.
The CEU variant calls were based on sequence data from the
Wellcome Trust Sanger Institute and the
Broad Institute, using the Illumina/Solexa platform.
The YRI variant calls were based on sequence data from the
Baylor College of Medicine
Human Genome Sequencing Center and
Applied Biosystems, using the SOLiD platform.
For more information on the mapping, variant calling, filtering and
validation, see the
pilot 2 README file.
The variant calls are available from the March 2010 release subdirectory at
EBI and at
NCBI.
Complete Genomics 69 genomes
(Complete Genomics, Nov 2011 release)
(CG)
There are four sets of data: a Yoruba trio; a Puerto Rican trio; a
17-member, 3-generation pedigree; and a diversity panel representing ten
different populations. The CEPH samples within the pedigree and
diversity sets are from the NIGMS Repository and the remainder from
the NHGRI Repository, both housed at the Coriell Institute for Medical
Research. The downloaded dataset was generated by the Complete Genomics
Analysis Pipeline version 2.0.0.
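As a quick consistency check, the four sets add up to the 69 genomes. The per-population counts below are transcribed from the sample list at the top of this page; the variable names are only illustrative.

```python
# Per-population sample counts transcribed from the diversity-panel list.
diversity_panel = {
    "ASW": 5, "CEU": 5, "CHB": 4, "GIH": 4, "JPT": 4,
    "LWK": 4, "MKK": 4, "MXL": 5, "TSI": 4, "YRI": 7,
}

# The four Complete Genomics data sets and their sizes.
cg_sets = {
    "Yoruba trio": 3,
    "Puerto Rican trio": 3,
    "CEPH pedigree 1463": 17,
    "diversity panel": sum(diversity_panel.values()),
}

print(len(diversity_panel), cg_sets["diversity panel"], sum(cg_sets.values()))  # 10 46 69
```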
George Church
(Personal Genome Project, Complete Genomics)
(CG)
The variants are from Complete Genomics (Complete Genomics Analysis Pipeline
version 1.2.0.14).
Misha Angrist, Rosalynn Gill, Henry Louis Gates Sr., Henry Louis Gates Jr.
(Personal Genome Project)
(PGP)
The variants were downloaded from a
Trait-o-matic installation that may no longer be working.
For Angrist, the numbers shown are total read counts; the number of reads
supporting each allele was not given.
The Personal Genome Project offers whole-genome sequences of the original
individuals, and of many more,
for download.
Craig Venter (JCVI)
(Levy et al.)
An overview is given
here.
This subtrack contains Venter's single-base variants from the file
HuRef.InternalHuRef-NCBI.gff,
filtered to include only Method 1 variants (where each variant was
kept in its original form and not post-processed), and to exclude any
variants that had N as an allele.
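The selection rule for this subtrack can be sketched as a record filter in Python. The attribute layout of HuRef.InternalHuRef-NCBI.gff is not described here, so the sketch works on already-parsed records with hypothetical fields (`method`, `alleles`); it is an illustration of the stated criteria, not the script actually used.

```python
from typing import NamedTuple


class HuRefVariant(NamedTuple):
    # Hypothetical parsed record; real attribute names in the GFF may differ.
    chrom: str
    pos: int
    method: int               # variant-calling method number
    alleles: tuple[str, ...]  # e.g. ("A", "G")


def keep_for_subtrack(v: HuRefVariant) -> bool:
    """Keep single-base, Method 1 variants that have no 'N' allele."""
    single_base = all(len(a) == 1 for a in v.alleles)
    return v.method == 1 and single_base and "N" not in v.alleles


records = [
    HuRefVariant("chr1", 1000, 1, ("A", "G")),   # kept
    HuRefVariant("chr1", 2000, 2, ("C", "T")),   # dropped: not Method 1
    HuRefVariant("chr1", 3000, 1, ("N", "T")),   # dropped: N allele
    HuRefVariant("chr1", 4000, 1, ("AC", "A")),  # dropped: not single-base
]
print([v.pos for v in records if keep_for_subtrack(v)])  # [1000]
```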
JCVI hosts a
genome browser.
James Watson
(CSHL)
(Wheeler et al.)
These single-base variants came from the file
watson_snp.gff.gz.
CSHL hosts a
genome browser.
Yoruba NA18507
(Illumina Cambridge/Solexa)
(Bentley et al.)
Illumina released the read sequences to the
NCBI Short Read Archive.
Aakrosh Ratan in the Miller Lab at Penn State University (PSU)
mapped the sequence reads to the reference genome and called
single-base variants using
MAQ.
YH (YanHuang Project)
(Wang et al.)
The YanHuang Project released these single-base variants from the
genome of a Han Chinese individual.
The data are available from the
YH database in the file
yhsnp_add.gff.
The YanHuang Project hosts a
genome browser.
SJK (GUMS/KOBIC)
(Ahn et al.)
Researchers at Gachon University of Medicine and Science (GUMS)
and the Korean Bioinformation Center (KOBIC)
released these single-base variants from the genome of Seong-Jin Kim.
The data are available from
KOBIC
in the file
KOREF-solexa-snp-X30_Q40d4D100.gff.
AK1
(Genomic Medicine Institute)
(Kim et al.)
The variants shown are from the
AK1_SNP.tar.gz download.
Stephen Quake (Stanford)
(Pushkarev et al.)
The variants were downloaded from a
Trait-o-matic installation that may no longer be working.
Anonymous Irish male
(Tong et al.)
The SNPs shown are from the Galaxy library,
Irish whole genome.
Marjolein Kriek
(Leiden)
The SNPs shown were called by Belinda Giardine at PSU from the BAM file
provided by
Leiden University Medical Center. The reads were aligned to the
GRCh37/hg19 build. SNP calls were made using samtools, requiring a minimum of 4
and a maximum of 45 reads supporting the variant call. Calls with a quality
score below 30 were filtered out.
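The thresholds above can be expressed as a simple post-calling filter. This Python sketch is not the samtools command line that was used; it assumes the calls have already been parsed into hypothetical records with a supporting-read count and a quality score, and it treats the 4 and 45 read bounds as inclusive, which the description does not state explicitly.

```python
from typing import NamedTuple


class SnpCall(NamedTuple):
    # Hypothetical record parsed from samtools output; field names are illustrative.
    chrom: str
    pos: int
    supporting_reads: int  # reads supporting the variant call
    quality: float         # call quality score


MIN_READS, MAX_READS, MIN_QUALITY = 4, 45, 30


def passes_filters(call: SnpCall) -> bool:
    """Read-support window of 4..45 (assumed inclusive) and quality of at least 30."""
    return MIN_READS <= call.supporting_reads <= MAX_READS and call.quality >= MIN_QUALITY


calls = [
    SnpCall("chr2", 100, 3, 50.0),   # too few reads
    SnpCall("chr2", 200, 12, 29.0),  # quality below 30
    SnpCall("chr2", 300, 12, 30.0),  # passes
    SnpCall("chr2", 400, 60, 99.0),  # more than 45 reads
]
print([c.pos for c in calls if passes_filters(c)])  # [300]
```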
Gregory Lucier
(Life Technologies)
The SNPs shown are from
Nimbus Informatics.
Sequencing was done using the Life Technologies SOLiD platform.
Palaeo-Eskimo Saqqaq individual (Saqqaq Genome Project)
(Rasmussen et al.)
The variants shown are
all non-reference SNPs found by the SNPest program; a
second track shows
the high-confidence subset of these SNPs.
Allele counts are not available for these tracks, but read depth is; the read
depth is shown in place of the allele counts as a measure of the reliability
of each call.
Credits
Variants shown in this track were determined by the many individuals
and institutions listed above.
Thanks to Belinda Giardine at PSU for collecting the data and
loading them into the UCSC database.
References
Le Roux W, White A.
The voices of the San living in Southern Africa today. Cape Town: Kwela Books; 2004.
KB1, NB1, MD8, TK1, ABTutu (Penn State)
Schuster SC, Miller W, Ratan A, Tomsho LP, Giardine B, Kasson LR, Harris RS, Petersen DC, Zhao F, Qi J et al.
Complete Khoisan and Bantu genomes from southern Africa.
Nature. 2010 Feb 18;463(7283):943-7.
PMID: 20164927; PMC: PMC3890430
CEU trio NA12878, NA12891, NA12892;
YRI trio NA19240, NA19238, NA19239 (1000 Genomes)
1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA.
A map of human genome variation from population-scale sequencing.
Nature. 2010 Oct 28;467(7319):1061-73.
PMID: 20981092; PMC: PMC3042601
Complete Genomics 69 genomes
Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G et al.
Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays.
Science. 2010 Jan 1;327(5961):78-81.
PMID: 19892942
Public Genome Data Repository Service Note, Complete Genomics, 2011.
George Church
Drmanac R, Sparks AB, Callow MJ, Halpern AL, Burns NL, Kermani BG, Carnevali P, Nazarenko I, Nilsen GB, Yeung G et al.
Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays.
Science. 2010 Jan 1;327(5961):78-81.
PMID: 19892942
Misha Angrist, Rosalynn Gill, Henry Louis Gates Sr., Henry Louis Gates Jr.
Church GM.
The personal genome project.
Mol Syst Biol. 2005;1:2005.0030.
PMID: 16729065; PMC: PMC1681452
Craig Venter (JCVI)
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G et al.
The diploid genome sequence of an individual human.
PLoS Biol. 2007 Sep 4;5(10):e254.
PMID: 17803354; PMC: PMC1964779
James Watson (CSHL)
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT et al.
The complete genome of an individual by massively parallel DNA sequencing.
Nature. 2008 Apr 17;452(7189):872-6.
PMID: 18421352
Yoruba NA18507 (Illumina Cambridge/Solexa)
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR et al.
Accurate whole human genome sequencing using reversible terminator chemistry.
Nature. 2008 Nov 6;456(7218):53-9.
PMID: 18987734; PMC: PMC2581791
YH (YanHuang Project)
Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J et al.
The diploid genome sequence of an Asian individual.
Nature. 2008 Nov 6;456(7218):60-5.
PMID: 18987735; PMC: PMC2716080
SJK (GUMS/KOBIC)
Ahn SM, Kim TH, Lee S, Kim D, Ghang H, Kim DS, Kim BC, Kim SY, Kim WY, Kim C et al.
The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group.
Genome Res. 2009 Sep;19(9):1622-9.
PMID: 19470904; PMC: PMC2752128
AK1 (Genomic Medicine Institute)
Kim JI, Ju YS, Park H, Kim S, Lee S, Yi JH, Mudge J, Miller NA, Hong D, Bell CJ et al.
A highly annotated whole-genome sequence of a Korean individual.
Nature. 2009 Aug 20;460(7258):1011-5.
PMID: 19587683; PMC: PMC2860965
Stephen Quake
Pushkarev D, Neff NF, Quake SR.
Single-molecule sequencing of an individual human genome.
Nat Biotechnol. 2009 Sep;27(9):847-50.
PMID: 19668243; PMC: PMC4117198
Anonymous Irish male
Tong P, Prendergast JG, Lohan AJ, Farrington SM, Cronin S, Friel N, Bradley DG, Hardiman O, Evans A, Wilson JF et al.
Sequencing and analysis of an Irish human genome.
Genome Biol. 2010;11(9):R91.
PMID: 20822512; PMC: PMC2965383
Marjolein Kriek
Not yet published; data provided by
Leiden University Medical Center.
Gregory Lucier
Not published; data provided by Life Technologies and Nimbus Informatics.
Palaeo-Eskimo Saqqaq individual
Rasmussen M, Li Y, Lindgreen S, Pedersen JS, Albrechtsen A, Moltke I, Metspalu M, Metspalu E, Kivisild T, Gupta R et al.
Ancient human genome sequence of an extinct Palaeo-Eskimo.
Nature. 2010 Feb 11;463(7282):757-62.
PMID: 20148029; PMC: PMC3951495