Description
This track shows deletion/insertion polymorphisms (DIPs). In packed and full
modes, the sequence variation is shown to the left of the DIP.
The naming convention "-/sequence" is used for deletions;
"sequence/-" is used for insertions. The details
page shows the name of the trace used to define the polymorphism, the
quality score, and the strand on which the trace aligns to the reference
sequence.
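As an illustration only, the naming convention above can be applied mechanically when parsing allele strings. The sketch below assumes each allele field is a single string with exactly one "/" separator. `classify_dip` is a hypothetical helper, not part of the browser or of ssahaDIP, and it encodes the convention exactly as this page states it.

```python
from typing import Tuple


def classify_dip(allele: str) -> Tuple[str, str]:
    """Classify a DIP allele string using this track's stated convention:
    "-/sequence" denotes a deletion and "sequence/-" an insertion.
    Returns (kind, sequence)."""
    left, sep, right = allele.partition("/")
    if not sep:
        raise ValueError(f"not a DIP allele string: {allele!r}")
    if left == "-" and right and right != "-":
        return ("deletion", right)
    if right == "-" and left and left != "-":
        return ("insertion", left)
    raise ValueError(f"unrecognized DIP allele string: {allele!r}")


print(classify_dip("-/ACG"))  # ('deletion', 'ACG')
print(classify_dip("TT/-"))   # ('insertion', 'TT')
```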
The quality score is the minimum PHRED quality value over
the entire range of the DIP within the trace, plus 5 flanking bases.
PHRED quality scores express the estimated error probability of a base
call on a logarithmic scale:
Q = -10 * log10(Pe)
where Pe is the estimated probability of an
error at that base. PHRED quality scores typically range from 0 to 40, where 0
(Pe = 1) indicates no confidence in the base call and 40 (Pe = 0.0001) implies
odds of about 10,000 to 1 that the base is correct. Sometimes a PHRED value of
50 or higher is used to denote finished sequence. A color gradient is used to
distinguish quality scores in the browser display: brighter shading indicates
higher scores.
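The PHRED relationship and the minimum-over-window rule can be sketched in Python as follows. This is a minimal illustration, not ssahaDIP's code. It assumes per-base qualities are available as a list indexed by trace position, that the DIP occupies the half-open range [start, end) in the trace, and that the 5 flanking bases are taken on each side of the DIP (the text does not say which side). All function names are hypothetical.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(Pe) to recover the error probability Pe."""
    return 10 ** (-q / 10)


def error_prob_to_phred(pe: float) -> float:
    """Q = -10 * log10(Pe), defined for 0 < Pe <= 1."""
    return -10 * math.log10(pe)


def dip_quality(quals: Sequence[int], start: int, end: int, flank: int = 5) -> int:
    """Minimum PHRED value over trace positions [start, end), widened by
    `flank` bases on each side (ASSUMPTION) and clipped to the trace ends."""
    lo = max(0, start - flank)
    hi = min(len(quals), end + flank)
    return min(quals[lo:hi])


# Q = 40 gives Pe = 1e-4, i.e. odds of about 10,000 to 1 that the call is right.
pe = phred_to_error_prob(40)
print(round((1 - pe) / pe))  # 9999

# Hypothetical trace: uniform Q35 except one weaker base near the DIP.
quals = [35] * 30
quals[14] = 21
print(dip_quality(quals, start=10, end=12))  # 21 (window covers positions 5-16)
```

Taking the minimum rather than the mean means a single poor base anywhere in the window caps the score, which is the conservative reading of "minimum PHRED quality value".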
The "Trace Pos" value on the details page gives the 3' position
of the DIP within the trace. Alleles are reported relative to the
"+" strand of the reference sequence, but the trace itself may align
to the "-" strand. If it does, the DIP bases seen when viewing the
chromatogram through the URL provided will be the reverse complement
of the reported variant allele.
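To compare a reported allele with what appears in a "-" strand chromatogram, reverse-complement it. A minimal sketch, assuming alleles use the uppercase or lowercase bases A, C, G, T and N. The function names are hypothetical.

```python
# Complement table for nucleotide codes (N maps to itself).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def allele_as_seen_in_trace(allele: str, trace_strand: str) -> str:
    """Convert a '+'-strand allele into the bases expected in the chromatogram."""
    if trace_strand == "+":
        return allele
    if trace_strand == "-":
        return reverse_complement(allele)
    raise ValueError(f"strand must be '+' or '-', got {trace_strand!r}")


print(allele_as_seen_in_trace("AAGC", "+"))  # AAGC
print(allele_as_seen_in_trace("AAGC", "-"))  # GCTT
```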
Methods
All human trace data from the NCBI Trace Archive were aligned to hg17 with
ssahaSNP, followed by ssahaDIP post-processing to detect deletion/insertion
polymorphisms. DIPs falling within the ENCODE regions were then extracted.
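Extracting the DIPs that fall within the ENCODE regions amounts to an interval containment test. The sketch below is a minimal illustration, not the pipeline actually used. It assumes BED-style 0-based, half-open coordinates and treats a DIP as "within" a region only if it lies entirely inside it. The `Region` and `Dip` types and the coordinates in the example are made up and are not real ENCODE region boundaries.

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class Dip(NamedTuple):
    chrom: str
    start: int
    end: int    # may equal start for an event with zero length on the reference
    name: str


def dips_in_regions(dips: List[Dip], regions: List[Region]) -> List[Dip]:
    """Keep DIPs whose [start, end) span lies entirely inside some region."""
    by_chrom: Dict[str, List[Region]] = {}
    for r in regions:
        by_chrom.setdefault(r.chrom, []).append(r)
    return [
        d for d in dips
        if any(r.start <= d.start and d.end <= r.end
               for r in by_chrom.get(d.chrom, []))
    ]


# Made-up coordinates for illustration only.
regions = [Region("chr7", 1_000, 2_000)]
dips = [
    Dip("chr7", 1_500, 1_503, "dipA"),  # inside the region
    Dip("chr7", 2_500, 2_501, "dipB"),  # outside the region
    Dip("chr1", 1_500, 1_501, "dipC"),  # wrong chromosome
]
print([d.name for d in dips_in_regions(dips, regions)])  # ['dipA']
```

The linear scan per chromosome is fine for a few dozen regions; for genome-wide interval sets, a sorted sweep or an interval tree would scale better.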
Verification
For verification, 500k traces from the mouse whole-genome shotgun (WGS)
sequencing effort were compared to mm6 using ssahaSNP and
ssahaDIP. Because mm6 and these traces come from the same mouse strain,
C57BL/6J, the true DIP rate should be very low. With a quality threshold of
Q23, the detected DIP rate was one DIP per 140k Neighborhood Quality Standard
(NQS) bases. This rate was ten-fold lower than the SNP rate that ssahaSNP
detected in the same data set; ssahaSNP has been validated as having a 5%
false positive rate.
The detected DIP rate for human traces against hg17 is one DIP per 12k NQS
bases. Treating the same-strain mouse rate as the background rate of false
calls, the false positive rate for human DIPs is estimated as 12k/140k, or
about 8.6%.
Further validation experiments are in progress.
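The false positive estimate is a ratio of two rates. A minimal sketch, assuming (as the verification reasoning implies) that every DIP called in the same-strain mouse comparison is a false positive and that ssahaDIP's false call rate per NQS base carries over unchanged to human traces. The function name is hypothetical.

```python
def estimated_false_positive_rate(bases_per_call_test: float,
                                  bases_per_call_control: float) -> float:
    """Fraction of test calls expected to be false, assuming every call in
    the same-strain control is false and the error rate carries over."""
    control_rate = 1.0 / bases_per_call_control  # false DIPs per NQS base
    test_rate = 1.0 / bases_per_call_test        # all DIPs per NQS base
    return control_rate / test_rate


# Human traces vs hg17: 1 DIP per 12k NQS bases.
# Mouse C57BL/6J traces vs mm6: 1 DIP per 140k NQS bases.
fpr = estimated_false_positive_rate(12_000, 140_000)
print(f"{fpr:.1%}")  # 8.6%
```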
Credits
All analyses were performed by Jim Mullikin using ssahaSNP and ssahaDIP.
The trace data were contributed to the trace archive by many sequencing
centers.
References
Ning Z, Cox AJ, Mullikin JC.
SSAHA: A fast search method for large DNA databases.
Genome Res. 2001 Oct;11(10):1725-9.
The International SNP Map Working Group.
A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms.
Nature. 2001 Feb 15;409(6822):928-33.