Cons Indels MmCf Track Settings


Description

This track displays regions showing evidence of conservation with respect to mutations involving sequence insertions and deletions (indels). These “indel-purified sequences” (IPSs) were obtained by comparing the predictions of a neutral model of indel evolution with data obtained from human (hg19), mouse (mm8) and dog (canFam2) alignments (Lunter et al., 2006).

The evidence for conservation is statistical, and each region is annotated with a posterior probability. This value may be interpreted as the probability that the segment owes its paucity of indels to selection rather than to random chance.

Apart from the underlying alignment, these data are independent of the conservation of the nucleotide sequence itself. Any inferred conservation of the sequence, e.g. as shown by phastCons, is therefore independent evidence for selection. A sequence may be conserved with respect to indel mutations without concomitant evidence of nucleotide sequence conservation, and the opposite may also happen.

Methods

In the absence of selection, indels have a certain predicted distribution over the genome. The actual distribution shows an over-abundance of ungapped regions, because purifying selection removes deleterious indels from functional sequence. Neutrally evolving sequence, such as (by and large) ancestral repeats, shows an exceedingly good fit to the neutral predictions. This accurate fit allows the identification of a good proportion of conserved sequence at a relatively low false discovery rate (FDR). For example, at an FDR of 10%, the predicted sensitivity is 75%.

Each identified indel-purified sequence (IPS) is annotated with two numbers: a false discovery rate (FDR) and a posterior probability (p).

The FDR is a property of a set of segments, not of a single segment. The value attached to a segment is the minimum FDR over all sets that include that segment. For example, a segment annotated with a 10% FDR also belongs to the set with a 15% FDR, but not to the set with a 5% FDR.

The posterior probability, by contrast, refers to the single segment by itself. It has a frequentist interpretation: it is the proportion of regions annotated with the same posterior probability that have been under purifying selection, rather than showing the observed lack of indels by random chance.

The data include segments for a false discovery rate of up to 50%. The score directly reflects the FDR, through the following formula:

score = 90 / (FDR + 0.08)
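
The formula above can be applied in both directions. The following is a minimal Python sketch, not code from the track's authors. It assumes that the FDR is expressed as a fraction (0.10 for 10%), which the page does not state explicitly; under that assumption an FDR of 0.10 corresponds to a score of about 500. The function names, the tuple layout (chrom, start, end, score) and the example coordinates are hypothetical and chosen only for illustration.

```python
def fdr_to_score(fdr: float) -> float:
    """Apply the track's formula. Assumption: fdr is a fraction (0.10 for 10%)."""
    return 90.0 / (fdr + 0.08)


def score_to_fdr(score: float) -> float:
    """Invert the formula: FDR = 90 / score - 0.08."""
    return 90.0 / score - 0.08


def segments_within_fdr(segments, max_fdr):
    """Return the segments whose score implies an FDR of at most max_fdr.

    Each segment is a hypothetical (chrom, start, end, score) tuple.
    A small tolerance guards against floating-point rounding at the threshold.
    """
    return [s for s in segments if score_to_fdr(s[3]) <= max_fdr + 1e-9]


if __name__ == "__main__":
    # Illustrative values only, not real track data.
    example = [
        ("chr1", 100, 200, 1000),
        ("chr1", 300, 400, 600),
        ("chr1", 500, 600, 200),
    ]
    for threshold in (0.05, 0.10, 0.50):
        kept = segments_within_fdr(example, threshold)
        print(f"FDR <= {threshold:.2f}: {len(kept)} segment(s)")
```

Because a lower FDR gives a higher score, raising the FDR threshold can only add segments to the selected set. This matches the nesting described in the text: a segment in the 10% set is also in the 15% set.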


Verification

The neutral indel model was calibrated using ancestral repeats, against which it showed an excellent fit. Among the identified IPSs at 10% FDR and predicted sensitivity of 75%, we found 75% of annotated protein-coding exons (weighted by length), and 75% of the 222 microRNAs that were annotated at the time. Ancestral repeats were heavily depleted among the identified segments.
