Description
This track shows RNA secondary structure predictions made with the
EvoFold (v.2) program, a comparative method that exploits the evolutionary signal
of genomic multiple-sequence alignments for identifying conserved
functional RNA structures.
Display Conventions and Configuration
Track elements are labeled using the convention ID_strand_score.
When zoomed out beyond the base level, secondary structure prediction regions
are indicated by blocks, with the stem-pairing regions shown in a darker shade
than unpaired regions. Arrows indicate the predicted strand.
When zoomed in to the base level, the specific secondary structure predictions
are shown in parenthesis format. The confidence score for each position is
indicated in grayscale, with darker shades corresponding to higher scores.
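
As an illustration of the parenthesis format, the following minimal sketch recovers the paired positions from a structure string. It assumes that "(" and ")" mark the two sides of a base pair and that every other character marks an unpaired position; the function name parse_pairs is hypothetical and is not part of EvoFold or the browser.

```python
def parse_pairs(structure):
    """Return a sorted list of 0-based (i, j) index pairs for matching parentheses.

    Assumption: '(' opens a pair, ')' closes the most recently opened pair,
    and any other character denotes an unpaired position.
    """
    stack = []   # indices of '(' that have not been closed yet
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A small hairpin: two stacked pairs enclosing a two-base loop.
print(parse_pairs("((..))"))
```

For the hairpin "((..))" the outermost positions 0 and 5 pair with each other, as do positions 1 and 4.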
The details page for each track element shows the predicted secondary structure
(labeled SS anno), together with details of the multiple species
alignments at that location. Substitutions relative to the human sequence are
color-coded according to their compatibility with the predicted secondary
structure (see the color legend on the details page). Each prediction is
assigned an overall score and a sequence of position-specific scores. The
overall score measures evidence for any functional RNA structures in the given
region, while the position-specific scores (0-9) measure the confidence of
the base-specific annotations. Base-pairing positions are annotated
with the same pair symbol. The offsets are provided to ease
visual navigation of the alignment in terms of the human sequence. The offset
is calculated (in units of ten) from the start position of the element on
the positive strand or from the end position when on the negative strand.
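
The offset rule above can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be 0-based and half-open, the offset is taken to be the distance from the anchoring end divided by ten and rounded down, and the function name offset_in_tens is hypothetical. The details page may round or label the offsets differently.

```python
def offset_in_tens(pos, start, end, strand):
    """Offset of a genomic position within an element, in units of ten.

    Assumptions (not confirmed by the track documentation): the element spans
    the 0-based half-open interval [start, end), and offsets are rounded down.
    """
    if not start <= pos < end:
        raise ValueError("position lies outside the element")
    if strand == "+":
        distance = pos - start        # measured from the start position
    elif strand == "-":
        distance = (end - 1) - pos    # measured from the end position
    else:
        raise ValueError("strand must be '+' or '-'")
    return distance // 10


# Hypothetical 50-base element on each strand.
print(offset_in_tens(125, 100, 150, "+"))
print(offset_in_tens(125, 100, 150, "-"))
```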
The graphical display may be filtered to show only those track elements
with scores that meet or exceed a certain threshold. To set a
threshold, type the minimum score into the text box at the top of the
description page.
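
The same filter can be approximated offline on a list of element labels. The sketch below assumes labels follow the ID_strand_score convention described above, that the ID itself may contain underscores (so the label is split from the right), and that the score is numeric. The function names and the example labels are hypothetical.

```python
def parse_label(label):
    """Split an 'ID_strand_score' label into (id, strand, score).

    Assumption: only the last two underscore-separated fields are the strand
    and the score; everything before them is the ID.
    """
    element_id, strand, score = label.rsplit("_", 2)
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand field in label: {label!r}")
    return element_id, strand, float(score)


def filter_by_score(labels, threshold):
    """Keep the labels whose score meets or exceeds the threshold."""
    return [label for label in labels if parse_label(label)[2] >= threshold]


# Made-up labels for illustration only.
labels = ["608_0_-_96", "611_2_+_140", "700_1_+_35"]
print(filter_by_score(labels, 90))
```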
Methods
EvoFold makes use of phylogenetic
stochastic context-free grammars (phylo-SCFGs), which are combined
probabilistic models of RNA secondary structure and primary sequence
evolution. The predictions consist of both a specific RNA secondary
structure and an overall score. The overall score is essentially a
log-odds score comparing a phylo-SCFG that models the constrained evolution
of stem-pairing regions with one that models only unpaired regions.
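
To make the log-odds idea concrete, the sketch below computes the score as the difference between the log-likelihoods of an alignment under the two models. It is not EvoFold's implementation: the function name is hypothetical, the log-likelihood values are made up, and the base of the logarithm and any normalization used by EvoFold are not stated here.

```python
def log_odds_score(loglik_structural, loglik_background):
    """Log-odds of the structural model against the unpaired-only model.

    Both arguments are log-likelihoods of the same alignment, so
    log(P_structural / P_background) reduces to a simple difference.
    """
    return loglik_structural - loglik_background


# Made-up log-likelihoods: a positive score favors the structural model.
print(log_odds_score(-120.0, -135.5))
```

A positive score means the alignment is better explained by the model that includes stem-pairing regions than by the model with only unpaired regions.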
The predictions for this track were based on the conserved segments of
a human-referenced (hg18) 31-way vertebrate alignment comprising 28
mammalian assemblies and three other vertebrate assemblies (see Parker
et al. for details). The 31-way alignment is a subset of the 44-way
alignment displayed on hg18.
Additional resources
Auxiliary data sets and a family classification of the predictions
can be browsed on a mirror site.
Credits
The EvoFold program and browser track were developed by
Jakob Skou Pedersen
initially at the UCSC Genome Bioinformatics Group and later at the University
of Copenhagen and at Aarhus University, Denmark (current
position). Parker et al. describe the current set of predictions and
their family classification. The multiple alignments used for the
analysis were generated at UCSC as part of the 29 Mammals Sequencing
and Analysis Consortium (Lindblad-Toh et al.).
The RNA secondary structure is rendered using the VARNA Java applet.
References
EvoFold
Parker BJ, Moltke I, Roth A, Washietl S, Wen J, Kellis M, Breaker R,
Pedersen JS. New families of human regulatory RNA structures
identified by comparative analysis of vertebrate genomes. Genome
Res. in press.
Pedersen JS, Bejerano G, Siepel A, Rosenbloom K,
Lindblad-Toh K, Lander ES, Kent J, Miller W,
Haussler D. Identification and classification of conserved RNA
secondary structures in the human genome. PLoS Comput
Biol. 2006 Apr;2(4):e33.
Phylo-SCFGs
Knudsen B, Hein J.
RNA secondary structure prediction using stochastic context-free
grammars and evolutionary history.
Bioinformatics. 1999 Jun;15(6):446-54.
Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J.
A comparative method for finding and folding RNA
secondary structures within protein-coding regions.
Nucleic Acids Res. 2004 Sep 24;32(16):4925-36.
Alignments and conserved elements
Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, et al. A
high-resolution map of evolutionary constraint in the human genome
based on 29 eutherian mammals. In review.
Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom
K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM,
Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D.
Evolutionarily conserved elements in vertebrate, insect, worm,
and yeast genomes.
Genome Res. 2005 Aug;15(8):1034-50.