Description
This track displays regions containing putative functional RNA secondary
structures as predicted by RNAz on the basis of thermodynamic stability
and evolutionary conservation.
Methods
RNAz evaluates multiple sequence alignments for unusually stable and
conserved RNA secondary structures, two typical characteristics for
functional RNA structures that can be found in noncoding RNAs or cis-acting
regulatory elements of mRNAs.
The RNAz algorithm works as follows: First a consensus secondary structure
is predicted using the RNAalifold approach (Hofacker et al., 2002), which is
an extension of classical minimum free energy folding algorithms for aligned
sequences. The significance of a predicted consensus structure is evaluated by
calculating a structure conservation index, which is the ratio of the
consensus folding energy, obtained under the constraint that all aligned
sequences fold into a common structure, to the mean unconstrained folding
energy of the individual sequences. Thermodynamic stability is evaluated by
calculating a normalized z-score of the sequences in the alignment. The z-score
indicates whether the given sequences are more stable than random sequences
of the same length and base composition. Based on these two features,
structure conservation index and z-score, an alignment is classified as
structural RNA or "other" using a support vector machine
classification algorithm (Washietl et al., 2005; Washietl et al., 2007).
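The two features can be illustrated with a minimal sketch. It assumes the
structure conservation index is the consensus folding energy divided by the
mean single-sequence folding energy, and that the z-score compares a native
energy against a list of background energies. The function names and all
energy values are hypothetical; the sketch does not call RNAz or RNAalifold,
and how RNAz obtains its background statistics is outside its scope.

```python
from statistics import mean, stdev


def structure_conservation_index(consensus_energy, single_energies):
    """Consensus folding energy divided by the mean single-sequence energy."""
    return consensus_energy / mean(single_energies)


def z_score(native_energy, background_energies):
    """Standard deviations between the native energy and the background mean."""
    return (native_energy - mean(background_energies)) / stdev(background_energies)


# Hypothetical folding energies in kcal/mol, for illustration only.
single = [-30.0, -28.0, -32.0]
sci = structure_conservation_index(-27.0, single)
z = z_score(-30.0, [-20.0, -22.0, -18.0])
print(f"SCI = {sci:.2f}, z = {z:.2f}")
```

Under these assumptions, an index close to 1 indicates that the common
structure costs little energy compared with folding each sequence on its own,
and a negative z-score indicates a sequence more stable than the background.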
This track shows the result of an RNAz screen of 28-way TBA/MULTIZ
alignments. Alignments were sliced into overlapping windows of 120 nt with a
step size of 40 nt. Sequences with more than 25% gaps with respect
to the human sequence were discarded. Only alignments with more than four
sequences, a minimum size of 50 columns and at most 1% repeat masked letters
were considered. RNAz can only handle alignments with up to six sequences. From
alignments with more than six sequences we chose a subset of six. For subset
selection, we used a greedy algorithm and iteratively selected sequences
optimizing the set for a mean pairwise identity of around 80%. For
alignments with more than 10 sequences we sampled three different such
subsets. The windows were finally scored with RNAz version 0.1.1 in the
forward and reverse complement directions. Overlapping hits with at least one
sampled alignment with RNAz score > 0.5 were combined into a single genomic
region. The track shows regions with at least one window in the cluster with
an average RNAz score of all samples > 0.5 and at least one hit with RNAz
score > 0.9. More details may be found in Washietl et al., 2007.
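The windowing and clustering steps described above can be sketched as follows.
This is an illustration under stated assumptions, not the pipeline used to
build the track: the function names, the tuple layout for hits, and the
handling of alignments shorter than one window are all hypothetical, strands
are ignored, and the reporting rule is one reading of the thresholds given in
the text.

```python
def window_starts(alignment_length, size=120, step=40):
    """Start columns of overlapping windows.

    Assumption: an alignment shorter than one window yields a single window.
    """
    if alignment_length <= size:
        return [0]
    return list(range(0, alignment_length - size + 1, step))


def merge_hits(hits, threshold=0.5):
    """Merge overlapping windows that have at least one sample above threshold.

    hits: list of (start, end, sample_scores), with end exclusive.
    Returns clusters as (start, end, [sample_scores, ...]).
    """
    kept = sorted(h for h in hits if max(h[2]) > threshold)
    clusters = []
    for start, end, scores in kept:
        if clusters and start < clusters[-1][1]:
            prev = clusters[-1]
            clusters[-1] = (prev[0], max(prev[1], end), prev[2] + [scores])
        else:
            clusters.append((start, end, [scores]))
    return clusters


def is_reported(cluster, mean_cutoff=0.5, best_cutoff=0.9):
    """True if one window has a mean score above mean_cutoff and the cluster
    contains at least one sample score above best_cutoff."""
    windows = cluster[2]
    has_mean = any(sum(s) / len(s) > mean_cutoff for s in windows)
    has_best = any(max(s) > best_cutoff for s in windows)
    return has_mean and has_best


# Hypothetical RNAz scores for three sampled subsets per window.
hits = [
    (0, 120, [0.6, 0.95, 0.7]),
    (40, 160, [0.2, 0.55, 0.1]),
    (400, 520, [0.6, 0.7, 0.8]),
    (800, 920, [0.1, 0.2, 0.3]),
]
clusters = merge_hits(hits)
reported = [(c[0], c[1]) for c in clusters if is_reported(c)]
print(window_starts(200), reported)
```

In this example the first two windows overlap and merge into one region that
passes both thresholds, the third window forms a cluster with no score above
0.9, and the fourth never enters a cluster.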
Credits
The RNAz program and browser track were developed by Stefan Washietl,
Ivo Hofacker (Institute for Theoretical Chemistry, Univ. of Vienna) and
Peter F. Stadler (Bioinformatics group, Department of Computer Science,
Univ. of Leipzig).
References
Hofacker IL, Fekete M, Stadler PF.
Secondary structure prediction for aligned RNA sequences.
J. Mol. Biol. 2002 Jun 21;319(5):1059-66.
Washietl S, Hofacker IL, Stadler PF.
Fast and reliable prediction of noncoding RNAs.
Proc. Natl. Acad. Sci. USA. 2005 Feb 15;102(7):2454-59.
Washietl S, Pedersen JS, Korbel JO, Fried C, Gruber AR, Hackermuller J,
Hertel J, Lindemeyer M, Missal K, Tanzer A, et al.
Structured RNAs in the ENCODE Selected Regions of the Human Genome.
Genome Res. 2007 Jun;17(6):852-64.