Description
This track shows the locations of known and predicted non-protein-coding RNA
genes and pseudogenes that fall within the ENCODE regions. It contains all
information in Sean Eddy's RNA Genes track for these regions, combined with
computational predictions generated by Jakob Skou Pedersen's EvoFold algorithm.
In addition to the fields contained in the RNA Genes track, this track also
includes ENCODE-related fields describing overlap with transcribed regions and
repeats.
Feature types in this annotation include:
- tRNA: transfer RNA (or pseudogene)
- rRNA: ribosomal RNA (or pseudogene)
- scRNA: small cytoplasmic RNA (or pseudogene)
- snRNA: small nuclear RNA (or pseudogene)
- snoRNA: small nucleolar RNA (or pseudogene)
- miRNA: microRNA (or pseudogene)
- misc_RNA: miscellaneous other RNA, such as Xist (or pseudogene)
- "-": unknown RNA
Display Conventions and Configuration
The locations of the RNA genes and pseudogenes are represented by blocks in the
graphical display, color-coded as follows:
- Black: region is Repeatmasked.
- Green: region is transcribed.
- Red: region is from the RNA Genes track and is not transcribed.
- Blue: region is an EvoFold prediction and is not transcribed.
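The color rules above can be read as a small decision procedure. The following Python sketch is illustrative only: the function name, the boolean flags, and the `source` labels are hypothetical, and the order of the checks (repeat status first, then transcription) is an assumption, since the description does not state which rule wins when several apply.

```python
def item_color(repeatmasked: bool, transcribed: bool, source: str) -> str:
    """Return the display color for one item (hypothetical helper).

    Assumption: the Repeatmasked check takes precedence over the
    transcription check; the track description does not state the order.
    """
    if repeatmasked:
        return "black"
    if transcribed:
        return "green"
    # Not transcribed: color by the track that supplied the item.
    if source == "RNA Genes":
        return "red"
    # Assumption: every other item is an EvoFold prediction.
    return "blue"
```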
The display may be filtered to show only those items
with unnormalized scores that meet or exceed a certain threshold. To set a
threshold, type the minimum score into the text box at the top of the
description page.
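The same "meet or exceed" filter could be applied to downloaded items. This is a minimal sketch, assuming each item is a dictionary with a hypothetical `score` key holding the unnormalized score; the example values are made up for illustration.

```python
def filter_by_score(items, min_score):
    """Keep items whose unnormalized score meets or exceeds min_score."""
    return [item for item in items if item["score"] >= min_score]


# Made-up example items; names and scores are illustrative only.
items = [
    {"name": "a", "score": 12},
    {"name": "b", "score": 40},
    {"name": "c", "score": 40.5},
]
kept = filter_by_score(items, 40)
```

An item whose score equals the threshold is kept, matching the "meet or exceed" wording.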
Methods
The RNA Genes track was supplemented with EvoFold predictions and filtered to
include only those items that lie within the ENCODE regions.
Regions that are at least 10 percent Repeatmasked are flagged because no
transcriptional data is available for them. A region is considered transcribed
if at least 10 percent of it overlaps any Affymetrix transcribed fragment
(transfrag), derived from six microarray experiments, or any Yale
transcriptionally active region (TAR), derived from 15 microarray experiments.
In these cases, each array from which the overlapped transfrags and TARs were
derived is listed.
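The 10 percent overlap rule can be sketched as follows. This is not the code used to build the track. It assumes half-open `(start, end)` coordinates, a hypothetical `(start, end, array_name)` layout for transfrags and TARs, and that the 10 percent test is applied to each overlapping feature separately rather than to their union.

```python
def overlap_fraction(region, feature):
    """Fraction of `region` covered by `feature`; both are half-open (start, end)."""
    r_start, r_end = region
    f_start, f_end = feature
    overlap = max(0, min(r_end, f_end) - max(r_start, f_start))
    return overlap / (r_end - r_start)


def overlapping_arrays(region, features, min_fraction=0.10):
    """Return sorted names of arrays whose features cover >= min_fraction of region.

    `features` is a list of (start, end, array_name) tuples (hypothetical layout).
    Assumption: the 10 percent test is applied to each feature separately.
    """
    return sorted({
        array
        for start, end, array in features
        if overlap_fraction(region, (start, end)) >= min_fraction
    })


# Made-up coordinates and array names, for illustration only.
region = (1000, 1100)  # 100 bases long
features = [
    (1090, 1200, "array_A"),  # covers 10 of 100 bases
    (900, 1005, "array_B"),   # covers 5 of 100 bases
]
arrays = overlapping_arrays(region, features)
transcribed = bool(arrays)  # transcribed if any array qualifies
```

Returning the array names rather than a bare flag mirrors the track's behavior of listing each array that contributed an overlapping transfrag or TAR.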
EvoFold is a comparative method that exploits the evolutionary signal
of genomic multiple-sequence alignments for identifying conserved
functional RNA structures. The method makes use of phylogenetic
stochastic context-free grammars (phylo-SCFGs), which are combined
probabilistic models of RNA secondary structure and primary sequence
evolution. Each prediction consists of a specific RNA secondary
structure and an overall score. The overall score is essentially a
log-odds score comparing a phylo-SCFG that models the constrained evolution of
stem-pairing regions with one that models only unpaired regions.
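A log-odds score of this kind is the difference between the log-likelihoods of the alignment under the two models. The sketch below shows only that arithmetic. The function and parameter names are hypothetical, the use of natural logarithms is an assumption, and EvoFold's actual scoring may include details not described here.

```python
def log_odds_score(loglik_structural: float, loglik_unpaired: float) -> float:
    """Log-odds of a structural model against an unpaired-only model.

    Both inputs are log-likelihoods of the same alignment
    (assumed here to be natural logarithms).
    """
    return loglik_structural - loglik_unpaired


# Made-up log-likelihoods, for illustration only.
score = log_odds_score(-120.0, -135.5)
```

A positive score means the alignment is better explained by the model with stem-pairing regions than by the model with unpaired regions only.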
Two sets of EvoFold predictions are included in this track. The first,
labeled EvoFold, contains predictions based on the conserved elements of an
8-way vertebrate alignment of the human, chimpanzee, mouse, rat, dog, chicken,
zebrafish, and Fugu assemblies. The second set of predictions, TBA23_EvoFold,
was based on the conserved elements of the 23-way TBA alignments present in the
ENCODE regions. When predictions from the two sets overlap, only the EvoFold
prediction is shown.
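The overlap rule for the two prediction sets can be sketched as below. This is an illustration under stated assumptions, not the track's build code: items are hypothetical `(chrom, start, end)` tuples with half-open coordinates, and strand is ignored.

```python
def resolve_overlaps(evofold, tba23):
    """Keep all EvoFold items; drop TBA23_EvoFold items that overlap any of them.

    Items are (chrom, start, end) tuples with half-open coordinates
    (hypothetical layout; strand is ignored in this sketch).
    """
    def overlaps(a, b):
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

    kept_tba23 = [t for t in tba23 if not any(overlaps(t, e) for e in evofold)]
    return evofold + kept_tba23


# Made-up coordinates, for illustration only.
evofold = [("chr1", 100, 200)]
tba23 = [("chr1", 150, 250), ("chr1", 300, 400), ("chr2", 100, 200)]
shown = resolve_overlaps(evofold, tba23)
```

The pairwise scan is quadratic in the number of items; it is kept simple here because only the precedence rule is being illustrated.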
Credits
These data were kindly provided by Sean Eddy at Washington University,
Jakob Skou Pedersen at UC Santa Cruz, and the ENCODE Consortium.
This annotation track was generated by Matt Weirauch.
References
Knudsen, B. and J.J. Hein.
RNA secondary structure prediction using stochastic context-free
grammars and evolutionary history.
Bioinformatics 15(6), 446-54 (1999).
Pedersen, J.S., Bejerano, G. and Haussler, D. Identification and
classification of conserved RNA secondary structures in the human
genome. (In preparation).