Description
The transcriptome track shows gene predictions based on data from
RefSeq and EMBL/GenBank. This is a moderately conservative set of
predictions, requiring the support of either one GenBank full length
RNA sequence, one RefSeq RNA, or one spliced EST. The track includes
both protein-coding and non-coding transcripts. The CDS are predicted
using ESTScan.
Display Conventions and Configuration
This track generally follows the display conventions for gene
prediction tracks. The exons for putative non-coding genes and
untranslated regions are represented by relatively thin blocks, while
those for coding open reading frames are thicker.
This track contains an optional codon coloring feature that allows
users to quickly validate and compare gene predictions. To display
codon colors, select the genomic codons option from the
Color track by codons pull-down menu. Click here for more
information about this feature.
Further information on the predicted transcripts can be found on
the Transcriptome Web interface.
Methods
The transcriptome is built using a multi-step pipeline:
1. RefSeq and GenBank RNAs and ESTs are aligned to the genome with
   SIBsim4, keeping only the best alignments for each RNA.
2. Alignments are broken up at non-intronic gaps, with small isolated
   fragments thrown out.
3. A splicing graph is created for each set of overlapping alignments.
   This graph has an edge for each exon or intron, and a vertex for
   each splice site, start, and end. Each RNA that contributes to an
   edge is kept as evidence for that edge.
4. The graph is traversed to generate all unique transcripts. The
   traversal is guided by the initial RNAs to avoid a combinatorial
   explosion in alternative splicing.
5. Protein predictions are generated.
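The splicing graph and guided traversal steps can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function names, the representation of an alignment as a sorted list of (start, end) exon blocks, and the example coordinates are all assumptions made for illustration. The traversal is deliberately simplified so that it only emits paths traced by at least one input RNA, which is one possible reading of "guided by the initial RNAs"; it does not try to merge compatible partial RNAs.

```python
from collections import defaultdict


def rna_path(exons):
    """Return the edge path of one aligned RNA.

    `exons` is a sorted list of (start, end) blocks (hypothetical format).
    Exon and intron edges alternate; the vertices are the block boundaries,
    that is, the transcript start, the splice sites, and the transcript end.
    """
    path = []
    for i, (start, end) in enumerate(exons):
        path.append(("exon", start, end))
        if i + 1 < len(exons):
            # The intron runs from this exon's end to the next exon's start.
            path.append(("intron", end, exons[i + 1][0]))
    return tuple(path)


def build_splicing_graph(alignments):
    """Map each exon or intron edge to the set of RNAs that support it."""
    evidence = defaultdict(set)
    for rna_id, exons in alignments.items():
        for edge in rna_path(exons):
            evidence[edge].add(rna_id)
    return dict(evidence)


def guided_transcripts(alignments):
    """Collapse RNAs that trace the same path into one unique transcript.

    Simplifying assumption: only paths followed by at least one input RNA
    are emitted, so the number of transcripts never exceeds the number of
    input RNAs and no combinatorial enumeration takes place.
    """
    by_path = defaultdict(set)
    for rna_id, exons in alignments.items():
        by_path[rna_path(exons)].add(rna_id)
    return dict(by_path)


# Made-up example: two RNAs share all three exons, one EST skips the middle exon.
alignments = {
    "rna1": [(100, 200), (300, 400), (500, 600)],
    "rna2": [(100, 200), (300, 400), (500, 600)],
    "est1": [(100, 200), (500, 600)],
}
graph = build_splicing_graph(alignments)
transcripts = guided_transcripts(alignments)
```

In this example the exon-skipping intron edge is supported by the EST alone, and the three input sequences collapse into two transcripts. An unrestricted enumeration of every start-to-end path would grow multiplicatively with the number of alternative splicing events, which is the explosion the guided traversal avoids.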
Credits
The transcriptome track was produced on the Vital-IT high-performance
computing platform using a computational pipeline developed by
Christian Iseli with help from colleagues at the Ludwig Institute for
Cancer Research and the Swiss Institute of Bioinformatics. It is based
on data from NCBI RefSeq and GenBank/EMBL.
Our thanks to the people running these databases and to the
scientists worldwide who have made contributions to them.
References
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.
GenBank: update.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.
PMID: 14681350; PMC: PMC308779