Description
This track shows gene predictions submitted for the ENCODE Gene Annotation
Assessment Project (EGASP) Gene Prediction Workshop 2005 that cover only
a partial set of the 44 ENCODE regions. The partial set excludes
the 13 ENCODE regions for which high-quality annotations were released in late
2004.
The following gene predictions are included:
- ACEScan
- Augustus
- GeneZilla
- SAGA
The EGASP Full companion track shows original gene prediction submissions for
the full set of 44 ENCODE regions using gene prediction algorithms other than
those used here; the EGASP Update track shows updated versions
of some of the submitted predictions.
Display Conventions and Configuration
Data for each gene prediction method within this composite annotation track
is displayed in a separate subtrack. See the top of the track description page
for a complete list of the subtracks available for this annotation. To display
only selected subtracks, uncheck the boxes next to the tracks you wish to
hide.
The individual subtracks within this annotation follow the display conventions
for gene prediction tracks. The track description page offers the option
to color and label codons in a zoomed-in display of the subtracks to facilitate
validation and comparison of gene predictions. To enable this feature, select
the genomic codons option from the "Color track by codons" menu. Click the
"Help on codon coloring" link for more information about this feature.
Color differences among the subtracks are arbitrary. They provide a
visual cue for distinguishing the different gene prediction methods.
Methods
ACEScan
ACEScan (Alternative Conserved Exons Scan)
indicates alternative splicing that is evolutionarily conserved in human and
mouse/rat. The Conserved Alternative Exon Predictions subtrack shows
predicted alternative conserved exons. The Unconserved Alternative and
Constitutive Exon Predictions subtrack shows exons that
are predicted to be constitutive or may have species-specific alternative
splicing.
Augustus
Augustus uses a generalized hidden Markov model (GHMM) that
models coding and non-coding sequence, splice sites, the branch point region,
translation start and end, and lengths of exons and introns. The track
contains four different sets of predictions. Ab initio
single genome predictions are based solely on the input sequence. EST and
protein evidence predictions were generated using AGRIPPA hints based on
alignments of human sequence from the dbEST and nr databases. Mouse homology
gene predictions were produced using mouse genomic sequence only; BLAST, CHAOS,
and DIALIGN were used to generate the hints for Augustus. The combined
EST/protein evidence and mouse homology gene predictions were created using
human sequence from the dbEST and nr databases and mouse genomic sequence to
generate hints for Augustus.
Additional predictions and methods for this subtrack are available in the
EGASP Update track.
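The defining feature of a GHMM mentioned above, that each state emits a whole segment whose length follows an explicit distribution, can be illustrated with a small sketch. This is a toy two-state example and not Augustus itself: the state labels, the emission probabilities, the uniform length distribution, and the names `EMIT`, `MAX_LEN`, `LENGTH_LOGP`, and `ghmm_segment` are all assumptions made for illustration.

```python
import math

# Toy, hypothetical parameters (not those used by Augustus).
EMIT = [
    {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # state 0: stand-in for "non-coding"
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},  # state 1: stand-in for "coding"
]
MAX_LEN = 10  # assumed maximum segment length
# Explicit length distribution per state; uniform over 1..MAX_LEN in this sketch.
LENGTH_LOGP = [[math.log(1 / MAX_LEN)] * MAX_LEN for _ in EMIT]


def ghmm_segment(seq):
    """Return the most probable segmentation as (start, end, state) tuples.

    Assumes seq contains only uppercase A, C, G, T.
    """
    n, k = len(seq), len(EMIT)
    if n == 0:
        return []
    # prefix[s][t]: log-probability of seq[:t] if emitted entirely by state s
    prefix = [[0.0] * (n + 1) for _ in range(k)]
    for s in range(k):
        for t, base in enumerate(seq):
            prefix[s][t + 1] = prefix[s][t] + math.log(EMIT[s][base])
    # A segment must be followed by a segment of a different state.
    switch = math.log(1 / (k - 1))
    # best[t][s]: best log-score of seq[:t] whose last segment is in state s
    best = [[float("-inf")] * k for _ in range(n + 1)]
    back = [[None] * k for _ in range(n + 1)]
    for t in range(1, n + 1):
        for s in range(k):
            for d in range(1, min(MAX_LEN, t) + 1):
                seg = LENGTH_LOGP[s][d - 1] + prefix[s][t] - prefix[s][t - d]
                if d == t:
                    # First segment: uniform choice of initial state.
                    candidates = [(math.log(1 / k) + seg, None)]
                else:
                    candidates = [
                        (best[t - d][r] + switch + seg, r)
                        for r in range(k) if r != s
                    ]
                for score, r in candidates:
                    if score > best[t][s]:
                        best[t][s] = score
                        back[t][s] = (t - d, r)
    # Trace back from the best final state.
    segments = []
    t, s = n, max(range(k), key=lambda x: best[n][x])
    while t > 0:
        start, prev = back[t][s]
        segments.append((start, t, s))
        t, s = start, prev
    return segments[::-1]
```

Calling `ghmm_segment("AAAAAAGGGGGG")` splits the toy sequence into one segment per state. A real gene finder would use many more states (splice sites, start and stop signals, and so on) and length distributions estimated from training data.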
GeneZilla
GeneZilla is a program for the computational prediction of protein-coding genes
in eukaryotic DNA, based on the generalized hidden Markov model (GHMM)
framework. These predictions were generated using GeneZilla and
IsoScan, which uses a four-state hidden Markov model to
predict isochores (regions of homogeneous G+C content) in genomic DNA.
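As a rough illustration of how a four-state hidden Markov model can partition a sequence into regions of homogeneous G+C content, the following sketch runs Viterbi decoding over four G+C classes. It is not IsoScan's implementation: the G+C fractions in `STATE_GC`, the switching probability `SWITCH_PROB`, and the function names are hypothetical values and names chosen for the example.

```python
import math

# Hypothetical G+C fraction for each of the four states (not IsoScan's values).
STATE_GC = [0.35, 0.42, 0.50, 0.60]
# Assumed per-base probability of leaving the current state.
SWITCH_PROB = 1e-3


def emission_logp(state, base):
    """Log-probability of one base given the state's G+C fraction."""
    gc = STATE_GC[state]
    # G and C share the G+C mass equally; every other symbol is treated as A/T.
    p = gc / 2 if base in "GC" else (1 - gc) / 2
    return math.log(p)


def viterbi(seq):
    """Return the most probable state index for each base of seq."""
    seq = seq.upper()
    if not seq:
        return []
    n = len(STATE_GC)
    stay = math.log(1 - SWITCH_PROB)
    move = math.log(SWITCH_PROB / (n - 1))
    # Uniform initial distribution over states.
    score = [math.log(1 / n) + emission_logp(s, seq[0]) for s in range(n)]
    back = []
    for base in seq[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + (stay if r == s else move))
            ptr.append(best)
            step = stay if best == s else move
            score.append(prev[best] + step + emission_logp(s, base))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (start, end, state) runs; end is exclusive."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i, path[start]))
            start = i
    return runs
```

For example, `segments(viterbi("AT" * 100 + "GC" * 100))` yields one AT-rich run followed by one GC-rich run. The small switching probability is what keeps the decoded path from changing state on every local fluctuation in base composition.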
SAGA
SAGA is an ab initio multiple-species gene-finding program based on the
Gibbs sampling method described in Chatterji et al. (2004). In
addition to sampling parameters, SAGA uses a phylo-HMM-based model to
boost the scores, similar to the method described in Siepel et al.
(2004).
Credits
The gene prediction data sets were submitted by the following individuals and
institutions:
- ACEScan: Gene Yeo, Crick-Jacobs Center for Computational Biology,
  Salk Institute.
- Augustus: Mario Stanke, Department of Bioinformatics, University of Göttingen,
  Germany.
- GeneZilla: William Majoros, Dept. of Bioinformatics, The Institute for
  Genomic Research (TIGR).
- SAGA: Sourav Chatterji, Lior Pachter lab,
  Department of Mathematics, U.C. Berkeley.
References
Chatterji, S. and Pachter, L. Multiple organism gene finding by collapsed Gibbs sampling. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 187-193 (2004).
Siepel, A. and Haussler, D. Computational identification of evolutionarily conserved exons. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 177-186 (2004).