N-SCAN Track Settings

Home
Genomes
Genome Browser
Tools
Mirrors
- Third Party Mirrors
- Mirroring Instructions
Downloads
My Data
Projects
Help
About Us
- News
- Publications
- Blog
- Cite Us
- Credits
- Release Log
- Staff
- Contact Us
- Conditions of Use
- Jobs
- Licenses

List subtracks: only selected/visible all

hide

N-SCAN PASA-EST

N-SCAN PASA-EST Gene Predictions

schema

hide

N-SCAN

N-SCAN Gene Predictions

schema

Methods

N-SCAN

N-SCAN combines biological-signal modeling in the target genome sequence along with information from a multiple-genome alignment to generate de novo gene predictions. It extends the TWINSCAN target-informant genome pair to allow for an arbitrary number of informant sequences as well as richer models of sequence evolution. N-SCAN models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, insertions, and deletions.

Human N-SCAN uses mouse (mm7) as the informant and iterative pseudogene masking.

N-SCAN PASA-EST

N-SCAN PASA-EST combines EST alignments into N-SCAN. Similar to the conservation sequence models in TWINSCAN, separate probability models are developed for EST alignments to genomic sequence in exons, introns, splice sites and UTRs, reflecting the EST alignment patterns in these regions. N-SCAN PASA-EST is more accurate than N-SCAN while retaining the ability to discover novel genes to which no ESTs align.

In N-SCAN PASA-EST, cDNA sequences were clustered using the PASA program beforehand. PASA, the Program to Assemble Spliced Alignments, was created by Brian Haas at TIGR. The algorithm assembles clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations.

The PASA clusters were used as 'EST' sequences in N-SCAN PASA-EST. The resulting gene models were updated with the input PASA clusters using the assembly tool of the PASA pipeline. These updates consist of automatically generated alternative splices, UTR features and sometimes merging of two gene models. In addition, PASA assigned open reading frames to clusters that did not overlap a gene prediction, but that did contain a full length cDNA, and output them as 'novel genes'. Note that PASA does not use any cDNA annotation from input but assigns the ORF itself.

No manual annotation was performed to generate any of the gene models. The high accuracy of the set is in part due to the large number of available ESTs and full length cDNAs.

References

Gross SS, Brent MR. Using multiple alignments to improve gene prediction. In Proc. 9th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '05):374-388 and J Comput Biol. 2006 Mar;13(2):379-93.

Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 2003 Oct 1;31(19):5654-66.

Description

Methods

N-SCAN

N-SCAN PASA-EST

Credits

References