Description
The Old Known Genes track shows genes from the 2006 Known Genes build.
RGD Genes has now replaced Known Genes as the main gene track for rat.
The Old Known Genes track shows known protein-coding genes based on
protein data from UniProt (SWISS-PROT and TrEMBL) and mRNA data from
the NCBI reference sequences collection (RefSeq) and GenBank.
Each Known Gene is represented by an mRNA and a protein.
Display Conventions and Configuration
This track follows the display conventions for gene prediction tracks.
This track contains an optional codon coloring feature that allows users to
quickly validate and compare gene predictions. To display codon colors, select
the genomic codons option from the Color track by codons pull-down menu.
Click here for more information about this feature.
Methods
This release of UCSC Known Genes was built by a new process, KG II,
as described below.
UniProt protein sequences (including alternatively spliced isoforms)
and mRNA sequences from RefSeq and GenBank
were aligned against the base genome using BLAT.
RefSeq alignments having a base identity level within 0.1% of the best
alignment for the same sequence and at least 96% base identity with the
genomic sequence were kept.
GenBank mRNA alignments having a base identity level within 0.2% of
the best alignment for the same sequence and at least 97% base identity
with the genomic sequence were kept.
Protein alignments having a base identity level within 0.2% of the best
alignment for the same sequence and at least 80% base identity with the
genomic sequence were kept.
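The filtering rule above combines two tests: a "near-best" test relative to each sequence's top-scoring alignment and an absolute identity floor. The following is a minimal sketch of that rule, not the KG II code. It assumes that "within 0.1%" means 0.1 percentage points of identity, and the `Alignment` record and `filter_alignments` function are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alignment:
    query: str       # mRNA or protein accession
    target: str      # genomic location label (illustrative only)
    identity: float  # percent identity, 0-100


# (near-best window, minimum identity), both in percent, taken from the text.
THRESHOLDS = {
    "refseq": (0.1, 96.0),
    "genbank": (0.2, 97.0),
    "protein": (0.2, 80.0),
}


def filter_alignments(alignments, source):
    """Keep alignments that are within the near-best window of their query's
    best alignment AND meet the minimum identity for this source.
    Assumption: the window is measured in absolute percentage points."""
    near_best, min_identity = THRESHOLDS[source]
    best = defaultdict(float)
    for a in alignments:
        best[a.query] = max(best[a.query], a.identity)
    return [
        a for a in alignments
        if a.identity >= best[a.query] - near_best and a.identity >= min_identity
    ]
```

Because the near-best test is computed per query, a sequence whose best placement falls below the identity floor loses all of its alignments, while a sequence with several nearly equal placements keeps each of them.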
Then the genomic mRNA and protein alignments were compared,
and protein-mRNA pairings were determined from their overlaps.
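Pairing by overlap can be illustrated by counting the genomic bases that a protein alignment and an mRNA alignment share on the same chromosome and strand. This is a hedged sketch: the `GenomicAlignment` type, the `pair_by_overlap` function, and the `min_overlap` parameter are hypothetical, and the actual overlap criteria used by KG II are not specified here. It assumes the blocks within one alignment do not overlap each other.

```python
from dataclasses import dataclass, field


@dataclass
class GenomicAlignment:
    name: str
    chrom: str
    strand: str
    blocks: list = field(default_factory=list)  # half-open (start, end) intervals


def overlap_bases(a, b):
    """Number of genomic bases covered by aligned blocks of both alignments."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0
    total = 0
    for s1, e1 in a.blocks:
        for s2, e2 in b.blocks:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def pair_by_overlap(proteins, mrnas, min_overlap=1):
    """Return (protein, mRNA, shared bases) for every pair that overlaps
    by at least min_overlap bases (hypothetical threshold)."""
    pairs = []
    for p in proteins:
        for m in mrnas:
            shared = overlap_bases(p, m)
            if shared >= min_overlap:
                pairs.append((p.name, m.name, shared))
    return pairs
```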
mRNA CDS annotations were obtained from RefSeq and GenBank records
and supplemented by CDS structures derived from UCSC protein-mRNA BLAT alignments.
The initial set of UCSC Known Genes candidates consists of
all protein-mRNA pairs with valid mRNA CDS structures.
A gene-check program (similar to the one used for the Consensus CDS (CCDS) project)
is used to remove questionable candidates, such as those with in-frame stop codons
or missing start or stop codons.
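A few of the checks mentioned above can be sketched directly on an mRNA sequence and its CDS coordinates. This is a simplified illustration, not the CCDS gene-check program, which performs additional checks not described in this text. The `gene_check` function is hypothetical and assumes 0-based, half-open CDS coordinates and the standard genetic code's stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def gene_check(mrna, cds_start, cds_end):
    """Return a list of problems found in the CDS; an empty list means the
    candidate passed these (simplified) checks."""
    cds = mrna[cds_start:cds_end].upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("CDS length not a multiple of 3")
    # Split into complete codons only, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("missing start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    # Any stop codon before the final codon is an in-frame stop.
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("in-frame stop codon")
    return problems
```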
From each group of gene candidates that share the same CDS structure,
the protein-mRNA pair having the best ranking and protein-mRNA alignment score
is selected as a UCSC Known Gene.
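The selection step amounts to grouping candidates by CDS structure and taking one winner per group. The sketch below assumes that ranking is compared first and the protein-mRNA alignment score only breaks ties; the text does not state how the two are combined. The dictionary keys (`cds_structure`, `rank`, `score`) and the `select_known_genes` function are hypothetical.

```python
from collections import defaultdict


def select_known_genes(candidates):
    """candidates: dicts with 'cds_structure' (hashable, e.g. a tuple of CDS
    exon intervals), 'rank' (lower is better) and 'score' (higher is better).
    Returns one winning candidate per distinct CDS structure."""
    groups = defaultdict(list)
    for c in candidates:
        groups[c["cds_structure"]].append(c)
    winners = []
    for group in groups.values():
        # Best (lowest) rank first; ties broken by the higher alignment score.
        winners.append(min(group, key=lambda c: (c["rank"], -c["score"])))
    return winners
```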
The ranking of a gene candidate depends on its gene-check quality measures.
When all else is equal,
preference is given first to RefSeq mRNAs and then to MGC mRNAs.
Similarly, preference is given to gene candidates represented by Swiss-Prot proteins.
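The ranking preferences can be expressed as a sort key in which lower tuples rank better. This sketch assumes that gene-check quality is summarized as a count of problems and that the mRNA-source preference outranks the protein-source preference; neither detail is specified in the text. The `rank_key` function, the lookup tables, and the candidate fields are hypothetical.

```python
# Hypothetical preference tables: lower value = preferred.
MRNA_SOURCE_RANK = {"RefSeq": 0, "MGC": 1}  # any other GenBank mRNA -> 2
PROTEIN_SOURCE_RANK = {"Swiss-Prot": 0}     # TrEMBL or other -> 1


def rank_key(candidate):
    """Sort key: fewer gene-check problems first, then mRNA source
    (RefSeq, then MGC, then others), then protein source (Swiss-Prot first)."""
    return (
        candidate["gene_check_problems"],
        MRNA_SOURCE_RANK.get(candidate["mrna_source"], 2),
        PROTEIN_SOURCE_RANK.get(candidate["protein_source"], 1),
    )
```

Using a tuple keeps "all else being equal" literal: a later element only matters when every earlier element ties.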
The protein-mRNA alignment score is calculated from a TBLASTN alignment
of the protein against the mRNA, plus weighted sub-scores based on the
date and length of the mRNA.
Credits
The UCSC Known Genes track was produced using protein data from
UniProt and mRNA data from NCBI RefSeq and GenBank.
Data Use Restrictions
The UniProt data have the following terms of use, UniProt copyright (c)
2002 - 2004 UniProt consortium:
For non-commercial use, all databases and documents in the UniProt FTP
directory may be copied and redistributed freely, without advance
permission, provided that this copyright statement is reproduced with
each copy.
For commercial use, all databases and documents in the UniProt FTP
directory except the files
- ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz
- ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.xml.gz
may be copied and redistributed freely, without advance permission,
provided that this copyright statement is reproduced with each copy.
More information for commercial users can be found
here.
From January 1, 2005, all databases and documents in the UniProt FTP
directory may be copied and redistributed freely by all entities,
without advance permission, provided that this copyright statement is
reproduced with each copy.
References
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL.
GenBank: update.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6.
PMID: 14681350; PMC: PMC308779
Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D.
The UCSC Known Genes.
Bioinformatics. 2006 May 1;22(9):1036-46.
PMID: 16500937
Kent WJ.
BLAT - the BLAST-like alignment tool.
Genome Res. 2002 Apr;12(4):656-64.
PMID: 11932250; PMC: PMC187518