Release Notes
This release of the Gencode Genes track (Version 3c, October 2009)
shows high-quality
manual annotations in the ENCODE regions generated by the
GENCODE project.
Version 3 of the Gencode gene set presents a full merge between HAVANA and
ENSEMBL, giving priority to the manually curated Havana objects and using
ENSEMBL objects where they are different or fall into un-annotated regions.
The annotation was carried out on genome assembly GRCh37 (hg19); features are
projected back to NCBI36 (hg18) where possible.
Gencode 3c is a small update of version 3b (July 2009 freeze), mainly for
chromosomes 3 and 4, for which the latest annotation was held back and QC'ed
again to be used in the RNASeq Genome Annotation Assessment Project. Statistics
about this release can be found
here.
Display Conventions and Configuration
The annotations are divided into separate tracks based on
source/confidence. The Gencode project recommends that the
annotations from levels 1 and 2 be used as the reference gene annotation;
level 3 was added to fill gaps for methods that analyze the entire
genome and require a full set.
- Level 1: validated
- At this time, only pseudogene loci that were predicted by the
analysis pipelines from Yale and UCSC as well as by HAVANA manual annotation
from WTSI.
- Level 2: manual annotation
- HAVANA manual annotation from WTSI.
The following regions are considered "fully annotated" and contain
level 2 annotation from HAVANA only, although they will still be updated:
chromosomes 1, 2, 6, 9, 10, 13, 20, 21, 22, X, Y, ENCODE pilot
regions, chr11:2353995-3878750.
- Level 3: automated annotation
- ENSEMBL loci in regions where no HAVANA annotation can be found.
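The "fully annotated" regions listed under level 2 can be turned into a simple lookup. The following is a minimal sketch, not part of the GENCODE tooling: the function names are hypothetical, the `chr` prefix on chromosome names and the inclusive treatment of both range ends are assumptions, and the text does not state which assembly the chr11 coordinates refer to. The ENCODE pilot regions are left out because their coordinates are not listed here.

```python
# Sketch only: the region list is copied from the text above.
# ENCODE pilot regions are omitted because their coordinates are not given.
FULLY_ANNOTATED_CHROMS = {
    "chr1", "chr2", "chr6", "chr9", "chr10", "chr13",
    "chr20", "chr21", "chr22", "chrX", "chrY",
}
# (chrom, start, end); both ends treated as inclusive (an assumption).
FULLY_ANNOTATED_RANGES = [("chr11", 2353995, 3878750)]


def parse_region(text):
    """Parse a position string such as 'chr11:2353995-3878750'."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def is_fully_annotated(chrom, pos):
    """Return True if pos lies on a whole chromosome or range listed above."""
    if chrom in FULLY_ANNOTATED_CHROMS:
        return True
    return any(
        c == chrom and start <= pos <= end
        for c, start, end in FULLY_ANNOTATED_RANGES
    )
```

A position on chromosome 3, for example, is not covered by this list, which matches the note that chromosomes 3 and 4 were handled separately in this release.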
NOTE: The release cycles for Gencode, Havana and Ensembl differ. Users
are cautioned to compare release dates to determine which annotation is most
current.
The gene annotations are colored based on the HAVANA annotation type and
the confidence level. See the table below for the color key, as well as more
detail about the transcript and feature types.
Class | Color | Description | Transcript Types (see Vega Transcript Types) |
Validated_coding | Dark Orange | Level 1 Validated: coding regions | protein_coding |
Validated_processed | Light Orange | Level 1 Validated: processed | processed_transcript |
Validated_processed_pseudogene | Dark Pink | Level 1 Validated: processed pseudogenes | processed_pseudogene, processed_transcript, transcribed_processed_pseudogene |
Validated_unprocessed_pseudogene | Medium Pink | Level 1 Validated: unprocessed pseudogenes | transcribed_unprocessed_pseudogene, unprocessed_pseudogene |
Validated_pseudogene | Light Pink | Level 1 Validated: pseudogenes | IG_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene |
Havana_coding | Dark Orange | Level 2 Manual annotation: coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding |
Havana_nonsense | Medium Orange | Level 2 Manual annotation: nonsense | nonsense_mediated_decay |
Havana_non_coding | Light Orange | Level 2 Manual annotation: non-coding | ambiguous_orf, antisense, non_coding, processed_transcript, retained_intron |
Havana_polyA | Black | Level 2 Manual annotation: polyA | polyA_signal, polyA_site, pseudo_polyA |
Havana_processed_pseudogene | Dark Pink | Level 2 Manual annotation: processed pseudogene | processed_pseudogene, transcribed_processed_pseudogene |
Havana_unprocessed_pseudogene | Medium Pink | Level 2 Manual annotation: unprocessed pseudogene | transcribed_unprocessed_pseudogene, unprocessed_pseudogene |
Havana_pseudogene | Light Pink | Level 2 Manual annotation: pseudogene | IG_pseudogene, TR_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene |
Havana_TEC | Grey | Level 2 Manual annotation: TEC | TEC, artifact |
Ensembl_coding | Dark Red | Level 3 Automated annotation: coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding |
Ensembl_non_coding | Light Orange | Level 3 Automated annotation: non-coding | antisense, non_coding, processed_transcript, retained_intron |
Ensembl_pseudogene | Dark Pink | Level 3 Automated annotation: pseudogene | IG_pseudogene, miRNA_pseudogene, misc_RNA_pseudogene, pseudogene, retrotransposed, unitary_pseudogene |
Ensembl_processed_pseudogene | Medium Pink | Level 3 Automated annotation: processed pseudogene | processed_pseudogene |
Ensembl_unprocessed_pseudogene | Light Pink | Level 3 Automated annotation: unprocessed pseudogene | unprocessed_pseudogene |
Ensembl_RNA | Light Red | Level 3 Automated annotation: RNA transcripts | Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene, miRNA, misc_RNA, rRNA, rRNA_pseudogene, scRNA_pseudogene, snRNA, snRNA_pseudogene, snoRNA, snoRNA_pseudogene, tRNA_pseudogene, tRNAscan |
2way_consensus_pseudogene | Dark Purple | Level 3 Automated annotation: pseudogenes | pseudogenes |
This track uses filtering by category to select subsets of transcripts and has
additional advanced features. Help with these features can be found
here.
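In the color key table, the prefix of each class name indicates the confidence level: Validated_ for level 1, Havana_ for level 2, and Ensembl_ or 2way_ for level 3. A minimal sketch of how one might use that to keep only the recommended reference set (levels 1 and 2) follows. The function names and the (class, transcript ID) record layout are hypothetical and are not the browser's own filtering mechanism.

```python
# Sketch only: the prefix-to-level mapping is read off the color key table.
CLASS_PREFIX_TO_LEVEL = {
    "Validated": 1,  # Level 1 Validated
    "Havana": 2,     # Level 2 Manual annotation
    "Ensembl": 3,    # Level 3 Automated annotation
    "2way": 3,       # 2way_consensus_pseudogene is listed under level 3
}


def level_of(class_name):
    """Return the confidence level implied by a class name's prefix."""
    prefix = class_name.split("_", 1)[0]
    return CLASS_PREFIX_TO_LEVEL[prefix]


def reference_set(records):
    """Keep (class_name, transcript_id) records at level 1 or 2."""
    return [rec for rec in records if level_of(rec[0]) in (1, 2)]
```

An unknown prefix raises a KeyError rather than being assigned a level silently, so a class missing from the table is noticed early.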
Methods
We aim to annotate all evidence-based gene features at high accuracy on
the human reference sequence. This includes identifying all
protein-coding loci with associated alternative variants, non-coding
loci which have transcript evidence, and pseudogenes. We integrate
computational approaches (including comparative methods), manual
annotation and targeted experimental verification.
For a detailed description of the methods and references used, see
Harrow et al. (2006).
Verification
See Harrow et al. (2006) for information on verification
techniques.
Credits
This GENCODE release is the result of a collaborative effort among
the following laboratories (contact: Felix Kokocinski):
Lab/Institution | Contributors |
HAVANA annotation group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Adam Frankish, James Gilbert, Jennifer Harrow, Felix Kokocinski, Stephen Trevanion, Tim Hubbard (GENCODE Principal Investigator) |
Genome Bioinformatics Lab (CRG), Barcelona, Spain | Thomas Derrien, Tyler Alioto, Roderic Guigó |
Genome Bioinformatics, University of California Santa Cruz (UCSC), USA | Rachel Harte, Mark Diekhans, Robert Baertsch, David Haussler |
Comp. Genomics Lab, Washington University St. Louis (WUSTL), USA | Jeltje van Baren, Charlie Comstock, David Lu, Michael Brent |
Computer Science and Artificial Intelligence Lab, Broad Institute of MIT and Harvard, USA | Mike Lin, Manolis Kellis |
Bioinformatics, Yale University (Yale), USA | Philip Cayting, Mark Gerstein |
Center for Integrative Genomics, University of Lausanne, Switzerland | Cedric Howald, Alexandre Reymond |
ENSEMBL genebuild group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Bronwen Aken, Julio Fernandez Banet, Stephen Searle |
Structural Computational Biology Group, Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain | Manuel Rodríguez José, Jan-Jaap Wesselink, Michael Tress, Alfonso Valencia |
References
Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S, et al.
The GENCODE exome: sequencing the complete human exome.
Eur J Hum Genet. 2011;19:827-831.
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, et al.
GENCODE: producing a reference annotation for ENCODE.
Genome Biol. 2006;7 Suppl 1:S4.1-9.
Data Release Policy
GENCODE data are available for use without restrictions.
The full data release policy for ENCODE is available
here.