Description
Bidirectional promoters are the regulatory regions that fall between
pairs of genes, where the 5' ends of the genes within a pair are positioned
in close proximity to one another. This spacing facilitates the initiation
of transcription of both genes, creating two transcription forks that advance
in opposite directions. The formal definition of a bidirectional promoter
requires that the transcription initiation sites be separated by no more than
1,000 bp. Using this criterion, we have comprehensively
annotated the human and mouse genomes for the presence of bidirectional
promoters in silico. The identification of these promoters
is contingent upon the presence of adjacent, oppositely oriented pairs of
genes, because few distinguishing features are available to uniquely identify
bidirectional promoters de novo. Genomic annotations used for our
identification phase include:
(A) protein-coding genes (Known Genes),
(B) GenBank mRNAs, and
(C) spliced ESTs.
The annotations for protein coding genes (A) are strongly supported
and therefore provide a high quality dataset for mapping bidirectional
promoters. In contrast, bidirectional promoters supported by spliced ESTs
(C) alone have varying levels of evidence, ranging from one
characterized transcript to hundreds of them. For this reason, the mRNA
annotation (B) from GenBank provides a stringent level of validation
for the start sites of the EST transcripts. As a large class of regulatory
sequences, bidirectional promoters represent a rich source of unexplored
biological information in the human genome. When compared to the mouse genome,
these promoters are identifiable as truly orthologous locations: they are
maintained in regions of conserved synteny (including both genes and the
intervening promoter region) that have undergone no rearrangements since the
last common ancestor of mammals and, in some cases, of mammals and fish. We use
this approach to annotate orthologous bidirectional promoters in nonhuman
species for which genomic annotations are not yet available.
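The distance criterion and the requirement for adjacent, oppositely oriented genes can be illustrated with a short Python sketch. It is not the pipeline used to produce this track: the `Gene` record, the `find_bidirectional_pairs` function, and the 0-based half-open coordinates are assumptions made for illustration only.

```python
from dataclasses import dataclass

MAX_TSS_DISTANCE = 1000  # bp; "no more than 1,000 bp" per the definition above


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 0-based, half-open."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The 5' end is the start of a "+" gene and the end of a "-" gene.
        return self.start if self.strand == "+" else self.end


def find_bidirectional_pairs(genes, max_distance=MAX_TSS_DISTANCE):
    """Return (minus_gene, plus_gene, distance) for divergent neighbouring genes."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        # Sorting by TSS makes neighbours in the list adjacent start sites.
        ordered = sorted(chrom_genes, key=lambda g: g.tss)
        for left, right in zip(ordered, ordered[1:]):
            # Head-to-head arrangement: "-" gene on the left, "+" gene on the right.
            if left.strand == "-" and right.strand == "+":
                distance = right.tss - left.tss
                if distance <= max_distance:
                    pairs.append((left, right, distance))
    return pairs


# Assumed example data, not taken from any real annotation.
example_genes = [
    Gene("geneA", "chr1", 100, 5000, "-"),
    Gene("geneB", "chr1", 5400, 9000, "+"),
    Gene("geneC", "chr1", 20000, 25000, "+"),
    Gene("geneD", "chr2", 100, 500, "-"),
    Gene("geneE", "chr2", 3000, 4000, "+"),
]
example_pairs = find_bidirectional_pairs(example_genes)
```

In this toy input only `geneA` and `geneB` qualify, because `geneD` and `geneE` are divergent but their start sites lie more than 1,000 bp apart. Overlapping head-to-head transcripts are deliberately ignored by this sketch.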
Methods
Assigning Orthologous Regions
A multi-stage approach to mapping orthology at bidirectional promoters was
developed. Orthology assignments are strongest in coding regions. Therefore, we
began by mapping single human genes regulated by bidirectional promoters from
the Known Genes annotations onto the mouse genome. Orthology assignments were
determined using the "chains and nets" data from the UCSC Human Genome Browser
MySQL tables. Chains in the Genome Browser represent sequences of gapless
aligned blocks. Nets provide a hierarchical ordering of those chains. Level 1
of a net contains the longest, best-scoring chains that span any
selected region. Subsequent levels in the net represent the results of
rearrangements, duplications, insertions and deletions that may have disrupted
the conserved synteny derived from an ancestral sequence.
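As a rough illustration of how a level 1 net entry might be selected for a human region, the following Python sketch works on in-memory records rather than on the Genome Browser database. The `NetEntry` fields and the `top_level_ortholog` function are hypothetical simplifications; the real net tables carry more fields and this is not the program used for the track.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NetEntry:
    """Hypothetical, simplified view of one row of a net."""
    level: int     # 1 = top of the hierarchy; larger values are nested deeper
    t_name: str    # target (e.g. human) chromosome
    t_start: int
    t_end: int
    q_name: str    # query (e.g. mouse) chromosome
    q_start: int
    q_end: int
    chain_id: int


def top_level_ortholog(entries: Iterable[NetEntry], chrom: str,
                       start: int, end: int) -> Optional[NetEntry]:
    """Return the level 1 entry covering the most bases of [start, end)."""
    best, best_overlap = None, 0
    for entry in entries:
        # Deeper levels reflect rearrangements and are skipped in this sketch.
        if entry.level != 1 or entry.t_name != chrom:
            continue
        overlap = min(entry.t_end, end) - max(entry.t_start, start)
        if overlap > best_overlap:
            best, best_overlap = entry, overlap
    return best


# Assumed example data, not taken from any real alignment.
example_net = [
    NetEntry(1, "chr1", 0, 10000, "chr4", 50000, 60000, 11),
    NetEntry(2, "chr1", 4000, 6000, "chr9", 100, 2100, 22),
    NetEntry(1, "chr1", 10000, 20000, "chr4", 60000, 70000, 33),
]
example_hit = top_level_ortholog(example_net, "chr1", 4500, 5500)
```

Here the level 2 entry also overlaps the queried region, but only the level 1 entry (chain 11) is returned, which mirrors the preference for the longest, best-scoring chain described above.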
Confirming Orthologous Genes
After determining the orthology assignments using the UCSC chains and nets
data, we used the Known Genes annotations or spliced ESTs to determine the identity
of the genes within the corresponding region. Known Genes represent protein-coding
genes and therefore can be verified by chains and nets alignments, followed by
confirmation of protein identity in both species. Spliced ESTs carry less
descriptive information than protein-coding genes and therefore were validated
in the second species by three conditions: presence in an orthologous region,
conserved synteny of the two genes within a pair, and an intergenic distance of
less than 1,000 bp between the two transcripts. Our method
for mapping bidirectional promoters in spliced EST datasets is described in
more detail in a previous publication. If the program found evidence for both
orthology and a conserved syntenic gene arrangement, the orthologous
bidirectional promoter was confirmed. After orthologous assignments were
confirmed for pairs of human genes, the reciprocal assignments were analyzed
from mouse to human.
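The confirmation logic described in this section can be sketched in Python as a reciprocal mapping check followed by a test of gene arrangement in the second species. This is an illustrative sketch only: `MappedGene`, `is_divergent_pair`, `confirm_orthologous_promoter`, and the dictionary-based ortholog maps are hypothetical names, and the strict "less than 1,000 bp" cutoff follows the wording used in this section.

```python
from dataclasses import dataclass

MAX_INTERGENIC_DISTANCE = 1000  # bp; strict "less than" per this section


@dataclass(frozen=True)
class MappedGene:
    """Hypothetical record for a gene located in the second species."""
    name: str
    chrom: str
    tss: int
    strand: str  # "+" or "-"


def is_divergent_pair(gene_a, gene_b, max_distance=MAX_INTERGENIC_DISTANCE):
    """True if both genes share a chromosome, point away from each other,
    and their start sites are less than max_distance apart."""
    if gene_a.chrom != gene_b.chrom:
        return False
    left, right = sorted((gene_a, gene_b), key=lambda g: g.tss)
    if left.strand != "-" or right.strand != "+":
        return False
    return right.tss - left.tss < max_distance


def confirm_orthologous_promoter(human_pair, mouse_genes,
                                 human_to_mouse, mouse_to_human):
    """human_pair: two human gene names; mouse_genes: name -> MappedGene."""
    mapped = []
    for human_name in human_pair:
        mouse_name = human_to_mouse.get(human_name)
        # Reciprocal check: the mouse gene must map back to the same human gene.
        if mouse_name is None or mouse_to_human.get(mouse_name) != human_name:
            return False
        if mouse_name not in mouse_genes:
            return False
        mapped.append(mouse_genes[mouse_name])
    return is_divergent_pair(mapped[0], mapped[1])


# Assumed example data; the gene names are placeholders.
h2m = {"hsA": "mmA", "hsB": "mmB"}
m2h = {"mmA": "hsA", "mmB": "hsB"}
mouse = {
    "mmA": MappedGene("mmA", "chr11", 101500, "-"),
    "mmB": MappedGene("mmB", "chr11", 101750, "+"),
}
confirmed = confirm_orthologous_promoter(("hsA", "hsB"), mouse, h2m, m2h)
```

Keeping the reciprocal check separate from the arrangement check makes it easy to see which condition rejected a candidate pair.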
Currently, orthologous bidirectional promoter regions (identified
using UCSC Known Genes) have been mapped in the human, chimp, macaque, mouse,
rat, dog and cow genomes.
Credits
These data were produced by Mary Q. Yang in the
Elnitski lab at NHGRI, NIH (contact: elnitski@mail.nih.gov).
References
Piontkivska H, Yang MQ, Larkin DM, Lewin HA, Reecy J, Elnitski L.
Cross-species mapping of bidirectional promoters enables prediction of unannotated 5' UTRs and
identification of species-specific transcripts.
BMC Genomics. 2009 Apr 24;10:189.
PMID: 19393065; PMC: PMC2688522
Yang MQ, Elnitski LL.
A computational study of bidirectional promoters in the human genome.
Springer Lecture Series: Notes in Bioinformatics 2007.
Yang MQ, Elnitski L.
Orthology of Bidirectional Promoters Enables Use of a Multiple Class Predictor for Discriminating
Functional Elements in the Human Genome.
Proceedings of the 2007 International Conference on Bioinformatics & Computational Biology.
pp. 218-228. 2007. ISBN: 1-60132-042-6.
Yang MQ, Koehly LM, Elnitski LL.
Comprehensive annotation of bidirectional promoters identifies co-regulation among breast and
ovarian cancer genes.
PLoS Comput Biol. 2007 Apr 20;3(4):e72.
PMID: 17447839; PMC: PMC1853124
Yang MQ, Taylor J, Elnitski L.
Comparative analyses of bidirectional promoters in vertebrates.
BMC Bioinformatics. 2008 May 28;9 Suppl 6:S9.
PMID: 18541062; PMC: PMC2423431
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset until
nine months following the release of the dataset. This date is listed in
the tablemetadata as dateUnrestricted and on the
download page. The full data release policy for ENCODE is available
here.