Note: lifted from hg17
Description
This track describes the location of transcription start sites (TSS) throughout
the human genome along with a confidence measure for each TSS based on
experimental evidence. The TSSs of a gene are important landmarks that help
define the promoter regions of a gene. These TSSs were determined by
SwitchGear Genomics
by integrating experimental data using an empirically derived scoring
function. Each TSS has a unique identifier that associates it with a gene model
(see details below), and each TSS is color-coded to reflect its confidence
score.
These TSSs are also available in a searchable format at
SwitchDB,
an open-access online database of human TSSs. Experimental tools are available
through
SwitchGear
to study the function of the promoter regions associated with
these TSSs.
Methods
The predicted TSSs are associated with a genome-wide set of gene models.
SwitchGear gene models are defined as clusters of cDNA alignments that have
overlapping exons on the same strand. More than 250,000 human cDNA alignments
were clustered in this way, yielding a genome-wide set of ~37,000 gene
models. Each gene model is identified by its chromosome number, strand, and
unique identifier. For example, ID CHR7_P0362
indicates a cDNA cluster (0362) aligning to the plus strand (P) of
chromosome 7 (CHR7). Existing gene annotation is mapped to the gene models
through the NCBI annotation associated with RefSeq accession numbers.
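As an illustration of the identifier layout described above, the following Python sketch splits a gene model ID into its chromosome, strand code, and cluster number. The pattern is inferred from the single example given here (CHR7_P0362). Acceptance of X and Y chromosome labels and of strand letters other than "P" are assumptions, and the function and type names are hypothetical and not part of any SwitchGear tool.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example CHR7_P0362 in the track description.
# Assumptions: chromosomes may be numeric or X/Y, and the strand slot holds
# one uppercase letter; only "P" (plus strand) is documented in the text.
GENE_MODEL_RE = re.compile(r"^CHR([0-9]+|[XY])_([A-Z])([0-9]+)$")


class GeneModelId(NamedTuple):
    chromosome: str   # e.g. "7"
    strand_code: str  # e.g. "P" for the plus strand
    cluster: str      # e.g. "0362", kept as text to preserve leading zeros


def parse_gene_model_id(identifier: str) -> GeneModelId:
    """Split a gene model ID such as CHR7_P0362 into its three parts."""
    match = GENE_MODEL_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised gene model ID: {identifier!r}")
    return GeneModelId(*match.groups())
```

For example, parse_gene_model_id("CHR7_P0362") yields chromosome "7", strand code "P", and cluster "0362".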
The SwitchGear TSS prediction algorithm identifies the most likely sites of
transcription initiation for each gene model. The algorithm employs a scoring
metric to assign a confidence level to each TSS prediction based on existing
experimental evidence. In addition to the ~250,000 human cDNAs listed in
GenBank, more than 5 million additional 5' human cDNA sequence tags have been
generated using a combination of approaches. While these short sequence reads do
not reveal gene structure, they provide a significant amount of experimental
evidence for identifying transcript start sites. For each gene model, the
algorithm counts the number of TSSs (defined as the 5' end of a cDNA) within
200 bp of one another. The TSS score is based on the total number of TSSs
identified within this window, with each TSS weighted according to several
discriminating features: cDNA library source, relative location within the gene
model, and exon structure of the transcript. Furthermore, the TSSs for each
gene model are ranked to identify the TSS representing the most likely
transcription initiation site for a gene model. Rankings are indicated in the
TSS unique identifier by the addition of a suffix (e.g., CHR7_P0362_R1 or
CHR7_P0362_R2).
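The windowed counting and ranking described above can be sketched roughly as follows. This is not the SwitchGear algorithm: the actual weights and scoring function are not given here, so each piece of evidence simply carries a caller-supplied weight, the grouping rule (greedy, anchored at the leftmost 5' end) is an assumption, and all names are hypothetical. Only the 200 bp window and the _R1, _R2 suffix convention come from the text.

```python
from typing import List, NamedTuple, Tuple

WINDOW_BP = 200  # window size stated in the track description


class ScoredTss(NamedTuple):
    identifier: str  # gene model ID plus rank suffix, e.g. CHR7_P0362_R1
    position: int    # leftmost 5' end in the group (an assumption)
    score: float     # sum of the evidence weights in the group


def score_and_rank(gene_model_id: str,
                   evidence: List[Tuple[int, float]]) -> List[ScoredTss]:
    """Group (position, weight) evidence for one gene model and rank it.

    Each 5' end joins the current group if it lies within WINDOW_BP of the
    group's first position; otherwise it starts a new group.
    """
    groups: List[List[float]] = []  # each entry: [start_position, total_weight]
    for position, weight in sorted(evidence):
        if groups and position - groups[-1][0] <= WINDOW_BP:
            groups[-1][1] += weight
        else:
            groups.append([position, weight])

    # Highest total weight first; rank 1 is the most likely initiation site.
    ranked = sorted(groups, key=lambda g: g[1], reverse=True)
    return [
        ScoredTss(f"{gene_model_id}_R{rank}", int(start), total)
        for rank, (start, total) in enumerate(ranked, start=1)
    ]
```

In a fuller treatment the weights would depend on cDNA library source, location within the gene model, and exon structure, as listed above; here they are left to the caller.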
Using the Filter
This track has a filter that can be used to change the TSS elements displayed
by the browser. This filter is based on the score of the TSS element. The
filter is located at the top of the track description page, which is accessed
via the small button to the left of the track's graphical display or through
the link on the track's control menu. By default the track displays only those
TSSs with a score of 10 or above.
By default, the TSSs for predicted pseudogenes are not displayed. If you would
like to display them, check the box next to the Include TSSs for predicted
pseudogenes label.
When you have finished configuring the filter, click the Submit button.
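For readers working with downloaded TSS data rather than the browser display, the same filtering behavior can be approximated in a few lines of Python. This is a minimal sketch, not the browser's implementation: the record layout and the is_pseudogene field are hypothetical, and only the default threshold of 10 and the default exclusion of pseudogene TSSs are taken from the description above.

```python
from typing import Iterable, List, NamedTuple

DEFAULT_MIN_SCORE = 10  # default display threshold stated in the text


class TssRecord(NamedTuple):
    identifier: str      # e.g. CHR7_P0362_R1
    score: int           # TSS confidence score
    is_pseudogene: bool  # hypothetical flag for predicted pseudogenes


def filter_tss(records: Iterable[TssRecord],
               min_score: int = DEFAULT_MIN_SCORE,
               include_pseudogenes: bool = False) -> List[TssRecord]:
    """Keep records at or above min_score, dropping pseudogenes by default."""
    return [
        record for record in records
        if record.score >= min_score
        and (include_pseudogenes or not record.is_pseudogene)
    ]
```

Passing include_pseudogenes=True corresponds to checking the pseudogene box in the filter.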
Credits
This track was created by Nathan Trinklein and Shelley Force Aldred of
SwitchGear Genomics.