Description
This track displays expression levels of computationally identified
first exons and a constitutive exon of genes in ENCODE regions,
based on the real competitive Polymerase Chain
Reaction (rcPCR) technique described in Ding
and Cantor (2003).
Expression levels
are indicated by color, ranging from black (no expression) to red (high
expression).
Experiments were performed on total RNA samples of ten
normal human tissues purchased from Clontech (Palo Alto, CA):
cerebral cortex, colon, heart, kidney, liver, lung,
skeletal muscle, spleen, stomach, and testis.
The name for each alternative transcript starts with the gene name,
followed by an identifier for the alternative first exon or the
constitutive exon. For example, for gene CAV1, there are three
alternative first exons (CAV1-E1A, CAV1-E1B, and CAV1-E1C) and the
third exon is chosen as the constitutively expressed exon (CAV1-E3).
Methods
Alternative transcription start sites (TSSs) for 20 ENCODE genes were predicted
using PromoSer, an in-house computational tool.
PromoSer computationally identifies the TSS by considering alignments
of a large number of partial and full-length mRNA sequences and ESTs to
genomic DNA, with provision for alternative promoters. In PromoSer, the
treatment of alternative first exons (or the resulting TSSs) is as
follows:
- all transcripts (mRNA, full-length mRNA and EST) from
the same gene cluster are examined
- individual ESTs are not considered for alternative TSSs; only the 5'-most
positions from all ESTs in the cluster are considered potential TSSs
- if multiple 5'-end positions are more than 20 bp apart, they are reported
as alternative TSSs
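The 20 bp rule in the list above can be illustrated with a minimal sketch. This is not PromoSer's actual implementation: the function name, the plus-strand assumption (a smaller coordinate is further 5'), and the greedy grouping are all assumptions made for illustration.

```python
def alternative_tss(five_prime_ends, min_gap=20):
    """Collapse candidate 5'-end positions into reported TSSs.

    Hypothetical helper, assuming plus-strand coordinates. A position is
    reported only if it lies more than `min_gap` bp downstream of the most
    recently reported TSS; closer positions are merged into that TSS.
    """
    reported = []
    for pos in sorted(set(five_prime_ends)):
        if not reported or pos - reported[-1] > min_gap:
            reported.append(pos)
    return reported


# Illustrative (made-up) 5'-end coordinates from one gene cluster.
candidates = [1000, 1005, 1018, 1100, 1115, 1500]
print(alternative_tss(candidates))
```

Under these assumptions, the six candidate positions collapse into three reported TSSs, each represented by the 5'-most position of its group.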
For each gene, all alternative first exons were identified based on manual
selection of PromoSer predictions. An exon that is
shared by all transcripts (called the constitutive exon) was also selected.
The selection process involved visually
examining the structure of the cluster, preferably using the latest
data available on UCSC, to identify distinct first exons that were well
formed (having multiple supporting sequences) and had no evidence
(especially from newer sequences) of additional sequence that made
them internal exons. After a first exon was identified, a subsequence
(100-300 bases) was selected for use in the experiment. The
selection process avoided repeat sequences as much as possible, and if
two first exons partially overlapped, the non-overlapping region was
selected. If those conditions left the remaining sequence too
short (or the first exon itself was too short), a junction with the
second exon was used. A constitutive exon that was
included in all (or most) of the alternative transcripts was also selected, and
suitable sequences were then extracted as above (no exon junctions were used).
The absolute expression levels of all exons were individually quantified
by rcPCR by designing four assays with PCR amplicons corresponding to
each exon.
Amplicons were designed according to transcript sequences
and can span a large distance on the genomic sequence. In addition,
some amplicons were designed across the junctions between first exons
and the constitutive second exons, and thus these amplicons may overlap
with the amplicons that correspond to the constitutive second exons.
The rcPCR technique combined competitive PCR and matrix-assisted laser
desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS)
for gene expression analysis. To measure the expression level of a
gene, an oligonucleotide standard (60-80 bases) of known concentration,
complementary to the target sequence with a single base
mismatch in the middle, was added as the competitor for PCR. The gene of
interest and the oligonucleotide standard resembled two alleles of a
heterozygous locus in an allele frequency analysis experiment, and thus
could be quantified by the high-throughput MALDI-TOF MS
based MassARRAY system (Sequenom Inc.).
After PCR, a base extension
reaction was carried out with an extension primer, ThermoSequenase, and
a mixture of ddNTPs/dNTP (for example, a mixture of
ddA, ddC, ddT, and dG). The extension primer annealed to the sequence
immediately 5' upstream of the mismatch position. Depending on the nature
of the mismatch and the mixture composition of ddNTPs/dNTP, one or two
bases were added to the extension primer, producing two extension
products with one base-length difference. These two extension products
were then detected and quantified by MALDI-TOF MS.
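Because the competitor is added at a known concentration, the two extension-product peaks can in principle be converted into an estimate of the target amount. The sketch below assumes that the target and the competitor amplify with equal efficiency and that peak area is proportional to product abundance; the function name and the example numbers are hypothetical and are not taken from the track data.

```python
def target_concentration(target_peak_area, competitor_peak_area, competitor_conc):
    """Estimate target concentration from the two MALDI-TOF MS peaks.

    Assumes (for illustration only) equal amplification efficiency for
    target and competitor, so the peak-area ratio equals the ratio of
    starting amounts. The result is in the same units as competitor_conc.
    """
    if competitor_peak_area <= 0:
        raise ValueError("competitor peak area must be positive")
    return competitor_conc * target_peak_area / competitor_peak_area


# Made-up peak areas and a competitor concentration in arbitrary units.
estimate = target_concentration(target_peak_area=300.0,
                                competitor_peak_area=100.0,
                                competitor_conc=2.0)
print(estimate)
```

In practice, an estimate like this would be most reliable when the two peaks are of similar size, which is one reason to run several competitor concentrations per exon.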
Expression ratios (e.g. CAV1-E1A/CAV1-E3, CAV1-E1B/CAV1-E3,
CAV1-E1C/CAV1-E3) indicate the relative abundance of
alternative first exons.
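As a small worked example of these ratios, the sketch below divides each first-exon level by the level of the constitutive exon. The helper name and the numeric levels are illustrative assumptions only; they are not measured values.

```python
def first_exon_ratios(levels, constitutive):
    """Return each exon's level divided by the constitutive exon's level.

    `levels` maps exon names to absolute expression levels (hypothetical
    values here); `constitutive` names the reference exon.
    """
    base = levels[constitutive]
    if base <= 0:
        raise ValueError("constitutive exon level must be positive")
    return {name: level / base
            for name, level in levels.items()
            if name != constitutive}


# Made-up absolute levels for the CAV1 exons in one tissue.
cav1 = {"CAV1-E1A": 2.0, "CAV1-E1B": 1.0, "CAV1-E1C": 0.5, "CAV1-E3": 4.0}
print(first_exon_ratios(cav1, "CAV1-E3"))
```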
18S rRNA was used to normalize absolute exon expression
levels across tissues.
Values shown on this track represent the relative abundance of the
alternative first exons with respect to the 18S rRNA. The raw values have
been log10 transformed and scaled to show graded colors on the browser.
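The transformation described above can be sketched as follows. The exact scaling used for this track is not stated, so the floor value, the log10 range, and the 0-1000 score range are assumptions chosen only to make the example concrete.

```python
import math


def log_scaled_score(exon_level, rrna_level, floor=1e-6,
                     lo=-6.0, hi=0.0, max_score=1000):
    """Normalize to 18S rRNA, log10-transform, and scale to a display score.

    All default parameters are hypothetical. Ratios below `floor` are
    raised to `floor` so that log10 is defined for zero expression, and
    the log value is clipped to [lo, hi] before linear scaling.
    """
    if rrna_level <= 0:
        raise ValueError("18S rRNA level must be positive")
    ratio = max(exon_level / rrna_level, floor)
    value = math.log10(ratio)
    fraction = (value - lo) / (hi - lo)
    fraction = min(max(fraction, 0.0), 1.0)
    return round(fraction * max_score)


# Made-up levels: an exon at one thousandth of the 18S rRNA level.
print(log_scaled_score(1e-3, 1.0))
```

With these assumed parameters, a score of 0 would be drawn at the black end of the color scale and the maximum score at the red end.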
Verification
One biological replicate was performed for each gene. Two to four
competitor concentrations were used to detect the expression level
of each exon. Two to six technical replicates were performed for
each competitor concentration. One more biological replicate will be
performed in the future.
Credits
Data generation and analysis for this track were performed by ZLAB
at Boston University. The following people contributed: Shengnan Jin,
Anason Halees, Heather Burden, Yutao Fu, Ulas Karaoz, Yong Yu, Chunming
Ding, Charles R. Cantor, and Zhiping Weng.
References
Ding, C. and Cantor, C.R.
A high-throughput gene expression analysis technique using competitive PCR and
matrix-assisted laser desorption ionization time-of-flight MS.
Proc Natl Acad Sci U S A 100(6), 3059-64 (2003).
Ding, C. and Cantor, C.R.
Direct molecular haplotyping of long-range genomic DNA with
M1-PCR.
Proc Natl Acad Sci U S A 100(13), 7449-53 (2003).
Halees, A.S., Leyfer, D. and Weng, Z.
PromoSer: A large-scale mammalian promoter and transcription
start site identification service.
Nucleic Acids Res. 31(13), 3554-9 (2003).
Halees, A.S. and Weng, Z.
PromoSer: improvements to the algorithm, visualization and
accessibility.
Nucleic Acids Res. 32, W191-W194 (2004).