Description
This track shows the starts and ends of mRNA transcripts
determined by paired-end ditag (PET) sequencing. Each PET consists of 18
bases from each end of a cDNA; 36 bp PETs from many clones were
concatenated and cloned into pZero-1 for efficient sequencing. See
the Methods and References sections below for more details on PET sequencing.
The PETs in this track represent full-length transcripts
derived from three cell lines under different conditions:
- log-phase MCF7 cells
- MCF7 cells treated with estrogen (10 nM beta-estradiol) for 12 hours
- HCT116 cells treated with 5FU (5-fluorouracil) for 6 hours
- log-phase hES3 embryonic stem cells in feeder-free culture
In total, 584,624 PETs were generated for the log-phase MCF7 cells,
153,179 for the estrogen-treated MCF7 cells, 280,340 for the HCT116
cells, and 1,799,970 for the hES3 cells.
More than 80% of the PETs from the
HCT116 and log-phase MCF7 cells were mapped to the genome.
The 474,278 log-phase MCF7 PETs and 223,261 HCT116 PETs that
mapped to a single location or to multiple (up to ten) locations in the genome
are shown in the two subtracks. For the estrogen-treated MCF7 cells,
only PETs that mapped to the ENCODE regions under the same
match criteria (4,881 total) are displayed.
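As a quick arithmetic check, the displayed counts can be compared with the number of PETs generated for each sample. Because the subtracks show only PETs with up to ten genomic matches, this ratio is a lower bound on the fraction of PETs that mapped, not the mapping rate itself. This is a minimal sketch; the dictionary and function names are illustrative only.

```python
# Counts copied from the track description above.
generated = {"log-phase MCF7": 584_624, "HCT116": 280_340}
# PETs shown in the subtracks (single or up to ten genomic matches).
displayed = {"log-phase MCF7": 474_278, "HCT116": 223_261}


def displayed_fraction(cell: str) -> float:
    """Fraction of generated PETs shown in a subtrack.

    This is a lower bound on the mapped fraction, since PETs that mapped
    to more than ten locations are counted as generated but not displayed.
    """
    return displayed[cell] / generated[cell]


for cell in generated:
    print(f"{cell}: {displayed_fraction(cell):.1%}")
# log-phase MCF7: 81.1%
# HCT116: 79.6%
```

For HCT116 the displayed fraction falls just under 80%, which is consistent with the "more than 80%" mapping figure only if some mapped HCT116 PETs, such as those with more than ten matches, are not displayed.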
Human embryonic stem cell line hES3 (46,XX, Chinese) was obtained from
ES Cell International. These cells were serially cultured according
to previously established protocols (Choo et al., 2006).
In brief, feeder-free cultures of hES3 were maintained at 37 °C in 5% CO2
on Matrigel-coated organ culture dishes and supplemented with conditioned
medium from mouse feeder cells (DE-MEF).
In the graphical display,
the ends are represented by blocks connected by a horizontal line. In full
and packed display modes, the arrowheads on the horizontal line indicate the
direction of transcription, and an ID of the form XXXXX-N-M is
shown to the left of each PET, where XXXXX is the unique ID for each
PET, N is the number of mapping locations in the genome
(1 for a single mapping location, 2 for two mapping locations, and so forth),
and M is the number of PET sequences at this location. The total
count of PET sequences mapped to the same locus, including those with slight
nucleotide differences, may reflect the expression level of the transcript.
PETs that map to multiple locations may represent low-complexity or repetitive
sequences.
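A minimal sketch of how a display label of the form XXXXX-N-M could be split into its three fields. The label "12345-1-3" and the names `PetId` and `parse_pet_label` are hypothetical; the track does not define a parsing API, and the sketch assumes N and M are plain integers.

```python
from typing import NamedTuple


class PetId(NamedTuple):
    pet_id: str        # XXXXX: unique PET identifier
    n_locations: int   # N: number of genomic mapping locations
    n_sequences: int   # M: number of PET sequences at this location


def parse_pet_label(label: str) -> PetId:
    """Split a hypothetical XXXXX-N-M label into its three fields."""
    # Split from the right so that any hyphen inside XXXXX is preserved.
    pet_id, n, m = label.rsplit("-", 2)
    return PetId(pet_id, int(n), int(m))


label = parse_pet_label("12345-1-3")  # hypothetical label
print(label.n_locations, label.n_sequences)  # 1 3
```

Splitting from the right keeps the two numeric fields well defined even if an identifier itself were to contain a hyphen.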
The graphical display also uses color coding to reflect the uniqueness
and expression level of each PET:
Color | Mapping | PETs observed at location |
dark blue | unique | 2 or more |
light blue | unique | 1 |
medium brown | multiple | 2 or more |
light brown | multiple | 1 |
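The color table above amounts to a two-way decision on N (unique versus multiple mapping) and M (one versus two or more PETs at the location). The function below is a minimal sketch of that rule; its name is hypothetical, and returning `None` for PETs with more than ten mapping locations reflects the statement that only PETs with up to ten matches are displayed.

```python
from typing import Optional


def pet_color(n_locations: int, n_sequences: int) -> Optional[str]:
    """Return the display color for a PET according to the color table.

    n_locations is N (genomic mapping locations) and n_sequences is M
    (PET sequences observed at this location). Returns None for PETs with
    more than ten mapping locations, which are not displayed.
    """
    if n_locations < 1 or n_sequences < 1:
        raise ValueError("N and M must both be at least 1")
    if n_locations > 10:
        return None  # only PETs with up to ten matches are shown
    unique = n_locations == 1
    abundant = n_sequences >= 2
    if unique:
        return "dark blue" if abundant else "light blue"
    return "medium brown" if abundant else "light brown"


print(pet_color(1, 3))  # dark blue
print(pet_color(4, 1))  # light brown
```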
Methods
PolyA+ RNA was isolated from the cells. A full-length cDNA library was
constructed and converted into a PET library for Gene
Identification Signature (GIS) analysis (Ng et al., 2005). To generate
PET sequences, cDNA sequences were cloned into the plasmid vector
pGIS3, which contains two MmeI recognition sites flanking the cloning
site; these sites were used to produce a 36 bp PET. Each 36 bp PET
sequence contains 18 bp from each of the 5' and 3' ends of the original
full-length cDNA clone. The 18 bp 3' signature consists of 16 bp of
3'-specific sequence plus a residual AA from the polyA tail, which indicates
the sequence orientation. PET sequences were mapped to the genome using the
following criteria:
- the 5' signature must have a continuous match of at least 16 bp, and the
3' signature a continuous match of at least 14 bp
- both 5' and 3' signatures must be present on the same chromosome
- their 5' to 3' orientation must be correct
- the genomic span of a PET alignment must be less than
1,000,000 bp (1 Mb)
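The four mapping criteria above can be expressed as a single filter over a candidate alignment. This is a minimal sketch under stated assumptions, not the GIS mapping pipeline: the `PetAlignment` record and its field names are hypothetical, and orientation and span are computed from assumed coordinates of the transcript's outer 5' and 3' ends.

```python
from dataclasses import dataclass

MIN_MATCH_5P = 16     # bp, minimal continuous match for the 5' signature
MIN_MATCH_3P = 14     # bp, minimal continuous match for the 3' signature
MAX_SPAN = 1_000_000  # bp, genomic span must be strictly less than this


@dataclass
class PetAlignment:
    """Hypothetical record for one candidate genomic placement of a PET."""
    chrom_5p: str          # chromosome hit by the 5' signature
    chrom_3p: str          # chromosome hit by the 3' signature
    strand: str            # "+" or "-" strand of the inferred transcript
    five_prime_pos: int    # assumed outer genomic coordinate of the 5' end
    three_prime_pos: int   # assumed outer genomic coordinate of the 3' end
    match_len_5p: int      # longest continuous match of the 5' signature (bp)
    match_len_3p: int      # longest continuous match of the 3' signature (bp)


def passes_pet_criteria(aln: PetAlignment) -> bool:
    # 1. Minimal continuous match length for each signature.
    if aln.match_len_5p < MIN_MATCH_5P or aln.match_len_3p < MIN_MATCH_3P:
        return False
    # 2. Both signatures on the same chromosome.
    if aln.chrom_5p != aln.chrom_3p:
        return False
    # 3. 5'->3' orientation: on "+" the 5' end has the lower coordinate,
    #    on "-" it has the higher coordinate.
    if aln.strand == "+":
        oriented = aln.five_prime_pos < aln.three_prime_pos
    elif aln.strand == "-":
        oriented = aln.five_prime_pos > aln.three_prime_pos
    else:
        raise ValueError(f"unknown strand {aln.strand!r}")
    if not oriented:
        return False
    # 4. Genomic span strictly below 1 Mb.
    return abs(aln.three_prime_pos - aln.five_prime_pos) < MAX_SPAN


print(passes_pet_criteria(PetAlignment("chr1", "chr1", "+", 1_000, 25_000, 18, 16)))  # True
```

Treating the span as the distance between the outer 5' and 3' coordinates is a simplification; an exact implementation would depend on the coordinate convention used by the aligner.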
Most PET sequences (more than 90%) mapped to a single genomic location.
PETs mapping to 2 to 10 locations are also included and may represent
duplicated genes or pseudogenes in the genome.
Verification
To assess overall PET quality and mapping specificity, the ten most
abundant PET clusters that mapped to well-characterized known genes were
examined. Over 99% of the PETs represented full-length transcripts, and the
majority fell within 10 bp of the known 5' and 3' boundaries of these
transcripts. The PET mapping was further verified by confirming the existence
of physical cDNA clones represented by the ditags. PCR primers designed from
the PET sequences were used to amplify the corresponding cDNA inserts from
the parental GIS full-length cDNA (flcDNA) library for sequencing analysis.
Of 86 arbitrarily selected PETs representing a wide range of annotation
categories (38 from known genes, 2 from predicted genes, and 46 from novel
transcripts), 84 (97.7%) confirmed the existence of bona fide transcripts.
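The two checks in this section can be sketched in a few lines: whether both PET ends fall within 10 bp of the annotated transcript boundaries, and the clone-confirmation rate. The function name and the inclusive tolerance (a difference of exactly 10 bp counts as "within") are assumptions made for illustration.

```python
def within_boundaries(pet_5p: int, pet_3p: int,
                      known_5p: int, known_3p: int,
                      tolerance: int = 10) -> bool:
    """True if both PET ends lie within `tolerance` bp (inclusive) of the
    annotated 5' and 3' transcript ends."""
    return (abs(pet_5p - known_5p) <= tolerance
            and abs(pet_3p - known_3p) <= tolerance)


# Confirmation rate from the clone-validation experiment described above.
categories = {"known genes": 38, "predicted genes": 2, "novel transcripts": 46}
tested = sum(categories.values())  # 86
confirmed = 84
print(f"{confirmed}/{tested} = {confirmed / tested:.1%}")  # 84/86 = 97.7%
```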
Credits
The GIS-PET libraries and sequence data for transcriptome analysis were
produced at the Genome Institute of Singapore. The data were mapped and
analyzed by scientists from the Genome Institute of Singapore and the
Bioinformatics Institute of Singapore.
References
Choo A, Padmanabhan J, Chin A, Fong WJ, Oh SKW.
Immortalized feeders for the scale-up of human embryonic stem
cells in feeder and feeder-free conditions.
J Biotechnol. 2006 Mar 9;122(1):130-41.
Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A,
Ridwan A, Wong CH, et al.
Gene identification signature (GIS) analysis for
transcriptome characterization and genome annotation.
Nat Methods. 2005 Feb;2(2):105-11.