Description
This track depicts high-throughput sequencing of long RNAs (>200 nt)
from whole-cell RNA samples from tissues or subcellular compartments from
cell lines included in the ENCODE Transcriptome subproject. The overall
goal of the ENCODE project is to identify and characterize all functional
elements in the sequence of the human genome.
RNA-Seq was performed by reverse-transcribing an RNA sample into cDNA,
followed by high-throughput DNA sequencing of the cDNA, which was done here
on the Helicos Genetic Analysis System (Harris et al., 2008;
http://www.helicosbio.com/).
Display Conventions and Configuration
This is a multi-view track that provides the following views of the data:
- Alignments: RNA-seq tag alignments.
- Raw Signal: Density graph (wiggle) of the number of reads overlapping
each nucleotide in the genome.
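As a rough illustration of what the Raw Signal view represents, per-nucleotide read density could be computed along the following lines. The function name, the 0-based half-open coordinate convention, and the input format are assumptions made for this sketch; it is not the pipeline used to build the track.

```python
from typing import List, Tuple


def read_density(reads: List[Tuple[int, int]],
                 region_start: int,
                 region_end: int) -> List[int]:
    """Count reads overlapping each nucleotide in [region_start, region_end).

    `reads` holds (start, end) alignment intervals, assumed 0-based and
    half-open. Reads extending past the region are clipped to it.
    """
    length = region_end - region_start
    # Difference array: +1 where a read begins, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # skip reads that do not overlap the region
            diff[s] += 1
            diff[e] -= 1
    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


if __name__ == "__main__":
    print(read_density([(0, 3), (2, 5)], 0, 6))
```

The difference-array approach keeps the work proportional to the number of reads plus the region length, rather than to the total number of aligned bases.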
To show only selected subtracks, uncheck the boxes next to the tracks that
you wish to hide.
Color differences among the views are arbitrary. They provide a visual cue
for distinguishing between the different cell types and compartments.
Note that the strand of the RNA is not displayed in the track in the genome
browser. The strand can be found in the download file.
Methods
Cells were grown according to the approved ENCODE cell culture protocols.
RNA molecules longer than 200 nt that were present in RNA populations
isolated from different subcellular compartments (such as cytosol, nucleus,
polysomes, and others) were fractionated into polyA+ and polyA- fractions as
described in these protocols.
RNA was converted into first-strand cDNA using a large excess of random hexamers
without prior fragmentation. Spurious second-strand cDNA synthesis could occur
under these conditions. The first-strand cDNA molecules were tailed at the
3′ ends with polyA residues using terminal transferase and used
directly for sequencing.
Filtered reads were aligned to the human genome using the in-house and freely
available Helicos alignment software indexDPgenomic
(http://open.helicosbio.com/mwiki/index.php/Docs/Software/Bioinformatics#Executables;
free registration required)
with a minimum normalized alignment score of 4.5.
The normalized score was defined as follows:
Score = (#matches*5 - #mismatches*4) / length_read
For example, in the following alignment:
Tag Sequence CCTCCGTGTTGTTCCAGCC-CAGTGCTCGCAGG
Ref Sequence C-TCCGTGTTGTTCCAGCCACAGTGCTCGCAGG
Length of alignment block: 33
Length of tag sequence: 32
Number of matches: 31
Number of errors: 2
Score: (31*5) - (2*4) = 155 - 8 = 147
Normalized score = 147/32 = 4.59375
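The calculation above can be sketched in Python. This is a minimal illustration, not the indexDPgenomic implementation: the function name normalized_score and the constant MIN_NORMALIZED_SCORE are hypothetical, and the sketch assumes, as the worked example implies, that gap columns count as errors and that the tag length excludes gap characters.

```python
MIN_NORMALIZED_SCORE = 4.5  # alignment threshold stated in the Methods text


def normalized_score(tag_aligned: str, ref_aligned: str) -> float:
    """Return (matches*5 - errors*4) / tag length for a gapped alignment.

    Both strings are rows of the same alignment, with '-' marking gaps.
    Assumption: every non-matching column (mismatch or gap) is one error.
    """
    if len(tag_aligned) != len(ref_aligned):
        raise ValueError("aligned sequences must have equal length")
    # A column is a match only when both rows carry the same base.
    matches = sum(
        1 for t, r in zip(tag_aligned, ref_aligned) if t == r and t != "-"
    )
    errors = len(tag_aligned) - matches
    # Tag length is the number of bases in the read, without gap characters.
    tag_length = sum(1 for t in tag_aligned if t != "-")
    return (matches * 5 - errors * 4) / tag_length


if __name__ == "__main__":
    tag = "CCTCCGTGTTGTTCCAGCC-CAGTGCTCGCAGG"
    ref = "C-TCCGTGTTGTTCCAGCCACAGTGCTCGCAGG"
    score = normalized_score(tag, ref)
    print(score, score >= MIN_NORMALIZED_SCORE)
```

For the example alignment this reproduces the value 4.59375, which is above the 4.5 cutoff.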
Raw data can be found at Helicos (free registration required).
Verification
Known exon maps as displayed on the genome browser are confirmed by the
alignment of sequence reads.
Credits
Helicos BioSciences: Philipp Kapranov, Eldar Giladi, Steve Roels, Chris Hart,
Stan Letovsky, Patrice Milos.
Cold Spring Harbor Laboratory: Carrie Davis, Kim Bell, Huaien Wang,
Tom Gingeras.
Contacts: Philipp Kapranov; Patrice Milos
References
Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M,
Colonell J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M,
Lapen D, Moulton K, Quake SR, Steinmann K, Thayer E, Tyurina A, Ward R,
Weiss H, Xie Z. Single-molecule DNA sequencing of a viral genome.
Science. 2008 Apr 4;320(5872):106-9.
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset until
nine months following the release of the dataset. This date is listed in
the Restricted Until column, above. The full data release policy
for ENCODE is available here.