Data version: ENCODE Nov 2008, Feb 2009, and July 2009 Freezes
Description
This track is produced as part of the ENCODE Transcriptome Project.
Transcription of different RNA extracts from different sub-cellular
localizations in different cell lines
is compared in companion experiments using three different technologies:
tiling arrays, RNA-seq using Solexa, and RNA-seq using SOLiD. The
tiling array data are shown in this track.
The Raw Signal view shows an estimate of abundance of RNA molecules
and the Transfrags view
shows the locations of sites corresponding to these molecules.
Display Conventions and Configuration
To show only selected subtracks, uncheck the boxes next to the tracks that
you wish to hide.
Color differences among the views are arbitrary. They provide a
visual cue for
distinguishing between the different cell types and compartments.
Transfrags
The Transfrags view includes all transfrags before filtering.
Filtered Transfrags
The Filtered Transfrags view excludes repeats and other known annotations
including:
tRNAs and rRNAs, mi/snoRNAs, transfrags mapping to the mitochondrial or Y
chromosomes,
and many predicted snoRNAs and miRNAs.
Raw Signal
The Raw Signal view shows the probe intensity based on all probes.
Filtered Signal
The Filtered Signal view shows the probe intensity based on the
filtered set of probes.
Regions with a negative signal value are areas
where the mismatch probe hybridizes better than the match probe.
This could indicate the presence of a SNP, a site of RNA editing, or
a sequencing error.
Methods
Cells were grown according to the approved
ENCODE cell culture protocols.
RNA molecules longer than 200 nt
that were present in RNA populations isolated from different subcellular compartments
(such as cytosol, nucleus, and polysomes) were fractionated into polyA+
and polyA- fractions as described in
these
protocols.
Each RNA fraction was converted into double-stranded cDNA using
random hexamers,
labeled, and hybridized to a tiling 91-array set containing probes against the
non-repetitive
portion of the human genome, tiled on average every 5 bp (center-to-center distance
between consecutive 25-mer probes).
All arrays were scaled to a median array intensity of 330. Within a sliding
61 bp window
centered on each probe, an estimate of RNA abundance (signal; see the
Raw Signal view)
was found by calculating the median of all pairwise average PM-MM values,
where PM is a
perfect match and MM is a mismatch. Kapranov et al. (2002), Cheng
et al. (2005), Kapranov et al. (2007), and Cawley
et al. (2004)
are good references for the experimental methods. Cawley et al.
also describe the analytical methods.
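The scaling and windowed signal estimate described above can be sketched as follows. This is a minimal illustration, not the pipeline used to produce the track. The function names are hypothetical. The linear rescaling, the use of probe center coordinates, the 30 bp half-window (from the 61 bp window), and the inclusion of each value paired with itself among the "pairwise averages" are all assumptions.

```python
from itertools import combinations_with_replacement
from statistics import median

TARGET_MEDIAN = 330.0  # median array intensity stated in the text
WINDOW_BP = 61         # sliding window width stated in the text


def scale_to_median(intensities, target=TARGET_MEDIAN):
    """Linearly rescale one array so its median equals `target` (assumed linear)."""
    factor = target / median(intensities)
    return [x * factor for x in intensities]


def pseudomedian(values):
    """Median of all pairwise averages; self-pairs are included (an assumption)."""
    return median((a + b) / 2.0
                  for a, b in combinations_with_replacement(values, 2))


def window_signal(positions, pm, mm, window_bp=WINDOW_BP):
    """Per-probe signal from the PM-MM values of probes inside the window.

    `positions` are assumed to be probe center coordinates in bp.
    """
    half = window_bp // 2
    diffs = [p - m for p, m in zip(pm, mm)]
    signal = []
    for center in positions:
        in_window = [d for pos, d in zip(positions, diffs)
                     if abs(pos - center) <= half]
        signal.append(pseudomedian(in_window))
    return signal
```

For example, an isolated probe whose MM intensity exceeds its PM intensity yields a negative signal, which is the situation described for the Filtered Signal view. The quadratic scan over probes is kept for readability; a real implementation would use a sliding index over sorted positions.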
Verification
The reproducibility of the labeling method was assessed separately. Three
independent
technical replicates were generated from the same RNA pool for each RNA
preparation and
hybridized to duplicate arrays (two technical replicates) that contain the
ENCODE regions.
Labeled RNA samples were then pooled and hybridized to the tiling 91-array
set spanning
the whole genome. Transcribed regions (transfrags; see the Transfrags
view) were
generated from the Raw Signal by merging genomic positions to which
probes
are mapped. This merging was based on a 5% false positive rate cutoff in
negative
bacterial controls, a maximum gap (MaxGap) of 40 base pairs, and a minimum run
(MinRun)
of 40 base pairs.
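One way to read the merging step is sketched below. It is an illustration under stated assumptions, not the software used to produce the track. The function names are hypothetical. The sketch assumes that the cutoff is the signal value exceeded by at most 5% of the negative control probes, that MaxGap is the largest allowed distance between consecutive above-cutoff probe positions, and that MinRun is the smallest allowed span from the first to the last probe of a merged run.

```python
def threshold_from_controls(control_signals, false_positive_rate=0.05):
    """Cutoff exceeded by at most `false_positive_rate` of the control probes.

    `control_signals` must be non-empty.
    """
    ordered = sorted(control_signals)
    allowed = int(len(ordered) * false_positive_rate)  # controls allowed above cutoff
    return ordered[len(ordered) - allowed - 1]


def call_transfrags(positions, signal, threshold, max_gap=40, min_run=40):
    """Merge above-threshold probe positions into (start, end) transfrags."""
    positive = sorted(p for p, s in zip(positions, signal) if s > threshold)
    runs = []
    for pos in positive:
        if runs and pos - runs[-1][1] <= max_gap:
            runs[-1][1] = pos        # close enough: extend the current run
        else:
            runs.append([pos, pos])  # too far (or first probe): start a new run
    # Keep only runs that span at least `min_run` bp.
    return [(start, end) for start, end in runs if end - start >= min_run]
```

With probes every 5 bp, a stretch of above-cutoff probes from position 0 to 50 is reported as one transfrag, while a single above-cutoff probe more than 40 bp from any other is dropped because its run is shorter than MinRun.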
Credits
These data were generated and analyzed by the transcriptome group at
Affymetrix
and Cold Spring Harbor Laboratory:
P. Kapranov, I. Bell, E. Dumais,
J. Drenkow, J. Dumais, N. Garg, M. Lubinsky,
Carrie A. Davis, Huaien Wang, Kimberly Bell, Jorg Drenkow, Chris Zaleski,
and Thomas R. Gingeras.
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset until
nine months following the release of the dataset. This date is listed in
the Restricted Until column, above. The full data release policy
for ENCODE is available
here.