Description
This track is produced as part of the ENCODE Transcriptome Project.
It shows high-throughput sequencing of RNA samples from tissues or subcellular
compartments of cell lines included in the ENCODE Transcriptome subproject. The overall
goal of the ENCODE project is to identify and characterize all functional
elements in the sequence of the human genome.
Display Conventions and Configuration
This track is a multi-view composite track that contains multiple data types
(views). For each view, there are multiple subtracks that
display individually on the browser. Instructions for configuring multi-view
tracks are here.
To show only selected subtracks, uncheck the boxes next to the tracks that
you wish to hide.
Color differences among the views are arbitrary. They provide a visual cue for
distinguishing between the different cell types.
- Plus Raw Signal
- The Plus Raw Signal view graphs the base-by-base density of alignments
on the + strand.
- Minus Raw Signal
- The Minus Raw Signal view graphs the base-by-base density of alignments
on the - strand.
- All Raw Signal
- The All Raw Signal view graphs the base-by-base density of alignments
on both strands.
- Alignments
- The Alignments view shows reads mapped to the genome. Sequences determined
to be transcribed on the positive strand are shown in blue.
Sequences determined to be transcribed on the negative strand are shown in
orange. Sequences for which the direction of transcription could not be
determined are shown in black.
- Split Alignments
- The Split Alignments view shows alignments of individual RNA sequences that cross exon splice sites. They are colored by strand as described above.
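The raw signal views described above can be thought of as per-base alignment counts. The following is a minimal sketch of that idea, not the pipeline used to build this track. The tuple layout `(start, end, strand)`, the 0-based half-open coordinates, and the function name `raw_signal` are assumptions made for illustration.

```python
from collections import Counter


def raw_signal(alignments, strand=None):
    """Count how many alignments cover each base.

    alignments: iterable of (start, end, strand) tuples using 0-based,
    half-open coordinates, with strand given as "+" or "-".
    (This layout is hypothetical, chosen only for the example.)
    strand: "+" or "-" to restrict to one strand, None to use both.
    """
    depth = Counter()
    for start, end, aln_strand in alignments:
        if strand is not None and aln_strand != strand:
            continue  # skip alignments on the other strand
        for pos in range(start, end):
            depth[pos] += 1
    return dict(depth)


# Three toy alignments: two on the + strand, one on the - strand.
alignments = [(100, 103, "+"), (101, 104, "+"), (102, 105, "-")]

plus_signal = raw_signal(alignments, "+")    # Plus Raw Signal
minus_signal = raw_signal(alignments, "-")   # Minus Raw Signal
all_signal = raw_signal(alignments)          # All Raw Signal
```

In this toy example, the All Raw Signal value at each base is the sum of the Plus and Minus values at that base.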
Methods
The RNA-Seq data were generated from high-quality polyA RNA, and the RNA-Seq
libraries were constructed using the SOLiD Whole Transcriptome (WT) protocol and
reagent kit. Good-quality total RNA was used as the starting material and was
purified twice through a MACs polyT column to enrich for polyA RNA and remove
contaminants (e.g., rRNA, tRNA, DNA, and protein). One microgram of enriched
polyA RNA was then fragmented, and a gel-based selection method was used to
collect polyA RNA fragments in a size range of 50-150 nt. The collected RNA
fragments were then hybridized and ligated to a mix of adaptors provided by ABI,
followed by reverse transcription to generate the corresponding cDNAs. The
resulting cDNA library was further amplified by PCR and sequenced on the SOLiD
platform as single reads of 35 bp (50 bp in the newer version).
Cells were grown according to the approved ENCODE cell culture protocols.
Data: The SOLiD-generated RNA-Seq reads were 35 bp in length. An
initial filtering step removed undesirable contaminating sequences, such as
rRNA, tRNA, and repeats. A read-split mapping approach was developed to map the
35 bp reads onto the reference genome (NCBI Build 36/hg18). Specifically, each
35 bp read was divided into two parts (1st-25 bp and 2nd-25 bp with 10 bp
overlapping), and the parts were mapped separately. An extension mapping
analysis was then performed to generate a score for each read, and the score
was used as a filtering criterion (e.g., a score >26 was required). Reads
with mapping locations N≤1 or N>10 were excluded from further analysis. The
data sets generated by SOLiD RNA-Seq are strand-specific, and reads were mapped
to exons with strand specificity.
Mapping parameters: Mapping was done using Applied Biosystems' SOLiD
alignment for whole transcriptome analysis pipeline. Two mismatches were allowed
in the 25 bp color space seed sequence, with progressive alignment performed to
find the full mapping location. A score was computed for each mapping location,
and any location that scored ≤26 was filtered out.
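The score cutoff can be illustrated with a short sketch. It is not the Applied Biosystems pipeline; it only shows the stated rule that locations scoring 26 or less are dropped. The dictionary fields (`chrom`, `start`, `strand`, `score`) and the function name `filter_locations` are hypothetical.

```python
SCORE_CUTOFF = 26  # from the text: locations scoring <= 26 are filtered


def filter_locations(locations, cutoff=SCORE_CUTOFF):
    """Keep only mapping locations whose score is strictly above the cutoff."""
    return [loc for loc in locations if loc["score"] > cutoff]


# Toy candidate locations for one read (field names are illustrative only).
locations = [
    {"chrom": "chr1", "start": 1000, "strand": "+", "score": 30},
    {"chrom": "chr5", "start": 2500, "strand": "-", "score": 26},
    {"chrom": "chr9", "start": 7200, "strand": "+", "score": 27},
]

kept = filter_locations(locations)
kept_scores = [loc["score"] for loc in kept]
```

A location scoring exactly 26 is removed, because the cutoff in the text is inclusive on the filtered side.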
Credits
The GIS RNA-seq libraries and sequence data for transcriptome analysis were
produced at the Genome Institute of Singapore.
The data were mapped and analyzed by scientists from the Genome Institute of
Singapore.
Contact:
RUAN Xiaoan
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset until
nine months following the release of the dataset. This date is listed in
the Restricted Until column, above. The full data release policy
for ENCODE is available
here.