Description
This track depicts next-generation sequencing data for RNAs of 20-200 nt
isolated from RNA samples from tissues or subcellular compartments of
ENCODE cell lines.
The overall
goal of the ENCODE project is to identify and characterize all functional
elements in the sequence of the human genome.
This cloning protocol generates directional libraries that are read from the
5′ ends of the inserts, which should largely correspond to the 5′
ends of the mature RNAs. The libraries were sequenced on a Solexa platform for
a total of 36, 50 or 76 cycles; however, the reads undergo post-processing
that trims their 3′ ends. Consequently, the mapped read lengths are
variable.
Display Conventions and Configuration
To show only selected subtracks, uncheck the boxes next to the tracks that
you wish to hide.
Color differences among the views are arbitrary. They provide a
visual cue for
distinguishing between the different cell types and compartments.
- Transfrags: Identical reads were collapsed while maintaining their
multiplicity information and reported as "transfrags". "Y" means that the
transfrag underwent clipping prior to mapping; "N" indicates that it did not.
The Transfrags view includes all transfrags before filtering.
- Raw Signals: The Raw Signal views show the density of aligned tags on the
plus and minus strands.
- Alignments: The Alignments view shows reads mapped to the genome and
indicates where bases may mismatch. Every mapped read is displayed, i.e.,
uncollapsed.
Sequences determined to be transcribed on the positive strand are shown in blue, and those on the negative strand in orange. Sequences whose direction of transcription could not be determined are shown in black.
The score of each alignment is the number of genomic locations to which the read aligned; for example, a score of two means that the read aligned to two different locations in the genome.
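The coloring and scoring conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout `(read_id, chrom, start, strand)`, the use of `"."` for an undetermined strand, and the function name `annotate_alignments` are hypothetical and do not reflect the track's actual file format.

```python
from collections import Counter

# Display colors described for the Alignments view; any strand other
# than "+" or "-" is treated as undetermined (an assumption of this sketch).
STRAND_COLORS = {"+": "blue", "-": "orange"}


def annotate_alignments(alignments):
    """Attach a score and a display color to each alignment.

    `alignments` is a list of (read_id, chrom, start, strand) tuples,
    a hypothetical layout. The score is the number of genomic locations
    at which the same read was aligned.
    """
    hits = Counter(read_id for read_id, _chrom, _start, _strand in alignments)
    return [
        (read_id, chrom, start, strand, hits[read_id],
         STRAND_COLORS.get(strand, "black"))
        for read_id, chrom, start, strand in alignments
    ]
```

A read that appears at two locations receives a score of two on both of its alignments, matching the definition given above.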
Methods
Small RNAs of 20-200 nt were RiboMinus-treated according to the
manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal
RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs).
The RNA was treated with Tobacco Alkaline Pyrophosphatase to eliminate any
5′ cap structures.
Poly-A Polymerase was used to catalyze the addition of C's to the 3′
end. The 5′ ends were phosphorylated using T4 PNK and an RNA linker was
ligated onto the 5′ end. Reverse transcription was carried out using a
poly-G oligo with a defined 5′ extension. The inserts were then amplified
using oligos targeting the 5′ linker and poly-G extension and containing
sequencing adapters. The library was sequenced on an Illumina GA machine for a
total of 36, 50 or 76 cycles. Initially, one lane is run. If an appreciable
number of mappable reads is obtained, additional lanes are run. Sequence reads
underwent quality filtration using the standard Illumina pipeline (GERALD).
The read lengths may exceed the insert sizes and consequently introduce
3′ adaptor sequence into the 3′ end of the reads. The 3′
sequencing adaptor was removed from the reads using a custom clipper program,
which aligned the adaptor sequence to the short reads, allowing up to 2
mismatches and no indels. Regions that aligned were "clipped" off from the
read. Identical trimmed reads were then collapsed, their counts were noted,
and the collapsed reads were aligned to the human genome (NCBI build 36, hg18
unmasked) using Nexalign (Lassmann et al., not published). The alignment
parameters are tuned to tolerate up to 2 mismatches with no indels and allow
trimmed portions as small as 5 nucleotides to be mapped. We report reads that
mapped 10 or fewer times.
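The clipping and collapsing steps can be illustrated with a minimal Python sketch. It is not the custom clipper program used for this track: the function names, the `min_overlap` safeguard, the leftmost-match policy and the placeholder adaptor sequence in the usage note are all assumptions made for illustration. Only the limits of 2 mismatches, no indels and a 5-nt minimum trimmed length come from the description above.

```python
from collections import Counter


def clip_adapter(read, adapter, max_mismatches=2, min_overlap=8):
    """Clip a 3' adaptor from a read; return (trimmed_read, was_clipped).

    The comparison is ungapped (no indels). `min_overlap` is a
    hypothetical safeguard against clipping on very short chance matches.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        # zip stops at the shorter sequence, so exactly `overlap`
        # positions are compared against the start of the adaptor.
        mismatches = sum(
            r != a for r, a in zip(read[start:start + overlap], adapter)
        )
        if mismatches <= max_mismatches:
            return read[:start], True  # leftmost acceptable match wins
    return read, False


def collapse_trimmed(reads, adapter, min_length=5):
    """Clip every read, then collapse identical trimmed sequences.

    Returns a Counter mapping each trimmed sequence to its multiplicity.
    Trimmed reads shorter than `min_length` are dropped.
    """
    counts = Counter()
    for read in reads:
        trimmed, _was_clipped = clip_adapter(read, adapter)
        if len(trimmed) >= min_length:
            counts[trimmed] += 1
    return counts
```

For example, with the placeholder adaptor `"CTGTAGGCACCATCAAT"`, a read consisting of twenty T's followed by `"CTGTAGGCAC"` is trimmed back to the twenty T's, while a read of thirty T's is left unclipped.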
Note: Data obtained from each lane are processed and mapped
independently. The processed and mapped data from each lane are then compiled
into a single track without additional processing and submitted to UCSC.
Consequently, identical reads are collapsed only within a lane, and their
multiplicity is reported as the "transfrag" signal value. Redundancy between
lanes has not been eliminated, so the same transfrag may appear multiple times
within a track.
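Users who want a single value per transfrag can sum the per-lane entries themselves. The sketch below shows one way to do this in Python; the record layout `(chrom, start, end, strand, count)` and the function name `merge_lanes` are hypothetical and should be adapted to the actual fields of the downloaded files.

```python
from collections import Counter


def merge_lanes(records):
    """Sum signal values of transfrags that recur across lanes.

    `records` is an iterable of (chrom, start, end, strand, count)
    tuples, a hypothetical layout. Entries with the same coordinates
    and strand are treated as the same transfrag.
    """
    totals = Counter()
    for chrom, start, end, strand, count in records:
        totals[(chrom, start, end, strand)] += count
    return totals
```

Whether summing is appropriate depends on the analysis; the per-lane values may be preferable when comparing lanes with one another.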
Verification
Comparison of referential data generated from 8 individual sequencing lanes
(Illumina technology).
Credits
Hannon lab members: Katalin Fejes-Toth, Vihra Sotirova, Gordon Assaf, Jon Preall
And members of the Gingeras and Guigo labs.
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset until
nine months following the release of the dataset. This date is listed in
the Restricted Until column, above. The full data release policy
for ENCODE is available
here.