CSHL Sm RNA-seq Track Settings
 
ENCODE Cold Spring Harbor Labs Small RNA-seq   (All Expression tracks)

Cell Line  Localization  View              Track Name                                                                     Restricted Until
GM12878    Cytosol       Transfrags        ENCODE CSHL RNA-seq Transfrags (small RNA in GM12878 cytosol)                  2010-06-23
GM12878    Cytosol       Plus Raw Signal   ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in GM12878 cytosol)      2010-06-23
GM12878    Cytosol       Minus Raw Signal  ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in GM12878 cytosol)     2010-06-23
GM12878    Cytosol       Alignments        ENCODE CSHL RNA-seq Tags (small RNA in GM12878 cytosol)                        2010-06-23
K562       Cytosol       Transfrags        ENCODE CSHL RNA-seq Transfrags (small RNA in K562 cytosol)                     2010-06-23
K562       Cytosol       Plus Raw Signal   ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in K562 cytosol)         2010-06-23
K562       Cytosol       Minus Raw Signal  ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in K562 cytosol)        2010-06-23
K562       Cytosol       Alignments        ENCODE CSHL RNA-seq Tags (small RNA in K562 cytosol)                           2010-06-23
Prostate   Cell          Transfrags        ENCODE CSHL RNA-seq Transfrags (small RNA in Prostate cell)                    2010-06-23
Prostate   Cell          Plus Raw Signal   ENCODE CSHL RNA-seq Plus Strand Raw Signal (small RNA in Prostate cell)        2010-06-23
Prostate   Cell          Minus Raw Signal  ENCODE CSHL RNA-seq Minus Strand Raw Signal (small RNA in Prostate cell)       2010-06-23
Prostate   Cell          Alignments        ENCODE CSHL RNA-seq Tags (small RNA in Prostate cell)                          2010-06-23

Description

This track depicts next-generation sequencing data for RNAs 20-200 nt in length, isolated from tissues or subcellular compartments of ENCODE cell lines. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome.

This cloning protocol generates directional libraries that are read from the 5′ ends of the inserts, which should largely correspond to the 5′ ends of the mature RNAs. The libraries were sequenced on a Solexa platform for a total of 36, 50 or 76 cycles. However, the reads undergo post-processing that trims their 3′ ends; consequently, the mapped read lengths are variable.

Display Conventions and Configuration

To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide.

Color differences among the views are arbitrary. They provide a visual cue for distinguishing between the different cell types and compartments.

Transfrags
Identical reads were collapsed while maintaining their multiplicity information and reported as "transfrags". "Y" means that the transfrag underwent clipping prior to mapping. "N" indicates that the transfrag did not undergo clipping. The Transfrags view includes all transfrags before filtering.
Raw Signals
The Raw Signal views show the density of aligned tags on the plus and minus strands.
Alignments
The Alignments view shows reads mapped to the genome and indicates where bases may mismatch. Every mapped read is displayed, i.e. uncollapsed. Sequences determined to be transcribed on the positive strand are shown in blue. Sequences determined to be transcribed on the negative strand are shown in orange. Sequences for which the direction of transcription could not be determined are shown in black. The score of each alignment is the number of locations in the genome to which the read aligned; a score of two means that this particular read aligned to two different locations in the genome.
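As an illustration of the scoring and coloring convention for the Alignments view, the following is a minimal Python sketch. The tuple layout, the function name `score_alignments`, and the color labels are assumptions made for this example; they are not the format of the track files themselves.

```python
from collections import Counter

# Hypothetical color labels following the display convention described above.
STRAND_COLOR = {"+": "blue", "-": "orange"}

def score_alignments(alignments):
    """Annotate each alignment with a score and a display color.

    alignments: list of (read_id, chrom, start, strand) tuples, one per
    genomic location at which a read aligned (assumed layout).
    """
    # The score of a read is the number of locations it aligned to.
    hits_per_read = Counter(read_id for read_id, _, _, _ in alignments)
    scored = []
    for read_id, chrom, start, strand in alignments:
        scored.append({
            "read_id": read_id,
            "chrom": chrom,
            "start": start,
            "strand": strand,
            "score": hits_per_read[read_id],
            # Any strand other than "+" or "-" is treated as undetermined.
            "color": STRAND_COLOR.get(strand, "black"),
        })
    return scored

example = [
    ("read1", "chr1", 1000, "+"),
    ("read1", "chr7", 5200, "+"),
    ("read2", "chr2", 300, "-"),
    ("read3", "chrX", 42, "."),
]
for row in score_alignments(example):
    print(row["read_id"], row["chrom"], row["score"], row["color"])
```

Here `read1` aligns to two locations, so both of its alignments carry a score of two.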

Methods

Small RNAs of 20-200 nt were RiboMinus-treated according to the manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs). The RNA was treated with Tobacco Alkaline Pyrophosphatase to eliminate any 5′ cap structures.

Poly-A Polymerase was used to catalyze the addition of C's to the 3′ end. The 5′ ends were phosphorylated using T4 PNK and an RNA linker was ligated onto the 5′ end. Reverse transcription was carried out using a poly-G oligo with a defined 5′ extension. The inserts were then amplified using oligos that target the 5′ linker and the poly-G extension and that contain sequencing adapters. The library was sequenced on an Illumina GA machine for a total of 36, 50 or 76 cycles. Initially, one lane was run; if an appreciable number of mappable reads was obtained, additional lanes were run. Sequence reads underwent quality filtering using the standard Illumina pipeline (GERALD).

The read lengths may exceed the insert sizes, in which case 3′ adaptor sequence appears at the 3′ end of the reads. The 3′ sequencing adaptor was removed from the reads using a custom clipper program, which aligned the adaptor sequence to the short reads, allowing up to 2 mismatches and no indels. Regions that aligned were "clipped" off from the read. Identical trimmed reads were then collapsed, their counts were noted, and the collapsed reads were aligned to the human genome (NCBI build 36, hg18 unmasked) using Nexalign (Lassmann et al., not published). The alignment parameters were tuned to tolerate up to 2 mismatches with no indels and to allow trimmed reads as short as 5 nucleotides to be mapped. We report reads that mapped 10 or fewer times.
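The clipping and collapsing steps can be sketched in Python as follows. This is not the custom clipper program itself, whose details are not given here. The adaptor string is a placeholder, and the `min_overlap` guard, the leftmost-match rule, and the function names are assumptions made for illustration. Only the 2-mismatch, no-indel tolerance and the 5-nucleotide minimum come from the description above.

```python
from collections import Counter

# Placeholder adaptor sequence for illustration only (not the real adaptor).
ADAPTOR = "TCGTATGCCGTC"

def clip_adaptor(read, adaptor, max_mismatches=2, min_overlap=6):
    """Return (trimmed_read, was_clipped).

    The adaptor is slid along the read without gaps (no indels). At each
    offset, the part of the adaptor that overlaps the read is compared base
    by base. The leftmost offset with at most max_mismatches mismatches is
    taken as the adaptor start and the read is cut there. min_overlap is a
    hypothetical guard against chance matches of only a few 3' bases.
    """
    for offset in range(len(read)):
        overlap = min(len(adaptor), len(read) - offset)
        if overlap < min_overlap:
            break
        window = read[offset:offset + overlap]
        mismatches = sum(1 for a, b in zip(window, adaptor) if a != b)
        if mismatches <= max_mismatches:
            return read[:offset], True
    return read, False

def clip_and_collapse(reads, adaptor, min_length=5):
    """Clip every read, drop reads that end up too short, count identical ones."""
    counts = Counter()
    for read in reads:
        trimmed, _ = clip_adaptor(read, adaptor)
        if len(trimmed) >= min_length:
            counts[trimmed] += 1
    return counts

reads = [
    "ACGGTTCAGGCTAA" + ADAPTOR,      # insert followed by the full adaptor
    "ACGGTTCAGGCTAA" + "TCGTTTGCC",  # partial adaptor with one mismatch
    "GGCATTCCAGTTGACCATGA",          # no adaptor present
    "ACGGTTCAGGCTAA" + ADAPTOR,      # duplicate of the first read
    "ACG" + ADAPTOR,                 # insert too short after clipping
]
collapsed = clip_and_collapse(reads, ADAPTOR)
for sequence, count in collapsed.items():
    print(sequence, count)
```

Each key in `collapsed` corresponds to one collapsed read, and its count is the multiplicity that would accompany it into the alignment step.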

Note: Data obtained from each lane are processed and mapped independently. The processed and mapped data from each lane are then compiled into a single track without additional processing and submitted to UCSC. Consequently, identical reads within a lane were collapsed and their count is reported as the "transfrag" signal value. However, redundancy between lanes has not been eliminated, so the same transfrag may appear multiple times within a track.
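The consequence of per-lane processing can be illustrated with a short Python sketch. The lane names, sequences, and function names below are made up for the example. The point is only that collapsing happens within a lane, so a sequence present in two lanes yields two entries in the compiled track.

```python
from collections import Counter

def collapse_lane(reads):
    """Collapse identical reads within one lane, keeping their multiplicity."""
    return Counter(reads)

def compile_track(lanes):
    """Concatenate per-lane transfrags without merging across lanes."""
    track = []
    for lane_id, reads in lanes.items():
        for sequence, count in collapse_lane(reads).items():
            track.append((lane_id, sequence, count))
    return track

# Hypothetical lanes with made-up sequences.
lanes = {
    "lane1": ["ACGGTTCAGG", "ACGGTTCAGG", "TTGACCATGA"],
    "lane2": ["ACGGTTCAGG", "GGCATTCCAG"],
}
track = compile_track(lanes)
for lane_id, sequence, count in track:
    print(lane_id, sequence, count)
```

In this example `ACGGTTCAGG` appears once per lane, with a signal value of 2 from the first lane and 1 from the second. Anyone who needs a track-wide count would have to sum such entries themselves.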

Verification

Comparison of referential data generated from 8 individual sequencing lanes (Illumina technology).

Credits

Hannon lab members: Katalin Fejes-Toth, Vihra Sotirova, Gordon Assaf, Jon Preall

And members of the Gingeras and Guigo labs.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.