Mappability or Uniqueness of Reference Genome

Subtracks

Mappability             Sequence Size  Track Name
Broad Alignability      36bp           ENCODE Broad Alignability of 36mers with no more than 2 mismatches
CRG Alignability        36bp           CRG GEM Alignability of 36mers with no more than 2 mismatches
CRG Alignability        40bp           CRG GEM Alignability of 40mers with no more than 2 mismatches
CRG Alignability        50bp           CRG GEM Alignability of 50mers with no more than 2 mismatches
CRG Alignability        75bp           CRG GEM Alignability of 75mers with no more than 2 mismatches
CRG Alignability        100bp          CRG GEM Alignability of 100mers with no more than 2 mismatches
Duke Uniqueness         20bp           ENCODE Duke Uniqueness of 20bp sequences
Duke Uniqueness         24bp           ENCODE Duke Uniqueness of 24bp sequences
Duke Uniqueness         35bp           ENCODE Duke Uniqueness of 35bp sequences
Rosetta Uniqueness      35bp           Rosetta Uniqueness 35-mer Alignment (BWA/MAQ, unique alignment=37)
UMass Uniqueness        15bp           ENCODE UMass Uniqueness at 15bp
Duke Excluded Regions   Varied         ENCODE Duke Excluded Regions
    

Description

These tracks display the level of sequence uniqueness of the reference NCBI36/hg18 genome assembly. They were generated with different methods and window sizes. High signal marks areas where the sequence is unique.

Methods

The Broad alignability track displays whether a region is made up of mostly unique or mostly non-unique sequence. To generate the track, every 36-mer in the genome was marked as "unique" if the most similar 36-mer elsewhere in the genome has at most 2 mismatches, and as "non-unique" otherwise. Position X in the alignability track is set to 1 if more than 50% of the bases in [X-200, X+200] are "unique", and to 0 otherwise. Every point in the alignability track has a corresponding position in each of the ChIP signal tracks. The Broad alignability track was generated for the ENCODE project as a tool for developing the Broad Histone tracks.
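The windowing rule above can be sketched as follows. This is a minimal illustration, not the Broad pipeline. The function name `broad_like_alignability` is hypothetical. The input is assumed to be a list of per-base "unique" flags (flag i describes the 36-mer starting at base i). Windows are assumed to be clipped at the sequence ends, which the description does not specify.

```python
def broad_like_alignability(unique, half_window=200):
    """Binary track: 1 where more than 50% of bases in [x - half_window, x + half_window] are unique.

    Assumptions (not from the source): unique[i] is True when the 36-mer starting
    at base i was classified as "unique", and windows are clipped at the ends.
    """
    # Prefix sums give the number of unique bases in any window in O(1).
    prefix = [0]
    for flag in unique:
        prefix.append(prefix[-1] + (1 if flag else 0))

    n = len(unique)
    track = []
    for x in range(n):
        lo = max(0, x - half_window)
        hi = min(n - 1, x + half_window)
        unique_count = prefix[hi + 1] - prefix[lo]
        width = hi - lo + 1
        # Strictly more than half of the window must be unique.
        track.append(1 if 2 * unique_count > width else 0)
    return track


flags = [True] * 3 + [False] * 3
print(broad_like_alignability(flags, half_window=1))  # [1, 1, 1, 0, 0, 0]
```

The prefix-sum array keeps the pass linear in genome length. This matters for a 400 bp window slid across every base.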

The Duke uniqueness tracks display how unique each positive-strand sequence of a particular length, starting at a particular base, is. Thus, the 20 bp track reflects the uniqueness of every 20-base sequence, with the score assigned to the first base of the sequence. Scores are normalized to between 0 and 1. A score of 1 represents a completely unique sequence, and 0 represents a sequence that occurs more than 4 times in the genome (excluding chrN_random and alternative haplotypes). A score of 0.5 indicates that the sequence occurs exactly twice, 0.33 that it occurs three times, and 0.25 that it occurs four times. The Duke uniqueness tracks were generated for the ENCODE project as tools in the development of the Open Chromatin tracks.
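The score mapping described above (1/N for up to four occurrences, 0 beyond that) can be sketched on a toy sequence. The function names are hypothetical. The sketch makes three assumptions:

- Only positive-strand occurrences are counted.
- The input is a single contiguous sequence.
- chrN_random and alternative haplotypes have already been left out.

```python
from collections import Counter


def duke_like_score(occurrences: int) -> float:
    """Map an occurrence count N to a Duke-style uniqueness score."""
    if occurrences < 1:
        raise ValueError("a sequence taken from the genome occurs at least once")
    # More than 4 occurrences collapses to 0; otherwise the score is 1/N.
    return 0.0 if occurrences > 4 else 1.0 / occurrences


def duke_like_track(genome: str, k: int) -> list[float]:
    """Score every k-mer, assigning the score to the k-mer's first base.

    Assumption: only positive-strand occurrences are counted.
    """
    genome = genome.upper()
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [duke_like_score(counts[kmer]) for kmer in kmers]


print(duke_like_track("ACGAAACG", k=3))  # [0.5, 1.0, 1.0, 1.0, 1.0, 0.5]
```

In the usage line, "ACG" occurs twice, so both of its start positions score 0.5. Every other 3-mer occurs once and scores 1.0.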

The Duke excluded regions track displays genomic regions for which mapped sequence tags were filtered out before signal generation and peak calling for Duke/UNC/UTA's Open Chromatin tracks. This track contains problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). The Duke excluded regions track was generated for the ENCODE project.

The Rosetta uniqueness track uses 35 bp sequence "tiles". Each tile was aligned to the genome using the BWA aligner. Tiles that align perfectly to a single location in hg18 receive a p-value of 1e-37. Tiles that align perfectly to multiple locations receive a p-value of 1. For each tile, the oligo (tile) midpoint coordinate was recorded along with the -log10 p-value, which ranges from 37 (unambiguous) to 0 (ambiguous). The Rosetta uniqueness track was generated independently of the ENCODE project.
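The per-tile bookkeeping can be sketched as follows. The function and constant names are hypothetical. Because the recorded value is -log10 p and ambiguous tiles score 0, the sketch gives ambiguous tiles p = 1. It also assumes 0-based tile start coordinates and that the number of perfect hits has already been obtained from an aligner such as BWA. The alignment step itself is not modeled.

```python
import math

UNIQUE_P = 1e-37    # perfect alignment to a single location
AMBIGUOUS_P = 1.0   # perfect alignment to multiple locations; -log10(1) == 0


def rosetta_like_tile(tile_start: int, perfect_hits: int, tile_len: int = 35):
    """Return (midpoint coordinate, -log10 p) for one tile.

    Assumptions: tile_start is 0-based, and perfect_hits is the number of
    genomic locations where the tile aligns with no mismatches.
    """
    if perfect_hits < 1:
        raise ValueError("a tile cut from the genome aligns perfectly at least once")
    p = UNIQUE_P if perfect_hits == 1 else AMBIGUOUS_P
    midpoint = tile_start + tile_len // 2   # base 17 of a 35 bp tile
    # abs() turns the -0.0 produced for p = 1 into 0.0
    return midpoint, abs(-math.log10(p))


print(rosetta_like_tile(100, perfect_hits=1))
print(rosetta_like_tile(100, perfect_hits=3))  # (117, 0.0)
```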

The UMass uniqueness track displays a uniqueness signal for each base. The signal is based on N, the number of genome-wide occurrences of the 15-mer starting at that base (read 5'->3' on the plus strand), summed over both plus-strand and minus-strand occurrences. Scores are normalized between 0 and 1 by calculating 1/N. A score of 1 represents a single genome-wide occurrence of that 15-mer. A score of 0.5 represents either 2 plus-strand occurrences or 1 plus-strand and 1 minus-strand occurrence, and so on. Ratios are rounded to 3 decimal places, so a score of 0.000 represents more than 2000 occurrences. A score of 0 is reserved for positions X whose 15-mer is either not assembled or contains at least one N. The UMass uniqueness track was generated for the ENCODE project.
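The strand-summed 1/N score can be sketched on a toy sequence. The function names are hypothetical. The sketch treats the input as a single contiguous plus-strand sequence and counts a minus-strand occurrence as a plus-strand occurrence of the reverse complement. For odd k such as 15, no k-mer equals its own reverse complement, so nothing is counted twice.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def umass_like_scores(genome: str, k: int = 15) -> list[float]:
    """Score each start position X as round(1/N, 3).

    N counts occurrences of the k-mer at X on both strands. K-mers containing
    N score 0.
    """
    genome = genome.upper()
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    scores = []
    for kmer in kmers:
        if "N" in kmer:
            scores.append(0.0)  # reserved value for k-mers containing N
            continue
        # Plus-strand hits of the k-mer plus plus-strand hits of its reverse complement.
        n = counts[kmer] + counts[revcomp(kmer)]
        scores.append(round(1.0 / n, 3))  # 3 decimal places: N > 2000 gives 0.0
    return scores


print(umass_like_scores("AAATTT", k=3))  # [0.5, 0.5, 0.5, 0.5]
```

In the usage line, each 3-mer occurs once on the plus strand and once more as the reverse complement of another 3-mer ("AAA"/"TTT", "AAT"/"ATT"), so every position scores 0.5.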

The CRG Alignability tracks display how uniquely k-mer sequences align to a region of the genome. The data were generated with the GEM-mappability program. The method is equivalent to mapping sliding windows of k-mers (with k set to 36, 40, 50, 75 or 100 nt for these tracks) back to the genome with the GEM mapper aligner, allowing up to 2 mismatches. For each window, a mappability score was computed as S = 1/(number of matches found in the genome). S = 1 means one match in the genome, S = 0.5 means two matches, and so on. The CRG Alignability tracks were generated independently of the ENCODE project, in the framework of the GEM (GEnome Multitool) project.
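The score definition S = 1/(number of matches) can be illustrated with a brute-force sketch. This is not GEM, which uses a far faster indexed search. The function names are hypothetical, and the sketch assumes the following:

- Only the plus strand is searched.
- Only mismatches are allowed (no indels).
- The input is a single contiguous sequence.

It is quadratic in sequence length and only suitable for toy inputs.

```python
def within_mismatches(a: str, b: str, max_mismatches: int) -> bool:
    """True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def crg_like_mappability(genome: str, k: int, max_mismatches: int = 2) -> list[float]:
    """Brute-force S = 1 / (number of genome matches) for every sliding k-mer window."""
    genome = genome.upper()
    windows = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    scores = []
    for window in windows:
        # Every window matches at least itself, so the count is never zero.
        matches = sum(1 for other in windows if within_mismatches(window, other, max_mismatches))
        scores.append(1.0 / matches)
    return scores


print(crg_like_mappability("AAAACCCC", k=4, max_mismatches=1))
```

With one mismatch allowed, the end windows "AAAA" and "CCCC" each have one near neighbour and score 0.5. The three inner windows each have two and score 1/3. With max_mismatches=0 all five windows are distinct, so every score is 1.0.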

Credits

The Broad alignability track was created by the Broad Institute. Data generation and analysis were supported by funds from the NHGRI (the ENCODE project), the Burroughs Wellcome Fund, Massachusetts General Hospital and the Broad Institute.

The Duke uniqueness and Duke excluded regions tracks were created by Terry Furey and Debbie Winter at Duke University's Institute for Genome Sciences & Policy (IGSP), and Stefan Graf at the European Bioinformatics Institute (EBI). We thank NHGRI for ENCODE funding support.

The Rosetta uniqueness track was created by John Castle, at Rosetta Inpharmatics (Merck), with assistance from Melissa Cline at UCSC.

The UMass uniqueness track was created by Bryan Lajoie in Job Dekker's Lab at the University of Massachusetts Medical School. Funding Support: NIH grant HG003143 to JD. Keck Distinguished Young Scholar Award to JD. This track was generated as part of the ENCODE project funded by the NHGRI.

The CRG Alignability track was created by Thomas Derrien and Paolo Ribeca in Roderic Guigo's lab at the Centre for Genomic Regulation (CRG), Barcelona, Spain. Thomas Derrien was supported by funds from NHGRI for the ENCODE project, while Paolo Ribeca was funded by a Consolider grant CDS2007-00050 from the Spanish Ministerio de Educación y Ciencia.

References

Derrien T, Estelle J, Marco Sola S, Knowles DG, Raineri E, Guigo R, Ribeca P. Fast computation and applications of genome mappability. PLoS One. 2012;7(1):e30377.

Data Release Policy

Data users may freely use all data in this track. ENCODE labs that contributed annotations have exempted the data displayed here from the ENCODE data release policy restrictions.