Open Chromatin Track Settings
 
ENCODE Open Chromatin, Duke/UNC/UT

Select subtracks by experiment and cell line:
 All Experiment DNase-seq  FAIRE-seq  ChIP-seq CTCF  ChIP-seq c-Myc  ChIP-seq Pol2  Input Control 
Cell Line
AoSMC Serum Free 
Chorion 
Fibrobl 
FibroP 
GM12878 Tier1 
GM12891 
GM12892 
GM18507 
GM19238 
GM19239 
GM19240 
H1-hESC Tier1 
H9-hESC 
HeLa-S3 Tier2 
HeLa-S3 IFNα 
HeLa-S3 IFNγ 
HepG2 Tier2 
HSMM 
HSMMtube 
HUVEC Tier2 
K562 Tier1 
LHSR 
LHSR androgen 
MCF-7 
Medullo 
Melano 
Myometr 
NHBE 
NHEK 
PanIslets 
ProgFib 
Subtracks:
Cell Line  Experiment  View  Track Name  Restricted Until
GM12878 Tier1  DNase-seq  Peaks  ENCODE Open Chromatin, Duke DNase-seq Peaks (in GM12878 cells)  2009-12-20
GM12878 Tier1  DNase-seq  Signal (F-Seq Density)  ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in GM12878 cells)  2009-11-27
GM12878 Tier1  DNase-seq  Signal (Base Overlap)  ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in GM12878 cells)  2009-11-27
GM12878 Tier1  FAIRE-seq  Peaks  ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in GM12878 cells)  2010-01-20
GM12878 Tier1  FAIRE-seq  Signal (F-Seq Density)  ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in GM12878 cells)  2009-11-25
GM12878 Tier1  FAIRE-seq  Signal (Base Overlap)  ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in GM12878 cells)  2009-11-25
GM12878 Tier1  ChIP-seq CTCF  Peaks  ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in GM12878 cells)  2009-12-20
GM12878 Tier1  ChIP-seq CTCF  Signal (F-Seq Density)  ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in GM12878 cells)  2009-11-24
GM12878 Tier1  ChIP-seq CTCF  Signal (Base Overlap)  ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in GM12878 cells)  2009-11-24
GM12878 Tier1  ChIP-seq c-Myc  Peaks  ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in GM12878 cells)  2010-06-08
GM12878 Tier1  ChIP-seq c-Myc  Signal (F-Seq Density)  ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in GM12878 cells)  2010-06-08
GM12878 Tier1  ChIP-seq c-Myc  Signal (Base Overlap)  ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in GM12878 cells)  2010-06-08
GM12878 Tier1  ChIP-seq Pol2  Peaks  ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in GM12878 cells)  2010-09-22
GM12878 Tier1  ChIP-seq Pol2  Signal (F-Seq Density)  ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in GM12878 cells)  2010-09-22
GM12878 Tier1  ChIP-seq Pol2  Signal (Base Overlap)  ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in GM12878 cells)  2010-09-22
GM12878 Tier1  Input Control  Signal (F-Seq Density)  ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in GM12878 cells)  2009-07-07
K562 Tier1  DNase-seq  Peaks  ENCODE Open Chromatin, Duke DNase-seq Peaks (in K562 cells)  2009-12-20
K562 Tier1  DNase-seq  Signal (F-Seq Density)  ENCODE Open Chromatin, Duke DNase-seq F-Seq Density Signal (in K562 cells)  2009-11-26
K562 Tier1  DNase-seq  Signal (Base Overlap)  ENCODE Open Chromatin, Duke DNase-seq Base Overlap Signal (in K562 cells)  2009-08-09
K562 Tier1  FAIRE-seq  Peaks  ENCODE Open Chromatin, UNC FAIRE-seq Peaks (in K562 cells)  2010-01-20
K562 Tier1  FAIRE-seq  Signal (F-Seq Density)  ENCODE Open Chromatin, UNC FAIRE-seq F-Seq Density Signal (in K562 cells)  2009-11-26
K562 Tier1  FAIRE-seq  Signal (Base Overlap)  ENCODE Open Chromatin, UNC FAIRE-seq Base Overlap Signal (in K562 cells)  2009-08-09
K562 Tier1  ChIP-seq CTCF  Peaks  ENCODE Open Chromatin, UT ChIP-seq Peaks (CTCF in K562 cells)  2009-12-20
K562 Tier1  ChIP-seq CTCF  Signal (F-Seq Density)  ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (CTCF in K562 cells)  2009-11-27
K562 Tier1  ChIP-seq CTCF  Signal (Base Overlap)  ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (CTCF in K562 cells)  2009-11-27
K562 Tier1  ChIP-seq c-Myc  Peaks  ENCODE Open Chromatin, UT ChIP-seq Peaks (c-Myc in K562 cells)  2009-12-20
K562 Tier1  ChIP-seq c-Myc  Signal (F-Seq Density)  ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (c-Myc in K562 cells)  2009-11-27
K562 Tier1  ChIP-seq c-Myc  Signal (Base Overlap)  ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (c-Myc in K562 cells)  2009-11-27
K562 Tier1  ChIP-seq Pol2  Peaks  ENCODE Open Chromatin, UT ChIP-seq Peaks (Pol2 in K562 cells)  2010-06-29
K562 Tier1  ChIP-seq Pol2  Signal (F-Seq Density)  ENCODE Open Chromatin, UT ChIP-seq F-Seq Density Signal (Pol2 in K562 cells)  2010-06-29
K562 Tier1  ChIP-seq Pol2  Signal (Base Overlap)  ENCODE Open Chromatin, UT ChIP-seq Base Overlap Signal (Pol2 in K562 cells)  2010-06-29
K562 Tier1  Input Control  Signal (F-Seq Density)  ENCODE Open Chromatin, UT Input F-Seq Density Signal (Input in K562 cells)  2009-08-05

Description

These tracks display evidence of open chromatin in multiple cell types from the Duke/UNC/UT-Austin/EBI ENCODE group. Open chromatin was identified using two independent and complementary methods: DNaseI hypersensitivity (HS) and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE), combined with chromatin immunoprecipitation (ChIP) for select regulatory factors. Each method was verified by two detection platforms: Illumina (formerly Solexa) sequencing by synthesis, and high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen.

DNaseI HS data: DNaseI is an enzyme that has long been used to map general chromatin accessibility, and DNaseI "hyperaccessibility" or "hypersensitivity" is a feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, silencers, insulators, promoters, locus control regions and novel elements. DNaseI hypersensitivity signifies chromatin accessibility following binding of trans-acting factors in place of a canonical nucleosome.

FAIRE data: FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) is a method to isolate and identify nucleosome-depleted regions of the genome. FAIRE was initially discovered in yeast and subsequently shown to identify active regulatory elements in human cells (Giresi et al., 2007). Although less well-characterized than DNase, FAIRE also appears to identify functional regulatory elements that include enhancers, silencers, insulators, promoters, locus control regions and novel elements. DNA fragments isolated by FAIRE are 100-200 bp in length, with the average length being 140 bp.

ChIP data: ChIP (Chromatin Immunoprecipitation) is a method to identify the specific location of proteins that are directly or indirectly bound to genomic DNA. By identifying the binding location of sequence-specific transcription factors, general transcription machinery components, and chromatin factors, ChIP can help in the functional annotation of the open chromatin regions identified by DNaseI HS mapping and FAIRE.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. Chromatin data displayed here represents a continuum of signal intensities. The Crawford lab recommends setting the "Data view scaling: auto-scale" option when viewing signal data in full mode. In general, for each experiment in each of the cell types, the Open Chromatin tracks contain the following views:

Peaks
Regions of enriched signal in either DNaseI HS, FAIRE, or ChIP experiments. Peaks were called based on signals created using F-Seq, a software program developed at Duke (Boyle et al., 2008b). Significant regions were determined by performing ROC analysis of sequence data using data from the 1% ENCODE arrays, and determining a cut-off value at approximately the 95% sensitivity level. The solid vertical line in the peak represents the point with highest signal. ENCODE Peaks tables contain a p-value for statistical significance. For these data, this was determined by fitting the data to a gamma distribution.
Peaks (Zinba)
Enriched regions for FAIRE data were called using ZINBA (Zero Inflated Negative Binomial Algorithm). ZINBA is a flexible statistical method that uses a generalized linear model to select genomic windows with enriched sequence counts after adjusting for relevant confounding factors such as mappability, GC content, and copy number alterations. Significant regions are selected using the set of standardized residuals below a false discovery rate (qvalue) threshold. Peaks were further refined using a shape detection algorithm to identify local maxima and boundaries of the Signal (Base Overlap) data within each significant region.
Signal (F-Seq Density)
Density graph (wiggle) of signal enrichment calculated using F-Seq for the combined set of sequences from all replicates. F-Seq employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008b). This method does not look at fixed-length windows but rather weights contributions of nearby sequences in proportion to their distance from that base. It considers only sequences that align to 4 or fewer locations in the genome, and uses an alignability background model to try to correct for regions where sequences cannot be aligned. For the K562, HepG2 and HeLa-S3 cell types, where there is an abnormal karyotype, a model to try to correct for amplifications and deletions was also used. No control data were used in the creation of these annotations.
Signal (Base Overlap)
An alternative version of the Signal (F-Seq Density) track annotation that provides a higher resolution view of the raw sequence data. This track also includes the combined set of sequences from all replicates. For each sequence, the aligned read is extended in the following way: for DNase, the read is extended 5 bp in both directions from its 5' aligned end where DNase cut the DNA; for FAIRE and ChIP, the sequence is extended to a fragment length of 134 bp from the 5' aligned end, representing the approximate average fragment length. The score at each base pair represents the number of extended fragments that overlap the base pair.
Alignments
Mappings of short reads to the genome (currently only available for download).
Additional data that were used to generate these tracks are located in the ENCODE Mappability track:
Uniqueness
The Duke uniqueness tracks were used to identify regions of unique sequence for different tag lengths. The tracks also identify regions where high-throughput sequence tags cannot be mapped.
Excluded Regions
The Duke excluded regions track was used to identify problematic regions for short sequence tag signal detection (such as satellites and rRNA genes). These regions of the genome were excluded from the Open Chromatin tracks.
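The read extension described for the Signal (Base Overlap) view can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function names, the (position, strand) read representation, and the choice to count the cut site itself in the DNase window (giving 11 bp) are assumptions made for illustration.

```python
from collections import Counter

def extend_read(five_prime, strand, assay, fragment_len=134, dnase_flank=5):
    """Return the half-open interval [start, end) covered by one extended read.

    five_prime is the genomic coordinate of the read's 5' aligned end.
    """
    if assay == "DNase":
        # Extend in both directions from the 5' end (the DNase cut site).
        # Assumption: the cut-site base itself is included in the window.
        return five_prime - dnase_flank, five_prime + dnase_flank + 1
    # FAIRE and ChIP: extend to the approximate fragment length,
    # moving away from the 5' end in the direction of the read's strand.
    if strand == "+":
        return five_prime, five_prime + fragment_len
    return five_prime - fragment_len + 1, five_prime + 1

def base_overlap(reads, assay):
    """Count, for each base, how many extended fragments overlap it."""
    scores = Counter()
    for five_prime, strand in reads:
        start, end = extend_read(five_prime, strand, assay)
        for pos in range(max(start, 0), end):
            scores[pos] += 1
    return scores

# Two DNase reads whose 11 bp windows overlap between positions 98 and 105.
dnase_scores = base_overlap([(100, "+"), (103, "+")], "DNase")
```

A per-base counter keeps the sketch short; a genome-scale implementation would more likely accumulate interval start and end events and take a running sum.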

Methods

Cells were grown according to the approved ENCODE cell culture protocols.

DNaseI hypersensitive sites were isolated using methods called DNase-seq or DNase-chip (Boyle et al., 2008a, Crawford et al., 2006). Briefly, cells were lysed with NP40, and intact nuclei were digested with optimal levels of DNaseI enzyme. DNaseI digested ends were captured from three different DNase concentrations, and material was sequenced using Illumina (Solexa) sequencing. DNase-seq data were verified using material that was hybridized to NimbleGen Human ENCODE tiling arrays (1% of the genome). Multiple independent growths (replicates) were compared to verify the reproducibility of the data. A more detailed protocol is available here.

FAIRE was performed (Giresi et al., 2007) by cross-linking proteins to DNA using 1% formaldehyde solution, and the complex was sheared using sonication. Phenol/chloroform extractions were performed to remove DNA fragments cross-linked to protein. The DNA recovered in the aqueous phase was hybridized to NimbleGen Human ENCODE tiling arrays (1% of the genome) and sequenced using a Solexa sequencing system. The ENCODE array data were used to verify the accuracy of the sequencing data, and multiple independent growths (replicates) were compared to assess the reproducibility of the data. A more detailed protocol is available here. Also see Giresi et al., 2009.

To perform ChIP, proteins were cross-linked to DNA in vivo using 1% formaldehyde solution (Bhinge et al., 2007, ENCODE Project Consortium, 2007). Cross-linked chromatin was sheared by sonication and immunoprecipitated using a specific antibody against the protein of interest. After reversal of the cross-links, the immunoprecipitated DNA was used to identify the genomic location of transcription factor binding. This was accomplished by Solexa sequencing of the ends of the immunoprecipitated DNA (ChIP-seq), as well as labeling and hybridization of the immunoprecipitated DNA to NimbleGen Human ENCODE tiling arrays (1% of the genome) along with the input DNA as reference (ChIP-chip). The ENCODE array data were used to verify the accuracy of the sequencing data, and multiple independent growths (replicates) were compared to assess the reproducibility of the data. A more detailed protocol is available here.

ENCODE Array data were normalized using the Tukey biweight normalization, and peaks were called using ChIPOTle (Buck et al., 2005) at multiple levels of significance. Regions matched on size to these peaks that were devoid of any significant signal were also created to allow for ROC analysis.
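As a rough illustration of how array-derived peaks and matched non-peak regions can set a cut-off at about the 95% sensitivity level, the following Python sketch picks the largest score threshold that still recovers the target fraction of array peaks. The function names and the idea of one sequence-based score per region are assumptions for illustration, not the procedure actually used to produce these tracks.

```python
import math

def cutoff_at_sensitivity(positive_scores, target=0.95):
    """Largest cut-off at which at least `target` of the positives score >= cut-off."""
    ranked = sorted(positive_scores, reverse=True)
    needed = math.ceil(target * len(ranked))
    return ranked[needed - 1]

def sensitivity_specificity(positive_scores, negative_scores, cutoff):
    """One ROC operating point: regions scoring >= cutoff are called as peaks."""
    true_pos = sum(score >= cutoff for score in positive_scores)
    true_neg = sum(score < cutoff for score in negative_scores)
    return true_pos / len(positive_scores), true_neg / len(negative_scores)

# Hypothetical scores: 20 array-confirmed peaks and 5 matched non-peak regions.
array_peak_scores = list(range(1, 21))
non_peak_scores = [0, 0, 1, 1, 3]

cutoff = cutoff_at_sensitivity(array_peak_scores)
sens, spec = sensitivity_specificity(array_peak_scores, non_peak_scores, cutoff)
```

Sweeping the cut-off over all observed scores and recording each (sensitivity, specificity) pair would trace out the full ROC curve.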

Sequences from each experiment were aligned to the genome using Maq (Li et al., 2008) and those that aligned to 4 or fewer locations were retained. Other sequences were also filtered based on their alignment to problematic regions (such as satellites and rRNA genes). The resulting digital signal was converted to a continuous wiggle track using F-Seq that employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008b). Discrete DNase HS, FAIRE, and ChIP sites (peaks) were identified from DNase/FAIRE/ChIP-seq using F-Seq by setting a Parzen cutoff based on ROC curve analysis using peaks and non-peaks identified from DNase/FAIRE/ChIP-chip using NimbleGen Human ENCODE tiling arrays (1% of the genome).
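The idea behind a Parzen (kernel) density estimate over aligned read positions can be sketched in a few lines of Python. This is a simplified illustration only: the Gaussian kernel, the bandwidth value, and the function name are assumptions, and the sketch omits the alignability and copy-number background models described above.

```python
import math

def kernel_density(read_positions, start, end, bandwidth=100.0):
    """Gaussian-kernel density estimate at each base in [start, end).

    Each read contributes to nearby bases with a weight that falls off
    smoothly with distance, so no fixed-length window is needed.
    """
    if not read_positions:
        return [0.0] * (end - start)
    norm = 1.0 / (len(read_positions) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for base in range(start, end):
        total = 0.0
        for pos in read_positions:
            z = (base - pos) / bandwidth
            total += math.exp(-0.5 * z * z)
        density.append(norm * total)
    return density

# A single read at position 50 gives a symmetric bump centred on that base.
scores = kernel_density([50], 0, 101, bandwidth=10.0)
```

Calling peaks then amounts to keeping runs of bases whose score exceeds the chosen cut-off. The nested loop is quadratic and is only meant to make the weighting explicit.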

Input data were generated for GM12878, K562, HeLa-S3, HepG2, and HUVEC. These were used directly to create a control/background model used for F-Seq when generating signal annotations and subsequently peaks for these cell lines. These models are meant to correct for sequencing biases, alignment artifacts, and copy number changes in these cell lines. Input data are not being generated directly for other cell lines. Instead, a general background model was derived from the five Input data sets. This should provide corrections for sequencing biases and alignment artifacts, but not for cell-type-specific copy number changes.

Release Notes

This is Release 3 (Mar 2010) of this track, which includes 18 new cell line or cell/treatment experiments. In addition, a number of new experiments were added to existing cell lines. Almost all Peaks have been called anew using improved cut-offs and p-Values. Finally, a second type of peak called using a ZINBA algorithm has been provided for several of the FAIRE-seq experiments. For all new versions of previously-released data, the affected database tables and files include 'V2' or 'V3' in the name, and metadata is marked with "submittedDataVersion=V", followed by a number and reason for replacement. Previous versions of these files are available for download from the FTP site.

Credits

These data and annotations were created by a collaboration of multiple institutions (contact: Terry Furey):

We thank NHGRI for ENCODE funding support.

References

Bhinge AA, Kim J, Euskirchen GM, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6.

Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008 Jan 25;132(2):311-22.

Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8.

Buck MJ, Nobel AB, Lieb JD. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol. 2005;6(11):R97.

Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R, Meltzer PS, Wolfsberg TG, Collins FS. DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods. 2006 Jul;3(7):503-9.

Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan;16(1):123-31.

The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.

Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. 2007 Jun;17(6):877-85.

Giresi PG, Lieb JD. Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). Methods. 2009 Jul;48(3):233-9.

Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008 Nov;18(11):1851-8.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.
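For readers who want to check a restriction date themselves, the nine-month window can be computed with a short Python sketch. It assumes that "nine months" means calendar months and that a day missing from the target month is clamped to that month's last day; both are illustrative assumptions rather than part of the policy, and the release date used below is hypothetical.

```python
import calendar
from datetime import date

def restricted_until(release, months=9):
    """Date `months` calendar months after `release`, clamping the day if needed."""
    month_index = release.month - 1 + months
    year = release.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(release.day, last_day))

# Hypothetical release date, not taken from the subtrack list.
embargo_end = restricted_until(date(2011, 1, 15))
```

The Restricted Until column on the track configuration and download pages remains the authoritative source for each dataset.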