Data version: ENCODE Jan 2011 Freeze (Sept 2012 Analysis Pubs)
Description
The ENCODE Analysis Working Group (AWG) has performed uniform processing on datasets produced by
multiple data production groups in the ENCODE Consortium. This track represents a uniform set of
open chromatin elements (DNaseI hypersensitive sites) in 125 ENCODE cell types,
based on DNase-seq data produced by the "Open Chromatin" (Duke/UNC/UT-A) and University of
Washington (UW) ENCODE groups from the project inception in 2007 through the ENCODE January
2011 data freeze. The AWG uniform datasets are used in downstream analysis pipelines by members of
the ENCODE Consortium and are one of the primary sources of data referenced in the 2012 ENCODE
integrative analysis paper (ENCODE Project Consortium 2012). More information about the ENCODE
integrative analysis is here.
The primary and lab-processed data (along with methods descriptions, credits and references) on
which this track is based are available in the following ENCODE tracks:
The display for this track shows site location and signal value as grayscale-colored items where
higher signal values correspond to darker-colored blocks. The display can be filtered to show only
higher-valued items using the 'Minimum signal' configuration item.
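The effect of the 'Minimum signal' filter can be reproduced offline on a downloaded peak file. The following is a minimal sketch, assuming the peaks are stored in the ENCODE narrowPeak layout (tab-separated, with the signal value in the seventh column) and assuming that the filter keeps items whose signal is greater than or equal to the threshold. The names Peak, parse_narrowpeak and filter_min_signal are hypothetical helpers, not part of any UCSC tool.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    signal: float


def parse_narrowpeak(lines: Iterable[str]) -> Iterator[Peak]:
    """Yield peaks from narrowPeak-formatted lines (assumed layout)."""
    for line in lines:
        # Skip blank lines and browser/track header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns: chrom, start, end, name, score, strand, signalValue, ...
        yield Peak(fields[0], int(fields[1]), int(fields[2]), float(fields[6]))


def filter_min_signal(peaks: Iterable[Peak], min_signal: float) -> List[Peak]:
    """Keep peaks whose signal is at least min_signal (assumed inclusive)."""
    return [p for p in peaks if p.signal >= min_signal]
```

With a file opened as fh, filter_min_signal(parse_narrowpeak(fh), 10.0) would return only the peaks with a signal value of 10 or more.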
This track is a composite annotation track containing multiple subtracks, one for each cell type.
The display mode and filtering of each subtrack can be individually controlled.
For more information about track configuration, see
Configuring Multi-View Tracks.
Metadata for a particular subtrack can be found by clicking the down arrow in
the list of subtracks. The UCSC Accession listed in the metadata can be used with the File Search tool to
retrieve primary data files underlying datasets of interest.
In the subtrack selection list, the ENCODE tier (priority) is listed for each cell type.
Tier 1 and Tier 2 represent categories with cell types designated for intensive study by
the ENCODE investigators.
After the January 2011 data freeze, an additional set of cell types was promoted from
Tier 3 to Tier 2 to broaden the list of intensively studied cell types.
These cell types are listed as Tier 2* in the subtrack list here (and are
described as 'newly promoted to tier 2: not in 2011 analysis' on the
ENCODE Common Cell Types page).
Methods
The DNase-seq aligned sequence reads (BAM files) from the primary data tracks listed above were
processed using the UW HotSpot pipeline (as described in the UW DNaseI HS track description above).
First, "hotspots" (i.e. broad, variable-sized regions of generalized chromatin accessibility) were
identified using a relaxed threshold. Then more stringent "narrowPeaks" (False Discovery Rate (FDR) 1% peaks)
were generated by first thresholding the hotspots (using random simulation) at FDR 1% and then, in essence,
locating local maxima of the tag density (150 bp window, sliding every 20 bp) within the hotspots.
FDR 1% peaks were set to a fixed width of 150 bp.
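The local-maximum step can be illustrated with a short sketch. This is not the HotSpot implementation: it omits the FDR thresholding entirely, assumes tags are given as single integer positions inside one hotspot, and uses a simple neighbor comparison to define a local maximum. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

WINDOW = 150  # window width in bp, as stated in the text
STEP = 20     # sliding step in bp, as stated in the text


def window_density(tags: Sequence[int], start: int, end: int,
                   window: int = WINDOW, step: int = STEP) -> List[Tuple[int, int]]:
    """Count tags in each half-open window [pos, pos + window) inside [start, end)."""
    ordered = sorted(tags)
    density = []
    pos = start
    while pos + window <= end:
        count = bisect_left(ordered, pos + window) - bisect_left(ordered, pos)
        density.append((pos, count))
        pos += step
    return density


def local_maxima_peaks(tags: Sequence[int], start: int, end: int,
                       window: int = WINDOW, step: int = STEP) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for windows that are local maxima of tag density.

    A window is kept when its count exceeds its left neighbor's count and is
    not below its right neighbor's count. Every reported peak has a fixed
    width equal to the window size.
    """
    density = window_density(tags, start, end, window, step)
    peaks = []
    for i, (pos, count) in enumerate(density):
        if count == 0:
            continue
        left = density[i - 1][1] if i > 0 else -1
        right = density[i + 1][1] if i + 1 < len(density) else -1
        if count > left and count >= right:
            peaks.append((pos, pos + window, count))
    return peaks
```

Because each reported interval is exactly one window wide, the fixed 150 bp peak width falls out of the window size in this sketch.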
The Duke DNase primary data were pre-processed to reduce variability by combining all replicates for
a given cell type and subsampling to 30 million tags. For the UW data, the replicate 1
calls from the primary UW DNaseI HS data track were used. For the 14 cell types where both groups
have data, a collapsed set of FDR 1% peaks was generated by taking a non-overlapping selection of
the calls from both centers and giving preference to the peak with the higher z-score when calls
overlapped. A collapsed set of hotspots for these cell types was generated by merging the calls from
both centers (taking the union interval of overlapping intervals).
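The two collapsing rules can be sketched as follows. The text does not specify the exact selection procedure, so this example assumes a greedy pass in descending z-score order for the peaks, and half-open (BED-style) coordinates throughout. Call, collapse_peaks and union_intervals are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Call(NamedTuple):
    chrom: str
    start: int
    end: int
    zscore: float


def overlaps(a: Call, b: Call) -> bool:
    """True when two half-open intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def collapse_peaks(center_a: Sequence[Call], center_b: Sequence[Call]) -> List[Call]:
    """Select non-overlapping peaks, preferring the higher z-score on overlap.

    Assumed greedy strategy: visit calls from highest to lowest z-score and
    keep a call only if it does not overlap one that was already kept.
    """
    kept: List[Call] = []
    for call in sorted(list(center_a) + list(center_b),
                       key=lambda c: c.zscore, reverse=True):
        if not any(overlaps(call, k) for k in kept):
            kept.append(call)
    return sorted(kept, key=lambda c: (c.chrom, c.start))


def union_intervals(intervals: Sequence[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Merge overlapping (chrom, start, end) hotspots into their union intervals."""
    merged: List[Tuple[str, int, int]] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

The quadratic overlap check keeps the sketch short; a production pipeline would more likely sort the calls by position and sweep once.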
Credits
The processed data for this track were generated by the University of Washington ENCODE group on
behalf of the ENCODE Analysis Working Group. Credits for the primary data underlying this track are
included in track description pages listed in the Description section above.
Although primary ENCODE data are subject to a restriction period as described in the
ENCODE data release policy,
this restriction does not apply to the integrative analysis results.
The data in this track are freely available.