Data version: ENCODE Oct 2005 Freeze
Data coordinates converted via liftOver from: May 2004 (NCBI35/hg17)
Description
This track displays hit regions and peak centers for Sanger
ChIP-chip data, as identified by hidden Markov model (HMM) analysis.
Display Conventions and Configuration
This annotation follows the display conventions for composite
tracks. The subtracks within this annotation
may be configured in a variety of ways to highlight different aspects of the
displayed data. The graphical configuration options are shown at the top of
the track description page, followed by a list of subtracks. To display only
selected subtracks, uncheck the boxes next to the tracks you wish to hide.
For more information about the graphical configuration options, click the
Graph configuration help link.
Methods
Data for each replicate were normalized with the
Tukey biweight method using R (as recommended by NimbleGen).
The log base 2 ratio of the normalized intensities was used for
downstream data processing.
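
The normalization step can be illustrated with a minimal sketch. It is not the R code used for this track. It assumes that normalization means centering the log base 2 ratios on their one-step Tukey biweight location, which this page does not spell out. The function names, the tuning constant c = 5 and the sample intensities are illustrative assumptions.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of location (c and eps are assumed)."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    num = den = 0.0
    for v in values:
        # Scaled distance from the median; far-away points get zero weight.
        u = (v - center) / (c * mad + eps)
        if abs(u) < 1.0:
            w = (1.0 - u * u) ** 2
            num += w * v
            den += w
    return num / den

def centered_log2_ratios(chip, control):
    """log2(ChIP / control) per tile, shifted so the biweight location is 0."""
    ratios = [math.log2(a / b) for a, b in zip(chip, control)]
    shift = tukey_biweight(ratios)
    return [r - shift for r in ratios]

# Hypothetical intensities for five tiles; the third tile is enriched.
chip = [120.0, 95.0, 400.0, 110.0, 105.0]
control = [100.0, 100.0, 100.0, 100.0, 100.0]
log_ratios = centered_log2_ratios(chip, control)
```

Because outlying tiles receive zero weight, a few strongly enriched tiles do not drag the centering value upward the way an ordinary mean would.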
A two-state HMM was used to
analyze the data. The two states of the HMM label each region of the
tile path as either bound or not bound by the antibody.
State emission probabilities were determined by comparing the
cumulative distribution of the experimental data for each replicate
on each ENCODE region to a fitted cumulative normal distribution.
The fitted distribution was calculated using the Levenberg-Marquardt
curve-fitting technique and six fitting points ranging from 0.05 to 0.45
of the cumulative distribution. Initial fitting parameters were
set from the experimental data. The model is robust across a range of
sensible transition probabilities.
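
The curve fit described above can be sketched as follows. This is a minimal pure-Python Levenberg-Marquardt loop for a two-parameter cumulative normal, not the code used for this track. The even spacing of the six fitting points, the damping schedule, the starting values and the synthetic data are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def fit_normal_cdf(xs, ps, mu, sigma, iterations=100):
    """Fit (mu, sigma) so that normal_cdf(xs[i]) approximates ps[i]."""
    def sse(m, s):
        return sum((p - normal_cdf(x, m, s)) ** 2 for x, p in zip(xs, ps))

    lam = 1e-3  # damping factor (assumed starting value)
    err = sse(mu, sigma)
    for _ in range(iterations):
        # Accumulate J^T J (a, b, c) and J^T r (g0, g1) for the two parameters.
        a = b = c = g0 = g1 = 0.0
        for x, p in zip(xs, ps):
            z = (x - mu) / sigma
            pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            j_mu = -pdf / sigma          # d CDF / d mu
            j_sigma = -z * pdf / sigma   # d CDF / d sigma
            r = p - normal_cdf(x, mu, sigma)
            a += j_mu * j_mu
            b += j_mu * j_sigma
            c += j_sigma * j_sigma
            g0 += j_mu * r
            g1 += j_sigma * r
        # Solve the damped 2x2 normal equations for the parameter step.
        a_d, c_d = a * (1.0 + lam), c * (1.0 + lam)
        det = a_d * c_d - b * b
        if det == 0.0:
            break
        d_mu = (c_d * g0 - b * g1) / det
        d_sigma = (a_d * g1 - b * g0) / det
        new_mu, new_sigma = mu + d_mu, sigma + d_sigma
        if new_sigma > 0.0 and sse(new_mu, new_sigma) < err:
            mu, sigma = new_mu, new_sigma   # accept step, trust the model more
            err = sse(mu, sigma)
            lam /= 10.0
        else:
            lam *= 10.0                     # reject step, damp harder
    return mu, sigma

# Six cumulative levels from 0.05 to 0.45 (even spacing is an assumption).
levels = [0.05, 0.13, 0.21, 0.29, 0.37, 0.45]
# Synthetic stand-in for the empirical quantiles of one replicate and region.
xs = [NormalDist(0.0, 0.4).inv_cdf(p) for p in levels]
# Rough starting values taken from the data themselves.
mu0 = xs[-1]
sigma0 = (xs[-1] - xs[0]) / 1.5
mu, sigma = fit_normal_cdf(xs, levels, mu0, sigma0)
```

With real data, xs would be the observed log ratios at the chosen cumulative levels. The departure of the full empirical distribution from the fitted curve is what would then inform the emission probabilities.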
Bound regions were identified by finding the optimal state sequence
from the HMM using the Viterbi algorithm, and the resulting
region data was post-processed to develop the hit list.
Hits were defined as contiguous portions of the tile path
identified as bound by the HMM.
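
As an illustration of the decoding step, the sketch below runs the Viterbi algorithm in log space for a two-state model. It is a simplified stand-in: the Gaussian emissions for both states, the symmetric transition probability and the example values are assumptions, whereas the analysis described here derived its emission probabilities from the fitted cumulative distribution.

```python
import math

def log_normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def viterbi_two_state(values, emissions, stay=0.9, start=(0.5, 0.5)):
    """Return the most likely state path (0 = unbound, 1 = bound).

    emissions holds (mu, sigma) for state 0 and state 1; stay is the
    probability of remaining in the current state (assumed symmetric).
    """
    if not values:
        return []
    log_trans = [[math.log(stay), math.log(1.0 - stay)],
                 [math.log(1.0 - stay), math.log(stay)]]
    score = [math.log(start[s]) + log_normal_pdf(values[0], *emissions[s])
             for s in (0, 1)]
    back = []
    for x in values[1:]:
        pointers, new_score = [], []
        for s in (0, 1):
            # Best predecessor state for landing in state s at this tile.
            prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + log_normal_pdf(x, *emissions[s]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical log2 enrichment values along a stretch of the tile path.
values = [0.1, -0.2, 0.0, 2.1, 2.4, 1.9, 0.1, -0.1]
path = viterbi_two_state(values, emissions=((0.0, 0.5), (2.0, 0.5)))
```

Each run of consecutive 1s in the returned path corresponds to one hit in the sense defined above.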
The score of a hit was the sum of the median enrichment values of the
tiles in the contiguous portion (i.e., the area under the peak). For the
purpose of this analysis, hits within 1000 base pairs of an adjacent hit
were combined into hit regions.
The start position of the oligo with
the highest enrichment value in the hit region was deemed the center
of the peak. The ranking of hits was based on the total score of all
hits in a hit region. Analyses based on these data should use the peak
centers, expanded to a window size convenient for the analysis.
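
The post-processing steps above can be sketched as follows. This is an illustrative reconstruction, not the original pipeline. The Tile structure, the function names, the inclusive treatment of the 1000 base pair gap, the 500 base pair half-width and the example tiles are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    start: int
    end: int
    enrichment: float  # median enrichment across replicates
    bound: bool        # state assigned by the HMM

def call_hits(tiles):
    """Group consecutive bound tiles into hits (each hit is a list of tiles)."""
    hits, current = [], []
    for tile in tiles:
        if tile.bound:
            current.append(tile)
        elif current:
            hits.append(current)
            current = []
    if current:
        hits.append(current)
    return hits

def merge_hits(hits, max_gap=1000):
    """Combine hits whose gap to the previous hit is at most max_gap bp."""
    regions = []
    for hit in hits:
        if regions and hit[0].start - regions[-1][-1].end <= max_gap:
            regions[-1].extend(hit)
        else:
            regions.append(list(hit))
    return regions

def summarize(region):
    """Score a hit region and locate its peak center."""
    best = max(region, key=lambda t: t.enrichment)
    return {
        "start": region[0].start,
        "end": region[-1].end,
        # Total of all hit scores, i.e. the sum over every bound tile.
        "score": sum(t.enrichment for t in region),
        # Start of the tile with the highest enrichment.
        "peak_center": best.start,
    }

def expand_peak(center, half_width=500):
    """Window around a peak center; the half-width is an arbitrary choice."""
    return max(0, center - half_width), center + half_width

# Hypothetical tiles along one ENCODE region.
tiles = [
    Tile(0, 50, 0.1, False), Tile(100, 150, 1.5, True),
    Tile(200, 250, 2.5, True), Tile(300, 350, 0.2, False),
    Tile(400, 450, 1.0, True), Tile(500, 550, 0.0, False),
    Tile(3000, 3050, 3.0, True), Tile(3100, 3150, 0.1, False),
]
regions = merge_hits(call_hits(tiles))
ranked = sorted((summarize(r) for r in regions),
                key=lambda r: r["score"], reverse=True)
```

In this example the first two hits lie 150 base pairs apart and are merged into one region, while the hit at position 3000 stays separate and ranks second.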
Credits
The ChIP-chip data were generated by Ian Dunham's lab at the Sanger
Institute. Contacts: Ian Dunham and Christoph Koch.
The HMM analysis was performed at the EBI by Paul Flicek.