Caltech TFBS Track Settings
 
Transcription Factor Binding Sites by ChIP-seq from ENCODE/Caltech

Subtracks by treatment and factor:
Treatment: None, EqS 2.0pct 24h, EqS 2.0pct 60h, EqS 2.0pct 5d, EqS 2.0pct 7d
Factor:
CEBPB 
CTCF 
E2F4 
FOSL1 (sc-605) 
MAX 
MyoD (sc-32758) 
Myogenin (sc-12732) 
NRSF 
POL2 
Pol2(phosphoS2) 
SRF 
TCF3 (SC-349) 
TCF12 
USF-1 
Input 
Cell Line | Factor | Control | Treatment | Protocol | Rep | View | Track Name | Restricted Until
C2C12 | Myogenin (sc-12732) | Control 32bp | EqS 2.0pct 60h | PCR2x | 1 | Peaks | C2C12 Myogenin Myocyte 60h TFBS ChIP-seq Peaks Rep 1 from ENCODE/Caltech | 2012-05-11
C2C12 | Myogenin (sc-12732) | Control 32bp | EqS 2.0pct 60h | PCR2x | 1 | Signal | C2C12 Myogenin Myocyte 60h TFBS ChIP-seq Signal Rep 1 from ENCODE/Caltech | 2012-05-11
C2C12 | MyoD (sc-32758) | Control 32bp | | PCR2x | 1 | Peaks | C2C12 MyoD Myoblast TFBS ChIP-seq Peaks Rep 1 from ENCODE/Caltech | 2012-05-11
C2C12 | MyoD (sc-32758) | Control 32bp | | PCR2x | 1 | Signal | C2C12 MyoD Myoblast TFBS ChIP-seq Signal Rep 1 from ENCODE/Caltech | 2012-05-11

Description

Rationale for the Mouse ENCODE project

Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function.

The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with moderate breadth. Assays identical to those used in the ENCODE project are performed in mouse cell types that are similar or homologous to those studied in the human project. Thus, we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. We will also determine the relative contributions of DNA whose function is preserved across mammals and DNA whose function is present in only one species. The results will have a significant impact on our understanding of the evolution of gene regulation.

Maps of Occupancy by Transcription Factors

Genome-wide occupancy maps of transcription factors (TFs) are generated by ChIP-seq. A ChIP-seq experiment combines chromatin immunoprecipitation (ChIP), which enriches genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibody), with high-throughput short-read sequencing of the enriched DNA fragments (Wold & Myers, 2008). Proteins are crosslinked to DNA (usually with formaldehyde), and the chromatin is sheared and immunoprecipitated with the antibody of interest. The immunoprecipitated material is converted into a sequencing library and sequenced, and the reads are then aligned to the genome. A control sample is also sequenced. It consists of sonicated chromatin that either has not been immunoprecipitated or has been immunoprecipitated with a non-specific immunoglobulin. The ChIP and control datasets are analyzed with a variety of software packages to identify regions occupied by the target protein. The sequencing data, alignments, and analysis files for these experiments are available for download.
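Many peak-calling approaches start from the same basic comparison: is a region covered by more ChIP reads than expected from the control, once each library's sequencing depth is taken into account? The sketch below shows a generic, depth-normalized fold-enrichment calculation. It is not the algorithm of ERANGE or any other specific package. The pseudocount of 1, the 3-fold cutoff, and the names `fold_enrichment` and `enriched_regions` are illustrative assumptions.

```python
from typing import Iterable, Iterator, Tuple


def fold_enrichment(chip_reads: int, chip_total: int,
                    control_reads: int, control_total: int,
                    pseudocount: float = 1.0) -> float:
    """Depth-normalized ChIP/control ratio for one candidate region (illustrative)."""
    # Scale each count to reads per million mapped reads so that libraries
    # sequenced to different depths are comparable. The pseudocount avoids
    # division by zero when the control has no reads in the region.
    chip_rpm = (chip_reads + pseudocount) * 1e6 / chip_total
    control_rpm = (control_reads + pseudocount) * 1e6 / control_total
    return chip_rpm / control_rpm


def enriched_regions(regions: Iterable[Tuple[str, int, int]],
                     chip_total: int, control_total: int,
                     min_ratio: float = 3.0) -> Iterator[str]:
    """Yield names of (name, chip_reads, control_reads) regions at or above min_ratio."""
    for name, chip_reads, control_reads in regions:
        ratio = fold_enrichment(chip_reads, chip_total, control_reads, control_total)
        if ratio >= min_ratio:
            yield name
```

As an example, consider a ChIP library with 20 million mapped reads and a control with 10 million. A region with 99 ChIP reads and 9 control reads has a normalized ratio of 5.0 and passes a 3-fold cutoff. A region with 10 ChIP reads and 9 control reads has a ratio of 0.55 and does not.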

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). Each view contains multiple subtracks, which are displayed individually in the browser. This track contains the following views:

Peaks
Regions of signal enrichment based on processed data. Intensity is represented in grayscale, with darker shading indicating higher intensity. A solid vertical line within the peak region marks the point with the highest signal.
Signal
Density graph (wiggle) of signal enrichment based on processed data from all mapped reads. Signal intensity is represented as RPM (reads per million mapped reads).
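RPM scaling divides a raw read count by the library's total number of mapped reads, expressed in millions. This puts deeply and shallowly sequenced libraries on a common scale. The minimal sketch below assumes that per-bin read counts have already been computed. The text does not describe how reads were extended or binned for the Signal subtracks, and the names `rpm` and `rpm_signal` are hypothetical.

```python
def rpm(read_count: float, total_mapped_reads: int) -> float:
    """Scale a raw read count to reads per million mapped reads (RPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count * 1_000_000 / total_mapped_reads


def rpm_signal(bin_counts: list, total_mapped_reads: int) -> list:
    """Convert per-bin read counts into an RPM-scaled signal profile."""
    return [rpm(count, total_mapped_reads) for count in bin_counts]
```

For example, in a library with 20 million mapped reads, a bin covered by 50 reads has a value of 50 * 1,000,000 / 20,000,000 = 2.5 RPM.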

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Methods

Cells were grown according to the approved ENCODE cell culture protocols.

Chromatin immunoprecipitation followed published methods (Johnson & Mortazavi et al., 2007). The exception was certain experiments in which glutaraldehyde was added to the crosslinking reaction. Information on the antibodies used is available in the metadata for each subtrack. Libraries were constructed using either the Illumina ChIP-seq Sample Preparation Kit or a modified protocol that adds multiplexing tags to the fragments. DNA fragments were repaired to generate blunt ends, and a single A nucleotide was added to each end. Double-stranded Illumina adaptors, with or without multiplexing tags, were ligated to the fragments. Ligation products were amplified by 18 cycles of PCR, and DNA fragments of 150-250 bp were gel purified. Completed libraries were quantified with the Quant-iT dsDNA HS Assay Kit. Libraries were sequenced on the Illumina GAII and GAIIx sequencing systems. More recently, multiplexed libraries were pooled and sequenced on the HiSeq platform. Cluster generation, linearization, blocking, and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. Older libraries were generated using 2 rounds of PCR. Matched input samples were sequenced for each combination of fixation conditions and number of PCR rounds. Reads were 32 bp, 36 bp, or 50 bp long.
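Multiplexed libraries are usually combined in balanced amounts before pooling. This requires converting each library's measured mass concentration (for example, from a Quant-iT assay) into molarity. The sketch below assumes an average mass of 660 g/mol per base pair of dsDNA and equimolar pooling. The text above does not say how the Caltech pools were balanced, and the names `library_nM` and `pooling_volumes` and all numbers are hypothetical.

```python
# Assumption: approximate average mass of one base pair of double-stranded DNA (g/mol).
AVG_BP_MASS_G_PER_MOL = 660.0


def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) into molarity (nM)."""
    # ng/uL equals mg/L. Dividing by the molar mass of a fragment of this
    # length gives a molar concentration, and the factor 1e6 folds in the
    # mg->g (1e-3) and mol->nmol (1e9) conversions.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def pooling_volumes(libraries: dict, fmol_each: float) -> dict:
    """Volume (uL) of each library that adds `fmol_each` femtomoles to the pool.

    `libraries` maps a name to (concentration in ng/uL, mean fragment length in bp).
    """
    volumes = {}
    for name, (conc, length) in libraries.items():
        molarity = library_nM(conc, length)  # 1 nM == 1 fmol/uL
        volumes[name] = fmol_each / molarity
    return volumes


if __name__ == "__main__":
    # Hypothetical libraries, gel-selected within the 150-250 bp range.
    pool = {"lib_A": (1.32, 200.0), "lib_B": (2.0, 250.0)}
    for name, vol in pooling_volumes(pool, fmol_each=20.0).items():
        print(f"{name}: {vol:.2f} uL")
```

Because 1 nM equals 1 fmol/uL, the volume to pipette is simply the target amount divided by the molarity. For example, a 10 nM library contributes 20 fmol in 2 uL.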

Reads from pooled libraries (fastq files) were assigned to their corresponding libraries based on the multiplexing tag; reads from non-pooled libraries were processed directly. All tags have been removed from the reads in the fastq files available for download. Bowtie (Langmead et al., 2009) was used to map reads to the male or female version of the mouse genome, depending on the sex of the cell line, excluding the _random chromosomes in the assembly. The following parameters were used: "-v 2 -k 11 -m 10 -t --best --strata". Aligned reads were converted into rds files using the ERANGE package (Johnson & Mortazavi et al., 2007). The findall.py program in ERANGE was then used to identify regions enriched relative to the matching input sample. The following settings were used for point-source transcription factors: "--shift learn --ratio 3 --minimum 2 --listPeak --revbackground". For histone modifications, the settings were changed to "--notrim --nodirectionality --spacing 100 --ratio 3 --minimum 2 --listPeak --revbackground".
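To make the alignment parameters concrete, the simplified model below shows which alignments of a single read would be reported under `-v 2 -k 11 -m 10 --best --strata`. It is our own sketch based on the option descriptions in the Bowtie 1 manual, not Bowtie's code. In that model, `-v` caps end-to-end mismatches, `--best --strata` limits reporting to the stratum with the fewest mismatches, `-m` suppresses reads with more than m reportable alignments, and `-k` caps how many are reported (`-t` only prints timing information). The list of candidate alignments is hypothetical, since real Bowtie searches the index rather than filtering a precomputed list. The function name `reported_alignments` is also hypothetical.

```python
from typing import List, Tuple

# One candidate alignment of a read: (locus label, number of mismatches).
Alignment = Tuple[str, int]


def reported_alignments(candidates: List[Alignment], v: int = 2,
                        k: int = 11, m: int = 10) -> List[Alignment]:
    """Simplified model of Bowtie 1 reporting under -v V -k K -m M --best --strata."""
    # -v V: only end-to-end alignments with at most V mismatches are valid.
    valid = [a for a in candidates if a[1] <= v]
    if not valid:
        return []  # the read is unaligned
    # --best --strata: only the stratum with the fewest mismatches is reportable.
    best = min(mismatches for _, mismatches in valid)
    stratum = sorted(a for a in valid if a[1] == best)  # sorted only for determinism
    # -m M: suppress the read entirely if more than M alignments are reportable.
    if len(stratum) > m:
        return []
    # -k K: report at most K alignments.
    return stratum[:k]
```

Under this model, a read with one perfect hit and one 1-mismatch hit is reported only at the perfect hit, and a read with 11 equally good locations is suppressed. Because m (10) is smaller than k (11), the `-k` cap never truncates a read that survives `-m`. Together, the settings keep reads with up to 10 equally good locations and discard reads with more.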

Credits

Cell growth, ChIP, and Illumina library construction were done in the laboratory of Barbara Wold (California Institute of Technology). Sequencing was done at the Millard and Muriel Jacobs Genetics and Genomics Laboratory at the California Institute of Technology. Initial HiSeq data were generated at Illumina Inc., Hayward, CA.

Cell growth and ChIP: Georgi Marinov, Katherine Fisher, Gordon Kwan, Antony Kirilusha, Ali Mortazavi, Gilberto DeSalvo, Brian Williams
Library Construction, Sequencing and Primary Data Handling: Lorianne Schaeffer, Diane Trout, Igor Antoschechkin (California Institute of Technology), Lu Zhang, Gary Schroth (Illumina Inc.)
Data Processing and Submission: Georgi Marinov, Diane Trout

Contact: Barbara Wold, Georgi K. Marinov, Diane Trout

References

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.

Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun 8;316(5830):1497-502.

Wold B, Myers RM. Sequence census methods for functional genomics. Nat Methods. 2008 Jan;5(1):19-21.

Data Release Policy

Data users may freely use ENCODE data. However, without prior consent they may not submit publications that use an unpublished ENCODE dataset until nine months after the dataset's release. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy is available from the ENCODE project.
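The Restricted Until date is nine months after release, so computing it requires calendar-month arithmetic. The minimal sketch below assumes that months are counted as calendar months. It also assumes that a day past the end of the target month is clamped to that month's last day (for example, May 31 plus nine months becomes the end of February). The policy text does not address this edge case, and the names `add_months` and `restricted_until` are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months to a date, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, min(d.day, last_day))


def restricted_until(release: date, months: int = 9) -> date:
    """Hypothetical helper: end of the publication restriction for a dataset."""
    return add_months(release, months)
```

For a hypothetical release on 2011-08-11, `restricted_until` returns 2012-05-11. For a release on 2011-05-31, the clamping rule gives 2012-02-29, because 2012 is a leap year.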