This track displays human tissue microarray data using Affymetrix Human Exon 1.0
ST expression arrays. This RNA expression track was produced as part of the
ENCODE Project. The RNA was extracted from cells that were also analyzed by DNaseI hypersensitivity (Duke DNaseI HS), FAIRE (UNC FAIRE), and ChIP (UTA TFBS).
Display Conventions and Configuration
In contrast to the hg18 annotation, this track now displays exon array data
that have been aggregated to the gene level for those probes that have been
linked to genes. Probes not linked to genes are not included.
The display for this track shows gene probe location and signal value as
grayscale-colored items where higher signal values correspond to darker-colored
blocks.
Items with scores between 900 and 1000 have signal values greater than 9 that have been linearly scaled for that particular cell type.
Items with scores between 400 and 900 have signal values between 4 and 9; the signal is simply multiplied by 100 to get the score.
Items with scores between 200 and 400 have signal values below 4 that have been linearly scaled to fit that score range.
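The score ranges listed above can be sketched in Python as follows. This is only an illustrative sketch, not the code used to build the track: the function name and the cell_min and cell_max parameters are assumptions, standing in for the lowest and highest signal values seen in a given cell type, because the exact endpoints of the two linear scalings are not stated here.

```python
def signal_to_score(signal, cell_min, cell_max):
    """Map a gene-level signal value to a display score in the 200-1000 range.

    cell_min and cell_max are hypothetical inputs: the smallest and largest
    signal values for the cell type, used as endpoints of the linear scalings.
    """
    if signal > 9:
        # Signals above 9 are scaled linearly from (9, cell_max] onto (900, 1000].
        span = cell_max - 9
        frac = (signal - 9) / span if span > 0 else 1.0
        return round(900 + 100 * min(frac, 1.0))
    if signal >= 4:
        # Signals between 4 and 9 are simply multiplied by 100.
        return round(signal * 100)
    # Signals below 4 are scaled linearly from [cell_min, 4) onto [200, 400).
    span = 4 - cell_min
    frac = (signal - cell_min) / span if span > 0 else 0.0
    return round(200 + 200 * max(frac, 0.0))
```

Under these assumptions, a signal of 6.5 maps to a score of 650 regardless of cell type, whereas scores outside the 400-900 range depend on the signal distribution of the cell type.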
The subtracks within this composite annotation track correspond to data from different
cell types and tissues. The configuration options are shown at the top of the track
description page, followed by a list of subtracks. To display only selected subtracks,
uncheck the boxes next to the tracks you wish to hide.
For information regarding specific microarray probes, turn on the Affy Exon Probes track, which
can be found in the Expression track group. See the Methods section for a description
of how probe-level data were processed to produce gene-level annotations.
Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.
Data from these tracks are stored as BED files whose first six fields follow the standard BED format. The three additional fields are as follows:
signalValue: The normalized expression value for a gene, calculated as described below.
exonCount: The number of exons used in the calculation of the expression value.
constitutiveExons: The number of constitutive exons used in the calculation of the expression value.
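As an illustration of this nine-field layout, the following Python sketch parses one line of such a file. The class and function names are hypothetical, the six leading field names follow the usual BED convention, and the code assumes whitespace-delimited fields with no spaces inside the item name.

```python
from typing import NamedTuple


class ExonArrayItem(NamedTuple):
    # First six fields: standard BED.
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    strand: str
    # Three additional fields specific to this track.
    signal_value: float
    exon_count: int
    constitutive_exons: int


def parse_exon_array_line(line):
    """Parse one whitespace-delimited BED6+3 line into an ExonArrayItem."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(fields)}")
    return ExonArrayItem(
        chrom=fields[0],
        chrom_start=int(fields[1]),
        chrom_end=int(fields[2]),
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        signal_value=float(fields[6]),
        exon_count=int(fields[7]),
        constitutive_exons=int(fields[8]),
    )


# Made-up example line, for illustration only (not taken from the track data).
example = parse_exon_array_line("chr1\t1000\t5000\tGENE1\t650\t+\t6.5\t12\t8")
```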
Methods
Cells were grown according to the approved
ENCODE cell culture protocols.
Total RNA was isolated from these cells using TRIzol extraction followed by
cleanup on an RNeasy column (Qiagen) that included a DNaseI step.
The RNA was checked for quality using a NanoDrop and an Agilent Bioanalyzer.
RNA (1 µg) deemed to be of good quality was then processed
either by 1) the standard Affymetrix Whole Transcript Sense Target Labeling protocol that included a riboreduction step,
or 2) the NuGEN labeling system.
The fragmented biotin-labeled cDNA was hybridized over 16 h to Affymetrix Exon 1.0 ST arrays and scanned on an Affymetrix Scanner 3000 7G using AGCC software.
Data from all replicates were then normalized together.
Probesets flagged as cross-hybridizing were removed from the analysis (Salomonis et al. 2010).
Though these arrays provide exon-level resolution, gene-level expression was estimated by
grouping probesets by gene for normalization (Bemmo et al. 2008). Probesets were assigned
to genes based on the GENCODE v10 annotation (July 2011). An exon was classified as
constitutive or non-constitutive based on whether it was present in all protein-coding transcripts.
For genes with at least 4 constitutive probes, only constitutive probesets were used to
estimate gene expression. For all other genes, including all non-protein-coding genes,
all (non-cross-hybridizing) probesets that mapped to an expressed exon in any transcript of the gene were used.
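The probeset selection rule described above can be sketched as follows. This is a simplified illustration rather than the pipeline's actual code: the Probeset class and its fields are hypothetical, the threshold is assumed to count constitutive probesets, and every probeset passed in is assumed to already map to an expressed exon of the gene.

```python
from dataclasses import dataclass


@dataclass
class Probeset:
    probeset_id: str
    constitutive: bool        # targets an exon present in all protein-coding transcripts
    cross_hybridizing: bool   # flagged as cross-hybridizing


def select_probesets(probesets, min_constitutive=4):
    """Choose the probesets used to estimate expression for one gene."""
    # Cross-hybridizing probesets are excluded in every case.
    usable = [p for p in probesets if not p.cross_hybridizing]
    constitutive = [p for p in usable if p.constitutive]
    # With enough constitutive probesets, use only those.
    if len(constitutive) >= min_constitutive:
        return constitutive
    # Otherwise fall back to all remaining probesets for the gene.
    return usable
```

For a non-protein-coding gene, no probeset would be marked constitutive under this definition, so the function falls back to all non-cross-hybridizing probesets.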
Gene-level expression estimates were normalized using Affymetrix Power Tools (APT) (Lockstone 2011)
with the chipstream command "rma-bg, med-norm, pm-gcbg, med-polish". This chipstream calls
for an RMA normalization with GC-background correction using antigenomic background probes.
Although the data were generated using the same microarray platform, two different experimental
backgrounds were present due to a change in labeling reagents (Affymetrix vs. NuGEN; see above).
Batch effects related to this change caused array data to group by experimental protocol
rather than by cell type relatedness. We used an R script (ComBat) to correct for this batch effect (Johnson et al. 2007).
Verification
When biological replicates were available, data were verified by analyzing replicates displaying a Pearson correlation coefficient > 0.9.
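A minimal sketch of this replicate check is shown below, assuming each replicate is represented as a list of gene-level signal values in the same gene order. The function names are hypothetical, and the 0.9 threshold is the one stated above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def replicates_agree(rep1, rep2, threshold=0.9):
    """Return True if two replicates correlate above the threshold."""
    return pearson(rep1, rep2) > threshold
```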
Release Notes
This is release 3 of this track (April 2012). Several new cell types have been added. The name of cell line Astrocy was changed to NH-A.
Credits
RNA was extracted from each cell type by Greg Crawford's group at
Duke University.
RNA was purified and hybridized to Affymetrix Exon arrays by
Sridar Chittur and
Scott Tenenbaum at the University of Albany-SUNY.
Data analyses were primarily performed by
Nathan Sheffield (Duke University) with assistance from Melissa Cline (UCSC), Zhancheng Zhang (UNC Chapel Hill), and Darin London (Duke University).
Data users may freely use ENCODE data, but
may not, without prior consent, submit publications that use an
unpublished ENCODE dataset until nine months following the release of
the dataset. This date is listed in the Restricted Until column,
above. The full data release policy for ENCODE is available
here.