Bertone Yale TAR Track Settings

Methods

Microarrays were designed using sequence from the human hg13 assembly. The genome sequence was screened for repetitive elements and low-complexity DNA using RepeatMasker in sensitive mode. Additional low-complexity filtering was performed with the NSEG program (which segments sequences by local complexity), using a minimum segment length of 21 nucleotides to identify the low-complexity segments of lowest probability. After filtering, 1.5 Gb of nonrepetitive DNA remained, and microarray probes were chosen using the NASA Oligonucleotide Probe Selection Algorithm (NOPSA).

NOPSA is designed to find the optimal probes for hybridization. A database of the frequency of every 18-mer in the genome was created using a hash algorithm, with chaining used to resolve collisions. The average frequency of each 36-mer in the genome was determined from the frequencies of every 18-mer subsequence in the 36-mer and its reverse complement. 36-mer oligonucleotides with a frequency equal to one were selected as potential probes for the microarray (from the supporting online material for Stolc et al., 2004).
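The frequency calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the NOPSA implementation: the function names are hypothetical, a Python `Counter` stands in for the chained hash table, and counting 18-mers on both strands is an assumption made so that a unique sequence scores exactly one. An in-memory table like this would not scale to a whole genome.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(genome, k=18):
    """Count every k-mer on both strands (assumption: both strands are indexed)."""
    counts = Counter()
    for strand in (genome, revcomp(genome)):
        counts.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    return counts


def mean_frequency(oligo, counts, k=18):
    """Average count of all k-mers in the oligo and in its reverse complement."""
    words = []
    for strand in (oligo, revcomp(oligo)):
        words.extend(strand[i:i + k] for i in range(len(strand) - k + 1))
    return sum(counts[w] for w in words) / len(words)


def candidate_probes(genome, probe_len=36, k=18):
    """Yield (position, oligo) for every oligo whose mean k-mer frequency is one."""
    counts = kmer_counts(genome, k)
    for i in range(len(genome) - probe_len + 1):
        oligo = genome[i:i + probe_len]
        if mean_frequency(oligo, counts, k) == 1:
            yield i, oligo
```

With toy parameters, `list(candidate_probes("AACAAC", probe_len=4, k=3))` keeps only the oligo at position 1, because the other two windows contain the repeated word `AAC`.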

This resulted in probe selection based on several criteria:

Each selected 36-mer is unique in the genome.

Sequences that could form a loop with a stem of > 7 bp were excluded.

Factors such as sequence length, extent of complementarity and base composition were also considered.
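The stem-loop criterion in the list above can be illustrated with a naive check. This sketch assumes that a "stem of > 7 bp" means at least 8 perfectly complementary bases and that a loop needs at least 3 unpaired nucleotides. Both the minimum loop length and the function name are assumptions, not details taken from the source, and the original filter may have used a different folding model.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_stem_loop(seq, min_stem=8, min_loop=3):
    """True if some min_stem-long arm can pair with a downstream arm.

    The second arm must start at least min_loop bases after the first
    arm ends, leaving room for the unpaired loop (assumed value).
    """
    last_start = len(seq) - 2 * min_stem - min_loop
    for i in range(last_start + 1):
        arm = seq[i:i + min_stem]
        if seq.find(revcomp(arm), i + min_stem + min_loop) != -1:
            return True
    return False
```

A probe would be rejected when `has_stem_loop(probe)` returns `True`; a 7-bp stem does not trigger the check with the default settings.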

A total of 51,874,388 36-mer oligonucleotide probes were selected from both the sense and antisense strands at an average resolution of 46 bp to cover the non-repetitive sequence from the whole genome. Probes were spaced every 10 nucleotides on average. The probes were synthesized via maskless photolithography at a feature density of approximately 390,000 probes per slide.

Biological samples that were hybridized to the arrays consisted of triple-selected human liver poly(A)+ RNA pooled from several individuals (supplied by Ambion). One biological replicate was carried out.

See this NCBI GEO accession for details of experimental protocols.

The TARs identified for hg13 (NCBI Build 31) were mapped to this assembly using Blat. The program pslCDnaFilter was used to filter alignments using the parameters -minId=0.96, -minCover=0.25, -localNearBest=0.001, -minQSize=20, -minNonRepSize=16, -ignoreNs, -bestOverlap.
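For readers who want to reproduce the filtering step, the option list above can be assembled into a command line as follows. This is a sketch: it assumes the usual `pslCDnaFilter [options] inPsl outPsl` argument order, the file names are hypothetical, and nothing is executed.

```python
import shlex

# Filter settings exactly as listed in the text above.
FILTER_OPTIONS = [
    "-minId=0.96",
    "-minCover=0.25",
    "-localNearBest=0.001",
    "-minQSize=20",
    "-minNonRepSize=16",
    "-ignoreNs",
    "-bestOverlap",
]


def build_filter_command(in_psl, out_psl):
    """Return the argv list for one pslCDnaFilter run (nothing is executed)."""
    return ["pslCDnaFilter", *FILTER_OPTIONS, in_psl, out_psl]


# Hypothetical file names, for illustration only.
print(shlex.join(build_filter_command("tars.blat.psl", "tars.filtered.psl")))
```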

Data Analysis

Two groups of TARs were identified: Normal and Poly(A)-associated.

Normal TARs:

Clusters of transcription units were identified that consisted of at least five consecutive probes with fluorescence intensities in the top 90th intensity percentile and with genomic coordinates within a 250-nt window. After collecting these regions genome-wide, their locations were compared to those of annotated components of genes. In total, 13,889 transcription units, ranging in size from 209 to 3,438 nucleotides, were identified. Under the null hypothesis of zero transcription, only 400 were expected to be found. Of the regions identified, roughly one-third (4,931) correspond to previously annotated exons, while the other 8,958 are new transcribed sequences that are referred to as TARs.
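The run-finding rule for Normal TARs can be sketched as below. Several details are assumptions, since the text does not spell them out: "top 90th intensity percentile" is read as "intensity above the 90th-percentile value", a run is kept when at least one block of five consecutive bright probes spans no more than 250 nt, and the function names are hypothetical. Probes are assumed to be sorted by position on a single chromosome and strand.

```python
from statistics import quantiles


def intensity_cutoff(intensities, percentile=90):
    """Intensity value at the given percentile (1-99) over all probes."""
    return quantiles(intensities, n=100)[percentile - 1]


def find_transcription_units(probes, cutoff, min_probes=5, window=250):
    """Return (start, end) positions of runs of consecutive bright probes.

    probes: list of (position, intensity), sorted by position.
    A run qualifies if it has at least min_probes probes and some block
    of min_probes consecutive probes spans no more than window nt.
    """
    units, run = [], []
    # A sentinel with -inf intensity closes a run that reaches the last probe.
    for pos, intensity in list(probes) + [(None, float("-inf"))]:
        if intensity > cutoff:
            run.append(pos)
            continue
        if len(run) >= min_probes and any(
            run[i + min_probes - 1] - run[i] <= window
            for i in range(len(run) - min_probes + 1)
        ):
            units.append((run[0], run[-1]))
        run = []
    return units
```

For the Poly(A)-associated set described in the next subsection, the same sketch would be called with a cutoff from `intensity_cutoff(values, 80)`.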

Poly(A)-associated TARs:

Another set of criteria was used to find TARs in which the probe hybridization intensities were correlated with the presence of a polyadenylation signal 3' to the TAR. Here, transcription units were defined as five consecutive probes with fluorescence intensities in the top 80th intensity percentile within a window of 250 nucleotides. The 3' region also must contain or be close to a polyadenylation signal. Transcription units with an associated polyadenylation signal of "AATAAA" were assigned to a type I group, while those with "ATTAAA" were assigned to type II. Only 100 of these should occur at random in the genome under the null hypothesis of zero transcription. The majority (1,991) were found to be within annotated exons, and 952 were located more than 10 kb from an annotated gene. A total of 1,371 type I and 674 type II poly(A) sequences were identified within exons of known genes. Of these, 1,289 (94%) of the type I and 607 (90%) of the type II instances were found in the 3' exon of the gene.
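The type I / type II assignment amounts to a motif lookup in the sequence at the 3' end of a transcription unit. The sketch below assumes the caller has already extracted that sequence as an upper-case string. How far the signal may lie from the unit is not stated in the text, so the window is left to the caller, and the function name is hypothetical.

```python
# Signal-to-group mapping taken from the text above.
POLYA_SIGNALS = {"AATAAA": "type I", "ATTAAA": "type II"}


def classify_polya(three_prime_seq):
    """Return 'type I', 'type II', or None for a 3' region.

    If both signals occur, the one closest to the start of the region
    wins (an assumption; the source does not describe tie-breaking).
    """
    hits = []
    for motif, label in POLYA_SIGNALS.items():
        pos = three_prime_seq.find(motif)
        if pos != -1:
            hits.append((pos, label))
    return min(hits)[1] if hits else None
```

The reported percentages follow directly from the counts: 1,289 / 1,371 ≈ 0.94 and 607 / 674 ≈ 0.90.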

References

Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S et al. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004 Dec 24;306(5705):2242-6.

Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE et al. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science. 2004 Oct 22;306(5696):655-60.
