Pos Sel Genes Track Settings
 
Positively Selected Genes (6 species)   (All Genes and Gene Predictions tracks)

Data last updated: 2008-07-08

Description

This track shows the results of a genome-wide scan for positively selected genes (PSGs) based on multiple alignments of the human (hg18), chimp (panTro2), macaque (rheMac2), mouse (mm8), rat (rn4), and dog (canFam2) genome assemblies (Kosiol et al., 2008). The track displays the 16,529 high-confidence orthologs that were tested, and highlights those genes showing evidence of positive selection. It summarizes the results of nine different likelihood ratio tests (LRTs) for positive selection, as described below. Four classes of genes are distinguished by score and color:

  1. Score = 1000; shown in red. Genes with strong evidence of positive selection across species. These are the 400 genes whose P-values under test A (see below) meet the threshold required for a false discovery rate (FDR) of 0.05.
  2. Score = 700; shown in purple. Genes with strong evidence of positive selection on one or more branches. These are the 144 additional genes that meet the threshold for FDR < 0.05 under any of the branch- and clade-specific tests B-I.
  3. Score = 400; shown in blue. Genes with weak evidence of positive selection on one or more branches. These are the 3,705 additional genes whose nominal (unadjusted) P-values are < 0.05 under any of tests A-I.
  4. Score = 0; shown in black. Genes with no significant evidence of positive selection. These are the remaining genes, having nominal P ≥ 0.05 under all of tests A-I.
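
The four classes above amount to a simple decision rule. The following is a minimal Python sketch of that rule, not the code used to build the track. The function name and the dictionary layout (test letters "A" to "I" as keys) are assumptions made for illustration only.

```python
from typing import Mapping

FDR_TESTS_BRANCH = "BCDEFGHI"  # the branch- and clade-specific tests B-I


def track_score(p_values: Mapping[str, float], is_fdr: Mapping[str, bool]) -> int:
    """Return 1000, 700, 400, or 0 following the four classes described above.

    p_values: nominal P-value per test letter (hypothetical layout).
    is_fdr:   True where the test is significant at FDR < 0.05.
    """
    # Class 1: significant under test A (all branches) after FDR control.
    if is_fdr.get("A", False):
        return 1000
    # Class 2: significant under any branch or clade test after FDR control.
    if any(is_fdr.get(t, False) for t in FDR_TESTS_BRANCH):
        return 700
    # Class 3: nominal P < 0.05 under any test. A P-value of 2 (test not
    # performed) can never satisfy this, so it needs no special handling.
    if any(p < 0.05 for p in p_values.values()):
        return 400
    # Class 4: no significant evidence of positive selection.
    return 0
```

The checks run from strongest to weakest evidence, so each gene receives the score of the highest class it qualifies for.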

In some cases, genes were truncated before testing to eliminate regions with frame-shift indels or nonconserved exon boundaries. The track shows just the portions of the genes that were tested, rather than the full gene structures.

The 544 genes in groups 1 and 2, above, were also subjected to a novel Bayesian analysis to determine their most likely "selection histories", or patterns of positive selection and non-selection on the branches of the six-species phylogeny. These selection histories are described graphically on the details page, with red indicating branches under positive selection, and black indicating branches free from positive selection.

Schema and Identifiers

The P-values for all tests are stored in the browser database and can be incorporated into filters using the table browser. A P-value of 2 indicates that a test was not performed, due to insufficient data. (For example, test G could not be performed if the chimp sequence was missing or did not pass the quality filters.) The *isFdr columns indicate which genes are significant with FDR < 0.05 for each test. Thus, filtering for lrtPrimateClPValue < 0.05 will retrieve all genes showing weak evidence of positive selection in the primate clade, and filtering for lrtRodentBrIsFdr = 1 will retrieve all genes showing strong evidence of positive selection on the branch to the rodents.

The original source of each gene structure (RefSeq, Vega, or UCSC) is given in its identifier. A suffix of ".inc" indicates that a gene was truncated before testing.

Methods

Human genes from the RefSeq, Vega, and UCSC Genes sets were mapped onto multiz-based multiple alignments, then subjected to a strict set of filters to identify high-confidence one-to-one orthologs. Each gene was required to be present in at least three species, and recently duplicated genes were removed (see Kosiol et al., 2008 for details). These orthologous genes were then examined for evidence of positive selection using a series of nine likelihood ratio tests (LRTs) based on Yang and Nielsen's (2002) branch-site framework. Each LRT measures evidence for positive selection by how much better the data are fit by a model that allows for positive selection (on particular branches of the tree) than by a model that allows only for purifying selection and neutral evolution. LRTs were performed for positive selection on:

  • A: all branches
  • B: branch to primates
  • C: all branches in primate clade
  • D: branch to rodents
  • E: all branches in rodent clade
  • F: branch to human
  • G: branch to chimp
  • H: branch to hominids
  • I: branch to macaque
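
For readers unfamiliar with likelihood ratio tests, the general calculation can be sketched as follows. This is a generic illustration, not the procedure used for the track: it assumes the two maximized log-likelihoods are already available and, as a simplification, compares the statistic to a chi-square distribution with one degree of freedom. The null distribution actually used is described in Kosiol et al. (2008). The function names are hypothetical.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood gain of the alternative model.

    lnl_null: maximized log-likelihood without positive selection.
    lnl_alt:  maximized log-likelihood allowing positive selection.
    Small negative values from optimizer noise are clipped to zero.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), so no external
    statistics library is needed.
    """
    return math.erfc(math.sqrt(x / 2.0))


def lrt_p_value(lnl_null: float, lnl_alt: float) -> float:
    """Nominal P-value under the assumed chi-square (1 df) null."""
    return chi2_sf_1df(lrt_statistic(lnl_null, lnl_alt))
```

A larger gap between the two log-likelihoods gives a larger statistic and a smaller P-value. When the alternative model fits no better than the null, the statistic is zero and the P-value is 1.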

The Bayesian analysis estimates a posterior distribution over all possible selection histories under a simple model that allows positive selection to switch on and off along the branches of the phylogeny. The inference was performed by Gibbs sampling for just the 544 genes showing strong evidence of positive selection according to the LRTs.

See the supplementary website for additional raw data.

References

Kosiol C, Vinar T, da Fonseca R, Hubisz M, Bustamante C, Nielsen R, Siepel A. Patterns of Positive Selection in Six Mammalian Genomes. PLoS Genetics. 2008 Aug;4(8):e1000144.

Yang Z, Nielsen R. Codon substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular Biology and Evolution. 2002 June;19(6):908-17.