Description
This track shows the results of a genome-wide scan for positively selected
genes (PSGs) based on multiple alignments of the human (hg18), chimp
(panTro2), macaque (rheMac2), mouse (mm8), rat (rn4), and dog (canFam2)
genome assemblies (Kosiol et al., 2008). The track displays
the 16,529 high-confidence orthologs that were tested,
and highlights those genes showing evidence of positive selection.
It summarizes the results of nine different likelihood ratio tests (LRTs)
for positive selection, as described below. Four classes of genes are
distinguished by score and color:
- Score = 1000; shown in red. Genes with strong evidence of positive
selection across species. These are the 400 genes whose P-values
under test A (see below) meet the threshold required for a false discovery
rate (FDR) of 0.05.
- Score = 700; shown in purple. Genes with strong evidence of positive
selection on one or more branches. These are the 144 additional
genes that meet the threshold for FDR < 0.05 under any of the branch-
and clade-specific tests B-I.
- Score = 400; shown in blue. Genes with weak evidence of positive
selection on one or more branches. These are the 3,705 additional
genes whose nominal (unadjusted) P-values are < 0.05 under any
of tests A-I.
- Score = 0; shown in black. Genes with no significant evidence of
positive selection. These are the remaining genes, having
nominal P ≥ 0.05 under all of tests A-I.
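The four classes above can be sketched as a small decision function. This is an illustrative sketch, not the code used to build the track: the function name, the dictionary layout, and the use of test letters as keys are assumptions, while the scores, colors, and thresholds come from the list above.

```python
NOT_TESTED = 2.0  # sentinel P-value this track uses for tests that were not performed
ALL_TESTS = "ABCDEFGHI"
BRANCH_AND_CLADE_TESTS = "BCDEFGHI"

def classify_gene(p_values, is_fdr):
    """Return (score, color) for one gene.

    p_values: dict mapping a test letter to its nominal P-value (hypothetical layout).
    is_fdr:   dict mapping a test letter to True if the gene passes FDR < 0.05.
    """
    # Class 1: significant at FDR 0.05 under test A (all branches).
    if is_fdr.get("A", False):
        return 1000, "red"
    # Class 2: significant at FDR < 0.05 under any branch- or clade-specific test.
    if any(is_fdr.get(t, False) for t in BRANCH_AND_CLADE_TESTS):
        return 700, "purple"
    # Class 3: nominal P < 0.05 under any test; the sentinel 2 never passes.
    if any(p_values.get(t, NOT_TESTED) < 0.05 for t in ALL_TESTS):
        return 400, "blue"
    # Class 4: no significant evidence of positive selection.
    return 0, "black"
```

Because the checks run from strongest to weakest evidence, each gene lands in exactly one class, which matches the "additional genes" wording in the list.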
In some cases, genes were truncated before testing to eliminate regions
with frame-shift indels or nonconserved exon boundaries. The track shows
just the portions of the genes that were tested, rather than the full gene
structures.
The 544 genes in the first two classes above (scores 1000 and 700) were also subjected to a novel
Bayesian analysis to determine their most likely "selection histories", or
patterns of positive selection and non-selection on the branches of the
six-species phylogeny. These selection histories are described graphically
on the details page, with red indicating branches under positive selection,
and black indicating branches free from positive selection.
Schema and Identifiers
The P-values for all tests are stored in the browser database
and can be incorporated into filters using the table browser.
A P-value of 2 indicates that a test was not performed, due to
insufficient data. (For example, test G could not be performed if the
chimp sequence was missing or did not pass the quality filters.) The
*isFdr columns indicate which genes are significant with FDR < 0.05 for
each test. Thus, filtering for lrtPrimateClPValue < 0.05 will retrieve all
genes showing weak evidence of positive selection in the primate clade, and
filtering for lrtRodentBrIsFdr = 1 will retrieve all genes showing strong
evidence of positive selection on the branch to the rodents.
The original source of each gene structure (RefSeq, Vega, or UCSC) is
given in its identifier. A suffix of ".inc" indicates that a
gene was truncated before testing.
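A check for truncated genes needs only the suffix. In this minimal sketch the function name and the example identifiers are hypothetical; only the ".inc" suffix comes from the text above.

```python
def is_truncated(identifier):
    """Return True if the identifier marks a gene that was truncated before testing."""
    return identifier.endswith(".inc")
```

For example, is_truncated("exampleId.inc") is True and is_truncated("exampleId") is False.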
Methods
Human genes from the RefSeq, Vega, and UCSC Genes sets were mapped onto
multiz-based multiple alignments, then subjected to a strict set of filters
to identify high-confidence one-to-one orthologs. Each gene was required
to be present in at least three species, and recently duplicated genes were
removed (see Kosiol et al., 2008 for details). These orthologous
genes were then examined for evidence of positive selection using a series
of nine likelihood ratio tests (LRTs) based on Yang and Nielsen's (2002)
branch-site framework. Each LRT measures the evidence for positive
selection as the improvement in fit of a model that allows for positive
selection (on particular branches of the tree) over a model that allows
only for purifying selection and neutral evolution. LRTs were performed
for positive selection on:
- A: all branches
- B: branch to primates
- C: all branches in primate clade
- D: branch to rodents
- E: all branches in rodent clade
- F: branch to human
- G: branch to chimp
- H: branch to hominids
- I: branch to macaque
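The general form of a likelihood ratio test can be sketched as follows. This is a generic illustration, not the authors' pipeline: the function names are hypothetical, and the chi-square null distribution with one degree of freedom is an assumption made for the example, since this page does not state which null distribution was used to obtain the P-values.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the alternative model over the null model.

    Clamped at zero, because the alternative contains the null as a special case
    and any negative difference can only come from numerical optimization error.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df the tail probability equals erfc(sqrt(stat / 2)).
    ASSUMPTION: the 1-df chi-square null is chosen for illustration only.
    """
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods for one gene under one test.
stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-995.0)
p_value = chi2_1df_pvalue(stat)
```

A larger gain in log-likelihood gives a larger statistic and a smaller P-value, which is the sense in which each test measures how much better the data are fit by the model with positive selection.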
The Bayesian analysis estimates a posterior distribution over all possible
selection histories under a simple model that allows positive selection to
switch on and off along the branches of the phylogeny. The inference was
performed by Gibbs sampling for just the 544 genes showing strong evidence of
positive selection according to the LRTs.
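Once a sampler has produced draws of selection histories, the posterior can be summarized by counting. The sketch below shows only that summary step and is not the authors' Gibbs sampler; the encoding of a history as a tuple of 0/1 flags (one per branch, 1 meaning positive selection) and the function names are assumptions.

```python
from collections import Counter

def posterior_over_histories(samples):
    """Estimate the posterior probability of each sampled selection history.

    samples: list of tuples of 0/1 flags, one flag per branch (hypothetical encoding).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter(samples)
    total = len(samples)
    return {history: count / total for history, count in counts.items()}

def most_likely_history(samples):
    """Return (history, probability) for the most frequently sampled history."""
    posterior = posterior_over_histories(samples)
    return max(posterior.items(), key=lambda item: item[1])

def branch_marginals(samples):
    """Posterior probability of positive selection on each branch separately."""
    total = len(samples)
    return [sum(flags) / total for flags in zip(*samples)]
```

With three hypothetical branches and the draws [(1, 0, 0), (1, 0, 0), (1, 0, 0), (0, 0, 0)], the most likely history is (1, 0, 0) with estimated probability 0.75.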
See the supplementary website for additional raw data.
References
Kosiol C, Vinar T, da Fonseca R, Hubisz M, Bustamante C, Nielsen R, Siepel A.
Patterns of positive selection in six mammalian genomes. PLoS Genetics. 2008 August;4(8):e1000144.
Yang Z, Nielsen R. Codon substitution models for detecting molecular adaptation at individual sites along specific lineages.
Molecular Biology and Evolution. 2002 June;19(6):908-17.