Description
This track contains dbSNP build 126, available from ftp.ncbi.nih.gov/snp.
Interpreting and Configuring the Graphical Display
Variants are shown as single tick marks at most zoom levels.
When viewing the track at or near base-level resolution, the displayed
width of the SNP corresponds to the width of the variant in the reference
sequence. Insertions are indicated by a single tick mark displayed between
two nucleotides, single nucleotide polymorphisms are displayed as the width
of a single base, and multiple nucleotide variants are represented by a
block that spans two or more bases.
The configuration categories reflect the following definitions (not all categories apply
to this assembly):
Location Type: Describes the alignment of the flanking sequence
- Range - the flank alignments leave a gap of 2 or more bases in the reference assembly
- Exact - the flank alignments leave exactly one base between them
- Between - the flank alignments are contiguous; the variation is an insertion
- RangeInsertion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is shorter than the reference assembly sequence
- RangeSubstitution - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence and the reference assembly sequence are of equal length
- RangeDeletion - the flank alignments surround a distinct polymorphism between the submitted sequence and reference assembly; the submitted sequence is longer than the reference assembly sequence
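The first three location types depend only on how many reference bases lie between the two flank alignments. The following is a minimal sketch of that rule, not dbSNP's or UCSC's actual code; the function name and the 0-based, half-open coordinate convention are assumptions made for illustration. The Range* types need a comparison with the submitted sequence and are not covered.

```python
def location_type(left_flank_end: int, right_flank_start: int) -> str:
    """Classify a variant by the gap between its flank alignments.

    Assumed convention (hypothetical): left_flank_end is one past the last
    reference base aligned by the 5' flank, and right_flank_start is the
    first reference base aligned by the 3' flank.
    """
    gap = right_flank_start - left_flank_end
    if gap < 0:
        raise ValueError("flank alignments overlap")
    if gap == 0:
        return "between"  # flanks are contiguous: insertion
    if gap == 1:
        return "exact"    # exactly one base between the flanks
    return "range"        # two or more bases between the flanks
```

For example, flanks ending at 100 and resuming at 101 leave one base between them, so the variant is classified as exact.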
Class: Describes the observed alleles
- Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
- In-del - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)
- Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
- Microsatellite - the observed allele from dbSNP is variation in counts of short tandem repeats
- Named - the observed allele from dbSNP is given as a text name
- No Variation - no variation asserted for sequence
- Mixed - the cluster contains submissions from multiple classes
- Multiple Nucleotide Polymorphism - alleles are all of the same length, longer than one base, and drawn from the set {A,T,C,G}
- Insertion - the polymorphism is an insertion relative to the reference assembly
- Deletion - the polymorphism is a deletion relative to the reference assembly
- Unknown - no classification provided by data contributor
Validation: Method used to validate the variant (each variant may be validated by more than one method)
- By Frequency - at least one submitted SNP in cluster has frequency data submitted
- By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
- By Submitter - at least one submitted SNP in cluster was validated by independent assay
- By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
- By HapMap - validated by HapMap project
- Unknown - no validation has been reported for this variant
Function: dbSNP's predicted functional effect of the variant on RefSeq transcripts, both curated (NM_* and NR_*), as shown in the RefSeq Genes track, and predicted (XM_* and XR_*), which are not shown in the UCSC Genome Browser. A variant may have more than one functional role if it overlaps multiple transcripts.
- Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (dbSNP term: locus; Sequence Ontology term: feature_variant)
- Coding - Synonymous - no change in peptide for allele with respect to the reference assembly (dbSNP term: coding-synon; Sequence Ontology term: synonymous_variant)
- Coding - Non-Synonymous - change in peptide for allele with respect to the reference assembly (dbSNP term: coding-nonsynon; Sequence Ontology term: protein_altering_variant)
- Untranslated - variation is in a transcript, but not in a coding region interval (dbSNP term: untranslated; Sequence Ontology term: UTR_variant)
- Intron - variation is in an intron, but not in the first two or last two bases of the intron (dbSNP term: intron; Sequence Ontology term: intron_variant)
- Splice Site - variation is in the first two or last two bases of an intron (dbSNP term: splice-site; Sequence Ontology term: splice_site_variant)
- Reference (coding) - one of the observed alleles of a SNP in a coding region matches the reference assembly (dbSNP term: cds-reference; Sequence Ontology term: coding_sequence_variant)
- Unknown - no known functional classification
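The distinction between the Intron and Splice Site categories comes down to position within the intron. The sketch below illustrates that rule only; the function name and the 0-based, half-open intron coordinates are assumptions for illustration, and dbSNP's own pipeline may handle edge cases (such as very short introns) differently.

```python
def intron_position_class(pos: int, intron_start: int, intron_end: int) -> str:
    """Return the dbSNP function term for a position inside an intron.

    Assumed convention (hypothetical): the intron covers the 0-based,
    half-open interval [intron_start, intron_end).
    """
    if not intron_start <= pos < intron_end:
        raise ValueError("position is not inside the intron")
    in_first_two = pos - intron_start < 2
    in_last_two = intron_end - pos <= 2
    if in_first_two or in_last_two:
        return "splice-site"
    return "intron"
```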
Molecule Type: Sample used to find this variant
- Genomic - variant discovered using a genomic template
- cDNA - variant discovered using a cDNA template
- Unknown - sample type not known
Average heterozygosity: Calculated by dbSNP, as described in dbSNP's documentation
- Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
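The 0.5 ceiling follows from the textbook definition of expected heterozygosity, H = 1 - sum(p_i^2): with two alleles at frequencies p and 1 - p, H = 2p(1 - p), which peaks at 0.5 when p = 0.5. The sketch below uses that textbook formula to illustrate the bound; it is not dbSNP's estimator, which combines data across submissions, and the function name is hypothetical.

```python
def expected_heterozygosity(allele_counts):
    """Textbook expected heterozygosity, 1 - sum(p_i ** 2).

    allele_counts may be counts or frequencies; they are normalized here.
    This is an illustration of the 0.5 bound, not dbSNP's calculation.
    """
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive value")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)
```

A bi-allelic site with equal allele frequencies reaches the maximum; any more skewed split gives a smaller value. A reported value above 0.5 for a bi-allelic single-base substitution therefore suggests a problem with the underlying data.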
Weight: Alignment quality assigned by dbSNP
- Weight can be 0, 1, 2, 3 or 10.
- Alignments with weight = 1 are the highest quality.
- Weight = 0 and weight = 10 are excluded from the data set.
- A filter on maximum weight value is supported, which defaults to 3.
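The weight rules above amount to a two-step filter. The following sketch shows one way to express it; the constant and function names are hypothetical and this is not the browser's own implementation.

```python
EXCLUDED_WEIGHTS = {0, 10}  # excluded from the data set entirely
DEFAULT_MAX_WEIGHT = 3      # default of the maximum-weight filter

def passes_weight_filter(weight: int, max_weight: int = DEFAULT_MAX_WEIGHT) -> bool:
    """Return True if an alignment with this weight would be displayed."""
    if weight in EXCLUDED_WEIGHTS:
        return False
    return weight <= max_weight
```

With the default setting, weights 1, 2 and 3 pass; lowering max_weight to 1 keeps only the highest quality alignments.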
You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page, displayed beneath the heading "On details page, show function and coding differences relative to".
When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), with the same keywords used in the function category. The function reported there usually agrees with NCBI's function, but can sometimes give more detail (e.g. how close a near-gene SNP is to the nearby gene).
Insertions/Deletions
dbSNP uses a class called 'in-del'. In this track, that class has been split into the 'insertion' and 'deletion' categories, based on location type. The location types 'range' and 'exact' are deletions relative to the reference assembly. The location type 'between' indicates insertions relative to the reference assembly. For the newer location types (RangeInsertion, RangeSubstitution and RangeDeletion), the class 'in-del' is preserved.
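The split described above can be sketched as a small lookup. The function name and the lowercase string values are assumptions for illustration; the track's actual loader may spell the classes and location types differently.

```python
def split_indel_class(dbsnp_class: str, loc_type: str) -> str:
    """Map dbSNP's 'in-del' class to 'insertion' or 'deletion' by location type.

    Classes other than 'in-del' are returned unchanged. For location types
    other than range, exact and between, 'in-del' is kept as-is.
    """
    if dbsnp_class != "in-del":
        return dbsnp_class
    if loc_type in ("range", "exact"):
        return "deletion"   # reference bases are absent from the variant allele
    if loc_type == "between":
        return "insertion"  # variant allele sits between two reference bases
    return "in-del"         # e.g. rangeInsertion, rangeSubstitution, rangeDeletion
```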
UCSC Annotations
In addition to presenting the dbSNP data, the following annotations are provided:
- The dbSNP reference allele is compared to the UCSC reference allele, and a note is made if the
dbSNP reference allele is the reverse complement of the UCSC reference allele.
- Single-base substitutions where the alignments of the flanking sequences are adjacent
or have a gap of more than one base are noted.
- Observed alleles with an unexpected format are noted.
- The length of observed alleles is checked for consistency with location types;
exceptions are noted.
- Single-base substitutions are checked to see that one of the observed alleles matches
the reference allele; exceptions are noted.
- Simple deletions are checked to see that the observed allele matches the reference allele;
exceptions are noted.
- Tri-allelic and quad-allelic single-base substitutions are noted.
- Variants that have multiple mappings are noted.
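As an illustration of the first annotation in the list, the sketch below compares a dbSNP reference allele with the UCSC reference allele and reports whether they match directly or only after reverse complementing. The function names and the returned labels are hypothetical, and only unambiguous A, C, G and T bases are handled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def compare_reference_alleles(dbsnp_ref: str, ucsc_ref: str) -> str:
    """Return 'match', 'reverse complement' or 'mismatch' (labels are illustrative)."""
    dbsnp_ref, ucsc_ref = dbsnp_ref.upper(), ucsc_ref.upper()
    if dbsnp_ref == ucsc_ref:
        return "match"
    if reverse_complement(dbsnp_ref) == ucsc_ref:
        return "reverse complement"
    return "mismatch"
```

Palindromic alleles such as AT equal their own reverse complement, so the direct comparison is made first and they are reported as a match.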
Data Sources
- Coordinates, orientation, location type and dbSNP reference allele data
were obtained from b126_SNPContigLoc_36_1.bcp.gz.
- b126_SNPMapInfo_36_1.bcp.gz provided the alignment weights; alignments with
weight = 0 or weight = 10 were filtered out.
- Class and observed polymorphism were obtained from the shared UniVariation.bcp.gz,
using the univar_id from SNP.bcp.gz as an index.
- Functional classification was obtained from b126_SNPContigLocusId_36_1.bcp.gz. The internal database representation uses dbSNP's function terms, but for display in SNP details pages, these are translated into Sequence Ontology terms.
- Validation status and heterozygosity were obtained from SNP.bcp.gz.
- The header lines in the rs_fasta files were used for molecule type.
Orthologous Alleles (human only)
Beginning with the March 2006 human assembly, we provide a related table that
contains orthologous alleles in the chimpanzee and rhesus macaque assemblies.
We use our liftOver utility to identify the orthologous alleles. The candidate human SNPs are filtered to those that meet all of the following criteria:
- class = 'single'
- locType = 'exact'
- chromEnd = chromStart + 1
- align to just one location
- are not aligned to a chrN_random chrom
- are biallelic (not tri or quad allelic)
In some cases the orthologous allele is unknown; these are set to 'N'.
If a lift was not possible, we set the orthologous allele to '?' and the orthologous start and end
position to 0 (zero).
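The candidate criteria listed above can be sketched as a single predicate. The record layout below is hypothetical: the field names, the slash-separated observed string (e.g. "A/G") and the mapping_count field are assumptions for illustration, not the track's actual table schema.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    # Hypothetical record; field names are illustrative only.
    chrom: str
    chrom_start: int
    chrom_end: int
    snp_class: str
    loc_type: str
    observed: str       # observed alleles, e.g. "A/G"
    mapping_count: int  # number of locations this SNP aligns to

def is_ortholog_candidate(snp: Snp) -> bool:
    """Apply the six candidate criteria to one SNP record."""
    return (
        snp.snp_class == "single"
        and snp.loc_type == "exact"
        and snp.chrom_end == snp.chrom_start + 1
        and snp.mapping_count == 1
        and not snp.chrom.endswith("_random")
        and len(snp.observed.split("/")) == 2  # biallelic
    )
```

SNPs that pass this filter would then be lifted to the other assemblies, with 'N' or '?' recorded as described above when the allele is unknown or the lift fails.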
References
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.