Description
This track shows the genomic positions of somatic variants found through whole genome sequencing of tumors
as part of The Cancer Genome Atlas (TCGA) by the National Cancer Institute, made available through
the Genomic Data Commons Portal. The
data shown here is sometimes called the "Pan-Cancer dataset", a collection of thirty-three
TCGA projects processed in a uniform way.
Display Conventions and Configuration
Variants can be filtered by project and gender from the track details page. Pressing the
"Advanced" button allows the user to specify whether all of the checked values have to be
true of a particular variant, or whether only one of them needs to be present to satisfy the filter.
The vertical viewing range in full mode can also be used to filter which variants are shown. Variants
whose sampleCount is below the minimum or above the maximum value specified in the viewing range are
not displayed.
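The two filters described above can be sketched in a few lines of Python. This is an illustration of the logic only, not the browser's implementation: the function names, the `require_all` flag, the choice to treat an empty set of checked values as "no filter", and the example project identifiers are all assumptions made for this sketch.

```python
def matches_checked(values, checked, require_all=False):
    """Return True if a variant's values satisfy the checked filter values.

    require_all=True  -> every checked value must be present on the variant.
    require_all=False -> at least one checked value must be present.
    """
    values, checked = set(values), set(checked)
    if not checked:
        # Assumption for this sketch: nothing checked means nothing is filtered out.
        return True
    if require_all:
        return checked <= values
    return bool(checked & values)


def in_viewing_range(sample_count, min_count, max_count):
    """Return True if sampleCount lies within the viewing range (bounds included)."""
    return min_count <= sample_count <= max_count


# Hypothetical variant seen in two projects, in three samples overall.
variant_projects = ["TCGA-BRCA", "TCGA-LUAD"]
checked_projects = ["TCGA-BRCA", "TCGA-OV"]

any_match = matches_checked(variant_projects, checked_projects)
all_match = matches_checked(variant_projects, checked_projects, require_all=True)
shown = any_match and in_viewing_range(3, 1, 5)
```

With these example values, the variant passes the "only one" mode because it shares one project with the checked list, but fails the "all" mode because the second checked project is absent.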
Data access
The raw data can be explored interactively with the Table Browser or the Data
Integrator.
For automated download and analysis, the genome annotation for all thirty-three projects is
stored in a bigBed file that can be downloaded from our download server. That directory also
contains a separate bigBed file for each of the thirty-three projects. Individual regions or
the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled
from the source code or downloaded as a precompiled binary for your system. Instructions for
downloading source code and binaries can be found here. The tool can also be used to obtain
only features within a given range, e.g.,
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/gdcCancer/gdcCancer.bb -chrom=chr21 -start=0 -end=100000000 stdout
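For scripted analysis, the same command can be wrapped in Python. The following is a minimal sketch, assuming bigBedToBed is installed and on the PATH; the function names `fetch_region` and `parse_bed` are made up for this example. Only the first three BED columns (chrom, start, end) are interpreted. The remaining columns are specific to this track and are passed through as raw strings, since their layout is not described here.

```python
import subprocess

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/gdcCancer/gdcCancer.bb"


def fetch_region(chrom, start, end, url=BIGBED_URL):
    """Run bigBedToBed for one region and return its BED output as text.

    Assumes the bigBedToBed binary is available on the PATH.
    """
    cmd = [
        "bigBedToBed", url,
        f"-chrom={chrom}", f"-start={start}", f"-end={end}",
        "stdout",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def parse_bed(text):
    """Yield (chrom, start, end, extra_fields) for each non-empty BED line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Columns 1-3 are standard BED; the rest are left uninterpreted.
        yield fields[0], int(fields[1]), int(fields[2]), fields[3:]
```

A typical call would be `rows = list(parse_bed(fetch_region("chr21", 0, 100000000)))`, which mirrors the command line shown above.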
Methods
All MuTect variant calls were downloaded from the GDC portal in January 2019 and reformatted at UCSC
to the bigBed format with a short script, cancerMafToBigBed.
Credits
Thanks to GDC for making the TCGA data available on their web site.