Variant Call Format
(VCF) is a flexible and extendable line-oriented text
format developed by the 1000 Genomes Project for releases of single nucleotide variants,
indels, copy number variants and structural variants discovered by the project.
When a VCF file is compressed and indexed using
tabix and made web-accessible,
the Genome Browser can fetch only the portions of the file necessary
to display items in the viewed region.
This makes it possible to display variants from files that
are so large that the connection to UCSC would time out when
attempting to upload the whole file to UCSC.
Both the VCF file and its tabix index file remain on your
web-accessible server (http, https, or ftp), not on the UCSC server.
UCSC temporarily caches the accessed portions of the files to speed up
interactive display. Please note that UCSC only supports VCF
versions 3.3, 3.4, 4.0 and 4.1.
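Behind the scenes, the tabix index maps genomic coordinates to BGZF "virtual offsets". As defined for BGZF in the SAM/BAM specification, a virtual offset packs the byte position of a compressed block in the .vcf.gz file into its upper 48 bits, and a position inside that block (after decompression) into its lower 16 bits. The minimal Python sketch below only illustrates that arithmetic. The function names are hypothetical, and it does not read a real index or show how the Genome Browser itself is implemented.

```python
def make_virtual_offset(block_start: int, within_block: int) -> int:
    """Pack a compressed block address and an in-block offset into one integer."""
    if not 0 <= block_start < (1 << 48):
        raise ValueError("block_start must fit in 48 bits")
    if not 0 <= within_block < (1 << 16):
        raise ValueError("within_block must fit in 16 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(voffset: int) -> tuple[int, int]:
    """Return (compressed block start, offset inside the decompressed block)."""
    return voffset >> 16, voffset & 0xFFFF


# A reader that wants one region seeks to block_start in my.vcf.gz (for a remote
# file, typically with an HTTP byte-range request), decompresses only that block,
# and skips within_block bytes of the decompressed text.
voffset = make_virtual_offset(65536, 1200)
print(split_virtual_offset(voffset))  # (65536, 1200)
```

This is why only small pieces of a very large file need to travel to UCSC: each lookup touches a handful of compressed blocks rather than the whole file.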
The typical workflow for generating a VCF custom track is this:
- If you haven't done so already,
download and build the
tabix and bgzip
programs. Test your installation by running tabix with no
command line arguments; it should print a brief usage message.
For help with tabix, please contact the
samtools-help mailing list (tabix is part of the samtools project).
- Create VCF or convert another format to VCF. Items must be sorted by genomic position.
- Compress your .vcf file using the bgzip program:
bgzip my.vcf
For more information about the bgzip command, run
bgzip with no other arguments.
- Create a tabix index file for the bgzip-compressed VCF (.vcf.gz):
tabix -p vcf my.vcf.gz
The tabix command appends .tbi to my.vcf.gz, creating a
binary index file my.vcf.gz.tbi with which
genomic coordinates can quickly be translated into file offsets in
my.vcf.gz.
- Move both the compressed VCF file and tabix index file (my.vcf.gz and
my.vcf.gz.tbi) to an http, https, or ftp location.
- Construct a custom track
using a single
track line.
The most basic version of the track line will look something
like this:
track type=vcfTabix name="My VCF" bigDataUrl=http://myorg.edu/mylab/my.vcf.gz
Again, in addition to http://myorg.edu/mylab/my.vcf.gz, the
associated index file http://myorg.edu/mylab/my.vcf.gz.tbi
must also be available at the same location.
- Paste the custom track line into the text box in the
custom track
management page, click submit and view in the Genome Browser.
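Step 2 requires the items to be sorted by genomic position; tabix expects each chromosome's records to form one contiguous block with POS never decreasing inside it. The Python sketch below checks an uncompressed, tab-delimited .vcf before you run bgzip. The function name is hypothetical, and it checks only the ordering, not any other part of the VCF format.

```python
from typing import Iterable, Optional


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based line number of the first out-of-order record, or None.

    Checks that every chromosome's records form one contiguous block and that
    POS never decreases within that block.
    """
    finished = set()       # chromosomes whose block has already ended
    current_chrom = None
    last_pos = 0
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue       # meta-information, header, or blank line
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom != current_chrom:
            if chrom in finished:
                return number  # chromosome reappears after a different one
            if current_chrom is not None:
                finished.add(current_chrom)
            current_chrom, last_pos = chrom, 0
        if pos < last_pos:
            return number      # position went backwards within a chromosome
        last_pos = pos
    return None
```

To use it, pass an open file handle for my.vcf; a result of None means the records are in an order tabix can index.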
Parameters for VCF custom track definition lines
All options are placed in a single line separated by spaces (lines are broken
only for readability here):
track type=vcfTabix bigDataUrl=http://...
hapClusterEnabled=true|false hapClusterColorBy=altOnly|refAlt|base
hapClusterTreeAngle=triangle|rectangle hapClusterHeight=N
applyMinQual=true|false minQual=Q minFreq=F
name=track_label description=center_label
visibility=display_mode priority=priority
db=db maxWindowToDraw=N
chromosomes=chr1,chr2,...
Note: if you copy and paste the above example, you must remove the line breaks.
Click here for a text version that you can paste
without editing.
The track type and bigDataUrl are REQUIRED:
type=vcfTabix bigDataUrl=http://myorg.edu/mylab/my.vcf.gz
The remaining settings are OPTIONAL. Some are specific to VCF:
hapClusterEnabled true|false # if file has phased genotypes, sort by local similarity
hapClusterColorBy altOnly|refAlt|base # coloring scheme, default altOnly, conditional on hapClusterEnabled
hapClusterTreeAngle triangle|rectangle # draw leaves as < or [, default <, conditional on hapClusterEnabled
hapClusterHeight N # height of track in pixels, default 128, conditional on hapClusterEnabled
applyMinQual true|false # if true, don't display items with QUAL < minQual; default false
minQual Q # minimum value of Q column to display item, conditional on applyMinQual
minFreq F # minimum minor allele frequency to display item; default 0.0
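The three filtering settings above combine as follows. This Python sketch restates the documented rules only. It assumes each item's alternate allele frequency is already known, and that an item with a missing QUAL (".") is kept; how the Genome Browser actually derives frequencies and treats missing values is not described here. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    pos: int
    qual: Optional[float]  # None when the VCF QUAL column is "."
    alt_freq: float        # alternate allele frequency (assumed already computed)


def is_displayed(v: Variant, apply_min_qual: bool = False,
                 min_qual: float = 0.0, min_freq: float = 0.0) -> bool:
    """Apply the applyMinQual/minQual/minFreq rules as documented (a sketch)."""
    # minQual only matters when applyMinQual is true; missing QUAL is kept (assumption).
    if apply_min_qual and v.qual is not None and v.qual < min_qual:
        return False
    # Minor allele frequency is the smaller of the two allele frequencies.
    minor_allele_freq = min(v.alt_freq, 1.0 - v.alt_freq)
    return minor_allele_freq >= min_freq
```

For example, an item with QUAL 20 is hidden by applyMinQual=true minQual=30, but shown again if applyMinQual is false, because minQual alone has no effect.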
Other optional settings are not specific to VCF, but relevant:
name track label # default is "User Track"
description center label # default is "User Supplied Track"
visibility squish|pack|full|dense|hide # default is hide (will also take numeric values 4|3|2|1|0)
priority N # default is 100
db genome database # e.g. hg19 for Human Feb. 2009 (GRCh37)
maxWindowToDraw N # don't display track when viewing more than N bases
chromosomes chr1,chr2,... # track contains data only on listed reference assembly sequences
The VCF track configuration help page
describes the VCF track configuration page options.
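Because all settings must end up on a single line, generating the track line can be less error-prone than hand-editing the multi-line listing. The minimal Python sketch below quotes values that contain spaces and writes booleans as true/false, matching the parameter list above. The helper name is hypothetical, and it does not escape double quotes that appear inside values.

```python
def track_line(big_data_url: str, **settings: object) -> str:
    """Build a single-line vcfTabix track line from keyword settings."""
    parts = ["track", "type=vcfTabix", f"bigDataUrl={big_data_url}"]
    for key, value in settings.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            text = ",".join(str(item) for item in value)  # e.g. chromosomes
        else:
            text = str(value)
        if " " in text:
            text = f'"{text}"'  # name and description often contain spaces
        parts.append(f"{key}={text}")
    return " ".join(parts)


print(track_line("http://myorg.edu/mylab/my.vcf.gz", name="My VCF",
                 visibility="pack", db="hg19", chromosomes=["chr21"],
                 applyMinQual=True, minQual=30))
```

The printed line can be pasted directly into the custom track management page, with no line breaks to remove.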
Example One
In this example, you will create a custom track for an indexed VCF file that
is already on a public server — variant calls generated by the
1000 Genomes Project.
The line breaks inserted here for readability must be removed before submitting
the track line:
browser position chr21:33,034,804-33,037,719
track type=vcfTabix name="VCF Example One" description="VCF Ex. 1: 1000 Genomes phase 1 interim SNVs"
chromosomes=chr21 maxWindowToDraw=200000
db=hg19 visibility=pack
bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/vcfExample.vcf.gz
The "browser" line above is used to view a small region of
chromosome 21 with variants from the .vcf.gz file.
Note: if you copy and paste the above example, you must remove the line breaks
(or click here for a text version that you
can paste without editing).
Paste the "browser" line and "track" line into the
custom track management page
for the human assembly hg19 (Feb. 2009), then press the submit button.
On the following page, press the chr21 link in the custom track
listing to view the VCF track in the Genome Browser.
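The track line in this example sets maxWindowToDraw=200000, so the VCF track is not drawn when you zoom out to more than 200,000 bases; the "browser position" line opens a window of under 3 kb, well within that limit. The Python sketch below does this arithmetic. The helper names are hypothetical, and treating the position as 1-based and inclusive, with a window exactly N bases wide still drawn, is an assumption.

```python
def parse_position(position: str) -> tuple[str, int, int]:
    """Split a position such as 'chr21:33,034,804-33,037,719' into parts."""
    chrom, span = position.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, start, end


def track_is_drawn(position: str, max_window_to_draw: int) -> bool:
    """True if the viewed window is no wider than maxWindowToDraw bases."""
    _, start, end = parse_position(position)
    width = end - start + 1  # assumes 1-based, inclusive coordinates
    return width <= max_window_to_draw


print(track_is_drawn("chr21:33,034,804-33,037,719", 200000))  # True
print(track_is_drawn("chr21:1-1,000,000", 200000))            # False
```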
Example Two
In this example, you will create a compressed, indexed VCF file from an existing VCF text file.
First, save this VCF file vcfExampleTwo.vcf
to your machine.
Perform steps 1 and 3-7 of the workflow described above (every step except creating the VCF), substituting
vcfExampleTwo.vcf for my.vcf. On the
custom track management page,
click the "add custom tracks" button if necessary and
make sure that the genome is set to Human and the assembly is set to Feb.
2009 (hg19) before pasting the track line and submitting.
This track line is a little nicer than the one shown in step 6, but remember
to remove the line breaks that have been added to the track line for
readability (or, click here for a text version
that you can paste without editing):
track type=vcfTabix name="VCF Example Two" bigDataUrl=http://myorg.edu/mylab/vcfExampleTwo.vcf.gz
description="VCF Ex. 2: More variants from 1000 Genomes" visibility=pack
db=hg19 chromosomes=chr21
browser position chr21:33,034,804-33,037,719
browser pack snp132Common
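Since you compress this file yourself, note that tabix can only index files written by bgzip (the BGZF format), not files compressed with plain gzip. BGZF is ordinary gzip plus an extra header field whose subfield ID is "BC". The Python sketch below checks for that field in the first block. The function name is hypothetical, and it inspects only the first block's header, not the whole file.

```python
import gzip
import struct


def is_bgzf(header: bytes) -> bool:
    """Return True if header starts with a BGZF block (the format bgzip writes)."""
    if len(header) < 18 or header[:3] != b"\x1f\x8b\x08":
        return False  # not gzip with deflate compression
    if not header[3] & 0x04:
        return False  # FEXTRA flag not set, so no extra field at all
    (xlen,) = struct.unpack("<H", header[10:12])
    extra = header[12:12 + xlen]
    i = 0
    while i + 4 <= len(extra):  # walk the SI1, SI2, SLEN subfields
        si1, si2 = extra[i], extra[i + 1]
        (slen,) = struct.unpack("<H", extra[i + 2:i + 4])
        if si1 == 66 and si2 == 67 and slen == 2:  # 'B', 'C'
            return True
        i += 4 + slen
    return False


# The 28-byte empty block that bgzip appends to the end of every file.
BGZF_EOF = bytes.fromhex("1f8b08040000000000ff0600424302001b0003000000000000000000")
print(is_bgzf(BGZF_EOF))                                  # True
print(is_bgzf(gzip.compress(b"##fileformat=VCFv4.1\n")))  # False
```

To check your own file, open vcfExampleTwo.vcf.gz in binary mode and pass the first few hundred bytes to is_bgzf; if it returns False, recompress the original .vcf with bgzip before running tabix.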
Example Three
In this example, you will load a hub that has VCF data described in a hub's trackDb.txt file.
First, navigate to the Basic Hub Quick Start Guide
and review an introduction to hubs.
Visualizing VCF files in hubs involves creating three text files: hub.txt, genomes.txt, and
trackDb.txt. The browser is given a URL to the top-level hub.txt file, which points to genomes.txt,
which in turn points to trackDb.txt. The trackDb.txt file contains a stanza for each track
that specifies the type and display details of that track, such as these lines for a VCF file
located at the bigDataUrl location:
track vcf1
bigDataUrl http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/ALL.chr21.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz
#Note: the corresponding index, fileName.vcf.gz.tbi, is in the same directory as the file above
shortLabel chr21 VCF example
longLabel This chr21 VCF file is an example from the 1000 Genomes Phase 1 Integrated Variant Calls Track
type vcfTabix
visibility dense
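The Python sketch below produces the text of a minimal one-track hub for hg19 using the three-file layout just described: hub.txt names genomes.txt, genomes.txt names hg19/trackDb.txt, and trackDb.txt holds the VCF stanza. The hub name, labels, e-mail address, and bigDataUrl are placeholders, and the helper names are hypothetical. See the Basic Hub Quick Start Guide for the authoritative list of settings.

```python
from pathlib import Path


def hub_files(track_stanza: dict[str, str]) -> dict[str, str]:
    """Return {relative path: file text} for a minimal one-track hg19 hub."""
    hub_txt = (
        "hub myVcfHub\n"                # hypothetical hub name
        "shortLabel My VCF hub\n"
        "longLabel Example hub holding one VCF track\n"
        "genomesFile genomes.txt\n"
        "email someone@example.org\n"   # placeholder contact address
    )
    # The trackDb path is given relative to the location of genomes.txt.
    genomes_txt = "genome hg19\ntrackDb hg19/trackDb.txt\n"
    track_db_txt = "".join(f"{key} {value}\n" for key, value in track_stanza.items())
    return {"hub.txt": hub_txt, "genomes.txt": genomes_txt,
            "hg19/trackDb.txt": track_db_txt}


def write_hub(directory: Path, files: dict[str, str]) -> None:
    """Write the files under directory, creating subdirectories as needed."""
    for relative, text in files.items():
        target = directory / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)


VCF_STANZA = {
    "track": "vcf1",
    "bigDataUrl": "http://myorg.edu/mylab/my.vcf.gz",  # my.vcf.gz.tbi must sit beside it
    "shortLabel": "chr21 VCF example",
    "longLabel": "Example VCF track",
    "type": "vcfTabix",
    "visibility": "dense",
}
files = hub_files(VCF_STANZA)
print(files["hg19/trackDb.txt"])
```

Calling write_hub with a directory on your web server, then giving the browser the URL of the resulting hub.txt, would complete the hub.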
Here is a direct link to the trackDb.txt
to see more information about this example hub. The following link opens the hub in the browser,
where this example VCF file displays in dense mode alongside the other tracks in this hub:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt
You can find more Track Hub VCF display options on the Track Database (trackDb) Definition Document page.
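A hub link like the one above is just the hgTracks address with db and hubUrl query parameters, plus an optional position. The Python sketch below builds such a link for your own hub.txt and URL-encodes the values. The helper name is hypothetical; the parameters are those used in the link above plus the standard position parameter.

```python
from typing import Optional
from urllib.parse import urlencode


def hub_view_url(hub_txt_url: str, db: str, position: Optional[str] = None) -> str:
    """Build an hgTracks link that opens a hub on a given assembly."""
    params = {"db": db, "hubUrl": hub_txt_url}
    if position is not None:
        params["position"] = position  # e.g. chr21:33,034,804-33,037,719
    return "http://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


print(hub_view_url("http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt", "hg19"))
```

The encoded form (":" as %3A, "/" as %2F) is equivalent to the unencoded link shown above, and remains safe if the hub URL itself contains characters such as "&".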
Sharing Your Data with Others
If you would like to share your VCF data track with a colleague, learn
how to create a URL by looking at Example 11 on
this page.
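As a rough illustration, a one-line vcfTabix track definition can be embedded in an hgTracks URL so that a colleague only needs to click a link. The Python sketch below assumes the hgct_customText parameter used in UCSC's URL-sharing examples; check Example 11 for the exact form it recommends. The helper name is hypothetical.

```python
from urllib.parse import quote


def shared_track_url(track_line: str, db: str) -> str:
    """Embed a one-line track definition in an hgTracks URL (a sketch)."""
    # Assumption: hgct_customText accepts the track line itself, with spaces
    # and double quotes percent-encoded; "=", ":", "/" and "," are left as-is.
    return ("http://genome.ucsc.edu/cgi-bin/hgTracks?db=" + quote(db)
            + "&hgct_customText=" + quote(track_line, safe=":/=,"))


print(shared_track_url(
    'track type=vcfTabix name="VCF Example Two" '
    'bigDataUrl=http://myorg.edu/mylab/vcfExampleTwo.vcf.gz', "hg19"))
```

The colleague's browser still fetches the data from your server, so the .vcf.gz and .tbi files must stay web-accessible for the link to keep working.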