This section provides a step-by-step description of the process used to set up a track hub on
your own server.
To create your own hub you will need:
- one or more data sets formatted in one of the compressed binary index formats
  supported by the Genome Browser: bigBed, bigGenePred, bigPsl, bigChain, bigMaf, bigWig, BAM,
  CRAM, HAL or VCF
- a set of text files that specify properties for the track hub and for each of
  the data tracks within it
- a twoBit file with your sequence if you are setting up an assembly hub
- an Internet-enabled web server or ftp server
The files are placed on the server in a file hierarchy like the one
shown in Example 1.
Users experienced in setting up Genome Browser mirrors that
contain their own data will find that setting up a track
hub is similar, but is usually much easier. Depending on the number
and complexity of the data sets, a track hub can typically be set up in a day
or two.
It is generally easiest to run the command-line data formatting programs in a
Linux programming environment, although it's possible to manipulate smaller
data sets using Mac OS-X as well.
Example 1: Directory hierarchy for a hub containing DNase and
RNAseq data for the hg18 and hg19 human genome assemblies. The hg18/ and
hg19/ subdirectories contain the assembly-specific data files.
myHub/ - directory containing track hub files
    hub.txt - a short description of hub properties
    genomes.txt - list of genome assemblies included in the hub data
    hg19/ - directory of data for the hg19 (GRCh37) human assembly
        trackDb.txt - display properties for tracks in this directory
        dnase.html - description text for a DNase track
        dnaseLiver.bigWig - wiggle plot of DNase in liver
        dnaseLiver.bigBed - regions of active DNase
        liverGenes.bigGenePred - gene annotations of genes over-expressed in liver tissue
        dnaseLung.bigWig - wiggle plot of DNase in lung
        dnaseLung.bigBed - regions of active DNase
        ...
        rnaSeq.html - description text for an RNAseq track
        rnaSeqLiver.bigWig - wiggle plot of RNAseq data in liver
        rnaSeqLiver.bigBed - intron/exon lists for liver
        rnaSeqLung.bigWig - wiggle plot of RNAseq data in lung
        rnaSeqLung.bigBed - intron/exon lists for lung
    hg18/ - directory of data for the hg18 (Build 36) human assembly
        trackDb.txt - display properties for tracks in this directory
        dnase.html - description text for a DNase track
        dnaseLiver.bigWig - wiggle plot of DNase data in liver
        dnaseLiver.bigBed - regions of active DNase
        dnaseLung.bigWig - wiggle plot of DNase data in lung
        dnaseLung.bigBed - regions of active DNase
        ...
        rnaSeq.html - description text for an RNAseq track
        rnaSeqLiver.bigWig - wiggle plot of RNAseq data in liver
        rnaSeqLiver.bigBed - intron/exon lists for liver
        rnaSeqLung.bigWig - wiggle plot of RNAseq data in lung
        rnaSeqLung.bigBed - intron/exon lists for lung
Step 1. Format the data
The data tracks provided by a hub must be formatted in one of the
compressed binary index formats supported by the Genome Browser:
bigWig, bigBed, bigGenePred, bigChain, bigPsl, bigMaf, BAM, CRAM, HAL or VCF.
bigWig -
The bigWig format is best for displaying continuous value plot data,
such as read depths from short read sequencing projects or levels of
conservation observed in a multiple-species alignment. A bigWig file contains a
list of chromosome segments, each of which is associated with a floating point
value. When graphed, the segments may appear as a big "wiggle".
Although each bigWig file can contain only a single value for any given base,
bigWig tracks are often combined into "container multiWig" or
"compositeTrack on" tagged tracks.
For information on creating and configuring bigWig tracks, see the
bigWig Track Format help page.
bigBed -
BigBed files are binary indexed versions of Browser Extensible Data
(BED) files. BED format is
useful for associating a name and (optionally) a color and a score with one
or more related regions on the same chromosome, such as all the exons of a gene.
See the bigBed Track Format help page for
information on creating and configuring bigBed tracks.
bigGenePred -
BigGenePred files are binary indexed versions of Browser Extensible Data
(BED) files with an extra eight fields, modeled after the fields in
genePred files, that are useful for describing gene predictions. BigGenePred format is
useful for associating a name and (optionally) a color and a score with one
or more related regions on the same chromosome, such as all the exons of a gene.
See the bigGenePred Track Format help page for
information on creating and configuring bigGenePred tracks.
bigPsl -
BigPsl files are binary indexed versions of PSL
files. BigPsl format is useful for large data sets created by BLAT or other tools.
See the bigPsl Track Format help page for more information
on creating and configuring bigPsl tracks.
bigChain -
BigChain files are binary indexed versions of chain files.
BigChain format is useful for large pairwise alignment data sets.
See the bigChain Track Format help page for more information
on creating and configuring bigChain tracks.
bigMaf -
BigMaf files are binary indexed versions of MAF
files. BigMaf format is useful for large multiple alignment data sets.
See the bigMaf Track Format help page for more information
on creating and configuring bigMaf tracks.
BAM -
BAM files contain alignments of (generally short) DNA reads to a
reference sequence, usually a complete genome. BAM files are binary
versions of Sequence Alignment/Map
(SAM) format files.
Unlike bigWig and bigBed formats, the
index for a BAM file is in a separate file, which the track hub
expects to be in the same directory with the same root name as the BAM file
with the addition of a .bai suffix.
See the BAM Track Format help page for
more information.
CRAM -
The CRAM file format is a denser form of the BAM
format that saves considerable disk space. While BAM files contain
all sequence data within a file, CRAM files are smaller because they rely
on an additional external "reference sequence" file.
This file is needed to both compress and decompress the read information.
See the CRAM Track Format help page for
more information.
HAL -
HAL (Hierarchical Alignment Format) is a graph-based structure to efficiently
store and index multiple genome alignments and ancestral reconstructions.
HAL
files are represented in HDF5
format, an open standard for storing and indexing large, compressed scientific
data sets. HAL is the native output format of the Progressive Cactus alignment pipeline,
and is included in the
Progressive Cactus
installation package.
VCF -
VCF (Variant Call Format) files can contain annotations of single nucleotide
variants, insertions/deletions, copy number variants, structural variants
and other types of genomic variation. When a VCF file is compressed and
indexed using tabix, it can be used as a data track file.
Unlike bigWig and bigBed formats, the
tabix index is in a separate file, which the track hub
expects to be in the same directory with the same root name as the VCF file
with the addition of a .tbi suffix.
See the VCF Track Format help page for
more information.
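The index naming convention described above for BAM and VCF files can be sketched in Python. This is a minimal illustration, not part of the Genome Browser tools: the function name expected_index_path and the file names are hypothetical, and the sketch assumes that "the same root name with the addition of a suffix" means the full data file name followed by .bai or .tbi, and that tabix-compressed VCF files end in .gz.

```python
from pathlib import PurePosixPath

# Index suffixes named in the text, keyed by the data file's extension.
# Assumption: a tabix-compressed VCF file ends in ".gz".
INDEX_SUFFIX = {".bam": ".bai", ".gz": ".tbi"}


def expected_index_path(data_path):
    """Return the index file path expected alongside a BAM or compressed VCF file."""
    suffix = PurePosixPath(data_path).suffix
    if suffix not in INDEX_SUFFIX:
        raise ValueError(f"no separate index file expected for {data_path!r}")
    # Same directory, same name, plus the index suffix.
    return data_path + INDEX_SUFFIX[suffix]


print(expected_index_path("hg19/dnaseReads.bam"))
print(expected_index_path("hg19/variants.vcf.gz"))
```

A check like this can be run before uploading, since a missing index file is easy to overlook when the data file itself is present.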
Step 2. Create the track hub directory
Create a track hub directory in an Internet-accessible location on your web or
ftp server. This directory will contain the hub.txt and genomes.txt
files that define properties of the track hub and a subdirectory
for each of the genome assemblies covered by the hub track data.
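As a rough sketch of this step, the following Python snippet creates the directory layout with empty placeholder files. The helper name create_hub_skeleton is hypothetical, and the demonstration writes to a temporary directory; on a real server the root would be a path inside the web or ftp server's document tree.

```python
import tempfile
from pathlib import Path


def create_hub_skeleton(root, assemblies):
    """Create a hub directory with empty hub.txt and genomes.txt files
    plus one subdirectory per genome assembly."""
    hub_dir = Path(root)
    hub_dir.mkdir(parents=True, exist_ok=True)
    for name in ("hub.txt", "genomes.txt"):
        (hub_dir / name).touch()
    for assembly in assemblies:
        (hub_dir / assembly).mkdir(exist_ok=True)
    return hub_dir


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    hub = create_hub_skeleton(Path(tmp) / "myHub", ["hg18", "hg19"])
    print(sorted(p.name for p in hub.iterdir()))
```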
Step 3. Place the track data files in an Internet-accessible location
The data files underlying a track in a hub do not have to reside
in the track hub directory or even on the same server, but they must be
accessible via the Internet.
The track hub utility supports Internet protocols such as http://,
https://, and ftp://, as well as file paths relative to the hub
directory hierarchy. The location of a track file is defined by its
bigDataUrl tag in the associated trackDb.txt file (Step 7).
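To illustrate how a bigDataUrl value maps to a file location, here is a minimal Python sketch. It assumes, following the bigDataUrl description in Step 7, that a value without a protocol prefix is resolved relative to the trackDb.txt file that contains it. The function name resolve_big_data_url and the example.org addresses are hypothetical.

```python
from urllib.parse import urljoin, urlparse

# Protocols named in the text above.
REMOTE_SCHEMES = {"http", "https", "ftp"}


def resolve_big_data_url(track_db_url, big_data_url):
    """Resolve a bigDataUrl value against the URL of its trackDb.txt file."""
    if urlparse(big_data_url).scheme in REMOTE_SCHEMES:
        return big_data_url  # already a full URL, use as-is
    # Otherwise treat it as a path relative to the trackDb.txt location.
    return urljoin(track_db_url, big_data_url)


track_db = "http://example.org/myHub/hg19/trackDb.txt"
print(resolve_big_data_url(track_db, "dnaseLiver.bigWig"))
print(resolve_big_data_url(track_db, "ftp://data.example.org/dnaseLiver.bigWig"))
```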
Step 4. Create the hub.txt file
Within the hub directory, create a hub.txt file containing a single stanza
with up to six fields that define properties of the track hub:
hub hub_name
shortLabel hub_short_label
longLabel hub_long_label
genomesFile genomes_filelist
email email_address
descriptionUrl description_url
hub - a single-word name of the directory containing the track hub
files. Not displayed to hub users. This must be the first line in the hub.txt
file.
shortLabel - the short name for the track hub.
Suggested maximum length is 17 characters.
Displayed as the hub name on the Track Hubs page and the track group
name on the browser tracks page.
longLabel - a longer descriptive label for the track hub.
Suggested maximum length is 80 characters.
Displayed in the description field on the Track
Hubs page.
genomesFile - the relative path of the genomes.txt file, which
contains the list of genome assemblies covered by the track data and the names
of their associated configuration files. By convention the genomes.txt
file is located in the same directory as the hub.txt file.
email - the contact to whom questions regarding the track hub
should be directed.
descriptionUrl - URL to HTML page with a description of the hub's contents. This can be
relative to the directory which holds hub.txt. This file is assumed to be HTML, and if the hub is a UCSC public
hub, this HTML will be crawled nightly by UCSC to build an index with which public hubs can be searched.
If present, clicks on the shortLabel will open this HTML in a new tab. This field is optional.
Example 2: Sample hub.txt file defining attributes for the track
hub shown in Example 1.
hub UCSCHub
shortLabel UCSC Hub
longLabel UCSC Genome Informatics Hub for human DNase and RNAseq data
genomesFile genomes.txt
email genome@soe.ucsc.edu
descriptionUrl ucscHub.html
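A hub.txt file can be checked against the rules above with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, each line is taken to be a tag followed by whitespace and a value, and the five fields other than descriptionUrl are treated as required, which the text implies but does not state outright.

```python
REQUIRED_FIELDS = ("hub", "shortLabel", "longLabel", "genomesFile", "email")


def parse_hub_txt(text):
    """Return (tag, value) pairs from hub.txt in file order, skipping empty lines."""
    pairs = []
    for line in text.splitlines():
        words = line.strip().split(None, 1)
        if words:
            pairs.append((words[0], words[1] if len(words) > 1 else ""))
    return pairs


def check_hub(pairs):
    """Return a list of problems found in the parsed hub.txt settings."""
    settings = dict(pairs)
    problems = [f"missing field: {tag}" for tag in REQUIRED_FIELDS if tag not in settings]
    if pairs and pairs[0][0] != "hub":
        problems.append("the hub line must come first")
    if len(settings.get("shortLabel", "")) > 17:
        problems.append("shortLabel is longer than the suggested 17 characters")
    if len(settings.get("longLabel", "")) > 80:
        problems.append("longLabel is longer than the suggested 80 characters")
    return problems


EXAMPLE = """\
hub UCSCHub
shortLabel UCSC Hub
longLabel UCSC Genome Informatics Hub for human DNase and RNAseq data
genomesFile genomes.txt
email genome@soe.ucsc.edu
descriptionUrl ucscHub.html
"""

print(check_hub(parse_hub_txt(EXAMPLE)))  # prints [] because no problems are found
```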
Step 5. Create the genomes.txt file
Create a genomes.txt file within the track hub directory that contains a two-line
stanza for each genome assembly supported by the hub data. Stanzas must be separated by an empty line.
Each stanza gives the location of the trackDb file that defines
display properties for each track in that assembly.
genome assembly_database_1
trackDb assembly_1_path/trackDb.txt

genome assembly_database_2
trackDb assembly_2_path/trackDb.txt
genome - a valid UCSC database name. Each stanza must begin with
this tag and each stanza must be separated by an empty line.
trackDb - the relative path of the trackDb file for the assembly
designated by the genome tag. By convention, the trackDb file is
located in a subdirectory of the hub directory. However, the trackDb tag
may also specify a complete URL.
If this genomes.txt file is for an assembly that does not have native support in the browser, the following fields must also be present:
twoBitPath - refers to the .2bit file containing the sequence for this assembly. Typically this file is constructed from the original fasta files for the sequence using the kent program faToTwoBit. See HERE for instructions on how to build a 2bit file.
groups - a file which defines the track groups on this genome browser. Track groups are the sections of related tracks grouped together under the primary genome browser graphics display image.
The groups.txt file defines the grouping of track controls under the primary genome browser image display. The example referenced here has the usual definitions as found in the UCSC Genome Browser.
Each group is defined, for example the Mapping group:
name map
label Mapping
priority 2
defaultIsClosed 0
The name is used in the trackDb.txt track definition group, to assign a particular track to this group.
The label is displayed on the genome browser as the title of this group of track controls.
The priority orders this track group relative to the other track groups.
The defaultIsClosed setting determines whether this track group is expanded or closed by default. Valid values are 0 or 1.
description - will be displayed for user information on the gateway page and most title pages of this genome assembly browser. It is the name displayed in the assembly pull-down menu on the browser gateway page.
organism - the string which is displayed along with the description on most title pages in the genome browser. Adjust your names in organism and description until they are appropriate. This organism name is the name that appears in the genome pull-down menu on the browser gateway page.
defaultPos - specifies the default position the genome browser will open when a user first views this assembly. This is usually selected to highlight a popular gene or region of interest in the genome assembly.
orderKey - used with other genome definitions at this hub to order the genome pull-down menu.
htmlPath - refers to an html file that is used on the gateway page to display information about the assembly.
Example 3: Sample genomes.txt file defining attributes for the hub
shown in Example 1.
genome hg18
trackDb hg18/trackDb.txt

genome hg19
trackDb hg19/trackDb.txt

genome newOrg1
trackDb newOrg1/trackDb.txt
twoBitPath newOrg1/newOrg1.2bit
groups newOrg1/groups.txt
description Big Foot V4
organism BigFoot
defaultPos chr21:33031596-33033258
orderKey 4800
scientificName Biggus Footus
htmlPath newOrg1/description.html
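The stanza structure of genomes.txt can be read with a short Python sketch like the one below. It assumes stanzas are separated by empty lines, as required above, and that each line is a tag followed by whitespace and a value. The function name parse_stanzas is hypothetical, and treating the presence of twoBitPath as the mark of an assembly hub genome is a simplification made for this illustration.

```python
def parse_stanzas(text):
    """Split tag/value text into a list of dictionaries, one per stanza."""
    stanzas, current = [], {}
    for line in text.splitlines():
        words = line.strip().split(None, 1)
        if not words:
            # An empty line ends the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        current[words[0]] = words[1] if len(words) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


GENOMES_TXT = """\
genome hg18
trackDb hg18/trackDb.txt

genome hg19
trackDb hg19/trackDb.txt

genome newOrg1
trackDb newOrg1/trackDb.txt
twoBitPath newOrg1/newOrg1.2bit
groups newOrg1/groups.txt
"""

for stanza in parse_stanzas(GENOMES_TXT):
    kind = "assembly hub genome" if "twoBitPath" in stanza else "natively supported genome"
    print(stanza["genome"], "->", stanza["trackDb"], f"({kind})")
```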
Step 6. Create the genome assembly subdirectories
Within the track hub directory, create a subdirectory for each of the genome
assemblies that have track data in the hub. The subdirectory names must
have a 1:1 correspondence with the database names defined by the
genome tags in the genomes.txt file.
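One way to verify the 1:1 correspondence is to compare the genome names against the subdirectories actually present. The Python sketch below does this for a local copy of the hub directory. The function name compare_assembly_dirs and the stray "scratch" directory are hypothetical, and the demonstration uses a temporary directory.

```python
import tempfile
from pathlib import Path


def compare_assembly_dirs(hub_dir, genome_names):
    """Return (missing, extra): genome names lacking a subdirectory, and
    subdirectories that no genome tag refers to."""
    present = {p.name for p in Path(hub_dir).iterdir() if p.is_dir()}
    wanted = set(genome_names)
    return wanted - present, present - wanted


with tempfile.TemporaryDirectory() as tmp:
    for name in ("hg18", "hg19", "scratch"):
        (Path(tmp) / name).mkdir()
    missing, extra = compare_assembly_dirs(tmp, ["hg18", "hg19", "newOrg1"])
    print("missing:", sorted(missing), "extra:", sorted(extra))
```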
Step 7. Create the trackDb.txt files
The trackDb.txt file, which is based on the Genome Browser
.ra format, is the most complicated of the text files in the
hub directory. It contains a stanza for each of the data files for the
given assembly that defines display and configuration properties for the
track. If the tracks are grouped into larger entities, such as
composite or super-tracks, the larger entities will have a stanza in the
file as well.
The Track Database Definition Document
will help you understand how to create a trackDb.txt file. This document
describes how to declare dataset display settings and values, and indicates
the support level for each setting. While there are over 100 track settings supported at UCSC,
other sites that display hubs have more limited settings support.
To further portability of hubs, we have used input from other sites to identify a
'base' subset of the 'full' settings list, and the document has been assigned a version number.
See the document introduction for a fuller explanation.
At a minimum, each track in the trackDb.txt file must contain the 'required' settings:
track track_name
bigDataUrl track_data_URL
shortLabel short_label
longLabel long_label
type track_type
track - the symbolic name of the track. The first character
must be a letter, and the remaining characters must be letters, numbers,
or underscores ("_"). Each track must have a unique name. This tag pair must be
the first entry in the track's stanza.
bigDataUrl - the file name, path, or Web location of the
track's data file. The bigDataUrl can be a full URL. If it is not
prefaced by a protocol, such as http://, https:// or
ftp://, then it is considered to be a path
relative to the trackDb.txt file.
shortLabel - the short name for the track displayed in the
track list, in the configuration and track settings, and on the details
pages. Suggested maximum length is 17 characters.
longLabel - the longer description label for the track that is
displayed in the configuration and track settings, and on the
details pages. Suggested maximum length is 80 characters.
type - the format of the file specified by bigDataUrl. Must be one of
bigWig, bigBed, bigGenePred, bigChain,
bigPsl, bigMaf, bam, halSnake or vcfTabix
(Note: use type bam for CRAM files). If the type
is bigBed, it may be followed by an optional number denoting the number
of fields in the bigBed file (e.g., "type bigBed 12" for a file with 12 fields).
If no number is given, a default value of 3 is assumed (a very limited display
that omits names, strand information, and exon boundaries).
Example 4: Sample trackDb.txt file containing two simple tracks.
track dnaseSignal
bigDataUrl dnaseSignal.bigWig
shortLabel DNAse Signal
longLabel Depth of alignments of DNAse reads
type bigWig

track dnaseReads
bigDataUrl dnaseReads.bam
shortLabel DNAse Reads
longLabel DNAse reads mapped with MAQ
type bam
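The required settings and naming rules above lend themselves to a simple automated check. The following Python sketch validates a single data-track stanza that has already been parsed into a dictionary. It is an illustration only: the names check_track, REQUIRED_SETTINGS, TRACK_TYPES and TRACK_NAME are hypothetical, only the first word of the type setting is examined, and parent stanzas for composite or super-tracks are not handled.

```python
import re

REQUIRED_SETTINGS = ("track", "bigDataUrl", "shortLabel", "longLabel", "type")
TRACK_TYPES = {"bigWig", "bigBed", "bigGenePred", "bigChain", "bigPsl",
               "bigMaf", "bam", "halSnake", "vcfTabix"}
# A letter followed by letters, digits, or underscores.
TRACK_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_track(stanza):
    """Return a list of problems found in one data-track stanza (a dict)."""
    problems = [f"missing setting: {tag}" for tag in REQUIRED_SETTINGS if tag not in stanza]
    name = stanza.get("track", "")
    if name and not TRACK_NAME.fullmatch(name):
        problems.append(f"invalid track name: {name!r}")
    type_words = stanza.get("type", "").split()
    if type_words and type_words[0] not in TRACK_TYPES:
        problems.append(f"unsupported type: {type_words[0]!r}")
    for tag, limit in (("shortLabel", 17), ("longLabel", 80)):
        if len(stanza.get(tag, "")) > limit:
            problems.append(f"{tag} is longer than the suggested {limit} characters")
    return problems


dnase_signal = {
    "track": "dnaseSignal",
    "bigDataUrl": "dnaseSignal.bigWig",
    "shortLabel": "DNAse Signal",
    "longLabel": "Depth of alignments of DNAse reads",
    "type": "bigWig",
}
print(check_track(dnase_signal))
print(check_track({"track": "2ndTrack", "type": "bigBed 12"}))
```

The first call reports no problems for the dnaseSignal track from Example 4. The second reports the missing settings and the track name that starts with a digit.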
Suggestions:
Default subtracks for composite: For each composite, it is recommended that a
subset of subtracks are 'selected' (on) by default. This way, when a user turns the composite
from hide to another visibility, they will see tracks displayed in the browser.
Default composites within a super-track: For super-tracks that you don't want
displaying by default when your track hub is turned on, it is recommended that some (or all)
composites within the super-track be set to dense (or some visibility other than hide) by
default and that the super-track be set to hide by default. This way, if a user changes a
super-track from hide to show from the controls under the browser image, tracks are displayed.
To implement, change the visibility line in trackDb of the super-tracks to hide and the
visibility lines of all or some of the composite tracks within to dense (or some visibility
other than hide).
hgTrackUi controls: In addition to the controls for each view (click
on the title of the view drop-down), there is often another set of controls above the view
drop-downs (just under the 'Overall display mode'). This set of controls is not associated with a
particular view and clashes with the view controls. It is recommended to remove the controls
that are not associated with a particular view.
Step 8. Create track description files
Each track in the hub may have an associated description file
that describes the track to viewers. The file provides detailed information
about the data displayed in the track, including methods used to produce and
validate the data, background information, display conventions,
acknowledgments, and reference publications. The description file, which
must be in HTML format, is inserted into the track configuration
page that displays when the user clicks on the track's short label. It
also displays on the track details page that is shown when the user
clicks on a feature in the track image.
The track description file must have the same name as the
symbolic name for the track (defined by the track tag in the
trackDb.txt file) with a suffix of .html. For instance, a
description file associated with the track named "dnaseSignal" in
Example 4 would be named "dnaseSignal.html". The
description file must reside in the same directory as the trackDb.txt file.
Both parent and child tracks within a super-track can have their own
description files. If the description file is not present, the
corresponding sections of the track settings and details pages are left
blank. Only one description page can be associated with composite and multiWig
tracks; the file name should correspond to the symbolic name of the top-level
track in the composite.
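The naming rule for description files can be expressed as a one-line path computation. The Python sketch below derives the expected description file location from the trackDb.txt path and the track's symbolic name. The function name description_file_path and the myHub/hg19 path are hypothetical.

```python
from pathlib import PurePosixPath


def description_file_path(track_db_path, track_name):
    """Return the description file path for a track: the track's symbolic
    name plus .html, in the same directory as trackDb.txt."""
    return str(PurePosixPath(track_db_path).parent / f"{track_name}.html")


print(description_file_path("myHub/hg19/trackDb.txt", "dnaseSignal"))
```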