The bigMaf format stores multiple alignments in a form compatible with
MAF files, which is then compressed and
indexed as a bigBed. bigMaf files are created using the program bedToBigBed with a
special AutoSQL file that defines the fields of the bigMaf. The
resulting bigMaf files are in an indexed binary format. The main advantage of
bigMaf files is that only the portions of the file needed to display a
particular region are transferred to UCSC, so for large data sets bigMaf is
considerably faster than regular MAF files. The bigMaf file remains on
your web-accessible server (http, https, or ftp), not on the UCSC server.
Only the portion that is needed
for the chromosomal position you are currently viewing is locally cached as a
"sparse file".
Big MAF
The following AutoSQL definition is used for bigMaf multiple alignment files.
This is the bigMaf.as
file that is passed to bedToBigBed with the -as option.
table bedMaf
"Bed3 with MAF block"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
lstring mafBlock; "MAF block"
)
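As a rough illustration of the four-column layout this AutoSQL describes, the shell sketch below counts rows that do not fit it. The helper name check_bigmaf_rows and the sample rows are hypothetical and are not part of the UCSC tools; the check covers only the column count and coordinates, not the contents of the mafBlock field.

```shell
# Count rows on stdin that do not fit the bed3+1 layout of bigMaf.as:
# exactly four tab-separated fields, integer coordinates, chromStart < chromEnd.
# check_bigmaf_rows is a hypothetical helper name, not a UCSC utility.
check_bigmaf_rows() {
    awk -F'\t' '
        NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 { bad++ }
        END { print bad + 0 }
    '
}

# Illustrative rows only: the first is well formed, the second has start >= end.
printf 'chr1\t0\t3\ta score=10;s hg38.chr1 0 3 + 100 ACG;\nchr1\t9\t5\tx\n' | check_bigmaf_rows
```

A count of 0 means every row passed these basic checks; it does not guarantee that bedToBigBed will accept the file.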
Note that the bedToBigBed utility uses a substantial amount of
memory: on the order of 1.25 times the size of the
uncompressed BED input file.
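To get a feel for what this means in practice, the sketch below turns an input size in bytes into a rough memory estimate. It assumes the 1.25 figure is a multiplier of the uncompressed input size; the function name estimate_ram_bytes is hypothetical, and the result is only a ballpark.

```shell
# Estimate bedToBigBed memory use as 1.25 x the uncompressed input size in bytes.
# The 1.25 factor is taken from the note above and treated as a multiplier (an assumption).
# estimate_ram_bytes is a hypothetical helper name, not a UCSC utility.
estimate_ram_bytes() {
    # $1: size of the uncompressed input in bytes
    awk -v size="$1" 'BEGIN { printf "%.0f\n", size * 1.25 }'
}

# Example with a literal size; for a real file you could pass "$(wc -c < bigMaf.txt)".
estimate_ram_bytes 4000000000
```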
To create a bigMaf track, follow these steps:
- If you already have a MAF file you would like to convert to a bigMaf, skip to Step 3,
otherwise download the example
MAF file for the Human
GRCh38(hg38) assembly.
- If you would like to include optional reading frame and block summary information with our
example MAF file, please download the
chr22_KI270731v1_random.gp
GenePred file.
- Download the AutoSQL files needed by bedToBigBed:
- bigMaf.as
- If you would like to include optional frame and summary information with your bigMaf file,
you will also want to download the
mafSummary.as and
mafFrames.as files.
- Download the bedToBigBed program from the
directory
of binary utilities.
- If you would like to generate the optional frame and summary files for your
multiple alignment, also download the hgLoadMafSummary,
genePredSingleCover, and genePredToMafFrames programs
from the same
directory.
- Use the fetchChromSizes script from the same
directory
to create a chrom.sizes file for the UCSC database you are working with
(e.g. hg38). Alternatively, you can download the chrom.sizes file for
any assembly hosted at UCSC from our
downloads page (click on "Full data set" for any assembly). For example, for the hg38
database, the hg38.chrom.sizes are located at
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
- Download the mafToBigMaf.awk script.
- Create the bigMaf file from your sorted bigMaf input file using a combination of
awk, sed and the bedToBigBed utility like so:
awk -f mafToBigMaf.awk chr22_KI270731v1_random.maf | sed '/^$/d' | sed 's/hg38.//' > bigMaf.txt
bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt chrom.sizes bigMaf.bb
- Follow the below steps to create the binary indexed mafFrames and mafSummary files
to accompany your bigMaf file:
genePredSingleCover chr22_KI270731v1_random.gp single.gp
genePredToMafFrames hg38 chr22_KI270731v1_random.maf bigMafFrames.txt hg38 single.gp
bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt chrom.sizes bigMafFrames.bb
hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf
cut -f2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed
bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed chrom.sizes bigMafSummary.bb
- Move the newly created bigMaf file (bigMaf.bb) to an http,
https, or ftp location.
- If you generated the bigMafSummary.bb and/or bigMafFrames.bb
files, they will also need to be in a web accessible location, likely in
the same location as bigMaf.bb.
- Construct a custom track
using a single
track line.
Note that any of the track attributes listed
here are applicable
to tracks of type bigBed.
The most basic version of the "track" line will look something
like this:
track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb
- Paste this custom track line into the text box on the
custom track management page.
The bedToBigBed program can also be run with several additional options.
Run bedToBigBed with no arguments to view a full list of available options.
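Step 6 above refers to a sorted input file. bedToBigBed expects rows ordered by chromosome name and then by start position, so a quick check before conversion can save a failed run. The sketch below is one way to do that with standard tools; the function name is_bed_sorted is hypothetical, and the check assumes tab-separated input with no two rows sharing both chromosome and start.

```shell
# Return 0 when the tab-separated rows on stdin are ordered by column 1 (chrom,
# byte order) and then column 2 (start, numeric), as in "sort -k1,1 -k2,2n".
# is_bed_sorted is a hypothetical helper name, not a UCSC utility.
is_bed_sorted() {
    LC_ALL=C sort -c -t "$(printf '\t')" -k1,1 -k2,2n 2>/dev/null
}

# Illustrative rows only.
printf 'chr1\t5\t9\tb1;\nchr1\t10\t20\tb2;\nchr2\t1\t4\tb3;\n' | is_bed_sorted && echo "sorted"
```

If the check fails, running the file through LC_ALL=C sort -k1,1 -k2,2n before bedToBigBed should put it in the expected order.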
Example One
In this example, you will use an existing bigMaf file to create a bigMaf
custom track. A bigMaf file that contains data on the hg38
assembly has been placed on our http server.
You can create a custom track using this bigMaf file by constructing a
"track" line that references this file like so:
track type=bigMaf name="bigMaf Example One"
description="A bigMaf file"
bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb
Paste the above "track" line into the
custom track management page for the
human assembly hg38 (Dec. 2013), then press the submit button.
Please note that additional track line options exist that are specific to
the MAF format. For instance, adding
speciesOrder="panTro4 rheMac3 mm10 rn5 canFam3 monDom5"
to the above example lets you specify the order in which the species' sequences are displayed.
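If you build track lines for several files, a small helper can keep the quoting consistent. The sketch below assembles a line from the settings shown above (type, name, description, bigDataUrl, and optionally speciesOrder). The function name make_bigmaf_track_line is hypothetical, and the sketch does not escape quotes inside its arguments.

```shell
# Print a bigMaf "track" line from its parts.
# make_bigmaf_track_line is a hypothetical helper name, not a UCSC utility.
make_bigmaf_track_line() {
    # $1: name, $2: description, $3: bigDataUrl, $4 (optional): speciesOrder list
    line="track type=bigMaf name=\"$1\" description=\"$2\" bigDataUrl=$3"
    if [ -n "${4:-}" ]; then
        line="$line speciesOrder=\"$4\""
    fi
    printf '%s\n' "$line"
}

# Rebuilds the Example One track line with the speciesOrder setting added.
make_bigmaf_track_line "bigMaf Example One" "A bigMaf file" \
    "http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb" \
    "panTro4 rheMac3 mm10 rn5 canFam3 monDom5"
```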
Custom tracks can also be loaded via one URL line. The below link loads the same
bigMaf track, but includes parameters on the URL line:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb%20visibility=pack
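The URL above is the track line with its spaces written as %20 and appended to the hgct_customText parameter. The sketch below reproduces that construction. The function name make_hgct_url is hypothetical, and only spaces are encoded, so track lines containing quotes, ampersands or other reserved characters would need fuller URL encoding.

```shell
# Build an hgTracks URL that loads a custom track from a track line.
# Only spaces are percent-encoded (a simplification).
# make_hgct_url is a hypothetical helper name, not a UCSC utility.
make_hgct_url() {
    # $1: database, $2: position, $3: track line
    encoded=$(printf '%s' "$3" | sed 's/ /%20/g')
    printf 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=%s&position=%s&hgct_customText=%s\n' \
        "$1" "$2" "$encoded"
}

# Rebuilds the example link from its three parts.
make_hgct_url hg38 chr22_KI270731v1_random \
    "track type=bigMaf name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb visibility=pack"
```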
With this example bigMaf loaded, click into an alignment from the track. Note
that the details page has information about the individual alignments, similar
to the details page of a standard MAF track.
Example Two
In this example, you will create your own bigMaf file from an existing
bigMaf input file.
- Save this bed3+1 bigMaf.txt
example input file to your machine (satisfies the first part of the above step 6).
- Save this bigMaf.as text file to your machine
(Step 2).
- Download the bedToBigBed utility
(step 3).
- Save this hg38.chrom.sizes text file to your machine.
It contains the chrom.sizes for the human (hg38) assembly
(step 4).
- Use the bedToBigBed utility to create the binary indexed MAF file
(completes step 6):
bedToBigBed -type=bed3+1 -tab -as=bigMaf.as bigMaf.txt hg38.chrom.sizes bigMaf.bb
- Place the bigMaf file you just created (bigMaf.bb) on a
web-accessible server (step 8).
- Construct a "track" line that points to your bigMaf file
(see step 9).
- Create the custom track on the human assembly hg38 (Dec. 2013), and
view it in the Genome Browser (see step 10).
Sharing Your Data with Others
If you would like to share your bigMaf data track with a colleague, learn
how to create a URL by looking at Example 11 on
this page.
Extracting Data from the bigMaf Format
Because bigMaf files are an extension of bigBed files, which are indexed binary files,
it can be difficult to
extract data from them. We have developed the following
programs to help, all of which are available from the
directory of binary
utilities.
- bigBedToBed — this program converts a bigBed file
to ASCII BED format.
- bigBedSummary — this program extracts summary information
from a bigBed file.
- bigBedInfo — this program prints out information about a
bigBed file.
As with all UCSC Genome Browser programs, simply type the program name
at the command line with no parameters to see the usage statement.
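Once bigBedToBed has written the rows back out as text (for example, bigBedToBed bigMaf.bb bigMaf.bed), each row still carries its alignment in the fourth column. The sketch below unpacks that column into MAF-style lines. It assumes the lines of each block are joined with semicolons in the mafBlock field, which is not stated on this page, so check a row of your own file first. The function name bigmaf_rows_to_maf is hypothetical.

```shell
# Expand the mafBlock column of bed3+1 rows on stdin into one line per MAF line,
# with a blank line after each block.
# Assumption: block lines are joined with ";" inside column 4.
# bigmaf_rows_to_maf is a hypothetical helper name, not a UCSC utility.
bigmaf_rows_to_maf() {
    awk -F'\t' '{
        n = split($4, lines, ";")
        for (i = 1; i <= n; i++)
            if (lines[i] != "") print lines[i]
        print ""
    }'
}

# Illustrative row only; real rows come from bigBedToBed output.
printf 'chr1\t0\t3\ta score=10;s hg38.chr1 0 3 + 100 ACG;s mm10.chr2 5 3 + 200 ACG;\n' | bigmaf_rows_to_maf
```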
Troubleshooting
If you encounter an error when you run the bedToBigBed program,
it may be because your input bigMaf file has data off the end of a chromosome.
In this case, run the bedClip program
(available here) on your input file before running
bedToBigBed. It removes the row(s) in your input BED
file that extend past the end of a chromosome.
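To see which rows would be affected before running bedClip, the filter below keeps only rows that fit within the sizes listed in a chrom.sizes file. It is a sketch of the idea and not a replacement for bedClip. The function name drop_offchrom_rows is hypothetical, and the sketch assumes a non-empty, tab-separated chrom.sizes file.

```shell
# Keep only BED rows (stdin) whose chromosome is listed in the chrom.sizes file
# and whose end does not exceed that chromosome's size.
# drop_offchrom_rows is a hypothetical helper name, not a UCSC utility.
drop_offchrom_rows() {
    # $1: chrom.sizes file with lines of the form name<TAB>size
    awk -F'\t' '
        NR == FNR { size[$1] = $2; next }
        ($1 in size) && $2 + 0 >= 0 && $3 + 0 <= size[$1] + 0
    ' "$1" -
}

# Illustrative data only: a 100-base chromosome and one row that runs past its end.
sizes=$(mktemp)
printf 'chr1\t100\n' > "$sizes"
printf 'chr1\t0\t50\tkept\nchr1\t90\t120\tdropped\n' | drop_offchrom_rows "$sizes"
rm -f "$sizes"
```

Comparing line counts before and after the filter shows how many rows fall off the end of a chromosome.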