A twoBit file is a compact way to store genomic sequence, packing each nucleotide into two bits.
The format is defined here. Please note that
lower-case nucleotides are treated as masked in the twoBit file,
which can cause that sequence to be ignored when gfServer is run with the -mask option,
so you may wish to upper-case the sequence when preparing the FASTA file.
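If you decide to upper-case the sequence, any text tool will do. The following is a minimal Python sketch that assumes a plain (uncompressed) FASTA file. It upper-cases sequence lines and leaves the ">" header lines alone so sequence names keep their case. The function names are hypothetical. Note that this discards all soft-masking, so only apply it if you do not want masked regions treated specially.

```python
from typing import Iterable, Iterator


def uppercase_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Upper-case sequence lines; pass '>' header lines through unchanged."""
    for line in lines:
        yield line if line.startswith(">") else line.upper()


def uppercase_fasta_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, removing lower-case (soft) masking.

    Assumes an uncompressed FASTA file; gzipped input is not handled.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(uppercase_fasta(src))
```

For example, uppercase_fasta_file("genome.fa", "genome.upper.fa") writes an unmasked copy (the output name is only an example) that can then be passed to faToTwoBit.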
To complete the steps below you
will need to download the faToTwoBit, twoBitInfo, and twoBitToFa utilities. For more information
on downloading our command line utilities, please see these
instructions.
To create a twoBit file, follow these steps:
- Prepare the sequence for your twoBit file in a FASTA formatted file (e.g., genome.fa).
- Run the faToTwoBit program on your FASTA file.
faToTwoBit genome.fa genome.2bit
- Use twoBitInfo to verify the sequences in this assembly and to create a chrom.sizes file, which is useful in later processing when constructing big* files (e.g. bigBed or bigWig):
twoBitInfo genome.2bit stdout | sort -k2rn > genome.chrom.sizes
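To show roughly what faToTwoBit and twoBitInfo do with the data, here is a Python sketch. It packs sequences into a minimal in-memory .2bit image. It then reads back each sequence name and length, sorted largest first like the sort -k2rn step. It assumes the layout described in the twoBit format specification: a 16-byte header starting with the signature 0x1A412743, an index of name/offset pairs, and per-sequence records with bases packed T=0, C=1, A=2, G=3, four per byte. To keep it short, it deliberately writes no N blocks or mask blocks. The function names are hypothetical, and this is not a substitute for the UCSC utilities.

```python
import struct
from typing import Dict, List, Tuple

TWOBIT_SIGNATURE = 0x1A412743
BASE_CODE = {"T": 0, "C": 1, "A": 2, "G": 3}  # 2-bit codes per the spec


def pack_dna(seq: str) -> bytes:
    """Pack 4 bases per byte, first base in the most significant bits.

    Simplification: non-ACGT bases (e.g. N) are packed as code 0 (T) and
    no N or mask blocks are recorded, unlike a real .2bit writer.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].upper()
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_CODE.get(base, 0)
        byte <<= 2 * (4 - len(chunk))  # left-align a partial final byte
        out.append(byte)
    return bytes(out)


def write_twobit(seqs: Dict[str, str]) -> bytes:
    """Build a minimal little-endian version-0 .2bit image (no N/mask blocks)."""
    header = struct.pack("<IIII", TWOBIT_SIGNATURE, 0, len(seqs), 0)
    offset = len(header) + sum(1 + len(name) + 4 for name in seqs)
    index, records = bytearray(), bytearray()
    for name, seq in seqs.items():
        encoded = name.encode("ascii")
        index += struct.pack("<B", len(encoded)) + encoded + struct.pack("<I", offset)
        # dnaSize, nBlockCount=0, maskBlockCount=0, reserved=0, then packed DNA.
        record = struct.pack("<IIII", len(seq), 0, 0, 0) + pack_dna(seq)
        records += record
        offset += len(record)
    return header + bytes(index) + bytes(records)


def chrom_sizes(data: bytes) -> List[Tuple[str, int]]:
    """Return (name, size) pairs from a .2bit image, largest first."""
    if struct.unpack_from("<I", data, 0)[0] == TWOBIT_SIGNATURE:
        endian = "<"
    elif struct.unpack_from(">I", data, 0)[0] == TWOBIT_SIGNATURE:
        endian = ">"
    else:
        raise ValueError("not a .2bit file")
    _, version, count, _ = struct.unpack_from(endian + "IIII", data, 0)
    if version != 0:
        raise ValueError("only version 0 is handled by this sketch")
    pos, sizes = 16, []
    for _ in range(count):
        name_len = data[pos]
        name = data[pos + 1:pos + 1 + name_len].decode("ascii")
        (offset,) = struct.unpack_from(endian + "I", data, pos + 1 + name_len)
        (dna_size,) = struct.unpack_from(endian + "I", data, offset)  # record starts with dnaSize
        sizes.append((name, dna_size))
        pos += 1 + name_len + 4
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Writing each pair as a tab-separated "name size" line yields text in the same shape as genome.chrom.sizes. The byte-order check mirrors how the signature lets a reader detect a file written on a machine with the other endianness.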
The twoBit utilities can also read a .2bit file directly from a URL; the -udcDir option names the directory used to cache the downloaded data:
twoBitInfo -udcDir=. http://your-website.edu/~user/genome.2bit stdout | sort -k2nr > genome.chrom.sizes
Sequence can be extracted from the .2bit file with the twoBitToFa command, for example:
twoBitToFa -seq=chr1 -udcDir=. http://your-website.edu/~user/genome.2bit stdout > genome.chr1.fa
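Conceptually, extracting a sequence reverses the packing. The Python sketch below decodes 2-bit packed bytes, overwrites N blocks with "N", and lower-cases mask blocks. It then wraps the result as FASTA. It assumes the same code order as the specification (0=T, 1=C, 2=A, 3=G, first base in the most significant bits) and 0-based (start, size) block pairs. The function names are hypothetical, and the default line width of 50 is an arbitrary choice here, not a documented twoBitToFa setting.

```python
from typing import List, Tuple

CODE_BASE = "TCAG"  # index = 2-bit code: 0=T, 1=C, 2=A, 3=G
Block = Tuple[int, int]  # (0-based start, size)


def unpack_dna(packed: bytes, dna_size: int,
               n_blocks: List[Block], mask_blocks: List[Block]) -> str:
    """Decode packed DNA, then apply N blocks and lower-case mask blocks."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):  # first base sits in the high bits
            bases.append(CODE_BASE[(byte >> shift) & 3])
    bases = bases[:dna_size]  # drop padding in the final byte
    for start, size in n_blocks:
        bases[start:start + size] = ["N"] * size
    for start, size in mask_blocks:
        bases[start:start + size] = [b.lower() for b in bases[start:start + size]]
    return "".join(bases)


def to_fasta(name: str, seq: str, width: int = 50) -> str:
    """Format one sequence as FASTA with fixed-width lines."""
    lines = [">" + name] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

For instance, the two bytes 0x9C 0x80 with dna_size=5 decode to ACGTA. Adding an N block (1, 2) and a mask block (3, 2) turns that into ANNta.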