reczko@max:/data/results/reference/mmu/mm9/bfast$ bfast fasta2brg -A 1 -f mm9-stranded-mRNA.fa
bfast fasta2brg -A 0 -f mm9-stranded-mRNA.fa

p53
bfast index -f phi.x.174.fa -m 111111111111111111 -w 12 -i 1
bfast index -f phi.x.174.fa -m 1111111110111111111 -w 12 -i 2
bfast index -f phi.x.174.fa -m 111111011111101011111 -w 12 -i 3
bfast index -f phi.x.174.fa -m 111111011001100111011111 -w 12 -i 4
bfast index -f phi.x.174.fa -m 1111011101011111101111 -w 12 -i 5
bfast match -f phi.x.174.fa -r reads.phi.x.174.fastq > bfast.matches.file.phi.x.174.bmf
bfast localalign -f phi.x.174.fa -m bfast.matches.file.phi.x.174.bmf > bfast.aligned.file.phi.x.174.baf
bfast postprocess -f phi.x.174.fa -i bfast.aligned.file.phi.x.174.baf > bfast.reported.file.phi.x.174.sam
cat reads.phi.x.174.fastq | bfast match -f phi.x.174.fa | bfast localalign -f phi.x.174.fa | bfast postprocess -f phi.x.174.fa > bfast.reported.file.phi.x.174.sam

p59
ABI SOLiD:
1111111111111111111111
111110100111110011111111111
10111111011001100011111000111111
1111111100101111000001100011111011
111111110001111110011111111
11111011010011000011000110011111111
1111111111110011101111111
111011000011111111001111011111
1110110001011010011100101111101111
111111001000110001011100110001100011111

Convert the reads:
$bfast-0.6.5b/scripts/solid2fastq -n 10000000 -o reads *.csfasta *.qual
Convert the reference (nucleotide space):
$bfast-0.6.5b/bfast fasta2brg -f hg18.fa
Convert the reference (color space):
$bfast-0.6.5b/bfast fasta2brg -f hg18.fa -A 1
Create the indexes:
$bfast-0.6.5b/bfast index -f hg18.fa -m -w 14 -i -A 1

bfast index -f mm9-stranded-mRNA.fa -m 1111111111111111111111 -w 14 -i 1 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 111110100111110011111111111 -w 14 -i 2 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 10111111011001100011111000111111 -w 14 -i 3 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 1111111100101111000001100011111011 -w 14 -i 4 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 111111110001111110011111111 -w 14 -i 5 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 11111011010011000011000110011111111 -w 14 -i 6 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 1111111111110011101111111 -w 14 -i 7 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 111011000011111111001111011111 -w 14 -i 8 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 1110110001011010011100101111101111 -w 14 -i 9 -A 1
bfast index -f mm9-stranded-mRNA.fa -m 111111001000110001011100110001100011111 -w 14 -i 10 -A 1

/data/results/reference/mmu/mm9/bfast/solid2fastq -n 10000000 -o 2hrep3.fq /data/images/proton/DKlab/mr/parclip/raw/2hrep3/*.csfasta /data/images/proton/DKlab/mr/parclip/raw/2hrep3/*.qual

#For both strands, use -w 0. For the forward strand only, use -w 1. For the reverse strand only, use -w 2.
bfast match -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -A 1 -r 2hrep3.fq > 2hrep3.bmf
#In total, found matches for 12564 out of 9526713 reads.
bfast localalign -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -m 2hrep3.bmf -A 1 > 2hrep3.baf
bfast postprocess -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -i 2hrep3.baf -A 1 > 2hrep3.sam
#Found 6937 reads with at least one end mapping.
bfast localalign -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -m 2hrep3.bmf -U -A 1 > 2hrep3.baf
Found 6883 reads with at least one end mapping.
bfast localalign -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -m 2hrep3.bmf -q 25 -A 1 > 2hrep3.baf
Outputted alignments for 12564 reads.
Found 9519776 reads with no ends mapped.
Found 6937 reads with 1 end mapped.

cd t
bfast match -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -A 1 -w 1 -l -r ../2hrep3.fq.1.fastq > 2hrep3st.bmf
bfast localalign -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -m 2hrep3st.bmf -A 1 > 2hrep3st.baf
bfast postprocess -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -i 2hrep3st.baf -A 1 > 2hrep3st.sam
#-q 10
Found 9522249 reads with no ends mapped.
Found 4464 reads with at least one end mapping.

1_1231_586 0 ENSMUST00000057054 609 1 1M2I38M5D9M * 0 0 CCGGAACGGCATCCTACCTTCCCATAGACCCCTGACTGGCACTAGCTCAG VEO]`^_````&"V``""""_2$[````/"""%WK@L*(OMQQWUOUIQ> PG:Z:bfast AS:i:-200 NM:i:12 NH:i:1 IH:i:1 HI:i:1 MD:Z:3G1A7G8C16^CTTCC3T5 CS:Z:T20302013031310231223202133221230112121231123232212 CQ:Z:?..8 CM:i:8 XA:i:3 XE:Z:------------1----2-3--2------23-1-----2----------------

grep ENSMUST00000098950 *sam
@SQ SN:ENSMUST00000098950 LN:6030

bfast match -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -A 1 -w 1 -l -r ../2hrep3.fq.1.fastq > 2hrep3st.bmf
#-q 20
bfast localalign -q 20 -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -m 2hrep3st.bmf -A 1 > 2hrep3st.baf
Outputted alignments for 8023 reads.
bfast localalign -o 30 -q 20 -f /data/results/reference/mmu/mm9/bfast/mm9-stranded-mRNA.fa -m 2hrep3st.bmf -A 1 > 2hrep3st.baf
Found 9522256 reads with no ends mapped.
Found 4457 reads with 1 end mapped.
Outputted alignments for 8023 reads.
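
The ten color-space bfast index commands for mm9-stranded-mRNA.fa in these notes differ only in the mask and the index number, so they could be generated by a loop instead of being typed out. The following is a minimal dry-run sketch: it only prints the commands. The variable REF and the function name print_index_cmds are placeholders made up for this example; the masks and the -w 14 / -A 1 settings are copied from the notes.

```shell
#!/bin/sh
# Dry-run sketch: print one `bfast index` command per color-space mask.
# REF and print_index_cmds are illustrative names, not part of BFAST.
REF="mm9-stranded-mRNA.fa"

print_index_cmds() {
    i=1
    # Masks copied from the ABI SOLiD list in these notes, in the same order.
    for mask in \
        1111111111111111111111 \
        111110100111110011111111111 \
        10111111011001100011111000111111 \
        1111111100101111000001100011111011 \
        111111110001111110011111111 \
        11111011010011000011000110011111111 \
        1111111111110011101111111 \
        111011000011111111001111011111 \
        1110110001011010011100101111101111 \
        111111001000110001011100110001100011111
    do
        # The index number (-i) counts up with the position in the mask list.
        echo "bfast index -f $REF -m $mask -w 14 -i $i -A 1"
        i=$((i + 1))
    done
}

print_index_cmds
```

To actually build the indexes, the printed lines could be piped to sh once they have been checked by eye.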
/data/results/reference/mmu/mm9/bfast
mm9-stranded-mRNA.fa mm9-stranded-mRNA.fa.cs.2.1.bif mm9-stranded-mRNA.fa.cs.5.1.bif mm9-stranded-mRNA.fa.cs.8.1.bif mm9-stranded-mRNA.fa.nt.brg
mm9-stranded-mRNA.fa.cs.10.1.bif mm9-stranded-mRNA.fa.cs.3.1.bif mm9-stranded-mRNA.fa.cs.6.1.bif mm9-stranded-mRNA.fa.cs.9.1.bif solid2fastq
mm9-stranded-mRNA.fa.cs.1.1.bif mm9-stranded-mRNA.fa.cs.4.1.bif mm9-stranded-mRNA.fa.cs.7.1.bif mm9-stranded-mRNA.fa.cs.brg

Search the indexes:
$bfast-0.6.5b/bfast match -f hg18.fa -A 1 -r reads..fastq > bfast.matches.file.hg18..bmf
Perform local alignment:
$bfast-0.6.5b/bfast localalign -f hg18.fa -m bfast.matches.file.hg18..bmf -A 1 > bfast.aligned.file.hg18..baf
Filter alignments:
$bfast-0.6.5b/bfast postprocess -f hg18.fa -i bfast.aligned.file.hg18..baf -A 1 > bfast.reported.file.hg18..sam

Note that for parallel computation, execute bfast match, bfast localalign, and bfast postprocess for each converted input file created (replace <N> with the input file number). Also, since color space local alignment may be slower than the match step, we can use the -s and -e options in bfast localalign to further parallelize the local alignment.

6.6 Color Space Alignment

The workflow for color space has five steps, as seen in Figure 6.1, and is similar to the one described in section 3.3.

1. In the first step we build two reference genomes: a nucleotide space genome using the option -A 0 in fasta2brg, and a color space genome using the option -A 1 (see section 4.2).
2. In the second step we create the indexes in color space by using the color space reference genome built in the first step and the option -A 1 (see section 4.3).
3. In the third step we search for CALs using the color space indexes created in the second step, the color space reference genome built in the first step, and the option -A 1 (see section 4.4).
4. In the fourth step we perform local alignment using the nucleotide space reference genome built in the first step and the option -A 1 (see section 4.5).
5. In the fifth step we prioritize the local alignments as previously described in section 3.3.

6.7 Running BWA within BFAST

BFAST is admittedly difficult to tune for very short reads (less than 30bp). Therefore, BFAST supports running BWA (see http://bio-bwa.sourceforge.net). The workflow is similar to running BFAST, with a few exceptions. First, bfast bwtindex must be run on the reference genome, with the appropriate setting for the type of read data (-A 0 for NT space, -A 1 for color space). Next, the input reads must be in FASTQ format. For paired-end or mate-pair reads, each end/mate must be in its own separate file, as pairs/mates are not aggregated until a later step. Next, one bfast bwaaln command is run for each input FASTQ file (one for fragment reads, two for paired-end/mate-pair reads). Finally, the bfast localalign command is run, taking the outputs of the bfast bwaaln commands as input via the -1 and -2 options. The workflow proceeds similarly from this point.
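
The note above about parallel computation says to run bfast match, bfast localalign, and bfast postprocess once per converted input file. A minimal dry-run sketch of that per-file loop for the color-space case follows. It only prints the commands; the function name print_pipeline, the variable REF, and the input names reads.1.fastq and reads.2.fastq are assumptions made up for this example, while the flags are the ones already used in these notes.

```shell
#!/bin/sh
# Dry-run sketch: print the match, localalign, and postprocess commands
# for one converted read file. REF, print_pipeline, and the file names
# below are illustrative placeholders, not part of BFAST.
REF="mm9-stranded-mRNA.fa"

print_pipeline() {
    fq=$1
    base=${fq%.fastq}   # strip the .fastq suffix to name the output files
    echo "bfast match -f $REF -A 1 -r $fq > $base.bmf"
    echo "bfast localalign -f $REF -m $base.bmf -A 1 > $base.baf"
    echo "bfast postprocess -f $REF -i $base.baf -A 1 > $base.sam"
}

# One three-command block per input file; each block is independent of
# the others, so the blocks could be dispatched to separate jobs.
for fq in reads.1.fastq reads.2.fastq; do
    print_pipeline "$fq"
done
```

Printing rather than executing keeps the sketch safe to run anywhere; within one file the three steps must still run in order, since each reads the previous step's output.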