192_725_82_F3	16	ENSMUST00000070533	1	255	5H31M14H	*	0	0	GCGGCGGCGGGCGAGCGGGCGCTAGAGTAGG	*	AS:i:176	NM:i:1	CM:i:5	XX:Z:CcTACTCTAGcGcCCGcTcGCCCGCCGCCGC	MD:Z:23g7

#http://bowtie-bio.sourceforge.net/manual.shtml#default-bowtie-output

Default bowtie output

bowtie outputs one alignment per line. Each line is a collection of 8 fields separated by tabs; from left to right, the fields are:

Name of read that aligned.

Note that the SAM specification disallows whitespace in the read name. If the read name contains any whitespace characters, Bowtie 2 will truncate the name at the first whitespace character. This is similar to the behavior of other tools.

Reference strand aligned to, + for forward strand, - for reverse

Name of reference sequence where alignment occurs, or numeric ID if no name was provided

0-based offset into the forward reference strand where leftmost character of the alignment occurs

Read sequence (reverse-complemented if orientation is -).

If the read was in colorspace, then the sequence shown in this column is the sequence of decoded nucleotides, not the original colors. See the Colorspace alignment section for details about decoding. To display colors instead, use the --col-cseq option.

ASCII-encoded read qualities (reversed if orientation is -). The encoded quality values are on the Phred scale and the encoding is ASCII-offset by 33 (ASCII char !).

If the read was in colorspace, then the qualities shown in this column are the decoded qualities, not the original qualities. See the Colorspace alignment section for details about decoding. To display colors instead, use the --col-cqual option.

If -M was specified and the prescribed ceiling was exceeded for this read, this column contains the value of the ceiling, indicating that at least that many valid alignments were found in addition to the one reported.

Otherwise, this column contains the number of other instances where the same sequence aligned against the same reference characters as were aligned against in the reported alignment. This is not the number of other places the read aligns with the same number of mismatches. The number in this column is generally not a good proxy for that number (e.g., the number in this column may be '0' while the number of other alignments with the same number of mismatches might be large).

Comma-separated list of mismatch descriptors. If there are no mismatches in the alignment, this field is empty. A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5') end of the read.
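
The eight columns described above can be split into named fields with a few lines of Python. This is only a sketch: the names BowtieHit and parse_bowtie_line are made up for these notes and are not part of bowtie. It decodes the qualities with the ASCII-33 offset and breaks each mismatch descriptor into (offset, reference base, read base).

```python
from typing import List, NamedTuple, Tuple


class BowtieHit(NamedTuple):
    """One line of default bowtie output (hypothetical container)."""
    read_name: str
    strand: str                 # '+' or '-'
    ref_name: str
    offset0: int                # 0-based leftmost offset on the forward strand
    seq: str
    quals: List[int]            # Phred scores decoded from ASCII-33
    other: int                  # column 7, see the description above
    mismatches: List[Tuple[int, str, str]]  # (offset from 5' end, ref base, read base)


def parse_bowtie_line(line: str) -> BowtieHit:
    fields = line.rstrip("\n").split("\t")
    # The mismatch column is empty for perfect matches; pad in case it was stripped.
    fields += [""] * (8 - len(fields))
    name, strand, ref, off, seq, qual, other, mm = fields[:8]
    mismatches = []
    for desc in filter(None, mm.split(",")):
        pos, change = desc.split(":")
        ref_base, read_base = change.split(">")
        mismatches.append((int(pos), ref_base, read_base))
    quals = [ord(c) - 33 for c in qual]
    return BowtieHit(name, strand, ref, int(off), seq, quals, int(other), mismatches)
```

Adding 1 to offset0 gives the 1-based position that the SAM output reports for the same alignment.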

SAM bowtie output

Following is a brief description of the SAM format as output by bowtie when the -S/--sam option is specified. For more details, see the SAM format specification.

When -S/--sam is specified, bowtie prints a SAM header with @HD, @SQ and @PG lines. When one or more --sam-RG arguments are specified, bowtie will also print an @RG line that includes all user-specified --sam-RG tokens separated by tabs.

Each subsequent line corresponds to a read or an alignment. Each line is a collection of at least 12 fields separated by tabs; from left to right, the fields are:

Name of read that aligned

Sum of all applicable flags. Flags relevant to Bowtie are:

1
The read is one of a pair

2
The alignment is one end of a proper paired-end alignment

4
The read has no reported alignments

8
The read is one of a pair and has no reported alignments

16
The alignment is to the reverse reference strand

32
The other mate in the paired-end alignment is aligned to the reverse reference strand

64
The read is the first (#1) mate in a pair

128
The read is the second (#2) mate in a pair

Thus, an unpaired read that aligns to the reverse reference strand will have flag 16. A paired-end read that is the first mate of a proper pair and aligns to the reverse reference strand will have flag 83 (= 64 + 16 + 2 + 1).
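
The flag arithmetic above can be checked by testing each bit in turn. A minimal sketch, with the bit descriptions copied from the list above; SAM_FLAGS and decode_flag are names invented for these notes.

```python
from typing import List

# Bit values and meanings as listed in the bowtie manual excerpt above.
SAM_FLAGS = {
    1: "read is one of a pair",
    2: "alignment is one end of a proper paired-end alignment",
    4: "read has no reported alignments",
    8: "read is one of a pair and has no reported alignments",
    16: "alignment is to the reverse reference strand",
    32: "other mate is aligned to the reverse reference strand",
    64: "read is the first (#1) mate in a pair",
    128: "read is the second (#2) mate in a pair",
}


def decode_flag(flag: int) -> List[int]:
    """Return the individual bit values that are set in a SAM flag."""
    return [bit for bit in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    for bit in decode_flag(83):
        print(bit, SAM_FLAGS[bit])
```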

Name of reference sequence where alignment occurs, or ordinal ID if no name was provided

1-based offset into the forward reference strand where leftmost character of the alignment occurs

Mapping quality

CIGAR string representation of alignment

Name of reference sequence where mate's alignment occurs. Set to = if the mate's reference sequence is the same as this alignment's, or * if there is no mate.

1-based offset into the forward reference strand where leftmost character of the mate's alignment occurs. Offset is 0 if there is no mate.

Inferred insert size. Size is negative if the mate's alignment occurs upstream of this alignment. Size is 0 if there is no mate.

Read sequence (reverse-complemented if aligned to the reverse strand)

ASCII-encoded read qualities (reversed if the read aligned to the reverse strand). The encoded quality values are on the Phred quality scale and the encoding is ASCII-offset by 33 (ASCII char !), similarly to a FASTQ file.

Optional fields. Fields are tab-separated. For descriptions of all possible optional fields, see the SAM format specification. bowtie outputs some of these optional fields for each alignment, depending on the type of the alignment:

    NM:i:<N>
Aligned read has an edit distance of <N>.

    CM:i:<N>
Aligned read has an edit distance of <N> in colorspace. This field is present in addition to the NM field in -C/--color mode, but is omitted otherwise.

    MD:Z:<S>
For aligned reads, <S> is a string representation of the mismatched reference bases in the alignment. See SAM format specification for details. For colorspace alignments, <S> describes the decoded nucleotide alignment, not the colorspace alignment.
MD:Z:[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)* String for mismatching positions.
The MD field aims to achieve SNP/indel calling without looking at the reference. For example, a string ‘10A5^AC6’ means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field ought to match the CIGAR string.


    XA:i:<N>
Aligned read belongs to stratum <N>. See Strata for definition.

    XM:i:<N>
For a read with no reported alignments, <N> is 0 if the read had no alignments. If -m was specified and the read's alignments were suppressed because the -m ceiling was exceeded, <N> equals the -m ceiling + 1, to indicate that there were at least that many valid alignments (but all were suppressed). In -M mode, if the alignment was randomly selected because the -M ceiling was exceeded, <N> equals the -M ceiling + 1, to indicate that there were at least that many valid alignments (of which one was reported at random).
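
The statement in the MD description above, that the MD field ought to match the CIGAR string, can be tested by comparing the number of reference bases each one covers. The sketch below tokenizes an MD string and sums those bases. Function names are invented for these notes, and lowercase bases are accepted because samtools calmd may emit them (as in MD:Z:23g7).

```python
import re
from typing import List, Tuple, Union

# number | ^deleted-bases | single mismatched reference base
_MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def parse_md(md: str) -> List[Tuple[str, Union[int, str]]]:
    """Split an MD string into ('match', n), ('mismatch', base), ('deletion', bases)."""
    ops: List[Tuple[str, Union[int, str]]] = []
    for num, deleted, base in _MD_TOKEN.findall(md):
        if num:
            ops.append(("match", int(num)))
        elif deleted:
            ops.append(("deletion", deleted[1:]))   # drop the leading '^'
        else:
            ops.append(("mismatch", base))
    return ops


def md_ref_length(md: str) -> int:
    """Reference bases covered by an MD string."""
    return sum(v if kind == "match" else len(v) for kind, v in parse_md(md))


def cigar_ref_length(cigar: str) -> int:
    """Reference bases covered by M, =, X and D operations.
    N (skipped region) is left out because MD does not describe it."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in "M=XD")
```

For the first record in these notes, md_ref_length("23g7") and cigar_ref_length("5H31M14H") both give 31.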

#old samtools has wrong SAM format for minus-strand reads!
#new samtools:
/data/results/tools/samtools/samtools-1.3/samtools calmd -b ../raw/6hrep3F-4p3.sam.bam  /data/results/reference/mmu/Mus_musculus/UCSC/mm9/Sequence/WholeGenomeFasta/genome.fa > foo.bam 2> foo
[bam_fillmd1] different MD for read '5806664-1': '8A10' -> '10T8'
samtools view foo.bam | grep 295988-1
295988-1	16	chr10	12175052	0	33M	*	0	0	AACATCAACAACAACAACAACAACAACAAGGCG	qqq!!qqqqqqqqqqqqqqqqqqqqqqqqqqqq	NM:i:1	X1:i:3	MD:Z:4a28


sam
295988-1	16	chr10	12175052	0	33M	*	0	0	AACATCAACAACAACAACAACAACAACAAGGCG	qqq!!qqqqqqqqqqqqqqqqqqqqqqqqqqqq	NM:i:1	X1:i:3	MD:Z:28T4
AACATCAACAACAACAACAACAACAACAAGGCG
    A
bt
295988-1	-	chr10	12175051	AACATCAACAACAACAACAACAACAACAAGGCG	qqq!!qqqqqqqqqqqqqqqqqqqqqqqqqqqq	2	28:A>T
>mm9_dna range=chr10:12175052-12175084 5'pad=0 3'pad=0 strand=+ repeatMasking=none
AACAACAACAACAACAACAACAACAACAAGGCG
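
To see which of the two MD strings above agrees with the mm9 sequence, the MD tag for an ungapped alignment can be recomputed directly from the read (as printed in the SAM SEQ column) and the forward-strand reference. This is only a sketch of that comparison, not what samtools calmd does internally; md_from_ungapped is a name invented here.

```python
def md_from_ungapped(read: str, ref: str) -> str:
    """MD string for an alignment with no indels.

    read: SEQ as printed in SAM (already in forward-reference orientation)
    ref:  forward-strand reference bases over the same interval
    """
    if len(read) != len(ref):
        raise ValueError("read and reference must have the same length")
    parts = []
    run = 0                      # length of the current run of matches
    for read_base, ref_base in zip(read.upper(), ref.upper()):
        if read_base == ref_base:
            run += 1
        else:
            parts.append(str(run))   # may be 0 between adjacent mismatches
            parts.append(ref_base)   # MD records the reference base
            run = 0
    parts.append(str(run))
    return "".join(parts)


if __name__ == "__main__":
    read = "AACATCAACAACAACAACAACAACAACAAGGCG"
    ref = "AACAACAACAACAACAACAACAACAACAAGGCG"
    print(md_from_ungapped(read, ref))
```

For read 295988-1 this gives 4A28, which matches the calmd output (4a28) apart from letter case.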

grep NM:i:1 /data/images/proton/DKlab/mr/parclip/shrimp/IFN-15mMm.bam.sam | head -33 | tail
630_3509_3409_F3	0	ENSMUST00000070533	235	255	15H25M10H	*	0	0	GGCTCTGGGCAAGGACTGGCTCCAG	*	AS:i:156	NM:i:1	CS:Z:T023302010303111123222300310302121032201223..3021..	CM:i:3	XX:Z:GgCTCTgGGCAAgGACTGGCTCCAG	MD:Z:7T17

bt
5096016-1	-	chr6	34385835	TTGTGTTGTTGTTGTTGTTGTTGTTGTTGTGAATA	qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq	0	32:T>G,33:G>T

samtools view foo.bam | grep 5096016-1
5096016-1	16	chr6	34385836	25	35M	*	0	0	TTGTGTTGTTGTTGTTGTTGTTGTTGTTGTGAATA	qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq	X1:i:1	NM:i:2	MD:Z:1g0t32
sam
5096016-1	16	chr6	34385836	25	35M	*	0	0	TTGTGTTGTTGTTGTTGTTGTTGTTGTTGTGAATA	qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq	NM:i:1	X1:i:1	MD:Z:32G0T1
TTGTGTTGTTGTTGTTGTTGTTGTTGTTGTGAATA
TGT
>mm9_dna range=chr6:34385836-34385869 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TGTTGTTGTTGTTGTTGTTGTTGTTGTTGTGAAT

bt
1030429-1	+	chr13	62908934	TACACCACCACCACCACCACCAACACCACCACCAA	qqqqqqqqqqqqqqqqqqqqqq!!qqqqqqqqqqq	0	22:C>A
sam
1030429-1	0	chr13	62908935	25	35M	*	0	0	TACACCACCACCACCACCACCAACACCACCACCAA	qqqqqqqqqqqqqqqqqqqqqq!!qqqqqqqqqqq	NM:i:1	X1:i:1	MD:Z:22A12


bt
1489890-1	-	chr12	53162619	AACCACAGTTGTCGTTGTTGTTGTTGTTGTTGTTG	qqqqqqqqqqq!!qqqqqqqqqqqqqqqqqqqqqq	0	22:T>C
sam
1489890-1	16	chr12	53162620	25	35M	*	0	0	AACCACAGTTGTCGTTGTTGTTGTTGTTGTTGTTG	qqqqqqqqqqq!!qqqqqqqqqqqqqqqqqqqqqq	NM:i:1	X1:i:1	MD:Z:22C12


bt
668_188_2536_F3	-	ENSMUST00000160944-chr1-+-3044314-3044814	91	TACAAGGCCTAATGGTGATTCCTACAG	IIIIIIIIIIIIIIIIIIIIIIIIIII	0	8:C>T
sam
668_188_2536_F3	16	ENSMUST00000160944-chr1-+-3044314-3044814	92	255	18H27M5H	*	0	0	TACAAGGCCTAATGGTGATTCCTACAG	*	AS:i:196	NM:i:1	CS:Z:T10110321132120321001303203020113102330.010..3131..	CM:i:2	XX:Z:CTGTAGgAATCAcCATTAGGCCTTGTA	MD:Z:18C8
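
The bt/sam pairs above suggest a rule for deriving the MD string from bowtie's mismatch column. Offsets count from the 5' end of the read, so for '-' alignments they have to be flipped (read length - 1 - offset) before being written left to right along the reference. The sketch below assumes the bases in each descriptor are already given on the forward strand. That is inferred from the examples in these notes, not taken from the manual, and the function name is invented here.

```python
def bowtie_mismatches_to_md(descriptors: str, strand: str, read_len: int) -> str:
    """Build an MD string from a bowtie mismatch column (ungapped alignments only).

    descriptors: e.g. "32:T>G,33:G>T" (offset:reference-base>read-base), may be empty
    strand:      '+' or '-'
    read_len:    length of the aligned read
    """
    ref_at = {}
    for desc in filter(None, descriptors.split(",")):
        offset, change = desc.split(":")
        ref_base = change.split(">")[0]
        pos = int(offset)
        if strand == "-":
            # offset is from the 5' end of the read; flip to reference order
            pos = read_len - 1 - pos
        ref_at[pos] = ref_base.upper()

    parts = []
    prev = 0                         # first position not yet covered
    for pos in sorted(ref_at):
        parts.append(str(pos - prev))
        parts.append(ref_at[pos])
        prev = pos + 1
    parts.append(str(read_len - prev))
    return "".join(parts)
```

With the records above this gives 4A28 for 295988-1 and 1G0T32 for 5096016-1, matching the calmd output apart from case, and 18C8 for 668_188_2536_F3, matching its MD tag.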