The FASTA header of each read contains the following information:

    >readID referenceID CIGARstring MappingQuality

An example is:

    >5VQM1:01345:11675 B10_MAC.ab1 2S5M9I203M 33

readID: the ID given by the sequencer.

referenceID: the ID of the reference sequence against which the alignment was performed.

CIGARstring: a CIGAR string consists of a series of operation lengths, each followed by its operation. The conventional CIGAR format allows three types of operations: M for match or mismatch, I for insertion and D for deletion. The extended CIGAR format allows four more operations (N, S, H and P), shown in the following table, to describe clipping, padding and splicing:

    op  Description
    M   Alignment match (can be a sequence match or mismatch)
    I   Insertion to the reference
    D   Deletion from the reference
    N   Skipped region from the reference
    S   Soft clip on the read (clipped sequence present in the read sequence)
    H   Hard clip on the read (clipped sequence not present in the read sequence)
    P   Padding (silent deletion from the padded reference sequence)

Examples:

    Coor     12345678901234  5678901234567890123456789012345
    ref      AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
    +r001/1        TTAGATAAAGGATA*CTG          8M2I4M1D3M
    +r002         aaaAGATAA*GGATA              3S6M1P1I4M

MappingQuality: a Phred-scaled estimate of the probability that the alignment is wrong.

    Given 1000 read alignments with mapping quality 30, one of them will be wrong on average.
    Given 100 read alignments with mapping quality 20, one of them will be wrong on average.
    Given 10 read alignments with mapping quality 10, one of them will be wrong on average.
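The header layout, the CIGAR operations and the mapping-quality scale described above can be checked with a small script. The following is a minimal sketch in Python, not the awk code used to produce the files. The function names (`parse_header`, `parse_cigar`, `read_length`, `reference_span`, `error_probability`) are hypothetical, and it assumes the usual Phred scaling, in which a mapping quality Q corresponds to an error probability of 10^(-Q/10). The newer `=` and `X` operations are not handled.

```python
import re
from typing import List, NamedTuple, Tuple

# Op codes M, I, D, N, S, P as described above, plus H (hard clip),
# which the SAM specification also defines.
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP])")


class ReadHeader(NamedTuple):
    read_id: str
    reference_id: str
    cigar: str
    mapping_quality: int


def parse_header(line: str) -> ReadHeader:
    """Split '>readID referenceID CIGARstring MappingQuality' into fields."""
    fields = line.strip().lstrip(">").split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    read_id, reference_id, cigar, mapq = fields
    return ReadHeader(read_id, reference_id, cigar, int(mapq))


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Turn '2S5M9I203M' into [(2, 'S'), (5, 'M'), (9, 'I'), (203, 'M')]."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def read_length(ops: List[Tuple[int, str]]) -> int:
    """Bases present in the read: matches, insertions and soft clips."""
    return sum(n for n, op in ops if op in "MIS")


def reference_span(ops: List[Tuple[int, str]]) -> int:
    """Reference positions covered: matches, deletions and skipped regions."""
    return sum(n for n, op in ops if op in "MDN")


def error_probability(mapq: int) -> float:
    """Phred scale: MQ 30 -> about 1 in 1000, MQ 20 -> 1 in 100, MQ 10 -> 1 in 10."""
    return 10 ** (-mapq / 10)


header = parse_header(">5VQM1:01345:11675 B10_MAC.ab1 2S5M9I203M 33")
ops = parse_cigar(header.cigar)
print(ops)                  # [(2, 'S'), (5, 'M'), (9, 'I'), (203, 'M')]
print(read_length(ops))     # 219
print(reference_span(ops))  # 208
```

For the two example reads, `8M2I4M1D3M` gives a read length of 17 and a reference span of 16 (positions 7 to 22), and `3S6M1P1I4M` gives a read length of 14 and a reference span of 10 (positions 9 to 18), which matches the alignment drawn above.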
cat /tmp/foo | awk -f sam2aligned-fasta1.awk > r375-all.fa
cat /tmp/foo | awk -v th=10 -f sam2aligned-fasta1.awk > r375-MQ10.fa
#Passed 210402 mappings of 262547 80.1388 %
cat /tmp/foo | awk -v th=20 -f sam2aligned-fasta1.awk > r375-MQ20.fa
#Passed 182334 mappings of 262547 69.4481 %
cat /tmp/foo | awk -v th=30 -f sam2aligned-fasta1.awk > r375-MQ30.fa
#Passed 108958 mappings of 262547 41.5004 %

cat /tmp/foo | awk -v th=30 -f /data/results/tools/align/get-sam-refid-stats1.awk |sort -rn -k3,3
MQ cutoff30 Passed 108958 mappings of 262547 41.5004 %
longi            26041   23.9
B10_MAC.ab1      25187   23.1162
annu             25051   22.9914
F10_PSEUDO.ab1    8821   8.09578
richi             7512   6.8944
pulcri            5182   4.75596
marti             4332   3.97584
caspius           2920   2.67993
albo              2644   2.42662
vexans             791   0.725968
pipiblack          238   0.218433
detri              200   0.183557
pipiwhite           39   0.0357936

cat /tmp/foo | awk -v th=20 -f /data/results/tools/align/get-sam-refid-stats1.awk |sort -rn -k3,3
MQ cutoff20 Passed 182334 mappings of 262547 69.4481 %
B10_MAC.ab1      59639   32.7087
longi            30368   16.6551
annu             28120   15.4222
F10_PSEUDO.ab1   16572   9.08882
richi            15766   8.64677
pulcri            9450   5.1828
marti             8082   4.43252
caspius           5327   2.92156
albo              5094   2.79377
vexans            2913   1.59762
pipiblack          520   0.285191
detri              383   0.210054
pipiwhite          100   0.0548444

cat /tmp/foo | awk -v th=10 -f /data/results/tools/align/get-sam-refid-stats1.awk |sort -rn -k3,3
MQ cutoff10 Passed 210402 mappings of 262547 80.1388 %
B10_MAC.ab1      68556   32.5833
annu             31713   15.0726
longi            31366   14.9077
F10_PSEUDO.ab1   21087   10.0222
richi            17530   8.33167
pulcri           11053   5.25328
marti            10168   4.83265
caspius           6156   2.92583
albo              6081   2.89018
vexans            4273   2.03087
detri             1227   0.583169
pipiblack         1005   0.477657
pipiwhite          187   0.0888775

# convert to bam
samtools view -Sb /tmp/foo -t ~/bak/doc/karakasiliotis/metagenomics/IK28species.fa.fai > r375.bam
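The per-reference listings above can be approximated without the awk scripts, whose source is not shown here. The following Python sketch is only an illustration of that kind of tally: `refid_stats` is a hypothetical name, and it assumes that every non-header SAM line counts as one mapping, that a mapping passes when its MAPQ (column 5) is at least the threshold, and that the percentage per reference (column 3) is taken relative to the passed mappings, as the figures above suggest (26041 / 108958 = 23.9 %).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def refid_stats(
    sam_lines: Iterable[str], threshold: int = 0
) -> Tuple[int, int, List[Tuple[str, int, float]]]:
    """Count mappings per reference ID among those with MAPQ >= threshold.

    Returns (passed, total, rows), where rows holds
    (reference_id, count, percent_of_passed) sorted by count, largest first.
    """
    total = 0
    per_ref: Counter = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # SAM columns: 1 QNAME, 2 FLAG, 3 RNAME, 4 POS, 5 MAPQ, 6 CIGAR, ...
        if int(fields[4]) >= threshold:  # assumption: inclusive cutoff
            per_ref[fields[2]] += 1
    passed = sum(per_ref.values())
    rows = [
        (ref, count, 100.0 * count / passed)
        for ref, count in per_ref.most_common()
    ]
    return passed, total, rows


# Made-up SAM records, for illustration only.
example = [
    "@SQ\tSN:refA\tLN:100",
    "r1\t0\trefA\t1\t33\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r2\t0\trefA\t5\t12\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r3\t0\trefB\t1\t30\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r4\t0\trefB\t1\t5\t10M\t*\t0\t0\tACGTACGTAC\t*",
]
passed, total, rows = refid_stats(example, threshold=10)
print(f"Passed {passed} mappings of {total}")
for ref, count, pct in rows:
    print(ref, count, round(pct, 4))
```

To run it on a real file, the lines can be passed straight from an open file handle, for example `refid_stats(open("/tmp/foo"), threshold=30)`.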