Dear Ioannis,

to provide all alignment info, I have prepared a BAM file and the reference sequences used for the alignment at

http://genomics-lab.fleming.gr/fleming/external/Karakasiliotis/run375

    r375.bam
    IK28species.fa
    IK28species.fa.fai

If IGV handles the IUPAC codes in the FASTA file, you could load the alignment into IGV, if needed.

BW,
Martin

The FASTA header of each read contains the following information:

    >readID referenceID CIGARstring MappingQuality(MQ)

An example is:

    >5VQM1:01345:11675 B10_MAC.ab1 2S5M9I203M 33

readID: the ID given by the sequencer.

referenceID: the ID of the reference sequence against which the read was aligned.

CIGARstring: A CIGAR string consists of a series of operation lengths, each followed by its operation. The conventional CIGAR format allows for three types of operations: M for match or mismatch, I for insertion and D for deletion. The extended CIGAR format further allows four more operations, as shown in the following table, to describe clipping, padding and splicing:

    op  Description
    M   Alignment match (can be a sequence match or mismatch)
    I   Insertion to the reference
    D   Deletion from the reference
    N   Skipped region from the reference
    S   Soft clip on the read (clipped sequence present in <seq>)
    H   Hard clip on the read (clipped sequence NOT present in <seq>)
    P   Padding (silent deletion from the padded reference sequence)

Examples:

    Coor     12345678901234  5678901234567890123456789012345
    ref      AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
    +r001/1        TTAGATAAAGGATA*CTG
    +r002         aaaAGATAA*GGATA

    +r001/1  CIGAR: 8M2I4M1D3M
    +r002    CIGAR: 3S6M1P1I4M

MappingQuality: The higher the mapping quality, the less likely it is that the read was placed at the wrong position:

    Given 1000 read alignments with a mapping quality of 30, one of them will be wrong on average.
    Given 100 read alignments with a mapping quality of 20, one of them will be wrong on average.
    Given 10 read alignments with a mapping quality of 10, one of them will be wrong on average.
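The header layout, the CIGAR operations and the mapping quality rule above can be put together in a small script. The following is only a minimal sketch: the names parse_header, parse_cigar, read_length, reference_span and error_probability are made up for illustration, and the formula P = 10^(-MQ/10) is the usual Phred-style convention, which agrees with the three MQ statements above but is assumed here rather than confirmed for the aligner that produced these files.

```python
import re
from typing import List, NamedTuple, Tuple

# One CIGAR element: a length followed by one operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP])")


class ReadHeader(NamedTuple):
    read_id: str
    reference_id: str
    cigar: str
    mapq: int


def parse_header(line: str) -> ReadHeader:
    """Split '>readID referenceID CIGARstring MQ' into its four fields."""
    read_id, reference_id, cigar, mapq = line.lstrip(">").split()
    return ReadHeader(read_id, reference_id, cigar, int(mapq))


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Return the CIGAR string as a list of (length, operation) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Re-assemble the string to make sure nothing was silently skipped.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unexpected CIGAR string: {cigar!r}")
    return ops


def read_length(ops: List[Tuple[int, str]]) -> int:
    """Bases present in the read sequence: M, I and S consume the read."""
    return sum(n for n, op in ops if op in "MIS")


def reference_span(ops: List[Tuple[int, str]]) -> int:
    """Reference positions covered: M, D and N consume the reference."""
    return sum(n for n, op in ops if op in "MDN")


def error_probability(mapq: int) -> float:
    """Assumed Phred-style conversion: MQ 30 -> 1 in 1000, MQ 10 -> 1 in 10."""
    return 10 ** (-mapq / 10)


if __name__ == "__main__":
    header = parse_header(">5VQM1:01345:11675 B10_MAC.ab1 2S5M9I203M 33")
    ops = parse_cigar(header.cigar)
    print(header.reference_id, ops)
    print("read length:", read_length(ops))
    print("reference span:", reference_span(ops))
    print("P(wrong):", error_probability(header.mapq))
```

For the example header, the read is 2 + 5 + 9 + 203 = 219 bases long and covers 5 + 203 = 208 reference positions, since the soft clip and the insertion do not consume the reference.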
At http://genomics-lab.fleming.gr/fleming/external/Karakasiliotis/run375 are the FASTA files filtered at different mapping quality cutoffs:

    r375-all.fa    (262547 unfiltered mappings)
    r375-MQ10.fa   #Passed 210402 mappings of 262547   80.1388 %
    r375-MQ20.fa   #Passed 182334 mappings of 262547   69.4481 %
    r375-MQ30.fa   #Passed 108958 mappings of 262547   41.5004 %

Using these files, the species distributions are as follows:

MQ cutoff 30

    longi             26041   23.9 %
    B10_MAC.ab1       25187   23.1162 %
    annu              25051   22.9914 %
    F10_PSEUDO.ab1     8821   8.09578 %
    richi              7512   6.8944 %
    pulcri             5182   4.75596 %
    marti              4332   3.97584 %
    caspius            2920   2.67993 %
    albo               2644   2.42662 %
    vexans              791   0.725968 %
    pipiblack           238   0.218433 %
    detri               200   0.183557 %
    pipiwhite            39   0.0357936 %

MQ cutoff 20

    B10_MAC.ab1       59639   32.7087 %
    longi             30368   16.6551 %
    annu              28120   15.4222 %
    F10_PSEUDO.ab1    16572   9.08882 %
    richi             15766   8.64677 %
    pulcri             9450   5.1828 %
    marti              8082   4.43252 %
    caspius            5327   2.92156 %
    albo               5094   2.79377 %
    vexans             2913   1.59762 %
    pipiblack           520   0.285191 %
    detri               383   0.210054 %
    pipiwhite           100   0.0548444 %

MQ cutoff 10

    B10_MAC.ab1       68556   32.5833 %
    annu              31713   15.0726 %
    longi             31366   14.9077 %
    F10_PSEUDO.ab1    21087   10.0222 %
    richi             17530   8.33167 %
    pulcri            11053   5.25328 %
    marti             10168   4.83265 %
    caspius            6156   2.92583 %
    albo               6081   2.89018 %
    vexans             4273   2.03087 %
    detri              1227   0.583169 %
    pipiblack          1005   0.477657 %
    pipiwhite           187   0.0888775 %

BW,
Martin
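Tables like the ones above could in principle be reproduced from r375-all.fa with a short script. This is a rough sketch, not the procedure actually used: the function names tally_references and distribution are hypothetical, the cutoff is assumed to be inclusive (MQ >= cutoff), and the percentages are assumed to be relative to the mappings that passed the cutoff, which matches the numbers listed (for example 26041 of 108958 is 23.9 %).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_references(lines: Iterable[str], min_mapq: int = 0) -> Tuple[Counter, int]:
    """Count mappings per referenceID among headers with MQ >= min_mapq.

    Returns the per-reference counts and the total number of headers seen.
    The inclusive comparison is an assumption, not taken from the source.
    """
    counts: Counter = Counter()
    total = 0
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        total += 1
        _read_id, reference_id, _cigar, mapq = line[1:].split()
        if int(mapq) >= min_mapq:
            counts[reference_id] += 1
    return counts, total


def distribution(counts: Counter) -> List[Tuple[str, int, float]]:
    """Rows of (referenceID, count, percent of passed mappings), largest first."""
    passed = sum(counts.values())
    return [(ref, n, 100.0 * n / passed) for ref, n in counts.most_common()]


if __name__ == "__main__":
    # Toy records invented for illustration; they are NOT taken from run375.
    toy = [
        ">read1 longi 10M 35", "ACGTACGTAC",
        ">read2 annu 10M 12", "ACGTACGTAC",
        ">read3 longi 10M 30", "ACGTACGTAC",
    ]
    counts, total = tally_references(toy, min_mapq=30)
    passed = sum(counts.values())
    print(f"#Passed {passed} mappings of {total}")
    for ref, n, pct in distribution(counts):
        print(f"{ref:<16}{n:>8}   {pct:.4f} %")
```

To run it on the real data, the toy list would be replaced by an open file handle, for example open("r375-all.fa"), called once per cutoff.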