Given 1000 read alignments with a mapping quality of 30, one of them will be wrong on average. Given 100 read alignments with a mapping quality of 20, one of them will be wrong on average. Given 10 read alignments with a mapping quality of 10, one of them will be wrong on average.

A CIGAR string consists of a series of operation lengths plus the operations. The conventional CIGAR format allows three types of operations: M for match or mismatch, I for insertion and D for deletion. The extended CIGAR format further allows four more operations, shown in the following table, to describe clipping, padding and splicing:

op  Description
M   Alignment match (can be a sequence match or mismatch)
I   Insertion to the reference
D   Deletion from the reference
N   Skipped region from the reference
S   Soft clip on the read (clipped sequence present in SEQ)
H   Hard clip on the read (clipped sequence NOT present in SEQ)
P   Padding (silent deletion from the padded reference sequence)

Coor     12345678901234  5678901234567890123456789012345
ref      AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
+r001/1        TTAGATAAAGGATA*CTG                         8M2I4M1D3M
+r002         aaaAGATAA*GGATA                             3S6M1P1I4M

missing Yannis Karakasiliotis sample
Martin Reczko, Thu, Feb 15, 2018 at 5:05 PM
To: Pantelis Hatzis, Harokopos Vaggelis

Dear all,

analysis of the no-barcode ("nomatch") reads in runs 374 and 375, with run 338 as a negative control, suggests that ~250k reads of 374/375 (115,834 + 146,713 aligned) map to YK's reference (vs. ~180 reads in the negative control). The mapping quality distribution indicates an enrichment of high-quality alignments in 374/375 (vs. no enrichment in the control). The detected species distribution is very similar in 374 and 375 (and both are dissimilar to the control). Thus about 250k sequences of YK have no barcode in 374/375.
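The three mapping-quality examples at the top of these notes are instances of the Phred scale, MAPQ = -10 log10 P(alignment is wrong), and the CIGAR strings of r001/1 and r002 can be checked mechanically against the alignment picture. A minimal Python sketch, assuming only the seven operations M, I, D, N, S, H, P are recognized (H, hard clipping, as defined in the SAM specification; the `=`/`X` operations of later SAM versions are not handled). All function names are illustrative, not part of any library.

```python
import math
import re
from typing import List, Tuple


# Phred-scaled mapping quality: MAPQ = -10 * log10(P(alignment position is wrong)),
# so MAPQ 30 -> 1e-3, MAPQ 20 -> 1e-2, MAPQ 10 -> 1e-1.
def mapq_to_error_prob(mapq: float) -> float:
    """Probability that an alignment with this mapping quality is wrong."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong: float) -> float:
    """Inverse conversion: error probability back to a Phred-scaled MAPQ."""
    return -10 * math.log10(p_wrong)


def expected_wrong(n_alignments: int, mapq: float) -> float:
    """Expected number of wrong alignments among n alignments sharing one MAPQ."""
    return n_alignments * mapq_to_error_prob(mapq)


# A CIGAR string is a series of <length><operation> pairs, e.g. "8M2I4M1D3M".
# Assumption: only M, I, D, N, S, H, P are accepted here.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP])")
CONSUMES_REF = set("MDN")   # operations that advance along the reference
CONSUMES_READ = set("MIS")  # operations that consume bases stored in SEQ


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    pairs = CIGAR_RE.findall(cigar)
    if "".join(n + op for n, op in pairs) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in pairs]


def reference_span(cigar: str) -> int:
    """Number of reference positions covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)


def read_length(cigar: str) -> int:
    """Number of read bases stored in SEQ (hard-clipped and padding bases excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_READ)


if __name__ == "__main__":
    for n, q in [(1000, 30), (100, 20), (10, 10)]:
        print(f"{n} alignments at MAPQ {q}: ~{expected_wrong(n, q):.1f} expected wrong")
    for cigar in ["8M2I4M1D3M", "3S6M1P1I4M"]:
        print(cigar, parse_cigar(cigar), reference_span(cigar), read_length(cigar))
```

Run as a script, it reports about 1.0 expected wrong alignment for each of the three MAPQ examples; r001/1 (8M2I4M1D3M) covers 16 reference positions with 17 read bases, and r002 (3S6M1P1I4M) covers 10 positions with 14 read bases, consistent with the padded alignment drawn above.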
BW
Martin

Details:

#mapping
../bin/MosaikBuild -q /data/images/proton2/run374/nomatch_rawlib.basecaller.bam.fastq -out t.mkb -st 454
../bin/MosaikAligner -p 5 -mmp 0.5 -act 15 -hs 10 -in t.mkb -ia /data/results/tools/align/mosaik/MOSAIK/test/ik28/pipiwhite.dat -out r374c -annse /data/results/tools/align/mosaik/MOSAIK/src/networkFile/2.1.26.se.100.005.ann -annpe /data/results/tools/align/mosaik/MOSAIK/src/networkFile/2.1.26.pe.100.0065.ann &> log2c
# unaligned mates (X): 6084617 (97.7 %)
# filtered out (F): 27159 (0.4 %)
# uniquely aligned mates (U): 8518 (0.1 %)
# multiply aligned mates (M): 107316 (1.7 %)
================================================
total aligned: 115834 (1.9 %)
total: 6227610

../bin/MosaikBuild -q /data/images/proton2/run375/nomatch_rawlib.basecaller.bam.fastq -out r375 -st 454
../bin/MosaikAligner -p 5 -mmp 0.5 -act 15 -hs 10 -in r375 -ia /data/results/tools/align/mosaik/MOSAIK/test/ik28/pipiwhite.dat -out r375c -annse /data/results/tools/align/mosaik/MOSAIK/src/networkFile/2.1.26.se.100.005.ann -annpe /data/results/tools/align/mosaik/MOSAIK/src/networkFile/2.1.26.pe.100.0065.ann &> log1c
# unaligned mates (X): 10083565 (98.2 %)
# filtered out (F): 32940 (0.3 %)
# uniquely aligned mates (U): 11228 (0.1 %)
# multiply aligned mates (M): 135485 (1.3 %)
================================================
total aligned: 146713 (1.4 %)
total: 10263218

##neg control: /data/images/proton2/run338/nomatch_rawlib.basecaller.bam
bamToFastq -i /data/images/proton2/run338/nomatch_rawlib.basecaller.bam -fq /tmp/r338nobarcode.fq
../bin/MosaikBuild -q /tmp/r338nobarcode.fq -out r338 -st 454
../bin/MosaikAligner -p 5 -mmp 0.5 -act 15 -hs 10 -in r338 -ia /data/results/tools/align/mosaik/MOSAIK/test/ik28/pipiwhite.dat -out r338 -annse /data/results/tools/align/mosaik/MOSAIK/src/networkFile/2.1.26.se.100.005.ann -annpe /data/results/tools/align/mosaik/MOSAIK/src/networkFile/2.1.26.pe.100.0065.ann &> log338
# unaligned mates (X): 5153227 (99.9 %)
# filtered out (F): 3859 (0.1 %)
# uniquely aligned mates (U): 73 (0.0 %)
# multiply aligned mates (M): 109 (0.0 %)
================================================
total aligned: 182 (0.0 %)
total: 5157268

#map qual (output columns: MAPQ, number of aligned reads, % of aligned reads)
samtools view r374c.bam -F 4 | awk -F "\t" '{o[$5]++;n++;}END{for (i=0;i<256;i++){x=o[i];if(x>0){print i,x,100.0*x/n}}}'
0 1456 1.25697
1 1295 1.11798
2 9244 7.98039
3 1980 1.70934
4 1853 1.5997
5 1441 1.24402
6 797 0.688054
7 2368 2.0443
8 1945 1.67913
9 328 0.283164
10 419 0.361725
11 500 0.431652
12 418 0.360861
13 700 0.604313
14 792 0.683737
15 1301 1.12316
16 1474 1.27251
17 1848 1.59539
18 1702 1.46934
19 2812 2.42761
20 2263 1.95366
21 2738 2.36373
22 1692 1.46071
23 1100 0.949635
24 1011 0.872801
25 1403 1.21122
26 1735 1.49783
27 5425 4.68343
28 10006 8.63822
29 4833 4.17235
30 5727 4.94414
31 9085 7.84312
32 10199 8.80484
33 17811 15.3763
34 5959 5.14443
35 173 0.149352
36 1 0.000863304

samtools view r375c.bam -F 4 | awk -F "\t" '{o[$5]++;n++;}END{for (i=0;i<256;i++){x=o[i];if(x>0){print i,x,100.0*x/n}}}'
0 1886 1.2855
1 2069 1.41024
2 12347 8.41575
3 2459 1.67606
4 2377 1.62017
5 1744 1.18872
6 1103 0.751808
7 2851 1.94325
8 2123 1.44704
9 479 0.326488
10 563 0.383742
11 623 0.424639
12 676 0.460764
13 1020 0.695235
14 1234 0.841098
15 1942 1.32367
16 2335 1.59154
17 2634 1.79534
18 2194 1.49544
19 2881 1.9637
20 2294 1.5636
21 2930 1.9971
22 1929 1.31481
23 1361 0.927661
24 1421 0.968558
25 1832 1.2487
26 2283 1.5561
27 7282 4.96343
28 13084 8.91809
29 6754 4.60355
30 7706 5.25243
31 10705 7.29656
32 12937 8.8179
33 20925 14.2625
34 7487 5.10316
35 243 0.165629

##neg control
samtools view r338.bam -F 4 | awk -F "\t" '{o[$5]++;n++;}END{for (i=0;i<256;i++){x=o[i];if(x>0){print i,x,100.0*x/n}}}'
0 15 8.24176
1 16 8.79121
2 23 12.6374
3 9 4.94505
4 8 4.3956
5 2 1.0989
6 4 2.1978
7 5 2.74725
8 1 0.549451
9 9 4.94505
10 4 2.1978
13 2 1.0989
14 3 1.64835
15 1 0.549451
16 2 1.0989
17 4 2.1978
18 6 3.2967
19 9 4.94505
20 8 4.3956
21 3 1.64835
22 6 3.2967
25 1 0.549451
26 1 0.549451
27 3 1.64835
28 3 1.64835
29 1 0.549451
30 3 1.64835
31 8 4.3956
32 1 0.549451
33 13 7.14286
34 8 4.3956

#species dist (output columns: reference ID, number of aligned reads, % of aligned reads)
reczko@max:/data/results/tools/align/mosaik/MOSAIK/test$ samtools view r374c.bam -F 4 | awk -f /data/results/tools/align/get-sam-refid-stats1.awk
detri 3655 3.15538
richi 8404 7.25521
caspius 3241 2.79797
marti 4979 4.29839
pipiwhite 3268 2.82128
longi 13789 11.9041
F10_PSEUDO.ab1 12574 10.8552
pipiblack 572 0.49381
vexans 1860 1.60575
albo 4084 3.52574
B10_MAC.ab1 39587 34.1756
pulcri 5023 4.33638
annu 14798 12.7752

samtools view r375c.bam -F 4 | awk -f /data/results/tools/align/get-sam-refid-stats1.awk
richi 10827 7.37971
detri 5165 3.52048
caspius 3946 2.6896
marti 5919 4.03441
pipiwhite 4590 3.12856
longi 18170 12.3847
F10_PSEUDO.ab1 14933 10.1784
pipiblack 780 0.53165
vexans 2664 1.81579
albo 5227 3.56274
pulcri 6171 4.20617
B10_MAC.ab1 50382 34.3405
annu 17939 12.2273

##neg control
samtools view r338.bam -F 4 | awk -f /data/results/tools/align/get-sam-refid-stats1.awk
richi 3 1.64835
detri 4 2.1978
caspius 1 0.549451
marti 1 0.549451
pipiwhite 10 5.49451
longi 5 2.74725
F10_PSEUDO.ab1 49 26.9231
vexans 1 0.549451
pipiblack 2 1.0989
pulcri 2 1.0989
albo 5 2.74725
B10_MAC.ab1 92 50.5495
annu 7 3.84615

Dear Martin,

Could you please try to map the results of the new analysis for the YK sample?

IONAS-375
New re-analysis for RNA sequencing gave 3 million reads on the correct barcode (bc 16).
Report name: YKsample-ION374_171207 (my mistake, it should be ION375)
RNAseq analysis @ 15/2/18

If it works, I should re-analyse IONAS-374 as well.

Thank you,
Vaggelis

PS. Dr Skretas' sample is at ION378, sample GSkP6-4567Lib2211
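The `samtools view ... -F 4 | awk` one-liners in Martin's details tabulate column 5 (MAPQ) of the mapped SAM records, and the species listing counts aligned reads per reference. A minimal Python sketch of the same counting, which could be reused when the ION375 re-analysis is mapped, is below. Assumptions: the input is SAM text as printed by `samtools view`; the per-reference table assumes `get-sam-refid-stats1.awk` (whose source is not shown here) counts column 3 (RNAME); the script name, `fraction_at_least` and its MAPQ threshold are hypothetical, offered as one way to put a number on "enrichment of high-quality alignments". `mosaik_totals` only re-checks that the MOSAIK categories add up.

```python
import sys
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

FLAG_UNMAPPED = 0x4  # SAM FLAG bit for an unmapped segment (what `-F 4` excludes)
Histogram = Dict[str, Tuple[int, float]]


def mapped_records(sam_lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated fields of mapped records, skipping '@' header lines."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & FLAG_UNMAPPED:
            continue
        yield fields


def column_histogram(sam_lines: Iterable[str], column: int) -> Histogram:
    """Count the values of a 1-based SAM column: {value: (count, percent of mapped)}."""
    counts = Counter(fields[column - 1] for fields in mapped_records(sam_lines))
    total = sum(counts.values())
    return {value: (n, 100.0 * n / total) for value, n in counts.items()}


def fraction_at_least(mapq_hist: Histogram, min_mapq: int) -> float:
    """Fraction of mapped reads with MAPQ >= min_mapq (hypothetical threshold)."""
    total = sum(n for n, _ in mapq_hist.values())
    high = sum(n for q, (n, _) in mapq_hist.items() if int(q) >= min_mapq)
    return high / total


def mosaik_totals(unaligned: int, filtered: int, unique: int, multiple: int,
                  total: int) -> Tuple[int, bool]:
    """Aligned = unique + multiply aligned; check that all categories sum to total."""
    aligned = unique + multiple
    return aligned, unaligned + filtered + aligned == total


def print_histogram(hist: Histogram, key: Callable[[str], int]) -> None:
    # Same three columns as the awk one-liner; '%.6g' mimics awk's default OFMT.
    for value in sorted(hist, key=key):
        n, pct = hist[value]
        print(value, n, f"{pct:.6g}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: samtools view r374c.bam | python3 sam_stats.py /dev/stdin
    with open(sys.argv[1]) as handle:
        lines = handle.readlines()
    print("#map qual")
    print_histogram(column_histogram(lines, 5), key=int)
    print("#reference (RNAME) counts")
    ref_hist = column_histogram(lines, 3)
    print_histogram(ref_hist, key=lambda name: -ref_hist[name][0])
```

On the same BAM, the MAPQ section should match the awk output above (ascending MAPQ, zero counts omitted); the reference counts are sorted by descending count, so their order will differ from the listings above. Applied to the MOSAIK logs, `mosaik_totals(6084617, 27159, 8518, 107316, 6227610)` returns `(115834, True)` for run 374.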