Dear all,

a) and b) below are the results of typing the Salmonella sample using Kmerid (https://github.com/phe-bioinformatics/kmerid).

c) is a multi-locus sequence typing using SRST2 with the mlst_db Salmonella_enterica (SRST2: Rapid genomic surveillance for public health and hospital microbiology labs, https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-014-0090-6).

d) is a resistance gene test using SRST2 with the ARG-ANNOT db (Gupta et al, Antimicrob Agents Chemother 2014; 58(1):212. DOI: [10.1128/AAC.01310-13](http://aac.asm.org/content/58/1/212.abstract)).

e) is a gene test using the Plasmidfinder-DB.

An initial assembly has been generated, with the following stats:

    Longest sequence length: 21788 bp
    N50: 2222 (522 sequences) (1861647 bp combined)

The contigs are at http://genomics-lab.fleming.gr/fleming/external/klpn/salmonella1/a.lines.fasta

The use of other error-tolerant assemblers is under investigation.

Detailed results are at: http://genomics-lab.fleming.gr/fleming/external/klpn/salmonella1

BW
Martin

a) using a subset of 4 million reads:

    python kmerid.py -f klpn-run378-4000k.fastq.gz -c config/config.cnf -n

    #Kmer based similarities
    #similarity groups file
    97.913666 salmonella Salmonella_enterica_serovar_Agona_SL483_uid59431.fa
    76.752792 salmonella Salmonella_enterica_serovar_Gallinarum_pullorum_RKS5078_uid87035.fa
    74.651047 salmonella Salmonella_enterica_serovar_Typhimurium_UK_1_uid87049.fa
    74.200378 salmonella Salmonella_enterica_serovar_Paratyphi_A_ATCC_9150_uid58201.fa
    73.937592 salmonella Salmonella_enterica_Serovar_Cubana_CFSAN002050_uid212973.fa
    73.152649 salmonella Salmonella_enterica_serovar_Javiana_CFSAN001992_uid190101.fa
    72.185585 salmonella Salmonella_enterica_serovar_Choleraesuis_SC_B67_uid58017.fa
    70.572006 salmonella Salmonella_enterica_serovar_Typhi_Ty2_uid57973.fa
    31.943195 salmonella Salmonella_enterica_arizonae_serovar_62_z4_z23__uid58191.fa
    19.616383 salmonella Salmonella_bongori_NCTC_12419_uid70155.fa
    5.333167 escherichia Escherichia_coli_K_12_substr__W3110_uid161931.fa
    4.920207 escherichia Escherichia_coli_W_uid162011.fa
    4.726932 escherichia Escherichia_coli_IAI39_uid59381.fa
    4.672065 escherichia Escherichia_coli_O127_H6_E2348_69_uid59343.fa
    4.615572 escherichia Escherichia_coli_SMS_3_5_uid58919.fa
    4.547652 escherichia Escherichia_coli_NA114_uid162139.fa
    4.504881 escherichia Escherichia_coli_UM146_uid162043.fa
    4.481282 escherichia Escherichia_fergusonii_ATCC_35469_uid59375.fa
    4.457407 escherichia Escherichia_coli_042_uid161985.fa
    4.415426 escherichia Escherichia_coli_O157_H7_TW14359_uid59235.fa

b) using a subset of 8 million reads:

    #Kmer based similarities
    #similarity groups file
    99.133087 salmonella Salmonella_enterica_serovar_Agona_SL483_uid59431.fa
    77.780556 salmonella Salmonella_enterica_serovar_Gallinarum_pullorum_RKS5078_uid87035.fa
    75.645142 salmonella Salmonella_enterica_serovar_Typhimurium_UK_1_uid87049.fa
    75.215080 salmonella Salmonella_enterica_serovar_Paratyphi_A_ATCC_9150_uid58201.fa
    74.917244 salmonella Salmonella_enterica_Serovar_Cubana_CFSAN002050_uid212973.fa
    74.155731 salmonella Salmonella_enterica_serovar_Javiana_CFSAN001992_uid190101.fa
    73.168488 salmonella Salmonella_enterica_serovar_Choleraesuis_SC_B67_uid58017.fa
    71.526733 salmonella Salmonella_enterica_serovar_Typhi_Ty2_uid57973.fa
    32.477509 salmonella Salmonella_enterica_arizonae_serovar_62_z4_z23__uid58191.fa
    19.932760 salmonella Salmonella_bongori_NCTC_12419_uid70155.fa
    6.205814 escherichia Escherichia_coli_K_12_substr__W3110_uid161931.fa
    5.659159 escherichia Escherichia_coli_W_uid162011.fa
    5.344789 escherichia Escherichia_coli_IAI39_uid59381.fa
    5.213348 escherichia Escherichia_coli_O127_H6_E2348_69_uid59343.fa
    5.203148 escherichia Escherichia_coli_SMS_3_5_uid58919.fa
    5.113528 escherichia Escherichia_coli_NA114_uid162139.fa
    5.049896 escherichia Escherichia_coli_UM146_uid162043.fa
    5.034655 escherichia Escherichia_coli_O157_H7_TW14359_uid59235.fa
    5.033866 escherichia Escherichia_coli_042_uid161985.fa
    4.784770 escherichia Escherichia_fergusonii_ATCC_35469_uid59375.fa

c) test2__mlst__Salmonella_enterica__results.txt

Sample: R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032

| ST | aroC | dnaN | hemD | hisD | purE | sucA | thrA | mismatches | uncertainty | depth | maxMAF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 3 | 3 | 7 | 4 | 3 | 3 | 7 | 0 | - | 407.102714286 | 0.411609498681 |

d) #resistance gene test

klpn378_test__fullgenes__ARGannot_r2__results.txt

Sample (both rows): R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032

| DB | gene | allele | coverage | depth | diffs | uncertainty | divergence | length | maxMAF | clusterid | seqid | annotation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARGannot_r2 | Aac6-Iaa_AGly | Aac6-Iy_761 | 98.858 | 452.51 | 2snp5indel | | 0.457 | 438 | 0.114 | 454 | 761 | no;no;Aac6-Iy;AGly;AF144880;3542-3979;438 |
| ARGannot_r2 | TEM-1D_Bla | TEM-171_1017 | 100.0 | 117.599 | | | 0.0 | 861 | 0.25 | 205 | 1017 | no;no;TEM-171;Bla;GQ149347;5270-6130;861 |

## ARG-ANNOT (ARGannot_r2.fasta) - recommended

Citation: Gupta et al, Antimicrob Agents Chemother 2014; 58(1):212. DOI: [10.1128/AAC.01310-13](http://aac.asm.org/content/58/1/212.abstract)

* Quality: ARG-ANNOT has been manually curated and appears to be the highest quality sequence set in terms of its informativeness (accession & position of original sequence; antibiotic class) and internal consistency. It contains almost all sequences found in CARD and ResFinder, but has been manually curated and had many redundancies removed.
* Annotations (following the ARG-ANNOT documentation): The nucleotide sequences included in this database from different antibiotics classes are abbreviated as **AGly**: aminoglycosides, **Bla**: beta-lactamases, **Fos**: fosfomycin, **Flq**: fluoroquinolones, **Gly**: glycopeptides, **MLS**: macrolide-lincosamide-streptogramin, **Phe**: phenicols, **Rif**: rifampicin, **Sul**: sulfonamides, **Tet**: tetracyclines and **Tmt**: trimethoprim. A unified nomenclature system was followed in which the name contains all of the information regarding gene class, gene name, accession number, gene location in the sequence and gene size. For example, *(AGly) AadA1:M95287:3320-4111:792* tells the researcher that the class of antibiotics is AGly: aminoglycosides, the gene name is *AadA1*, the accession number is M95287, the gene location is 3320-4111, and the gene size is 792 bp.

e) klpn378_plasmids__fullgenes__PlasmidFinder__results.txt

Sample (both rows): R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032

| DB | gene | allele | coverage | depth | diffs | uncertainty | divergence | length | maxMAF | clusterid | seqid | annotation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PlasmidFinder | Col | _Col-plasmid_1_J01566__22 | 92.164 | 20.333 | 21snp1indel20holes | | 8.468 | 268 | 0.455 | 138 | _22 | |
| PlasmidFinder | ColRNAI_1 | ColRNAI_1_DQ298019 | 90.769 | 51.773 | 13snp12holes | edge0.0 | 11.017 | 130 | 0.089 | 135 | DQ298019 | |

## Plasmidfinder-DB

/data/results/tools/denovo/srst/srst2/data/PlasmidFinder.fasta

Citation: Carattoli et al, Antimicrob Agents Chemother 2014; 58(7):3895-3903 DOI: [10.1128/AAC.02412-14](http://aac.asm.org/content/58/7/3895)

* Sequences were downloaded in June 2014 from [CGE](http://cge.cbs.dtu.dk/services/data.php).
* Clustered at 80% nucleotide identity using CD-HIT-EST.
* Minor manual correction of gene names and removal of duplicate sequences.
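
As a side note on the ARG-ANNOT naming scheme quoted under Annotations above, here is a minimal Python sketch of how such a name could be split into its parts. The pattern is an assumption based only on the single example *(AGly) AadA1:M95287:3320-4111:792*; the headers in ARGannot_r2.fasta may be formatted differently. The names `parse_argannot_name`, `ArgAnnotName` and `span_matches_size` are hypothetical helpers, not part of SRST2 or ARG-ANNOT.

```python
import re
from typing import NamedTuple

# Class abbreviations as listed in the ARG-ANNOT documentation quoted above.
CLASS_NAMES = {
    "AGly": "aminoglycosides",
    "Bla": "beta-lactamases",
    "Fos": "fosfomycin",
    "Flq": "fluoroquinolones",
    "Gly": "glycopeptides",
    "MLS": "macrolide-lincosamide-streptogramin",
    "Phe": "phenicols",
    "Rif": "rifampicin",
    "Sul": "sulfonamides",
    "Tet": "tetracyclines",
    "Tmt": "trimethoprim",
}


class ArgAnnotName(NamedTuple):
    """Hypothetical container for the fields encoded in one name."""
    drug_class: str
    gene: str
    accession: str
    start: int
    end: int
    size: int


# Assumed layout: "(Class) Gene:Accession:start-end:size"
_NAME_PATTERN = re.compile(
    r"^\((?P<cls>[^)]+)\)\s*"
    r"(?P<gene>[^:]+):(?P<acc>[^:]+):"
    r"(?P<start>\d+)-(?P<end>\d+):(?P<size>\d+)$"
)


def parse_argannot_name(name: str) -> ArgAnnotName:
    """Split a name of the assumed form into its components."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"name does not follow the assumed layout: {name!r}")
    return ArgAnnotName(
        drug_class=match["cls"],
        gene=match["gene"],
        accession=match["acc"],
        start=int(match["start"]),
        end=int(match["end"]),
        size=int(match["size"]),
    )


def span_matches_size(record: ArgAnnotName) -> bool:
    """Check that the inclusive location span equals the stated gene size."""
    return record.end - record.start + 1 == record.size


if __name__ == "__main__":
    record = parse_argannot_name("(AGly) AadA1:M95287:3320-4111:792")
    print(record.gene, CLASS_NAMES.get(record.drug_class, "unknown class"))
    print("span consistent with size:", span_matches_size(record))
```

The span check is just a sanity test: for the documented example, 4111 - 3320 + 1 equals the stated 792 bp, and the locations in the annotation column of table d) (3542-3979 with 438, 5270-6130 with 861) are consistent in the same way.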