Dear all,

Sections a) and b) below show the results of typing the Salmonella sample using Kmerid
(https://github.com/phe-bioinformatics/kmerid).

Section c) is a multi-locus sequence typing (MLST) run using SRST2 with the Salmonella_enterica mlst_db
(SRST2: Rapid genomic surveillance for public health and hospital microbiology labs,
https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-014-0090-6).

Section d) is a resistance gene test using SRST2 with the ARG-ANNOT db
(Gupta et al, Antimicrob Agents Chemother 2014; 58(1):212. DOI: [10.1128/AAC.01310-13](http://aac.asm.org/content/58/1/212.abstract)).

Section e) is a plasmid gene test using the PlasmidFinder DB.

An initial assembly has been generated, with the following stats:
Longest sequence length: 21788 bp
N50: 2222 bp (522 sequences, 1861647 bp combined)
The contigs are at:
http://genomics-lab.fleming.gr/fleming/external/klpn/salmonella1/a.lines.fasta
The use of other error-tolerant assemblers is under investigation.
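
For reference, the N50 line above can be reproduced from the contig lengths. The sketch below is a minimal illustration, assuming the figures in parentheses are the number of contigs at or above the N50 length and their combined length. The function name `n50` and the toy lengths are hypothetical and are not taken from the assembler output.

```python
def n50(lengths):
    """Return (N50, contigs needed, combined length of those contigs).

    Contigs are taken from longest to shortest until their running total
    reaches at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count, running
    raise ValueError("no contig lengths given")


# Toy example (hypothetical lengths, not the real contigs)
print(n50([10, 8, 6, 4, 2]))  # (8, 2, 18)
```

To apply this to the real assembly, the lengths would have to be read from the contig FASTA first.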

Detailed results are at:
http://genomics-lab.fleming.gr/fleming/external/klpn/salmonella1

BW
Martin


a) using a subset of 4 million reads:
python kmerid.py -f  klpn-run378-4000k.fastq.gz -c config/config.cnf -n
#Kmer based similarities
#similarity	groups	file
97.913666	salmonella	Salmonella_enterica_serovar_Agona_SL483_uid59431.fa
76.752792	salmonella	Salmonella_enterica_serovar_Gallinarum_pullorum_RKS5078_uid87035.fa
74.651047	salmonella	Salmonella_enterica_serovar_Typhimurium_UK_1_uid87049.fa
74.200378	salmonella	Salmonella_enterica_serovar_Paratyphi_A_ATCC_9150_uid58201.fa
73.937592	salmonella	Salmonella_enterica_Serovar_Cubana_CFSAN002050_uid212973.fa
73.152649	salmonella	Salmonella_enterica_serovar_Javiana_CFSAN001992_uid190101.fa
72.185585	salmonella	Salmonella_enterica_serovar_Choleraesuis_SC_B67_uid58017.fa
70.572006	salmonella	Salmonella_enterica_serovar_Typhi_Ty2_uid57973.fa
31.943195	salmonella	Salmonella_enterica_arizonae_serovar_62_z4_z23__uid58191.fa
19.616383	salmonella	Salmonella_bongori_NCTC_12419_uid70155.fa
5.333167	escherichia	Escherichia_coli_K_12_substr__W3110_uid161931.fa
4.920207	escherichia	Escherichia_coli_W_uid162011.fa
4.726932	escherichia	Escherichia_coli_IAI39_uid59381.fa
4.672065	escherichia	Escherichia_coli_O127_H6_E2348_69_uid59343.fa
4.615572	escherichia	Escherichia_coli_SMS_3_5_uid58919.fa
4.547652	escherichia	Escherichia_coli_NA114_uid162139.fa
4.504881	escherichia	Escherichia_coli_UM146_uid162043.fa
4.481282	escherichia	Escherichia_fergusonii_ATCC_35469_uid59375.fa
4.457407	escherichia	Escherichia_coli_042_uid161985.fa
4.415426	escherichia	Escherichia_coli_O157_H7_TW14359_uid59235.fa

b) using a subset of 8 million reads:
#Kmer based similarities
#similarity	groups	file
99.133087	salmonella	Salmonella_enterica_serovar_Agona_SL483_uid59431.fa
77.780556	salmonella	Salmonella_enterica_serovar_Gallinarum_pullorum_RKS5078_uid87035.fa
75.645142	salmonella	Salmonella_enterica_serovar_Typhimurium_UK_1_uid87049.fa
75.215080	salmonella	Salmonella_enterica_serovar_Paratyphi_A_ATCC_9150_uid58201.fa
74.917244	salmonella	Salmonella_enterica_Serovar_Cubana_CFSAN002050_uid212973.fa
74.155731	salmonella	Salmonella_enterica_serovar_Javiana_CFSAN001992_uid190101.fa
73.168488	salmonella	Salmonella_enterica_serovar_Choleraesuis_SC_B67_uid58017.fa
71.526733	salmonella	Salmonella_enterica_serovar_Typhi_Ty2_uid57973.fa
32.477509	salmonella	Salmonella_enterica_arizonae_serovar_62_z4_z23__uid58191.fa
19.932760	salmonella	Salmonella_bongori_NCTC_12419_uid70155.fa
6.205814	escherichia	Escherichia_coli_K_12_substr__W3110_uid161931.fa
5.659159	escherichia	Escherichia_coli_W_uid162011.fa
5.344789	escherichia	Escherichia_coli_IAI39_uid59381.fa
5.213348	escherichia	Escherichia_coli_O127_H6_E2348_69_uid59343.fa
5.203148	escherichia	Escherichia_coli_SMS_3_5_uid58919.fa
5.113528	escherichia	Escherichia_coli_NA114_uid162139.fa
5.049896	escherichia	Escherichia_coli_UM146_uid162043.fa
5.034655	escherichia	Escherichia_coli_O157_H7_TW14359_uid59235.fa
5.033866	escherichia	Escherichia_coli_042_uid161985.fa
4.784770	escherichia	Escherichia_fergusonii_ATCC_35469_uid59375.fa
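
The two Kmerid tables above are plain tab-separated text, so the best match can be pulled out programmatically. The following is a minimal sketch, assuming three tab-separated columns (similarity, group, file) and `#` comment lines as shown above. The function name `parse_kmerid` is hypothetical and is not part of Kmerid.

```python
def parse_kmerid(text):
    """Parse Kmerid similarity output into (similarity, group, file) tuples.

    Comment lines starting with '#' and lines that do not have exactly
    three tab-separated fields (e.g. the command line) are skipped.
    Hits are returned sorted by similarity, highest first.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        similarity, group, filename = fields
        hits.append((float(similarity), group, filename))
    return sorted(hits, reverse=True)


# Two rows copied from table b) above
example = (
    "#similarity\tgroups\tfile\n"
    "19.932760\tsalmonella\tSalmonella_bongori_NCTC_12419_uid70155.fa\n"
    "99.133087\tsalmonella\tSalmonella_enterica_serovar_Agona_SL483_uid59431.fa\n"
)
best = parse_kmerid(example)[0]
print(best[2])  # Salmonella_enterica_serovar_Agona_SL483_uid59431.fa
```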

c) test2__mlst__Salmonella_enterica__results.txt
Sample	ST	aroC	dnaN	hemD	hisD	purE	sucA	thrA	mismatches	uncertainty	depth	maxMAF
R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032	13	3	3	7	4	3	3	7	0	-	407.102714286	0.411609498681

d) klpn378_test__fullgenes__ARGannot_r2__results.txt (resistance gene test)
Sample	DB	gene	allele	coverage	depth	diffs	uncertainty	divergence	length	maxMAF	clusterid	seqid	annotation
R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032	ARGannot_r2	Aac6-Iaa_AGly	Aac6-Iy_761	98.858	452.51	2snp5indel		0.457	438	0.114	454	761	no;no;Aac6-Iy;AGly;AF144880;3542-3979;438
R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032	ARGannot_r2	TEM-1D_Bla	TEM-171_1017	100.0	117.599			0.0	861	0.25	205	1017	no;no;TEM-171;Bla;GQ149347;5270-6130;861

## ARG-ANNOT (ARGannot_r2.fasta) - recommended
Citation: Gupta et al, Antimicrob Agents Chemother 2014; 58(1):212. DOI: [10.1128/AAC.01310-13](http://aac.asm.org/content/58/1/212.abstract)
* Quality: ARG-ANNOT has been manually curated and appears to be the highest quality sequence set in terms of its informativeness (accession & position of original sequence; antibiotic class) and internal consistency. It contains almost all sequences found in CARD and ResFinder, but has been manually curated and had many redundancies removed.
* Annotations (following the ARG-ANNOT documentation): The nucleotide sequences included in this database from different antibiotics classes are abbreviated as **AGly**: aminoglycosides, **Bla**: beta-lactamases, **Fos**: fosfomycin, **Flq**: fluoroquinolones, **Gly**: glycopeptides, **MLS**: macrolide-lincosamide-streptogramin, **Phe**: phenicols, **Rif**: rifampicin, **Sul**: sulfonamides, **Tet**: tetracyclines and **Tmt**: trimethoprim. A unified nomenclature system was followed in which the name contains all of the information regarding gene class, gene name, accession number, gene location in the sequence and gene size. For example, *(AGly) AadA1:M95287:3320-4111:792* tells the researcher that the class of antibiotics is AGly: aminoglycosides, the gene name is *AadA1*, the accession number is M95287, the gene location is 3320-4111, and the gene size is 792 bp.
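
The nomenclature described above can be split into its parts with a short script. This is only a sketch, assuming names follow exactly the `(Class) Gene:Accession:Start-End:Size` pattern of the example. The function name `parse_argannot_name` and the dictionary keys are hypothetical.

```python
import re

# Class abbreviations as listed in the ARG-ANNOT annotation notes above
CLASS_NAMES = {
    "AGly": "aminoglycosides",
    "Bla": "beta-lactamases",
    "Fos": "fosfomycin",
    "Flq": "fluoroquinolones",
    "Gly": "glycopeptides",
    "MLS": "macrolide-lincosamide-streptogramin",
    "Phe": "phenicols",
    "Rif": "rifampicin",
    "Sul": "sulfonamides",
    "Tet": "tetracyclines",
    "Tmt": "trimethoprim",
}

NAME_PATTERN = re.compile(
    r"^\((?P<cls>[^)]+)\)\s*(?P<gene>[^:]+):(?P<accession>[^:]+)"
    r":(?P<start>\d+)-(?P<end>\d+):(?P<size>\d+)$"
)


def parse_argannot_name(name):
    """Split an ARG-ANNOT style name into class, gene, accession, location, size."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not an ARG-ANNOT style name: {name!r}")
    cls = match.group("cls")
    return {
        "class": cls,
        "class_name": CLASS_NAMES.get(cls, "unknown"),
        "gene": match.group("gene"),
        "accession": match.group("accession"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "size": int(match.group("size")),
    }


record = parse_argannot_name("(AGly) AadA1:M95287:3320-4111:792")
print(record["gene"], record["class_name"], record["size"])
```

In the example, the location 3320-4111 spans 792 positions, which matches the stated gene size.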

e) klpn378_plasmids__fullgenes__PlasmidFinder__results.txt
Sample	DB	gene	allele	coverage	depth	diffs	uncertainty	divergence	length	maxMAF	clusterid	seqid	annotation
R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032	PlasmidFinder	Col	_Col-plasmid_1_J01566__22	92.164	20.333	21snp1indel20holes		8.468	268	0.455	138	_22	
R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032	PlasmidFinder	ColRNAI_1	ColRNAI_1_DQ298019	90.769	51.773	13snp12holes	edge0.0	11.017	130	0.089	135	DQ298019	

## PlasmidFinder DB (PlasmidFinder.fasta)
Local path: /data/results/tools/denovo/srst/srst2/data/PlasmidFinder.fasta
Citation: Carattoli et al, Antimicrob Agents Chemother 2014; 58(7):3895-3903. DOI: [10.1128/AAC.02412-14](http://aac.asm.org/content/58/7/3895)
* Sequences were downloaded in June 2014 from [CGE](http://cge.cbs.dtu.dk/services/data.php).
* Clustered at 80% nucleotide identity using CD-HIT-EST.
* Minor manual correction of gene names and removal of duplicate sequences.
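
The result tables in d) and e) share one tab-separated layout, so they can be screened together. The following is a rough sketch and not part of SRST2: the function names and the cut-offs (95% coverage, 5% divergence) are hypothetical values chosen only for illustration, and the sample name is shortened to `S1`.

```python
import csv
import io


def load_fullgenes(text):
    """Read a tab-separated fullgenes results table into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "gene": row["gene"],
            "allele": row["allele"],
            "coverage": float(row["coverage"]),
            "depth": float(row["depth"]),
            "divergence": float(row["divergence"]),
            "diffs": row["diffs"],
        })
    return rows


def flag_weak(rows, min_coverage=95.0, max_divergence=5.0):
    """Return gene names whose hit falls outside the (hypothetical) cut-offs."""
    return [
        r["gene"]
        for r in rows
        if r["coverage"] < min_coverage or r["divergence"] > max_divergence
    ]


HEADER = ("Sample\tDB\tgene\tallele\tcoverage\tdepth\tdiffs\tuncertainty\t"
          "divergence\tlength\tmaxMAF\tclusterid\tseqid\tannotation\n")
# One row from d) and one from e), with the sample name shortened to S1
TABLE = HEADER + (
    "S1\tARGannot_r2\tTEM-1D_Bla\tTEM-171_1017\t100.0\t117.599\t\t\t0.0\t861"
    "\t0.25\t205\t1017\tno;no;TEM-171;Bla;GQ149347;5270-6130;861\n"
    "S1\tPlasmidFinder\tColRNAI_1\tColRNAI_1_DQ298019\t90.769\t51.773"
    "\t13snp12holes\tedge0.0\t11.017\t130\t0.089\t135\tDQ298019\t\n"
)

print(flag_weak(load_fullgenes(TABLE)))  # ['ColRNAI_1']
```

With these illustrative cut-offs the TEM hit passes, while the ColRNAI hit is flagged because of its lower coverage and higher divergence.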