reczko@max:/data/results/tools/denovo/srst/srst2$ getmlst.py --species "Escherichia coli#1"

For SRST2, remember to check what separator is being used in this allele database

Looks like --mlst_delimiter '_'

>adk_1 --> --> ('adk', '_', '1')

Suggested srst2 command for use with this MLST database:

srst2 --output test --input_pe *.fastq.gz --mlst_db Escherichia_coli#1.fasta --mlst_definitions ecoli.txt --mlst_delimiter '_'

reczko@max:/data/results/tools/denovo/srst/srst2$ getmlst.py --species "Salmonella enterica"

For SRST2, remember to check what separator is being used in this allele database

Looks like --mlst_delimiter '_'

>aroC_1 --> --> ('aroC', '_', '1')

Suggested srst2 command for use with this MLST database:

srst2 --output test --input_pe *.fastq.gz --mlst_db Salmonella_enterica.fasta --mlst_definitions senterica.txt --mlst_delimiter '_'

# srst2 --output test --input_se /data/results/tools/denovo/kmerid/kmerid-master/klpn-run378-400k.fastq.gz --mlst_db Salmonella_enterica.fasta --mlst_definitions senterica.txt --mlst_delimiter '_'

reczko@max:/data/results/tools/denovo/srst/srst2$ srst2 --output test --input_se /data/results/tools/denovo/kmerid/kmerid-master/klpn-run378-400k.fastq.gz --mlst_db Salmonella_enterica.fasta --mlst_definitions senterica.txt --mlst_delimiter '_'
01/31/2018 19:29:03 program started
01/31/2018 19:29:03 command line: /usr/local/bin/srst2 --output test --input_se /data/results/tools/denovo/kmerid/kmerid-master/klpn-run378-400k.fastq.gz --mlst_db Salmonella_enterica.fasta --mlst_definitions senterica.txt --mlst_delimiter _
01/31/2018 19:29:03 Total single reads found:1
01/31/2018 19:29:03 Building bowtie2 index for Salmonella_enterica.fasta...
01/31/2018 19:29:03 Running: /data/results/tools/align/bowtie2-2.1.0/bowtie2-build Salmonella_enterica.fasta Salmonella_enterica.fasta
Settings:
  Output files: "Salmonella_enterica.fasta.*.bt2"
01/31/2018 19:29:10 Processing database Salmonella_enterica.fasta
01/31/2018 19:29:10 Running: samtools faidx Salmonella_enterica.fasta
Attempting to read 7 loci from ST database senterica.txt
Read ST database senterica.txt successfully
01/31/2018 19:29:10 Processing sample klpn-run378-400k
01/31/2018 19:29:10 Starting mapping with bowtie2
01/31/2018 19:29:10 Output prefix set to: test__klpn-run378-400k.Salmonella_enterica
01/31/2018 19:29:10 Aligning reads to index Salmonella_enterica.fasta using bowtie2...
01/31/2018 19:29:10 Running: /data/results/tools/align/bowtie2-2.1.0/bowtie2 -U /data/results/tools/denovo/kmerid/kmerid-master/klpn-run378-400k.fastq.gz -S test__klpn-run378-400k.Salmonella_enterica.sam -q --very-sensitive-local --no-unal -a -x Salmonella_enterica.fasta
400000 reads; of these:
  400000 (100.00%) were unpaired; of these:
    399702 (99.93%) aligned 0 times
    9 (0.00%) aligned exactly 1 time
    289 (0.07%) aligned >1 times
0.07% overall alignment rate
01/31/2018 19:30:15 Processing Bowtie2 output with SAMtools...
01/31/2018 19:30:15 Generate and sort BAM file...
01/31/2018 19:30:15 Running: samtools view -b -o test__klpn-run378-400k.Salmonella_enterica.unsorted.bam -q 1 -S test__klpn-run378-400k.Salmonella_enterica.sam.mod
[samopen] SAM header is present: 5147 sequences.
01/31/2018 19:30:16 Running: samtools sort test__klpn-run378-400k.Salmonella_enterica.unsorted.bam test__klpn-run378-400k.Salmonella_enterica.sorted
01/31/2018 19:30:17 Deleting sam and bam files that are not longer needed...
01/31/2018 19:30:17 Deleting test__klpn-run378-400k.Salmonella_enterica.sam
01/31/2018 19:30:17 Deleting test__klpn-run378-400k.Salmonella_enterica.sam.mod
01/31/2018 19:30:17 Deleting test__klpn-run378-400k.Salmonella_enterica.unsorted.bam
01/31/2018 19:30:17 Generate pileup...
01/31/2018 19:30:17 Running: samtools mpileup -L 1000 -f Salmonella_enterica.fasta -Q 20 -q 1 -B test__klpn-run378-400k.Salmonella_enterica.sorted.bam
[mpileup] 1 samples in 1 input files
Set max per-file depth to 8000
01/31/2018 19:30:19 Processing SAMtools pileup...
01/31/2018 19:30:26 Scoring alleles...
/usr/lib/python2.7/dist-packages/scipy/stats/_discrete_distns.py:54: RuntimeWarning: floating point number truncated to an integer
  vals = special.bdtr(k, n, p)
This combination of alleles was not found in the sequence type database: klpn-run378-400k hemD_7
01/31/2018 19:31:03 klpn-run378-400k NF*? - - 7*? - - - - hemD_7/25holes hemD_7/edge0.0 15.241 0.292682926829
01/31/2018 19:31:03 Printing all MLST scores to test__klpn-run378-400k.Salmonella_enterica.scores
01/31/2018 19:31:03 Finished processing for read set klpn-run378-400k ...
01/31/2018 19:31:03 Finished processing for database Salmonella_enterica.fasta ...
01/31/2018 19:31:03 MLST output printed to test__mlst__Salmonella_enterica__results.txt
01/31/2018 19:31:03 SRST2 has finished.
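
The result row in the log above packs the ST, the seven allele calls and the quality columns into a single line. Below is a minimal Python sketch for pulling it apart; it is not part of SRST2. Assumptions: the locus order (aroC, dnaN, hemD, hisD, purE, sucA, thrA) is taken from the standard Salmonella enterica scheme rather than from this log, the name maxMAF for the last column is assumed, and fields are split on whitespace because the original tabs did not survive in the log. The `*` and `?` suffixes are read as in the plotting notes further down: `*` marks SNPs against the called allele and `?` marks uncertainty due to low depth. The helper names parse_call and parse_result_row are hypothetical.

```python
import re

# Assumed column order: sample, ST, seven Salmonella enterica loci, then the
# quality columns. "maxMAF" is an assumed name for the final column.
COLUMNS = ["Sample", "ST", "aroC", "dnaN", "hemD", "hisD", "purE", "sucA",
           "thrA", "mismatches", "uncertainty", "depth", "maxMAF"]


def parse_call(call):
    """Split a call such as '7*?' into (value, has_snps, low_depth)."""
    match = re.match(r"^([^*?]+)(\*?)(\??)$", call)
    if match is None:
        raise ValueError("unrecognised call: %r" % call)
    value, star, question = match.groups()
    return value, star == "*", question == "?"


def parse_result_row(line):
    """Map one whitespace-separated result row onto the assumed column names."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    return dict(zip(COLUMNS, fields))


# The row from the log above, without the timestamp prefix.
row = parse_result_row(
    "klpn-run378-400k NF*? - - 7*? - - - - "
    "hemD_7/25holes hemD_7/edge0.0 15.241 0.292682926829"
)
print(row["Sample"], parse_call(row["ST"]), parse_call(row["hemD"]))
```

Read this way, only hemD received a call (allele 7, flagged for both SNPs and low depth), which fits the 0.07% overall alignment rate reported by bowtie2.
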
#all
srst2 --threads 16 --output test2 --input_se /data/images/proton2/run378/R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032.fastq --mlst_db Salmonella_enterica.fasta --mlst_definitions senterica.txt --mlst_delimiter '_'

#resistance gene test
srst2 --threads 16 --input_se /data/images/proton2/run378/R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032.fastq --output klpn378_test --log --gene_db /data/results/tools/denovo/srst/srst2/data/ARGannot_r2.fasta

# plasmidfinder
srst2 --threads 16 --input_se /data/images/proton2/run378/R_2018_01_10_11_21_01_user_IONAS-378-DrSkretas_KEELPNO_PHlab_180110_GSkP6_KLPNC5_PH4C3-5.KLPNC5-642Salm.IonXpress_032.fastq --output klpn378_plasmids --log --gene_db /data/results/tools/denovo/srst/srst2/data/PlasmidFinder.fasta

#@ https://github.com/katholt/srst2#using-the-vfdb-virulence-factor-database-with-srst2
http://www.mgc.ac.cn/VFs/Down/CP_VFs.ffn.gz

#@ Plotting output in R

Some R functions are provided in scripts/plotSRST2data.R for plotting SRST2 output to produce images like those in the paper (e.g. Figure 8: http://www.genomemedicine.com/content/6/11/90/figure/F8). These functions require the ape package to be installed.

Example usage:

# load the functions in R
source("srst2/scripts/plotSRST2data.R")

1. EXAMPLE FROM FIGURE 8A (http://www.genomemedicine.com/content/6/11/90/figure/F8): viewing resistance genes in individual strains that have been analysed for MLST + resistance genes

# read in a compiled MLST + genes table output by SRST2
Ef_JAMA<-read.delim("srst2/data/EfaeciumJAMA__compiledResults.txt", stringsAsFactors = F)

# Check column names. Sample names are in column 1 (strain_names=1 in the function call below), MLST data is in columns 2 to 9 (mlst_columns = 2:9), while gene presence/absence information is recorded in columns 13 to 31 (gene_columns = 13:31).
colnames(Ef_JAMA)
 [1] "Sample"         "ST"             "AtpA"           "Ddl"            "Gdh"            "PurK"
 [7] "Gyd"            "PstS"           "Adk"            "mismatches"     "uncertainty"    "depth"
[13] "Aac6.Aph2_AGly" "Aac6.Ii_AGly"   "Ant6.Ia_AGly"   "Aph3.III_AGly"  "Dfr_Tmt"        "ErmB_MLS"
[19] "ErmC_MLS"       "MsrC_MLS"       "Sat4A_AGly"     "TetL_Tet"       "TetM_Tet"       "TetU_Tet"
[25] "VanA_Gly"       "VanH.Pt_Gly"    "VanR.A_Gly"     "VanS.A_Gly"     "VanX.M_Gly"     "VanY.A_Gly"
[31] "VanZ.A_Gly"

# Make a tree based on the MLST loci, and plot gene content as a matrix.
# cluster=T turns on hierarchical clustering of the columns (=genes); labelHeight, infoWidth and treeWidth control the relative dimensions of the plotting areas available for the tree, printed MLST information, and gene labels.
geneContentPlot(m=Ef_JAMA, mlst_columns = 2:9, gene_columns = 13:31, strain_names=1, cluster=T, labelHeight=40, infoWidth=15, treeWidth=5)

2. EXAMPLE FROM FIGURE 7C (http://www.genomemedicine.com/content/6/11/90/figure/F7): viewing summaries of resistance genes by ST, in a set of isolates that have been analysed for MLST + resistance genes

# read in a compiled MLST + genes table output by SRST2
Ef_Howden<-read.delim("srst2/data/EfaeciumHowden__compiledResults.txt", stringsAsFactors = F)

# Make a tree of STs based on the MLST alleles and plot it. Group isolates by ST, count the strains in each ST that contain each resistance gene, and plot these counts as a heatmap. Barplots of the number of strains in each ST are also plotted to the right.
geneSTplot(Ef_Howden,mlst_columns=8:15,gene_columns=17:59,plot_type="count",cluster=T)

# Same as above, but plot the rate of detection of each gene within each ST instead of the raw counts
geneSTplot(Ef_Howden,mlst_columns=8:15,gene_columns=17:59,plot_type="rate",cluster=T)

# To suppress SNPs, i.e. collapse ST1 and ST1* into a single group for summarisation at the clonal complex level, set suppressSNPs=T.
# To suppress uncertainty due to low depth, i.e. collapse ST1 and ST1? into a single group for summarisation at the clonal complex level, set suppressUncertainty=T.

Note: heatmap colours can be set via the matrix.colours parameter in both of these functions. The default value is matrix.colours=colorRampPalette(c("white","yellow","blue"),space="rgb")(100), i.e. white = 0% gene frequency, yellow = 50% and blue = 100%.
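
The collapsing options and the count/rate summaries described above can be sketched in Python as follows. This illustrates the described behaviour only and is not the code in plotSRST2data.R; the function names collapse_st and gene_summary_by_st and the four example isolates are hypothetical.

```python
from collections import defaultdict


def collapse_st(st, suppress_snps=False, suppress_uncertainty=False):
    """Strip the '*' (SNPs) and/or '?' (low depth) flags from an ST label."""
    if suppress_snps:
        st = st.replace("*", "")
    if suppress_uncertainty:
        st = st.replace("?", "")
    return st


def gene_summary_by_st(isolates, plot_type="count",
                       suppress_snps=False, suppress_uncertainty=False):
    """Summarise gene presence per ST group as raw counts or detection rates.

    isolates: iterable of (st_label, {gene_name: present_bool}) pairs.
    """
    totals = defaultdict(int)                     # isolates per ST group
    hits = defaultdict(lambda: defaultdict(int))  # gene detections per ST group
    for st, genes in isolates:
        group = collapse_st(st, suppress_snps, suppress_uncertainty)
        totals[group] += 1
        for gene, present in genes.items():
            hits[group][gene] += 1 if present else 0
    if plot_type == "count":
        return {g: dict(c) for g, c in hits.items()}
    if plot_type == "rate":
        return {g: {gene: n / totals[g] for gene, n in c.items()}
                for g, c in hits.items()}
    raise ValueError("plot_type must be 'count' or 'rate'")


# Hypothetical isolates: three variants of ST1 and one ST2.
isolates = [
    ("1", {"VanA_Gly": True}),
    ("1*", {"VanA_Gly": False}),
    ("1?", {"VanA_Gly": True}),
    ("2", {"VanA_Gly": False}),
]
counts = gene_summary_by_st(isolates, "count", True, True)
rates = gene_summary_by_st(isolates, "rate", True, True)
print(counts)
print(rates)
```

With both flags suppressed, the three ST1 variants fall into one group; without suppression each label stays in its own group.
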