~/Elixir/metagenomics/species/get-biotypes-from-bam.sh ~/Elixir/metagenomics/species/Apis\ mellifera/ncbi-genomes-2020-03-03/GCF_003254395.2_Amel_HAv3.1_genomic.gtf.gz ~/Elixir/metagenomics/species/Apis\ mellifera/ncbi-genomes-2020-03-03/GCF_003254395.2_Amel_HAv3.1_genomic.fna.gz ~/Elixir/metagenomics/SPlab/run440/genomic/SPD3_apis_mellifera.bam 63 441 3077 icov misc_RNA 2 0.0648298 lncRNA 423 13.7115 rRNA 11 0.356564 pseudogene 7 0.226904 miRNA 3 0.0972447 tRNA 1 0.0324149 protein_coding 2637 85.4781 snRNA 1 0.0324149 ~/Elixir/metagenomics/species/get-biotypes-from-bam.sh ~/Elixir/metagenomics/species/Papaver\ somniferum/ncbi-genomes-2020-03-03/GCF_003573695.1_ASM357369v1_genomic.gtf.gz ~/Elixir/metagenomics/species/Papaver\ somniferum/ncbi-genomes-2020-03-03/GCF_003573695.1_ASM357369v1_genomic.fna.gz ~/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Papaver_somniferum.bam plant genomes Anastasios Galanis Wed, Feb 26, 2020 at 4:04 PM To: Martin Reczko , "patalano@fleming.gr" Dear Martin, below you'll find the genomes. At the top of each page there is a link to the fasta and gff files. Papaver somniferum https://www.ncbi.nlm.nih.gov/genome/?term=Papaver+somniferum Asparagus officinalis https://www.ncbi.nlm.nih.gov/genome/10978?genome_assembly_id=305876 Artemisia annua https://www.ncbi.nlm.nih.gov/genome/?term=artemisia+annua Olea europaea https://www.ncbi.nlm.nih.gov/genome/?term=olea+europaea Helianthus annuus https://www.ncbi.nlm.nih.gov/genome/?term=Helianthus+annuus Cajanus cajan https://www.ncbi.nlm.nih.gov/genome/?term=Cajanus+cajan Vitis vinifera https://www.ncbi.nlm.nih.gov/genome/?term=vitis+vinifera I will try until the end of the week to make a plant ''mock'' community with sequences from a few plants, mixed in known proportions. I don't know how to do that to be honest, so I'd like to try out myself first. Most likely I will fail so I will need your help on Monday/Tuesday. Kind regards, Anastasios /home/reczko/Elixir/metagenomics/species/varroa destructor zcat BEIS01.1.fsa_nt.gz | grep contig | wc 1425 14250 125715 #@ honeybee genome -> ~/Elixir/metagenomics/species/Apis mellifera/ https://www.ncbi.nlm.nih.gov/Traces/wgs/?val=QIUM02 reczko@max:~/Elixir/metagenomics/species/hisat$ hisat2-build -p 9 -f ../Apis\ mellifera/QIUM02.1.fsa_nt apis_mellifera #@ wine reczko@max:~/Elixir/metagenomics/species/Vitis vinifera$ hisat2-build -p 6 -f GCF_000003745.3_12X_genomic.fna ../hisat/vitis_vinifera zcat ~/Elixir/metagenomics/species/Apis\ mellifera/ncbi-genomes-2020-03-03/GCF_003254395.2_Amel_HAv3.1_genomic.fna.gz > foo; hisat2-build -p 15 -f foo Apis_mellifera zcat ../Artemisia\ annua/GCA_003112345.1_ASM311234v1_genomic.fna.gz > foo; hisat2-build -p 9 -f foo Artemisia_annua zcat ../Asparagus\ officinalis/GCF_001876935.1_Aspof.V1_genomic.fna.gz > foo; hisat2-build -p 9 -f foo Asparagus_officinalis zcat ../Cajanus\ cajan/GCF_000340665.1_C.cajan_V1.0_genomic.fna.gz > foo; hisat2-build -p 9 -f foo Cajanus_cajan zcat ../Helianthus\ annuus/GCF_002127325.1_HanXRQr1.0_genomic.fna.gz > foo; hisat2-build -p 9 -f foo Helianthus_annuus zcat ../Nosema\ ceranae/GCF_000988165.1_ASM98816v1_genomic.fna.gz > foo; hisat2-build -p 9 -f foo Nosema_ceranae zcat ../Olea\ europaea/GCF_002742605.1_O_europaea_v1_genomic.fna.gz > foo; hisat2-build -p 9 -f foo Olea_europaea zcat ../Papaver\ somniferum/GCF_003573695.1_ASM357369v1_genomic.fna.gz > foo; hisat2-build -p 9 -f foo Papaver_somniferum #hisat2-build -p 9 -f ../varroa\ destructor/BEIS01.1.fsa_nt varroa_destructor hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Papaver_somniferum -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Papaver_somniferum.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8114444 (99.43%) aligned 0 times 46288 (0.57%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.57% overall alignment rate hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Olea_europaea -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Olea_europaea.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8135376 (99.69%) aligned 0 times 25356 (0.31%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.31% overall alignment rate hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Nosema_ceranae -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Nosema_ceranae.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8160715 (100.00%) aligned 0 times 17 (0.00%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.00% overall alignment rate ]0;~/Elixir/metagenomics/species/hisatreczko@max:~/Elixir/metagenomics/species/hisat$ hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Helianthus_annuus -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Helianthus_annuus.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8144992 (99.81%) aligned 0 times 15740 (0.19%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.19% overall alignment rate ]0;~/Elixir/metagenomics/species/hisatreczko@max:~/Elixir/metagenomics/species/hisat$ hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Cajanus_cajan -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Cajanus_cajan.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8146637 (99.83%) aligned 0 times 14095 (0.17%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.17% overall alignment rate ]0;~/Elixir/metagenomics/species/hisatreczko@max:~/Elixir/metagenomics/species/hisat$ hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Asparagus_officinalis -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Asparagus_officinalis.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8148543 (99.85%) aligned 0 times 12189 (0.15%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.15% overall alignment rate ]0;~/Elixir/metagenomics/species/hisatreczko@max:~/Elixir/metagenomics/species/hisat$ hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Artemisia_annua -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_Artemisia_annua.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8145042 (99.81%) aligned 0 times 15690 (0.19%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.19% overall alignment rate ]0;~/Elixir/metagenomics/species/hisatreczko@max:~/Elixir/metagenomics/species/hisat$ hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/varroa_destructor -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_varroa_destructor.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8159364 (99.98%) aligned 0 times 1368 (0.02%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.02% overall alignment rate ]0;~/Elixir/metagenomics/species/hisatreczko@max:~/Elixir/metagenomics/species/hisat$ hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/vitis_vinifera -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_vitis_vinifera.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8151055 (99.88%) aligned 0 times 9677 (0.12%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.12% overall alignment rate DIAMOND had Vitis vinifera 5954 reads hisat2 -k 1 -p 6 --dta -x ~/Elixir/metagenomics/species/hisat/Apis_mellifera -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_apis_mellifera.bam 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8147679 (99.84%) aligned 0 times 13053 (0.16%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.16% overall alignment rate cd /home/reczko/Elixir/metagenomics/SPlab/run440/genomic zcat ~/Elixir/metagenomics/species/Apis\ mellifera/ncbi-genomes-2020-03-03/GCF_003254395.2_Amel_HAv3.1_genomic.gtf.gz > gtf /data/results/tools/rnaseq/subread/subread-1.5.2-source/bin/featureCounts -T 16 -Q 10 -a gtf -o foo SPD3_apis_mellifera.bam || Successfully assigned reads : 2537 (0.0%) || https://bioinformatics.stackexchange.com/questions/3482/counting-reads-for-each-biotype https://github.com/ewels/ngi_visualizations/tree/master/ngi_visualizations/biotypes zcat ~/Elixir/metagenomics/species/Apis\ mellifera/ncbi-genomes-2020-03-03/GCF_003254395.2_Amel_HAv3.1_genomic.fna.gz > fa samtools faidx fa sort fa.fai > genome.txt bedtools bamtobed -i SPD3_apis_mellifera.bam > bed; awk -f /data/results/tools/awk/gff2bed.awk gtf | sort -k 1,1 -k2,2n > gbed bedtools complement -i gbed -g genome.txt > intergenic.bed # 100% intergenic reads bedtools coverage -f 1.0 -a intergenic.bed -b SPD3_apis_mellifera.bam | awk '{if (0+$NF>0) {print $0}}' >icov wc icov 63 441 3077 icov bedtools coverage -a gtf -b SPD3_apis_mellifera.bam | awk '{if (0+$NF>0) {print $0}}' >cov awk -f ~/Elixir/metagenomics/species/get-biotypes.awk cov misc_RNA 2 0.0648298 lncRNA 423 13.7115 rRNA 11 0.356564 pseudogene 7 0.226904 miRNA 3 0.0972447 tRNA 1 0.0324149 protein_coding 2637 85.4781 snRNA 1 0.0324149 DIAMOND had Apidae 696 reads zcat ~/Elixir/metagenomics/species/Papaver\ somniferum/ncbi-genomes-2020-03-03/GCF_003573695.1_ASM357369v1_genomic.fna.gz >foo;hisat2-build -p 9 -f foo Papaver_somniferum samtools view SPD3_vitis_vinifera.bam > SPD3_vitis_vinifera.sam Diamod/Megan matching read: QHOT2:00041:07763 [length=192, matches=134] >RVW79797.1 Length = 559 Score = 83 bits (206), Expect = 3e-13 Identities = 38/65 (58%), Positives = 48/65 (74%), Gaps = 1/65 (2%) Frame = +2 Query: 2 PLAYAIVSSEKYDDWSWFLTKLKKVIRFANVM/VISDRHKTYCLVLSEVFGKKNHAYCYKHIKEN 192 PLA+ ++SSE YDDWSWFL LKKV+ V+ +ISDRH + EVFG +NHAYCY+H+KEN Sbjct: 332 PLAFGVMSSENYDDWSWFLQNLKKVVGDKEVV-IISDRHPALLRSVPEVFGLENHAYCYRHLKEN 395 ~/Elixir/metagenomics/SPlab/run440/genomic/SPD3_vitis_vinifera.sam QHOT2:00041:07763 4 * 0 0 * * 0 0 TCCTTTAGCATATGCTATTGTTAGTTCAGAAAAGTATGATGATTGGTCTTGGTTCTTAACTAAGCTAAAAAAGGTTATCAGGTTTGCGAATGTAATGTTATTTCTGATAGACATAAAACATATTGTCTAGTGTTGTCAGAGGTATTTGGTAAGAAGAATCATGCATATTGCTACAAACACATTAAAGAGAAT 7;099*8::;=<<<<==>7<<5;8818<<<;;<;;<8<88777<7<4::7;8:::6;;//////%/49)/.666369,....4::990632,768/8335;;:<::5777)5888965777::97:6...//677;;;?5:69<6<<7;;5:9<;;;<<<<8<;:<<=B8===>A9@A6;;<<7< YT:Z:UU DIAMOND match: >RVW79797.1 Length = 559 Score = 83 bits (206), Expect = 3e-13 Identities = 38/65 (58%), Positives = 48/65 (74%), Gaps = 1/65 (2%) Frame = +2 Query: 2 PLAYAIVSSEKYDDWSWFLTKLKKVIRFANVM/VISDRHKTYCLVLSEVFGKKNHAYCYKHIKEN 192 PLA+ ++SSE YDDWSWFL LKKV+ V+ +ISDRH + EVFG +NHAYCY+H+KEN Sbjct: 332 PLAFGVMSSENYDDWSWFLQNLKKVVGDKEVV-IISDRHPALLRSVPEVFGLENHAYCYRHLKEN 395 BLASTX match: hypothetical protein CK203_047614 [Vitis vinifera] Sequence ID: RVW79797.1Length: 559Number of Matches: 2 Range 1: 332 to 357GenPeptGraphics Next Match Previous Match Alignment statistics for match #1 Score Expect Method Identities Positives Gaps Frame 48.9 bits(115) 2e-11 Composition-based stats. 18/26(69%) 22/26(84%) 0/26(0%) +2 Query 2 PLAYAIVSSEKYDDWSWFLTKLKKVI 79 PLA+ ++SSE YDDWSWFL LKKV+ Sbjct 332 PLAFGVMSSENYDDWSWFLQNLKKVV 357 Range 2: 361 to 395GenPeptGraphics Next Match Previous Match First Match Alignment statistics for match #2 Score Expect Method Identities Positives Gaps Frame 48.1 bits(113) 2e-11 Composition-based stats. 20/35(57%) 25/35(71%) 0/35(0%) +1 Query 88 ECNVISDRHKTYCLVLSEVFGKKNHAYCYKHIKEN 192 E +ISDRH + EVFG +NHAYCY+H+KEN Sbjct 361 EVVIISDRHPALLRSVPEVFGLENHAYCYRHLKEN 395 # get biotypes ~/Elixir/metagenomics/species/Apis mellifera/GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz zcat /home/reczko/Elixir/metagenomics/species/Apis\ mellifera/GCF_003254395.2_Amel_HAv3.1_genomic.gff.gz > gff /data/results/tools/align/cufflinks-2.2.1.Linux_x86_64/gffread gff -T -o my.gtf bedtools coverage -a gtf -b SPD3_apis_mellifera.bam /data/results/tools/rnaseq/subread/subread-1.5.2-source/bin/featureCounts -T 16 -Q 10 -a gtf -o foo SPD3_apis_mellifera.bam awk '{if (0+$NF>0) {print $0}}' foo > foo2 GTF=/path/to/Mus_musculus.GRCm38.78.gtf EXPTNAME=mouse_rna CPUS=8 MAPQ=10 GENEMX=${EXPTNAME}_genes.mx /data/results/tools/rnaseq/subread/subread-1.5.2-source/bin/featureCounts #Make the gene-wise matrix featureCounts -Q $MAPQ -T $CPUS -a $GTF -o /dev/stdout *bam \ | cut -f1,7- | sed 1d > $GENEMX Consider attaching the gene name to give the data more relevance. To do that, first make a table of Ensembl IDs and gene names from the GTF file. grep -w gene Mus_musculus.GRCm38.78.gtf | head -1000 \ | cut -d '"' -f2,6 | tr '"' '\t' | sort -k 1b,1 > ENS2genename.txt Next, use unix join (or other method) to attach the gene name. head -1 RNA-seq.mx > header.txt sed 1d RNA-seq.mx | sort -k 1b,1 \ | join -1 1 -2 1 ENS2genename.txt - \ | tr ' ' '\t' | sed 's/\t/_/' \ | cat header.txt - > RNA-seq_genenames.mx /data/results/tools/rnaseq/subread/subread-1.5.2-source/bin/featureCounts -O --fraction -M -s 1 -J -f -g gene_name -t exon -T 4 -a /data/results/reference/mmu/Mus_musculus.NCBIM37.64-toMM9.gtf -o wt01.cnt wt01.bam wt02.bam WT_0h_replicate_I.bam WT_0h_replicate_II_.bam ./all_varroa.sh &> all_varroa.log ./all_Apis.sh &> all_Apis.log ./all_Artemisia.sh &> all_Artemisia.log ./all_Asparagus.sh &> all_Asparagus.log ./all_Cajanus.sh &> all_Cajanus.log ./all_Helianthus.sh &> all_Helianthus.log ./all_Nosema.sh &> all_Nosema.log ./all_Olea.sh &> all_Olea.log ./all_Papaver.sh &> all_Papaver.log ./all_vitis.sh &> all_vitis.log all_Apis.log all_Artemisia.log all_Asparagus.log all_Cajanus.log all_Helianthus.log all_Nosema.log all_Olea.log all_Papaver.log all_varroa.log all_vitis.log rm all_genomic.txt for i in *log do awk -f log2csv.awk $i >> all_genomic.txt done awk -f all2matrix.awk all_genomic.txt > all_genomic.csv #stricter settings ~/Elixir/metagenomics/SPlab/run440/genomicreczko@max:~/Elixir/metagenomics/SPlab/run440/genomic$ /data/results/tools/rnaseq/hisat/hisat2-2.1.0/hisat2 -k 1 -p 12 --rdg 10,6 --rfg 10,6 --dta -x ~/Elixir/metagenomics/species/hisat/varroa_destructor -U /data/images/proton2/run431/R_2019_08_07_16_07_46_user_IONAS-431-GKlab-MS_SPlab_190807_GK3R288-302_SP3R1-4.SPD3_fragDNA1.IonXpress_031.fastq | /data/results/tools/samtools/samtools-1.3/samtools sort -@ 8 -o /home/reczko/Elixir/metagenomics/SPlab/run440/genomic/SPD3_varroa_strict.bam Warning: the current version of HISAT2 (2.1.0) is older than the version (2.2.0) used to build the index. Users are strongly recommended to update HISAT2 to the latest version. Warning: the current version of HISAT2 (2.1.0) is older than the version (2.2.0) used to build the index. Users are strongly recommended to update HISAT2 to the latest version. 8160732 reads; of these: 8160732 (100.00%) were unpaired; of these: 8159533 (99.99%) aligned 0 times 1199 (0.01%) aligned exactly 1 time 0 (0.00%) aligned >1 times 0.01% overall alignment rate awk -f log2csv.awk all_varroa_strict.log > all_genomic_strict.txt awk -f all2matrix.awk all_genomic_strict.txt