Dear George and Dafni,

Thank you for your message. Please see the additional methods section below.

My affiliation is: Institute for Basic Biomedical Science, Biomedical Sciences Research Center "Alexander Fleming".

It would also be appropriate to acknowledge Vangelis Harokopos of the Genomics Facility of BSRC "Alexander Fleming" for performing the sequencing.

For completeness, the backbone without the insert (3923 nt), which I call the 'reference sequence' below, could also be added as supplementary material.

Let me know if you need more information.

Best regards,
Martin

Bioinformatics Methods

Ion Proton reads were aligned to the reference sequence using bowtie2 (v2.2.8). The alignment information stored in the CIGAR string of the resulting SAM file was parsed and mapped to matching and mismatching sequences using the tool biostar59647 of the JVarkit utilities [Lindenbaum 2015]. From the resulting XML file, a custom awk script extracted the mismatching insert sequence starting at position 10 of the reference sequence. The resulting insert sequences were clustered using the CD-HIT tool (v4.6.1) [Fu et al. 2012], and read counts per insert sequence and condition were extracted using custom awk scripts.

References

Lindenbaum P (2015). JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030

Fu L, Niu B, Zhu Z, Wu S, Li W (2012). CD-HIT: accelerated for clustering the next generation sequencing data. Bioinformatics, 28 (23): 3150-3152. doi: 10.1093/bioinformatics/bts565

https://www.biostars.org/p/59647/
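
To illustrate the CIGAR-based insert extraction described in the methods, here is a minimal Python sketch. It is not the awk script or the biostar59647 tool used in the analysis: it reads the SAM fields POS, CIGAR and SEQ directly instead of going through the XML output, and it counts exact insert sequences rather than CD-HIT clusters. The function names, the `site` parameter and the record layout are hypothetical, and treating "position 10" as the 1-based reference base that immediately follows the insertion is an assumption.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def inserted_sequence(pos, cigar, seq, site=10):
    """Return the read bases inserted immediately before 1-based reference
    position `site`, or "" if the read has no insertion there.

    `site=10` is an assumption about how "position 10" is counted.
    """
    ref_pos = pos   # next reference base to be consumed (1-based)
    read_idx = 0    # next read base to be consumed (0-based)
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes reference and read
            ref_pos += n
            read_idx += n
        elif op == "I":      # insertion: consumes read only
            if ref_pos == site:
                return seq[read_idx:read_idx + n]
            read_idx += n
        elif op in "DN":     # deletion or skip: consumes reference only
            ref_pos += n
        elif op == "S":      # soft clip: consumes read only
            read_idx += n
        # H and P consume neither reference nor read
    return ""


def count_inserts(records, site=10):
    """Count reads per (condition, insert sequence).

    `records` is a hypothetical iterable of (condition, pos, cigar, seq)
    tuples, e.g. taken from SAM columns 4, 6 and 10 plus a sample label.
    """
    counts = Counter()
    for condition, pos, cigar, seq in records:
        insert = inserted_sequence(pos, cigar, seq, site)
        if insert:
            counts[(condition, insert)] += 1
    return counts
```

A read aligned from reference position 1 with CIGAR `9M6I10M` would yield its bases 10 to 15 as the insert. Reads without an insertion at the chosen site are skipped in the counts.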