gskretas@eie.gr iliasmatis@gmail.com dafnidelivoria@gmail.com
cc: PH

Dear Dr. Skretas,

with great disappointment I noticed today your publication, Matis et al., Nature Biomedical Engineering 2017, 1(10), 838–852, where the only reference to my work is this sentence in the methods section:

"From the obtained data, all the sequences with mismatches outside the variable peptide-encoding region were removed and only the 12-, 15- or 18-bp-long peptide-encoding sequences were subjected to further analysis."

I have tallied my effort for this analysis, which sums to a total of 18 days spent exclusively on your analysis. As the mismatching part in a gapped alignment to a custom reference sequence had to be extracted and collected, this type of analysis was custom-designed for your case only. With great enthusiasm for this project, I put a lot of effort into it, assuming that this would, obviously, lead to participation in a potential publication.

As I assume all authors of this publication have contributed to this work, I would kindly ask you to contact the journal about adding my name to the list of authors.

With kind regards,
Martin Reczko

--
Dr. Martin Reczko
Technical Coordinator - ELIXIR Greece
Staff research scientist professor level
Institute for Basic Biomedical Science
Biomedical Sciences Research Center "Alexander Fleming"
34 Fleming Street
16672 Vari, Greece
Tel. +30-210-9656310 (ext. 144)
FAX: +30-210-9653934
email: reczko@fleming.gr

From: Dafni Delivoria
Date: Wed, Feb 17, 2016 at 3:18 PM
Subject: Deep sequencing results follow-up
To: hatzis@fleming.gr
Cc: Georgios Skretas, Ilias Matis

Dear Dr. Hatzis,

As a follow-up to your discussion with Dr.
Skretas, I would like to ask you a few questions regarding the deep sequencing results from the first (March 2015) and the second (January 2016) deep sequencing.

First of all, the samples that we have sent for deep sequencing contain members of a library whose DNA sequence is:

….Ccatggttaaagttatcggtcgtcgttccctcggagtgcaaagaatatttgatattggtcttccccaagaccataattttctgctagccaatggggcgatcggccacaat TGC(NNS)3-6 Tgcttaagttttggcaccgaaattttaaccgttgagtacggcccattgcccattggcaaaattgtgagtgaagaaattaattgttc…..

….Ccatggttaaagttatcggtcgtcgttccctcggagtgcaaagaatatttgatattggtcttccccaagaccataattttctgctagccaatggggcgatcggccacaat AGC(NNS)3-6 Tgcttaagttttggcaccgaaattttaaccgttgagtacggcccattgcccattggcaaaattgtgagtgaagaaattaattgttc…..

….Ccatggttaaagttatcggtcgtcgttccctcggagtgcaaagaatatttgatattggtcttccccaagaccataattttctgctagccaatggggcgatcggccacaat ACC(NNS)3-6 Tgcttaagttttggcaccgaaattttaaccgttgagtacggcccattgcccattggcaaaattgtgagtgaagaaattaattgttc…..

In red you can see the random sequences, where N is any nucleotide and S is G or C. Therefore, according to the above:

1. The sequences that we are looking for should start with TGC, AGC or ACC.
2. These sequences should be 12, 15, 18 or 21 bp long.
3. The 6th, 9th, 12th, 15th, 18th and 21st base of each random sequence should be either G or C.

Of the 609.965 DNA sequences reported in the first deep sequencing, which appear to be unique, 564.982 follow the above rules. In contrast, in the second deep sequencing, of the 102.962 sequences reported, only 26.910 follow the above rules. We also notice that in some cases this can be rectified if we consider these sequences to be misaligned (e.g., either missing the initial T from the TGC(NNS)3-6 sequence, which probably happens in the case of GCGGCGGCACCGGGCGC, or having an added T at the start of the random sequence, as might be the case for the sequence TACCTCGTCGTTCTGG).
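
The three rules above, together with the two one-base corrections just described, could be checked with a short script. The following is only a sketch in Python, not the analysis that was actually run: the function names `follows_rules` and `rescue_frameshift` are hypothetical, and it assumes each reported insert is a plain string that includes the leading TGC, AGC or ACC codon.

```python
import re

# Rules 1-3 as one pattern: a fixed first codon (TGC, AGC or ACC)
# followed by 3 to 6 NNS codons, i.e. 12, 15, 18 or 21 bp in total,
# with G or C at positions 6, 9, 12, ... of the insert.
INSERT_RE = re.compile(r"(?:TGC|AGC|ACC)(?:[ACGT]{2}[GC]){3,6}")


def follows_rules(seq: str) -> bool:
    """Return True if the whole sequence satisfies rules 1-3."""
    return INSERT_RE.fullmatch(seq.upper()) is not None


def rescue_frameshift(seq: str):
    """Try the two corrections mentioned in the email (hypothetical helper).

    Returns the corrected sequence, or None if neither correction helps.
    """
    seq = seq.upper()
    if follows_rules(seq):
        return seq
    # Case 1: the initial T of TGC is missing.
    if follows_rules("T" + seq):
        return "T" + seq
    # Case 2: one extra base was added at the start.
    if follows_rules(seq[1:]):
        return seq[1:]
    return None


if __name__ == "__main__":
    for s in ("GCGGCGGCACCGGGCGC", "TACCTCGTCGTTCTGG"):
        print(s, follows_rules(s), rescue_frameshift(s))
```

Both example sequences from the paragraph above fail the direct check and pass after the corresponding one-base correction.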

Furthermore, I would like to ask whether there is a possibility of contamination between the samples in the first deep sequencing, as we have observed that the most common clones in IMP2 and IMP3 (which appear with more than 100.000 reads each) are also predominant in IMP1 (with 100-4000 reads each, in contrast to under 10 reads for the rest of the sequences). For example, in the case of the 15-bp-long sequences, 1172 sequences were reported in total (with 26.270 total reads for the IMP1 sample) and of these, only 570 sequences appear predominantly in the IMP1 library compared to the other two (with only 3.347 total reads in IMP1). Therefore, the IMP1 sample seems to be enriched with the clones from the IMP2 and IMP3 libraries, which we don’t believe to be true.

Also, according to your email, the norm_IMP* column contains read counts with a given insert sequence divided by the total number of reads in the library and therefore, the sum of this column should be equal to 1. In the case of IMP1 this is equal to 0.18. Could you tell me why this is the case?

Finally, in the case of the second deep sequencing there are only 836.722 reads for the Ab42 library and only 147.658 reads for the SOD library. Could you please tell me why there are so few reads reported compared to the first deep sequencing and why this appears to be even worse for the SOD library?

Looking forward to your reply.

Best regards,
Dafni Delivoria

#@
Dr Matis run149
Martin Reczko
Wed, May 4, 2016 at 7:28 PM
To: Pantelis Hatzis

Dear Pantelis,

happy Easter! The 2 libraries of Dr. Matis have been processed as before.
The alignments performed as follows:

GSKP4-Ab42.IonXpress_015.fastq.bam
  3813591 reads; of these:
    3813591 (100.00%) were unpaired; of these:
      1708736 (44.81%) aligned 0 times
      2104855 (55.19%) aligned exactly 1 time
      0 (0.00%) aligned >1 times
  55.19% overall alignment rate

GSKP5-SOD.IonXpress_016.fastq.bam
  7852313 reads; of these:
    7852313 (100.00%) were unpaired; of these:
      894022 (11.39%) aligned 0 times
      6958291 (88.61%) aligned exactly 1 time
      0 (0.00%) aligned >1 times
  88.61% overall alignment rate

The number of reads in each library for each insert sequence is in the attached insert_samples_run288.xlsx

The GSKP* columns contain the number of reads with the given insert sequence.
The norm_GSKP* columns contain read counts with the given insert sequence divided by the total number of reads with inserts in the library.

BW,
Martin

insert_samples_run288.xlsx 4206K

#@
Dr Matis run149
Martin Reczko
Tue, Mar 10, 2015 at 12:38 AM
To: Pantelis Hatzis

Dear Pantelis,

pls note the results for Dr Matis
http://genomics-lab.fleming.gr/fleming/Matis/run149

The alignments performed as follows:

IonXpress_009_IMP1_4-5-6-7PeptideLinrary
  16102381 reads; of these:
    16102381 (100.00%) were unpaired; of these:
      7311491 (45.41%) aligned 0 times
      8790890 (54.59%) aligned exactly 1 time
      0 (0.00%) aligned >1 times
  54.59% overall alignment rate

IonXpress_011_IMP2_A4VRound4
  15233125 reads; of these:
    15233125 (100.00%) were unpaired; of these:
      7881760 (51.74%) aligned 0 times
      7351365 (48.26%) aligned exactly 1 time
      0 (0.00%) aligned >1 times
  48.26% overall alignment rate

IonXpress_015_IMP3_Ab42Round2
  13825029 reads; of these:
    13825029 (100.00%) were unpaired; of these:
      3318079 (24.00%) aligned 0 times
      10506950 (76.00%) aligned exactly 1 time
      0 (0.00%) aligned >1 times
  76.00% overall alignment rate

A random sample of 1000 reads was drawn from the unmapped reads and aligned against NCBI's non-redundant database.
The blast results and the distribution of the matching species are in:

IMP1-unmapped-blast1000.txt
IMP1-unmapped-blast1000-species.txt
IMP2-unmapped-blast1000.txt
IMP2-unmapped-blast1000-species.txt
IMP3-unmapped-blast1000.txt
IMP3-unmapped-blast1000-species.txt

The number of reads in each library for each insert sequence is in:

insert_samples_total_reads_ge10.xlsx (at least 10 reads in all libraries)
and
insert_samples_total_reads_gt1.csv (at least 2 reads in all libraries)

The IMP* columns contain the number of reads with the given insert sequence.
The norm_IMP* columns contain read counts with the given insert sequence divided by the total number of reads with inserts in the library.

BW,
Martin
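
The definition of the norm_* columns given above can be illustrated with a minimal Python sketch. The function name `normalize_counts`, the threshold and the toy read counts are hypothetical and are not taken from the actual spreadsheets. The sketch only demonstrates an arithmetic property of the definition: the normalized values sum to 1 over the complete table, and to less than 1 once rows are filtered out after normalization.

```python
def normalize_counts(counts):
    """Divide each insert's read count by the total reads with inserts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads with inserts")
    return {seq: n / total for seq, n in counts.items()}


if __name__ == "__main__":
    # Toy counts for one library (made-up values, for illustration only).
    library = {
        "TGCAAGAAGAAG": 90,
        "AGCAAGAAGAAG": 8,
        "ACCAAGAAGAAG": 2,
    }
    norm = normalize_counts(library)
    print(sum(norm.values()))  # sum over the complete table

    # Keep only inserts with at least 10 reads, after normalizing.
    kept = {seq: v for seq, v in norm.items() if library[seq] >= 10}
    print(sum(kept.values()))  # sum over the filtered table is below 1
```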