gskretas@eie.gr
iliasmatis@gmail.com
dafnidelivoria@gmail.com

cc: PH


Dear Dr. Skretas,

I was greatly disappointed to come across your publication today:
Matis et al., Nature Biomedical Engineering 2017, 1(10), 838–852,
where the only reference to my work is this sentence in the Methods section:
"From the obtained data, all the sequences with mismatches outside the
variable peptide-encoding region were removed and only the 12-, 15- or
18-bp-long peptide-encoding sequences were subjected to further
analysis."

Looking back at my records, this analysis took me a total of
18 days of work devoted exclusively to your project. Because the mismatching part
of a gapped alignment against a custom reference sequence had to be extracted and collected, this type
of analysis was custom-designed for your case alone. I was very enthusiastic about
this project and put a great deal of effort into it, assuming that
this would, obviously, lead to my participation in any resulting publication.

Since I assume that all authors of this publication contributed to the work,
I kindly ask you to contact the journal to have my name added to the list of authors.

With kind regards,
Martin Reczko

--
Dr. Martin Reczko
Technical Coordinator - ELIXIR Greece
Staff research scientist professor level
Institute for Basic Biomedical Science
Biomedical Sciences Research Center "Alexander Fleming"
34 Fleming Street
16672 Vari, Greece
Tel. +30-210-9656310 (ext. 144)
FAX: +30-210-9653934
email: reczko@fleming.gr


From: Dafni Delivoria <dafnidelivoria@gmail.com>
Date: Wed, Feb 17, 2016 at 3:18 PM
Subject: Deep sequencing results follow-up
To: hatzis@fleming.gr
Cc: Georgios Skretas <gskretas@eie.gr>, Ilias Matis <iliasmatis@gmail.com>


Dear Dr. Hatzis,

As a follow-up to your discussion with Dr. Skretas, I would like to ask you a few questions regarding the deep sequencing results from the first (March 2015) and second (January 2016) runs.

First of all, the samples that we have sent for deep sequencing contain members of a library whose DNA sequence is:

….Ccatggttaaagttatcggtcgtcgttccctcggagtgcaaagaatatttgatattggtcttccccaagaccataattttctgctagccaatggggcgatcggccacaat TGC(NNS)3-6 Tgcttaagttttggcaccgaaattttaaccgttgagtacggcccattgcccattggcaaaattgtgagtgaagaaattaattgttc…..

….Ccatggttaaagttatcggtcgtcgttccctcggagtgcaaagaatatttgatattggtcttccccaagaccataattttctgctagccaatggggcgatcggccacaat AGC(NNS)3-6 Tgcttaagttttggcaccgaaattttaaccgttgagtacggcccattgcccattggcaaaattgtgagtgaagaaattaattgttc…..

….Ccatggttaaagttatcggtcgtcgttccctcggagtgcaaagaatatttgatattggtcttccccaagaccataattttctgctagccaatggggcgatcggccacaat ACC(NNS)3-6 Tgcttaagttttggcaccgaaattttaaccgttgagtacggcccattgcccattggcaaaattgtgagtgaagaaattaattgttc…..

 
The random sequences are marked in red, where N is any nucleotide and S is G or C.

Therefore, according to the above:

1.     The sequences that we are looking for should start with TGC, AGC or ACC.

2.     These sequences should be 12, 15, 18 or 21 bp long.

3.     The 6th, 9th, 12th, 15th, 18th and 21st base of each random sequence should be either G or C.
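
The three rules above can be written as one regular expression. This is only a minimal Python sketch, not the pipeline that produced the results; INSERT_PATTERN and follows_library_rules are names made up for illustration. A fixed first codon followed by 3 to 6 NNS codons gives lengths of 12 to 21 bp and puts G or C at positions 6, 9, ..., 21.

```python
import re

# First codon TGC, AGC or ACC, then 3 to 6 NNS codons
# (N = any nucleotide, S = G or C), as listed in the rules above.
INSERT_PATTERN = re.compile(r"(?:TGC|AGC|ACC)(?:[ACGT]{2}[GC]){3,6}")


def follows_library_rules(seq):
    """Return True if the whole sequence matches the library design."""
    return INSERT_PATTERN.fullmatch(seq.upper()) is not None


if __name__ == "__main__":
    for candidate in ("TGCGGCGGCACCGGGCGC", "GCGGCGGCACCGGGCGC"):
        print(candidate, follows_library_rules(candidate))
```

Using fullmatch rather than search means that a sequence with extra or missing bases at either end is rejected instead of being matched partially.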


Of the 609,965 DNA sequences reported in the first deep sequencing run, which appear to be distinct, 564,982 follow the above rules. In contrast, of the 102,962 sequences reported in the second run, only 26,910 follow them. We also notice that in some cases this can be rectified if we treat the sequences as misaligned, e.g. either missing the initial T of the TGC(NNS)3-6 sequence (which is probably what happened with GCGGCGGCACCGGGCGC) or carrying an extra T at the start of the random sequence (as may be the case for TACCTCGTCGTTCTGG).
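
The two misalignment cases mentioned above (a missing initial T, or an extra T at the start) can be tested mechanically. The following Python sketch is an illustration under that assumption only; rescue_frame and its return labels are hypothetical, and a rescued sequence remains a guess, since other single-base changes could also produce a valid-looking sequence.

```python
import re

# Library design: TGC, AGC or ACC followed by 3 to 6 NNS codons.
PATTERN = re.compile(r"(?:TGC|AGC|ACC)(?:[ACGT]{2}[GC]){3,6}")


def rescue_frame(seq):
    """Try the two corrections named in the email.

    Returns (sequence, label); sequence is None if nothing fits.
    """
    seq = seq.upper()
    if PATTERN.fullmatch(seq):
        return seq, "as-is"
    # Case 1: the initial T of TGC may be missing.
    if PATTERN.fullmatch("T" + seq):
        return "T" + seq, "prepended T"
    # Case 2: an extra T may precede the real start.
    if seq.startswith("T") and PATTERN.fullmatch(seq[1:]):
        return seq[1:], "dropped leading T"
    return None, "unrescued"


if __name__ == "__main__":
    for s in ("GCGGCGGCACCGGGCGC", "TACCTCGTCGTTCTGG"):
        print(s, rescue_frame(s))
```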

Furthermore, I would like to ask whether contamination between the samples in the first deep sequencing run is possible, as we have observed that the most common clones in IMP2 and IMP3 (which appear with more than 100,000 reads each) are also predominant in IMP1 (with 100-4000 reads each, in contrast to fewer than 10 reads for the rest of the sequences). For example, among the 15-bp-long sequences, 1172 sequences were reported in total (with 26,270 total reads for the IMP1 sample), and of these only 570 appear predominantly in the IMP1 library compared to the other two (with only 3,347 total reads in IMP1). Therefore, the IMP1 sample seems to be enriched in clones from the IMP2 and IMP3 libraries, which we don’t believe to be true.

Also, according to your email, the norm_IMP* column contains the read count for a given insert sequence divided by the total number of reads in the library; the sum of this column should therefore be equal to 1. In the case of IMP1, however, it is equal to 0.18. Could you tell me why this is the case?

Finally, in the second deep sequencing run there are only 836,722 reads for the Ab42 library and only 147,658 reads for the SOD library. Could you please tell me why so few reads are reported compared to the first deep sequencing run, and why this appears to be even worse for the SOD library?

Looking forward to your reply.

Best regards,

Dafni Delivoria

#@
Dr Matis run149
Martin Reczko <reczko@fleming.gr>	Wed, May 4, 2016 at 7:28 PM
To: Pantelis Hatzis <hatzis@fleming.gr>

Dear Pantelis,

happy Easter!
The 2 libraries of Dr. Matis have been processed as before.

The alignments performed as follows:
GSKP4-Ab42.IonXpress_015.fastq.bam
3813591 reads; of these:
  3813591 (100.00%) were unpaired; of these:
    1708736 (44.81%) aligned 0 times
    2104855 (55.19%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
55.19% overall alignment rate

GSKP5-SOD.IonXpress_016.fastq.bam
7852313 reads; of these:
  7852313 (100.00%) were unpaired; of these:
    894022 (11.39%) aligned 0 times
    6958291 (88.61%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
88.61% overall alignment rate
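
The summaries above follow a fixed layout, so the counts can be extracted and the overall rate rechecked. This is a minimal Python sketch; parse_alignment_summary is a hypothetical name, and the regular expressions assume exactly the unpaired-read layout shown in this email.

```python
import re


def parse_alignment_summary(text):
    """Extract read counts from one unpaired-read alignment summary."""

    def count(pattern):
        match = re.search(pattern, text)
        if match is None:
            raise ValueError("summary line not found: " + pattern)
        return int(match.group(1))

    total = count(r"(\d+) reads; of these:")
    unaligned = count(r"(\d+) \([\d.]+%\) aligned 0 times")
    once = count(r"(\d+) \([\d.]+%\) aligned exactly 1 time")
    multi = count(r"(\d+) \([\d.]+%\) aligned >1 times")
    return {
        "total": total,
        "unaligned": unaligned,
        "aligned_once": once,
        "aligned_multi": multi,
        # Percentage of reads that aligned at least once.
        "overall_rate": round(100.0 * (once + multi) / total, 2),
    }


# The GSKP4-Ab42 summary, copied from the message above.
summary = """3813591 reads; of these:
  3813591 (100.00%) were unpaired; of these:
    1708736 (44.81%) aligned 0 times
    2104855 (55.19%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
55.19% overall alignment rate"""
stats = parse_alignment_summary(summary)
```

Checking that the unaligned, single and multiple counts add up to the total is a cheap way to catch a truncated or mis-pasted summary.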


The number of reads in each library for each insert sequence is in the attached
insert_samples_run288.xlsx

The GSKP* columns contain the number of reads with the given insert
sequence.  The norm_GSKP* columns contain read counts with the given
insert sequence divided by the total number of reads with inserts in the library.

BW,
Martin


#@
Dr Matis run149
Martin Reczko <reczko@fleming.gr>	Tue, Mar 10, 2015 at 12:38 AM
To: Pantelis Hatzis <hatzis@fleming.gr>

Dear Pantelis,

please find the results for Dr Matis at:
http://genomics-lab.fleming.gr/fleming/Matis/run149

The alignments performed as follows:

IonXpress_009_IMP1_4-5-6-7PeptideLinrary
16102381 reads; of these:
  16102381 (100.00%) were unpaired; of these:
    7311491 (45.41%) aligned 0 times
    8790890 (54.59%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
54.59% overall alignment rate

IonXpress_011_IMP2_A4VRound4
15233125 reads; of these:
  15233125 (100.00%) were unpaired; of these:
    7881760 (51.74%) aligned 0 times
    7351365 (48.26%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
48.26% overall alignment rate

IonXpress_015_IMP3_Ab42Round2
13825029 reads; of these:
  13825029 (100.00%) were unpaired; of these:
    3318079 (24.00%) aligned 0 times
    10506950 (76.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
76.00% overall alignment rate

A random sample of 1000 reads was drawn from the unmapped reads and aligned against NCBI's non-redundant database.
The blast results and the distribution of the matching species are in:
IMP1-unmapped-blast1000.txt
IMP1-unmapped-blast1000-species.txt
IMP2-unmapped-blast1000.txt
IMP2-unmapped-blast1000-species.txt
IMP3-unmapped-blast1000.txt
IMP3-unmapped-blast1000-species.txt
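
The email does not say how the 1000 unmapped reads were drawn. One common way to take a uniform random sample from a read file too large to hold in memory is reservoir sampling. The Python sketch below illustrates that technique and is not a description of the method used here; reservoir_sample and its seed parameter are hypothetical names.

```python
import random


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random from an iterable.

    Only k records are held in memory at any time.
    If the input has fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


if __name__ == "__main__":
    # Stand-in for unmapped reads: hypothetical read identifiers.
    reads = ("read_%d" % n for n in range(100000))
    picked = reservoir_sample(reads, 1000)
    print(len(picked))
```

Fixing the seed makes the sample reproducible, which helps when the BLAST results need to be traced back to specific reads.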


The number of reads in each library for each insert sequence is in:
insert_samples_total_reads_ge10.xlsx (at least 10 reads in all libraries)
and
insert_samples_total_reads_gt1.csv (at least 2 reads in all libraries)

The IMP* columns contain the number of reads with the given insert
sequence.  The norm_IMP* columns contain read counts with the given
insert sequence divided by the total number of reads with inserts in the library.
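
The normalisation described above can be sketched in a few lines of Python. The counts below are invented for illustration, and normalise_counts is a hypothetical name. The sketch also shows that a normalised column sums to 1 only over the complete set of inserts: once rows are filtered by a minimum read count, as in the files listed above, the remaining values sum to less than 1. Whether this accounts for any particular column sum is not stated in this thread.

```python
def normalise_counts(counts):
    """Divide each insert's read count by the library's total insert reads."""
    total = sum(counts.values())
    return {insert: n / total for insert, n in counts.items()}


# Hypothetical read counts for one library (not real data).
counts = {"insert_a": 800, "insert_b": 150, "insert_c": 45, "insert_d": 5}
norm = normalise_counts(counts)

# Over all inserts the normalised values sum to 1 (up to rounding).
full_sum = sum(norm.values())

# Over the inserts that pass a minimum-count filter they sum to less.
kept_sum = sum(norm[insert] for insert, n in counts.items() if n >= 10)
```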

BW,
Martin