#@ 07112017
Dear Martin,

I finally succeeded with the GEO submission; it will soon be online, and I will send it to you, Pantelis and Vaggelis.

As I mentioned in my previous mail, we are now finally writing the manuscript describing the RCAD (Renal Cysts And Diabetes) mouse model, and before performing the definitive GO analysis on the mRNA-seq data I would like to see these data without filtering.

The enclosed documents contain the normalized counts of each stage examined at Fleming: E14.5, E15.5, E17.5 (Fasteris, reanalyzed by you) and P1. Is this enough, or do you need the correct links, which I also have? All these files are correct and verified. Also included is the comparison of the E17.5 Fasteris analysis versus your analysis. I still do not understand the differences and why the listings diverge.

Please let me know if you think these documents are enough or if you need something else. Once this is done I will contact you again in order to load iRegulon.

Finally, we have done the GO analysis using the ToppGene software, which is more or less comparable to DAVID. However, someone pointed out to us that we need to confirm the GO analysis with more than one software tool. Do you have a suggestion?

Many thanks,
Silvia

#@
Dear Silvia,

I've prepared all fastq files in the folder:
http://genomics-lab.fleming.gr/fleming/Cereghini/fastq

The HoxB7cre samples are called:
HoxB7cre_HET_SCR16.fastq.gz
HoxB7cre_HET_SCR17.fastq.gz
HoxB7cre_WT_SCR18.fastq.gz
HoxB7cre_WT_SCR19.fastq.gz

The fastq files are the raw data format.

Concerning your question "which are the libraries that you pooled? I need to correct the exact amount of RNA you used for the Hoxb7 and see the RIN values, because the text should correspond to our samples": Vangelis Harokopos or Pantelis should be able to answer.
Best wishes,
Martin

#@
Dear Silvia,

please find the metaseqR analysis for the Fasteris E17.5 data at
http://genomics-lab.fleming.gr/fleming/Cereghini/e17.5/metaseqr_E17.5_edger/index.html

WT are GZS_17 -35, Mutant Heterozygous: GZS_34-36

The gene expr. table is also attached as an xls. I've checked consistency for most of the top diff. exp. genes against the previous analysis (Tuxedo NCBIM37 gene expr diff.xlsx).

Best wishes,
Martin

PS: My steps to start Cytoscape and install iRegulon (on a Linux system) are:

    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    ./cytoscape.sh
    Apps -> App Manager -> Search iRegulon -> Install

#@
Dear Martin,

I am at present out of Paris, but I will come back on Thursday.

Concerning your questions:

1- The sequences of Fasteris are true stranded. Is this information enough? You already have the bioinformatic Tuxedo analysis done by Fasteris, in addition to the splicing isoform analysis (in fact we were specifically interested in the alternative splicing of exon 2 of Hnf1B, because the mutation we introduced in the mouse is a human splicing mutation in the donor of intron 2 that eliminated exon 2 as well as the last 32 bp of exon 2). However, the analysis turned out to be quite complicated because the mutants are heterozygous and there are already 2 WT isoforms plus the other spliced isoforms. In fact, I did not understand what happened from this analysis.

2- Data E14.5: Enclosed is the mail of Pantelis with the E14.5 samples and the correct numbers of the WT and Het samples. Please note that there are many pseudogenes differentially expressed and the results were somehow strange.

3- GEO submission: Concerning the bam files you sent me, can they be converted in a way that I can open them in Excel or another program? I know people in our unit have software to do it, but I have not yet had the possibility to go to their labs.

Finally, if you can help us with the installation of iRegulon and Cytoscape, it would be great because, as you say, it is important that we use them too.
Many thanks,
Silvia

#@
Dear Silvia,

thank you for the information. I've transferred the data and started processing. My inspection revealed that this is Illumina data that used the TruSeq adapter (which is currently being removed). After the alignments, I will also infer whether the sequencing was stranded or unstranded (Illumina's default), but if you have this information, it would help.

Concerning filtering and the fold-change cutoffs, we can define any threshold we like using the complete result table; there is no need to re-run.

> Finally, I was checking all the mails sent by Pantelis that I found on
> an old computer, because my computer broke down last year, and I found
> the metadata analysis of the E14.5 samples, sent separately, in which
> they are clearly indicated in the input. The mistake arose when all
> the links were sent together.
>
> I imagine you had this. If you want, I can send it again.

Great, can you please send that email to me again, just to verify which of my versions is the correct one.

> I am still trying to find a computer to convert the bam files to
> another format that I could open. For instance, I could not open the
> fastq sent by Fasteris; for that reason I did not have it.

Let me know if I can help.

> PS: As I mentioned in a previous mail, we have applied ToppFun to
> define GO terms, expression patterns, disease, etc. With all these
> data we would rather like to build gene regulatory networks (I saw
> papers that use iRegulon, based on Cytoscape). Do you know these
> software tools?

I have installed iRegulon in our Cytoscape; however, we have not yet used it. In principle I could try to run this, but as it is very interactive, it would be better if you installed it on your side too. I can assist with the installation, if needed.

Tomorrow evening, the E17.5 results should be ready.
Best wishes,
Martin

#@
Silvia Cereghini, Jul 14 (3 days ago), to me

Dear Martin,

Yesterday I sent the fastq raw data obtained from Fasteris:

WT are GZS_17 -35, Mutant Heterozygous: GZS_34-36

In each case, each sample corresponds to a pool of 2 embryos. So the metadata analysis could be performed as you have already done for the other samples.

But first we need to clarify the question of filtering: I would like to know how it is applied and which cutoff values you use. In other words, as I asked previously: if, for instance, the values are very low only in mutants, do you exclude them?

On the other hand, we must take into account that the mutants are heterozygous, so we do not expect drastic decreases in expression. For that reason the cutoff value, instead of -1, should preferably be -0.75, because this includes many targets of Hnf1b that are in fact downregulated.

Finally, I was checking all the mails sent by Pantelis that I found on an old computer, because my computer broke down last year, and I found the metadata analysis of the E14.5 samples, sent separately, in which they are clearly indicated in the input. The mistake arose when all the links were sent together.

I imagine you had this. If you want, I can send it again.

Once the question of filtering or not is clarified, the metadata analysis of all samples will be done.

I am still trying to find a computer to convert the bam files to another format that I could open. For instance, I could not open the fastq sent by Fasteris; for that reason I did not have it.

All the best,
Silvia

PS: As I mentioned in a previous mail, we have applied ToppFun to define GO terms, expression patterns, disease, etc. With all these data we would rather like to build gene regulatory networks (I saw papers that use iRegulon, based on Cytoscape). Do you know these software tools?

#@
Dear Silvia,

thank you for your message. Concerning the raw data for GEO submission, the most common format is 'bam' files, which contain both the original reads and the alignment information.
Below is a list with URLs for all samples we have sequenced for you (except the recent 3UTRseq data). Please confirm that you can upload from these links.

http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/E15.5_HET_SCR1.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/E15.5_WT_SCR2.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/SCR10-E145-4wt.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/SCR12-E145-6Het.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/SCR13-E145-3wt.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/SCR14-E145-5Het.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/SCR15-E145-HET7.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/HoxB7cre_HET_SCR16.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/HoxB7cre_HET_SCR17.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/HoxB7cre_WT_SCR18.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/HoxB7cre_WT_SCR19.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/RCAD_P1_HET_SCR22.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/RCAD_P1_HET_SCR23.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/RCAD_P1_WT_SCR20.bam
http://genomics-lab.fleming.gr/fleming/Cereghini/bam-files/RCAD_P1_WT_SCR21.bam

> We are at present writing our manuscript on the RCAD mouse
> model. Therefore it is important to know whether there was a problem
> with the results of E14.5, for which the number of samples was
> different from the ones you have in the runs done.

The names of your samples indicate that SCR20,21,22,23 are for the RCAD mouse model. For E14.5, there are 2 WT samples (SCR10,13) and 3 HET samples (SCR12,14,15).

> In addition there are other points:
>
> 1- I am requesting from Fasteris the fasta data at E17.5 because in any
> case we will need to submit to GEO in the future after submission.
>
> 2- I do not think it necessary to reanalyze the other data done at
> Fleming using Tuxedo as Fasteris did, because the metadata done at
> Fleming is very good and I prefer it this way.

Once you have the E17.5 data in fastq or bam format (fasta is not sufficient!), we can process it with metaseqR in the same way as all other samples, which is necessary if this data is to be used with the other data in the same publication.

> 3- However, as I mentioned previously, I would like to know whether in
> the analysis of E14.5, E15.5 and P1 you applied a cutoff excluding
> low values, because I prefer not to apply one, since this excludes
> those genes that are strongly downregulated or absent either in
> mutants or WT.

In all of these cases, genes with low expression were excluded to avoid artifacts. I can rerun this without the filter. For the HoxB7cre samples (SCR16,17,18,19) I did this; see my message from Jan 12/2017 and the report at:
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_mut_run128_with_flags_unfiltered

> 4- Moreover, the data on E15.5, although the results are fine, were
> probably obtained by pooling 3x WT and 3x het mutants, so they will
> not be acceptable because they do not include at least 2 samples of
> each condition.

The information I have on E15.5 is that these were 2 samples sequenced with high depth (SC1_E15.5_1-3 with 29 million reads and SC2_E15.5_2-4 with 24 million reads).

> Could you please tell me when you will have time to discuss further
> the analysis of the data, what we would like to discuss/analyze, and
> the possibilities on the basis of our results? I can phone you if you
> prefer.
>
> I should point out that the analyses done at E15.5 and P1 were very
> coherent with our PCR analysis, and you can see there are many common
> downregulated genes. However, E14.5 was strange; in particular, the
> most downregulated genes have little to do with kidney genes.
> I was thinking that perhaps there was a contamination with blood, but
> since you mentioned the problem with the number of samples, I am
> thinking that perhaps something was mixed up.

For now,
a) please confirm that you can obtain the raw data from the links above
b) let me know if, when and how I can obtain the E17.5 data
c) let me know if you would like to have an unfiltered analysis of E14.5, E15.5 and P1

Best wishes,
Martin

#@ 26062017
Dear Martin,

We are at present writing our manuscript on the RCAD mouse model. Therefore it is important to know whether there was a problem with the results of E14.5, for which the number of samples was different from the ones you have in the runs done.

In addition there are other points:

1- I am requesting from Fasteris the fasta data at E17.5 because in any case we will need to submit to GEO in the future after submission.

2- I do not think it necessary to reanalyze the other data done at Fleming using Tuxedo as Fasteris did, because the metadata done at Fleming is very good and I prefer it this way.

3- However, as I mentioned previously, I would like to know whether in the analysis of E14.5, E15.5 and P1 you applied a cutoff excluding low values, because I prefer not to apply one, since this excludes those genes that are strongly downregulated or absent either in mutants or WT.

4- Moreover, the data on E15.5, although the results are fine, were probably obtained by pooling 3x WT and 3x het mutants, so they will not be acceptable because they do not include at least 2 samples of each condition.

Could you please tell me when you will have time to discuss further the analysis of the data, what we would like to discuss/analyze, and the possibilities on the basis of our results? I can phone you if you prefer.

I should point out that the analyses done at E15.5 and P1 were very coherent with our PCR analysis, and you can see there are many common downregulated genes.
However, E14.5 was strange; in particular, the most downregulated genes have little to do with kidney genes. I was thinking that perhaps there was a contamination with blood, but since you mentioned the problem with the number of samples, I am thinking that perhaps something was mixed up.

Thanking you in advance,
Silvia

#@
Sorry, I realized I made a mistake :-( (correct version in red)

Begin forwarded message:

From: Silvia Cereghini
Subject: Re: Confirmation aP1
Date: 12 June 2017 17:53:51 UTC+02:00
To: Martin Reczko

Dear Martin,

Enclosed is the list of RNAs sent on 2-09-2015.

Concerning P1 we selected: 5HET, 6WT, 7WT, 8HET

These correspond to the runs:
run141 SCR21-7wt SCR23-8Het
run143 SCR20-6wt SCR22-5Het

So 20, 21 are WT and 22, 23 HET, as indicated.

Concerning E14.5:
run106 SCR10-E145-4wt SCR12-E145-6Het
run107 SCR13-E145-3wt SCR14-E145-5Het
run118 SCR15-E145HET7

They coincide with the numbers of the E14.5 litter. Why the numbers of E14.5 were changed in the metadata link, I do not know. I also believe there was a problem with one of the samples, because we had only 2 WT and 2 HET, but this was done at the platform, so I cannot provide any answer on this. But perhaps you will be able to find out.

Best,
Silvia

#@
Dear Silvia,

the assignment you confirm, WT=(SCR20,22) HET=(SCR21,23), has not been used in any of the previous analyses.
a) The first assignment was WT=(SCR20,21) HET=(SCR22,23).
This is in:
http://genomics-lab.fleming.gr/fleming/Cereghini/metaseqr_RCAD_P1/index.html
and gave:
WT_vs_HET: 91 statistically significant genes, of which 9 up regulated, 82 down regulated

b) Then we removed SCR22 and reassigned SCR21 to HET: WT=(SCR20) HET=(SCR21, SCR23).
This is in:
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_woSCR22/index.html
and gave:
WT_vs_HET: 0 statistically significant genes

c) I reran with your new assignment WT=(SCR20,22) HET=(SCR21,23).
The results are in:
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_rerun09062017/
and give:
WT_vs_HET: 10 statistically significant genes, of which 9 up regulated, 1 down regulated

This affects all subsequent comparisons that include RCAD_P1. Before I generate the affected figures again, I would need to verify the manuscript you mention, as we have agreed that all these analyses are done within our collaboration.

BW,
Martin

Silvia Cereghini, 3:18 PM (15 minutes ago), to Pantelis, me

Dear Pantelis and Martin,

Enclosed are all the results you have sent me, clearly specified, and I confirm that

SCR20: 32993714 is WT E14.5
SCR22: 61736816 WT
SCR21: 53526749 HET
SCR23: 36634546 HET

What is to be submitted to GEO corresponds to the file Raw Data. Could you please confirm this?

Best,
Silvia

For S. Cereghini, renaming and the WT analysis on all genes has finished:

At
http://genomics-lab.fleming.gr/fleming/Cereghini/metaseqr_e15.5/index.html
http://genomics-lab.fleming.gr/fleming/Cereghini/metaseqr_e14.5/index.html
http://genomics-lab.fleming.gr/fleming/Cereghini/metaseqr_HoxB7cre/index.html
http://genomics-lab.fleming.gr/fleming/Cereghini/metaseqr_RCAD_P1/index.html
are the metaseqR reports for all Cereghini runs.
At
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/
you'll find the following scatterplots with the following correlations between the significantly differentially expressed genes:

DE1_vs_DE2                                   Pearson Correlation
diff_exp_e15.5_VS_diff_exp_e14.5.png         0.5380191
diff_exp_e14.5_VS_diff_exp_HoxB7cre.png      0.395454
diff_exp_e14.5_VS_diff_exp_RCAD_P1.png       -0.01915
diff_exp_e15.5_VS_diff_exp_HoxB7cre.png      0.9214371
diff_exp_e15.5_VS_diff_exp_RCAD_P1.png       -0.01915
diff_exp_HoxB7cre_VS_diff_exp_RCAD_P1.png    0.5980661

Also, scatterplots and correlations between expression values of all genes in the WT samples are:

WT1_vs_WT2                       Pearson Correlation
WT_e15.5_VS_WT_e14.5.png         0.9438614
WT_e14.5_VS_WT_HoxB7cre.png      0.950614
WT_e14.5_VS_WT_RCAD_P1.png       0.9339366
WT_e15.5_VS_WT_HoxB7cre.png      0.9573134
WT_e15.5_VS_WT_RCAD_P1.png       0.9409653
WT_HoxB7cre_VS_WT_RCAD_P1.png    0.9498121

#@
Dear Silvia,

it took some time to check my old files. The Tuxedo analysis I have sent was meant to be compared to the last mail I sent on 4/26/15. However, I noticed that in the old link
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_woSCR22/
the setting was SCR20 (WT) vs SCR21 (Het), SCR23 (Het).

The correct link back then was
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_woSCR22-scr21_wt_instead_of_het/
In this the setting was SCR20 (WT), SCR21 (WT) vs SCR23 (Het).

Thus, the negative log2(fold_change) in
http://genomics-lab.fleming.gr/fleming/Cereghini/Tuxedo_run143/Tuxedo_run143_gene_exp_diff.xlsx
are comparable with the log2_normalized_fold_change_WT_vs_HET in
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_woSCR22-scr21_wt_instead_of_het//lists/metaseqr_sig_out_WT_vs_HET.txt.gz

Sorry for the confusion.

Pantelis forwarded me some GEO deposition request related to data in a submitted manuscript. Do you have some more information on this?
Best wishes,
Martin

Martin Reczko, Mar 8, to Silvia

Dear Silvia,

please excuse my delayed answer. I did receive your initial mail and reprocessed the data with the Tuxedo suite in the meantime, to facilitate combination with the 'Fasteris data'. For the contrast
RCAD_P1_WT_SCR20,RCAD_P1_WT_SCR21 vs. RCAD_P1_HET_SCR23 (as before),
you can find the results at:
http://genomics-lab.fleming.gr/fleming/Cereghini/Tuxedo_run143/Tuxedo_run143_gene_exp_diff.xlsx
http://genomics-lab.fleming.gr/fleming/Cereghini/Tuxedo_run143/Tuxedo_run143_cds_exp_diff.xlsx
http://genomics-lab.fleming.gr/fleming/Cereghini/Tuxedo_run143/Tuxedo_run143_isoform_exp_diff.xlsx
(Let me know if you prefer that I attach these files.)

One question concerning your summary, point:
" 4- Concerning Fasteris I only found several data files but not the raw data. I will send them to you in another mail. They include an analysis of splicing isoforms, because in the RCAD model we were interested in the detection of transcripts that lack exon 2 due to the splicing mutation introduced. This is complex because the animals are heterozygous. "

Which gene/transcript does the exon 2 you are referring to belong to (so I can have a closer look)?

Best wishes,
Martin

#@
Martin Reczko, 4/26/15, to Silvia

Dear Silvia,

as you suggested, I've run the diff. exp. analysis again without SCR22, which seems much more reasonable. Please see the results at:
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_woSCR22/
This is SCR20 and SCR21 (WT) vs. SCR23 (Het).

Best wishes,
Martin

@@
Dear Silvia,

please ignore the previous results; I gave the wrong (WT) label to SCR21. I've re-run it with these sample labels:

SCR20 WT
SCR21 HET
SCR23 HET

Now we have 10216 differentially expressed features (while with the wrong label we had 6293 differentially expressed features).
The results are at the same location, please reload
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_woSCR22

Please excuse this confusion.

Best wishes,
Martin

The groups are:
run20:  SC1_E15.5_1-3 SC2_E15.5_2-4
run107: SCR10-E145-4wt SCR12-E145-6Het SCR13-E145-3wt SCR14-E145-5Het
run128  SCR17r-65n5mut-bc4 SCR19r-65n11wt-bc15
run127  SCR16r-65n2mut-bc2 SCR18r-65n8wt-bc5
run141  SCR20-6wtt-bc5 SCR22-5Het-bc14
run143  SCR21-7wt-bc9 SCR23-8Het-bc15

Silvia Cereghini, 3:25 AM (8 hours ago), to me

Dear Martin,

Thanks for your mail. But I am really confused, because I was thinking that SCR20 and 22 were WT and SCR21 and 23 Het. Is that correct, or is what you indicated in your mail correct?

Best regards

@@
Martin Reczko, 3:11 PM (21 hours ago), to Silvia

Dear Silvia,

as you suggested, I've run the diff. exp. analysis again without SCR22, which seems much more reasonable. Please see the results at:
http://genomics-lab.fleming.gr/fleming/Cereghini/run142/metaseqr_wt_vs_het_run142_woSCR22/
This is SCR20 and SCR21 (WT) vs. SCR23 (Het).

Best wishes,
Martin

@@
SCR20  /data/images/proton/run107/www/SCR10-E145-4wt.bam   WT   single  yes
SCR21  /data/images/proton/run107/www/SCR12-E145-6Het.bam  HET  single  yes
SCR22  /data/images/proton/run107/www/SCR13-E145-3wt.bam   WT   single  yes
SCR23  /data/images/proton/run107/www/SCR15-E145Het7.bam   HET  single  yes

samplename  filename                                            condition  paired  stranded
SCR20       /data/images/proton/run141/tophat_005/sort_uniq.bam  WT         single  yes
SCR21       /data/images/proton/run143/tophat_009/sort_uniq.bam  WT         single  yes
SCR22       /data/images/proton/run141/tophat_014/sort_uniq.bam  HET        single  yes
SCR23       /data/images/proton/run143/tophat_015/sort_uniq.bam  HET        single  yes

metaseqr_e14.5 -> run142/metaseqr_wt_vs_het_run107

@@
Silvia Cereghini, 3:49 AM (11 hours ago), to me

Dear Martin,

Thanks very much. In fact, yesterday I found the data and was analysing them with ToppCluster.
All analyses (E15.5, Hoxb7cre and P1 RCAD) gave, as expected, genes expressed differentially in kidney and in different renal compartments, and they are fine.

I still have the same problem I explained in a previous mail with the samples from E14.5 (analyzed last year, associated with problems in getting libraries). Indeed, in this case there are many genes that have nothing to do with kidney, and some of them also have values that differ greatly between the WT samples. The GO biological processes, for instance, are really unrelated to kidney. Can this be due to contamination or to a different library preparation? I already asked Pantelis about this, and he answered that you did not find any problem. Since you are used to this kind of global analysis, perhaps you will have an explanation?

Thanks again,
Silvia

In fact, I cannot do very much with such data.

@@
On 16 March 2015 at 04:43, Martin Reczko wrote:

Dear Silvia,

as I have processed all of your data, let me add some details. You can find the expression values in the "Results" section, part "DEG table for the contrast ...", of each metaseqR report. For example, with e_14.5 you should find the link
http://genomics-lab.fleming.gr/fleming/Cereghini/metaseqr_e14.5/lists/metaseqr_sig_out_WT_vs_HET.txt.gz

In all cases except e_15.5, the results are averaged over the 2 replicates. e_15.5 was the first result obtained and had no replicates. The results you were looking at are the read count tables, which contain separate results for each replicate; the relevant results, however, are in the DEG table.

With kind regards,
Martin

--
Dr. Martin Reczko
Head of Genomic Bioinformatics
Functional Genomics Laboratory and Genomics Facility
Institute of Molecular Biology and Genetics
Biomedical Sciences Research Center "Alexander Fleming"
P.O. Box 74145
16602, Varkiza, Greece
Tel. +30-210-9656310 (ext. 144)
FAX: +30-210-9653934
email: reczko@fleming.gr

@@
Pantelis Hatzis, 6:40 PM (36 minutes ago), to me

Could you reply to this?
P

Sent from my iPad

Begin forwarded message:

From: Silvia Cereghini
Date: 14 March 2015, 2:41:27 AM EET
To: Pantelis Hatzis
Subject: silvia data

Dear Pantelis,

I have just arrived from Buenos Aires and have begun to examine the data you sent. I have just one question: in all the data sent, the columns with mean values (as shown below) are missing; there are only the separate values of the 2 WT and 2 mutants. Is it possible to obtain these? It is the only way to sort the values in order to compare the 2 WT vs the 2 mutants.

Thanks again,
Silvia

meta_p-value_WT_vs_Het
meta_FDR_WT_vs_Het
natural_normalized_fold_change_WT_vs_Het
log2_normalized_fold_change_WT_vs_Het
natural_normalized_mean_counts_WT
log2_normalized_mean_counts_WT
natural_normalized_mean_counts_Het
log2_normalized_mean_counts_Het
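
Note on the cutoffs discussed in this thread: a log2 fold-change threshold such as -0.75 instead of -1 can be applied to a complete result table without re-running the analysis. The following is only a minimal sketch of that idea, not the metaseqR procedure itself. It assumes the DEG table is tab-delimited and uses the column names listed above. The `gene_name` column, the FDR threshold of 0.05, the sign convention (negative values meaning lower expression in Het) and the toy rows are all assumptions made for illustration.

```python
import csv
import io

# Column names as listed in the thread.
FC_COL = "log2_normalized_fold_change_WT_vs_Het"
FDR_COL = "meta_FDR_WT_vs_Het"


def select_downregulated(table_text, max_log2_fc=-0.75, max_fdr=0.05):
    """Keep rows with log2 fold change <= max_log2_fc and FDR <= max_fdr.

    Rows are returned sorted from the most negative fold change upwards.
    Assumption: negative fold changes mean lower expression in Het.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    kept = []
    for row in reader:
        try:
            fc = float(row[FC_COL])
            fdr = float(row[FDR_COL])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if fc <= max_log2_fc and fdr <= max_fdr:
            kept.append(row)
    return sorted(kept, key=lambda r: float(r[FC_COL]))


# Made-up toy rows for illustration only; these are not real results.
TOY_TABLE = "\n".join([
    "\t".join(["gene_name", FDR_COL, FC_COL]),
    "\t".join(["geneA", "0.01", "-1.20"]),
    "\t".join(["geneB", "0.03", "-0.80"]),
    "\t".join(["geneC", "0.02", "-0.50"]),
    "\t".join(["geneD", "0.40", "-2.00"]),
])

strict = [r["gene_name"] for r in select_downregulated(TOY_TABLE, max_log2_fc=-1.0)]
relaxed = [r["gene_name"] for r in select_downregulated(TOY_TABLE, max_log2_fc=-0.75)]
print(strict)
print(relaxed)
```

With the toy rows, relaxing the cutoff from -1 to -0.75 adds the hypothetical geneB to the selection, while geneD stays excluded because of its FDR. A real table would first be decompressed (for example with Python's `gzip` module), and the sign convention should be checked against the report before use.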