Dear Dionyssis,

if you select significantly deregulated genes, please use column J (= FDR_edger) and select genes with FDR_edger <= 0.05. FDR_edger is the p-value adjusted for multiple testing. The first 3 genes in your list would be removed this way.

For functional annotation, we recommend the online tool DAVID at https://david.ncifcrf.gov/home.jsp
I've attached the Nature Protocols paper explaining this in detail. In our case:
a) we copy the (up- or downregulated) GENBANK_ACCESSIONs in the column 'gene_id'
b) select 'Functional Annotation' (leading to https://david.ncifcrf.gov/summary.jsp)
c) paste our genes into the box 'Paste a list'
d) choose under 'Select Identifier' the format GENBANK_ACCESSION
e) choose under 'List Type': 'Gene List'
f) and 'Submit List'

In the 'Annotation Summary Results' we expand 'Gene_Ontology' and click 'Chart' at 'GOTERM_BP_FAT'.

For the genes downregulated in AGS (negative log2_normalized_fold_change_ABCCC_vs_AGS) (= upregulated in ABCCC), the top terms are:
induction of apoptosis
induction of programmed cell death
positive regulation of apoptosis
positive regulation of programmed cell death
positive regulation of cell death
intracellular signaling cascade
regulation of phosphate metabolic process
regulation of phosphorus metabolic process
cell death
regulation of smooth muscle cell proliferation
death
regulation of apoptosis
positive regulation of transcription from RNA polymerase II promoter
regulation of programmed cell death
regulation of cell death

For the genes upregulated in AGS (positive log2_normalized_fold_change_ABCCC_vs_AGS) (= downregulated in ABCCC), the top terms are:
sterol metabolic process
cholesterol biosynthetic process
cholesterol metabolic process
steroid biosynthetic process
steroid metabolic process
M phase
lipid biosynthetic process
cell cycle phase
mitosis
nuclear division
cell cycle process
M phase of mitotic cell cycle
organelle fission
mitotic cell cycle

There are more annotations in DAVID, please explore.

To be more sensitive, I've rerun all pairs with a relaxed artifact filter iii (remove genes with read counts below the median read counts of the total normalized count distribution). The new results are at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_cagAKO_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFF_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_ABFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFF_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCCC_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFFF_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_ABCCC/index.html

BW,
Martin

Dear Dionyssis,

the reason for 'NA' for the p-value is that the gene was filtered by the initial quality filters in one of the 2 conditions.
This filtering is explained in the "Analysis summary" at e.g.:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_cagAKO/index.html

"Genes presenting any of the following were excluded from further analysis: i) genes with length less than 500 (211 genes), ii) genes whose average reads per 100 bp was less than the 25th quantile of the total normalized distribution of average reads per 100bp (0 genes with cutoff value 0.02046 average reads per 100 bp), iii) genes with read counts below the median read counts of the total normalized count distribution (9074 genes with cutoff value 177 normalized read counts). The total number of genes excluded due to the application of gene filters was 3058."

These filter settings ensure that the reported results are reliable and artifacts are excluded. For exploration, these restrictions can of course be lowered. As an example, I have run ABCC_vs_cagAKO without artifact filters and you can check this at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr2_ABCC_vs_cagAKO/index.html

Let me know if you need other pairings with reduced or removed artifact filtering.
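As a rough illustration of how the 'NA' entries and the FDR_edger <= 0.05 rule interact when working with the exported lists, here is a minimal Python sketch. It assumes the list has been saved as tab-delimited text with the column names shown in this thread; the function names (parse_value, select_deregulated), the flip_sign option and the two HYPO_ rows are hypothetical and only serve the example. The MMP7 values are taken from the row quoted in this thread.

```python
import csv
import io

FDR_CUTOFF = 0.05  # threshold suggested in this thread
FC_COLUMN = "log2_normalized_fold_change_ABCC_vs_cagAKO"

# Tiny stand-in for an exported gene list (assumed to be tab-delimited).
# Only the MMP7 row comes from the thread; the HYPO_ rows are made up.
EXAMPLE_TABLE = (
    "gene_id\tgene_name\tFDR_edger\t" + FC_COLUMN + "\n"
    "NM_002423\tMMP7\tNA\t-1.15200309344505\n"
    "HYPO_0001\tGENE_A\t0.01\t2.0\n"
    "HYPO_0002\tGENE_B\t0.20\t-3.0\n"
)


def parse_value(text):
    """Return a float, or None for the 'NA' that marks a filtered gene."""
    return None if text == "NA" else float(text)


def select_deregulated(rows, fc_column, fdr_cutoff=FDR_CUTOFF, flip_sign=False):
    """Keep genes with FDR_edger <= fdr_cutoff; genes with FDR 'NA' are skipped.

    With flip_sign=True the fold change is multiplied by -1, so that a
    positive value means upregulated in the first-named condition of A_vs_B.
    """
    selected = []
    for row in rows:
        fdr = parse_value(row["FDR_edger"])
        if fdr is None or fdr > fdr_cutoff:
            continue  # filtered by the quality filters, or not significant
        fold_change = float(row[fc_column])
        if flip_sign:
            fold_change = -fold_change
        selected.append((row["gene_id"], fold_change))
    return selected


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(EXAMPLE_TABLE), delimiter="\t"))
    for gene_id, fold_change in select_deregulated(rows, FC_COLUMN):
        print(gene_id, fold_change)
```

Skipping 'NA' rows rather than treating them as non-significant keeps the two cases apart: a gene with 'NA' was never tested, so rerunning with relaxed filters may still give it a p-value.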
BW,
Martin

#@

chromosome start end gene_id gc_content strand gene_name biotype p-value_edger FDR_edger log2_normalized_fold_change_ABCC_vs_cagAKO log2_normalized_mean_counts_ABCC log2_normalized_median_counts_ABCC log2_normalized_sd_counts_ABCC log2_normalized_mad_counts_ABCC log2_normalized_cv_counts_ABCC log2_normalized_rcv_counts_ABCC log2_normalized_mean_counts_cagAKO log2_normalized_median_counts_cagAKO log2_normalized_sd_counts_cagAKO log2_normalized_mad_counts_cagAKO log2_normalized_cv_counts_cagAKO log2_normalized_rcv_counts_cagAKO log2_normalized_counts_ABCC1 log2_normalized_counts_ABCC2 log2_normalized_counts_cagAKO1 log2_normalized_counts_cagAKO2
chr11 102391238 102401484 NM_002423 0.3753 - MMP7 protein_coding NA NA -1.15200309344505 4.25389732009935 4.25389732009935 0.631506315485239 0.662043759336689 0.148453586902866 0.15563228482469 3.08496250072116 3.08496250072116 0.707106781186548 0.7413 0.229210818939047 0.240294655065243 3.8073549220576 4.70043971814109 2.58496250072116 3.58496250072116

Dear Dionyssis,

I've compared several up- and downregulated genes with the array results and they all agree. The problem is the different naming scheme we use: in A_vs_B, A is always the reference, which means in all the pairings
ABCCC_vs_ABFFF
ABCCC_vs_AGS
ABCCC_vs_cagAKO
ABCC_vs_ABCCC
ABCC_vs_ABFF
ABCC_vs_AGS
ABCC_vs_cagAKO
ABFFF_vs_AGS
ABFFF_vs_cagAKO
ABFF_vs_ABFFF
ABFF_vs_AGS
ABFF_vs_cagAKO
cagAKO_vs_AGS
the first identifier is considered the reference. A positive fold change for gene x in ABCCC_vs_AGS means that x is upregulated in AGS. You can verify this by inspecting the values in e.g. log2_normalized_mean_counts_ABCC and log2_normalized_mean_counts_AGS for strongly up- and downregulated genes. To conform with the fold changes in the microarray results, multiply all fold-change values by -1.

BW,
Martin

NR_001278 CYP2B7P1 2.66749(ABCC) 5.98806(AGS) GFoldChange(ABCC vs. AGS) -9.99061
chr19 41430169 41456565 NR_001278 0.4515 + CYP2B7P retained_intron 27 24 18 23 20 20 85 132 29 32 116 87

NM_002421 MMP1 2.72912 6.47495
chr11 102641232 102651359 NM_002425 0.3611 - MMP10 protein_coding NA NA -5.04439411935845
chr11 102641232 102651359 NM_002425 0.3611 - MMP10 protein_coding 67 30 20 24 31 36 2 2 20 14 1 0

metaseqr_ABCC_vs_AGS/lists/normalized_counts_table.txt.gz
chromosome start end gene_id gc_content strand gene_name biotype ABCC1 ABCC2 ABCCC1 ABCCC2 ABFF1 ABFF2 ABFFF1 ABFFF2 cagAKO1 cagAKO2 AGS1 AGS2
chr14 23305741 23316808 NM_004995 0.5732 + MMP14 protein_coding 6867 5006 8953 9281 8021 6727 6542 5594 9648 7502 8839 5953
chr16 58059469 58080804 NM_002428 0.5951 + MMP15 protein_coding 3013 1825 2550 2727 2957 2842 1659 2026 3008 2333 2103 2023
chr20 33814538 33864804 NM_006690 0.491 + MMP24 protein_coding 1173 1085 1036 1157 1342 1309 915 864 1133 891 856 861

/data/images/proton/run272/www/metaseqr_ABCC_vs_AGS/lists/metaseqr_all_out_ABCC_vs_AGS.txt.gz
0.793549122532574 log2_normalized_fold_change_ABCC_vs_AGS
chromosome start end gene_id gc_content strand gene_name biotype p-value_edger FDR_edger log2_normalized_fold_change_ABCC_vs_AGS log2_normalized_mean_counts_ABCC log2_normalized_median_counts_ABCC log2_normalized_sd_counts_ABCC log2_normalized_mad_counts_ABCC log2_normalized_cv_counts_ABCC log2_normalized_rcv_counts_ABCC log2_normalized_mean_counts_AGS log2_normalized_median_counts_AGS log2_normalized_sd_counts_AGS log2_normalized_mad_counts_AGS log2_normalized_cv_counts_AGS log2_normalized_rcv_counts_AGS log2_normalized_counts_ABCC1 log2_normalized_counts_ABCC2 log2_normalized_counts_AGS1 log2_normalized_counts_AGS2
chr19 41497203 41524301 NM_000767 0.4463 + CYP2B6 protein_coding NA NA 0.793549122532574 2.87744375108173 2.87744375108173 0.41363095099977 0.433632701784593 0.14374944804543 0.150700670211737 3.62869392134633 3.62869392134633 0.648797228523512 0.68017080064969 0.17879635003296 0.187442318198425 2.58496250072116 3.16992500144231 3.16992500144231 4.08746284125034

Dear Dionyssis,

our server is up again and the results are at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFFF_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_ABCCC/index.html

BW,
Martin

Dear Martin,

Thank you very much for your prompt response. There seems to be a problem and I cannot connect to your server to download the data. I will try again later. Finally, I will also require the ABCCC vs ABCC pairing. Sorry I did not bring it up earlier.

Thanking you in advance
Best regards
Dionyssis

#@

Dear Dionyssis,

the additional pairs are now ready at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFFF_vs_cagAKO/index.html

Best wishes,
Martin

#@ 15042016

Hi Martin,

In addition to the already existing pairs you have done for us in the transcriptome analysis, can you please do the following too?
1. ABCCC vs. ABFFF
2. ABCCC vs CagAKO
3. ABFFF vs CagAKO

We are familiarizing ourselves with the data and the platform, but at the moment it is difficult without the help of an expert in bioinformatics. I will call you next week for a meeting, if possible before Easter.

Thanking you in advance
Dionyssis

#@

Dear Dionyssis,

thank you for the useful background information.
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_ABFFF/index.html

#@ 03/03/2016 05:05 PM

Dear Martin,

Thank you very much for your prompt reply.
Yes, we have started this familiarization process with the reports, but since we are not bioinformatics specialists we need a bit of time and training. At the moment, however, we will need to talk a bit in order to do some more comparisons which may answer some of our questions.

These are transcriptomes of H. pylori-infected gastric epithelial cells.
AGS is the uninfected epithelial cell state.
The terms ABCC and ABCCC refer to H. pylori cagA-positive strains with 2 or 3 terminal EPIYA phosphorylation motifs present in the CagA protein, which modulate its virulence.
ABFF and ABFFF refer to the phosphorylation-deficient EPIFA CagA mutants of the aforementioned H. pylori strains.
CagAKO is the isogenic H. pylori CagA knockout mutant produced by disruption of the cagA gene by kanamycin cassette insertion.

Questions we would like to ask with regard to the induction of specific genes and/or pathways are the following:
1. H. pylori infected compared to uninfected condition. Comparisons ABCC vs AGS, CagAKO vs. AGS, ABFF vs AGS etc.
2. The effect of CagA protein. Comparison between (ABCC vs. AGS) and (CagAKO vs AGS) states, or is it a situation that can be answered if we simply compare ABCC vs. CagAKO?
3. The effect of tyrosine phosphorylation (on EPIYA-C repeats) on virulence. Comparison between (ABCC vs AGS) and (ABFF vs AGS) OR (ABCCC vs AGS) and (ABFFF vs AGS) states, or is it a situation that can be answered if we simply compare ABCC vs. ABFF OR ABCCC vs. ABFFF?
4. The effect of the number of EPIYA-C repeats on virulence. Comparison between (ABCC vs AGS) and (ABCCC vs AGS) OR (ABFF vs AGS) and (ABFFF vs AGS) states, or is it a situation that can be answered if we simply compare ABCC vs. ABCCC OR ABFF vs. ABFFF?
5. As I also mentioned to you, we have also done microarray studies for the ABCC, ABFF, CagAKO and uninfected AGS cases, with Vaggelis and Pantelis. Therefore, a comparison of the two methods with regard to the results would also be very interesting.
Please let me know if we can meet or do a Skype call at your earliest convenience to see how we can proceed. We would like to submit some of these data to a FEBS meeting (deadline 18th of March).

Thanking you in advance
Dionyssis

From: Martin Reczko [mailto:reczko@fleming.gr]
Sent: Thursday, March 03, 2016 3:53 PM
To: Dionyssios Sgouras
Cc: 'Pantelis Hatzis'; Yiannis Karayiannis
Subject: Re: Transcriptome analysis

Dear Dionyssis,

as my schedule is very tight this week, we can discuss the results next week. Have you familiarized yourself with the reports of the metaseqr analysis and inspected the xls lists of deregulated genes? This analysis was implemented by P. Moulos from our institute and is intended to be very self-explanatory. It is very helpful to check these results also by inspecting the tracks on the genome browser.

Best wishes,
Martin

On 03/03/2016 12:31 PM, Dionyssios Sgouras wrote:

Dear Martin,

Thank you for the analysis. Obviously my students and I need your help to interpret the results and mine the data further. Shall we arrange to visit you tomorrow at Fleming after 14:00? Alternatively, we could extend our hospitality here at Pasteur Institute any time tomorrow, if it fits your schedule. I am in my office until 18:00 tonight and then you can always find me on the cell phone (6944634999).

Thanking you in advance for your help.
Kind regards
Dionyssis
------------------------------------------
Dionyssios N. Sgouras, PhD
Principal Investigator
Laboratory of Medical Microbiology
Hellenic Pasteur Institute
127 Vas. Sofias Avenue, 115 21 Athens, Greece
Tel: +302106478824
Fax: +302106478832
Email: sgouras@pasteur.gr
Skype: dionyssios.sgouras
URL: http://www.pasteur.gr/?page_id=835&lang=en

#@ sgouras@pasteur.gr

The processing has finished. At
http://genomics-lab.fleming.gr/cgi-bin/hgTracks?db=hg19&hubUrl=http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/hub.txt
you will find tracks for all samples integrated in our UCSC genome browser mirror.
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_cagAKO_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFF_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_ABFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFF_vs_ABFFF/index.html

If you need other pairings, let me know. We can discuss these results either tomorrow after 16:00 or Friday after 14:00.

Best regards,
Martin Reczko

Using the URL http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/hub.txt for the track hub you can view the results al

#@

RUN269
IonXpressRNA_004 DSR4-ABCC2 29,093,823 98 bp
IonXpressRNA_005 DSR5-ABCCC1 20,963,303 102 bp
IonXpressRNA_006 DSR6-ABCCC2 31,620,908 105 bp

RUN270
IonXpressRNA_001 DSR1-cagAKO1 26,483,035 112 bp
IonXpressRNA_002 DSR2-cagAKO2 30,817,434 103 bp
IonXpressRNA_003 DSR3-ABCC1 29,181,587 104 bp

RUN271
IonXpressRNA_007 DSR7-ABFF1 27,149,506 105 bp
IonXpressRNA_008 DSR8-ABFF2 34,445,135 101 bp
IonXpressRNA_009 DSR9-ABFFF1 25,661,158 100 bp

RUN272
IonXpressRNA_010 DSR10-ABFFF2 24,523,171 87 bp
IonXpressRNA_011 DSR11-AGS1 26,509,967 85 bp
IonXpressRNA_012 DSR12-AGS2 29,554,391 96 bp

Dear Dionyssios,

at http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/bam/ you will find the following bam files containing the reads and alignments:
ABCC1.bam ABCCC1.bam ABFF1.bam ABFFF1.bam AGS1.bam cagAKO1.bam
ABCC2.bam ABCCC2.bam ABFF2.bam ABFFF2.bam AGS2.bam cagAKO2.bam

The credentials are:
Sgouras
SgourasLab

Best wishes,
Martin