Dear Dionyssis,

if you select significantly deregulated genes,
please use column J (FDR_edger)
and keep only genes with FDR_edger <= 0.05.
FDR_edger is the p-value adjusted for multiple testing.

The first 3 genes in your list would be removed this way.
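
A minimal sketch of this selection outside the spreadsheet, assuming the gene list was exported as a tab-separated text file with the FDR_edger column. The function name and the demo rows are made up for illustration; genes with 'NA' in FDR_edger are simply skipped.

```python
import csv
import io


def significant_genes(handle, fdr_column="FDR_edger", cutoff=0.05):
    """Return the rows whose FDR is a number <= cutoff; 'NA' rows are skipped."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        value = row[fdr_column]
        if value == "NA":
            # No test result was reported for this gene, so it cannot be selected.
            continue
        if float(value) <= cutoff:
            kept.append(row)
    return kept


# Made-up demo rows (not real results), in the same tab-separated layout.
demo = (
    "gene_id\tgene_name\tFDR_edger\n"
    "GENE_A\tdemoA\t0.001\n"
    "GENE_B\tdemoB\t0.20\n"
    "GENE_C\tdemoC\tNA\n"
)
selected = significant_genes(io.StringIO(demo))
print([row["gene_id"] for row in selected])
```

For a real list, open the exported file with open(path, newline="") and pass the file handle instead of the demo text.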

For functional annotation, we recommend the online tool DAVID
at
https://david.ncifcrf.gov/home.jsp

I've attached the Nature Protocols paper explaining this in detail.

In our case:
a) we copy the (up- or downregulated) GENBANK_ACCESSIONs from the column 'gene_id'
b) select 'Functional Annotation'
(leading to https://david.ncifcrf.gov/summary.jsp)
c) paste our genes into the box 'Paste a list'
d) choose under 'Select Identifier' the format GENBANK_ACCESSION
e) choose under 'List Type' : 'Gene List'
f) and 'Submit List'
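
Step a) can also be done with a few lines of code. This is only a sketch: it assumes the rows have already been restricted to significant genes, that the accessions are in 'gene_id', and that the fold change column is named as in the result tables (e.g. log2_normalized_fold_change_ABCCC_vs_AGS). The function name and the demo values are made up.

```python
def split_by_direction(rows, fold_column):
    """Split gene_ids by the sign of the log2 fold change.

    For a pairing A_vs_B, a negative value means higher in A,
    a positive value means higher in B.
    """
    higher_in_first, higher_in_second = [], []
    for row in rows:
        fold = float(row[fold_column])
        if fold < 0:
            higher_in_first.append(row["gene_id"])
        elif fold > 0:
            higher_in_second.append(row["gene_id"])
    return higher_in_first, higher_in_second


# Made-up demo rows (placeholder accessions, not real results).
column = "log2_normalized_fold_change_ABCCC_vs_AGS"
rows = [
    {"gene_id": "ACC_1", column: "-2.5"},
    {"gene_id": "ACC_2", column: "1.8"},
    {"gene_id": "ACC_3", column: "-0.9"},
]
up_in_abccc, up_in_ags = split_by_direction(rows, column)

# One accession per line, ready for the 'Paste a list' box.
print("\n".join(up_in_abccc))
```

Each of the two lists is submitted to DAVID separately.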

In the 'Annotation Summary Results'
we expand 'Gene_Ontology' and click 'Chart' at 'GOTERM_BP_FAT'

For the genes downregulated in AGS (negative log2_normalized_fold_change_ABCCC_vs_AGS)
(= upregulated in ABCCC), the top terms are:

induction of apoptosis
induction of programmed cell death
positive regulation of apoptosis
positive regulation of programmed cell death
positive regulation of cell death
intracellular signaling cascade
regulation of phosphate metabolic process
regulation of phosphorus metabolic process
cell death
regulation of smooth muscle cell proliferation
death
regulation of apoptosis
positive regulation of transcription from RNA polymerase II promoter
regulation of programmed cell death
regulation of cell death

For the genes upregulated in AGS (positive log2_normalized_fold_change_ABCCC_vs_AGS)
(= downregulated in ABCCC), the top terms are:


sterol metabolic process
cholesterol biosynthetic process
cholesterol metabolic process
steroid biosynthetic process
steroid metabolic process
M phase
lipid biosynthetic process
cell cycle phase
mitosis
nuclear division
cell cycle process
M phase of mitotic cell cycle
organelle fission
mitotic cell cycle


There are more annotations in DAVID; please explore them.

To be more sensitive,
I've rerun all pairs
with a relaxed artifact filter iii (remove genes with read
counts below the median read counts of the total normalized count
distribution).
The new results are at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_cagAKO_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFF_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_ABFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFF_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCCC_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABFFF_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr1b_ABCC_vs_ABCCC/index.html


BW,
Martin


Dear Dionyssis,

the reason for 'NA' for the p-value is that
the gene was filtered by the initial quality filters
in one of the 2 conditions.

This filtering is explained in the "Analysis summary" at e.g.:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_cagAKO/index.html
"Genes presenting any of the following were excluded from further
analysis: i) genes with length less than 500 (211 genes), ii) genes
whose average reads per 100 bp was less than the 25th quantile of the
total normalized distribution of average reads per 100bp (0 genes with
cutoff value 0.02046 average reads per 100 bp), iii) genes with read
counts below the median read counts of the total normalized count
distribution (9074 genes with cutoff value 177 normalized read
counts). The total number of genes excluded due to the application of
gene filters was 3058."

These filter settings ensure that the reported results are reliable
and artifacts are excluded.
For exploration, these restrictions can of course be lowered.
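
To illustrate how such filters lead to excluded genes (and therefore to 'NA' entries), here is a simplified sketch of filters i) and iii). It is not the metaseqr implementation: the per-gene 'length' and 'count' fields, the function name and the demo numbers are assumptions, and the exact way metaseqr summarizes counts per gene may differ.

```python
from statistics import median


def apply_gene_filters(genes, min_length=500):
    """Exclude genes that are too short or have counts below the median.

    genes: list of dicts with 'gene_id', 'length' and 'count'
    (one summarized normalized read count per gene; an assumption).
    """
    cutoff = median(g["count"] for g in genes)
    kept, excluded = [], {}
    for g in genes:
        if g["length"] < min_length:
            excluded[g["gene_id"]] = "length"       # like filter i)
        elif g["count"] < cutoff:
            excluded[g["gene_id"]] = "low_count"    # like filter iii)
        else:
            kept.append(g["gene_id"])
    return kept, excluded, cutoff


# Made-up demo genes (not real data).
genes = [
    {"gene_id": "g1", "length": 300, "count": 500},
    {"gene_id": "g2", "length": 1000, "count": 10},
    {"gene_id": "g3", "length": 2000, "count": 200},
    {"gene_id": "g4", "length": 3000, "count": 400},
]
kept, excluded, cutoff = apply_gene_filters(genes)
print(kept, excluded, cutoff)
```

Lowering min_length or replacing the median by a smaller cutoff corresponds to relaxing the filters for exploration.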

As an example, I have run ABCC_vs_cagAKO without artifact filters
and you can check this at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr2_ABCC_vs_cagAKO/index.html

Let me know if you need other pairings with reduced or removed artifact
filtering.

BW,
Martin


#@
chromosome	start	end	gene_id	gc_content	strand	gene_name	biotype	p-value_edger	FDR_edger	log2_normalized_fold_change_ABCC_vs_cagAKO	log2_normalized_mean_counts_ABCC	log2_normalized_median_counts_ABCC	log2_normalized_sd_counts_ABCC	log2_normalized_mad_counts_ABCC	log2_normalized_cv_counts_ABCC	log2_normalized_rcv_counts_ABCC	log2_normalized_mean_counts_cagAKO	log2_normalized_median_counts_cagAKO	log2_normalized_sd_counts_cagAKO	log2_normalized_mad_counts_cagAKO	log2_normalized_cv_counts_cagAKO	log2_normalized_rcv_counts_cagAKO	log2_normalized_counts_ABCC1	log2_normalized_counts_ABCC2	log2_normalized_counts_cagAKO1	log2_normalized_counts_cagAKO2

chr11	102391238	102401484	NM_002423	0.3753	-	MMP7	protein_coding	NA	NA	-1.15200309344505	4.25389732009935	4.25389732009935	0.631506315485239	0.662043759336689	0.148453586902866	0.15563228482469	3.08496250072116	3.08496250072116	0.707106781186548	0.7413	0.229210818939047	0.240294655065243	3.8073549220576	4.70043971814109	2.58496250072116	3.58496250072116

Dear Dionyssis,

I've compared several up- and downregulated genes with the array results
and they all agree.

The problem is the different naming scheme we use:
in A_vs_B, A is always the reference, which means that in all the pairings
ABCCC_vs_ABFFF
ABCCC_vs_AGS
ABCCC_vs_cagAKO
ABCC_vs_ABCCC
ABCC_vs_ABFF
ABCC_vs_AGS
ABCC_vs_cagAKO
ABFFF_vs_AGS
ABFFF_vs_cagAKO
ABFF_vs_ABFFF
ABFF_vs_AGS
ABFF_vs_cagAKO
cagAKO_vs_AGS
the first identifier is considered the reference.

A positive foldchange for gene x in
ABCCC_vs_AGS
means that x is upregulated in AGS.

You can verify this by inspecting the values in e.g. log2_normalized_mean_counts_ABCC
and log2_normalized_mean_counts_AGS for strongly up- and downregulated genes.

To conform with the fold changes in the microarray
results, multiply all values by -1.
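
The sign convention can be checked with a small calculation. The sketch below uses the four log2_normalized_counts values of the CYP2B6 row from the ABCC_vs_AGS table in this thread. That the fold change is the log2 ratio of the mean back-transformed counts (second condition over reference) is an assumption inferred from that row, not taken from the metaseqr documentation; the function name is made up.

```python
import math


def log2_fold_change(log2_counts_reference, log2_counts_other):
    """log2(mean count in the other condition / mean count in the reference)."""
    reference = [2 ** v for v in log2_counts_reference]
    other = [2 ** v for v in log2_counts_other]
    mean_reference = sum(reference) / len(reference)
    mean_other = sum(other) / len(other)
    return math.log2(mean_other / mean_reference)


# CYP2B6 in ABCC_vs_AGS: ABCC is the reference (first identifier).
abcc = [2.58496250072116, 3.16992500144231]   # log2_normalized_counts_ABCC1/2
ags = [3.16992500144231, 4.08746284125034]    # log2_normalized_counts_AGS1/2

fold = log2_fold_change(abcc, ags)
print(round(fold, 4))  # 0.7935, positive: higher in AGS

# Same gene in the orientation used for the microarray results.
microarray_style = -fold
print(round(microarray_style, 4))  # -0.7935
```

The result agrees with the reported log2_normalized_fold_change_ABCC_vs_AGS of 0.793549122532574 for this gene.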

BW,
Martin


NR_001278 	 CYP2B7P1  2.66749(ABCC) 5.98806(AGS) GFoldChange(ABCC vs. AGS) -9.99061

chr19	41430169	41456565	NR_001278	0.4515	+	CYP2B7P	retained_intron	27	24	18	23	20	20	85	132	29	32	116	87
NM_002421 	 MMP1 
2.72912
6.47495


chr11	102641232	102651359	NM_002425	0.3611	-	MMP10	protein_coding	NA	NA	-5.04439411935845

chr11	102641232	102651359	NM_002425	0.3611	-	MMP10	protein_coding	67	30	20	24	31	36	2	2	20	14	1	0

metaseqr_ABCC_vs_AGS/lists/normalized_counts_table.txt.gz
chromosome	start	end	gene_id	gc_content	strand	gene_name	biotype	ABCC1	ABCC2	ABCCC1	ABCCC2	ABFF1	ABFF2	ABFFF1	ABFFF2	cagAKO1	cagAKO2	AGS1	AGS2
chr14	23305741	23316808	NM_004995	0.5732	+	MMP14	protein_coding	6867	5006	8953	9281	8021	6727	6542	5594	9648	7502	8839	5953
chr16	58059469	58080804	NM_002428	0.5951	+	MMP15	protein_coding	3013	1825	2550	2727	2957	2842	1659	2026	3008	2333	2103	2023
chr20	33814538	33864804	NM_006690	0.491	+	MMP24	protein_coding	1173	1085	1036	1157	1342	1309	915	864	1133	891	856	861

/data/images/proton/run272/www/metaseqr_ABCC_vs_AGS/lists/metaseqr_all_out_ABCC_vs_AGS.txt.gz
0.793549122532574 log2_normalized_fold_change_ABCC_vs_AGS
chromosome	start	end	gene_id	gc_content	strand	gene_name	biotype	p-value_edger	FDR_edger	log2_normalized_fold_change_ABCC_vs_AGS	log2_normalized_mean_counts_ABCC	log2_normalized_median_counts_ABCC	log2_normalized_sd_counts_ABCC	log2_normalized_mad_counts_ABCC	log2_normalized_cv_counts_ABCC	log2_normalized_rcv_counts_ABCC	log2_normalized_mean_counts_AGS	log2_normalized_median_counts_AGS	log2_normalized_sd_counts_AGS	log2_normalized_mad_counts_AGS	log2_normalized_cv_counts_AGS	log2_normalized_rcv_counts_AGS	log2_normalized_counts_ABCC1	log2_normalized_counts_ABCC2	log2_normalized_counts_AGS1	log2_normalized_counts_AGS2
chr19	41497203	41524301	NM_000767	0.4463	+	CYP2B6	protein_coding	NA	NA	0.793549122532574	2.87744375108173	2.87744375108173	0.41363095099977	0.433632701784593	0.14374944804543	0.150700670211737	3.62869392134633	3.62869392134633	0.648797228523512	0.68017080064969	0.17879635003296	0.187442318198425	2.58496250072116	3.16992500144231	3.16992500144231	4.08746284125034


Dear Dionyssis,

our server is up again and the results are at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFFF_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_ABCCC/index.html

BW,
Martin


Dear Martin,
Thank you very much for your prompt response.
There seems to be a problem and I cannot connect with your server to download the data. I will try again later.
Finally I will also require the ABCCC vs ABCC pairing. Sorry I did not bring it up earlier.
Thanking you in advance
Best regards
Dionyssis
 
#@
Dear Dionyssis,

the additional pairs are now ready at:
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_ABFFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFFF_vs_cagAKO/index.html

Best wishes,
Martin

#@
15042016
Hi Martin,

In addition to the already existing pairs you have done for us in the transcriptome analysis, can you please do the following too?

1.        ABCCC vs. ABFFF

2.       ABCCC vs CagAKO

3.       ABFFF vs CagAKO

We are familiarizing ourselves with the data and the platform, but at the moment it is difficult without the help from an expert in bioinformatics.

I will call you next week for a meeting if possible before Easter.

Thanking you in advance

Dionyssis
#@

Dear Dionyssis,

thank you for the useful background information.

http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_ABFFF/index.html


#@
03/03/2016 05:05 PM
Dear Martin,

Thank you very much for your prompt reply.

Yes we have started this familiarization process with the reports, but since we are not bio-informatics specialists we need a bit of time and training….

At the moment however we will need to talk a bit in order to do some more comparisons which may answer some of our questions.

These are transcriptomes of H. pylori-infected gastric epithelial cells. AGS is the uninfected epithelial cell state.

The terms ABCC and ABCCC refer to H. pylori cagA-positive strains with 2- or 3- terminal EPIYA phosphorylation motifs present in CagA protein, which modulate its virulence.

ABFF and ABFFF refer to the phosphorylation-deficient EPIFA CagA mutants of the aforementioned H. pylori strains. CagAKO is the isogenic H. pylori CagA-knock out mutant produced by disruption of the cagA gene by kanamycin cassette insertion.

 
Questions we would like to ask with regards to the induction of specific genes and/or pathways are the following:

1.       H. pylori infected compared to uninfected condition. Comparisons ABCC vs AGS, CagAKO vs. AGS, ABFF vs AGS etc.

2.       The effect of CagA protein. Comparison between (ABCC vs. AGS) and (CagAKO vs AGS) states or is it a situation that can be answered if we simply compare ABCC vs. CagAKO?

3.       The effect of tyrosine phosphorylation (on EPIYA-C repeats) on virulence. Comparison between (ABCC vs AGS) and (ABFF vs AGS) OR (ABCCC vs AGS) and (ABFFF vs AGS) states, or is it a situation that can be answered if we simply compare ABCC vs. ABFF OR ABCCC vs. ABFFF?

4.       The effect of the number of EPIYA-C repeats on virulence. Comparison between (ABCC vs AGS) and (ABCCC vs AGS) OR (ABFF vs AGS) and (ABFFF vs AGS) states, or is it a situation that can be answered if we simply compare ABCC vs. ABCCC OR ABFF vs. ABFFF?

5.       As I also mentioned to you we have also done microarray studies for the ABCC, ABFF, CagAKO and uninfected AGS cases, with Vaggelis and Pantelis. Therefore, a comparison of the two methods with regards to the results would also be very interesting.

 
Please, let me know if we can meet or do a skype call at your earliest convenience to see how we can proceed. We would like to submit some of these data to a FEBS meeting (deadline 18th of March).

 
Thanking you in advance

Dionyssis

From: Martin Reczko [mailto:reczko@fleming.gr]
Sent: Thursday, March 03, 2016 3:53 PM
To: Dionyssios Sgouras
Cc: 'Pantelis Hatzis'; Yiannis Karayiannis
Subject: Re: Transcriptome analysis

 
Dear Dionyssis,

as my schedule is very tight this week, we can
discuss the results next week.
Have you familiarized yourself with the
reports of the metaseqr analysis and
inspected the xls lists of deregulated genes?
This analysis was implemented by P. Moulos
from our institute and is intended to be very
self-explanatory.
It is very helpful to check these results also
by inspecting the tracks on the genome browser.

Best wishes,
Martin


On 03/03/2016 12:31 PM, Dionyssios Sgouras wrote:

    Dear Martin,

    Thank you for the analysis. Obviously me and my students need your help to interpret the results and mine the data further.

    Shall we arrange to visit you tomorrow at Fleming after 14:00? Alternatively, we could extend our hospitality here at Pasteur Institute any time tomorrow, if it fits your schedule.

    I am in my office until 18:00 tonight and then you can always find me on the cell phone (6944634999).

    Thanking you in advance for your help.

    Kind regards

    Dionyssis

     
    ------------------------------------------

    Dionyssios N. Sgouras, PhD

    Principal Investigator

    Laboratory of Medical Microbiology

    Hellenic Pasteur Institute

    127 Vas. Sofias Avenue,

    115 21 Athens, Greece

    Tel: +302106478824

    Fax: +302106478832

    Email: sgouras@pasteur.gr

    Skype: dionyssios.sgouras

    URL: http://www.pasteur.gr/?page_id=835&lang=en


#@

sgouras@pasteur.gr


The processing has finished.
At
http://genomics-lab.fleming.gr/cgi-bin/hgTracks?db=hg19&hubUrl=http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/hub.txt
you will find tracks for all samples integrated in our UCSC genome browser mirror.


http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCCC_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFFF_vs_AGS/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_cagAKO_vs_AGS/index.html

http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_cagAKO/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFF_vs_cagAKO/index.html

http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABCC_vs_ABFF/index.html
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/metaseqr_ABFF_vs_ABFFF/index.html

If you need other pairings, let me know.
We can discuss these results either tomorrow after 16:00 or Friday after 14:00.

Best regards,
Martin Reczko


Using the URL
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/hub.txt
for the track hub you can view the results al

#@
 
RUN 269
IonXpressRNA_004         DSR4-ABCC2       29,093,823           98 bp
IonXpressRNA_005         DSR5-ABCCC1      20,963,303           102 bp
IonXpressRNA_006         DSR6-ABCCC2      31,620,908           105 bp

RUN 270
IonXpressRNA_001         DSR1-cagAKO1     26,483,035           112 bp
IonXpressRNA_002         DSR2-cagAKO2     30,817,434           103 bp
IonXpressRNA_003         DSR3-ABCC1       29,181,587           104 bp

RUN 271
IonXpressRNA_007         DSR7-ABFF1       27,149,506           105 bp
IonXpressRNA_008         DSR8-ABFF2       34,445,135           101 bp
IonXpressRNA_009         DSR9-ABFFF1      25,661,158           100 bp

RUN 272
IonXpressRNA_010         DSR10-ABFFF2     24,523,171           87 bp
IonXpressRNA_011         DSR11-AGS1       26,509,967           85 bp
IonXpressRNA_012         DSR12-AGS2       29,554,391           96 bp

Dear Dionyssios,
at
http://genomics-lab.fleming.gr/fleming/SgourasLab/run269-272/bam/
you will find the following bam files containing the reads and alignments:
ABCC1.bam  ABCCC1.bam  ABFF1.bam  ABFFF1.bam  AGS1.bam  cagAKO1.bam
ABCC2.bam  ABCCC2.bam  ABFF2.bam  ABFFF2.bam  AGS2.bam  cagAKO2.bam
The credentials are: Sgouras SgourasLab

Best wishes,
Martin