#/home/moulos/Rbase/bin/R
library(metaseqR)
the.path="/data/images/proton/run268/metaseqr"
metaseqr(
    sample.list=file.path(the.path,"targets.txt"),
    contrast=c("HUM_vs_SAL"),
    org="mm9",
    refdb="refseq",
    normalization="edger",
    statistics="edger",
    export.where=file.path(the.path,"metaseqr_Hum_vs_Sal"),
    qc.plots=c(
        "mds","biodetection","countsbio","saturation","readnoise",
        "correl","pairwise","boxplot","volcano","biodist","deheatmap"
    ),
    pcut=0.05,
    export.what=c("annotation","p.value","adj.p.value","fold.change","stats",
        "counts"),
    export.scale=c("log2"),
    export.values="normalized",
    export.counts.table=TRUE,
    restrict.cores=0.3,
fig.format=c("png","pdf")
)
The raw  bam  files, one for each RNA-Seq sample, were summarized to a gene read counts table, using the Bioconductor package GenomicRanges. In the final read counts table, each row represented one gene, each column one RNA-Seq sample and each cell, the corresponding read counts associated with each row and column.The gene counts table was normalized for inherent systematic or experimental biases (e.g. sequencing depth, gene length, GC content bias etc.) using the Bioconductor package edgeR after removing genes that had zero counts over all the RNA-Seq samples (3520 genes). The output of the normalization algorithm was a table with normalized counts, which can be used for differential expression analysis with statistical algorithms developed specifically for count data. Prior to the statistical testing procedure, the gene read counts were filtered for possible artifacts that could affect the subsequent statistical testing procedures. Genes presenting any of the following were excluded from further analysis: i) genes with length less than 500 (57 genes), ii) genes whose average reads per 100 bp was less than the 25<sup>th</sup> quantile of the total normalized distribution of average reads per 100bp (0 genes with cutoff value 0.04097 average reads per 100 bp), iii) genes with read counts below the median read counts of the total normalized count distribution (7716 genes with cutoff value 45 normalized read counts). The total number of genes excluded due to the application of gene filters was 3539. The total (unified) number of genes excluded due to the application of all filters was 11459. The resulting gene counts table was subjected to differential expression analysis for the contrasts HUM versus SAL using the Bioconductor package edgeR. The final numbers of differentially expressed genes were (per contrast): for the contrast HUM versus SAL, 550 (216) statistically significant genes were found with a p-value (FDR or adjusted p-value) threshold of 0.05 and of these, 86 (24) were up-regulated, 274 (192) were down-regulated and 190 (0) were not differentially expressed according to an absolute fold change cutoff value of 1 in log<sub>2</sub> scale.


metaseqr(
    sample.list=file.path(the.path,"targets.txt"),
    contrast=c("HUM_vs_SAL"),
    org="mm9",
    refdb="refseq",
    normalization="edger",
    statistics="edger",
    export.where=file.path(the.path,"metaseqr_Hum_vs_Sal2"),
    gene.filters=list(
        length=list(
            length=300
        ),
        avg.reads=list(
            average.per.bp=100,
            quantile=0.25
        ),
        expression=list(
            median=FALSE,
            mean=FALSE,
            quantile=0.25,
            known=NA,
            custom=NA
        ),
        biotype=NULL),
    qc.plots=c(
        "mds","biodetection","countsbio","saturation","readnoise",
        "correl","pairwise","boxplot","volcano","biodist","deheatmap"
    ),
    pcut=0.05,
    export.what=c("annotation","p.value","adj.p.value","fold.change","stats",
        "counts"),
    export.scale=c("log2"),
    export.values="normalized",
    export.counts.table=TRUE,
    restrict.cores=0.3,
fig.format=c("png","pdf")
)
#The gene counts table was normalized for inherent systematic or experimental biases (e.g. sequencing depth, gene length, GC content bias etc.) using the Bioconductor package edgeR after removing genes that had zero counts over all the RNA-Seq samples (3520 genes). The output of the normalization algorithm was a table with normalized counts, which can be used for differential expression analysis with statistical algorithms developed specifically for count data. Prior to the statistical testing procedure, the gene read counts were filtered for possible artifacts that could affect the subsequent statistical testing procedures. Genes presenting any of the following were excluded from further analysis: i) genes with length less than 300 (26 genes), ii) genes whose average reads per 100 bp was less than the 25th quantile of the total normalized distribution of average reads per 100bp (0 genes with cutoff value 0.04097 average reads per 100 bp), iii) genes with read counts below the 25th quantile of the normalized counts distribution (3498 genes with cutoff value 8 normalized read counts). The total number of genes excluded due to the application of gene filters was 3506. The total (unified) number of genes excluded due to the application of all filters was 8106. The resulting gene counts table was subjected to differential expression analysis for the contrasts HUM versus SAL using the Bioconductor package edgeR. The final numbers of differentially expressed genes were (per contrast): for the contrast HUM versus SAL, no statistical threshold defined.