Summary
Analysis summary
The raw BAM files, one per RNA-Seq sample, were summarized to a gene read counts table using the Bioconductor package GenomicRanges. In the final read counts table, each row represents one gene, each column one RNA-Seq sample, and each cell the read count for that gene in that sample.

Genes with zero counts across all RNA-Seq samples (7954 genes) were removed, and the gene counts table was then normalized for inherent systematic or experimental biases (e.g. sequencing depth, gene length, GC content bias) using the Bioconductor package edgeR. The output of the normalization algorithm was a table of normalized counts, which can be used for differential expression analysis with statistical algorithms developed specifically for count data.

Prior to statistical testing, the gene read counts were filtered for possible artifacts that could affect the subsequent tests. Genes meeting any of the following criteria were excluded from further analysis: i) genes with length less than 500 bp (204 genes); ii) genes whose average reads per 100 bp was below the 25th percentile of the total normalized distribution of average reads per 100 bp (0 genes, cutoff value 0.00238 average reads per 100 bp); iii) genes with read counts below the median of the total normalized count distribution (5412 genes, cutoff value 2 normalized read counts). The total number of genes excluded due to the application of gene filters was 1487. The total (unified) number of genes excluded due to the application of all filters was 13803.

The resulting gene counts table was subjected to differential expression analysis for the contrast WT versus DARE using the Bioconductor package edgeR. The final numbers of differentially expressed genes were (per contrast):
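As an illustration of the filtering steps described in the summary above, the following is a minimal Python sketch and not the pipeline's actual code (the analysis itself used Bioconductor packages in R). It assumes the counts are already normalized and stored as a dictionary of per-sample values. The function name `apply_gene_filters`, the toy gene names, and the exact rules are assumptions made for illustration: a gene fails the median filter here only when all of its sample counts fall below the cutoff, and average reads per 100 bp is taken as the mean count divided by gene length times 100.

```python
from statistics import mean, median, quantiles


def apply_gene_filters(counts, lengths, min_length=500):
    """Apply the three filters from the summary to a toy counts table.

    counts  : dict mapping gene -> list of normalized counts (one per sample)
    lengths : dict mapping gene -> gene length in bp
    Returns (kept_counts, report), where report maps filter name -> gene set.
    """
    # Drop genes with zero counts in every sample.
    expressed = {g: c for g, c in counts.items() if any(v > 0 for v in c)}
    all_zero = set(counts) - set(expressed)

    # Filter i: gene length below the minimum.
    too_short = {g for g in expressed if lengths[g] < min_length}

    # Filter ii: average reads per 100 bp below the 25th percentile.
    per_100bp = {g: mean(c) / lengths[g] * 100 for g, c in expressed.items()}
    q25 = quantiles(list(per_100bp.values()), n=4, method="inclusive")[0]
    low_density = {g for g, v in per_100bp.items() if v < q25}

    # Filter iii: every sample count below the median of all pooled counts
    # (assumed rule; the report does not spell out the per-gene criterion).
    pooled = [v for c in expressed.values() for v in c]
    cutoff = median(pooled)
    low_counts = {g for g, c in expressed.items() if all(v < cutoff for v in c)}

    # A gene is excluded if it fails any filter (the "unified" set).
    excluded = too_short | low_density | low_counts
    kept = {g: c for g, c in expressed.items() if g not in excluded}
    report = {
        "all_zero": all_zero,
        "too_short": too_short,
        "low_density": low_density,
        "low_counts": low_counts,
        "excluded": excluded,
    }
    return kept, report


# Toy data (hypothetical genes, three samples each).
toy_counts = {
    "geneA": [0, 0, 0],
    "geneB": [50, 60, 55],
    "geneC": [1, 0, 1],
    "geneD": [40, 45, 50],
    "geneE": [100, 120, 110],
}
toy_lengths = {"geneA": 1200, "geneB": 300, "geneC": 2000,
               "geneD": 1000, "geneE": 1500}

kept, report = apply_gene_filters(toy_counts, toy_lengths)
print(sorted(kept))
print({name: sorted(genes) for name, genes in report.items()})
```

Because a gene can fail more than one filter, the unified set is a union and is generally smaller than the sum of the per-filter counts. This is one reason the per-filter numbers in a report like this need not add up to the overall total.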