Welcome to the metaseqR2 report! If you are familiar with the metaseqR report, then you will find that there are not many differences with respect to the presented information. Some diagnostic and exploration plots were added. The most notable difference is that all plots are interactive. This helps a lot with exploration and interpretation but also adds a lot of computational burden. However, relatively modern systems with recent browser versions should be capable of rendering all the graphics. The metaseqR2 report has been tested with Google Chrome, Mozilla Firefox and Microsoft Edge. It has not been tested with Internet Explorer, Opera and Safari and most probably will not be. Other Chromium browsers (e.g. Brave) should also be fine.
The interactive plots are built from the data underlying the standard graphics, using libraries including Highcharts, Plotly and jvenn, to make them more user-friendly and directly explorable. Instructions on the usage of these plots follow:
metaseqr2 call) from the Results section.
The metaseqR2 report contains the sections described below, depending on which diagnostic and exploration plots were requested in the run command. Plots are grouped into categories, and a category appears only if at least one of its plots was requested. The categories are:
The Summary section is further categorized in several subsections. Specifically:
metaseqr2 pipeline for users that want to experiment, as well as critical messages from the R session running metaseqr2, displayed as a log. Finally, if a targets file has been used to perform the analysis, a table depicting the parameters in the targets file is created, together with a link to download the actual targets file. Any relative paths to BAM files are stripped, and the user is responsible for prepending them if the targets file has to be reused in another location, e.g. locally.
The Quality control section contains several interactive plots concerning the overall quality control of each sample provided as well as overall assessments. The quality control plots are the Multidimensional Scaling (MDS) plot, the Biotypes detection (Biodetection) plot, the Biotype abundance (Countsbio) plot, the Read saturation (Saturation) plot, the Read noise (ReadNoise) plot, the Correlation heatmap (Correlation), the Pairwise sample scatterplots (Pairwise) and the Filtered entities (Filtered) plot. Each plot is accompanied by a detailed description of what it depicts. Where multiple plots are available (e.g. one for each sample), a selection list at the top of the respective section allows the selection of the sample to be displayed.
The Normalization section contains several interactive plots that can be used to inspect and assess the normalization procedure. Normalization plots are therefore usually paired, showing the same data before and after normalization. The normalization plots are the Expression boxplots (Boxplots) plots, the GC content bias (GC bias) plots, the Gene length bias (Length bias) plots, the Within condition mean-difference (Mean-Difference) plots, the Mean-variance relationship (Mean-Variance) plot and the RNA composition (RNA composition) plot. Each plot is accompanied by a detailed description of what it depicts. Where multiple plots are available (e.g. one for each sample), a selection list at the top of the respective section allows the selection of the sample to be displayed.
The Statistics section contains several interactive plots that can be used to inspect and explore the outcome of statistical testing procedures. The statistics plots are the Volcano plot (Volcano), the MA or Mean-Difference across conditions (MA) plot, the Expression heatmap (Heatmap) plot, the Chromosome and biotype distributions (Biodist) plot, the Venn diagram across statistical tests (StatVenn), the Venn diagram across contrasts (FoldVenn) and the Deregulogram. Each plot is accompanied by a detailed description of what it depicts. Please note that the heatmap plots only show the top percentage of differentially expressed genes, as controlled by the reportTop parameter of the metaseqr2 pipeline. Where multiple plots are available (e.g. one for each contrast), a selection list at the top of the respective section allows the selection of the plot to be displayed.
The Results section contains a snapshot of differentially expressed genes in table format with basic information about each gene and links to external resources. Certain columns of the table are colored according to significance. Larger bars and more intense colors indicate higher significance. For example, the bar in the p_value column is larger if the gene has higher statistical significance and the fold change cell background is bright red if the gene is highly up-regulated. From the Results section, full gene lists can be downloaded in tab-delimited text format and viewed with a spreadsheet application such as MS Excel. A selector at the top of the section, above the table, allows the display of different contrasts.
The References section contains bibliographical references regarding the algorithms used by the metaseqr2 pipeline and is adjusted according to the algorithms selected.
The raw BAM files, one for each RNA-Seq sample, were summarized to a 3’ UTR read counts table using the Bioconductor package GenomicRanges. In the final read counts table, each row represented one counted feature, each column one RNA-Seq sample and each cell the corresponding read counts associated with each row and column.
The gene counts table was normalized for inherent systematic or experimental biases (e.g. sequencing depth, gene length, GC content bias etc.) using the Bioconductor package DESeq, after removing genes that had zero counts over all the RNA-Seq samples (29322 genes). The output of the normalization algorithm was a table with normalized counts, which can be used for differential expression analysis with statistical algorithms developed specifically for count data.
Prior to the statistical testing procedure, the gene read counts were filtered for possible artifacts that may affect the subsequent statistical testing procedures. Genes/transcripts presenting any of the following were excluded from further analysis: i) genes whose average number of reads per 100 bp was less than the 25th quantile of the total normalized distribution of average reads per 100 bp (0 genes with cutoff value 0.08718 average reads per 100 bp), ii) genes with read counts below the median read counts of the total normalized count distribution (11124 genes with cutoff value 13 normalized read counts), iii) genes whose biotype matched the following: rRNA, IG_V_pseudogene, TR_V_pseudogene (28 genes), iv) genes which, condition-wise, did not exceed 1 count in 50% of samples (9127 genes). The total number of genes excluded due to the application of gene filters was 10351. The total (unified) number of genes excluded due to the application of all filters was 40616.
The resulting gene counts table was subjected to differential expression analysis for the contrast Cond1 versus Cond2 using the Bioconductor packages DESeq, DESeq2, edgeR, NOISeq, limma, NBPSeq, ABSSeq, DSS.
In order to combine the statistical significance from multiple algorithms and perform meta-analysis, the PANDORA weighted p-value across results method was applied. The final numbers of differentially expressed genes were (per contrast): for the contrast Cond1 versus Cond2, no statistical threshold defined. Literature references for all the algorithms used can be found at the end of this report.
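The gene filtering described above can be illustrated with a small sketch. This is not the code metaseqR2 runs internally. The count matrix, gene names and biotype assignments are made up for illustration, and treating a gene's "read counts" as its mean across samples is an assumption. Only the cutoff of 13 and the list of excluded biotypes come from the text above. The sketch shows how genes flagged by individual filters are combined into one unified set of excluded genes.

```r
# Toy normalized count matrix: 4 hypothetical genes x 4 hypothetical samples
counts <- matrix(c(
  2, 4, 3, 1,       # geneA: low counts
  50, 60, 55, 65,   # geneB: well expressed
  30, 20, 25, 35,   # geneC: well expressed, but rRNA
  5, 8, 6, 9        # geneD: low counts
), nrow = 4, byrow = TRUE,
dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("S", 1:4)))

# Hypothetical biotype annotation for the toy genes
biotype <- c(geneA = "protein_coding", geneB = "protein_coding",
             geneC = "rRNA", geneD = "lincRNA")

# Expression filter: flag genes below the cutoff reported above (13).
# Using the per-gene mean across samples is an assumption of this sketch.
cutoff <- 13
low_expr <- names(which(rowMeans(counts) < cutoff))

# Biotype filter: flag genes whose biotype is in the excluded list
excluded_biotypes <- c("rRNA", "IG_V_pseudogene", "TR_V_pseudogene")
bad_biotype <- names(biotype)[biotype %in% excluded_biotypes]

# The unified set of excluded genes is the union over the individual filters
excluded <- union(low_expr, bad_biotype)
kept <- counts[setdiff(rownames(counts), excluded), , drop = FALSE]
```

Because the result is a union, a gene flagged by several filters is counted once, so the unified total can be smaller than the sum of the per-filter counts.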
Read counts file: imported sam/bam/bed files
Conditions: Cond1, Cond2
Samples included: ED3R9, ED3R10, ED3R11, ED3R12, ED3R13, ED3R14, ED3R15, ED3R16
Samples excluded: none
Requested contrasts: Cond1_vs_Cond2
Library sizes:
Organism: mouse (Mus musculus), genome version alias mm10
Annotation source: Ensembl genomes
Count type: utr
3’ UTR fraction: 1
3’ UTR minimum length: 300 bps
3’ UTR downstream: 50 bps
Exon filters: none applied
Gene filters: avgReads, expression, biotype, presence
Filter application: after normalization
Normalization algorithm: DESeq
Normalization arguments: locfunc
Statistical algorithm(s): DESeq, DESeq2, edgeR, NOISeq, limma, NBPSeq, ABSSeq, DSS
Statistical arguments for DESeq: method, sharingMode, fitType
Meta-analysis method: PANDORA weighted p-value across results
Multiple testing correction: Benjamini-Hochberg FDR
p-value threshold: not available
Logarithmic transformation offset: 1
Analysis preset: not available
Quality control plots: multidimensional scaling, biotype detection, biotype counts, sample and biotype saturation, RNA composition, GC-content bias, filtered biotypes, correlation heatmap and correlogram, boxplots, transcript length bias, mean-difference plot, mean-variance plot, boxplots, filtered biotypes, DEG biotype detection, volcano plot, statistical significance MA plot
Figure format: png, pdf, jpg
Output directory: /data/images/proton3/run464/EDlab/metaseqR2_run464
Output data: Annotation, p-value, Adjusted p-value (FDR), Combined p-value, Adjusted combined p-value (FDR), Fold change, Statistics, Read counts
Output scale(s): Natural scale, log2 scale, Reads per Gene Model
Output values: Normalized values
Output statistics: Mean, Median, Standard deviation, Median Absolute Deviation (MAD), Coefficient of Variation
Total run time: 11 minutes 14 seconds
Number of filtered genes: 40616 which is the union of
The differential expression analysis and this report were generated using the following command:
metaseqr2(sampleList = file.path(the.path, "targets.txt"), fileType = "bam",
contrast = the.contrasts.1, org = "mm10", localDb = "/data/results/tools/rnaseq/metaseqr/mm10/annotation.sqlite",
refdb = "ensembl", transLevel = "gene", countType = "utr",
normalization = "deseq", statistics = c("deseq", "deseq2",
"edger", "noiseq", "limma", "nbpseq", "absseq", "dss"),
adjustMethod = "fdr", metaP = "pandora", figFormat = c("png",
"pdf", "jpg"), exportWhere = file.path(the.path, "metaseqR2_run464"),
restrictCores = 0.5, qcPlots = c("mds", "biodetection", "countsbio",
"saturation", "readnoise", "rnacomp", "gcbias", "pairwise",
"filtered", "correl", "boxplot", "lengthbias", "meandiff",
"meanvar", "boxplot", "filtered", "biodist", "volcano",
"mastat"), exonFilters = NULL, geneFilters = list(avgReads = list(averagePerBp = 100,
quantile = 0.25), expression = list(median = TRUE, mean = FALSE,
quantile = NA, known = NA, custom = NA), biotype = getDefaults("biotypeFilter",
"mm10"), presence = list(frac = 0.5, minCount = 1, perCondition = TRUE)),
outList = TRUE, exportWhat = c("annotation", "p_value", "adj_p_value",
"meta_p_value", "adj_meta_p_value", "fold_change", "stats",
"counts", "flags"), exportScale = c("natural", "log2",
"rpgm"), exportValues = "normalized", exportStats = c("mean",
"median", "sd", "mad", "cv"), exportCountsTable = TRUE,
saveGeneModel = TRUE, createTracks = TRUE, overwrite = TRUE,
trackInfo = list(stranded = TRUE, normTo = 1e+08, hubInfo = list(name = "EDHub",
shortLabel = "ED Hub", longLabel = "ED hub long", email = "reczko@fleming.gr")))
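The command above sets adjustMethod = "fdr", reported in the summary as Benjamini-Hochberg FDR. As a minimal sketch of what that correction does, the example below applies base R's p.adjust to made-up p-values and reproduces the result by hand. It assumes the correction corresponds to p.adjust, where "fdr" is an alias for "BH"; the p-values are illustrative and not taken from this analysis.

```r
# Made-up p-values for five hypothetical genes
p <- c(0.001, 0.008, 0.039, 0.041, 0.2)

# Benjamini-Hochberg adjustment; "fdr" and "BH" are the same method in p.adjust
adj_bh  <- p.adjust(p, method = "BH")
adj_fdr <- p.adjust(p, method = "fdr")

# Manual computation: scale each p-value by n / rank, then enforce
# monotonicity by taking a running minimum from the largest p-value down
n  <- length(p)
o  <- order(p, decreasing = TRUE)
ro <- order(o)
adj_manual <- pmin(1, cummin(n / (n:1) * p[o]))[ro]

print(adj_bh)
# 0.00500 0.02000 0.05125 0.05125 0.20000
```

Note that the third p-value (0.039) receives the same adjusted value as the fourth (0.041): the running minimum keeps adjusted p-values in the same order as the raw ones.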
You can download the targets file from here
The following table summarizes the targets file used for the analysis. Do not forget to prepend the path to your BAM files before reusing the targets file in another location.
| samplename | filename | condition | paired | stranded |
|---|---|---|---|---|
| ED3R9 | ED3R9.bam | Cond1 | single | forward |
| ED3R10 | ED3R10.bam | Cond1 | single | forward |
| ED3R11 | ED3R11.bam | Cond1 | single | forward |
| ED3R12 | ED3R12.bam | Cond1 | single | forward |
| ED3R13 | ED3R13.bam | Cond2 | single | forward |
| ED3R14 | ED3R14.bam | Cond2 | single | forward |
| ED3R15 | ED3R15.bam | Cond2 | single | forward |
| ED3R16 | ED3R16.bam | Cond2 | single | forward |
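Prepending the BAM path to a downloaded targets file can be done in a few lines of base R. The sketch below rebuilds two rows of the table above inline so that it is self-contained. The directory in bam_dir and the output file name are hypothetical placeholders to be replaced with your own locations.

```r
# Two rows of the targets table shown above, rebuilt inline for illustration
targets <- data.frame(
  samplename = c("ED3R9", "ED3R13"),
  filename   = c("ED3R9.bam", "ED3R13.bam"),
  condition  = c("Cond1", "Cond2"),
  paired     = "single",
  stranded   = "forward",
  stringsAsFactors = FALSE
)

# Hypothetical local directory holding the BAM files (replace with your own)
bam_dir <- "/path/to/bam"

# Prepend the directory to each bare file name
targets$filename <- file.path(bam_dir, basename(targets$filename))

# Write the updated targets file as tab-delimited text
# (written to a temporary directory here; the file name is a placeholder)
out_file <- file.path(tempdir(), "targets_local.txt")
write.table(targets, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

With a real download, the inline data frame would be replaced by something like read.delim on the downloaded file, and the written file could then be passed as sampleList to metaseqr2.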
You can use this link to load a UCSC Genome Browser session with the tracks derived from this analysis. If stranded mode was chosen, a trackhub will be loaded, otherwise, simple tracks will be loaded.
You can download individual bigWig files, one for each sample, using the following list:
Plus (+) strand
ED3R9
ED3R10
ED3R11
ED3R12
ED3R13
ED3R14
ED3R15
ED3R16
Minus (-) strand
ED3R9
ED3R10
ED3R11
ED3R12
ED3R13
ED3R14
ED3R15
ED3R16
The following figures summarize the quality control steps and assessment performed by the metaseqr2 pipeline. Each figure category is accompanied by an explanatory text. All figures are interactive with additional controls on the top right corner of the figure.
The following figures allow for the assessment of the normalization procedures performed by the metaseqr2 pipeline. Each figure category is accompanied by an explanatory text. All figures are interactive with additional controls on the top right corner of the figure.
The following figures allow for the assessment of the statistical testing procedures performed by the metaseqr2 pipeline. Each figure category is accompanied by an explanatory text. All figures are interactive with additional controls on the top right corner of the figure.
The following tables allow for a quick exploration of the results of the statistical analysis performed by the metaseqr2 pipeline. If no statistical testing or contrasts were requested, ignore any related text and go straight to the tables or download the results.