Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added boxplots after batcheffect; boxplots now from transformed data; boxplot titles and filenames now also with genename; increased threshold before overlapping PCA labels are hidden #215

Merged
merged 6 commits into from
Aug 17, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- [#214](https://github.com/qbic-pipelines/rnadeseq/pull/214) Added smrnaseq input support -->follow-up to check that the CI test works. Modified smrnaseq testdata to be like QBiC datasets. Updated container env and fixed a resulting bug with include_graphics. Heatmaps are now equally ordered in zip and report
- [#213](https://github.com/qbic-pipelines/rnadeseq/pull/213) Added smrnaseq input support
- [#212](https://github.com/qbic-pipelines/rnadeseq/pull/212) Added computational methods if no --software_versions
- [#206](https://github.com/qbic-pipelines/rnadeseq/pull/206) Added logic to decide between rlog and vst, added tryCatch for heatmap saving because this only works unreliably
Expand All @@ -17,6 +16,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- [#215](https://github.com/qbic-pipelines/rnadeseq/pull/215) Boxplots of genes are now generated from rlog/vst counts instead of raw counts. Also, if batch effect correction is enabled, boxplots will be generated before and after correction
- [#214](https://github.com/qbic-pipelines/rnadeseq/pull/214) Added smrnaseq input support -->follow-up to check that the CI test works. Modified smrnaseq testdata to be like QBiC datasets. Updated container env and fixed a resulting bug with include_graphics. Heatmaps are now equally ordered in zip and report
- [#211](https://github.com/qbic-pipelines/rnadeseq/pull/21) Replaced heatmaply with pheatmap for static plots and removed kaleido and reticulate from container
- [#209](https://github.com/qbic-pipelines/rnadeseq/pull/209) Template update to 2.9, Chromium Falcon; exchanged file.exists for nf-core validation checks; changed round_DE param to int
- [#206](https://github.com/qbic-pipelines/rnadeseq/pull/206) Changed < and > to <= and => for logF/pval comparisons, renamed gene_counts_tables/deseq2_library_scaled_gene_counts.tsv to deseq2_library_scaled_gene_counts.tsv and added entry to the folder explanation; renamed param pval_threshold to adj_pval_threshold
Expand Down
128 changes: 108 additions & 20 deletions assets/RNAseq_report.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -1087,8 +1087,15 @@ ggsave(plot = pca_with_labels, filename = "differential_gene_expression/plots/PC
# Create interactive plot for report
pca <- ggplotly(pca, tooltip = "text", width=750, height=400)

#add export-as-SVG button to plot
# Add export-as-SVG button to plot
config(pca, modeBarButtonsToAdd = list(svg_exp)) %>% layout()

# Save non-corrected normalized object for boxplots
if (run_rlog) {
rld_noncorrected <- rld
} else {
vsd_noncorrected <- vsd
}
```

\
Expand All @@ -1101,8 +1108,10 @@ A PCA of the batch-effect corrected data is shown below and can be found [here](
```{r PCA_ifBatch_plot, echo=F, message=F, warning=F, dpi=1200, fig.align='center', eval=params$batch_effect}
if (run_rlog) {
assay(rld) <- limma::removeBatchEffect(assay(rld), rld$batch)
rld_corrected <- rld
} else {
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$batch)
vsd_corrected <- vsd
}
pcaData2 <- plotPCA(if (run_rlog) rld else vsd, intgroup=c("combfactor"), ntop = dim(if (run_rlog) rld else vsd)[1], returnData=TRUE)
percentVar <- round(100*attr(pcaData, "percentVar"))
Expand Down Expand Up @@ -1569,7 +1578,13 @@ pg[['x']][['layout']][['annotations']][[1]][['y']] <- -params$adj_pval_threshold
config(pg, modeBarButtonsToAdd = list(svg_exp)) %>% layout(margin = list(l=80, b = ifelse(length(allgenes_files)>2, 150, 100)), width=700, height = 500*length(allgenes_files))
```

```{r box_plots, echo=F, message =F}
```{r box_plots_noBatch, echo=F, message=F}
# Decide which object to use for boxplots without batch correction
if (run_rlog) {
box_object <- rld_noncorrected
} else {
box_object <- vsd_noncorrected
}

############### BOXPLOTS GENE EXPRESSION PER CONDITION ##########################
# extract ID for genes to plot, make 20 plots:
Expand All @@ -1580,14 +1595,21 @@ if (length(DE_genes_plot) > 20) {
} else {
random_DE_genes_plot = DE_genes_plot
}
for (i in random_DE_genes_plot){
colData(cds)$combfactor_contrasts <- apply(as.data.frame(metadata_save[ ,conditions_contrasts]),1,paste, collapse = "_")
boxplot_counts <- plotCounts(cds, gene=i, intgroup=c("combfactor_contrasts"), returnData=TRUE, normalized = T)
boxplot_counts$variable = row.names(boxplot_counts)
plot <- ggplot(data=boxplot_counts, aes(x=combfactor_contrasts, y=count, fill=combfactor_contrasts)) +

colData(box_object)$combfactor_contrasts <- apply(as.data.frame(metadata_save[ ,conditions_contrasts]),1,paste, collapse = "_")

# Previously, the function plotCounts was used here before ggplot. In PR #215, this was changed in order to allow for plotting the rlog/vst-normalized counts before/after batch correction (plotCounts can only do a normalization by sizeFactors). This was done following the instructions here: https://support.bioconductor.org/p/68859/
# 1. Get columns from coldata
box_df <- data.frame(colData(box_object))

for (i in random_DE_genes_plot) {

# 2. Get 1 column from assay; this column will correspond to the target gene
box_df$count <- assay(box_object)[i,]
plot <- ggplot(data=box_df, aes(x=combfactor_contrasts, y=count, fill=combfactor_contrasts)) +
geom_boxplot(position=position_dodge()) +
geom_jitter(position=position_dodge(.8)) +
ggtitle(paste("Gene ",i,sep="")) + xlab("") + ylab("Normalized gene counts") + theme_bw() +
ggtitle(paste("Gene ",i,sep="")) + xlab("") + ylab(paste0(ifelse(run_rlog, "rlog", "vst"), "-normalized gene counts")) + theme_bw() +
theme(text = element_text(size=12),
axis.text.x = element_text(angle=45, vjust=1,hjust=1))
ggsave(filename=paste("differential_gene_expression/plots/boxplots_example_genes/",i,".svg",sep=""), width=10, height=5, plot=plot)
Expand All @@ -1596,7 +1618,7 @@ for (i in random_DE_genes_plot){
}

# make boxplots of interesting genes in gene list
if (isProvided(params$path_genelist)){
if (isProvided(params$path_genelist)) {
gene_ids <- read.table(params$path_genelist, col.names = "requested_gene_name")
write.table(gene_ids, file="differential_gene_expression/metadata/requested_gene_list.txt", col.names=F, row.names=F, sep="\t")
gene_ids$requested_gene_name <- sapply(gene_ids$requested_gene_name, toupper)
Expand All @@ -1605,23 +1627,89 @@ if (isProvided(params$path_genelist)){
# get Ensemble IDs from requested genes
requested_genes_plot <- subset(gene_names, gene_name %in% gene_ids$requested_gene_name)

# Check that genes are in the cds table
requested_genes_plot <- subset(requested_genes_plot, requested_genes_plot$Ensembl_ID %in% row.names(cds))
# Check that genes are in the box_object table
requested_genes_plot <- subset(requested_genes_plot, requested_genes_plot$Ensembl_ID %in% row.names(box_object))
requested_genes_plot_Ensembl <- requested_genes_plot$Ensembl_ID
requested_genes_plot_gene_name <- requested_genes_plot$gene_name

for (i in seq_along(requested_genes_plot_Ensembl)) {
boxplot_counts <- plotCounts(cds, gene=requested_genes_plot_Ensembl[i], intgroup=c("combfactor_contrasts"), returnData=TRUE, normalized = T)
boxplot_counts$variable = row.names(boxplot_counts)
plot <- ggplot(data=boxplot_counts, aes(x=combfactor_contrasts, y=count, fill=combfactor_contrasts)) +
geom_boxplot(position=position_dodge()) +
geom_jitter(position=position_dodge(.8)) +
ggtitle(paste("Gene ",requested_genes_plot_gene_name[i],sep="")) + xlab("") + ylab("Normalized gene counts") + theme_bw() +
theme(text = element_text(size=12),
axis.text.x = element_text(angle=45, vjust=1,hjust=1))

# As above
box_df$count <- assay(box_object)[requested_genes_plot_Ensembl[i],]
plot <- ggplot(data=box_df, aes(x=combfactor_contrasts, y=count, fill=combfactor_contrasts)) +
geom_boxplot(position=position_dodge()) +
geom_jitter(position=position_dodge(.8)) +
ggtitle(paste("Gene ",requested_genes_plot_gene_name[i],sep="")) + xlab("") + ylab(paste0(ifelse(run_rlog, "rlog", "vst"), "-normalized gene counts")) + theme_bw() +
theme(text = element_text(size=12),
axis.text.x = element_text(angle=45, vjust=1,hjust=1))
ggsave(filename=paste("differential_gene_expression/plots/boxplots_requested_genes/",requested_genes_plot_gene_name[i],"_",requested_genes_plot_Ensembl[i],".svg",sep=""), width=10, height=5, plot=plot)
ggsave(filename=paste("differential_gene_expression/plots/boxplots_requested_genes/",requested_genes_plot_gene_name[i],"_",requested_genes_plot_Ensembl[i],".png",sep=""), width=10, height=5, plot=plot)
ggsave(filename=paste("differential_gene_expression/plots/boxplots_requested_genes/",requested_genes_plot_gene_name[i],"_",requested_genes_plot_Ensembl[i],".pdf",sep=""), width=10, height=5, plot=plot)
}
}
}
```

```{r box_plots_Batch, echo=F, message=F, eval=params$batch_effect}
# Decide which object to use for boxplots without batch correction
if (run_rlog) {
box_object <- rld_corrected
} else {
box_object <- vsd_corrected
}

############### BOXPLOTS GENE EXPRESSION PER CONDITION ##########################


colData(box_object)$combfactor_contrasts <- apply(as.data.frame(metadata_save[ ,conditions_contrasts]),1,paste, collapse = "_")

# Previously, the function plotCounts was used here before ggplot. In PR #215, this was changed in order to allow for plotting the rlog/vst-normalized counts before/after batch correction (plotCounts can only do a normalization by sizeFactors). This was done following the instructions here: https://support.bioconductor.org/p/68859/
# 1. Get columns from coldata
box_df <- data.frame(colData(box_object))

for (i in random_DE_genes_plot) {

# 2. Get 1 column from assay; this column will correspond to the target gene
box_df$count <- assay(box_object)[i,]
plot <- ggplot(data=box_df, aes(x=combfactor_contrasts, y=count, fill=combfactor_contrasts)) +
geom_boxplot(position=position_dodge()) +
geom_jitter(position=position_dodge(.8)) +
ggtitle(paste("Gene ",i,sep="")) + xlab("") + ylab(paste0(ifelse(run_rlog, "rlog", "vst"), "-normalized gene counts")) + theme_bw() +
theme(text = element_text(size=12),
axis.text.x = element_text(angle=45, vjust=1,hjust=1))
ggsave(filename=paste("differential_gene_expression/plots/boxplots_example_genes/",i,"_after_batchcorrect.svg",sep=""), width=10, height=5, plot=plot)
ggsave(filename=paste("differential_gene_expression/plots/boxplots_example_genes/",i,"_after_batchcorrect.png",sep=""), width=10, height=5, plot=plot)
ggsave(filename=paste("differential_gene_expression/plots/boxplots_example_genes/",i,"_after_batchcorrect.pdf",sep=""), width=10, height=5, plot=plot)
}

# make boxplots of interesting genes in gene list
if (isProvided(params$path_genelist)) {
gene_ids <- read.table(params$path_genelist, col.names = "requested_gene_name")
write.table(gene_ids, file="differential_gene_expression/metadata/requested_gene_list.txt", col.names=F, row.names=F, sep="\t", quote=F)
gene_ids$requested_gene_name <- sapply(gene_ids$requested_gene_name, toupper)
gene_names$gene_name <- sapply(gene_names$gene_name, toupper)

# get Ensemble IDs from requested genes
requested_genes_plot <- subset(gene_names, gene_name %in% gene_ids$requested_gene_name)

# Check that genes are in the box_object table
requested_genes_plot <- subset(requested_genes_plot, requested_genes_plot$Ensembl_ID %in% row.names(box_object))
requested_genes_plot_Ensembl <- requested_genes_plot$Ensembl_ID
requested_genes_plot_gene_name <- requested_genes_plot$gene_name

for (i in seq_along(requested_genes_plot_Ensembl)) {

# As above
box_df$count <- assay(box_object)[requested_genes_plot_Ensembl[i],]
plot <- ggplot(data=box_df, aes(x=combfactor_contrasts, y=count, fill=combfactor_contrasts)) +
geom_boxplot(position=position_dodge()) +
geom_jitter(position=position_dodge(.8)) +
ggtitle(paste("Gene ",requested_genes_plot_gene_name[i],sep="")) + xlab("") + ylab(paste0(ifelse(run_rlog, "rlog", "vst"), "-normalized gene counts")) + theme_bw() +
theme(text = element_text(size=12),
axis.text.x = element_text(angle=45, vjust=1,hjust=1))
ggsave(filename=paste("differential_gene_expression/plots/boxplots_requested_genes/",requested_genes_plot_gene_name[i],"_",requested_genes_plot_Ensembl[i],"_after_batchcorrect.svg",sep=""), width=10, height=5, plot=plot)
ggsave(filename=paste("differential_gene_expression/plots/boxplots_requested_genes/",requested_genes_plot_gene_name[i],"_",requested_genes_plot_Ensembl[i],"_after_batchcorrect.png",sep=""), width=10, height=5, plot=plot)
ggsave(filename=paste("differential_gene_expression/plots/boxplots_requested_genes/",requested_genes_plot_gene_name[i],"_",requested_genes_plot_Ensembl[i],"_after_batchcorrect.pdf",sep=""), width=10, height=5, plot=plot)
}
}
```

Expand Down
4 changes: 0 additions & 4 deletions testdata/onecontrast.tsv

This file was deleted.

Loading