Skip to content

Commit

Permalink
Merge pull request #547 from gagneurlab/dev
Browse files Browse the repository at this point in the history
Strand specific counting wrt samples in FRASER
  • Loading branch information
vyepez88 authored May 29, 2024
2 parents ef7e11e + 0306816 commit 1bc83fd
Show file tree
Hide file tree
Showing 15 changed files with 63 additions and 70 deletions.
22 changes: 15 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,27 +3,31 @@
[![Version](https://img.shields.io/github/v/release/gagneurlab/drop?include_prereleases)](https://github.com/gagneurlab/drop/releases)
[![Version](https://readthedocs.org/projects/gagneurlab-drop/badge/?version=latest)](https://gagneurlab-drop.readthedocs.io/en/latest)

The detection of RNA Outliers Pipeline (DROP) is an integrative workflow to detect aberrant expression, aberrant splicing, and mono-allelic expression from raw sequencing data.

The detection of RNA Outliers Pipeline (DROP) is an integrative workflow to detect aberrant expression, aberrant splicing, and mono-allelic expression from raw RNA sequencing data.

The manuscript is available in [Nature Protocols](https://www.nature.com/articles/s41596-020-00462-5).

The website containing the different reports of the Geuvadis demo dataset described in the paper can be found [here](https://cmm.in.tum.de/public/paper/drop_analysis/webDir/drop_analysis_index.html).
This [website](https://cmm.in.tum.de/public/paper/drop_analysis/webDir/drop_analysis_index.html) contains the different reports of the Geuvadis demo dataset described in the paper.

This [video](https://www.youtube.com/watch?v=XvgjiFQClhM&t=2761s) presents the tools used in DROP and their application to rare disease diagnostics.

This [video](https://www.youtube.com/watch?v=XvgjiFQClhM&t=2761s) introduces the tools used in DROP and their application to rare disease diagnostics.

<img src="drop_sticker.png" alt="drop logo" width="200" class="center"/>


## Quickstart
DROP is available on [bioconda](https://anaconda.org/bioconda/drop).
We recommend using a dedicated conda environment (`drop_env` in this example). Installation time: ~ 10min.
We recommend using a dedicated conda environment (`drop_env` in this example). Installation time: ~ 15 min.
```
mamba create -n drop_env -c conda-forge -c bioconda drop --override-channels
```

In the case of mamba/conda troubles, we recommend using the fixed `DROP_<version>.yaml` installation file we make available on our [public server](https://www.cmm.in.tum.de/public/paper/drop_analysis/). Install the current version and use the full path in the following command to install the conda environment `drop_env`

In the case troubles with mamba or conda, we recommend using the fixed `DROP_<version>.yaml` installation file we make available on our [public server](https://www.cmm.in.tum.de/public/paper/drop_analysis/). Install the current version and use the full path in the following command to install the conda environment `drop_env`
```
mamba env create -f DROP_1.3.4.yaml
mamba env create -f DROP_1.4.0.yaml
```

Test installation with demo project
Expand All @@ -47,6 +51,8 @@ For more information on different installation options, refer to the

## What's new


Version 1.4.0 fixes some bugs regarding the split reads counting for FRASER, which only affects cohorts containing samples with different strands (stranded forward, stranded reverse or unstranded). If your cohort contained samples with different strands, please rerun the AS module using version 1.4.0. In addition, due to snakemake updates affecting wBuild and the way we installed FRASER, installing DROP 1.3.3 no longer works. Version 1.3.4 fixes the FRASER version to ensure reproducibility and fixes certain scripts affected by the snakemake update. Running the pipeline with version 1.3.4 should provide the same outlier results as 1.3.3.
Due to snakemake updates affecting wBuild and the way we installed FRASER, installing DROP 1.3.3 no longer works. Version 1.3.4 simply fixes the FRASER version to ensure reproducibility and fixes certain scripts affected by the snakemake update. Running the pipeline with the version 1.3.4 should provide the same outlier results as 1.3.3.

Version 1.3.0 introduces the option to use FRASER 2.0 which is an improved version of FRASER that uses the Intron Jaccard Index metric instead of percent spliced in and splicing efficiency to quantify and later call aberrant splicing. To run FRASER 2.0, modify the `FRASER_version` parameter in the aberrantSplicing dictionary in the config file and adapt the `quantileForFiltering` and `deltaPsiCutoff` parameters. See the [config template](https://github.com/gagneurlab/drop/blob/master/drop/template/config.yaml) for more details. When switching between FRASER versions, we recommend running DROP in a
Expand All @@ -56,7 +62,9 @@ separate folder for each version. Moreover, DROP now allows users to provide lis

As of version 1.2.1 DROP has a new module that performs RNA-seq variant calling. The input are BAM files and the output either a single-sample or a multi-sample VCF file (option specified by the user) annotated with allele frequencies from gnomAD (if specified by the user). The sample annotation table does not need to be changed, but several new parameters in the config file have to be added and tuned. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#rna-variant-calling-dictionary).

Also, as of version 1.2.1 the integration of external split and non-split counts to detect aberrant splicing is now possible. Simply specify in a new column in the sample annotation the directory containing the counts. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples).

Also, as of version 1.2.1 the integration of external split and non-split counts to detect aberrant splicing is now possible. In a new column in the sample annotation, simply specify the directory containing the counts. For more info, refer to the [documentation](https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples).



## Set up a custom project
Expand Down
3 changes: 2 additions & 1 deletion docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,8 @@
author = 'Michaela Müller'

# The full version, including alpha/beta/rc tags
release_ = '1.3.4'
release_ = '1.4.0'




Expand Down
3 changes: 2 additions & 1 deletion docs/source/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,8 @@ Install the latest version and use the full path in the following command to ins

.. code-block:: bash
mamba env create -f DROP_1.3.4.yaml
mamba env create -f DROP_1.4.0.yaml
Installation time: ~ 10min

Expand Down
2 changes: 1 addition & 1 deletion drop/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,5 +4,5 @@
from . import utils
from . import demo

__version__ = "1.3.4"

__version__ = "1.4.0"
3 changes: 2 additions & 1 deletion drop/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@

@click.group()
@click_log.simple_verbosity_option(logger)
@click.version_option('1.3.4',prog_name='drop')

@click.version_option('1.4.0',prog_name='drop')


def main():
Expand Down
8 changes: 8 additions & 0 deletions drop/config/SampleAnnotation.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,14 @@ def parse(self, sep='\t'):
optional_columns = {"GENE_COUNTS_FILE", "SPLICE_COUNTS_DIR", "GENE_ANNOTATION", "GENOME"}

annotationTable = pd.read_csv(self.file, sep=sep, index_col=False)

# FRASER cannot handle a mixture of stranded and unstranded samples, raise error in such cases
if (annotationTable['STRAND'] == 'no').any() & ((annotationTable['STRAND'] == 'reverse').any() | (annotationTable['STRAND'] == 'yes').any()):
raise ValueError("Data contains a mix of stranded and unstranded samples. Please analyze them separately.\n")

if annotationTable['STRAND'].isnull().values.any():
raise ValueError("STRAND is not provided for some samples. All samples should have STRAND value in the sample annotation file.\n")

missing_cols = [x for x in self.SAMPLE_ANNOTATION_COLUMNS if x not in annotationTable.columns.values]
if len(missing_cols) > 0:
if "GENE_ANNOTATION" in missing_cols and "ANNOTATION" in annotationTable.columns.values:
Expand Down
28 changes: 14 additions & 14 deletions drop/demo/sample_annotation_relative.tsv
Original file line number Diff line number Diff line change
@@ -1,15 +1,15 @@
RNA_ID RNA_BAM_FILE DNA_VCF_FILE DNA_ID DROP_GROUP PAIRED_END COUNT_MODE COUNT_OVERLAPS STRAND HPO_TERMS GENE_COUNTS_FILE GENE_ANNOTATION GENOME SPLICE_COUNTS_DIR
HG00096 Data/rna_bam/HG00096_ncbi.bam Data/dna_vcf/demo_chr21_ncbi.vcf.gz HG00096 outrider,fraser,mae,batch_0 True IntersectionStrict True no HP:0009802,HP:0010896 ncbi
HG00103 Data/rna_bam/HG00103.bam Data/dna_vcf/demo_chr21.vcf.gz HG00103 outrider,fraser,mae,batch_1 True IntersectionStrict True no HP:0004582,HP:0031959 ucsc
HG00106 Data/rna_bam/HG00106.bam Data/dna_vcf/demo_chr21.vcf.gz HG00106 outrider,outrider_external,fraser,fraser_external,mae,batch_1 True IntersectionStrict True no HP:0002895,HP:0006731 ucsc
HG00111 Data/rna_bam/HG00111.bam Data/dna_vcf/demo_chr21.vcf.gz HG00111 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0100491,HP:0100871
HG00116 Data/rna_bam/HG00116.bam Data/dna_vcf/demo_chr21.vcf.gz HG00116 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0030613,HP:0012767
HG00126 Data/rna_bam/HG00126.bam Data/dna_vcf/demo_chr21.vcf.gz HG00126 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0000290,HP:0000293
HG00132 Data/rna_bam/HG00132.bam Data/dna_vcf/demo_chr21.vcf.gz HG00132 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0006489,HP:0006490
HG00149 Data/rna_bam/HG00149.bam Data/dna_vcf/demo_chr21.vcf.gz HG00149 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0000014,HP:0000020,HP:0032663
HG00150 Data/rna_bam/HG00150.bam Data/dna_vcf/demo_chr21.vcf.gz HG00150 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0030809,HP:0006144
HG00176 Data/rna_bam/HG00176.bam Data/dna_vcf/demo_chr21.vcf.gz HG00176 outrider,outrider_external,fraser,fraser_external True IntersectionStrict True no HP:0005215,HP:0010234
HG00178 outrider_external Data/external_count_data/geneCounts.tsv.gz v29
HG00181 outrider_external Data/external_count_data/geneCounts.tsv.gz v29
HG00191 fraser_external Data/external_count_data
HG00201 fraser_external Data/external_count_data
HG00096 Data/rna_bam/HG00096_ncbi.bam Data/dna_vcf/demo_chr21_ncbi.vcf.gz HG00096 outrider,fraser,mae,batch_0 TRUE IntersectionStrict TRUE no HP:0009802,HP:0010896 ncbi
HG00103 Data/rna_bam/HG00103.bam Data/dna_vcf/demo_chr21.vcf.gz HG00103 outrider,fraser,mae,batch_1 TRUE IntersectionStrict TRUE no HP:0004582,HP:0031959 ucsc
HG00106 Data/rna_bam/HG00106.bam Data/dna_vcf/demo_chr21.vcf.gz HG00106 outrider,outrider_external,fraser,fraser_external,mae,batch_1 TRUE IntersectionStrict TRUE no HP:0002895,HP:0006731 ucsc
HG00111 Data/rna_bam/HG00111.bam Data/dna_vcf/demo_chr21.vcf.gz HG00111 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0100491,HP:0100871
HG00116 Data/rna_bam/HG00116.bam Data/dna_vcf/demo_chr21.vcf.gz HG00116 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0030613,HP:0012767
HG00126 Data/rna_bam/HG00126.bam Data/dna_vcf/demo_chr21.vcf.gz HG00126 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0000290,HP:0000293
HG00132 Data/rna_bam/HG00132.bam Data/dna_vcf/demo_chr21.vcf.gz HG00132 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0006489,HP:0006490
HG00149 Data/rna_bam/HG00149.bam Data/dna_vcf/demo_chr21.vcf.gz HG00149 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0000014,HP:0000020,HP:0032663
HG00150 Data/rna_bam/HG00150.bam Data/dna_vcf/demo_chr21.vcf.gz HG00150 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0030809,HP:0006144
HG00176 Data/rna_bam/HG00176.bam Data/dna_vcf/demo_chr21.vcf.gz HG00176 outrider,outrider_external,fraser,fraser_external TRUE IntersectionStrict TRUE no HP:0005215,HP:0010234
HG00178 outrider_external no Data/external_count_data/geneCounts.tsv.gz v29
HG00181 outrider_external no Data/external_count_data/geneCounts.tsv.gz v29
HG00191 fraser_external no Data/external_count_data
HG00201 fraser_external no Data/external_count_data
10 changes: 4 additions & 6 deletions drop/installRPackages.R
Original file line number Diff line number Diff line change
Expand Up @@ -23,20 +23,18 @@ if (file.exists(args[1])){
package=gsub("=.*", "", unlist(args)),
version=gsub(".*=", "", unlist(args)),
ref="")
packages[package == version, c("min_version", "max_version") := ""]
packages[package == version, version:=NA]
}
installed <- as.data.table(installed.packages())

for (pckg_name in packages$package) {
package_dt <- packages[package == pckg_name]
pckg_name <- gsub(".*/", "", pckg_name)
min_version <- package_dt$min_version
max_version <- package_dt$max_version
version <- package_dt$version
branch <- package_dt$ref

if (!pckg_name %in% installed$Package || (min_version != "" && (compareVersion(
installed[Package == pckg_name, Version], min_version) < 0 || compareVersion(
installed[Package == pckg_name, Version], max_version) > 0))) {
if (!pckg_name %in% installed$Package || (!is.na(version) && compareVersion(
installed[Package == pckg_name, Version], version) < 0)) {

package <- package_dt$package
message(paste("install", package))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,16 +29,15 @@ params <- snakemake@config$aberrantSplicing
# Create initial FRASER object
col_data <- fread(colDataFile)

col_data$strand <- 0L

fds <- FraserDataSet(colData = col_data,
workingDir = workingDir,
name = paste0("raw-local-", dataset))

# Add paired end and strand specificity to the fds
pairedEnd(fds) <- colData(fds)$PAIRED_END
strandSpecific(fds) <- 'no'
if(uniqueN(colData(fds)$STRAND) == 1){
strandSpecific(fds) <- unique(colData(fds)$STRAND)
}
strandSpecific(fds) <- colData(fds)$STRAND

# Save initial FRASER dataset
fds <- saveFraserDataSet(fds)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ sample_id <- snakemake@wildcards[["sample_id"]]

# If data is not strand specific, add genome info
genome <- NULL
if(strandSpecific(fds) == 0){
if(strandSpecific(fds[, sample_id]) == 0){
genome <- getBSgenome(genomeAssembly)
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,9 @@ if(length(exCountIDs) > 0){
for(resource in unique(exCountFiles)){
exSampleIDs <- exCountIDs[exCountFiles == resource]
exAnno <- fread(sample_anno_file, key="RNA_ID")[J(exSampleIDs)]
exAnno[, strand:=exAnno$STRAND]
exAnno$strand<- sapply(exAnno[, strand], switch, 'no' = 0L, 'unstranded' = 0L,
'yes' = 1L, 'stranded' = 1L, 'reverse' = 2L, -1L)
setnames(exAnno, "RNA_ID", "sampleID")

ctsNames <- c("k_j", "k_theta", "n_psi3", "n_psi5", "n_theta")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,26 +54,10 @@ print('Results per junction extracted')

# Add features
if(nrow(res_junc_dt) > 0){

# dcast to have one row per outlier
subsets <- res_junc_dt[, unique(FDR_set)]
subsets <- subsets[subsets != "transcriptome-wide"]
colorder <- colnames(res_junc_dt[, !"FDR_set", with=FALSE])
res_junc_dt <- dcast(res_junc_dt, ... ~ FDR_set, value.var="padjust")
setnames(res_junc_dt, "transcriptome-wide", "padjust")
for(subset_name in subsets){
setnames(res_junc_dt, subset_name, paste0("padjust_", subset_name))
}
setcolorder(res_junc_dt, colorder)

# number of samples per gene and variant
res_junc_dt[, numSamplesPerGene := uniqueN(sampleID), by = hgncSymbol]
res_junc_dt[, numEventsPerGene := .N, by = "hgncSymbol,sampleID"]
res_junc_dt[, numSamplesPerJunc := uniqueN(sampleID), by = "seqnames,start,end,strand"]

# add colData to the results
res_junc_dt <- merge(res_junc_dt, as.data.table(colData(fds)), by = "sampleID")
res_junc_dt[, c("bamFile", "pairedEnd", "STRAND", "RNA_BAM_FILE", "DNA_VCF_FILE", "COUNT_MODE", "COUNT_OVERLAPS") := NULL]
} else{
warning("The aberrant splicing pipeline gave 0 intron-level results for the ", dataset, " dataset.")
}
Expand All @@ -83,15 +67,6 @@ res_gene <- results(fds, psiType=psiTypes,
aggregate=TRUE, collapse=FALSE,
all=TRUE)
res_genes_dt <- as.data.table(res_gene)
subsets <- res_genes_dt[, unique(FDR_set)]
subsets <- subsets[subsets != "transcriptome-wide"]
colorder <- colnames(res_genes_dt[, !c("padjust", "FDR_set"), with=FALSE])
res_genes_dt <- dcast(res_genes_dt[,!"padjust", with=FALSE], ... ~ FDR_set, value.var="padjustGene")
setnames(res_genes_dt, "transcriptome-wide", "padjustGene")
for(subset_name in subsets){
setnames(res_genes_dt, subset_name, paste0("padjustGene_", subset_name))
}
setcolorder(res_genes_dt, colorder)
print('Results per gene extracted')
write_tsv(res_genes_dt, file=snakemake@output$resultTableGene_full)

Expand All @@ -103,9 +78,6 @@ res_genes_dt <- res_genes_dt[do.call(pmin, c(res_genes_dt[,padj_cols, with=FALSE
totalCounts >= 5,]

if(nrow(res_genes_dt) > 0){
res_genes_dt <- merge(res_genes_dt, as.data.table(colData(fds)), by = "sampleID")
res_genes_dt[, c("bamFile", "pairedEnd", "STRAND", "RNA_BAM_FILE", "DNA_VCF_FILE", "COUNT_MODE", "COUNT_OVERLAPS") := NULL]

# add HPO overlap information
sa <- fread(snakemake@config$sampleAnnotation,
colClasses = c(RNA_ID = 'character', DNA_ID = 'character'))
Expand Down
8 changes: 4 additions & 4 deletions drop/requirementsR.txt
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
package min_version max_version ref
package version ref
devtools
gagneurlab/OUTRIDER 1.20.1 1.20.1 1.20.1
gagneurlab/FRASER 1.99.3 1.99.3 d6a422c
gagneurlab/tMAE 1.0.4 1.0.4 1.0.4
gagneurlab/OUTRIDER 1.20.1 1.20.1
gagneurlab/FRASER 1.99.4 1.99.4
gagneurlab/tMAE 1.0.4 1.0.4
VariantAnnotation
rmarkdown
knitr
Expand Down
3 changes: 2 additions & 1 deletion setup.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
[bumpversion]
current_version = 1.3.4

current_version = 1.4.0
commit = True

[bumpversion:file:setup.py]
Expand Down
3 changes: 2 additions & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,8 @@

setuptools.setup(
name="drop",
version="1.3.4",
version="1.4.0",


author="Vicente A. Yépez, Michaela Müller, Nicholas H. Smith, Daniela Klaproth-Andrade, Luise Schuller, Ines Scheller, Christian Mertes <mertes@in.tum.de>, Julien Gagneur <gagneur@in.tum.de>",
author_email="yepez@in.tum.de",
Expand Down

0 comments on commit 1bc83fd

Please sign in to comment.