Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gimme maelstrom #869

Merged
merged 119 commits into from
Jun 17, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
119 commits
Select commit Hold shift + click to select a range
97ecd3e
merge
siebrenf May 20, 2022
3fd67cd
merge develop
siebrenf May 20, 2022
52ff7a2
nitpicking
siebrenf May 20, 2022
95ca2b1
more renaming
siebrenf May 20, 2022
b5d419f
my secret sauce escaping again
siebrenf May 20, 2022
69933fc
sieve_bam
siebrenf May 20, 2022
13ee72a
Merge branch 'develop' into refactor
Maarten-vd-Sande May 24, 2022
4692f7c
revert subsampling
siebrenf May 24, 2022
91ef4d6
first work
Maarten-vd-Sande May 25, 2022
29ac1b1
Merge branch 'develop' into gimme_maelstrom
Maarten-vd-Sande May 25, 2022
2b16f08
not for broad peaks!
Maarten-vd-Sande May 25, 2022
0963bd7
more wip
Maarten-vd-Sande May 25, 2022
0524bf9
update requirements
Maarten-vd-Sande May 25, 2022
c077ee0
fix
Maarten-vd-Sande May 25, 2022
9e4175f
Update dag_tests.sh
Maarten-vd-Sande May 25, 2022
7c8a80f
fx
Maarten-vd-Sande May 26, 2022
2553b58
better
Maarten-vd-Sande May 26, 2022
90f03c6
Update gimme.yaml
Maarten-vd-Sande May 29, 2022
c4d52cb
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq2sci…
Maarten-vd-Sande May 30, 2022
e8a95b3
Merge branch 'develop' into gimme_maelstrom
Maarten-vd-Sande May 30, 2022
14a0e29
more worky
Maarten-vd-Sande May 31, 2022
13304b7
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq2sci…
Maarten-vd-Sande May 31, 2022
4bfcaad
used assembleis
Maarten-vd-Sande May 31, 2022
98c9475
something
Maarten-vd-Sande May 31, 2022
0f79702
more
Maarten-vd-Sande May 31, 2022
9f93975
more
Maarten-vd-Sande May 31, 2022
3600e0b
more
Maarten-vd-Sande May 31, 2022
710b184
more
Maarten-vd-Sande May 31, 2022
b39f40c
dsadsa
Maarten-vd-Sande May 31, 2022
7a82b0a
wip
Maarten-vd-Sande May 31, 2022
5aa42fa
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq2sci…
Maarten-vd-Sande May 31, 2022
a2b5cd5
check pls
Maarten-vd-Sande May 31, 2022
79490e2
Merge branch 'develop' into gimme_maelstrom
Maarten-vd-Sande May 31, 2022
9679816
more
Maarten-vd-Sande May 31, 2022
433c756
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq2sci…
Maarten-vd-Sande May 31, 2022
0223dba
cleanup
Maarten-vd-Sande May 31, 2022
ddc9a27
cleanup
Maarten-vd-Sande May 31, 2022
fe66a0d
fix
Maarten-vd-Sande May 31, 2022
ed77600
more fix
Maarten-vd-Sande Jun 1, 2022
5ade8fe
breps
Maarten-vd-Sande Jun 1, 2022
5150cde
does this work in one go?
Maarten-vd-Sande Jun 1, 2022
f5abec3
more wip
Maarten-vd-Sande Jun 1, 2022
900f976
wip
Maarten-vd-Sande Jun 1, 2022
aeaced9
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq1sci…
Maarten-vd-Sande Jun 1, 2022
336f24a
wip
Maarten-vd-Sande Jun 1, 2022
3ad4c11
wip
Maarten-vd-Sande Jun 1, 2022
4f1900b
test
Maarten-vd-Sande Jun 1, 2022
8f9b7f9
more
Maarten-vd-Sande Jun 1, 2022
512c510
Update motif_scan.smk
Maarten-vd-Sande Jun 1, 2022
92ff4e4
fix envs
Maarten-vd-Sande Jun 2, 2022
085ceb4
envs
Maarten-vd-Sande Jun 2, 2022
757ed16
requirements
Maarten-vd-Sande Jun 2, 2022
c00f7d0
Update gimme.yaml
Maarten-vd-Sande Jun 2, 2022
302a386
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
fcb7388
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
98d0d58
Update gimme.yaml
Maarten-vd-Sande Jun 2, 2022
66678f0
Update htmltable.yaml
Maarten-vd-Sande Jun 2, 2022
7ca3060
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
4343bae
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
f28e6f4
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
2eb1ad8
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
05d0b4a
Update gimme.yaml
Maarten-vd-Sande Jun 2, 2022
e618392
Update htmltable.yaml
Maarten-vd-Sande Jun 2, 2022
806cab6
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
b021e5f
Update motif_scan.smk
Maarten-vd-Sande Jun 2, 2022
c02699f
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
33e2d25
Update gimme.yaml
Maarten-vd-Sande Jun 2, 2022
7b9ea70
Update requirements.yaml
Maarten-vd-Sande Jun 2, 2022
9ceedbc
Update htmltable.yaml
Maarten-vd-Sande Jun 2, 2022
969f47d
thread parsing works okay again :D
Maarten-vd-Sande Jun 3, 2022
30486fe
slurm fixes
Maarten-vd-Sande Jun 3, 2022
8b04e5c
Delete Snakefile
Maarten-vd-Sande Jun 7, 2022
41e2530
Delete 2.txt
Maarten-vd-Sande Jun 7, 2022
7564dce
Delete 1.txt
Maarten-vd-Sande Jun 7, 2022
200d606
Delete 2.txt
Maarten-vd-Sande Jun 7, 2022
eccd31a
Update util.py
Maarten-vd-Sande Jun 7, 2022
6bbec1d
Update CHANGELOG.md
Maarten-vd-Sande Jun 7, 2022
ea78435
localrules
Maarten-vd-Sande Jun 7, 2022
f4d753c
Merge branch 'develop' into slurm
Maarten-vd-Sande Jun 7, 2022
325b60f
Merge branch 'develop' into gimme_maelstrom
Maarten-vd-Sande Jun 7, 2022
1c0ff08
Update gimme.yaml
Maarten-vd-Sande Jun 7, 2022
ab0c1ec
Merge branch 'slurm' into gimme_maelstrom
Maarten-vd-Sande Jun 7, 2022
0a926b9
breps fix
Maarten-vd-Sande Jun 9, 2022
5f6b181
Merge branch 'develop' into gimme_maelstrom
Maarten-vd-Sande Jun 9, 2022
156903e
Update peak_counts.smk
Maarten-vd-Sande Jun 10, 2022
cb0cd2d
Update run_tests.sh
Maarten-vd-Sande Jun 10, 2022
3f3639e
Update run_tests.sh
Maarten-vd-Sande Jun 10, 2022
433eeff
Update run_tests.sh
Maarten-vd-Sande Jun 10, 2022
60d8f0d
rename files
Maarten-vd-Sande Jun 10, 2022
d9d57e7
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq2sci…
Maarten-vd-Sande Jun 10, 2022
109e81a
new multiqc report
Maarten-vd-Sande Jun 13, 2022
8d4aebe
Merge branch 'develop' into gimme_maelstrom
Maarten-vd-Sande Jun 13, 2022
bc4f424
Update trackhub.smk
Maarten-vd-Sande Jun 13, 2022
ea5f1f1
Update motif_scan.smk
Maarten-vd-Sande Jun 13, 2022
ab9e05e
Update combine_biological_reps.py
Maarten-vd-Sande Jun 13, 2022
d679145
Update Snakefile
Maarten-vd-Sande Jun 13, 2022
b10d76d
Update Snakefile
Maarten-vd-Sande Jun 13, 2022
32f75a3
Update qc_peaks.smk
Maarten-vd-Sande Jun 13, 2022
e9b656c
Update htmltable.yaml
Maarten-vd-Sande Jun 13, 2022
885ca85
docs
Maarten-vd-Sande Jun 13, 2022
588a510
Update trackhub.smk
Maarten-vd-Sande Jun 13, 2022
5cb986e
Update htmltable.yaml
Maarten-vd-Sande Jun 14, 2022
2f18af2
no localrules
Maarten-vd-Sande Jun 14, 2022
66d3108
Update multiqc_config.yaml
Maarten-vd-Sande Jun 14, 2022
4efa754
Update multiqc_config.yaml
Maarten-vd-Sande Jun 14, 2022
58ca8ca
Update multiqc_config.yaml
Maarten-vd-Sande Jun 14, 2022
078f0da
Update qc.smk
Maarten-vd-Sande Jun 14, 2022
379f483
review
Maarten-vd-Sande Jun 17, 2022
9bbccf8
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq2sci…
Maarten-vd-Sande Jun 17, 2022
454195c
cpulimit
Maarten-vd-Sande Jun 17, 2022
a8e7420
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seqscie…
Maarten-vd-Sande Jun 17, 2022
e2ea9b0
fix
Maarten-vd-Sande Jun 17, 2022
cb4128b
fix
Maarten-vd-Sande Jun 17, 2022
9b15a7d
fix
Maarten-vd-Sande Jun 17, 2022
f534b22
Update util.py
Maarten-vd-Sande Jun 17, 2022
c3e01dc
Update combine_biological_reps.py
Maarten-vd-Sande Jun 17, 2022
372051e
last fixe
Maarten-vd-Sande Jun 17, 2022
a8fee23
Merge branch 'gimme_maelstrom' of github.com:vanheeringen-lab/seq2sci…
Maarten-vd-Sande Jun 17, 2022
565c91b
really last
Maarten-vd-Sande Jun 17, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ All changed fall under either one of these types: `Added`, `Changed`, `Deprecate

### Added

- Seq2science makes a biological replicates count table, which is the mean of the biological replicates.
- Seq2science now supports differential motif analysis by gimme maelstrom!!!
- configurable setting `niceness` which sets a niceness prefix to all shell commands.

### Fixed
Expand Down
11 changes: 9 additions & 2 deletions docs/content/workflows/atac_seq.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,11 +66,18 @@ Seq2science currently supports four different normalisation methods: quantile no

After these normalisations the counts are log normalised, and the base can be set with `logbase` and defaults to 2.
As a final step the count tables are mean-centered.
This final count table can be used for tools like [gimme maelstrom](https://gimmemotifs.readthedocs.io/en/master/reference.html#command-gimme-maelstrom) to scan for enriched transcription factor motifs.
Note that this table contains **all** peaks, and no selection on differential peaks has been made.

#### Differential motif analysis

Seq2science supports differential motif analysis!
This analysis is based on the tool [gimme maelstrom](https://gimmemotifs.readthedocs.io/en/master/reference.html#command-gimme-maelstrom).
As input to gimme maelstrom is the count table of the biological reps (average between reps) in the consensus peakset.
It then tries to solve a system of linear equations, where the output is the read counts in a peak, and the input is the motif score in the peak times the "motif activity".
This motif activity can then be compared across biological replicates for differential motifs.

#### Differential peak analysis
Seq2science can optionally use the raw peak counts table to perform differential peak analysis.
On top of that, Seq2science can also use the raw peak counts table to perform differential peak analysis.
See the [Differential gene/peak analysis page](../DESeq2.html) for more information!

#### Trackhub
Expand Down
11 changes: 9 additions & 2 deletions docs/content/workflows/chip_seq.md
Original file line number Diff line number Diff line change
Expand Up @@ -75,11 +75,18 @@ Seq2science currently supports four different normalisation methods: quantile no

After these normalisations the counts are log normalised, and the base can be set with `logbase` and defaults to 2.
As a final step the count tables are mean-centered.
This final count table can be used for tools like [gimme maelstrom](https://gimmemotifs.readthedocs.io/en/master/reference.html#command-gimme-maelstrom) to scan for enriched transcription factor motifs.
Note that this table contains **all** peaks, and no selection on differential peaks has been made.

#### Differential motif analysis

Seq2science supports differential motif analysis!
This analysis is based on the tool [gimme maelstrom](https://gimmemotifs.readthedocs.io/en/master/reference.html#command-gimme-maelstrom).
As input to gimme maelstrom is the count table of the biological reps (average between reps) in the consensus peakset.
It then tries to solve a system of linear equations, where the output is the read counts in a peak, and the input is the motif score in the peak times the "motif activity".
This motif activity can then be compared across biological replicates for differential motifs.

#### Differential peak analysis
Seq2science can optionally use the raw peak counts table to perform differential peak analysis.
On top of that, Seq2science can also use the raw peak counts table to perform differential peak analysis.
See the [Differential gene/peak analysis page](../DESeq2.html) for more information!

#### Trackhub
Expand Down
20,708 changes: 20,223 additions & 485 deletions docs/resources/MultiQC_Report.html

Large diffs are not rendered by default.

8 changes: 5 additions & 3 deletions seq2science/envs/gimme.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@ channels:
- bioconda
- defaults
dependencies:
- bioconda::gimmemotifs=0.15.1
Copy link
Member

@siebrenf siebrenf May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing via pip might be way faster, as this does not include most of the motif discovery tools (2 GB vs 250 MB). Some dependencies might be missing/outdated on pip.

unless we need motive scanning, then you're out of luck...

- bioconda::biofluff=3.0.3
- conda-forge::pandas=1.1.5 # won't be necessary after https://github.com/vanheeringen-lab/gimmemotifs/issues/168 is fixed
- bioconda::gimmemotifs=0.17.1
- bioconda::biofluff=3.0.4
- bioconda::gffread=0.12.7
- bioconda::orthofinder=2.5.4
- conda-forge::cpulimit=0.2
- conda-forge::conda-ecosystem-user-package-isolation=1.0
3 changes: 2 additions & 1 deletion seq2science/rules/configuration_explain.smk
Original file line number Diff line number Diff line change
Expand Up @@ -195,8 +195,9 @@ else:
"fastp_PE": f"Paired-end reads were trimmed with {href_v('fastp')}{options('trimoptions')}.",
"chipseeker": f"A peak feature distribution plot and peak localization plot relative to TSS were made with {hyperref('chipseeker')}.", # TODO: replace with href_v
"combine_peaks": f"A consensus set of summits was made with {href_v('gimmemotifs',text='gimmemotifs.combine_peaks',env='gimme')}.",
"gimme_maelstrom": f"Differential peaks analysis on the consensus peakset was performed with {href_v('gimmemotifs',text='gimme maelstrom',env='gimme')}.",
"bed_slop": f"All summits were extended with {config.get('slop')} bp to get a consensus peakset.",
"coverage_table": f"Finally, a count table from the consensus peakset with gimmemotifs.", # already cited in "combine_peaks"
"coverage_table": f"Finally, a count table from the consensus peakset was made with gimmemotifs.coverage_table.", # already cited in "combine_peaks"
"sce": f"{hyperref('sce')} S4 class was used to store scRNA-seq count tables and saved to RDATA format",
"sctk": f"scRNA count post-processing was performed using the {hyperref('sctk')} toolkit.",
}
21 changes: 17 additions & 4 deletions seq2science/rules/configuration_generic.smk
Original file line number Diff line number Diff line change
Expand Up @@ -199,7 +199,11 @@ def parse_samples():
os._exit(1) # noqa

# remove unused columns, and populate empty cells in used columns
samples_df = dense_samples(samples_df, config)
samples_df = dense_samples(samples_df,
config.get("technical_replicates") == "keep",
config.get("biological_replicates") == "keep",
config.get("ignore_strandedness", True),
config.get("create_trackhub", False))

# check if replicate names are unique between assemblies
if "technical_replicates" in samples_df:
Expand Down Expand Up @@ -335,15 +339,21 @@ if "assembly" in samples:
def parse_assemblies():
# list assemblies that are used in this workflow
used_assemblies = list(set(samples["assembly"]))

# we make a temporary _used assemblies as gimme maelstrom might need more assemblies downloaded
# and those assemblies need to be saved in the local/remote assemblies variable
if "motif2factors_reference" in config and config["run_gimme_maelstrom"]:
_used_assemblies = used_assemblies + config["motif2factors_reference"] + config["motif2factors_database_references"]
else:
_used_assemblies = used_assemblies

# dictionary with which providers to use per genome
providers = PickleDict(os.path.join(CACHE_DIR, "providers.p"))

# split into local and remote assemblies, depending on if all required files can be found
annotation_required = "rna_seq" in WORKFLOW or config["aligner"] == "star"
ftype = "annotation" if annotation_required else "genome"
local_assemblies = [a for a in used_assemblies if is_local(a, ftype, config)]
remote_assemblies = [a for a in used_assemblies if a not in local_assemblies]
local_assemblies = [a for a in _used_assemblies if is_local(a, ftype, config)]
remote_assemblies = [a for a in _used_assemblies if a not in local_assemblies]
search_assemblies = [assembly for assembly in remote_assemblies if providers.get(assembly, {}).get(ftype) is None]

# custom assemblies without provider (for local/remote annotations)
Expand Down Expand Up @@ -531,6 +541,9 @@ def any_given(*args, prefix="", suffix=""):
if column_name == "sample":
elements.extend(samples.index)
elif column_name in samples:
if (column_name == "assembly") and ("motif2factors_reference" in config):
elements.extend(config["motif2factors_database_references"])
elements.extend(config["motif2factors_reference"])
elements.extend(samples[column_name])

elements = [prefix + element + suffix for element in elements if isinstance(element, str)]
Expand Down
6 changes: 6 additions & 0 deletions seq2science/rules/configuration_workflows.smk
Original file line number Diff line number Diff line change
Expand Up @@ -62,9 +62,15 @@ if config.get("peak_caller", False):

if "--broad" in config["peak_caller"]["macs2"]:
config["macs2_types"].extend(["peaks.broadPeak", "peaks.gappedPeak"])
config["run_gimme_maelstrom"] = False
else:
config["macs2_types"].extend(["summits.bed", "peaks.narrowPeak"])

if "run_gimme_maelstrom" in config and config["run_gimme_maelstrom"]:
assert len(breps.index) > 1, (
"To run gimme maelstrom you need more than one biological replicate!"
)

# make sure that both maximum and minimum insert sizes are existing when one of them is used
if config.get("min_template_length") and not config.get("max_template_length"):
config["max_template_length"] = 1_000_000_000
Expand Down
2 changes: 1 addition & 1 deletion seq2science/rules/deseq2.smk
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ def deseq_input(wildcards):
elif "atac" in WORKFLOW or "chip" in WORKFLOW:
# only uses a single peak caller ------------------------------------------v
# TODO different peak callers can probably be supported with wildcard_constraint peak_caller (.*) <-- empty allowed
return (expand("{counts_dir}/{peak_caller}/{{assembly}}_raw.tsv", **config)[0],)
return (expand("{counts_dir}/{peak_caller}/{{assembly}}_raw_technical_reps.tsv", **config)[0],)
else:
logger.error(
f"The workflow you are running ({WORKFLOW}) does not support deseq2. "
Expand Down
76 changes: 76 additions & 0 deletions seq2science/rules/motif_scan.smk
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
"""
all rules/logic related to motif scanning of peaks is found here.
"""
def get_motif2factors_input_genomes(wildcards):
all_out = []
if str(wildcards.assembly)[0:4] in ["GRCh", "GRCm", "hg19", "hg38", "mm10", "mm39"]:
return all_out

for assembly in config.get("motif2factors_database_references", []) + config.get("motif2factors_reference", []) + [wildcards.assembly]:
all_out.append(expand(f"{{genome_dir}}/{assembly}/{assembly}.annotation.gtf", **config)[0])
all_out.append(expand(f"{{genome_dir}}/{assembly}/{assembly}.fa", **config)[0])
return all_out


rule motif2factors:
"""
Create a gimme pfm/motif2factor file with the gimme motif2factors command.
For human/mouse it just uses the default database, for other species it is based
on TF orthologs.
"""
input:
genome=rules.get_genome.output,
all_genomes=get_motif2factors_input_genomes
output:
expand("{result_dir}/gimme/{{assembly}}.{{gimme_database}}.pfm", **config),
log:
expand("{log_dir}/gimme/motif2factors/{{assembly}}-{{gimme_database}}.log", **config),
benchmark:
expand("{benchmark_dir}/gimme/motif2factors/{{assembly}}-{{gimme_database}}.log", **config)[0],
params:
genomes_dir=config.get("genome_dir"),
database=config.get("gimme_maelstrom_database"),
motif2factors_reference=config.get("motif2factors_reference")
threads: 24
conda:
"../envs/gimme.yaml"
script:
f"{config['rule_dir']}/../scripts/motif2factors.py"


rule gimme_maelstrom:
"""
Gimme maelstrom is a method to infer differential motifs between two or more biological replicates.
"""
input:
genome=rules.get_genome.output,
count_table=expand("{counts_dir}/{{peak_caller}}/{{assembly}}_log2_quantilenorm_biological_reps.tsv", **config),
pfm=rules.motif2factors.output
output:
output=directory(expand("{result_dir}/gimme/maelstrom/{{assembly}}-{{gimme_database}}-{{peak_caller}}", **config)),
cache=temp(directory(expand("{result_dir}/gimme/maelstrom/cache/{{assembly}}-{{gimme_database}}-{{peak_caller}}", **config))),
params: config.get("gimme_maelstrom_params", "")
log:
expand("{log_dir}/gimme_maelstrom/{{assembly}}-{{gimme_database}}-{{peak_caller}}.log", **config),
benchmark:
expand("{benchmark_dir}/gimme_maelstrom/{{assembly}}-{{gimme_database}}-{{peak_caller}}.log", **config)[0],
resources:
mem_gb=40,
message: EXPLAIN["gimme_maelstrom"]
conda:
"../envs/gimme.yaml"
threads: 24
shell:
"""
NEW_CACHE={output.cache}
mkdir -p $NEW_CACHE
if [ -z $XDG_CACHE_HOME ]; then
XDG_CACHE_HOME=$HOME/.cache
fi
cp -r $XDG_CACHE_HOME/gimmemotifs $NEW_CACHE/
export XDG_CACHE_HOME=$NEW_CACHE
""" +
("cpulimit --include-children -l {threads}00 --\\" if config.get("cpulimit", True) else "") +
"""
gimme maelstrom {input.count_table} {input.genome} {output.output} --pfmfile {input.pfm} --nthreads {threads} {params} > {log} 2>&1
"""
81 changes: 57 additions & 24 deletions seq2science/rules/peak_counts.smk
Original file line number Diff line number Diff line change
Expand Up @@ -14,28 +14,40 @@ def count_table_output():
if ftype != "narrowPeak":
return []

return expand(
expanddict = {
"assemblies": ALL_ASSEMBLIES,
"peak_caller": config["peak_caller"].keys(),
"normalization": [
"raw",
f"meancenter_log{config['logbase']}_quantilenorm",
f"meancenter_log{config['logbase']}_TMM",
f"meancenter_log{config['logbase']}_RLE",
f"meancenter_log{config['logbase']}_upperquartile",
],
"mc": ["", "meancenter_"],
}

count_tables = expand(
[
"{counts_dir}/{peak_caller}/{assemblies}_{normalization}.tsv",
"{counts_dir}/{peak_caller}/{assemblies}_{normalization}_technical_reps.tsv",
"{counts_dir}/{peak_caller}/{assemblies}_onehotpeaks.tsv",
],
**{
**config,
**{
"assemblies": ALL_ASSEMBLIES,
"peak_caller": config["peak_caller"].keys(),
"normalization": [
"raw",
f"meancenter_log{config['logbase']}_quantilenorm",
f"meancenter_log{config['logbase']}_TMM",
f"meancenter_log{config['logbase']}_RLE",
f"meancenter_log{config['logbase']}_upperquartile",
],
"mc": ["", "meancenter_"],
},
},
**{**config, **expanddict}
)
if breps is not treps:
expanddict["normalization"] = [
f"log{config['logbase']}_quantilenorm",
f"log{config['logbase']}_TMM",
f"log{config['logbase']}_RLE",
]
count_tables.extend(
expand(
"{counts_dir}/{peak_caller}/{assemblies}_{normalization}_biological_reps.tsv",
**{**config, **expanddict}
)
)

return count_tables

def get_peakfile_for_summit(wildcards):
ftype = get_peak_ftype(wildcards.peak_caller)
Expand Down Expand Up @@ -167,7 +179,7 @@ rule coverage_table:
replicates=get_coverage_table_replicates("bam"),
replicate_bai=get_coverage_table_replicates("bam.bai"),
output:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_raw.tsv", **config),
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_raw_technical_reps.tsv", **config),
log:
expand("{log_dir}/coverage_table/{{assembly}}-{{peak_caller}}.log", **config),
benchmark:
Expand Down Expand Up @@ -209,7 +221,7 @@ rule quantile_normalization:
input:
rules.coverage_table.output,
output:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_quantilenorm.tsv", **config),
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_quantilenorm_technical_reps.tsv", **config),
log:
expand("{log_dir}/quantile_normalization/{{assembly}}-{{peak_caller}}-quantilenorm.log", **config),
benchmark:
Expand All @@ -234,7 +246,7 @@ rule edgeR_normalization:
input:
rules.coverage_table.output,
output:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_{{normalisation,(TMM|RLE|upperquartile)}}.tsv", **config),
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_{{normalisation,(TMM|RLE|upperquartile)}}_technical_reps.tsv", **config),
log:
expand("{log_dir}/edgeR_normalization/{{assembly}}-{{peak_caller}}-{{normalisation}}.log", **config),
benchmark:
Expand All @@ -254,9 +266,9 @@ rule log_normalization:
Log1p normalization of a count table.
"""
input:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_{{normalisation}}.tsv", **config),
rules.edgeR_normalization.output
output:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_log{{base}}_{{normalisation}}.tsv", **config),
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_log{{base}}_{{normalisation}}_technical_reps.tsv", **config),
run:
import pandas as pd
import numpy as np
Expand Down Expand Up @@ -287,9 +299,9 @@ rule mean_center:
Mean centering of a count table.
"""
input:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_log{{base}}_{{normalisation}}.tsv", **config),
rules.log_normalization.output
output:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_meancenter_log{{base}}_{{normalisation}}.tsv", **config),
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_meancenter_log{{base}}_{{normalisation}}_technical_reps.tsv", **config),
run:
import pandas as pd

Expand All @@ -311,6 +323,27 @@ rule mean_center:
)


rule combine_biological_reps:
"""
Combine biological replicates by taking their mean!
"""
input:
rules.log_normalization.output
output:
expand("{counts_dir}/{{peak_caller}}/{{assembly}}_log{{base}}_{{normalisation}}_biological_reps.tsv", **config),
log:
expand("{log_dir}/combine_biological_reps/{{peak_caller}}/{{assembly}}_log{{base}}_{{normalisation}}_biological_reps.log", **config),
benchmark:
expand("{benchmark_dir}/combine_biological_reps/{{peak_caller}}/{{assembly}}_log{{base}}_{{normalisation}}_biological_reps.benchmark.txt", **config)[0]
params:
samples=lambda wildcards: samples[samples["assembly"] == wildcards.assembly].replace(' ', '_', regex=True).to_string(index_names=False),
breps=lambda wildcards: breps[breps["assembly"] == wildcards.assembly].index.to_list(),
technical_replicates=config.get("technical_replicates") == "keep",
biological_replicates=config.get("biological_replicates") == "keep",
script:
f"{config['rule_dir']}/../scripts/combine_biological_reps.py"


def get_all_narrowpeaks(wildcards):
return [
f"{config['result_dir']}/{{peak_caller}}/{{assembly}}-{replicate}_peaks.narrowPeak"
Expand Down
8 changes: 8 additions & 0 deletions seq2science/rules/qc.smk
Original file line number Diff line number Diff line change
Expand Up @@ -1036,4 +1036,12 @@ def get_peak_calling_qc(sample, wildcards):
)
)

if config["run_gimme_maelstrom"]:
output.extend(
expand(
"{qc_dir}/gimme/{{assembly}}-{gimme_maelstrom_database}-{peak_caller}_mqc.html",
**config
)
)

return output
14 changes: 14 additions & 0 deletions seq2science/rules/qc_peaks.smk
Original file line number Diff line number Diff line change
Expand Up @@ -124,3 +124,17 @@ rule upset_plot_peaks:
"../envs/upset.yaml"
script:
f"{config['rule_dir']}/../scripts/upset.py"


rule maelstrom_report_preparation:
"""
This rule injects the symlinked motif logos into the html so they get rendered in the multiqc report.
"""
input:
rules.gimme_maelstrom.output
output:
expand("{qc_dir}/gimme/{{assembly}}-{{gimme_database}}-{{peak_caller}}_mqc.html", **config)
log:
expand("{log_dir}/maelstrom_report_preparation/{{assembly}}-{{gimme_database}}-{{peak_caller}}.log", **config),
script:
f"{config['rule_dir']}/../scripts/maelstrom_report.py"
Loading