Commit

review
uniqueg committed Feb 3, 2024
1 parent b268328 commit f715ccc
Showing 12 changed files with 21 additions and 70 deletions.
29 changes: 19 additions & 10 deletions README.md
@@ -269,17 +269,20 @@ your run.
# Sample downloads from SRA
An independent Snakemake workflow `workflow/rules/sra_download.smk` is included
for the download of SRA samples.
for the download of sequencing libraries from the Sequence Read Archive and
conversion into FASTQ.
The workflow expects the following config:
* `samples`, a sample table (tsv) with column *sample* containing *RR* identifiers,
see example [here](tests/input_files/sra_samples.tsv).
The workflow expects the following parameters in the configuration file:
* `samples`, a sample table (tsv) with column *sample* containing *SRR*
identifiers (ERR and DRR are also supported), see
[example](tests/input_files/sra_samples.tsv).
* `outdir`, an output directory
* `samples_out`, a pointer to a modified sample table with location of fastq files
* `samples_out`, a pointer to a modified sample table with the locations of
the corresponding FASTQ files
* `cluster_log_dir`, the cluster log directory.
For executing the example one can use the following conda execution
(with activated *zarp* environment):
For executing the example with Conda environments, one can use the following
command (from within the activated `zarp` Conda environment):
```bash
snakemake --snakefile="workflow/rules/sra_download.smk" \
@@ -290,10 +293,16 @@ snakemake --snakefile="workflow/rules/sra_download.smk" \
log_dir="logs" \
cluster_log_dir="logs/cluster_log"
```
or the singularity one by replacing ```local-conda``` with ```local-singularity```
After successful execution, `results/sra_downloads/sra_samples.out.tsv` should contain:
Alternatively, change the argument to `--profile` from `local-conda` to
`local-singularity` to execute the workflow steps within Singularity
containers.
After successful execution, `results/sra_downloads/sra_samples.out.tsv` should
contain:
```tsv
ssample fq1 fq2
sample fq1 fq2
SRR18552868 results/sra_downloads/compress/SRR18552868/SRR18552868.fastq.gz
SRR18549672 results/sra_downloads/compress/SRR18549672/SRR18549672_1.fastq.gz results/sra_downloads/compress/SRR18549672/SRR18549672_2.fastq.gz
ERR2248142 results/sra_downloads/compress/ERR2248142/ERR2248142.fastq.gz
Binary file removed pigz_latest.sif
4 changes: 1 addition & 3 deletions tests/test_htsinfer_workflow/test.local.sh
@@ -42,6 +42,4 @@ snakemake \
--keep-incomplete

# Check md5 sum of some output files
#find results/ -type f -name \*\.gz -exec gunzip '{}' \;
#find results/ -type f -name \*\.zip -exec sh -c 'unzip -o {} -d $(dirname {})' \;
md5sum --check "expected_output.md5"
md5sum --check "expected_output.md5"
1 change: 0 additions & 1 deletion tests/test_integration_workflow/test.local.sh
@@ -83,4 +83,3 @@ diff \
diff \
<(cat results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.salmon.pe/quant.genes.sf | cut -f1,5 | tail -n +2 | sort -k1,1) \
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')

2 changes: 1 addition & 1 deletion tests/test_integration_workflow/test.slurm.sh
@@ -82,4 +82,4 @@ diff \
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')
diff \
<(cat results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.salmon.pe/quant.genes.sf | cut -f1,5 | tail -n +2 | sort -k1,1) \
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')
11 changes: 0 additions & 11 deletions tests/test_integration_workflow/test.temp.flag.sh
@@ -41,17 +41,6 @@ find results/ -type f -name \*\.gz -exec gunzip '{}' \;
find results/ -type f -name \*\.zip -exec sh -c 'unzip -o {} -d $(dirname {})' \;
md5sum --check "expected_output_temp_flag.md5"

# Checksum file generated with
# find results/ \
# -type f \
# -name \*\.gz \
# -exec gunzip '{}' \;
# find results/ \
# -type f \
# -name \*\.zip \
# -exec sh -c 'unzip -o {} -d $(dirname {})' \;
# md5sum $(cat expected_output.files) > expected_output_temp_flag.md5

# Check whether STAR produces expected alignments
# STAR alignments need to be fully within ground truth alignments for tests to pass; not checking
# vice versa because processing might cut off parts of reads (if testing STAR directly, add '-f 1'
3 changes: 0 additions & 3 deletions tests/test_integration_workflow_multiple_lanes/test.local.sh
@@ -42,8 +42,6 @@ find results/ -type f -name \*\.gz -exec gunzip '{}' \;
find results/ -type f -name \*\.zip -exec sh -c 'unzip -o {} -d $(dirname {})' \;
md5sum --check "expected_output.md5"



# Check whether STAR produces expected alignments
# STAR alignments need to be fully within ground truth alignments for tests to pass; not checking
# vice versa because processing might cut off parts of reads (if testing STAR directly, add '-f 1'
@@ -74,4 +72,3 @@ diff \
diff \
<(cat results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.salmon.pe/quant.genes.sf | cut -f1,5 | tail -n +2 | sort -k1,1) \
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')

13 changes: 0 additions & 13 deletions tests/test_integration_workflow_multiple_lanes/test.slurm.sh
@@ -42,17 +42,6 @@ find results/ -type f -name \*\.gz -exec gunzip '{}' \;
find results/ -type f -name \*\.zip -exec sh -c 'unzip -o {} -d $(dirname {})' \;
md5sum --check "expected_output.md5"

# Checksum file generated with
# find results/ \
# -type f \
# -name \*\.gz \
# -exec gunzip '{}' \;
# find results/ \
# -type f \
# -name \*\.zip \
# -exec sh -c 'unzip -o {} -d $(dirname {})' \;
# md5sum $(cat expected_output.files) > expected_output.md5

# Check whether STAR produces expected alignments
# STAR alignments need to be fully within ground truth alignments for tests to pass; not checking
# vice versa because processing might cut off parts of reads (if testing STAR directly, add '-f 1'
@@ -83,5 +72,3 @@ diff \
diff \
<(cat results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.salmon.pe/quant.genes.sf | cut -f1,5 | tail -n +2 | sort -k1,1) \
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')


12 changes: 0 additions & 12 deletions tests/test_integration_workflow_with_conda/test.local.sh
@@ -42,17 +42,6 @@ find results/ -type f -name \*\.gz -exec gunzip '{}' \;
find results/ -type f -name \*\.zip -exec sh -c 'unzip -o {} -d $(dirname {})' \;
md5sum --check "expected_output.md5"

# Checksum file generated with
#find results/ \
# -type f \
# -name \*\.gz \
# -exec gunzip '{}' \;
#find results/ \
# -type f \
# -name \*\.zip \
# -exec sh -c 'unzip -o {} -d $(dirname {})' \;
#md5sum $(cat expected_output.files) > expected_output.md5

# Check whether STAR produces expected alignments
# STAR alignments need to be fully within ground truth alignments for tests to pass; not checking
# vice versa because processing might cut off parts of reads (if testing STAR directly, add '-f 1'
@@ -83,4 +72,3 @@ diff \
diff \
<(cat results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.salmon.pe/quant.genes.sf | cut -f1,5 | tail -n +2 | sort -k1,1) \
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')

13 changes: 0 additions & 13 deletions tests/test_integration_workflow_with_conda/test.slurm.sh
@@ -42,17 +42,6 @@ find results/ -type f -name \*\.gz -exec gunzip '{}' \;
find results/ -type f -name \*\.zip -exec sh -c 'unzip -o {} -d $(dirname {})' \;
md5sum --check "expected_output.md5"

# Checksum file generated with
# find results/ \
# -type f \
# -name \*\.gz \
# -exec gunzip '{}' \;
# find results/ \
# -type f \
# -name \*\.zip \
# -exec sh -c 'unzip -o {} -d $(dirname {})' \;
# md5sum $(cat expected_output.files) > expected_output.md5

# Check whether STAR produces expected alignments
# STAR alignments need to be fully within ground truth alignments for tests to pass; not checking
# vice versa because processing might cut off parts of reads (if testing STAR directly, add '-f 1'
@@ -83,5 +72,3 @@ diff \
diff \
<(cat results/samples/synthetic_10_reads_paired_synthetic_10_reads_paired/synthetic_10_reads_paired_synthetic_10_reads_paired.salmon.pe/quant.genes.sf | cut -f1,5 | tail -n +2 | sort -k1,1) \
<(cat ../input_files/synthetic.mate_1.bed | cut -f7 | sort | uniq -c | sort -k2nr | awk '{printf($2"\t"$1"\n")}')


2 changes: 0 additions & 2 deletions tests/test_sra_download_with_conda/test.local.sh
@@ -32,8 +32,6 @@ snakemake --snakefile="../../workflow/rules/sra_download.smk" \
log_dir="logs" \
cluster_log_dir="logs/cluster_log"



# Check md5 sum of some output files
find results/ -type f -name \*\.gz -exec gunzip '{}' \;
find results/ -type f -name \*\.zip -exec sh -c 'unzip -o {} -d $(dirname {})' \;
1 change: 0 additions & 1 deletion tests/test_sra_download_with_conda/test.slurm.sh
@@ -31,7 +31,6 @@ snakemake --snakefile="../../workflow/rules/sra_download.smk" \
samples_out="results/sra_downloads/sra_samples.out.tsv" \
log_dir="logs" \
cluster_log_dir="logs/cluster_log"


# Check md5 sum of some output files
find results/ -type f -name \*\.gz -exec gunzip '{}' \;
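The README text changed in this commit says the `samples` table for `workflow/rules/sra_download.smk` needs a column *sample* holding SRR identifiers, with ERR and DRR also supported. As a minimal sketch, assuming a tab-separated table whose first non-blank line is a header row (the exact layout of `tests/input_files/sra_samples.tsv` is an assumption here), a hypothetical `check_sra_ids` helper, which is not part of the repository, could pre-check that column before Snakemake is launched:

```bash
#!/bin/sh
# Hypothetical helper (not part of ZARP): check that every entry in the
# "sample" column of a tab-separated table looks like an SRR, ERR or DRR
# run accession. Offending lines are reported on stderr; the exit status
# is non-zero if any entry (or the "sample" column itself) is invalid.
check_sra_ids() {
    awk -F'\t' '
        NF == 0 { next }                          # ignore blank lines
        !header_seen {
            # the first non-blank line is treated as the header row
            header_seen = 1
            for (i = 1; i <= NF; i++) if ($i == "sample") col = i
            if (!col) {
                print "no \"sample\" column in header" > "/dev/stderr"
                bad = 1
                exit                              # jumps to END
            }
            next
        }
        # assumed accession pattern: SRR/ERR/DRR followed by digits only
        $col !~ /^[SED]RR[0-9]+$/ {
            printf("line %d: not an SRR/ERR/DRR accession: %s\n", NR, $col) > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

A call such as `check_sra_ids tests/input_files/sra_samples.tsv` returns 0 when every row passes. The accession regex is an assumption that simply mirrors the three prefixes named in the README; the workflow itself may accept a different set.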

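After a successful run, the README expects `results/sra_downloads/sra_samples.out.tsv` to list each sample with an `fq1` path and, for paired-end libraries, an `fq2` path. A hypothetical `check_fastq_paths` helper (again not part of the repository) could confirm that every listed FASTQ file exists. It is a sketch that assumes tab-separated columns, a header row starting with `sample`, and an empty `fq2` field for single-end runs:

```bash
#!/bin/sh
# Hypothetical helper (not part of ZARP): verify that the fq1/fq2 paths in
# an output sample table point to existing files. The function body is a
# subshell ( ... ), so its variables do not leak into the caller and
# "exit" only ends the subshell, becoming the function's return status.
check_fastq_paths() (
    table="$1"
    status=0
    tab=$(printf '\t')
    # "|| [ -n ... ]" also processes a last line lacking a trailing newline
    while IFS="$tab" read -r sample fq1 fq2 || [ -n "$sample" ]; do
        [ "$sample" = "sample" ] && continue      # skip the header row
        if [ -z "$fq1" ]; then
            echo "$sample: no fq1 entry" >&2
            status=1
        fi
        # fq2 is expected to be empty for single-end runs
        for fq in "$fq1" "$fq2"; do
            if [ -n "$fq" ] && [ ! -f "$fq" ]; then
                echo "$sample: missing file $fq" >&2
                status=1
            fi
        done
    done < "$table"
    exit "$status"
)
```

Usage would be `check_fastq_paths results/sra_downloads/sra_samples.out.tsv`. Because tab counts as whitespace for `read`, a trailing empty `fq2` field is simply read as an empty string.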