Docker Image #15

Open
ksarathbabu opened this issue Jan 30, 2020 · 44 comments

Comments

@ksarathbabu

Is there a Docker image available for RUFUS?

@jandrewrfarrell
Owner

jandrewrfarrell commented Jan 31, 2020 via email

@moldach

moldach commented Jan 29, 2021

I haven't heard back about building from source (#18) for two weeks now, so I've tried to build a Docker image.

I'm getting errors related to CMakeLists.txt, but I'm not sure how to fix them:

Dockerfile

FROM ubuntu:latest


LABEL \
        version="1.0.0" \
        description="RUFUS image for use in Workflows"

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update && \
apt-get install -y apt-utils \
libz-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
bc \
libncurses5-dev \
git \
build-essential \
g++ \
python \
gcc \
mono-mcs \
wget \
cmake

RUN mkdir -p /opt/tools
WORKDIR /opt/tools


# Download and build RUFUS 1.0
ENV VERSION 1.0
ENV NAME RUFUS
ENV URL "https://github.com/jandrewrfarrell/RUFUS/archive/V${VERSION}.tar.gz"
RUN wget -q -O - $URL | tar -zxv && \
cd ${NAME}-${VERSION} && \
mkdir build && cd build && \
cmake ../ -DCMAKE_C_COMPILER=$(which gcc) -DCMAKE_CXX_COMPILER=$(which g++) && \
make

docker build

sudo docker build -t rufus-v1.0 .

Most of the build seems to go okay, until the bwa part where I get the following error (truncated for convenience):

Error:

CMake Error: The source directory "/opt/tools/RUFUS-1.0/build/external" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
make[2]: *** [externals/CMakeFiles/rufus_external_project.dir/build.make:111: externals/rufus_external_project-prefix/src/rufus_external_project-stamp/rufus_external_project-configure] Error 1
make[1]: *** [CMakeFiles/Makefile2:169: externals/CMakeFiles/rufus_external_project.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
The command '/bin/sh -c wget -q -O - $URL | tar -zxv && cd ${NAME}-${VERSION} && mkdir build && cd build && cmake ../ -DCMAKE_C_COMPILER=$(which gcc) -DCMAKE_CXX_COMPILER=$(which g++) && make' returned a non-zero code: 2

Any advice would be greatly appreciated.

@jandrewrfarrell
Owner

jandrewrfarrell commented Jan 29, 2021 via email

@moldach

moldach commented Jan 30, 2021

> I would recommend just building on a similar system with internet connection, then zipping the whole RUFUS dir and moving it to the secure system.

Can I use a version of gcc newer than 4.9.2, or can RUFUS only be built with gcc 4.9.2?

@jandrewrfarrell
Owner

jandrewrfarrell commented Jan 30, 2021 via email

@kohrar

kohrar commented Feb 8, 2021

I have created a Dockerfile for RUFUS as pull request #20. I had no issues with the build process when using the dependencies outlined in the README.

@moldach

moldach commented Feb 10, 2021

> I have created a Dockerfile for RUFUS as pull request #20. I had no issues with the build process when using the dependencies outlined in the README.

Hi @kohrar, I really appreciate the attempt. Could you please provide some instructions for how you would run it (Docker is fine; I can figure out how to run Singularity from that)?

I pulled #20, built a container from your Dockerfile and pushed it to moldach686/rufus-v1.0.

$ git clone https://github.com/kohrar/RUFUS.git
$ cd RUFUS
$ sudo docker build -t rufus-v1.0 .
$ sudo docker tag rufus-v1.0:latest moldach686/rufus-v1.0:latest
$ sudo docker push moldach686/rufus-v1.0:latest

Next, I build a Singularity image within the RUFUS directory:

$ sudo singularity build rufus.sif docker://moldach686/rufus-v1.0:latest
$ ll
total 356576
drwxrwxr-x  8 mtg mtg      4096 Feb  9 20:51 ./
drwxrwxr-x 12 mtg mtg      4096 Feb  9 21:38 ../
-rw-rw-r--  1 mtg mtg       614 Feb  9 18:26 CMakeLists.txt
-rw-rw-r--  1 mtg mtg       757 Feb  9 18:26 Dockerfile
drwxrwxr-x  3 mtg mtg      4096 Feb  9 18:26 externals/
drwxrwxr-x  8 mtg mtg      4096 Feb  9 18:26 .git/
-rw-rw-r--  1 mtg mtg       126 Feb  9 18:26 .gitignore
-rw-rw-r--  1 mtg mtg      6289 Feb  9 18:26 README.md
drwxrwxr-x  4 mtg mtg      4096 Feb  9 18:26 resources/
-rwxr-xr-x  1 mtg mtg 364978176 Feb  9 20:51 rufus.sif*
-rwxrwxr-x  1 mtg mtg     37610 Feb  9 18:26 runRufus.sh*
drwxrwxr-x  3 mtg mtg     12288 Feb  9 18:26 scripts/
drwxrwxr-x  5 mtg mtg      4096 Feb  9 18:26 src/
drwxrwxr-x  2 mtg mtg      4096 Feb  9 18:26 testRun/

Then I tar this folder and transfer it to the cluster where I need to use Singularity:

$ cd .. && tar czvf rufus.tar.gz RUFUS/
$ scp <command>

Finally, I try running the following command:

[ -d rufus-analysis ] || mkdir rufus-analysis && cd rufus-analysis && \

singularity -s exec \
-B /project/M-mtgraovac182840/matthew/tool-testing/MTG_oldPipeScript/alignment/bwa/:/usr/lib/locale/ \
-B /project/M-mtgraovac182840/indexes/GRCh37/:/usr/lib/locale/index/ \
-B /project/M-mtgraovac182840/tools/RUFUS/:/usr/lib/locale/RUFUS/ \
/project/M-mtgraovac182840/tools/RUFUS/rufus.sif /usr/lib/locale/RUFUS/runRufus.sh \
-s /usr/lib/locale/proband_bwaMEM_sort_dedupped.bam \
-c /usr/lib/locale/mom_bwaMEM_sort_dedupped.bam \
-c /usr/lib/locale/dad_bwaMEM_sort_dedupped.bam \
-t 2 \
--kmersize 25 \
--ref=/usr/lib/locale/index/Homo_sapiens.GRCh37.dna.toplevel.fa

However, I'm getting the following error:

/usr/lib/locale/RUFUS/scripts/RunJellyForRUFUS.sh: line 29: /usr/lib/locale/RUFUS/scripts/..//bin/externals/jellyfish/src/jellyfish_project/bin/jellyfish: No such file or directory

@kohrar

kohrar commented Feb 10, 2021

Hi Matthew,

I've only done a cursory test with RUFUS after building the image and then running it with docker run --rm -ti rufus bash. I've tried this with Singularity but since the root filesystem is read-only, I had to copy /RUFUS out to a different location or use bind mounts as you've attempted.

Regarding your issue, it looks like you're running RUFUS from a bind mount at /usr/lib/locale/RUFUS sourced from /project/M-mtgraovac182840/tools/RUFUS/ rather than using the compiled binaries within the container at /RUFUS.

Did you copy the entire /RUFUS directory out of the container to this location? Can you verify that jellyfish does indeed exist in your project directory at /project/M-mtgraovac182840/tools/RUFUS/bin/externals/jellyfish/src/jellyfish_project/bin/jellyfish?
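
A check along these lines can answer that question quickly. This is only a sketch: the helper name `check_rufus_jellyfish` is made up for illustration, and the relative path is copied from the error message earlier in this thread.

```shell
# Hypothetical helper (not part of RUFUS): report whether the jellyfish
# binary that RunJellyForRUFUS.sh looks for is present under a RUFUS root.
check_rufus_jellyfish() {
    rufus_root=$1
    # Relative path taken from the error message in this thread.
    jf="$rufus_root/bin/externals/jellyfish/src/jellyfish_project/bin/jellyfish"
    if [ -x "$jf" ]; then
        echo "found: $jf"
    elif [ -e "$jf" ]; then
        echo "not executable: $jf"
        return 1
    else
        echo "missing: $jf"
        return 1
    fi
}

# Example on the host copy:
#   check_rufus_jellyfish /project/M-mtgraovac182840/tools/RUFUS
# Example against the copy built into the image:
#   singularity exec rufus.sif ls -l /RUFUS/bin/externals/jellyfish/src/jellyfish_project/bin/jellyfish
```

If the host copy reports `missing` while the path inside the image exists, the bind-mounted directory is probably an uncompiled checkout rather than the build from the container.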

@moldach

moldach commented Feb 17, 2021

> it looks like you're running RUFUS from a bind mount at /usr/lib/locale/RUFUS sourced from /project/M-mtgraovac182840/tools/RUFUS/ rather than using the compiled binaries within the container at /RUFUS

So this part of my call was wrong: /project/M-mtgraovac182840/tools/RUFUS/rufus.sif /usr/lib/locale/RUFUS/runRufus.sh \.

However, when I change it to the following I still get an error:

singularity -s exec \
    -B /project/M-mtgraovac182840/matthew/tool-testing/MTG_oldPipeScript/alignment/bwa/:/usr/lib/locale/ \
    -B /project/M-mtgraovac182840/indexes/GRCh37/:/usr/lib/locale/index/ /project/M-mtgraovac182840/tools/rufus.sif \
    ./RUFUS/runRufus.sh \
    -s /usr/lib/locale/proband_bwaMEM_sort_dedupped.bam \
    -c /usr/lib/locale/mom_bwaMEM_sort_dedupped.bam \
    -c /usr/lib/locale/dad_bwaMEM_sort_dedupped.bam \
    -t 2 \
    --kmersize 25 \
    --ref=/usr/lib/locale/index/Homo_sapiens.GRCh37.dna.toplevel.fa

FATAL: stat /home/moldach/RUFUS/runRufus.sh: no such file or directory

It's almost as if I cannot see inside the container.

Let's take a look inside the Docker image:

Can see contents of container in Docker!

$ sudo docker run --rm -it rufus ls
RUFUS  boot  etc   lib	  media  opt   root  sbin  sys	usr
bin    dev   home  lib64  mnt	 proc  run   srv   tmp	var

Cannot see contents of container in Singularity?

Here it shows the contents of my $PWD, not the contents of the container:

$ singularity exec rufus.sif ls
Desktop    Public                                  mom_bwaMEM_sort_dedupped.bam.generator.Jhash.temp
Documents  Templates                               mom_bwaMEM_sort_dedupped.bam.generator.fq
Downloads  Videos                                  proband_bwaMEM_sort_dedupped.bam.generator
Music      dad_bwaMEM_sort_dedupped.bam.generator
Pictures   mom_bwaMEM_sort_dedupped.bam.generator

@moldach

moldach commented Feb 17, 2021

One issue is that I need to supply -B $PWD and prefix the RUFUS subdirectory with $PWD, like so: singularity exec -B $PWD rufus.sif $PWD/RUFUS/runRufus.sh.

However, I'm getting a libjellyfish-2.0.so.2 error when I try to run the command:

$ singularity -s exec \
    -B /project/M-mtgraovac182840/matthew/tool-testing/MTG_oldPipeScript/alignment/bwa/:/usr/lib/locale/ \
    -B /project/M-mtgraovac182840/indexes/GRCh37/:/usr/lib/locale/index/ \
    -B $PWD \
    /project/M-mtgraovac182840/tools/rufus.sif \
    $PWD/RUFUS/runRufus.sh \
    -s /usr/lib/locale/proband_bwaMEM_sort_dedupped.bam \
    -c /usr/lib/locale/mom_bwaMEM_sort_dedupped.bam \
    -c /usr/lib/locale/dad_bwaMEM_sort_dedupped.bam \
    -t 2 \
    --kmersize 25 \
    --ref=/usr/lib/locale/index/Homo_sapiens.GRCh37.dna.toplevel.fa
checking for samtools
/usr/bin/samtools
samtools found
_arg_fastqA =
_arg_fastqB =
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Final reference path being used is /usr/lib/locale/index/Homo_sapiens.GRCh37.dna.toplevel.fa
Final bwa reference path being used is /usr/lib/locale/index/Homo_sapiens.GRCh37.dna.toplevel.fa
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
proband extension is bam
you provided the proband cram file /usr/lib/locale/proband_bwaMEM_sort_dedupped.bam
parent file name is mom_bwaMEM_sort_dedupped.bam
parent file extension name is bam
You provided the control bam file /usr/lib/locale/mom_bwaMEM_sort_dedupped.bam
parent file name is dad_bwaMEM_sort_dedupped.bam
parent file extension name is bam
You provided the control bam file /usr/lib/locale/dad_bwaMEM_sort_dedupped.bam
~~~~~~~~~~~~ printing out paramater values used in script ~~~~~~~~~~~~~~~~
value of ProbandGenerator proband_bwaMEM_sort_dedupped.bam.generator
Value of ParentGenerators:
 mom_bwaMEM_sort_dedupped.bam.generator
 dad_bwaMEM_sort_dedupped.bam.generator
Value of K is: 25
Value of Threads is: 2
value of ref is: /usr/lib/locale/index/Homo_sapiens.GRCh37.dna.toplevel.fa
value of min is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did not provide refHash
$_arg_min is empty
_arg_min is
MutantMinCov is
parent is  mom_bwaMEM_sort_dedupped.bam.generator
parent is  dad_bwaMEM_sort_dedupped.bam.generator
Running jellyfish for mom_bwaMEM_sort_dedupped.bam.generator
/project/M-mtgraovac182840/tools/RUFUS/bin/externals/jellyfish/src/jellyfish_project/bin/jellyfish: error while loading shared libraries: libjellyfish-2.0.so.2: cannot open shared object file: No such file or directory

@moldach

moldach commented Feb 17, 2021

Hi @kohrar

> I've only done a cursory test with RUFUS after building the image and then running it with docker run --rm -ti rufus bash

Did you try running /RUFUS/testRun/runTest.sh within the Docker container to verify that it works?

I've tried the following, but there appear to be issues:

(base) mtg@mtg-ThinkPad-P53:~/DOCKER-CONTAINERS/RUFUS$ sudo docker run --rm -it rufus-v1.0 
root@8f459380a65e:/# ls
RUFUS  boot  etc   lib    media  opt   root  sbin  sys  usr
bin    dev   home  lib64  mnt    proc  run   srv   tmp  var
root@8f459380a65e:/# bash RUFUS/testRun/runTest.sh 
RUFUS/testRun/runTest.sh: line 1: ./../runRufus.sh: No such file or directory
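
The message shows that line 1 of runTest.sh calls `./../runRufus.sh`, a path relative to the current working directory. Started from `/`, that resolves to `/runRufus.sh`, which does not exist. A minimal sketch of a workaround is to change into the script's own directory first; the helper name `run_in_script_dir` is hypothetical.

```shell
# Hypothetical helper: run a script from the directory it lives in, so that
# relative paths inside it (such as ./../runRufus.sh) resolve as intended.
run_in_script_dir() {
    script=$1
    shift
    # The subshell keeps the caller's working directory unchanged.
    (
        cd "$(dirname "$script")" || exit 1
        sh "./$(basename "$script")" "$@"
    )
}

# Example inside the Docker container:
#   run_in_script_dir /RUFUS/testRun/runTest.sh
```

This is equivalent to `cd /RUFUS/testRun && sh runTest.sh`. Under Singularity the image's root filesystem is read-only, as mentioned earlier in the thread, so the directory would need to be copied somewhere writable first.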

@jandrewrfarrell
Owner

jandrewrfarrell commented Feb 17, 2021 via email

@kohrar

kohrar commented Feb 17, 2021

Hi @moldach,

I ran the runTest script without issue within the Singularity image. I am not mounting an external version of RUFUS into the image as you are doing, which I think is leading to your issues. See my usage below:

% singularity run -H `pwd` /global/software/singularity/images/software/rufus.sif

Singularity> cd /
Singularity> ls
RUFUS  bin  boot  bulk  dev  environment  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  singularity  srv  sys  tmp  usr  var


## Here, I copy RUFUS out to a location I can write to, since / is read-only in Singularity.
Singularity> cp -r RUFUS /tmp
Singularity> cd /tmp/RUFUS/testRun/
Singularity> sh runTest.sh
checking for samtools
/usr/bin/samtools
samtools found
...
MutantMinCov is
parent is  Mother.bam.generator
parent is  Father.bam.generator
Running jellyfish for Mother.bam.generator
...

Regarding your library issue, this is what should be loaded. Everything is within the image and not from some external bind mount.

Singularity> ldd /tmp/RUFUS/bin/externals/jellyfish/src/jellyfish_project/bin/jellyfish
        linux-vdso.so.1 =>  (0x00007fff654f9000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f91c17dd000)
        libjellyfish-2.0.so.2 => /RUFUS/bin/externals/jellyfish/src/jellyfish_project/lib/libjellyfish-2.0.so.2 (0x00007f91c15b2000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f91c1230000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f91c0f27000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f91c0d11000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f91c0947000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f91c19fa000)

@moldach

moldach commented Feb 18, 2021

Hello @kohrar, thank you very much for providing that detailed explanation and reproducible example!

> I am not mounting an external version of RUFUS into the image as you are doing

-H $PWD turned out to be critical for this.

I'm fairly new to Singularity. In the past, for example with Manta (and DeepVariant), I used -B $PWD whenever a tool took an output-directory parameter such as --runDir:

singularity exec \
    -B /project/M-mtgraovac182840/matthew/tool-testing/MTG_human_genomics_pipeline-master/alignment/bwa/:/bams,/project/M-mtgraovac182840/indexes/GRCh37/:/reference \
    -B $PWD \
    /project/M-mtgraovac182840/tools/manta-1.6.0.img \
    /manta/bin/configManta.py \
    --bam /bams/proband_bwaMEM_sort_dedupped.bam \
    --referenceFasta /reference/Homo_sapiens.GRCh37.dna.toplevel.fa \
    --runDir $PWD

So, the fact that RUFUS doesn't ask for an output directory (and instead writes to $PWD) caused the -B $PWD solution to fail.

As a word of caution for others: running the test in the Singularity container failed for me. Luckily, however, it does work on my data 🥳

Working solution

singularity -s exec \
    -B /project/M-mtgraovac182840/matthew/tool-testing/MTG_oldPipeScript/alignment/bwa/:/usr/lib/locale/ \
    -B /project/M-mtgraovac182840/indexes/GRCh37/:/usr/lib/locale/index/ \
    -H `pwd` \
    /project/M-mtgraovac182840/tools/rufus.sif \
    ./RUFUS/runRufus.sh \
    -s /usr/lib/locale/proband_bwaMEM_sort_dedupped.bam \
    -c /usr/lib/locale/mom_bwaMEM_sort_dedupped.bam \
    -c /usr/lib/locale/dad_bwaMEM_sort_dedupped.bam \
    -t 2 \
    --kmersize 25 \
    --ref=/usr/lib/locale/index/Homo_sapiens.GRCh37.dna.toplevel.fa

Error on testRun

## test asks for 40 cores but we will just ask for 30
$ salloc --time=0:30:0 --mem-per-cpu=5000 --cpus-per-task=30
$ singularity run -H `pwd` rufus.sif
Singularity> cd /
Singularity> cp -r RUFUS /tmp
Singularity> cd /tmp/RUFUS/testRun/
Singularity> sh runTest.sh
checking for samtools
/usr/bin/samtools
samtools found
_arg_fastqA = 
_arg_fastqB = 
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Final reference path being used is /tmp/RUFUS/testRun/../resources/references/small_test_human_reference_v37_decoys.fa
Final bwa reference path being used is /tmp/RUFUS/testRun/../resources/references/small_test_human_reference_v37_decoys.fa
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
proband extension is bam
you provided the proband cram file /tmp/RUFUS/testRun/Child.bam
parent file name is Mother.bam
parent file extension name is bam
You provided the control bam file /tmp/RUFUS/testRun/Mother.bam
parent file name is Father.bam
parent file extension name is bam
You provided the control bam file /tmp/RUFUS/testRun/Father.bam
~~~~~~~~~~~~ printing out paramater values used in script ~~~~~~~~~~~~~~~~
value of ProbandGenerator Child.bam.generator
Value of ParentGenerators:
 Mother.bam.generator
 Father.bam.generator
Value of K is: 25
Value of Threads is: 40
value of ref is: /tmp/RUFUS/testRun/../resources/references/small_test_human_reference_v37_decoys.fa
value of min is: 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did not provide refHash
$_arg_min is empty
_arg_min is 
MutantMinCov is 
parent is  Mother.bam.generator 
parent is  Father.bam.generator 
Running jellyfish for Mother.bam.generator
Running jellyfish for Father.bam.generator
Running jellyfish for Child.bam.generator
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_CA.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_CA.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LANG = "en_CA.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
min not provided, building model
staring model
Call is histoFile HS ReadLength Threads
Parent File open - Child.bam.generator.Jhash.histo
first line = 0 - 0
getting another 
got 1 - 0
getting another 
got 2 - 1089
going with 2 - 1089
Number of reads = 3630
I = 0 	 0
I = 1 	 1089
I = 2 	 107
I = 3 	 31
I = 4 	 44
I = 5 	 21
I = 6 	 27
I = 7 	 54
I = 8 	 129
I = 9 	 98
SC = 25 vlaue = 1135
stdi = 31 stdev = 6
best error is 1/x^3.34726
On 1 pass
	 best Factor = 0 steps = 12
		bestSC = 27.4356 steps = 4
		best StdDev = 6.44494 steps = 3
		best skew factor = 0 steps = 0
		best Power factor = 1 steps = 3
On 2 pass
	 best Factor = 0 steps = 12
		bestSC = 26.9935 steps = 5
		best StdDev = 6.36922 steps = 3
		best skew factor = 0 steps = 0
		best Power factor = 1 steps = 3
On 3 pass
	 best Factor = 0 steps = 12
		bestSC = 27.0027 steps = 5
		best StdDev = 6.6012 steps = 3
		best skew factor = 0 steps = 0
		best Power factor = 1 steps = 3
Best Model is SC = 27.0027 StdDev = 6.6012 F = 0 skew = 0 bestP = 1
GenomeSize = 31411.6
prob not error = 0.0116117
prob not error = 0.136157
prob not error = 0.441105
prob not error = 0.722617
prob not error = 0.871386
prob not error = 0.937632
this one
GenomeSize = 31411.6
Inflection point = 3
Recomended RUFUS cutoff = -6.00331
-1std = 20.4015	-2std = 13.8003	-3std = 7.1991	-4std = 0.597893
done with model
mutant min coverage from generated model is 5
mutant SC coverage from generated model is 25
MaxHashDepth = 125
made it 
made it here 
starting RUFUS filter
_arg_fastqA = 
_arg_fastqB = 
running this one 
Call is PreBuiltMutHash Mutant.Mate1.fq Mutant.Mate2.fq firstpassfile hashsize MinQ HashCountThreshold threads 
VM: 19756; RSS: 2564
Paramaters are:
	PreBuiltMutHash = Child.bam.generator.k25_c5.HashList
	Mutant.mate1.fq = Child.bam.generator.temp.mate1.fastq
	Mutant.mate2.fq = Child.bam.generator.temp.mate2.fastq
	out stub = Child.bam.generator
	HashSize = 25
	MinQ = 13
	HashCountThreshold = 1
	Threads = 38
Parent File open - Child.bam.generator.k25_c5.HashList
MutFile.mate1 is Child.bam.generator.temp.mate1.fastq
here 
##File Opend
MutFile.mate2 is Child.bam.generator.temp.mate2.fastq
##File Opend
Reading in pre-built hash talbe
starting 
	Reading in MutHashFile
Done Hash Files
	 Mutations Hash size is 74
I am using 2564
VM: 19756; RSS: 2564; maxVM: 19756; maxRSS: 2564
Starting Search 
Read in 2040 lines: Found 20 Reads per sec = 6.16048e-11 
Done running RUFUS.Filter.cpp
skipping fastp fix
sort: invalid option -- 'T'
sort: invalid option -- 'O'
samblaster: Version 0.1.26
samblaster: Inputting from stdin
samblaster: Outputting to stdout
open: No such file or directory
[bam_sort_core] fail to open file Child.bam.generator.Mutations.fastq
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 40 sequences (6040 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 20, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (274, 298, 334)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (154, 454)
[M::mem_pestat] mean and std.dev: (297.37, 36.62)
[M::mem_pestat] low and high boundaries for proper pairs: (94, 514)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 40 reads in 0.009 CPU sec, 0.004 real sec
samblaster: Loaded 2 header sequence entries.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.[bam_index_build2] fail to index the BAM file.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "Child.bam.generator.Mutations.fastq.bam".
ERROR: BWA failed on Child.bam.generator.Mutations.fastq.  Either the files are exactly the same of something went wrong in previous step
## Not empty files, just really small....
Singularity> ls -sh
total 11M
368K Child.bam					      0 Child.bam.generator.temp.mate2.fastq
4.0K Child.bam.generator			   272K Father.bam
4.0K Child.bam.generator.Jelly.chr		   4.0K Father.bam.generator
200K Child.bam.generator.Jhash			   4.0K Father.bam.generator.Jelly.chr
 68K Child.bam.generator.Jhash.histo		   188K Father.bam.generator.Jhash
9.1M Child.bam.generator.Jhash.histo.7.7.dist	    68K Father.bam.generator.Jhash.histo
 12K Child.bam.generator.Jhash.histo.7.7.model	   364K Mother.bam
   0 Child.bam.generator.Jhash.histo.7.7boom.prob  4.0K Mother.bam.generator
8.0K Child.bam.generator.Mutations.Mate1.fastq	   4.0K Mother.bam.generator.Jelly.chr
8.0K Child.bam.generator.Mutations.Mate2.fastq	   192K Mother.bam.generator.Jhash
   0 Child.bam.generator.Mutations.fastq.bam	    68K Mother.bam.generator.Jhash.histo
4.0K Child.bam.generator.filter.chr		   4.0K clean.sh
4.0K Child.bam.generator.k25_c5.HashList	   4.0K mer_counts_merged.jf
   0 Child.bam.generator.temp			   4.0K runDevTest.sh
   0 Child.bam.generator.temp.mate1.fastq	   4.0K runTest.sh
Singularity> exit

Thanks again for all the help!

@jandrewrfarrell
Owner

jandrewrfarrell commented Feb 18, 2021 via email

@moldach

moldach commented Feb 19, 2021

So the job ran for 24 hours but failed: Slurm Job_id=2711 Name=rufus_test Ended, Run time 1-00:08:45, COMPLETED, ExitCode 0

sort: invalid option -- 'T'
sort: invalid option -- 'O'
open: No such file or directory
[bam_sort_core] fail to open file proband_bwaMEM_sort_dedupped.bam.generator.Mutations.fastq
samblaster: Version 0.1.26
samblaster: Inputting from stdin
samblaster: Outputting to stdout
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 2000000 sequences (300000000 bp)...
[M::process] read 2000000 sequences (300000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (22, 416954, 29, 16)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (411, 973, 1348)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 3222)
[M::mem_pestat] mean and std.dev: (830.10, 447.09)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 4159)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (397, 461, 534)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (123, 808)
[M::mem_pestat] mean and std.dev: (467.16, 106.23)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 945)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (405, 1253, 1870)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4800)
[M::mem_pestat] mean and std.dev: (1221.37, 1039.51)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 6265)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (534, 950, 1892)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4608)
[M::mem_pestat] mean and std.dev: (1187.62, 956.20)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 5966)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 2000000 reads in 1126.856 CPU sec, 41.206 real sec
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.[bam_index_build2] fail to index the BAM file.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "proband_bwaMEM_sort_dedupped.bam.generator.Mutations.fastq.bam".

> There seems to be an error with your perl install, ... The real problem that's killing your run is you seem to have an issue with samtools.

What isn't clear to me is which perl and samtools versions it is trying to use: the tools inside the RUFUS container, or the versions I installed on my system?

Let's check:

$ singularity run \
     -H `pwd` \
     /project/M-mtgraovac182840/tools/rufus.sif perl --version
This is perl 5, version 22, subversion 1 (v5.22.1) built for x86_64-linux-gnu-thread-multi
$ singularity run \
   -H `pwd`  \
   /project/M-mtgraovac182840/tools/rufus.sif samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19-96b5f2294a

If I check samtools and perl on my local host, I see different versions:

$ samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 1.3.1 (using htslib 1.3.1)
(base) [moldach@marc RUFUS-TEST]$ perl -v
This is perl 5, version 32, subversion 0 (v5.32.0) built for x86_64-linux

So it's clear that this is an issue with the tools installed by the Dockerfile that @kohrar created:

#
# A Dockerfile to get RUFUS running
#

# GCC 4.9 only available up to 16.04
FROM ubuntu:16.04

ARG DEBIAN_FRONTEND=noninteractive 

COPY . /RUFUS

RUN set -ex; \
# Dependencies
	BUILD_DEPS="cmake build-essential g++-4.9 zlib1g-dev libbz2-dev libbz2-dev liblzma-dev libncurses5-dev"; \
	apt-get update; \
	apt-get install -y software-properties-common; \
	add-apt-repository ppa:ubuntu-toolchain-r/test; \
	apt-get install -y python wget git bc $BUILD_DEPS; \
# Build
	mkdir -p /RUFUS/bin; \
	cd /RUFUS/bin; \
	cmake ../ -DCMAKE_C_COMPILER=$(which gcc) -DCMAKE_CXX_COMPILER=$(which g++); \
	make; \
# Cleanup
	apt-get purge -y --auto-remove $BUILD_DEPS; \
	apt-get clean; \
	echo done
	
# Runtime tools
RUN set -ex; \
	apt install samtools; \
	echo done

I'm surprised that apt install samtools installs such an old version.

  1. Which version of samtools will work?
  2. Could the fact that the base image is Ubuntu 16.04 be why it's installing an older version? How can I specify a different samtools version for the container?
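
On the second question: `apt` installs whatever version the base image's Ubuntu release packages, so an older base image generally means an older samtools, which is consistent with the 0.1.19 seen above. The `sort: invalid option -- 'T'` lines in the logs also look like a `samtools sort` that predates the `-T` and `-O` options, though that reading is not confirmed in this thread. The thread does not state a minimum samtools version for RUFUS, so the `1.0` threshold in the sketch below is an assumed placeholder. The helper names are hypothetical, and `sort -V` requires GNU coreutils.

```shell
# Extract the first dotted version number from text on stdin, e.g. from
# "samtools 1.11" or "Version: 0.1.19-96b5f2294a".
first_version() {
    grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Succeed if version $1 is at least version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: fail early if the samtools on PATH is too old. The 1.0 minimum
# is an assumed placeholder, not a documented RUFUS requirement.
#   v=$(samtools 2>&1 | grep -i version | first_version)
#   version_at_least "$v" 1.0 || { echo "samtools $v is too old" >&2; exit 1; }
```

Run as a late step in the image build, a check like this would surface an outdated samtools in minutes rather than after a long RUFUS run.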

@jandrewrfarrell
Owner

jandrewrfarrell commented Feb 19, 2021 via email

@moldach

moldach commented Feb 23, 2021

> your Samtools isn’t working properly, you either don’t have it installed or you have a very old version

Correct. As I showed in my post above, the apt install samtools command in @kohrar's Dockerfile installed Samtools version 0.1.19-96b5f2294a inside the container.

I tried changing the Dockerfile so that it downloads and builds a newer Samtools, version 1.3.1:

# A Dockerfile to get RUFUS running
#
# GCC 4.9 only available up to 16.04
FROM ubuntu:16.04

ARG DEBIAN_FRONTEND=noninteractive

COPY . /RUFUS

RUN set -ex; \
# Dependencies
        BUILD_DEPS="cmake build-essential g++-4.9 zlib1g-dev libbz2-dev liblzma-dev libncurses5-dev libcurl4-gnutls-dev libssl-dev libgcc-5-dev libgomp1"; \
        apt-get update; \
        apt-get install -y software-properties-common; \
        add-apt-repository ppa:ubuntu-toolchain-r/test; \
        apt-get install -y python wget git bc $BUILD_DEPS; \
        wget https://github.com/samtools/samtools/releases/download/1.3.1/samtools-1.3.1.tar.bz2 && \
        tar -xjvf samtools-1.3.1.tar.bz2 && \
        cd samtools-1.3.1 && \
        make -j 4; \
        make prefix=/usr/local/bin install; \
        # if you have old version such as 0.x from samtools, you may remove it and create a link to new version
        apt remove samtools; \
        ln -s /usr/local/bin/bin/samtools /usr/bin/samtools; \

# Build
        mkdir -p /RUFUS/bin; \
        cd /RUFUS/bin; \
        cmake ../ -DCMAKE_C_COMPILER=$(which gcc) -DCMAKE_CXX_COMPILER=$(which g++); \
        make; \
# Cleanup
        apt-get purge -y --auto-remove $BUILD_DEPS; \
        apt-get clean; \
        echo done

Despite including libgomp1 in BUILD_DEPS, I'm getting an error when trying to run it:

/RUFUS/bin/ModelDist: error while loading shared libraries: libgomp.so.1: cannot open shared object file: No such file or directory 

I'm not sure whether there is a quicker way to test for the presence of libgomp.so.1. I only hit this error after RUFUS has run for 4 hours, which makes troubleshooting painfully slow.
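
One quicker option is to ask the dynamic linker directly with `ldd`, which lists each shared library a binary needs and prints `not found` for any it cannot resolve. The sketch below scans a directory of binaries; the helper names are hypothetical, and it assumes the RUFUS executables sit directly under `/RUFUS/bin`, as the error path above suggests.

```shell
# Read `ldd` output on stdin and print the names of libraries that the
# dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: check every executable file directly under a
# directory. Returns non-zero if any of them has an unresolved library.
check_dir_libs() {
    dir=$1
    status=0
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -x "$f" ] || continue
        # Scripts are not dynamic executables; ldd's complaint is discarded.
        missing=$(ldd "$f" 2>/dev/null | missing_libs)
        if [ -n "$missing" ]; then
            echo "$f: missing $(echo $missing)"
            status=1
        fi
    done
    return $status
}

# Example, run inside the container right after the build:
#   check_dir_libs /RUFUS/bin
```

Reading the Dockerfile above, libgomp1 is listed in BUILD_DEPS, and the cleanup step purges everything in BUILD_DEPS. The package would therefore be removed again after the build, which may be why the library is missing at run time.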

@jandrewrfarrell
Owner

jandrewrfarrell commented Feb 23, 2021 via email

@moldach

moldach commented Mar 2, 2021

Any update on this, @jandrewrfarrell (or @kohrar)?
We urgently need this working.
Thanks

@kohrar

kohrar commented Mar 3, 2021

Hi @moldach,

I have updated the Dockerfile under my pull request (#20) to include the missing dependency required by some RUFUS binaries as well as the newest version of Samtools.

This should help with some of the issues you reported. Could you please see if this gets you any further?

RUFUS % singularity run -H `pwd` /global/software/singularity/images/software/rufus.sif                   
Singularity> cd /RUFUS/bin
Singularity> ldd ModelDist
        linux-vdso.so.1 =>  (0x00007fff6691c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7947fb4000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7947cab000)
        libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f7947a89000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7947873000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f79474a9000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7948336000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f79472a5000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7947088000)
		
Singularity> samtools --version
samtools 1.11
Using htslib 1.11
Copyright (C) 2020 Genome Research Ltd.

@antares58

I have been experimenting with getting RUFUS operational in some form. I ran into errors when building from source, so I decided to use @moldach's Docker image, moldach686/rufus-v1.0. Running in Singularity, I get the following error from the test script:

Singularity> sh runTest.sh
checking for samtools
/usr/local/bin/samtools
samtools found
_arg_fastqA =
_arg_fastqB =
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Final reference path being used is /tmp/RUFUS/testRun/../resources/references/small_test_human_reference_v37_decoys.fa
Final bwa reference path being used is /tmp/RUFUS/testRun/../resources/references/small_test_human_reference_v37_decoys.fa
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
proband extension is bam
you provided the proband cram file /tmp/RUFUS/testRun/Child.bam
parent file name is Mother.bam
parent file extension name is bam
You provided the control bam file /tmp/RUFUS/testRun/Mother.bam
parent file name is Father.bam
parent file extension name is bam
You provided the control bam file /tmp/RUFUS/testRun/Father.bam

value of ProbandGenerator Child.bam.generator
Value of ParentGenerators:
 Mother.bam.generator
 Father.bam.generator
Value of K is: 25
Value of Threads is: 40
value of ref is: /tmp/RUFUS/testRun/../resources/references/small_test_human_reference_v37_decoys.fa
value of min is: 

Did not provide refHash
$_arg_min is empty
_arg_min is
MutantMinCov is
parent is Mother.bam.generator
parent is Father.bam.generator
Running jellyfish for Mother.bam.generator
/tmp/RUFUS/scripts/RunJellyForRUFUS.sh: line 34: 62244 Killed $JELLYFISH count --disk -m $K -L $L -s 8G -t $T -o $GEN.Jhash -C $GEN.fq

I also tried running this on our own data and got different errors depending on which shell I used:
with sh:
Singularity> sh ./runRufus.sh -s /scratch.global/lee04110/data/bams/Affected.bam -c /scratch.global/lee04110/data/bams/Mother.bam -c /scratch.global/lee04110/data/bams/Father.bam -c /scratch.global/lee04110/data/bams/Sister.bam -t 8 -k 25 -ref /scratch.global/lee04110/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa
checking for samtools
/usr/local/bin/samtools
samtools found
./runRufus.sh: 29: ./runRufus.sh: Bad substitution
./runRufus.sh: 50: ./runRufus.sh: Syntax error: "(" unexpected

with bash:
Singularity> bash ./runRufus.sh -s /scratch.global/lee04110/data/bams/Affected.bam -c /scratch.global/lee04110/data/bams/Mother.bam -c /scratch.global/lee04110/data/bams/Father.bam -c /scratch.global/lee04110/data/bams/Sister.bam -t 8 -k 25 -ref /scratch.global/lee04110/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa
checking for samtools
/usr/local/bin/samtools
samtools found
_arg_fastqA =
_arg_fastqB =
Reference file not built for BWA
this program requires the existence of the file ef.sa
Killing run with non-zero status
Killed

There is a properly generated .sa file in the same directory as the .fa file; I even tried renaming it "ef.sa" to no avail. Anyone have any ideas as to what is causing these errors? Is RUFUS still being actively maintained? Our lab would really like to be able to use it.

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 18, 2022 via email

@antares58

Thanks very much for your quick response. I was just running the test script from Singularity on the command line and I have no idea how much memory it had available, but I can run it via a slurm job and specify enough memory. I'll also try the RDIR workaround.
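
A rough way to see how much memory is available before launching, assuming the "Killed" jellyfish step above was an out-of-memory kill (the thread does not confirm this). The helper names are hypothetical, and the check reads the Linux-only /proc/meminfo.

```shell
#!/bin/sh
# Prints MemAvailable in KiB from a meminfo-style file (default: /proc/meminfo).
mem_available_kib() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# Succeeds if at least <required GiB> is available.
# Usage: has_memory_gib <GiB> [meminfo file]
has_memory_gib() {
    avail=$(mem_available_kib "${2:-/proc/meminfo}")
    [ -n "$avail" ] && [ "$avail" -ge $(( $1 * 1024 * 1024 )) ]
}

# Example: has_memory_gib 16 || echo "less than 16 GiB available" >&2
```

Note that /proc/meminfo reports the host's memory. Under Slurm the effective limit is whatever the job requested (for example via sbatch --mem), so the job request is the number that matters there.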

@antares58

Since I am using the docker image, there seems to be no way to work around the RDIR problem. I can't edit the script.

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 19, 2022 via email

@antares58

The issue is that the script itself executes the problematic line RDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" and so fails, regardless of anything I can set in the environment or pass to Singularity.
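
For context, ${BASH_SOURCE[0]} is bash-only array syntax. Assuming the failure in question is the "Bad substitution" seen earlier when the script was run with sh, that is what a POSIX shell such as dash prints for it. Below is a sketch of a POSIX-compatible equivalent plus a guard that a bash-only script could use to fail early. Both function names are illustrative and are not part of RUFUS.

```shell
#!/bin/sh
# POSIX-compatible way to resolve the directory containing a script path.
# The body runs in a subshell so the caller's working directory is untouched.
script_dir() (
    CDPATH= cd -- "$(dirname -- "$1")" && pwd
)

# Guard a bash-only script could call near its top: fails under sh/dash.
require_bash() {
    if [ -z "${BASH_VERSION:-}" ]; then
        echo "error: run this script with bash, not sh" >&2
        return 1
    fi
}

# Example, inside a script: RDIR=$(script_dir "$0")
```

Unlike BASH_SOURCE, "$0" points at the calling shell when a script is sourced rather than executed, so the two forms are only interchangeable for scripts that are run directly.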

When trying to build RUFUS from freshly checked out source, I get the following:
bwt_gen.c: In function ‘BWTIncMergeBwt’:
bwt_gen.c:953:15: warning: variable ‘bitsInWordMinusBitPerChar’ set but not used [-Wunused-but-set-variable]
unsigned int bitsInWordMinusBitPerChar;
^~~~~~~~~~~~~~~~~~~~~~~~~
[ 40%] No install step for 'bwa_project'
[ 41%] Completed 'bwa_project'
[ 41%] Built target bwa_project
Scanning dependencies of target fastp_project
[ 42%] Creating directories for 'fastp_project'
[ 43%] Performing download step (git clone) for 'fastp_project'
Cloning into 'fastp_project'...
Already on 'master'
[ 44%] No patch step for 'fastp_project'
[ 45%] No update step for 'fastp_project'
[ 45%] No configure step for 'fastp_project'
[ 46%] Performing build step for 'fastp_project'
/usr/bin/ld: cannot find -lisal
/usr/bin/ld: cannot find -ldeflate
collect2: error: ld returned 1 exit status
make[3]: *** [fastp] Error 1
make[2]: *** [externals/fastp/src/fastp_project-stamp/fastp_project-build] Error 2
make[1]: *** [externals/CMakeFiles/fastp_project.dir/all] Error 2
make: *** [all] Error 2
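
The two linker errors mean ld could not find libisal and libdeflate, which fastp links against. A small sketch for checking that directly, without rerunning the whole build; can_link is a hypothetical helper and it assumes a C compiler is available as cc (or via $CC).

```shell
#!/bin/sh
# Succeeds if the C compiler can link a trivial program against -l<name>.
can_link() {
    out=$(mktemp) || return 1
    printf 'int main(void) { return 0; }\n' |
        "${CC:-cc}" -x c - -l"$1" -o "$out" 2>/dev/null
    status=$?
    rm -f "$out"
    return "$status"
}

for lib in isal deflate; do
    can_link "$lib" || echo "linker cannot find -l$lib" >&2
done
```

If the libraries are installed under a non-standard prefix, GCC's LIBRARY_PATH environment variable is one way to point the linker at them.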

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 19, 2022 via email

@antares58

RUFUS build successful! Thanks again for responding so quickly. Glad to hear that you've gotten funding for RUFUS and that a docker image is planned.

@antares58

The runTest script succeeded out of the box. However, when I run RUFUS with my own data, I get the same error I saw when running it from the docker image in singularity:

bash $RUFUS_DIR/runRufus.sh -s /scratch.global/lee04110/data/bams/Affected.bam -c /scratch.global/lee04110/data/bams/Mother.bam -c /scratch.global/lee04110/data/bams/Father.bam -c /scratch.global/lee04110/pakistan/data/bams/Sister.bam -t 24 -k 25 -ref /scratch.global/lee04110/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa
checking for samtools
/panfs/roc/msisoft/samtools/1.9_gcc-7.2.0_haswell/bin/samtools
samtools found
_arg_fastqA =
_arg_fastqB =
Reference file not built for BWA
this program requires the existence of the file ef.sa
Killing run with non-zero status

The resources directory that contains the GRCh38_full_analysis_set_plus_decoy_hla.fa fasta file does contain GRCh38_full_analysis_set_plus_decoy_hla.fa.sa as well. So I am baffled.
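
One reading that fits the odd file name "ef.sa" (an assumption; the thread does not show the explanation): if the script has a short option -r that takes a value, then "-ref" is parsed as -r with the value "ef". The sketch below uses POSIX getopts as a stand-in to show the effect. It is not RUFUS's actual parser, and parse_ref is a made-up name.

```shell
#!/bin/sh
# Hypothetical stand-in for a script whose short option -r takes a value.
# Runs in a subshell so OPTIND/OPTARG do not leak into the caller.
parse_ref() (
    OPTIND=1
    ref=""
    while getopts "r:" opt; do
        case "$opt" in
            r) ref=$OPTARG ;;
            *) exit 2 ;;
        esac
    done
    printf '%s\n' "$ref"
)

parse_ref -ref /path/to/genome.fa   # prints "ef"
parse_ref -r /path/to/genome.fa     # prints "/path/to/genome.fa"
```

In the first call the real path is left over as an unparsed positional argument, so a later check for "<ref>.sa" would look for ef.sa no matter which index files exist next to the FASTA.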

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 20, 2022 via email

@antares58

Wow, that is one embarrassing user error. Thank you. I'm thrilled to report that I've now run RUFUS successfully over some data that we are actively studying.

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 21, 2022 via email

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 21, 2022 via email

@antares58

I didn't know what to make of these lines from stderr for a sample for which RUFUS successfully generated a final vcf:
Feature (HLA-A*68:01:02:01:2896-3597) beyond the length of HLA-A*68:01:02:01 size (3517 bp). Skipping.
Feature (HLA-DRB1*14:05:01:13411-13952) beyond the length of HLA-DRB1*14:05:01 size (13933 bp). Skipping.

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 24, 2022 via email

@antares58

Good to know, thanks. I have gotten a reproducible seg fault in RUFUSinterpret on a couple of samples, any ideas about what the problem might be? In both cases, I am able to run successfully on an affected sibling with the same controls.

lee04110@ln0004 [/scratch.global/lee04110/batch] % grep 'Segmentation' *.err
151439214.err: 431062 Segmentation fault | $RUFUSinterpret -mob ./Intermediates/$NameStub.overlap.hashcount.fastq.MOB.sam -mod $dumbFix.Jhash.histo.7.7.dist -mQ 1 -r $humanRef -hf $HashList -o ./$NameStub.overlap.hashcount.fastq.bam -m $MaxAlleleSize $(echo $parentCRString) -sR Intermediates/$NameStub.overlap.asembly.hash.fastq.Ref.sample -s Intermediates/$NameStub.overlap.asembly.hash.fastq.sample -e ./Intermediates/$NameStub.ref.RepRefHash

151440492.err: 3173376 Segmentation fault | $RUFUSinterpret -mob ./Intermediates/$NameStub.overlap.hashcount.fastq.MOB.sam -mod $dumbFix.Jhash.histo.7.7.dist -mQ 1 -r $humanRef -hf $HashList -o ./$NameStub.overlap.hashcount.fastq.bam -m $MaxAlleleSize $(echo $parentCRString) -sR Intermediates/$NameStub.overlap.asembly.hash.fastq.Ref.sample -s Intermediates/$NameStub.overlap.asembly.hash.fastq.sample -e ./Intermediates/$NameStub.ref.RepRefHash

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 25, 2022 via email

@antares58

If you can tell from the stderr output below what contig RUFUS didn't like before it seg faulted, please enlighten me:

Affected1:
[main] CMD: /home/pankrat2/public/bin/gatk4/RUFUS/scripts/..//bin/externals/bwa/src/bwa_project/bwa mem -t 24 -Y -E 0,0 -O 6,6 -d 500 -w 500 -L 0,0 /home/pankrat2/public/bin/gatk4/RUFUS/scripts/..//resources/primate_non-LTR_Retrotransposon.fasta ./Affected1.bam.generator.V2.overlap.hashcount.fastq
[main] Real time: 0.051 sec; CPU: 0.042 sec
/home/pankrat2/public/bin/gatk4/RUFUS/scripts/Overlap.shorter.sh: line 342: 3173374 Broken pipe samtools view ./$NameStub.overlap.hashcount.fastq.bam
3173375 | grep -v chrUn
3173376 Segmentation fault | $RUFUSinterpret -mob ./Intermediates/$NameStub.overlap.hashcount.fastq.MOB.sam -mod $dumbFix.Jhash.histo.7.7.dist -mQ 1 -r $humanRef -hf $HashList -o ./$NameStub.overlap.hashcount.fastq.bam -m $MaxAlleleSize $(echo $parentCRString) -sR Intermediates/$NameStub.overlap.asembly.hash.fastq.Ref.sample -s Intermediates/$NameStub.overlap.asembly.hash.fastq.sample -e ./Intermediates/$NameStub.ref.RepRefHash

Affected3:
[main] CMD: /home/pankrat2/public/bin/gatk4/RUFUS/scripts/..//bin/externals/bwa/src/bwa_project/bwa mem -t 24 -Y -E 0,0 -O 6,6 -d 500 -w 500 -L 0,0 /home/pankrat2/public/bin/gatk4/RUFUS/scripts/..//resources/primate_non-LTR_Retrotransposon.fasta ./Affected3.bam.generator.V2.overlap.hashcount.fastq
[main] Real time: 0.027 sec; CPU: 0.024 sec
/home/pankrat2/public/bin/gatk4/RUFUS/scripts/Overlap.shorter.sh: line 342: 431060 Done samtools view ./$NameStub.overlap.hashcount.fastq.bam
431061 | grep -v chrUn
431062 Segmentation fault | $RUFUSinterpret -mob ./Intermediates/$NameStub.overlap.hashcount.fastq.MOB.sam -mod $dumbFix.Jhash.histo.7.7.dist -mQ 1 -r $humanRef -hf $HashList -o ./$NameStub.overlap.hashcount.fastq.bam -m $MaxAlleleSize $(echo $parentCRString) -sR Intermediates/$NameStub.overlap.asembly.hash.fastq.Ref.sample -s Intermediates/$NameStub.overlap.asembly.hash.fastq.sample -e ./Intermediates/$NameStub.ref.RepRefHash

Maybe it's informative that neither of these files has any chrUn reads?

lee04110@ln0004 [/scratch.global/lee04110/tmp_rufus/Affected1] % samtools view ./Affected1.overlap.hashcount.fastq.bam | grep chrUn
lee04110@ln0004 [/scratch.global/lee04110/tmp_rufus/Affected1] %

lee04110@ln0004 [/scratch.global/lee04110/tmp_rufus/Affected3] % samtools view ./Affected3.bam.generator.V2.overlap.hashcount.fastq.bam | grep chrUn
lee04110@ln0004 [/scratch.global/lee04110/tmp_rufus/Affected3] %

Other samples that I spot-checked, which RUFUS had no problem with, did have chrUn reads, and tailing their stderr files shows output like:

Feature (chrUn_JTFH01001981v1_decoy:1803-2184) beyond the length of chrUn_JTFH01001981v1_decoy size (2087 bp). Skipping.
Feature (chrUn_JTFH01001981v1_decoy:1816-2187) beyond the length of chrUn_JTFH01001981v1_decoy size (2087 bp). Skipping.
Feature (chrUn_JTFH01001981v1_decoy:1816-2181) beyond the length of chrUn_JTFH01001981v1_decoy size (2087 bp). Skipping.

By just calling samtools view | grep chr, I see that Affected1 had reads in chromosomes chr1, chr2, chr7, chr9, chr12, chr17, chr19, chr1_KI270711v1_random, chr1_KI270711v1_random, chr1_KI270766v1_alt. Maybe RUFUS didn't like the alt?

Affected3 had reads in chr1, chr2, chr4, chr16, chr19. No alt or random or chrUn.
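
For comparing samples, a per-contig tally is less error-prone than reading grep output by eye. A minimal sketch: count_contigs is a hypothetical helper that reads SAM text on stdin and counts records by reference name (column 3).

```shell
#!/bin/sh
# Reads SAM records on stdin and prints "<contig>\t<count>" for each
# reference name (column 3), skipping header lines that start with "@".
count_contigs() {
    awk -F '\t' '$0 !~ /^@/ { n[$3]++ } END { for (c in n) print c "\t" n[c] }' |
        LC_ALL=C sort
}

# Example (file name is a placeholder): samtools view reads.bam | count_contigs
```

Running this on a sample that segfaults and on one that completes makes it easier to spot which contigs appear in only one of them.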

@antares58

I have a different sample that the Overlap.shorter.sh script seg faults on in a different place:

Affected2.bam.generator.Mutations.Mate1.fastq Affected2.bam.generator.Mutations.Mate2.fastq
[main] Real time: 18.096 sec; CPU: 127.665 sec
samblaster: Both Unmapped 18 0.007 0 0.000 0.000 0.000
samblaster: Orphan/Singleton 100 0.037 1 1.000 0.373 0.000
samblaster: Both Mapped 269149 99.956 267 0.099 99.627 0.099
samblaster: Total 269267 100.000 268 0.100 100.000 0.100
samblaster:
samblaster: Marked 268 of 269267 (0.100%) total read ids as duplicates using 7644k memory in 0.458S CPU seconds and 18S wall time.
/home/pankrat2/public/bin/gatk4/RUFUS/scripts/Overlap.shorter.sh: line 194: 2822145 Segmentation fault $OverlapHash ./TempOverlap/$NameStub.sam.fastqd .98 100 1 FP 20 1 ./TempOverlap/$NameStub.1 0 $Threads
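
When these steps run inside a batch job, the exit status alone can separate a crash from a memory kill. By shell convention a status above 128 means the process was terminated by signal (status - 128). A small sketch with a made-up helper name; the signal numbers in the comments are the Linux values.

```shell
#!/bin/sh
# Describes a shell exit status: values above 128 conventionally mean the
# process was terminated by signal (status - 128).
describe_exit() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "exited with status $1"
    fi
}

describe_exit 139   # prints "killed by signal 11" (SIGSEGV)
describe_exit 137   # prints "killed by signal 9" (SIGKILL)
```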

@jandrewrfarrell
Owner

jandrewrfarrell commented Oct 27, 2022 via email

@antares58

Working on it...

@antares58

The inputs to RUFUS are not blank, although as mentioned there are no chrUn reads in the two input files where the segfault occurs on line 342 of the Overlap.shorter.sh script. I've attached ls -l output for the working directories. Unfortunately, it's not possible to share the data itself. I'll be happy to test fixes or run code with verbose logging enabled to help you track down the problem.
Affected1.txt
Affected2.txt
Affected3.txt
