Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error in make -f src/Test.mk all #342

Open
genomics-ccrtd-cau opened this issue Nov 19, 2024 · 18 comments
Open

error in make -f src/Test.mk all #342

genomics-ccrtd-cau opened this issue Nov 19, 2024 · 18 comments

Comments

@genomics-ccrtd-cau
Copy link

seem to have isolated it to this step:

minimap2 -a -t 2 -R "@rg\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA" refs/AF086833.fa reads/SRR1553425_1.fastq reads/SRR1553425_2.fastq | \

samtools sort -@ 2 -o bam/SRR1553425.bam
[M::mm_idx_gen::0.0027.81] collected minimizers
[M::mm_idx_gen::0.004
5.15] sorted minimizers
[M::main::0.0045.14] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004
4.78] mid_occ = 3; max_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_HPC: 0; #seq: 1
[M::mm_idx_stat::0.0054.67] distinct minimizers: 3590 (99.94% are singletons); average occurrences: 1.001; average spacing: 5.278
[M::worker_pipeline::0.074
2.04] mapped 10000 sequences
samtools sort: truncated file. Aborting

checked the .sam file by head and tail, seemed to be ok.

@ialbert
Copy link
Member

ialbert commented Nov 19, 2024

I this is a recent bio code output?

I tried it in a new directory:

bio code
make -f src/Test.mk test

and it all worked as expected. You can also test the makefile individually:

make -f src/run/minimap2.mk test

another possibility is that there is an older version of samtools that gets picked up in the pipeline

@genomics-ccrtd-cau
Copy link
Author

all new installs on new machine. tried the new directory and got this:

genomics1@RCST-4067-GENOMICS1-Mac-Studio ~
$ mkdir test_biostar_install
(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~
$ cd test_biostar_install/
(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~/test_biostar_install
$ bio code
# Writing: src/Test.mk
# Writing: src/data/golden_design.csv
# Writing: src/data/barton_counts.csv
# Writing: src/data/README.md
# Writing: src/data/golden_counts.csv
# Writing: src/data/barton_design.csv
# Writing: src/r/plot_heatmap.r
# Writing: src/r/evaluate_results.r
# Writing: src/r/deseq2.r
# Writing: src/r/create_tx2gene.r
# Writing: src/r/format_featurecounts.r
# Writing: src/r/combine_salmon.r
# Writing: src/r/edger.r
# Writing: src/r/plot_pca.r
# Writing: src/r/simulate_null.r
# Writing: src/r/simulate_counts.r
# Writing: src/recipes/download-data.mk
# Writing: src/recipes/rnaseq-with-salmon.mk
# Writing: src/recipes/genome-assembly.mk
# Writing: src/recipes/short-read-alignments.mk
# Writing: src/recipes/variant-calling.mk
# Writing: src/recipes/rnaseq-with-hisat.mk
# Writing: src/run/bioproject.mk
# Writing: src/run/deepvariant.mk
# Writing: src/run/ivar.mk
# Writing: src/run/gatk.mk
# Writing: src/run/refgenie.mk
# Writing: src/run/genbank.mk
# Writing: src/run/bwa.mk
# Writing: src/run/minimap2.mk
# Writing: src/run/curl.mk
# Writing: src/run/time.sh
# Writing: src/run/bcftools.mk
# Writing: src/run/tester.mk
# Writing: src/run/splitchrom.mk
# Writing: src/run/aria.mk
# Writing: src/run/freebayes.mk
# Writing: src/run/wiggle.mk
# Writing: src/run/rsync.mk
# Writing: src/run/snpeff.mk
# Writing: src/run/salmon.mk
# Writing: src/run/bgzip.mk
# Writing: src/run/coverage.mk
# Writing: src/run/bowtie2.mk
# Writing: src/run/star.mk
# Writing: src/run/sra.mk
# Writing: src/run/datasets.mk
# Writing: src/run/fastp.mk
# Writing: src/run/bcftools_parallel.mk
# Writing: src/run/hisat2.mk
# Writing: src/setup/doctor.r
# Writing: src/setup/init-rstudio.r
# Writing: src/setup/README.md
# Writing: src/setup/init-stats.sh
# Writing: src/workflows/airway.mk
# Writing: src/workflows/presenilin.mk
# Writing: src/workflows/benchmark.mk
# Writing: src/workflows/snpcall.mk
# Writing: src/workflows/snpeval.mk
# Created 59 files.
# Biostar Workflows: https://www.biostarhandbook.com/
(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~/test_biostar_install
$  make -f src/run/minimap2.mk test
make -f src/run/tester.mk test_aligner MOD=src/run/minimap2.mk
# Get the reference genome.
make -f src/run/genbank.mk ACC=AF086833 REF=refs/AF086833.fa fasta
# Get the FASTQ reads.
make -f src/run/sra.mk SRR=SRR1553425 run
# Single end run.
make -f src/run/minimap2.mk \
		REF=refs/AF086833.fa \
		R1=reads/SRR1553425_1.fastq \
		BAM=bam/SRR1553425.bam \
		index run! run
# Paired end run.
make -f src/run/minimap2.mk \
		REF=refs/AF086833.fa \
		R1=reads/SRR1553425_1.fastq \
		R2=reads/SRR1553425_2.fastq \
		BAM=bam/SRR1553425.bam \
		index run! run
mkdir -p refs/
bio fetch AF086833 --format fasta > refs/AF086833.fa
-rw-r--r--  1 genomics1  staff    19K Nov 19 14:45 refs/AF086833.fa
# Create the directory.
mkdir -p reads
# Download the reads.
fastq-dump -F --split-files -X 10000 -O reads SRR1553425
# Rename the first in pair if final name is different.
if [ "reads/SRR1553425_1.fastq" != "reads/SRR1553425_1.fastq" ]; then mv -f reads/SRR1553425_1.fastq reads/SRR1553425_1.fastq; fi
# Rename the second pair if exists and is different.
if [ -f reads/SRR1553425_2.fastq ] && [ "reads/SRR1553425_2.fastq" != "reads/SRR1553425_2.fastq" ]; then mv -f reads/SRR1553425_2.fastq reads/SRR1553425_2.fastq; fi
Read 10000 spots for SRR1553425
Written 10000 spots for SRR1553425
-rw-r--r--  1 genomics1  staff   2.1M Nov 19 14:45 reads/SRR1553425_1.fastq
-rw-r--r--  1 genomics1  staff   2.1M Nov 19 14:45 reads/SRR1553425_2.fastq
# The minimap2 index is generated during alignment!
rm -rf bam/SRR1553425.bam bam/SRR1553425.bam.bai
make -f src/run/coverage.mk BAM=bam/SRR1553425.bam run!
rm -rf bam/SRR1553425.bw
mkdir -p bam/
minimap2 -a --MD -t 2  -R "@RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA" refs/AF086833.fa reads/SRR1553425_1.fastq  |\
	 samtools sort -@ 2 > bam/SRR1553425.bam
minimap2: unrecognized option: MD
[M::mm_idx_gen::0.002*5.81] collected minimizers
[M::mm_idx_gen::0.003*4.08] sorted minimizers
[M::main::0.003*4.07] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*3.79] mid_occ = 3; max_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_HPC: 0; #seq: 1
[M::mm_idx_stat::0.003*3.70] distinct minimizers: 3590 (99.94% are singletons); average occurrences: 1.001; average spacing: 5.278
[M::worker_pipeline::0.063*1.95] mapped 10000 sequences
[M::main] Version: 2.1.1-r341
[M::main] CMD: minimap2 -a --MD -t 2 -R @RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA refs/AF086833.fa reads/SRR1553425_1.fastq
[M::main] Real time: 0.063 sec; CPU: 0.122 sec
[bam_sort_core] merging from 0 files and 2 in-memory blocks...
samtools index bam/SRR1553425.bam
-rw-r--r--  1 genomics1  staff   480K Nov 19 14:45 bam/SRR1553425.bam
make -f src/run/coverage.mk BAM=bam/SRR1553425.bam REF=refs/AF086833.fa run
samtools faidx refs/AF086833.fa
# Create the directory.
mkdir -p bam/
# Generate the temporary bedgraph file.
LC_ALL=C; bedtools genomecov -ibam  bam/SRR1553425.bam -split -bg  | sort -k1,1 -k2,2n > bam/SRR1553425.bedgraph

# Convert the bedgraph file to bigwig.
bedGraphToBigWig bam/SRR1553425.bedgraph refs/AF086833.fa.fai bam/SRR1553425.bw
-rw-r--r--  1 genomics1  staff    83K Nov 19 14:45 bam/SRR1553425.bw
# The minimap2 index is generated during alignment!
rm -rf bam/SRR1553425.bam bam/SRR1553425.bam.bai
make -f src/run/coverage.mk BAM=bam/SRR1553425.bam run!
rm -rf bam/SRR1553425.bw
mkdir -p bam/
minimap2 -a --MD -t 2  -R "@RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA" refs/AF086833.fa reads/SRR1553425_1.fastq reads/SRR1553425_2.fastq |\
	 samtools sort -@ 2 > bam/SRR1553425.bam
minimap2: unrecognized option: MD
[M::mm_idx_gen::0.001*7.41] collected minimizers
[M::mm_idx_gen::0.003*4.91] sorted minimizers
[M::main::0.003*4.90] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.003*4.53] mid_occ = 3; max_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_HPC: 0; #seq: 1
[M::mm_idx_stat::0.003*4.41] distinct minimizers: 3590 (99.94% are singletons); average occurrences: 1.001; average spacing: 5.278
[M::worker_pipeline::0.063*1.98] mapped 10000 sequences
samtools sort: truncated file. Aborting
make[2]: *** [src/run/minimap2.mk:82: bam/SRR1553425.bam] Error 1
make[2]: *** Deleting file 'bam/SRR1553425.bam'
make[1]: *** [src/run/tester.mk:31: test_aligner] Error 2
make: *** [src/run/minimap2.mk:108: test] Error 2

@ialbert
Copy link
Member

ialbert commented Nov 19, 2024

the error there says:

minimap2: unrecognized option: MD

I wonder what the version is

minimap2 -V

mine is

2.26-r1175

@genomics-ccrtd-cau
Copy link
Author

minimap2 -V
2.1.1-r341

it's just what installed with you rmost wonderful biostar default install scripts.

but I researched that error and removed the --MD in a previous test and it was ok. but still threw the truncated file. aborting error.

@ialbert
Copy link
Member

ialbert commented Nov 19, 2024

actually the bam file is created in single end mode it looks like minimap2 ignores the missing MD and that is not the cause of the error,

it is pretty weird that the error occurs in paired end mode sorting only and not single end, the makefile tests both modes

    minimap2 -a --MD -t 2  -R "@RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA" \
    refs/AF086833.fa \
    reads/SRR1553425_1.fastq  | \
    samtools sort -@ 2 > bam/SRR1553425.bam

vs

    minimap2 -a --MD -t 2  -R "@RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA" \
    refs/AF086833.fa  \
    reads/SRR1553425_1.fastq \
    reads/SRR1553425_2.fastq | \
    samtools sort -@ 2 > bam/SRR1553425.bam

in my case both of these commands run when executed in the directory that I already ran my tests in

see how this works for you

@ialbert
Copy link
Member

ialbert commented Nov 19, 2024

what is your samtools version

samtools version

@genomics-ccrtd-cau
Copy link
Author

single worked,

(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~/test_biostar_install
$     minimap2 -a --MD -t 2  -R "@RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA"     refs/AF086833.fa     reads/SRR1553425_1.fastq  |     samtools sort -@ 2 > bam/SRR1553425.bam
minimap2: unrecognized option: MD
[M::mm_idx_gen::0.002*9.95] collected minimizers
[M::mm_idx_gen::0.005*5.59] sorted minimizers
[M::main::0.005*5.57] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.005*5.20] mid_occ = 3; max_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_HPC: 0; #seq: 1
[M::mm_idx_stat::0.005*5.08] distinct minimizers: 3590 (99.94% are singletons); average occurrences: 1.001; average spacing: 5.278
[M::worker_pipeline::0.074*2.09] mapped 10000 sequences
[M::main] Version: 2.1.1-r341
[M::main] CMD: minimap2 -a --MD -t 2 -R @RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA refs/AF086833.fa reads/SRR1553425_1.fastq
[M::main] Real time: 0.075 sec; CPU: 0.156 sec
[bam_sort_core] merging from 0 files and 2 in-memory blocks...

paired same error. hmmm.

minimap2 -a --MD -t 2  -R "@RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA"     refs/AF086833.fa      reads/SRR1553425_1.fastq     reads/SRR1553425_2.fastq |     samtools sort -@ 2 > bam/SRR1553425.bam
minimap2: unrecognized option: MD
[M::mm_idx_gen::0.003*7.72] collected minimizers
[M::mm_idx_gen::0.005*4.63] sorted minimizers
[M::main::0.005*4.63] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.006*4.35] mid_occ = 3; max_occ = 3
[M::mm_idx_stat] kmer size: 15; skip: 10; is_HPC: 0; #seq: 1
[M::mm_idx_stat::0.006*4.27] distinct minimizers: 3590 (99.94% are singletons); average occurrences: 1.001; average spacing: 5.278
[M::worker_pipeline::0.076*2.05] mapped 10000 sequences
samtools sort: truncated file. Aborting

@genomics-ccrtd-cau
Copy link
Author

$ samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.21 (using htslib 1.21)

@ialbert
Copy link
Member

ialbert commented Nov 19, 2024

could be that the sra data is incorrect, sometimes fastq dump misbehaves, in this case the second reads are wrong

seqkit stats reads/*.fastq

maybe delete the reads folder and rerun

@genomics-ccrtd-cau
Copy link
Author

$ samtools version
samtools 1.21
Using htslib 1.21
Copyright (C) 2024 Genome Research Ltd.

Samtools compilation details:
Features: build=configure curses=yes
CC: x86_64-apple-darwin13.4.0-clang
CPPFLAGS: -D_FORTIFY_SOURCE=2 -isystem /Users/genomics1/micromamba/envs/bioinfo/include -mmacosx-version-min=10.13
CFLAGS: -Wall -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/genomics1/micromamba/envs/bioinfo/include -fdebug-prefix-map=/opt/mambaforge/envs/bioconda/conda-bld/samtools_1726176646420/work=/usr/local/src/conda/samtools-1.21 -fdebug-prefix-map=/Users/genomics1/micromamba/envs/bioinfo=/usr/local/src/conda-prefix
LDFLAGS: -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/genomics1/micromamba/envs/bioinfo/lib -L/Users/genomics1/micromamba/envs/bioinfo/lib
HTSDIR:
LIBS:
CURSES_LIB: -ltinfow -lncursesw

HTSlib compilation details:
Features: build=configure libcurl=yes S3=yes GCS=yes libdeflate=yes lzma=yes bzip2=yes plugins=yes plugin-path=/Users/genomics1/micromamba/envs/bioinfo/libexec/htslib htscodecs=1.6.1
CC: x86_64-apple-darwin13.4.0-clang
CPPFLAGS: -D_FORTIFY_SOURCE=2 -isystem /Users/genomics1/micromamba/envs/bioinfo/include -mmacosx-version-min=10.13
CFLAGS: -Wall -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/genomics1/micromamba/envs/bioinfo/include -fdebug-prefix-map=/opt/mambaforge/envs/bioconda/conda-bld/htslib_1726173034934/work=/usr/local/src/conda/htslib-1.21 -fdebug-prefix-map=/Users/genomics1/micromamba/envs/bioinfo=/usr/local/src/conda-prefix -fvisibility=hidden
LDFLAGS: -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/genomics1/micromamba/envs/bioinfo/lib -L/Users/genomics1/micromamba/envs/bioinfo/lib -fvisibility=hidden -rdynamic

HTSlib URL scheme handlers present:
built-in: file, preload, data
Google Cloud Storage: gs+http, gs+https, gs
libcurl: gophers, smtp, wss, smb, rtsp, tftp, pop3, smbs, imaps, pop3s, ws, ftps, ftp, gopher, imap, http, https, sftp, smtps, scp, dict, mqtt, telnet
S3 Multipart Upload: s3w+https, s3w+http, s3w
Amazon S3: s3+https, s3, s3+http
crypt4gh-needed: crypt4gh
mem: mem

@genomics-ccrtd-cau
Copy link
Author

seqkit stats reads/*.fastq
processed files: 2 / 2 [======================================] ETA: 0s. done
file format type num_seqs sum_len min_len avg_len max_len
reads/SRR1553425_1.fastq FASTQ DNA 10,000 1,010,000 101 101 101
reads/SRR1553425_2.fastq FASTQ DNA 10,000 1,010,000 101 101 101

@ialbert
Copy link
Member

ialbert commented Nov 19, 2024

put the output into sam and sort it that way, there is got to be something there

@genomics-ccrtd-cau
Copy link
Author

minimap2 -a -t 2 -R "@rg\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA" refs/AF086833.fa reads/SRR1553425_1.fastq reads/SRR1553425_2.fastq > temp.sam

samtools view -Sb temp.sam -o temp.bam
[W::sam_read1_sam] Parse error at line 10026
samtools view: error reading file "temp.sam"

@genomics-ccrtd-cau
Copy link
Author

line 3 @sq SN:AF086833.2 LN:18959

line 10026 @sq SN:AF086833.2 LN:18959

@ialbert
Copy link
Member

ialbert commented Nov 19, 2024

that looks like it treats the two input files as two separate inputs and generates two SAM files in a concatenated manner,

almost certainly a bug

maybe add

minimap2 -x sr -a 

as a parameter (as short read mapping)

@genomics-ccrtd-cau
Copy link
Author

genomics-ccrtd-cau commented Nov 19, 2024

yes, you are correct about the concatenated, i think, I did this:

(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~/test_biostar_install
$ grep "^@" temp.sam | sort -u > headers.sam
(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~/test_biostar_install
$ grep -v "^@" temp.sam > alignments.sam
(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~/test_biostar_install
$ cat headers.sam alignments.sam > temp_fixed.sam
(bioinfo) 
genomics1@RCST-4067-GENOMICS1-Mac-Studio ~/test_biostar_install
$ samtools view -Sb temp_fixed.sam -o temp_fixed.bam
(bioinfo) 

will try the minimap command above now.

$ minimap2 -x sr -a -t 2 -R "@RG\tID:run1\tSM:sample1\tLB:library1\tPL:ILLUMINA" refs/AF086833.fa reads/SRR1553425_1.fastq reads/SRR1553425_2.fastq > temp.sam
[E::main] unknown preset 'sr'
(bioinfo) 

@genomics-ccrtd-cau
Copy link
Author

this might be related? not sure:
lh3/minimap2#15

@genomics-ccrtd-cau
Copy link
Author

also found this, so might be an Apple M2 issue:
https://x.com/bioinformatics/status/1482041714994991104

Michael Barton
@bioinformatics
Installing minimap2 on an m1 mac. The new apple silicon is aarch64 architecture and supports the neon instruction set. Took me longer than it should to put it together, given that it's in the README.

make aarch64=1 arm_neon=1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants