Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexlr: spaces are not supported in file path #129

Open
mxwang66 opened this issue Nov 5, 2024 · 6 comments
Open

Indexlr: spaces are not supported in file path #129

mxwang66 opened this issue Nov 5, 2024 · 6 comments

Comments

@mxwang66
Copy link

mxwang66 commented Nov 5, 2024

Hello devs,

I ran into this error running Indexlr in python on a linux x64 system. Below is the code to replicate this error:

from btllib import Indexlr, IndexlrFlag

assembly_path = 'genomes/Salmonella bongori/ncbi_dataset/data/GCF_000252995.1/genomic.fna'
kmerlen = 21
windowsize = 200
with Indexlr(assembly_path, kmerlen, windowsize, IndexlrFlag.LONG_MODE, 1) as assembly:
    for record in assembly:
        for m in record.minimizers:
            pass

and the error message:

cat: genomes/Salmonella: No such file or directory
cat: bongori/ncbi_dataset/data/GCF_000252995.1/genomic.fna: No such file or directory
[2024-11-05 01:39:10][WARNING] genomes/Salmonella bongori/ncbi_dataset/data/GCF_000252995.1/genomic.fna is empty.
[2024-11-05 01:39:10][ERROR] A helper process has finished unsuccessfully:
PID: 2313645
Outcome: exited with status 1
[2024-11-05 01:39:10][ERROR] Process pipeline: Communication failure.
[2024-11-05 01:39:10][ERROR] Process pipeline: Spawner process failed.
Segmentation fault (core dumped)

Got the same error when using the indexlr command:

indexlr -k 21 -w 200 --long -o test.out "genomes/Salmonella bongori/ncbi_dataset/data/GCF_000252995.1/genomic.fna"

Both run was successful if the space in the file path is replaced.

Thank you,
Michael

@lcoombe
Copy link
Member

lcoombe commented Nov 5, 2024

Hi Michael,

Thanks for reaching out!

I'm not really surprised that you would see unexpected behaviour when trying to use files with spaces in them - generally, the convention is to avoid using spaces in file/directory names (I found some good documentation of conventions here: https://datamanagement.hms.harvard.edu/plan-design/file-naming-conventions). There would be a number of issues possible there, including that indexlr can take multiple files, and the spaces would make it complicated to distinguish different files vs. a single file with the space(s).

Because of all that, I'd highly recommend removing spaces in your file names. If that isn't a good option (I understand sometimes changing file names after the fact can be messy), then you could make a soft-link to that problem file, and specify that to indexlr instead. Ex:

(btl) [lcoombe@hpce706 tmp]$ ls my\ test/
tmp.fa
(btl) [lcoombe@hpce706 tmp]$ ls -lah tmp1.fa 
lrwxrwxrwx 1 lcoombe btl 14 Nov  5 08:52 tmp1.fa -> my test/tmp.fa
(btl) [lcoombe@hpce706 tmp]$ indexlr -k24 -w5 tmp1.fa 
NC_012920.1_1-50	16371557189216790589 3754088885574348785 9549692006826280120 11376364575436671136 17106421449899264663 2564125396831936452 6856120600046910211 636075098808042691

Thank you for your interest in btllib!
Lauren

@mxwang66
Copy link
Author

mxwang66 commented Nov 5, 2024

Thanks for your response Lauren! Yeah the best practice would be having no spaces. But it would nice to make it supported, since python itself supports file path with spaces. But I understand that you might need to spend time on other features. Thanks again for maintaining btllib!

Michael

@vlad0x00
Copy link
Member

vlad0x00 commented Nov 5, 2024

@lcoombe
I think the issue can be resolved by surrounding path with quotes in https://github.com/bcgsc/btllib/blob/master/src/btllib/data_stream.cpp#L242-L299

At the start of the function, you can do const auto quoted_path = '"' + path + '"'; and then replace all usages of path in that function with quoted_path. Probably also worth checking that the path is not already quoted by checking if the first and last characters are quotes or apostrophes.

@lcoombe
Copy link
Member

lcoombe commented Nov 5, 2024

Thanks @vlad0x00!

That's really helpful - we can take a look at making that tweak (@parham-k or @jwcodee?) (We're a bit short-staffed, and I will be going on leave soon, so might not happen quickly :) )

@parham-k
Copy link
Member

parham-k commented Nov 5, 2024

I can take care of it.

@lcoombe
Copy link
Member

lcoombe commented Nov 5, 2024

I can take care of it.

Thanks so much Parham!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants