Tracking SeqReader progress using pv #56

Open
vlad0x00 opened this issue Apr 4, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@vlad0x00
Member

vlad0x00 commented Apr 4, 2022

It would be useful to be able to track what % of the input file has been read by SeqReader at any time. Since SeqReader may use external tools to decompress files, it doesn't always have direct access to the input file to determine progress. However, it's possible to track progress using pv. To do this, btllib should run pv on the input file and pipe its output into either the decompression tool or SeqReader directly.

Instead of %, we would want to have the total number of bytes of the input file and the number of bytes read so far. pv can report the bytes read so far using the -bn options, and the total is simply the size of the input file. The rationale here is that if you have multiple SeqReaders at the same time, the actual progress % is the sum of read bytes over the sum of total bytes for all SeqReaders.
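To illustrate the aggregation rule above, here is a minimal sketch. `ReaderProgress` and `overall_progress` are hypothetical names made up for this example and are not part of btllib; the byte counts are assumed to come from something like pv.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-reader counters (not a btllib type).
struct ReaderProgress {
  std::uint64_t bytes_read;   // bytes consumed from the input file so far
  std::uint64_t bytes_total;  // size of the input file in bytes
};

// Overall progress as a fraction in [0, 1]:
// sum of bytes read divided by sum of total bytes over all readers.
double overall_progress(const std::vector<ReaderProgress>& readers) {
  std::uint64_t read = 0;
  std::uint64_t total = 0;
  for (const auto& r : readers) {
    read += r.bytes_read;
    total += r.bytes_total;
  }
  // Avoid dividing by zero when there are no readers or only empty files.
  return total == 0 ? 0.0
                    : static_cast<double>(read) / static_cast<double>(total);
}
```

For example, with one reader that has finished a 100-byte file and another that has not started a 300-byte file, this gives 0.25, whereas averaging the per-reader percentages would give 0.5.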

Using pv this way shouldn't affect performance, as it likely uses splicing (https://linux.die.net/man/2/splice) instead of reading/writing. However, if it does, then there is an optimization that can be done on Linux. Instead of piping, pv on Linux lets you track file read progress of a process given its PID. In this case, we would call pv to track the progress of whatever external tool SeqReader uses or on SeqReader's process if no external tool is used.
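The piping variant could look roughly like the sketch below. It is only an illustration under some assumptions: the helper names, the `progress_path` destination for pv's stderr, and the `decompress_cmd` string are all hypothetical and do not reflect how btllib actually spawns its helper processes. It relies on pv writing one decimal byte count per line to standard error when run with -bn.

```cpp
#include <cstdint>
#include <string>

// Wrap a string in single quotes for a POSIX shell, escaping embedded quotes.
std::string shell_quote(const std::string& s) {
  std::string out = "'";
  for (char c : s) {
    if (c == '\'') {
      out += "'\\''";  // close quote, escaped quote, reopen quote
    } else {
      out += c;
    }
  }
  out += "'";
  return out;
}

// Build "pv -bn <input> 2><progress> [| <decompress_cmd>]".
// progress_path and decompress_cmd are assumptions made for this sketch.
std::string build_pv_pipeline(const std::string& input_path,
                              const std::string& progress_path,
                              const std::string& decompress_cmd) {
  std::string cmd = "pv -bn " + shell_quote(input_path) + " 2>" +
                    shell_quote(progress_path);
  if (!decompress_cmd.empty()) {
    cmd += " | " + decompress_cmd;
  }
  return cmd;
}

// Parse one line of pv -bn output (a decimal byte count).
// Returns false for empty lines or anything that is not all digits.
bool parse_pv_bytes(const std::string& line, std::uint64_t& bytes) {
  if (line.empty()) return false;
  std::uint64_t value = 0;
  for (char c : line) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + static_cast<std::uint64_t>(c - '0');
  }
  bytes = value;
  return true;
}
```

Calling `build_pv_pipeline("reads.fq.gz", "progress.txt", "gzip -dc")` would yield a command whose stdout is the decompressed stream, while the byte counts accumulate in `progress.txt`. A FIFO or a dedicated pipe would probably be a better fit than a regular file in practice.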

@vlad0x00 vlad0x00 added the enhancement New feature or request label Apr 19, 2022
@parham-k
Member

parham-k commented Jul 7, 2022

Is there an easier way of doing this for a single SeqReader which reads sequences one by one? Like, after each iteration in the code below, can we have the total number of bytes we've read so far? In that case we can just compare that with the file size.

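btllib::SeqReader reader(...);
for (const auto& record : reader) {
    process(record.seq);
    // bytes_read() is the accessor being proposed here; the cast avoids integer division.
    std::cout << static_cast<double>(reader.bytes_read()) / filesize << std::endl;
}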

@vlad0x00
Member Author

vlad0x00 commented Jul 7, 2022

That'd be fairly straightforward, but would only work for uncompressed files.
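For the uncompressed case, the idea can be sketched with plain standard-library I/O. This is not btllib code: `file_size` and `read_with_progress` are hypothetical helpers, and the sketch reads lines rather than sequence records. It works because the bytes consumed map directly onto the file size. With a compressed input, the reader sees decompressed bytes, so the same ratio would presumably not be meaningful.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Size of a file in bytes, or 0 if it cannot be opened.
std::uint64_t file_size(const std::string& path) {
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  return in ? static_cast<std::uint64_t>(in.tellg()) : 0;
}

// Read an uncompressed file line by line and report, after each line,
// the fraction of the file consumed so far via on_progress(double).
template <typename Callback>
void read_with_progress(const std::string& path, Callback on_progress) {
  const std::uint64_t total = file_size(path);
  std::ifstream in(path, std::ios::binary);
  std::string line;
  std::uint64_t consumed = 0;
  while (std::getline(in, line)) {
    // Count the newline too, unless the last line ended at EOF without one.
    consumed += line.size() + (in.eof() ? 0 : 1);
    on_progress(total == 0 ? 0.0
                           : static_cast<double>(consumed) /
                                 static_cast<double>(total));
  }
}
```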
