
Metadata editing in pod5 files #100

Open
DCossey opened this issue Dec 15, 2023 · 10 comments
Labels
question Further information is requested


@DCossey

DCossey commented Dec 15, 2023

Hi, we were running into an error during dorado basecalling (of files recovered after a failed run) due to an incorrect sequencing kit:

[2023-12-15 14:35:04.358] [error] Unknown sequencing_kit: FLO-PRO114M

So then we checked our pod5 files and saw the following:

flow_cell_product_code: FLO-PRO114M
sequencing_kit: FLO-PRO114M

Is it possible to edit the incorrect sequencing kit somehow?
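Before rewriting anything, it can help to confirm which runs actually carry the wrong value. A minimal sketch, assuming (as in the pod5 Python API) that iterating a pod5.Reader yields records whose run_info exposes flow_cell_product_code and sequencing_kit, the two fields quoted above. summarize_kits and suspicious_pairs are hypothetical helper names, and they accept any iterable so they can be tried without a real file:

```python
from collections import Counter
from typing import Any, Iterable


def summarize_kits(records: Iterable[Any]) -> Counter:
    """Count reads per (flow_cell_product_code, sequencing_kit) pair.

    `records` is any iterable of objects with a `run_info` attribute,
    e.g. the ReadRecords yielded by iterating a pod5.Reader (assumed).
    """
    counts: Counter = Counter()
    for record in records:
        info = record.run_info
        counts[(info.flow_cell_product_code, info.sequencing_kit)] += 1
    return counts


def suspicious_pairs(counts: Counter) -> list:
    """Return pairs where the kit field just repeats the flow cell code."""
    return [pair for pair in counts if pair[0] == pair[1]]


# Usage (assumes the pod5 package is installed):
#   import pod5
#   with pod5.Reader("input.pod5") as reader:
#       counts = summarize_kits(reader)
#   print(counts)
#   print(suspicious_pairs(counts))
```

A pair such as (FLO-PRO114M, FLO-PRO114M) is the symptom described above: the kit field holds the flow cell product code.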

@HalfPhoton
Collaborator

Hi @DCossey ,
Yes, it is possible to fix your metadata, although it's not particularly clean because pod5 files are immutable.

There is a short part of the documentation relating to this, but here's a snippet more tailored to your issue.
You need to edit the RunInfo of each read.

import pod5

# New output file for edited data
with pod5.Writer("output.pod5") as writer:
    # Read all records
    with pod5.Reader("input.pod5") as reader:
        # Iterate over immutable ReadRecords
        for record in reader:
            # Convert to mutable Read
            read = record.to_read()
            # Edit the value
            read.run_info.sequencing_kit = "sequencing_kit_here"
            # Write the edited read
            writer.add_read(read)

Kind regards,
Rich
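The same loop can be wrapped in a reusable function that only rewrites the kit when it matches a known-bad value, leaving runs that were already correct untouched. A minimal sketch, assuming the Reader/Writer objects behave as in the snippet above (record.to_read(), writer.add_read(read)); fix_sequencing_kit and the KIT_FIXES mapping are hypothetical, and the replacement kit name is deliberately left as the same placeholder:

```python
from typing import Any, Dict, Iterable

# Hypothetical mapping: wrong value found in the file -> value to write.
# Replace the placeholder with the kit that was actually used.
KIT_FIXES: Dict[str, str] = {"FLO-PRO114M": "sequencing_kit_here"}


def fix_sequencing_kit(reader: Iterable[Any], writer: Any,
                       fixes: Dict[str, str] = KIT_FIXES) -> int:
    """Copy every read from `reader` to `writer`, correcting run_info.

    Only kits listed in `fixes` are changed; all other reads are copied
    unchanged. Returns the number of reads written.
    """
    written = 0
    for record in reader:
        read = record.to_read()  # mutable copy of the immutable record
        kit = read.run_info.sequencing_kit
        if kit in fixes:
            read.run_info.sequencing_kit = fixes[kit]
        writer.add_read(read)
        written += 1
    return written


# Usage (assumes the pod5 package; the output must be a new file,
# since pod5 files are not edited in place):
#   import pod5
#   with pod5.Writer("output.pod5") as writer, pod5.Reader("input.pod5") as reader:
#       n = fix_sequencing_kit(reader, writer)
```

Copying reads whose kit is not in the mapping unchanged means the same script can be run over a mixed set of recovered files.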

@HalfPhoton HalfPhoton self-assigned this Dec 18, 2023
@HalfPhoton HalfPhoton added the question Further information is requested label Dec 18, 2023
@jennieli421

I followed the example code and revised a pod5 file successfully. However, when I try to check the content using pod5 view, I get this error (I get the same error when viewing an untouched pod5 file):

POD5 has encountered an error: 'Error while processing "output2.pod5''

For detailed information set POD5_DEBUG=1'

@HalfPhoton
Collaborator

What command are you running?

@jennieli421

pod5 view "output2.pod5"

@HalfPhoton
Collaborator

Can you try without the quotes please?

@jennieli421

jennieli421 commented Dec 18, 2023

I tried that and still get the same error.

$ pod5 view original.pod5
read_id	filename	read_number	channel	mux	end_reason	start_time	start_sample	duration	num_samples	minknow_events	sample_rate	median_before	predicted_scaling_scale	predicted_scaling_shift	tracked_scaling_scale	tracked_scaling_shift	num_reads_since_mux_change	time_since_mux_change	run_id	sample_id	experiment_id	flow_cell_id	pore_type

POD5 has encountered an error: 'Error while processing 'original.pod5''

For detailed information set POD5_DEBUG=1'

@HalfPhoton
Collaborator

Can you run the following then:

pod5 --version
POD5_DEBUG=1 pod5 view output2.pod5

And then share the contents of the pod5 .log files that are generated?

@HalfPhoton
Collaborator

Ah, this could be a new issue caused by polars==0.20.

Can you please make sure you're using polars==0.19?

If not, please reinstall polars with pip install -U polars~=0.19
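Since the failure here came from the polars version rather than from the edited file, a script can check the installed version before doing any pod5 work. A minimal sketch using the standard library's importlib.metadata; the helper names are hypothetical and the 0.19 pin reflects only what this thread reports:

```python
from importlib import metadata
from typing import Optional


def is_polars_019(version: str) -> bool:
    """True when a version string such as '0.19.12' is in the 0.19 series."""
    parts = version.split(".")
    return len(parts) >= 2 and parts[0] == "0" and parts[1] == "19"


def installed_polars_version() -> Optional[str]:
    """Return the installed polars version, or None if it isn't installed."""
    try:
        return metadata.version("polars")
    except metadata.PackageNotFoundError:
        return None


# Example check before running pod5 commands:
#   v = installed_polars_version()
#   if v is not None and not is_polars_019(v):
#       print(f"polars {v} found; this thread reports problems with 0.20")
```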

@jennieli421

jennieli421 commented Dec 18, 2023

Yes, my polars was 0.20. Note that I had to run pip install -U polars==0.19; otherwise pip would say "requirement already satisfied". The error is now fixed. Thanks!
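The "requirement already satisfied" message follows from how pip interprets the compatible-release operator defined in PEP 440: ~=0.19 means >=0.19, ==0.*, so an installed 0.20 already satisfies it, whereas ==0.19 pins 0.19 exactly (missing trailing components are treated as zeros, so it matches 0.19.0). A simplified model that handles only plain numeric release versions (no pre-release, post-release or local tags); the function names are illustrative:

```python
from typing import Tuple


def _release(version: str) -> Tuple[int, ...]:
    """Parse '0.20.0' into (0, 20, 0)."""
    return tuple(int(part) for part in version.split("."))


def _padded(a: Tuple[int, ...], b: Tuple[int, ...]):
    """Zero-pad two release tuples to equal length for comparison."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies_compatible(version: str, spec: str) -> bool:
    """Model of '~=spec': >= spec and same leading components (all but the last)."""
    v, s = _release(version), _release(spec)
    if len(s) < 2:
        raise ValueError("~= needs at least two release components")
    prefix = s[:-1]
    vp, sp = _padded(v, s)
    return vp >= sp and v[:len(prefix)] == prefix


def satisfies_exact(version: str, spec: str) -> bool:
    """Model of '==spec' for plain releases, with zero padding."""
    vp, sp = _padded(_release(version), _release(spec))
    return vp == sp


# satisfies_compatible("0.20.0", "0.19") is True, which is why
# "pip install -U polars~=0.19" left polars 0.20 in place.
```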

@HalfPhoton
Collaborator

Fantastic! Sorry about that last issue; we're patching this as we speak.
