Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New bulk 5'-RACE supported protocol, non-overlaping reads rescue #343

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from

Conversation

JustBioinfo
Copy link

This PR adds :

  • a new supported library_generation_method to allow the analysis of 5'RACE library where R1 reads not start directly by the UMI by adding a new process PRESTO_MASKPRIMERS_ALIGN_TRIM that launch MaskPrimers.py align in trim mode before PRESTO_MASKPRIMERS_UMI process.
  • an option --assemblepairs_join to allow non-overlapping reads to be rescued using assemblepairs join on failed reads from assemblepairs align. In fact, in our libraries we have a large proportion of reads which do not overlap, but which turn out to be detected as productive sequences at the end of the pipeline.

This PR doesn't add it at the moment but is it possible to have in options the possibility of opting for IgBlast's 19-column mode ?

enhancement #342

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/airrflow branch on the nf-core/test-datasets repository.

Do I need to add tests, and if so, can you tell me how?
So far I've been testing with real data sets from my research lab, should I add a test data set on nf-core/airrflow branch on the nf-core/test-datasets repository ? If so, I'll check with my team to see what I can provide.

  • Make sure your code lints (nf-core lint).

LookupError: Failed to clone from the remote: https://github.com/nf-core/modules.git``

  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.

I'm new to analyzing this type of data, so I'm not familiar with the various AIRR library generation methods. Can you help me to name the new supported protocols, so far I've called it specific_5p_race_umi but I'm not sure it's the right way to name it.

  • Output Documentation in docs/output.md is updated.
    Have two new output folders, :

    • presto/trim_upstream_umi_linker to store R1 reads where the UMI upstream sequence was trim.
    • presto/08-assemble-pairs-join if the new --assemblepairs_join option is enabled.
      Are they well named to update docs/output.md ?
  • CHANGELOG.md is updated.

Copy link
Author

@JustBioinfo JustBioinfo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

why remove specific_5p_race_umi from nextflow_schema.json ?
I can't find any comments on my pull request, what should I do next please ?
Thank you very much.

Copy link
Member

@ggabernet ggabernet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JustBioinfo, sorry for the late review as I realized I never made my previous comments public. I have one comment for now, and to review the newly introduced specific_5p_race_umi protocol it would be great to have more details about the library design of this protocol. In what way does it differ from the dt_5p_race_umi protocol? Maybe a quick diagram of the library construct would be helpful here.
Additionally, we provide test sequencing data for each of the implemented protocols in the nf-core/test-datasets repository, and this is what is used for testing in the respective test config profiles, so we'd need to create as well a test profile with test data for this new protocol.

@@ -0,0 +1,40 @@
process PRESTO_MASKPRIMERS_ALIGN_TRIM {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This process is almost the same as presto_maskprimers_align, so this one can be reused by just providing --primer_mask_mode parameter as trim

@ggabernet
Copy link
Member

ggabernet commented Nov 15, 2024

Hi,

why remove specific_5p_race_umi from nextflow_schema.json ? I can't find any comments on my pull request, what should I do next please ? Thank you very much.

I think I was just trying to solve some merge conflicts and might have removed your new profile from the config accidentally, let's see whether all other tests are passing!

@ggabernet
Copy link
Member

ggabernet commented Nov 22, 2024

Hi @JustBioinfo, thanks for your PR. There are a couple of changes that would improve this code, by providing a new parameter only and not a new protocol. This way people can trim sequences preceding the UMI in the specific_pcr_umi protocols and also the 5'-RACE protocols. It was quite a bit of change to provide as comments so I've just opened a PR to your repo.

The new parameters should then be documented in the parameter schema, which can be done with the nf-core schema build command. Let me know if you have questions with this step. Additionally, it would be good to provide a small test dataset to be able to test this parameter during the CI runs. Could you provide a fastq file with the first ~1000 - 2000 reads as an example? We'll need to add it to the github.com/nf-core/test-datasets (airrflow branch) repository to be able to use it for the CI tests.

Regarding the non-overlapping read rescue, I'm hesitant to add the assemblepairs join command for users as reads that are not properly overlapping or joined with a reference would be eliminated downstream anyways when aligning the reads to a reference and could also create artifacts (artificial deletions). Have you tested out the parameter --assemblepairs_sequential for your data? This one should try to join the reads by aligning to a V gene reference if they are not overlapping, and it introduces "Ns" at the junction gap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants