Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix filters parse and display #5061

Merged
merged 24 commits into from
Nov 26, 2024
Merged

Fix filters parse and display #5061

merged 24 commits into from
Nov 26, 2024

Conversation

dnil
Copy link
Collaborator

@dnil dnil commented Nov 22, 2024

This PR adds a functionality or fixes a bug.

Bit of a difference already without touching parsing:
Screenshot 2024-11-22 at 10 44 30

Then with parsing fixed:
Screenshot 2024-11-22 at 14 21 35

Screenshot 2024-11-22 at 14 34 12

I also added a filters badge to the variant page for good measure:
Screenshot 2024-11-22 at 14 49 06

Testing on cg-vm1 server (Clinical Genomics Stockholm)

Prepare for testing

  1. Make sure the PR is pushed and available on Docker Hub
  2. Fist book your testing time using the Pax software available at https://pax.scilifelab.se/. The resource you are going to call dibs on is scout-stage and the server is cg-vm1.
  3. ssh <USER.NAME>@cg-vm1.scilifelab.se
  4. sudo -iu hiseq.clinical
  5. ssh localhost
  6. (optional) Find out which scout branch is currently deployed on cg-vm1: podman ps
  7. Stop the service with current deployed branch: systemctl --user stop scout.target
  8. Start the scout service with the branch to test: systemctl --user start scout@<this_branch>
  9. Make sure the branch is deployed: systemctl --user status scout.target
  10. After testing is done, repeat procedure at https://pax.scilifelab.se/, which will release the allocated resource (scout-stage) to be used for testing by other users.
Testing on hasta server (Clinical Genomics Stockholm)

Prepare for testing

  1. ssh <USER.NAME>@hasta.scilifelab.se
  2. Book your testing time using the Pax software. us; paxa -u <user> -s hasta -r scout-stage. You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
  3. (optional) Find out which scout branch is currently deployed on cg-vm1: conda activate S_scout; pip freeze | grep scout-browser
  4. Deploy the branch to test: bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
  5. Make sure the branch is deployed: us; scout --version
  6. After testing is done, repeat the paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.

How to test:

  1. check a cancer case and an RD case, note filter tags missing and tags saying Pass when they should say Filtered.
  2. apply patch
  3. note improvements with old cases, e.g. some new danger badges on variantS, and the filters badge on the variant page. And in particular no new issues with them.
  4. for full effect, load/update the cases with the patch and note good tags both for callers and filters

Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.

Review:

  • code approved by
  • tests executed by

@dnil dnil marked this pull request as draft November 22, 2024 09:49
@dnil
Copy link
Collaborator Author

dnil commented Nov 22, 2024

Ok, with that out of the world, I would say we are not formally bugged at least not based on the demo case situation.
Screenshot 2024-11-22 at 10 53 13
Since the set tag is used without a "filteredIn", it is hard to fault the parsing here. This looks like more like a merging/pipeline issue.

But we could perhaps do slightly better here: since there is only one caller given, we could assign the FILTER status to the call I think!

Copy link

codecov bot commented Nov 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.81%. Comparing base (53f537b) to head (95ee422).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5061      +/-   ##
==========================================
+ Coverage   84.78%   84.81%   +0.02%     
==========================================
  Files         323      323              
  Lines       19442    19475      +33     
==========================================
+ Hits        16484    16517      +33     
  Misses       2958     2958              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


if raw_info or svdb_origin or other_info:
return callers

if category == "snv":
# cyvcf2 FILTER is None if VCF file column FILTER is "PASS"
filter_status = "Pass"
if variant.FILTER is not None:
filter_status = "Filtered - {}".format(filter_status.replace(";", " - "))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that filter_status could never take any other value than "Pass" here really, so it was either "Pass" or "Filtered - Pass" I guess?!

@dnil dnil marked this pull request as ready for review November 22, 2024 13:35
@@ -13,6 +13,14 @@ def parse_callers(variant, category="snv"):
2. If a svdb_origin tag (pipe separated) is found, callers listed will be marked Pass
3. If a set tag (dash separated, GATK CombineVariants) is found, callers will be marked Pass or Filtered accordingly

If the FILTER status is not PASS (e.g. None - cyvcf2 FILTER is None if VCF file column FILTER is "PASS"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno, this function is getting long. If I didn't have the thought about a new parsing module nudging I might split/refactor it. Say if you feel it is way long already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this function might be split into 3 more functions, one for each of the 3 situations:

1. If a FOUND_IN tag (comma separated) is found, callers listed will be marked Pass
2. If a svdb_origin tag (pipe separated) is found, callers listed will be marked Pass
3. If a set tag (dash separated, GATK CombineVariants) is found, callers will be marked Pass or Filtered accordingly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess porting it will be easier if it is somewhat clear; refactored into four separate cases (one for the gatk snv fallback as well).

@@ -28,36 +36,51 @@ def parse_callers(variant, category="snv"):
svdb_origin = variant.INFO.get("svdb_origin")
raw_info = variant.INFO.get("set")

filter_status_default = "Pass"
if variant.FILTER is not None:
filter_status_default = "Filtered - {}".format(variant.FILTER.replace(";", " - "))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the crucial changes. As noted below at its origin, the previous version would never add the actual FILTER tag on parsing. Python lazy initiation error. That would just have been a compile error in another language.

if other_info:
for info in other_info.split(","):
infos = other_info.split(",")
if len(infos) > 1:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the doctoring: add filter status if we are looking at a single caller, else just set as before.

@@ -56,7 +56,7 @@
</span>
{% endfor %}
{% for name, caller in variant.callers %} <!-- Collect info for specific callers -->
<span class="badge {% if caller == 'Pass' %}bg-success{% elif caller == 'Filtered' %}bg-secondary{% else %}bg-black{% endif %}" data-bs-toggle="tooltip" data-bs-html="true" title="{{caller}}">
<span class="badge {% if caller == 'Pass' %}bg-success{% elif 'Filtered' in caller %}bg-secondary{% else %}bg-black{% endif %}" data-bs-toggle="tooltip" data-bs-html="true" title="{{caller}}">
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show if substring matches, as in "Filtered - MinSomaticScore".

@dnil dnil added the Bugfix label Nov 22, 2024
@northwestwitch
Copy link
Member

Just to understand, when there is filter passed but not a specific caller, then originally the filter was not "Passed" but "filtered"?

Copy link
Member

@northwestwitch northwestwitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to capture more filtered events, as oppsite to what happens on main branch, which is nice!
Would be nice to:

  • Split the parsing function into different smaller functions, as you also suggested.
  • Fix the filter label displayed on variantS page, because it's too long for that cell.

I'm still trying to understand why we still have the green PASS label with or without callers for some cancer variants. I guess I have to look inside the VCF

@@ -13,6 +13,14 @@ def parse_callers(variant, category="snv"):
2. If a svdb_origin tag (pipe separated) is found, callers listed will be marked Pass
3. If a set tag (dash separated, GATK CombineVariants) is found, callers will be marked Pass or Filtered accordingly

If the FILTER status is not PASS (e.g. None - cyvcf2 FILTER is None if VCF file column FILTER is "PASS"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this function might be split into 3 more functions, one for each of the 3 situations:

1. If a FOUND_IN tag (comma separated) is found, callers listed will be marked Pass
2. If a svdb_origin tag (pipe separated) is found, callers listed will be marked Pass
3. If a set tag (dash separated, GATK CombineVariants) is found, callers will be marked Pass or Filtered accordingly

scout/parse/variant/callers.py Show resolved Hide resolved
@@ -45,19 +63,21 @@ def parse_callers(variant, category="snv"):
callers[caller] = "Filtered"
elif call == "Intersection":
for caller in callers:
callers[caller] = "Pass"
callers[caller] = filter_status_default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you have variant.FILTER not None and raw_info? Sorry, I'm trying to wrap my head around these conditions..

scout/server/blueprints/variant/utils.py Outdated Show resolved Hide resolved
@dnil
Copy link
Collaborator Author

dnil commented Nov 25, 2024

I'm still trying to understand why we still have the green PASS label with or without callers for some cancer variants. I guess I have to look inside the VCF

Yes, there are some lines there that appear odd to my understanding, much like if it was an amalgamation of different pipeline runs or call sets. I guess the demo is a) old and b) perhaps not entirely representative of one pipeline version, but has some edited lines etc.

@dnil
Copy link
Collaborator Author

dnil commented Nov 25, 2024

Just to understand, when there is filter passed but not a specific caller, then originally the filter was not "Passed" but "filtered"?

Hm, no, I thought it would have been Pass and None on all callers for the category?

I think you are clear on this, but to be sure, and to remember later, the issues were in particular

  1. If a filter status was new/missing from a scout constant VARIANT_FILTERS of known (mostly Lund specific) filter strings, the filter status badge was not shown.
  2. For variants with variant.FILTER status, callers are sometimes marked, e.g. with INFO.FOUND_IN. That was likely not what we expected from the pipeline side, but I guess it makes some sense as well, to tell which tool made the call that got filtered. These would previously be marked with caller status Pass even if a variant.FILTER status was set.

Copy link
Member

@northwestwitch northwestwitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice with the refactoring and new tests! 💯

CHANGELOG.md Show resolved Hide resolved
scout/parse/variant/callers.py Show resolved Hide resolved
scout/parse/variant/callers.py Outdated Show resolved Hide resolved
tests/parse/test_parse_callers.py Show resolved Hide resolved
tests/parse/test_parse_callers.py Show resolved Hide resolved
Copy link

sonarcloud bot commented Nov 26, 2024

@dnil dnil merged commit 6c115e8 into main Nov 26, 2024
25 checks passed
@dnil dnil deleted the fix_filters_parse_and_display branch November 26, 2024 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing filter status?
3 participants