Fix filters parse and display #5061
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #5061 +/- ##
==========================================
+ Coverage 84.78% 84.81% +0.02%
==========================================
Files 323 323
Lines 19442 19475 +33
==========================================
+ Hits 16484 16517 +33
Misses 2958 2958
☔ View full report in Codecov by Sentry.
scout/parse/variant/callers.py
Outdated
|
||
if raw_info or svdb_origin or other_info: | ||
return callers | ||
|
||
if category == "snv": | ||
# cyvcf2 FILTER is None if VCF file column FILTER is "PASS" | ||
filter_status = "Pass" | ||
if variant.FILTER is not None: | ||
filter_status = "Filtered - {}".format(filter_status.replace(";", " - ")) |
Note that filter_status could never take any other value than "Pass" here really, so it was either "Pass" or "Filtered - Pass" I guess?!
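For illustration, a minimal sketch of the old and the corrected logic side by side. `SimpleNamespace` stands in for a cyvcf2 variant, and the function names and the `LowDepth` filter are made up for this example; none of this is Scout code.

```python
from types import SimpleNamespace


def filter_status_buggy(variant):
    """Old behaviour: the format call reads the local default, not variant.FILTER."""
    filter_status = "Pass"
    if variant.FILTER is not None:
        # Bug: filter_status is still "Pass" here, so the real filters are lost.
        filter_status = "Filtered - {}".format(filter_status.replace(";", " - "))
    return filter_status


def filter_status_fixed(variant):
    """Corrected behaviour: format the actual FILTER value."""
    filter_status = "Pass"
    if variant.FILTER is not None:
        filter_status = "Filtered - {}".format(variant.FILTER.replace(";", " - "))
    return filter_status


# Stand-ins for cyvcf2 variants (FILTER is None when the VCF column is PASS).
passed = SimpleNamespace(FILTER=None)
filtered = SimpleNamespace(FILTER="MinSomaticScore;LowDepth")

print(filter_status_buggy(filtered))  # Filtered - Pass
print(filter_status_fixed(filtered))  # Filtered - MinSomaticScore - LowDepth
print(filter_status_fixed(passed))    # Pass
```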
@@ -13,6 +13,14 @@ def parse_callers(variant, category="snv"): | |||
2. If a svdb_origin tag (pipe separated) is found, callers listed will be marked Pass | |||
3. If a set tag (dash separated, GATK CombineVariants) is found, callers will be marked Pass or Filtered accordingly | |||
|
|||
If the FILTER status is not PASS (e.g. None - cyvcf2 FILTER is None if VCF file column FILTER is "PASS"), |
I dunno, this function is getting long. If I didn't have the thought about a new parsing module nudging I might split/refactor it. Say if you feel it is way long already.
Agreed, this function might be split into 3 more functions, one for each of the 3 situations:
1. If a FOUND_IN tag (comma separated) is found, callers listed will be marked Pass
2. If a svdb_origin tag (pipe separated) is found, callers listed will be marked Pass
3. If a set tag (dash separated, GATK CombineVariants) is found, callers will be marked Pass or Filtered accordingly
I guess porting it will be easier if it is somewhat clear; refactored into four separate cases (one for the gatk snv fallback as well).
scout/parse/variant/callers.py
Outdated
@@ -28,36 +36,51 @@ def parse_callers(variant, category="snv"): | |||
svdb_origin = variant.INFO.get("svdb_origin") | |||
raw_info = variant.INFO.get("set") | |||
|
|||
filter_status_default = "Pass" | |||
if variant.FILTER is not None: | |||
filter_status_default = "Filtered - {}".format(variant.FILTER.replace(";", " - ")) |
This is one of the crucial changes. As noted at its origin, the previous version never added the actual FILTER tag when parsing. A Python lazy-initialization error; in another language it would just have been a compile error.
scout/parse/variant/callers.py
Outdated
if other_info: | ||
for info in other_info.split(","): | ||
infos = other_info.split(",") | ||
if len(infos) > 1: |
As per the docstring: add the filter status if we are looking at a single caller, else just set as before.
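A small sketch of that rule, assuming each comma-separated entry in the tag is a bare caller name. The function name and the caller names `vardict` and `tnscope` are placeholders for illustration only.

```python
def mark_found_in(callers, found_in, filter_status):
    """Mark callers from a comma-separated tag value.

    With several callers listed, keep the earlier behaviour and mark each
    one "Pass". With a single caller, the variant-level filter status can
    be attributed to that caller.
    """
    names = found_in.split(",")
    status = "Pass" if len(names) > 1 else filter_status
    for name in names:
        if name in callers:  # ignore callers we do not track
            callers[name] = status
    return callers


single = mark_found_in(
    {"vardict": None, "tnscope": None}, "vardict", "Filtered - MinSomaticScore"
)
print(single)  # {'vardict': 'Filtered - MinSomaticScore', 'tnscope': None}
```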
@@ -56,7 +56,7 @@ | |||
</span> | |||
{% endfor %} | |||
{% for name, caller in variant.callers %} <!-- Collect info for specific callers --> | |||
<span class="badge {% if caller == 'Pass' %}bg-success{% elif caller == 'Filtered' %}bg-secondary{% else %}bg-black{% endif %}" data-bs-toggle="tooltip" data-bs-html="true" title="{{caller}}"> | |||
<span class="badge {% if caller == 'Pass' %}bg-success{% elif 'Filtered' in caller %}bg-secondary{% else %}bg-black{% endif %}" data-bs-toggle="tooltip" data-bs-html="true" title="{{caller}}"> |
Show if substring matches, as in "Filtered - MinSomaticScore".
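The same condition written out in plain Python, to show why the equality check missed the longer labels. The function names are made up, and the `isinstance` guard is an addition of this sketch rather than something the template does.

```python
def badge_class_old(caller_status):
    """Previous template logic: exact match on "Filtered"."""
    if caller_status == "Pass":
        return "bg-success"
    if caller_status == "Filtered":
        return "bg-secondary"
    return "bg-black"


def badge_class_new(caller_status):
    """Updated template logic: substring match on "Filtered"."""
    if caller_status == "Pass":
        return "bg-success"
    # Guard against non-string values such as None (sketch-only addition).
    if isinstance(caller_status, str) and "Filtered" in caller_status:
        return "bg-secondary"
    return "bg-black"


label = "Filtered - MinSomaticScore"
print(badge_class_old(label))  # bg-black
print(badge_class_new(label))  # bg-secondary
```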
Just to understand: when the filter shows as passed but there is no specific caller, then originally the filter was not "Passed" but "filtered"?
This seems to capture more filtered events, as opposed to what happens on the main branch, which is nice!
Would be nice to:
- Split the parsing function into different smaller functions, as you also suggested.
- Fix the filter label displayed on variantS page, because it's too long for that cell.
I'm still trying to understand why we still have the green PASS label with or without callers for some cancer variants. I guess I have to look inside the VCF
scout/parse/variant/callers.py
Outdated
@@ -45,19 +63,21 @@ def parse_callers(variant, category="snv"): | |||
callers[caller] = "Filtered" | |||
elif call == "Intersection": | |||
for caller in callers: | |||
callers[caller] = "Pass" | |||
callers[caller] = filter_status_default |
Can you have variant.FILTER not None and raw_info? Sorry, I'm trying to wrap my head around these conditions..
Yes, there are some lines there that appear odd to my understanding, much as if it were an amalgamation of different pipeline runs or call sets. I guess the demo is a) old and b) perhaps not entirely representative of one pipeline version, but has some edited lines etc.
Hm, no, I thought it would have been Pass and None on all callers for the category? I think you are clear on this, but to be sure, and to remember later, the issues were in particular
|
Nice with the refactoring and new tests! 💯
Quality Gate passed
This PR adds a functionality or fixes a bug.
Bit of a difference already without touching parsing:
Then with parsing fixed:
I also added a filters badge to the variant page for good measure:
Testing on cg-vm1 server (Clinical Genomics Stockholm)
Prepare for testing
scout-stage and the server is cg-vm1.
ssh <USER.NAME>@cg-vm1.scilifelab.se
sudo -iu hiseq.clinical
ssh localhost
podman ps
systemctl --user stop scout.target
systemctl --user start scout@<this_branch>
systemctl --user status scout.target
(scout-stage) to be used for testing by other users.
Testing on hasta server (Clinical Genomics Stockholm)
Prepare for testing
ssh <USER.NAME>@hasta.scilifelab.se
us; paxa -u <user> -s hasta -r scout-stage
You can also use the WSGI Pax app available at https://pax.scilifelab.se/.
conda activate S_scout; pip freeze | grep scout-browser
bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_scout -t scout -b <this_branch>
us; scout --version
paxa procedure, which will release the allocated resource (scout-stage) to be used for testing by other users.
How to test:
Expected outcome:
The functionality should be working
Take a screenshot and attach or copy/paste the output.
Review: