civic+cgi evidence bug fixes & example data update #53

Merged
merged 6 commits into from
Sep 9, 2024
Changes from 5 commits
14 changes: 14 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,20 @@
Changelog
============

0.5.5 - Sulfur Io (2024-09-09)
---------------------------------------------

**Added**

**Fixed**

* [#52](https://github.com/qbic-pipelines/querynator/issues/52): issue that led to an inconsistent number of fields for the CIViC evidences
* [#51](https://github.com/qbic-pipelines/querynator/issues/51): CGI evidences are now filtered by the specified cancer type

**Dependencies**

**Deprecated**

0.5.4 - Sulfur Io (2024-07-24)
---------------------------------------------

Binary file modified example_files/cgi_test_out/cgi_test_out.cgi_results.zip
Binary file not shown.
178 changes: 89 additions & 89 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/alterations.tsv

Large diffs are not rendered by default.

665 changes: 377 additions & 288 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/biomarkers.tsv

Large diffs are not rendered by default.

This file was deleted.

180 changes: 90 additions & 90 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/input01.tsv

Large diffs are not rendered by default.

@@ -1,5 +1,5 @@
- CGI query date: 2023-05-30
+ CGI query date: 2024-09-09
API version: https://www.cancergenomeinterpreter.org/api/v1/
- Input mutations: /Users/students/Documents/work_dir/querynator/example_files/example.vcf
+ Input mutations: /home-link/zxmgc83/querynator/example_files/example.vcf
Reference genome: GRCh37
Filtered out synonymous & low impact variants based on VEP annotation
24 changes: 0 additions & 24 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/report.txt

This file was deleted.

29 changes: 29 additions & 0 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/summary.txt
@@ -0,0 +1,29 @@
===============================
=== Summary of the analysis ===
===============================

=== CGI-ANALYSIS ===
Analysis Code: CGI_query
Analysis ID: 45474d2bd5ff2e2a81e3
CGI version: v23.12.2
Date: 2024-09-09 12:17:40

=== INPUT ===
Analysed mutations: 88
Analysed cnas: 0
Analysed fusions: 0
Total samples: 1
Cancer type: BRCA
Reference genome: hg19

=== ALTERATIONS ===
Driver mutations: 84
Predicted and annotated drivers: 47
Predicted drivers: 4
Annotated drivers: 33

=== BIOMARKERS ===
Biomarkers in cancer type: 33
Biomarkers in cancer type - Level A: 1
Biomarkers in other cancer type: 441

Large diffs are not rendered by default.

35 changes: 18 additions & 17 deletions example_files/civic_test_out/civic_test_out.civic_results.tsv

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions example_files/civic_test_out/metadata.txt
@@ -1,6 +1,6 @@
- CIViC query date: 2023-05-30
+ CIViC query date: 2024-09-09
CIViCpy version: 3.0.0
Search mode: exact
Reference genome: GRCh37
Filtered out synonymous & low impact variants based on VEP annotation
- Input File: /Users/students/Documents/work_dir/querynator/example_files/example.vcf
+ Input File: /home-link/zxmgc83/querynator/example_files/example.vcf

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions querynator/query_api/civic_api.py
@@ -159,7 +159,7 @@ def append_to_dict(dict1, dict2):
dict1[key].append(value)
else:
if dict1[key] == "":
- dict1[key] = dict2[key]
+ dict1[key] = [dict2[key]]
else:
dict1[key] = [dict1[key], dict2[key]]
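
The effect of this one-line change is easiest to see on a toy run. The sketch below is a simplified stand-in for `append_to_dict`, not the project's function: the `isinstance` guard for the first branch, the `wrap_first` switch and the sample records are assumptions made for illustration, since that part of the function lies outside the visible hunk. It also assumes the accumulator starts out with empty strings as placeholders.

```python
def append_to_dict_sketch(dict1, dict2, wrap_first=True):
    """Collect the values of dict2 under the matching keys of dict1.

    wrap_first=True mimics the patched line, wrap_first=False the old one.
    """
    for key, value in dict2.items():
        if isinstance(dict1[key], list):  # assumed guard, not shown in the hunk
            dict1[key].append(value)
        elif dict1[key] == "":
            # patched behaviour: always start a list, even for an empty first value
            dict1[key] = [value] if wrap_first else value
        else:
            dict1[key] = [dict1[key], value]
    return dict1


def collect(records, wrap_first):
    """Merge all records into a fresh accumulator with empty placeholders."""
    merged = {"therapy": "", "rating": ""}
    for record in records:
        append_to_dict_sketch(merged, record, wrap_first)
    return merged


# hypothetical evidence records; the first one has no rating
records = [
    {"therapy": "drug_a", "rating": ""},
    {"therapy": "drug_b", "rating": 4},
    {"therapy": "drug_c", "rating": 3},
]

patched = collect(records, wrap_first=True)
legacy = collect(records, wrap_first=False)
```

With the old assignment an empty first value leaves the placeholder untouched, so the next record overwrites it and `legacy["rating"]` ends up one entry shorter than `legacy["therapy"]`. Wrapping the first value in a list keeps one entry per record for every key.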

@@ -417,12 +417,12 @@ def get_evidence_information_from_variant(variant_obj, diseases):
"evidence_level": evidence.evidence_level,
"evidence_support": evidence.evidence_direction,
"evidence_type": evidence.evidence_type,
- "evidence_phenotypes": ", ".join([i.name for i in evidence.phenotypes]),
+ "evidence_phenotypes": "+".join([i.name for i in evidence.phenotypes]),
"evidence_rating": evidence.rating,
"evidence_significance": evidence.significance,
"evidence_source": evidence.source.name,
"evidence_status": evidence.status,
- "evidence_therapies": ", ".join([i.name for i in evidence.therapies]),
+ "evidence_therapies": "+".join([i.name for i in evidence.therapies]),
"evidence_therapy_interaction_type": evidence.therapy_interaction_type,
}
except IndexError:
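
One plausible reason for switching the inner separator from `", "` to `"+"` is that the joined string then stays a single token if several evidence items are later combined into one comma-separated field. That downstream step is an assumption here and is not shown in this diff; the helper name and the therapy names below are made up for illustration.

```python
def join_names(names, sep="+"):
    """Join the names belonging to ONE evidence item into a single token."""
    return sep.join(names)


# hypothetical therapies of two evidence items
therapies_per_evidence = [["drug_a", "drug_b"], ["drug_c"]]

# assumed downstream step: one comma-separated field per variant
old_field = ", ".join(join_names(t, sep=", ") for t in therapies_per_evidence)
new_field = ", ".join(join_names(t) for t in therapies_per_evidence)

# splitting the old field cannot recover which therapies belong together
old_items = old_field.split(", ")
new_items = new_field.split(", ")
```

Under that assumption the old field splits into three items for two evidence entries, while the new field splits back into exactly one item per entry.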
43 changes: 31 additions & 12 deletions querynator/report_scripts/combine_cgi.py
@@ -293,7 +293,9 @@ def link_biomarkers(biomarkers_df, logger):

def get_highest_evidence(row, biomarkers_linked):
"""
- get highest associated CGI evidence of the current alteration (A-D) from the biomarkers datafrane
+ get highest associated CGI evidence of the current alteration (A-D) from the biomarkers dataframe.
+
+ consider evidence matched on gene, alteration and cancer type, as well as off-label use (level A evidence for different cancer is level C evidence for this cancer).

:param row: row of a pandas DataFrame
:type row: pandas Series
@@ -308,10 +310,8 @@ def get_highest_evidence(row, biomarkers_linked):
if row["Protein Change_CGI"].startswith("*"):
row["Protein Change_CGI"] = row["Protein Change_CGI"].replace("*", "\\*")

- # highest evidence level has lowest char value (A<B<C<D)
- max_evidence_level = biomarkers_linked.loc[
- (biomarkers_linked["alterations_link"].str.contains(row["Protein Change_CGI"]))
- ]["Evidence"].min()
+ curr_alteration_msk = biomarkers_linked["alterations_link"].str.contains(row["Protein Change_CGI"])
+ max_evidence_level = biomarkers_linked.loc[curr_alteration_msk, "Evidence"].min()

return max_evidence_level
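
`Series.str.contains` interprets its pattern as a regular expression by default, which is why a leading `*` has to be escaped before the lookup. Below is a minimal sketch of the same lookup in plain Python, using `re.escape` to neutralise every metacharacter rather than only `*`. The row layout, the sample values and the helper names are illustrative assumptions, not the project's code. The sketch also shows why `min()` yields the strongest level: the letters A to D sort alphabetically.

```python
import re

# hypothetical, already filtered biomarker rows
biomarkers = [
    {"alterations_link": "GENE1:*123Q", "Evidence": "B"},
    {"alterations_link": "GENE1:*123Q", "Evidence": "D"},
    {"alterations_link": "GENE2:G12D", "Evidence": "A"},
]


def highest_evidence(protein_change, rows):
    """Return the best evidence level (A is best) among rows mentioning the change."""
    pattern = re.compile(re.escape(protein_change))  # match the text literally
    levels = [r["Evidence"] for r in rows if pattern.search(r["alterations_link"])]
    return min(levels) if levels else None


def unescaped_pattern_fails(text):
    """Report whether the raw string is rejected as a regular expression."""
    try:
        re.compile(text)
    except re.error:
        return True
    return False


best_stop = highest_evidence("*123Q", biomarkers)
best_missense = highest_evidence("G12D", biomarkers)
no_match = highest_evidence("V600E", biomarkers)
```

In pandas, passing `regex=False` to `str.contains` would be another way to get a literal substring match; whether that fits the rest of the pipeline is not something this diff shows.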

@@ -342,6 +342,30 @@ def check_wildtypes(biomarkers: pd.DataFrame, vcf: pd.DataFrame, logger) -> None
return


def filter_biomarkers(biomarkers_df: pd.DataFrame, logger) -> pd.DataFrame:
"""
adapt biomarkers to only consider
- "complete" biomarkers (gene, alteration)
- matches between alteration & biomarker (gene, alteration, cancer type)
- off-label use (level A evidence for different cancer is level C evidence for this cancer)

:param biomarkers_df: the dataframe containing the CGI result 'biomarkers.tsv'
:type biomarkers_df: pandas DataFrame
:param logger: the logger
:return: the adapted biomarkers dataframe
:rtype: pandas DataFrame
"""
complete_biom_msk = biomarkers_df.BioM == "complete"
match_msk = biomarkers_df["Match"] == "YES"
off_label_msk = ~match_msk & (biomarkers_df["Evidence"] == "A")
# combined mask; named keep_msk so the built-in filter() is not shadowed
keep_msk = complete_biom_msk & (match_msk | off_label_msk)

# level A evidence for a different cancer type is treated as level C
biomarkers_df.loc[off_label_msk, "Evidence"] = "C"
biomarkers_df["alterations_link"] = biomarkers_df["alterations_link"].astype(str)

logger.info(f"CGI: filtered {(~keep_msk).sum()} irrelevant biomarkers, {keep_msk.sum()} remaining")

return biomarkers_df.loc[keep_msk]
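
The three masks can be restated on plain dictionaries to check the rule on a few rows. This is a sketch of the selection logic only, assuming the `BioM`, `Match` and `Evidence` columns hold the values used in the function above. The sample rows, the `"NO"` and `"partial"` values and the name `filter_rows` are invented for illustration.

```python
def filter_rows(rows):
    """Keep complete biomarkers that match the cancer type or qualify as off-label."""
    kept = []
    for row in rows:
        complete = row["BioM"] == "complete"
        match = row["Match"] == "YES"
        off_label = (not match) and row["Evidence"] == "A"
        if complete and (match or off_label):
            # level A evidence from another cancer type counts as level C here
            evidence = "C" if off_label else row["Evidence"]
            kept.append({**row, "Evidence": evidence})
    return kept


rows = [
    {"BioM": "complete", "Match": "YES", "Evidence": "B"},  # kept unchanged
    {"BioM": "complete", "Match": "NO", "Evidence": "A"},   # off-label, becomes C
    {"BioM": "complete", "Match": "NO", "Evidence": "B"},   # dropped
    {"BioM": "partial", "Match": "YES", "Evidence": "A"},   # dropped, not complete
]

result = filter_rows(rows)
```

Unlike the DataFrame version, the sketch copies each kept row instead of modifying the input in place, so the original rows keep their evidence level.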


def combine_cgi(cgi_path, outdir, logger):
"""
Command to combine the cgi results with the vcf's VEP annotation
@@ -369,19 +393,14 @@ def combine_cgi(cgi_path, outdir, logger):
alterations_df = read_modify_alterations(alterations_path)
merged_df = merge_alterations_vep(vep_df, alterations_df)

- # link alterations in biomarkers
+ # link alterations in biomarkers & filter
biomarkers_df = link_biomarkers(biomarkers_df, logger)
+ biomarkers_df = filter_biomarkers(biomarkers_df, logger)
biomarkers_df.to_csv(f"{outdir}/combined_files/biomarkers_linked.tsv", sep="\t", index=False)

check_wildtypes(biomarkers_df, vep_df, logger)

- # add CGI evidence col to merged_df
-
- # adapt biomarkers to only consider "complete" matches between alteration & biomarker
- biomarkers_df = biomarkers_df[biomarkers_df.BioM == "complete"]
- # biomarkers_linked["alterations_link"] = biomarkers_linked["alterations_link"].astype(str)
- biomarkers_df["alterations_link"] = biomarkers_df["alterations_link"].apply(str)
# add CGI evidence col
merged_df["evidence_CGI"] = merged_df.apply(lambda x: get_highest_evidence(x, biomarkers_df), axis=1)
# write merged to report dir
merged_df.to_csv(f"{outdir}/combined_files/alterations_vep.tsv", sep="\t", index=False)
2 changes: 1 addition & 1 deletion setup.py
@@ -4,7 +4,7 @@

from setuptools import find_packages, setup

- VERSION = "0.5.4"
+ VERSION = "0.5.5"

with open("README.rst") as readme_file:
readme = readme_file.read()