Commit

Merge pull request #53 from HomoPolyethylen/dev
civic+cgi evidence bug fixes & example data update

closes #51
closes #52
HomoPolyethylen authored Sep 9, 2024
2 parents c0d53ac + c1b68fb commit 21b8e58
Showing 18 changed files with 839 additions and 715 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.rst
@@ -1,6 +1,20 @@
Changelog
============

0.5.5 - Sulfur Io (2024-09-09)
---------------------------------------------

**Added**

**Fixed**

* [#52](https://github.com/qbic-pipelines/querynator/issues/52): issue that led to an inconsistent number of fields for the CIViC evidences
* [#51](https://github.com/qbic-pipelines/querynator/issues/51): CGI evidences are now filtered by the specified cancer type

**Dependencies**

**Deprecated**

0.5.4 - Sulfur Io (2024-07-24)
---------------------------------------------

2 changes: 1 addition & 1 deletion docs/usage.rst
@@ -277,7 +277,7 @@ The command above generates the following result directory:
outdir
├── combined_files
| ├── alterations_vep.tsv
| ├── biomarkers_linked.tsv
| ├── biomarkers_linked_filtered.tsv
| ├── civic_cgi_vep.tsv
| └── civic_vep.tsv
├── report
Binary file modified example_files/cgi_test_out/cgi_test_out.cgi_results.zip
Binary file not shown.
178 changes: 89 additions & 89 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/alterations.tsv

Large diffs are not rendered by default.

665 changes: 377 additions & 288 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/biomarkers.tsv

Large diffs are not rendered by default.

This file was deleted.

180 changes: 90 additions & 90 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/input01.tsv

Large diffs are not rendered by default.

@@ -1,5 +1,5 @@
CGI query date: 2023-05-30
CGI query date: 2024-09-09
API version: https://www.cancergenomeinterpreter.org/api/v1/
Input mutations: /Users/students/Documents/work_dir/querynator/example_files/example.vcf
Input mutations: /home-link/zxmgc83/querynator/example_files/example.vcf
Reference genome: GRCh37
Filtered out synonymous & low impact variants based on VEP annotation
24 changes: 0 additions & 24 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/report.txt

This file was deleted.

29 changes: 29 additions & 0 deletions example_files/cgi_test_out/cgi_test_out.cgi_results/summary.txt
@@ -0,0 +1,29 @@
===============================
=== Summary of the analysis ===
===============================

=== CGI-ANALYSIS ===
Analysis Code: CGI_query
Analysis ID: 45474d2bd5ff2e2a81e3
CGI version: v23.12.2
Date: 2024-09-09 12:17:40

=== INPUT ===
Analysed mutations: 88
Analysed cnas: 0
Analysed fusions: 0
Total samples: 1
Cancer type: BRCA
Reference genome: hg19

=== ALTERATIONS ===
Driver mutations: 84
Predicted and annotated drivers: 47
Predicted drivers: 4
Annotated drivers: 33

=== BIOMARKERS ===
Biomarkers in cancer type: 33
Biomarkers in cancer type - Level A: 1
Biomarkers in other cancer type: 441

Large diffs are not rendered by default.

35 changes: 18 additions & 17 deletions example_files/civic_test_out/civic_test_out.civic_results.tsv

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions example_files/civic_test_out/metadata.txt
@@ -1,6 +1,6 @@
CIViC query date: 2023-05-30
CIViC query date: 2024-09-09
CIViCpy version: 3.0.0
Search mode: exact
Reference genome: GRCh37
Filtered out synonymous & low impact variants based on VEP annotation
Input File: /Users/students/Documents/work_dir/querynator/example_files/example.vcf
Input File: /home-link/zxmgc83/querynator/example_files/example.vcf

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions querynator/query_api/civic_api.py
@@ -159,7 +159,7 @@ def append_to_dict(dict1, dict2):
dict1[key].append(value)
else:
if dict1[key] == "":
dict1[key] = dict2[key]
dict1[key] = [dict2[key]]
else:
dict1[key] = [dict1[key], dict2[key]]
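A minimal sketch of why the changed branch matters, assuming the goal is that every key ends up holding a list regardless of how many values were appended. `append_value` is a hypothetical, simplified stand-in written for illustration; it is not the project's `append_to_dict`, of which only a fragment is visible in this hunk.

```python
def append_value(store, key, value):
    """Hypothetical stand-in for the branch changed in append_to_dict."""
    current = store.get(key, "")
    if isinstance(current, list):
        # a list already exists: extend it
        current.append(value)
    elif current == "":
        # fixed behaviour: an empty placeholder becomes a one-element list
        store[key] = [value]
    else:
        # a bare scalar is promoted to a two-element list
        store[key] = [current, value]
    return store


store = {"evidence_level": "", "evidence_type": ""}
append_value(store, "evidence_level", "A")
append_value(store, "evidence_level", "B")
append_value(store, "evidence_type", "Predictive")

# every field is now a list, whether it received one value or several
all_lists = all(isinstance(v, list) for v in store.values())
```

With the pre-fix assignment (`dict1[key] = dict2[key]`), a key that received a single value would hold a bare scalar while others hold lists, which is one plausible source of the inconsistent field counts mentioned in the changelog.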

@@ -417,12 +417,12 @@ def get_evidence_information_from_variant(variant_obj, diseases):
"evidence_level": evidence.evidence_level,
"evidence_support": evidence.evidence_direction,
"evidence_type": evidence.evidence_type,
"evidence_phenotypes": ", ".join([i.name for i in evidence.phenotypes]),
"evidence_phenotypes": "+".join([i.name for i in evidence.phenotypes]),
"evidence_rating": evidence.rating,
"evidence_significance": evidence.significance,
"evidence_source": evidence.source.name,
"evidence_status": evidence.status,
"evidence_therapies": ", ".join([i.name for i in evidence.therapies]),
"evidence_therapies": "+".join([i.name for i in evidence.therapies]),
"evidence_therapy_interaction_type": evidence.therapy_interaction_type,
}
except IndexError:
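The switch from `", "` to `"+"` as the inner separator can be illustrated with a small sketch. It assumes (this is not confirmed by the diff) that values from several evidence items are later combined into one comma-separated field; the therapy names and the `flatten` helper are hypothetical.

```python
# Hypothetical therapies for two evidence items of the same variant
therapies_per_evidence = [["Dabrafenib", "Trametinib"], ["Vemurafenib"]]


def flatten(per_evidence, inner_sep, outer_sep=", "):
    """Join names within an evidence item, then join the items themselves."""
    return outer_sep.join(inner_sep.join(names) for names in per_evidence)


ambiguous = flatten(therapies_per_evidence, inner_sep=", ")
unambiguous = flatten(therapies_per_evidence, inner_sep="+")

# Splitting on the outer separator recovers two items only in the second case
fields_old = ambiguous.split(", ")
fields_new = unambiguous.split(", ")
```

Under that assumption, the old separator yields three fields for two evidence items, while `"+"` keeps a combination therapy inside a single field.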
45 changes: 32 additions & 13 deletions querynator/report_scripts/combine_cgi.py
@@ -293,7 +293,9 @@ def link_biomarkers(biomarkers_df, logger):

def get_highest_evidence(row, biomarkers_linked):
"""
get highest associated CGI evidence of the current alteration (A-D) from the biomarkers datafrane
get highest associated CGI evidence of the current alteration (A-D) from the biomarkers dataframe.
consider evidence matched on gene, alteration and cancer type, as well as off-label use (level A evidence for different cancer is level C evidence for this cancer).
:param row: row of a pandas DataFrame
:type row: pandas Series
@@ -308,10 +310,8 @@ def get_highest_evidence(row, biomarkers_linked):
if row["Protein Change_CGI"].startswith("*"):
row["Protein Change_CGI"] = row["Protein Change_CGI"].replace("*", "\*")

# highest evidence level has lowest char value (A<B<C<D)
max_evidence_level = biomarkers_linked.loc[
(biomarkers_linked["alterations_link"].str.contains(row["Protein Change_CGI"]))
]["Evidence"].min()
curr_alteration_msk = biomarkers_linked["alterations_link"].str.contains(row["Protein Change_CGI"])
max_evidence_level = biomarkers_linked.loc[curr_alteration_msk, "Evidence"].min()

return max_evidence_level
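Two details of the lookup above can be shown in a standard-library sketch: pandas' `str.contains` treats its pattern as a regular expression by default, so a leading `*` has to be escaped, and taking the minimum of the evidence letters returns the strongest level because `"A" < "B" < "C" < "D"`. The link strings below are hypothetical (the real `alterations_link` format is not shown in this diff), and `re.escape` is used here as a general alternative to the manual `replace` in the source, not as the project's method.

```python
import re

# Hypothetical (link, evidence level) pairs standing in for biomarkers_linked
links_with_evidence = [
    ("BRAF:V600E", "C"),
    ("BRAF:V600E", "A"),
    ("TP53:*394S", "B"),
]


def highest_evidence(protein_change, rows):
    """Return the strongest evidence letter among rows mentioning the change."""
    pattern = re.compile(re.escape(protein_change))  # treat the change literally
    levels = [level for link, level in rows if pattern.search(link)]
    return min(levels) if levels else None


# An unescaped leading "*" is not a valid regular expression
try:
    re.compile("*394S")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

best_braf = highest_evidence("V600E", links_with_evidence)
best_tp53 = highest_evidence("*394S", links_with_evidence)
```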

@@ -342,6 +342,30 @@ def check_wildtypes(biomarkers: pd.DataFrame, vcf: pd.DataFrame, logger) -> None
return


def filter_biomarkers(biomarkers_df: pd.DataFrame, logger) -> pd.DataFrame:
"""
adapt biomarkers to only consider
- "complete" biomarkers (gene, alteration)
- matches between alteration & biomarker (gene, alteration, cancer type)
- off-label use (level A evidence for different cancer is level C evidence for this cancer)
:biomarkers_df : the dataframe containing the cgi result 'biomarkers.tsv'
:logger : the logger
:return : the adapted biomarkers dataframe
"""
complete_biom_msk = biomarkers_df.BioM == "complete"
match_msk = biomarkers_df["Match"] == "YES"
off_label_msk = ~match_msk & (biomarkers_df["Evidence"] == "A")
filter = complete_biom_msk & (match_msk | off_label_msk)

biomarkers_df.loc[off_label_msk, "Evidence"] = "C"
biomarkers_df["alterations_link"] = biomarkers_df["alterations_link"].astype(str)

logger.info(f"CGI: filtered {(~filter).sum()} irrelevant biomarkers, {filter.sum()} remaining")

return biomarkers_df.loc[filter]
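The mask logic of `filter_biomarkers` can be traced on a toy table. This is a sketch under assumptions: the rows are invented, `"partial"` is a hypothetical non-complete `BioM` value, and the sketch works on a copy, whereas the function above relabels the passed dataframe in place.

```python
import pandas as pd

# Invented rows; column names follow the hunk above
biomarkers = pd.DataFrame(
    {
        "BioM": ["complete", "complete", "complete", "partial", "complete"],
        "Match": ["YES", "NO", "NO", "YES", "NO"],
        "Evidence": ["B", "A", "D", "A", "A"],
    }
)

complete = biomarkers["BioM"] == "complete"
match = biomarkers["Match"] == "YES"
# level A evidence for a different cancer type counts as off-label use
off_label = ~match & (biomarkers["Evidence"] == "A")
keep = complete & (match | off_label)

filtered = biomarkers.copy()
filtered.loc[off_label, "Evidence"] = "C"  # downgrade off-label level A to C
filtered = filtered.loc[keep]

n_removed = int((~keep).sum())
n_remaining = int(keep.sum())
```

Rows 0, 1 and 4 survive: row 0 is a direct match, rows 1 and 4 are off-label level A hits relabelled to C. Row 2 is dropped because it neither matches nor has level A evidence, and row 3 because its biomarker is not complete.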


def combine_cgi(cgi_path, outdir, logger):
"""
Command to combine the cgi results with the vcf's VEP annotation
@@ -369,19 +393,14 @@ def combine_cgi(cgi_path, outdir, logger):
alterations_df = read_modify_alterations(alterations_path)
merged_df = merge_alterations_vep(vep_df, alterations_df)

# link alterations in biomarkers
# link alterations in biomarkers & filter
biomarkers_df = link_biomarkers(biomarkers_df, logger)
biomarkers_df.to_csv(f"{outdir}/combined_files/biomarkers_linked.tsv", sep="\t", index=False)
biomarkers_df = filter_biomarkers(biomarkers_df, logger)
biomarkers_df.to_csv(f"{outdir}/combined_files/biomarkers_linked_filtered.tsv", sep="\t", index=False)

check_wildtypes(biomarkers_df, vep_df, logger)

# add CGI evidence col to merged_df

# adapt biomarkers to only consider "complete" matches between alteration & biomarker
biomarkers_df = biomarkers_df[biomarkers_df.BioM == "complete"]
# biomarkers_linked["alterations_link"] = biomarkers_linked["alterations_link"].astype(str)
biomarkers_df["alterations_link"] = biomarkers_df["alterations_link"].apply(str)
# add CGI evidence col
merged_df["evidence_CGI"] = merged_df.apply(lambda x: get_highest_evidence(x, biomarkers_df), axis=1)
# write merged to report dir
merged_df.to_csv(f"{outdir}/combined_files/alterations_vep.tsv", sep="\t", index=False)
2 changes: 1 addition & 1 deletion querynator/report_scripts/create_report.py
@@ -705,7 +705,7 @@ def create_report_htmls(outdir, basename, civic_path, logger):

# read in files
vep_civic_cgi_merge = pd.read_csv(f"{outdir}/combined_files/civic_cgi_vep.tsv", sep="\t")
biomarkers_df = pd.read_csv(f"{outdir}/combined_files/biomarkers_linked.tsv", sep="\t")
biomarkers_df = pd.read_csv(f"{outdir}/combined_files/biomarkers_linked_filtered.tsv", sep="\t")
metadata_civic = f"{civic_path}/metadata.txt" # read reference genome from metadata file
# get path to save individual reports
report_path = f"{os.path.abspath(outdir)}/report/variant_reports"
2 changes: 1 addition & 1 deletion setup.py
@@ -4,7 +4,7 @@

from setuptools import find_packages, setup

VERSION = "0.5.4"
VERSION = "0.5.5"

with open("README.rst") as readme_file:
readme = readme_file.read()