fix: update parameters and callsets #34
@@ -11,14 +11,20 @@ __definitions__: | |||
samples = params.samples.set_index("alias") | |||
if "ffpe" not in samples.columns: | |||
samples["ffpe"] = pd.NA | |||
- sex = samples.loc["tumor", "sex"] | |||
- sex = samples.loc[["tumor"], "sex"] |
Why the brackets?
For a group with a single entry, `samples.loc["tumor", "sex"]` returns sex as a plain string, but for a group with multiple entries it returns a Series. The previous implementation therefore rendered the scenario correctly only for groups with a single entry.
Changing the lookup to `samples.loc[["tumor"], "sex"]` always returns a Series, so single and multiple entries both render correctly.
Edit: In your other comment you mentioned that each alias should occur only once. If we handle multiple panels by prefix, this change probably becomes unnecessary.
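To make the difference concrete, here is a minimal sketch of the two lookups. The two sample sheets are made up for illustration; only the `alias` and `sex` column names are taken from the diff.

```python
import pandas as pd

# Hypothetical sample sheets; only the column names come from the diff.
single = pd.DataFrame(
    {"alias": ["tumor", "normal"], "sex": ["female", "female"]}
).set_index("alias")
multi = pd.DataFrame(
    {"alias": ["tumor", "tumor", "normal"], "sex": ["female", "female", "female"]}
).set_index("alias")

# Scalar label: the result type depends on how often the label occurs.
one_entry = single.loc["tumor", "sex"]    # plain string
two_entries = multi.loc["tumor", "sex"]   # Series with two values

# List of labels: the result is a Series in both cases.
one_entry_listed = single.loc[["tumor"], "sex"]
two_entries_listed = multi.loc[["tumor"], "sex"]

print(type(one_entry).__name__, type(two_entries).__name__)
print(len(one_entry_listed), len(two_entries_listed))
```

With the list form, downstream checks such as `.unique()` work the same way regardless of how many rows share the alias.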
if len(samples.loc[["tumor"], "ffpe"].unique()) != 1: | ||
raise ValueError(f"All samples within a group must be either ffpe or not.") | ||
- | | ||
if len(samples.loc[["tumor"], "purity"].unique()) != 1: |
Each alias should occur only once in a group, and we should check for that when validating the sample sheet. If there are two panels for a patient, we could name the two tumors tumor_panelname1 and tumor_panelname2; the scenario could support that by looking for the prefix tumor.
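A rough sketch of what that suggestion could look like, not the workflow's actual validation code. The helper names `check_unique_aliases` and `tumor_rows`, the `group` column, and the example alias values are all assumptions made for illustration.

```python
import pandas as pd


def check_unique_aliases(samples: pd.DataFrame) -> None:
    """Raise if an alias occurs more than once within a group.

    Assumes the sheet has `group` and `alias` columns (hypothetical layout).
    """
    dup = samples.duplicated(subset=["group", "alias"], keep=False)
    if dup.any():
        offending = sorted(samples.loc[dup, "alias"].unique())
        raise ValueError(
            f"Each alias must occur only once per group, duplicated: {offending}"
        )


def tumor_rows(samples: pd.DataFrame, prefix: str = "tumor") -> pd.DataFrame:
    """Select all rows whose alias starts with the given prefix."""
    return samples[samples["alias"].str.startswith(prefix)]


# Hypothetical sheet with two panels for one patient.
sheet = pd.DataFrame(
    {
        "group": ["patient1", "patient1", "patient1"],
        "alias": ["tumor_panelA", "tumor_panelB", "normal"],
    }
)
check_unique_aliases(sheet)
print(tumor_rows(sheet)["alias"].tolist())
```

Matching on the prefix keeps every alias unique while still letting the scenario collect all tumor samples of a group.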
Because we use high-coverage panel data for the MTB, the preconfigured maximum read depth for varlociraptor preprocess was too low. To get a correct estimate, max-depth has been set to 30000 in the config.yaml. In addition, considering the read position bias led to missed variants in the past, so it is now omitted.