fix: update parameters and callsets #34

Closed
wants to merge 12 commits into from

Conversation

@FelixMoelder FelixMoelder commented Aug 4, 2023

As we use high-coverage panel data for the mtb, the preconfigured max read depth for varlociraptor preprocess was too low. To get a correct estimate, max-depth has been set to 30000 in config.yaml.

In addition, considering the read position bias has led to missing variants in the past, so it will be omitted.

@FelixMoelder FelixMoelder changed the title from "fix: increase max read depth" to "fix: update parameters and callsets" on Nov 13, 2023
@@ -11,14 +11,20 @@ __definitions__:
samples = params.samples.set_index("alias")
if "ffpe" not in samples.columns:
samples["ffpe"] = pd.NA
- sex = samples.loc["tumor", "sex"]
+ sex = samples.loc[["tumor"], "sex"]
Contributor

Why the brackets?

Contributor Author

@FelixMoelder FelixMoelder Nov 13, 2023

For groups with just one entry, samples.loc["tumor", "sex"] returns sex as a plain string.
If a group has multiple entries, sex becomes a Series instead.
In the previous implementation, rendering the scenario only worked for groups with a single entry.
Changing sex to samples.loc[["tumor"], "sex"] always returns a Series, so single and multiple entries are both rendered correctly.
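
A minimal sketch of the difference, assuming two made-up sample sheets (the data and the variable names are hypothetical; only the `alias` and `sex` columns come from the diff):

```python
import pandas as pd

# Hypothetical sample sheets, indexed by alias as in the scenario template.
single = pd.DataFrame(
    {"alias": ["tumor", "normal"], "sex": ["female", "female"]}
).set_index("alias")
multi = pd.DataFrame(
    {"alias": ["tumor", "tumor", "normal"], "sex": ["female", "female", "female"]}
).set_index("alias")

# Scalar label: the result type depends on how often the label occurs.
scalar_single = single.loc["tumor", "sex"]  # plain str (label is unique)
scalar_multi = multi.loc["tumor", "sex"]    # Series (label occurs twice)

# List label: always a Series, whatever the number of matching rows.
list_single = single.loc[["tumor"], "sex"]
list_multi = multi.loc[["tumor"], "sex"]

print(type(scalar_single).__name__, type(scalar_multi).__name__)
print(type(list_single).__name__, type(list_multi).__name__)
```

With the list form, downstream code such as `.unique()` behaves the same way for one or several tumor rows.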

Edit: In your other comment you mentioned that each alias should only occur once. So if we handle multiple panels by prefix this change probably also becomes unnecessary.

if len(samples.loc[["tumor"], "ffpe"].unique()) != 1:
raise ValueError(f"All samples within a group must to be either ffpe or not.")
- |
if len(samples.loc[["tumor"], "purity"].unique()) != 1:
Contributor

Each alias should occur only once in a group. We should also check for that when validating the sample sheet. If there are two panels for a patient, we could name the two tumors tumor_panelname1 and tumor_panelname2. The scenario could support that by looking for the prefix tumor.
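
A rough sketch of what such a check and prefix lookup could look like, assuming the sample sheet has `group` and `alias` columns. The helper names `check_unique_aliases` and `tumor_aliases` are hypothetical and not part of the workflow:

```python
import pandas as pd


def check_unique_aliases(samples: pd.DataFrame) -> None:
    """Raise ValueError if an alias occurs more than once within a group."""
    dup = samples[samples.duplicated(subset=["group", "alias"], keep=False)]
    if not dup.empty:
        pairs = sorted(set(zip(dup["group"], dup["alias"])))
        raise ValueError(f"Alias must be unique within a group, found duplicates: {pairs}")


def tumor_aliases(samples: pd.DataFrame, prefix: str = "tumor") -> list:
    """Return all aliases that start with the given prefix, in sheet order."""
    return [alias for alias in samples["alias"] if alias.startswith(prefix)]


# Hypothetical sheet: one patient with two tumor panels and one normal sample.
sheet = pd.DataFrame(
    {
        "group": ["patient1", "patient1", "patient1"],
        "alias": ["tumor_panelname1", "tumor_panelname2", "normal"],
    }
)
check_unique_aliases(sheet)  # passes, every alias is unique within the group
print(tumor_aliases(sheet))
```

The prefix match keeps the scenario generic: it would not need to know the panel names, only that tumor samples start with `tumor`.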


2 participants