Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: replace pyliftover with agct to improve performance #264

Merged
merged 3 commits into from
Feb 2, 2024

Conversation

korikuzma
Copy link
Member

close #263

@korikuzma korikuzma added priority:high High priority performance Improvements to performance build Changes that affect the build system or dependencies labels Feb 1, 2024
@korikuzma korikuzma self-assigned this Feb 1, 2024
jsstevenson
jsstevenson previously approved these changes Feb 1, 2024
@jsstevenson jsstevenson self-requested a review February 1, 2024 18:34
Comment on lines 23 to 24
LIFTOVER_CHAIN_37_TO_38 = environ.get("LIFTOVER_CHAIN_37_TO_38")
LIFTOVER_CHAIN_38_TO_37 = environ.get("LIFTOVER_CHAIN_38_TO_37")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@korikuzma are these still used? agct should just be pulling them via wags-tails

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh, I think we need to provide support to agct to allow env vars to be passed for the chain files. I plan on storing these files in EFS and pointing to that mounted location

Copy link
Member

@jsstevenson jsstevenson Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to set WAGS_TAILS_DIR to a parent directory, if that helps. Alternatively, we can have agct pass along a data_dir argument to the CustomData class from wags-tails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsstevenson What if we want to bypass wags-tails all together?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsstevenson This is what I would propose:

Current:

    def __init__(self, from_db: Genome, to_db: Genome) -> None:
        ....
        data_handler = CustomData(
            f"chainfile_{from_db.value}_to_{to_db.value}",
            "chain",
            lambda: "",
            self._download_function_builder(from_db, to_db),
            data_dir=get_data_dir() / "ucsc-chainfile",
        )
        file, _ = data_handler.get_latest()

Suggested change:

    def __init__(self, from_db: Genome, to_db: Genome, file: Optional[Path] = None) -> None:
        ...
        if not file:
            data_handler = CustomData(
                f"chainfile_{from_db.value}_to_{to_db.value}",
                "chain",
                lambda: "",
                self._download_function_builder(from_db, to_db),
                data_dir=get_data_dir() / "ucsc-chainfile",
            )
            file, _ = data_handler.get_latest()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think so -- make a PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, wanted to make sure I wasn't missing something before making changes.

@korikuzma korikuzma merged commit 775981e into main Feb 2, 2024
13 checks passed
@korikuzma korikuzma deleted the issue-263 branch February 2, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Changes that affect the build system or dependencies performance Improvements to performance priority:high High priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace pyliftover with agct
2 participants