-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: replace pyliftover with agct to improve performance #264
Conversation
LIFTOVER_CHAIN_37_TO_38 = environ.get("LIFTOVER_CHAIN_37_TO_38") | ||
LIFTOVER_CHAIN_38_TO_37 = environ.get("LIFTOVER_CHAIN_38_TO_37") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@korikuzma are these still used? agct
should just be pulling them via wags-tails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooooh, I think we need to provide support to agct to allow env vars to be passed for the chain files. I plan on storing these files in EFS and pointing to that mounted location
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to set WAGS_TAILS_DIR
to a parent directory, if that helps. Alternatively, we can have agct
pass along a data_dir
argument to the CustomData
class from wags-tails
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsstevenson What if we want to bypass wags-tails all together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsstevenson This is what I would propose:
Current:
def __init__(self, from_db: Genome, to_db: Genome) -> None:
....
data_handler = CustomData(
f"chainfile_{from_db.value}_to_{to_db.value}",
"chain",
lambda: "",
self._download_function_builder(from_db, to_db),
data_dir=get_data_dir() / "ucsc-chainfile",
)
file, _ = data_handler.get_latest()
Suggested change:
def __init__(self, from_db: Genome, to_db: Genome, file: Optional[Path] = None) -> None:
...
if not file:
data_handler = CustomData(
f"chainfile_{from_db.value}_to_{to_db.value}",
"chain",
lambda: "",
self._download_function_builder(from_db, to_db),
data_dir=get_data_dir() / "ucsc-chainfile",
)
file, _ = data_handler.get_latest()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think so -- make a PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, wanted to make sure I wasn't missing something before making changes.
close #263