Bug: Staging env has broken public dataset download links #7318

Open
nayib-jose-gloria opened this issue Jul 31, 2024 · 0 comments
nayib-jose-gloria commented Jul 31, 2024

Example of dataset download link returning 403 in staging (but not prod): https://datasets.cellxgene.staging.single-cell.czi.technology/1e25d3e2-e3e7-49c6-a543-f378c15bfb8f.h5ad

Other datasets whose artifacts have this issue in staging: 2104fbb8-8ce3-4740-8b6a-bcbb46a13c0f, ff12e239-9292-4d25-bb0d-e4509b3bd92b
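A small script makes it quick to re-check all three affected artifacts. This is a minimal sketch. It assumes the `<base>/<dataset_id>.h5ad` URL pattern from the example link above, and it assumes a HEAD request gets the same 403 that a browser GET does. The function names are hypothetical. The status fetcher is injectable so the logic can be exercised offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

# Base URL taken from the staging example link in this issue.
STAGING_BASE = "https://datasets.cellxgene.staging.single-cell.czi.technology"


def artifact_url(base_url: str, dataset_id: str) -> str:
    """Build the H5AD download URL using the <base>/<dataset_id>.h5ad pattern (assumed)."""
    return f"{base_url.rstrip('/')}/{dataset_id}.h5ad"


def http_status(url: str) -> int:
    """Send a HEAD request and return the HTTP status code (e.g. 200 or 403)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


def broken_links(
    dataset_ids: Iterable[str],
    base_url: str = STAGING_BASE,
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Return {dataset_id: status} for every artifact URL that does not answer 200."""
    result: Dict[str, int] = {}
    for dataset_id in dataset_ids:
        status = fetch_status(artifact_url(base_url, dataset_id))
        if status != 200:
            result[dataset_id] = status
    return result


if __name__ == "__main__":
    ids = [
        "1e25d3e2-e3e7-49c6-a543-f378c15bfb8f",
        "2104fbb8-8ce3-4740-8b6a-bcbb46a13c0f",
        "ff12e239-9292-4d25-bb0d-e4509b3bd92b",
    ]
    print(broken_links(ids))
```

An empty result would mean all three staging links answer 200. Any dataset that still fails is listed with its status code.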

Early investigation shows that this dataset's artifact is not in the staging S3 bucket that hosts the public dataset assets. It's possible that this is the result of an early-terminated mirroring job. We have a script, which engineers can run locally, that mirrors the prod DB and assets to staging. It mirrors the DB first and then the assets, so a run that stops partway would leave datasets in the staging DB without their artifacts.
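The suspected failure mode (DB mirrored, assets not) leaves a recognizable signature: some dataset rows in the staging DB have no matching artifact key in the staging bucket. Below is a minimal sketch of that comparison. It assumes each artifact is stored under the key `<dataset_id>.h5ad`, mirroring the download URL. It also assumes the dataset IDs and bucket keys have already been listed, for example with `aws s3 ls`. The helper names are hypothetical.

```python
from typing import Iterable, List, Set


def missing_in_staging(prod_keys: Iterable[str], staging_keys: Iterable[str]) -> List[str]:
    """Keys present in the prod public-asset bucket but absent from staging, sorted."""
    return sorted(set(prod_keys) - set(staging_keys))


def datasets_missing_artifacts(
    staging_dataset_ids: Iterable[str],
    staging_keys: Iterable[str],
    suffix: str = ".h5ad",  # assumed key suffix, matching the download URL pattern
) -> List[str]:
    """Dataset IDs in the (already mirrored) staging DB whose artifact key is absent.

    Because the mirror copies the DB before the assets, an interrupted run would
    leave exactly this pattern: DB rows pointing at objects that were never copied.
    """
    present: Set[str] = set(staging_keys)
    return sorted(d for d in staging_dataset_ids if f"{d}{suffix}" not in present)
```

If this reports many more datasets than the three listed above, that would support the partial-mirror explanation. A handful of isolated misses would point elsewhere.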

To confirm, we should download the H5AD, strip its labels with the cellxgene-schema CLI, re-upload the H5AD to staging, and check whether the dataset download link then works.
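The download and label-stripping steps could be scripted roughly as below. This is a sketch with three caveats. First, the `remove-labels` subcommand name is an assumption about the installed cellxgene-schema version, so confirm it with `cellxgene-schema --help`. Second, the source URL should be a working link for the same dataset (e.g. the prod one), since the staging link returns 403. Third, the re-upload is left out because this issue does not say which upload path to use. The function names are hypothetical.

```python
import subprocess
import urllib.request
from pathlib import Path
from typing import List


def download_h5ad(url: str, dest: Path) -> Path:
    """Download the H5AD artifact from a working URL to a local file."""
    urllib.request.urlretrieve(url, dest)
    return dest


def strip_labels_cmd(input_path: Path, output_path: Path) -> List[str]:
    # ASSUMPTION: the subcommand is "remove-labels" taking input and output paths;
    # verify against `cellxgene-schema --help` for the installed version.
    return ["cellxgene-schema", "remove-labels", str(input_path), str(output_path)]


def strip_labels(input_path: Path, output_path: Path) -> Path:
    """Run the CLI; check=True raises CalledProcessError if it exits non-zero."""
    subprocess.run(strip_labels_cmd(input_path, output_path), check=True)
    return output_path
```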

Then, test that editing the dataset title and changing the DOI in the UI (both actions trigger a dataset update) does not cause the asset download link to start failing.
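The title/DOI check comes down to comparing link status before and after each edit. Below is a minimal sketch with hypothetical names. It keys results by a caller-chosen label rather than by URL. That choice rests on an assumption not stated in this issue: an update may change which download URL the UI serves, so each snapshot should use the URL shown at that moment.

```python
from typing import Callable, Dict, List


def snapshot(urls: Dict[str, str], fetch_status: Callable[[str], int]) -> Dict[str, int]:
    """Map each label (e.g. a dataset name) to the HTTP status of its current download URL."""
    return {label: fetch_status(url) for label, url in urls.items()}


def regressions(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Labels whose link answered 200 before the edit but not afterwards (or went missing)."""
    return sorted(
        label
        for label, status in before.items()
        if status == 200 and after.get(label) != 200
    )
```

Any label returned by `regressions` is a case where the edit itself broke the download link, which is the scenario this step is meant to rule out.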

Finally, run the mirroring script (make mirror_env_data DEST_ENV=staging in single-cell-data-portal/backend) from prod to staging and make sure it runs to completion. Check whether this fixes the asset download links for the datasets listed above.
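When running the mirror, treat a non-zero exit as a failed (partial) run rather than relying on a scan of the output. Below is a minimal sketch that wraps the make target quoted above. It assumes `repo_root` points at a single-cell-data-portal checkout and that the target needs no arguments beyond `DEST_ENV`. The function names are hypothetical. A zero exit does not by itself prove every asset was copied, so re-checking the download links afterwards is still the real test.

```python
import subprocess
from pathlib import Path
from typing import List


def mirror_cmd(dest_env: str = "staging") -> List[str]:
    """The make invocation quoted in this issue, as an argv list."""
    return ["make", "mirror_env_data", f"DEST_ENV={dest_env}"]


def run_mirror(repo_root: Path, dest_env: str = "staging") -> None:
    """Run the mirror from the backend directory and fail loudly if it stops early."""
    backend_dir = repo_root / "backend"
    # check=True raises CalledProcessError on a non-zero exit, so a run that dies
    # partway (e.g. after the DB step but before the asset step) is not silently ignored.
    subprocess.run(mirror_cmd(dest_env), cwd=backend_dir, check=True)
```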

If any step above results in a broken download link, create a follow-up issue to investigate the bug and prioritize it as P0, since it may also be affecting prod.
