GATEWAY_TIMEOUT encountered when building full GFE for release 3470 #153
@pbashyal-nmdp: Looks like it timed out with GATEWAY_TIMEOUT. Does retrying work?
Retrying has the same effect. I ran it in a debugger and it raised the following for the exact same allele:

```
Exception has occurred: TypeError
can only concatenate str (not "ApiException") to str

During handling of the above exception, another exception occurred:

  File "/Users/ammon/Documents/00-Projects/nmdp-bioinformatics/02-Repositories/gfe-db/gfe-db/pipeline/jobs/build/src/app.py", line 372, in gfe_from_allele
    features, gfe = gfe_maker.get_gfe(ann, locus)
  File "/Users/ammon/Documents/00-Projects/nmdp-bioinformatics/02-Repositories/gfe-db/gfe-db/pipeline/jobs/build/src/app.py", line 581, in <module>
    gfe = gfe_from_allele(
```

The same error is also in the original error message. I'll try working with seq-ann and making individual API calls to the feature service with this allele to see if I can get to the bottom of it.
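For what it's worth, the `TypeError` above looks like a secondary bug that masks the real failure: concatenating a string with the caught exception object instead of its message. A minimal sketch of the pattern and the fix, where `ApiException` is a stand-in class and `describe_failure` is hypothetical (not gfe-db's actual code):

```python
class ApiException(Exception):
    """Stand-in for the API client's exception type."""


def describe_failure(exc):
    # Buggy pattern: "prefix" + exc raises TypeError when exc is an
    # exception object, which hides the original GATEWAY_TIMEOUT error.
    #     return "GFE build failed: " + exc
    # Fix: convert the exception to a string explicitly.
    return "GFE build failed: " + str(exc)


print(describe_failure(ApiException("GATEWAY_TIMEOUT")))
# prints: GFE build failed: GATEWAY_TIMEOUT
```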
When I built this one allele on its own it worked fine, with no timeout. I wonder if gfe-db is overloading the feature service API during the build. I wouldn't be surprised, because the build proceeds extremely rapidly, processing ~30,000 alleles in 15-20 minutes, which works out to roughly 25-33 alleles per second. That places a constant, very high load on the API for the duration of the build. For alleles that hit timeouts, I think a good approach would be to decouple the retry mechanism from the build in gfe-db. I did some math, and even a minimal retry policy could drastically increase the build time and cost if even 20 alleles out of ~30,000 fail. Decoupling the retry logic would also make it easier to set an alarm threshold if lots of alleles are failing. I'll follow up on this in a separate issue for gfe-db.
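Decoupling retries could look like the sketch below: record failures during the main pass instead of retrying inline, then retry only the failed alleles afterwards with exponential backoff. All names here (`retry_with_backoff`, `flaky_build`) are hypothetical, not gfe-db's actual implementation:

```python
import time


def retry_with_backoff(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, sleep base_delay * 2**attempt and retry.
    Re-raises the last exception once attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Demo with a stub that fails twice before succeeding, standing in for a
# feature-service call that intermittently hits GATEWAY_TIMEOUT.
calls = {"n": 0}

def flaky_build():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("GATEWAY_TIMEOUT")
    return "gfe"

result = retry_with_backoff(flaky_build, attempts=3, base_delay=0.01)
```

Running the retries as a separate pass (or a separate job entirely) means a handful of slow alleles can't stall the other ~30,000, and the size of the failed set becomes a natural metric to alarm on.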
Yes, the feature service is getting overloaded. I'll look into adding a caching layer for the service. That should help with other uses as well.
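Even before a server-side cache exists, a client-side sketch shows the idea: memoize lookups keyed by the request parameters, since a full build re-requests many identical features. `get_feature` and its parameters below are hypothetical stand-ins, not the feature service's real API:

```python
from functools import lru_cache


@lru_cache(maxsize=100_000)
def get_feature(locus, term, rank, sequence):
    # Hypothetical stand-in for a feature-service HTTP call; the real
    # service would look up or assign an accession for this feature.
    return hash((locus, term, rank, sequence)) % 10_000


# The second identical request is served from the cache, never hitting
# the network.
get_feature("HLA-A", "exon", 1, "ATGGCC")
get_feature("HLA-A", "exon", 1, "ATGGCC")
info = get_feature.cache_info()
```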
I guess another option would be to rate-limit requests, though that would increase the build time. The current build runs on a c5d.2xlarge at $0.384 per hour and takes around 20 minutes (roughly $0.13 per build), so it could run a lot longer before cost becomes any kind of an issue. Honestly, I think rate-limiting might be easier to implement than caching on the API side.
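A client-side rate limit can be as simple as enforcing a minimum interval between requests. A minimal sketch (the `RateLimiter` class below is hypothetical, not part of gfe-db):

```python
import time


class RateLimiter:
    """Blocks in wait() so calls proceed at most max_per_second per second."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self.last = 0.0

    def wait(self):
        elapsed = time.monotonic() - self.last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last = time.monotonic()


# At e.g. 10 requests/second, a ~30,000-allele build would take ~50
# minutes, so the limit needs tuning against what the service tolerates.
limiter = RateLimiter(max_per_second=100)
start = time.monotonic()
for _ in range(5):
    limiter.wait()
elapsed = time.monotonic() - start
```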
seq-ann 1.1.0 threw an error during the gfe-db build job with the following stacktrace:
This occurred in the context of a full build for all alleles for version 3470.
Here is the call in the script that failed:
https://github.com/abk7777/gfe-db/blob/135d179e1f9295c5b8b72bfbd61d8789db30f0e2/gfe-db/pipeline/jobs/build/src/app.py#L580-L582
Using these dependencies: