increase default timeout in helm upgrade command #17
Comments
@jhamman I guess this is because the ...
@scottyhq - have you still been getting timeouts? I think things have actually stabilized...
We ran into this same error again with a new deployment (https://circleci.com/gh/pangeo-data/pangeo-cloud-federation/480). Our manual work-around is documented in pangeo-data/pangeo-cloud-federation#207.
Digging in a bit further this second time, it seems like the hub pod is stuck in Pending because of the resources available (we are running two hubs, staging and prod, on a single m5.large instance). Here are some relevant commands and output.
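A minimal sketch of the kind of kubectl diagnostics that show why a pod is stuck in Pending (the pod and node names here are hypothetical):

```bash
# Check the Events section for messages like
# "0/2 nodes are available: 2 Insufficient cpu".
kubectl describe pod hub-6d4f7b9c8-x2bqz -n nasa-prod

# Compare the node's allocatable resources against what is already requested
# by the pods scheduled on it.
kubectl describe node ip-192-168-1-100.us-west-2.compute.internal

# See which pods landed on which nodes.
kubectl get pods --all-namespaces -o wide
```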
Although it seems like our hub should fit on there based on the requested resources... https://github.com/pangeo-data/pangeo-cloud-federation/blob/staging/deployments/nasa/config/common.yaml#L44
Also, the nasa-prod proxy pod (proxy-549fc7cb4d-qd9fm) went onto a user-notebook node...
OK, so the situation is that the hub tries to go onto the existing instance, which is in a different availability zone than its EBS volume, so it can't be scheduled there.
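A quick way to confirm a zone mismatch between nodes and EBS-backed volumes (using the pre-1.17 `failure-domain.beta.kubernetes.io/zone` label that clusters of this era carried):

```bash
# Zone label on each node.
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone

# Zone label on each PersistentVolume; an EBS volume is bound to one zone.
kubectl get pv -L failure-domain.beta.kubernetes.io/zone
```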
I'd recommend not putting the hub pod on the EFS drive, since SQLite and NFS don't mix very well. Not sure why EKS / kops is putting EBS volumes in a different zone than the k8s cluster :(
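One way to keep just the hub database off NFS while leaving user storage on EFS is to point the hub's database PVC at an EBS-backed storage class. A sketch, assuming a zero-to-jupyterhub-style chart with the standard `hub.db.pvc` settings and a `gp2` storage class (release and chart names are hypothetical):

```bash
# Keep the hub's SQLite database on an EBS-backed (gp2) volume even if
# the cluster's default storage class is EFS; user homes can stay on EFS.
helm upgrade nasa-prod jupyterhub/jupyterhub \
  --set hub.db.type=sqlite-pvc \
  --set hub.db.pvc.storageClassName=gp2
```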
Thanks for the feedback @yuvipanda! We are currently running staging.nasa.pangeo.io entirely on EFS, so we'll see if any issues with the hub database come up... I ended up setting up our 'default' provisioner to use https://github.com/kubernetes-incubator/external-storage/tree/master/aws/efs (instead of EBS/gp2). It seems like EBS volumes landing in different zones is a well-known issue. Since EKS recently added Kubernetes >1.12, we can now use "topology-aware volume provisioning" if need be (see the sketch below).
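Topology-aware provisioning boils down to a StorageClass with `volumeBindingMode: WaitForFirstConsumer`, which delays EBS volume creation until the consuming pod has been scheduled to a node, so the volume is provisioned in that node's zone. A minimal sketch (the storage class name is hypothetical):

```bash
# A StorageClass that defers EBS volume creation until the consuming pod
# is scheduled, so the volume ends up in the pod's availability zone.
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-topology-aware
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
EOF
```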
@scottyhq cool! With SQLite on NFS, the thing to watch out for is times when hub response times spike, often to something like 1-5 seconds per request, cascading pretty badly pretty quickly. This hit us at Berkeley earlier, and the 'fix' was to move off shared storage. Otherwise, for the kind of workload we have, it works fine.
In one of our jupyterhub deployments (running on EKS), we've been getting semi-regular `helm upgrade` timeouts when using hubploy. Would it be possible to add the `--timeout` option to hubploy's upgrade call? I believe the default is 300s.
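For context, a sketch of what adding the flag to the underlying call could look like (release and chart names are hypothetical; Helm 2 took the timeout as integer seconds, whereas Helm 3 takes a duration string like `10m`):

```bash
# Helm 2 style: timeout in seconds (default 300).
helm upgrade --install --wait --timeout 600 nasa-prod jupyterhub/jupyterhub

# Helm 3 style, for comparison: timeout as a duration string.
# helm upgrade --install --wait --timeout 10m nasa-prod jupyterhub/jupyterhub
```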