[WIP] bring back AWS hubs #467

Merged: 8 commits into pangeo-data:staging on Feb 14, 2020

Conversation

scottyhq
Member

@jhamman - take a look.

@yuvipanda I've removed tiller from the cluster, so this might require some changes to hubploy to get the helm tiller start-ci bit to work. See https://github.com/rimusz/helm-tiller.

@yuvipanda
Member

Thanks, @scottyhq!

I've mixed feelings about running without tiller.

  1. With http://z2jh.jupyter.org/en/latest/setup-helm.html#secure-helm, we disallow tiller access over the network. So you already need permissions to exec / port forward into the kube-system namespace to access tiller. At that point you are screwed anyway. I agree it is super insecure without that step.
  2. Helm 3 is coming and I'd rather do one migration than 2

The bigger security issues IMO are:

  1. Publicly accessible kubernetes API endpoint
  2. Users can create arbitrary pods - this practically gives them root access. PodSecurityPolicy will fix this
  3. NetworkPolicy gives us more control over what network resources can be accessed. You want your users to be able to hit proxy, hub, external internet (maybe only DNS, https and http?) and nothing else. This provides great defense in depth.

IMO our time is better spent on these than removing tiller before helm 3.
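
A minimal sketch of the kind of egress policy point 3 above describes, assuming the usual z2jh labels (`component: singleuser-server`, `component: hub`) and the default hub API port 8081 — not something shipped in this PR:

```
# Sketch only: limit singleuser pods to DNS, the hub API, and external
# http/https. Labels and port 8081 are assumptions from a typical z2jh
# deployment; a similar rule would be needed for the proxy.
kubectl apply -n <hub-namespace> -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: singleuser-egress
spec:
  podSelector:
    matchLabels:
      component: singleuser-server
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector: {}      # DNS anywhere in-cluster
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    - to:
        - podSelector:
            matchLabels:
              component: hub         # hub API
      ports:
        - { protocol: TCP, port: 8081 }
    - ports:                         # external http/https only
        - { protocol: TCP, port: 80 }
        - { protocol: TCP, port: 443 }
EOF
```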

@scottyhq
Member Author

scottyhq commented Oct 30, 2019

> Thanks, @scottyhq!
>
> I've mixed feelings about running without tiller.
>
> 1. With http://z2jh.jupyter.org/en/latest/setup-helm.html#secure-helm, we disallow tiller access over the network. So you already need permissions to exec / port forward into the kube-system namespace to access tiller. At that point you are screwed anyway. I agree it is super insecure without that step.
> 2. Helm 3 is coming and I'd rather do one migration than 2

Fair points! We're definitely willing to wait on this. The issue, I think, is people not following the documentation exactly (...which can't be avoided ;), or potentially following different cluster-setup instructions, leading to precarious situations (jupyterhub/zero-to-jupyterhub-k8s#616 (comment)). For what it's worth, I found it quite easy to remove tiller and still work with helm 2 locally with these steps:

1. Convert existing releases from configmaps to secrets with https://github.com/dragonsmith/tiller-releases-converter:

   ```
   tiller-releases-converter convert
   tiller-releases-converter cleanup
   ```

2. Delete the on-cluster tiller resources:

   ```
   kubectl delete deployment tiller-deploy -n kube-system
   kubectl delete clusterrolebinding tiller
   kubectl delete clusterrolebinding cluster-admin-binding
   kubectl delete serviceaccount tiller --namespace=kube-system
   ```

3. From then on, run helm commands with tiller running on the local machine, as described at https://github.com/rimusz/helm-tiller:

   ```
   helm tiller start-ci
   helm upgrade ...
   helm tiller stop
   ```
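
As a quick sanity check after those steps (assuming the rimusz/helm-tiller plugin is installed), the converted releases should still be visible through the locally-run tiller:

```
helm tiller run helm list
```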

> The bigger security issues IMO are:
>
> 1. Publicly accessible kubernetes API endpoint
> 2. Users can create arbitrary pods - this practically gives them root access. PodSecurityPolicy will fix this
> 3. NetworkPolicy gives us more control over what network resources can be accessed. You want your users to be able to hit proxy, hub, external internet (maybe only DNS, https and http?) and nothing else. This provides great defense in depth.
>
> IMO our time is better spent on these than removing tiller before helm 3.

Agreed. And always grateful for your thoughts on this @yuvipanda! We're finding it hard to stay on top of Kubernetes and always appreciate ideas and contributions from anyone out there for improved configurations.

@scottyhq scottyhq changed the title initial attempt to bring back aws hub [WIP] bring back AWS hubs Oct 30, 2019
@scottyhq
Member Author

scottyhq commented Feb 14, 2020

@tjcrone @jhamman - I'd really like to merge this into staging now that we are using helm 3 (#543)! I was going to change the deployment folder from icesat2 to aws-uswest2, but hubploy is set up to use the folder name, and I don't think it's easy to change the kubernetes namespace and release names (we'd have to start from scratch). The release history shows up for me running helm 3 locally:

```
(base) [ec2-user@ip-192-168-49-107 pangeo-cloud-federation]$ helm3 history -n icesat2-staging icesat2-staging
REVISION	UPDATED                 	STATUS    	CHART              	APP VERSION	DESCRIPTION
255     	Fri Nov  1 21:50:21 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
256     	Fri Nov  1 22:23:48 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
257     	Fri Nov 22 18:05:59 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
258     	Thu Dec 19 00:45:08 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade "icesat2-staging" failed: timed out waiting for the condition
259     	Thu Dec 19 00:58:26 2019	superseded	pangeo-deploy-0.1.0	1.0        	Rollback to 257
260     	Thu Dec 19 00:59:01 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
261     	Sun Dec 22 20:33:10 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
262     	Sun Dec 22 20:43:07 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
263     	Mon Feb 10 20:21:33 2020	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
264     	Wed Feb 12 22:37:37 2020	deployed  	pangeo-deploy-0.1.0	1.0        	Upgrade complete
```
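
(For context, carrying helm 2 release history like this over to helm 3 is typically done with the helm-2to3 plugin; a sketch, not necessarily the exact steps used in #543:)

```
helm3 plugin install https://github.com/helm/helm-2to3
helm3 2to3 convert icesat2-staging
```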

I'd love to get dynamic IP whitelisting configured for hubploy (berkeley-dsep-infra/hubploy#39), but for now I'll enable public access to test that this works.
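
As a rough illustration of what that whitelisting could eventually look like on EKS (the cluster name and CIDR below are placeholders, not values from this deployment):

```
# Sketch: restrict the EKS API endpoint to known CIDRs instead of 0.0.0.0/0.
aws eks update-cluster-config \
  --region us-west-2 \
  --name <cluster-name> \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.0/24"
```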

@scottyhq
Member Author

This PR removes deployment/esip because that hub no longer exists, but the ESIP docker image is an option on the aws-uswest2 hub. I'll follow up with a separate PR for the aws-useast1 hub.

@tjcrone
Contributor

tjcrone commented Feb 14, 2020

Looks good. Let's try it!

@tjcrone
Contributor

tjcrone commented Feb 14, 2020

Regarding the name change: since we use a separate NFS server for user home directories, we have had no problems doing `helm delete --purge` on helm deployments, both staging and prod, when necessary. Your mileage may vary, and you should be careful with deletes because they can be destructive with some configurations. But if you think it will work, this could be a way to rename things, and if it doesn't ruin everything, it will probably work great. :-) A sketch of what that route could look like follows.
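
Concretely, the rename-by-recreate route would be a delete-and-reinstall; sketched below with helm 3 syntax, where the release names, namespaces, and chart path are assumptions (the hubploy-driven equivalent would differ):

```
# Only safe because user home directories live on a separate NFS server.
helm uninstall icesat2-staging -n icesat2-staging   # helm 2 equivalent: helm delete --purge icesat2-staging
kubectl create namespace aws-uswest2-staging
helm install aws-uswest2-staging ./pangeo-deploy -n aws-uswest2-staging
```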

@tjcrone
Contributor

tjcrone commented Feb 14, 2020

Also, I think you are good to merge this when you are ready.

@scottyhq scottyhq merged commit abce21b into pangeo-data:staging Feb 14, 2020