[WIP] bring back AWS hubs #467

Merged: 8 commits into pangeo-data:staging on Feb 14, 2020

Conversation

scottyhq
Member

@jhamman - take a look.

@yuvipanda I've removed tiller from the cluster, so this might require some changes to hubploy to get the helm tiller start-ci bit to work. See https://github.com/rimusz/helm-tiller.

@yuvipanda
Member

Thanks, @scottyhq!

I've mixed feelings about running without tiller.

  1. With http://z2jh.jupyter.org/en/latest/setup-helm.html#secure-helm, we disallow tiller access over the network. So you already need permissions to exec / port forward into the kube-system namespace to access tiller. At that point you are screwed anyway. I agree it is super insecure without that step.
  2. Helm 3 is coming and I'd rather do one migration than 2

The bigger security issues IMO are:

  1. Publicly accessible kubernetes API endpoint
  2. Users can create arbitrary pods - this practically gives them root access. PodSecurityPolicy will fix this
  3. NetworkPolicy gives us more control over what network resources can be accessed. You want your users to be able to hit proxy, hub, external internet (maybe only DNS, https and http?) and nothing else. This provides great defense in depth.

IMO our time is better spent on these than removing tiller before helm 3.
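
A minimal sketch of the kind of egress policy point 3 above describes, assuming the usual z2jh labels (`component: singleuser-server`, `component: hub`) and the default hub API port 8081 — not something shipped in this PR:

```
# Sketch only: limit singleuser pods to DNS, the hub API, and external
# http/https. Labels and port 8081 are assumptions from a typical z2jh
# deployment; a similar rule would be needed for the proxy.
kubectl apply -n <hub-namespace> -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: singleuser-egress
spec:
  podSelector:
    matchLabels:
      component: singleuser-server
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector: {}      # DNS anywhere in-cluster
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    - to:
        - podSelector:
            matchLabels:
              component: hub         # hub API
      ports:
        - { protocol: TCP, port: 8081 }
    - ports:                         # external http/https only
        - { protocol: TCP, port: 80 }
        - { protocol: TCP, port: 443 }
EOF
```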

@scottyhq
Member Author

scottyhq commented Oct 30, 2019

> Thanks, @scottyhq!
>
> I've mixed feelings about running without tiller.
>
> 1. With http://z2jh.jupyter.org/en/latest/setup-helm.html#secure-helm, we disallow tiller access over the network. So you already need permissions to exec / port forward into the kube-system namespace to access tiller. At that point you are screwed anyway. I agree it is super insecure without that step.
> 2. Helm 3 is coming and I'd rather do one migration than 2

Fair points! We're definitely willing to wait on this. The issue, I think, is people not following the documentation exactly (...which can't be avoided ;), or potentially following different cluster-setup instructions, leading to precarious situations (jupyterhub/zero-to-jupyterhub-k8s#616 (comment)). For what it's worth, I found it quite easy to remove tiller and still work with helm 2 locally with these steps:

1. Convert existing releases from configmaps to secrets with https://github.com/dragonsmith/tiller-releases-converter:

   ```
   tiller-releases-converter convert
   tiller-releases-converter cleanup
   ```

2. Delete the on-cluster tiller resources:

   ```
   kubectl delete deployment tiller-deploy -n kube-system
   kubectl delete clusterrolebinding tiller
   kubectl delete clusterrolebinding cluster-admin-binding
   kubectl delete serviceaccount tiller --namespace=kube-system
   ```

3. From then on, run helm commands with tiller running on the local machine, as described at https://github.com/rimusz/helm-tiller:

   ```
   helm tiller start-ci
   helm upgrade ...
   helm tiller stop
   ```
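
As a quick sanity check after those steps (assuming the rimusz/helm-tiller plugin is installed), the converted releases should still be visible through the locally-run tiller:

```
helm tiller run helm list
```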

> The bigger security issues IMO are:
>
> 1. Publicly accessible kubernetes API endpoint
> 2. Users can create arbitrary pods - this practically gives them root access. PodSecurityPolicy will fix this
> 3. NetworkPolicy gives us more control over what network resources can be accessed. You want your users to be able to hit proxy, hub, external internet (maybe only DNS, https and http?) and nothing else. This provides great defense in depth.
>
> IMO our time is better spent on these than removing tiller before helm 3.

Agreed. And always grateful for your thoughts on this @yuvipanda! We're finding it hard to stay on top of Kubernetes and always appreciate ideas and contributions from anyone out there for improved configurations.

@scottyhq scottyhq changed the title initial attempt to bring back aws hub [WIP] bring back AWS hubs Oct 30, 2019
@scottyhq
Member Author

scottyhq commented Feb 14, 2020

@tjcrone @jhamman - I'd really like to merge this into staging now that we are using helm 3 (#543)! I was going to change the deployment folder from icesat2 to aws-uswest2, but hubploy is set up to use the folder name, and I don't think it's easy to change the kubernetes namespace and release names (we'd have to start from scratch). The release history shows up for me running helm 3 locally:

```
(base) [ec2-user@ip-192-168-49-107 pangeo-cloud-federation]$ helm3 history -n icesat2-staging icesat2-staging
REVISION	UPDATED                 	STATUS    	CHART              	APP VERSION	DESCRIPTION
255     	Fri Nov  1 21:50:21 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
256     	Fri Nov  1 22:23:48 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
257     	Fri Nov 22 18:05:59 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
258     	Thu Dec 19 00:45:08 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade "icesat2-staging" failed: timed out waiting for the condition
259     	Thu Dec 19 00:58:26 2019	superseded	pangeo-deploy-0.1.0	1.0        	Rollback to 257
260     	Thu Dec 19 00:59:01 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
261     	Sun Dec 22 20:33:10 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
262     	Sun Dec 22 20:43:07 2019	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
263     	Mon Feb 10 20:21:33 2020	superseded	pangeo-deploy-0.1.0	1.0        	Upgrade complete
264     	Wed Feb 12 22:37:37 2020	deployed  	pangeo-deploy-0.1.0	1.0        	Upgrade complete
```
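
(For context, carrying helm 2 release history like this over to helm 3 is typically done with the helm-2to3 plugin; a sketch, not necessarily the exact steps used in #543:)

```
helm3 plugin install https://github.com/helm/helm-2to3
helm3 2to3 convert icesat2-staging
```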

I'd love to get dynamic IP whitelisting configured for hubploy (berkeley-dsep-infra/hubploy#39), but for now I'll enable public access to test that this works.
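
As a rough illustration of what that whitelisting could eventually look like on EKS (the cluster name and CIDR below are placeholders, not values from this deployment):

```
# Sketch: restrict the EKS API endpoint to known CIDRs instead of 0.0.0.0/0.
aws eks update-cluster-config \
  --region us-west-2 \
  --name <cluster-name> \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.0/24"
```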

@scottyhq
Member Author

This PR removes deployment/esip because that hub no longer exists, but the ESIP docker image is an option on the aws-uswest2 hub. I'll follow up with a separate PR for the aws-useast1 hub.

@tjcrone
Contributor

tjcrone commented Feb 14, 2020

Looks good. Let's try it!

@tjcrone
Contributor

tjcrone commented Feb 14, 2020

Regarding the name change: since we use a separate NFS server for user home directories, we have had no problems doing `helm delete --purge` on helm deployments, both staging and prod, when necessary. Your mileage may vary, and you should be careful with deletes because they can be destructive with some configurations. But if you think it will work, this could be a way to rename things, and if it doesn't ruin everything, it will probably work great. :-) A sketch of what that route could look like follows.
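
Concretely, the rename-by-recreate route would be a delete-and-reinstall; sketched below with helm 3 syntax, where the release names, namespaces, and chart path are assumptions (the hubploy-driven equivalent would differ):

```
# Only safe because user home directories live on a separate NFS server.
helm uninstall icesat2-staging -n icesat2-staging   # helm 2 equivalent: helm delete --purge icesat2-staging
kubectl create namespace aws-uswest2-staging
helm install aws-uswest2-staging ./pangeo-deploy -n aws-uswest2-staging
```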

@tjcrone
Contributor

tjcrone commented Feb 14, 2020

Also, I think you are good to merge this when you are ready.

@scottyhq scottyhq merged commit abce21b into pangeo-data:staging Feb 14, 2020