bugfix(etcd) - use count.index for etcd instance subnets #82

Open · wants to merge 2 commits into master
Conversation

lachie83
Contributor

etcd instances are currently pinned to a single subnet, which means a single AZ.

This change spreads them across AZs by using count.index to pick a subnet for each instance.
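
A minimal sketch of the pattern, using illustrative names rather than tack's actual resources and variables:

```hcl
# Illustrative sketch only: resource and variable names are hypothetical.
# element() wraps around the list, so instances are placed round-robin
# across the given subnets (and therefore across their AZs).
resource "aws_instance" "etcd" {
  count         = 3
  ami           = "${var.ami_id}"
  instance_type = "m3.medium"

  # Was: a single hard-coded subnet (one AZ). Each instance now indexes
  # the subnet list with its own count.index.
  subnet_id = "${element(var.subnet_ids, count.index)}"

  tags {
    Name = "etcd${count.index + 1}"
  }
}
```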

@lachie83
Contributor Author

Tested with make all and confirmed I got a clean k8s cluster.

wellsie self-assigned this Oct 11, 2016
@wellsie
Member

wellsie commented Oct 11, 2016

Thanks Lachlan. I'm hesitant to spread the etcd nodes across AZs until I have had a chance to thoroughly research the implications. I have read that when it comes to raft it is best to keep nodes as close to each other as possible. It also seems that the guidance from Kubernetes is to keep clusters within an AZ. Federation seems to be the recommended approach to multi-AZ/multi-region support - I will be introducing examples to tack in the coming weeks.

Having said that I think that if people like yourself are happily running etcd across AZs that it would be worthwhile to include that as a readily available option in tack. I'll keep this pr open whilst investigating.
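
For reference, etcd's own tuning guidance speaks to this trade-off: the --heartbeat-interval flag should be set to roughly the round-trip time between members, with --election-timeout around ten times the heartbeat interval, so deployments spanning AZs typically raise both from their defaults (100 ms and 1000 ms).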

@lachie83
Contributor Author

Thanks @wellsie. What's your stance on failure domains with regard to this project? If it's a production-ready k8s cluster then we should document the fact that it's a vertically stacked cluster confined to a single AZ (and update the Terraform to reflect that). Given your statement above we should probably bind the worker ASG to the same subnet as the AZ of the etcd/master nodes, use only a single master/etcd node, and then stamp out clusters horizontally across AZs in-region.
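
For concreteness, a sketch of that single-AZ binding, with hypothetical names (tack's actual resources differ):

```hcl
# Hypothetical sketch: confine the worker ASG to the same subnet (and
# therefore the same AZ) as the etcd/master nodes; the launch
# configuration is assumed to be defined elsewhere.
resource "aws_autoscaling_group" "worker" {
  name                 = "worker"
  min_size             = 3
  max_size             = 10
  launch_configuration = "${aws_launch_configuration.worker.name}"

  # A single-element subnet list pins every worker to one AZ; repeating
  # the whole cluster per AZ then provides the horizontal failure domains.
  vpc_zone_identifier = ["${var.etcd_subnet_id}"]
}
```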

@wellsie
Member

wellsie commented Oct 12, 2016

So far I have been less concerned about spreading worker nodes across AZs - they will continue to run without the control plane. If one did not want to federate clusters then I think the current configuration would be a good starting point.

I'm not sure that running a single etcd node per cluster, even with a federated solution, would be prudent.
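
For context on the single-node point: an etcd cluster of n members needs a quorum of floor(n/2) + 1 to keep accepting writes, so one node tolerates zero failures, three tolerate one, and five tolerate two. A single-member cluster therefore loses availability on any node or AZ failure, federated or not.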

@lachie83
Contributor Author

Thanks @wellsie - LMK if I can be of any assistance with the investigation.

@mirthy
Contributor

mirthy commented Oct 17, 2016

We've been running clusters with both etcd and masters split across AZs and I haven't noticed any problems so far. Then again our clusters are pretty small and we haven't had any problems with AZs themselves either.

@rimusz

rimusz commented Oct 27, 2016

etcd v2 is very robust compared with etcd v1; there are no issues spreading etcd nodes across AZs :)

@seanknox

@rimusz @lachie83 Hey y'all—do you know of documentation/case studies/etc re: etcd v2 stability across data centers/AZs?

@lachie83
Copy link
Contributor Author

Hi @seanknox. Unfortunately I do not. It might be worth looking into writing one up. LMK if you come across anything in your travels.
