bugfix(etcd) - use count.index for etcd instance subnets #82

Open · wants to merge 2 commits into master
Conversation

lachie83
Contributor

etcd instances are currently pinned to a single subnet, which means a single AZ.

This change spreads them across AZs by using count.index to pick a subnet for each instance.
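
A minimal sketch of the pattern, using illustrative names rather than tack's actual resources and variables:

```hcl
# Illustrative sketch only: resource and variable names are hypothetical.
# element() wraps around the list, so instances are placed round-robin
# across the given subnets (and therefore across their AZs).
resource "aws_instance" "etcd" {
  count         = 3
  ami           = "${var.ami_id}"
  instance_type = "m3.medium"

  # Was: a single hard-coded subnet (one AZ). Each instance now indexes
  # the subnet list with its own count.index.
  subnet_id = "${element(var.subnet_ids, count.index)}"

  tags {
    Name = "etcd${count.index + 1}"
  }
}
```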

@lachie83
Contributor Author

Tested with make all and confirmed I got a clean k8s cluster.

wellsie self-assigned this Oct 11, 2016
@wellsie
Member

wellsie commented Oct 11, 2016

Thanks Lachlan. I'm hesitant to spread the etcd nodes across AZs until I have had a chance to thoroughly research the implications. I have read that when it comes to raft it is best to keep nodes as close to each other as possible. It also seems that the guidance from Kubernetes is to keep clusters within an AZ. Federation seems to be the recommended approach to multi-AZ/multi-region support - I will be introducing examples to tack in the coming weeks.

Having said that I think that if people like yourself are happily running etcd across AZs that it would be worthwhile to include that as a readily available option in tack. I'll keep this pr open whilst investigating.
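
For reference, etcd's own tuning guidance speaks to this trade-off: the --heartbeat-interval flag should be set to roughly the round-trip time between members, with --election-timeout around ten times the heartbeat interval, so deployments spanning AZs typically raise both from their defaults (100 ms and 1000 ms).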

@lachie83
Contributor Author

Thanks @wellsie. What's your stance on failure domains with regard to this project? If it's a production-ready k8s cluster then we should document the fact that it's a vertically stacked cluster confined to a single AZ (and update the Terraform to reflect that). Given your statement above we should probably bind the worker ASG to the same subnet as the AZ of the etcd/master nodes, use only a single master/etcd node, and then stamp out clusters horizontally across AZs in-region.
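
For concreteness, a sketch of that single-AZ binding, with hypothetical names (tack's actual resources differ):

```hcl
# Hypothetical sketch: confine the worker ASG to the same subnet (and
# therefore the same AZ) as the etcd/master nodes; the launch
# configuration is assumed to be defined elsewhere.
resource "aws_autoscaling_group" "worker" {
  name                 = "worker"
  min_size             = 3
  max_size             = 10
  launch_configuration = "${aws_launch_configuration.worker.name}"

  # A single-element subnet list pins every worker to one AZ; repeating
  # the whole cluster per AZ then provides the horizontal failure domains.
  vpc_zone_identifier = ["${var.etcd_subnet_id}"]
}
```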

@wellsie
Member

wellsie commented Oct 12, 2016

So far I have been less concerned about spreading worker nodes across AZs - they will continue to run without the control plane. If one did not want to federate clusters then I think the current configuration would be a good starting point.

I'm not sure that running a single etcd node per cluster, even with a federated solution, would be prudent.
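
For context on the single-node point: an etcd cluster of n members needs a quorum of floor(n/2) + 1 to keep accepting writes, so one node tolerates zero failures, three tolerate one, and five tolerate two. A single-member cluster therefore loses availability on any node or AZ failure, federated or not.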

@lachie83
Contributor Author

Thanks @wellsie - LMK if I can be of any assistance with the investigation.

@mirthy
Contributor

mirthy commented Oct 17, 2016

We've been running clusters with both etcd and masters split across AZs and I haven't noticed any problems so far. Then again our clusters are pretty small and we haven't had any problems with AZs themselves either.

@rimusz

rimusz commented Oct 27, 2016

etcd v2 is very robust compared with etcd v1; there are no issues spreading etcd nodes across AZs :)

@seanknox

@rimusz @lachie83 Hey y'all—do you know of documentation/case studies/etc re: etcd v2 stability across data centers/AZs?

@lachie83
Copy link
Contributor Author

Hi @seanknox. Unfortunately I do not. It might be worth looking into writing one up. LMK if you come across anything in your travels.
