
[DEV] Add CoreDHCP to Helm #86

Open
rainest opened this issue Nov 7, 2024 · 8 comments
rainest commented Nov 7, 2024

Short Description
Remove the existing dnsmasq Deployment from the chart and replace it with CoreDHCP, for OpenCHAMI/roadmap#50

Definition of Done

  • The chart spawns a CoreDHCP instance.
  • The chart spawns a tftpd service that CoreDHCP can use.
  • CoreDHCP is able to talk to the chart's SMD instance.

Additional context
Ref #78 and #84 for equivalent work on the Podman side.

rainest self-assigned this Nov 7, 2024

rainest commented Nov 7, 2024

I'm not sure how much we considered the needs of the DHCP server for the original dnsmasq Deployment. It was running, but I don't know whether we ever had a proof of concept of anything talking to it.

The DHCP server will handle requests from hosts outside the Kubernetes network. Normal broadcast delivery won't reach it there, so we'd need to forward traffic to it.

This is apparently also how CSM handles DHCP: it runs a Kea DHCP instance exposed through a LoadBalancer Service, with MetalLB BGP peering to the node networks and forwarding rules pointing at the LB address on the node network (see CSM's PXE troubleshooting and DHCP troubleshooting guides).

I'm unsure where all of that gets configured in full CSM setups, but I found a basic example of a minimal configuration for such a setup.
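For reference, a minimal MetalLB setup of the kind described above might look like the following. This is a sketch only: the ASNs, peer address, and address range are placeholders, not values from any real environment.

```yaml
# Hypothetical minimal MetalLB config: advertise a pool of LoadBalancer
# addresses to an upstream router over BGP. All numbers are placeholders.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: node-network-peer
  namespace: metallb-system
spec:
  myASN: 64512          # ASN MetalLB speaks as
  peerASN: 64513        # ASN of the node-network router
  peerAddress: 10.0.0.1 # router reachable from hosts outside the cluster
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: dhcp-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.0.100-10.0.0.110 # range the DHCP Service's LB IP is drawn from
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: dhcp-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - dhcp-pool
```

With this in place, DHCP relay/forwarding rules on the node network would point at whichever address from the pool the Service receives.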

I don't think there's any way to dynamically populate the server_id or router config, or that dynamic handling would even be desirable. As far as I know, these will need to be configuration parameters that we trust the operator to set correctly. The v4 server ID needs to match the Service's spec.loadBalancerIP. I don't think any other in-cluster config cares about the v6 ID.
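As a sketch of what that coupling looks like (addresses are placeholders), the Service pins the LB address and the CoreDHCP `server_id` plugin must repeat it by hand; nothing derives one from the other:

```yaml
# Hypothetical Service for CoreDHCP; the address is a placeholder and
# must exist in the LB address pool.
apiVersion: v1
kind: Service
metadata:
  name: coredhcp
spec:
  type: LoadBalancer
  loadBalancerIP: 10.0.0.100
  ports:
    - name: dhcpv4
      port: 67
      protocol: UDP
---
# Matching CoreDHCP config fragment (same placeholder address repeated):
#
# server4:
#   plugins:
#     - server_id: 10.0.0.100
```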


Oddly, we already had a tftpd key in the values, but none of the templates used it. It was added alongside dnsmasq and its built-in TFTP server configuration.


rainest commented Nov 8, 2024

#87 provides a basic "it runs!" set of values and templates, with some caveats:

  • The coresmd image is currently busted after some possibly incomplete file rearrangement upstream. The `/coredhcp` path in the image is a directory containing a `README.md`; it's apparently supposed to be replaced with a binary built from some templated Go. I hulk smashed the (working) release binary from the repo into a local image build instead.
  • SMD in the chart does not appear to serve TLS. I stuffed a fake cert into the SMD plugin config. The plugin appears to have connected over HTTP fine (it logged `level=debug msg="EthernetInterfaces: map[]" prefix="plugins/coresmd"` for my empty SMD instance, with no errors).
  • Config generation needs design work. The current YAML-in-YAML-in-YAML templating approach works, but is ugly as sin.
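One way to tame the nesting (a sketch, not what the PR currently does; the `coredhcp.config` values path is hypothetical) is to keep the CoreDHCP config as structured values and render it with `toYaml` in a ConfigMap template, so only one layer is ever a string:

```yaml
# Hypothetical templates/coredhcp-configmap.yaml: render structured
# values directly into the CoreDHCP config file instead of templating
# YAML strings inside YAML strings.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-coredhcp
data:
  config.yml: |
    {{- toYaml .Values.coredhcp.config | nindent 4 }}
```

Users would then set the whole CoreDHCP config as a plain YAML subtree under `coredhcp.config` in values.yaml.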


synackd commented Nov 8, 2024

> The coresmd image is currently busted after some possibly incomplete file rearrangement upstream. The /coredhcp path in the image is a directory with a README.md; it's apparently supposed to get replaced with a binary built from some templated Go. I hulk smashed the (working) release binary from the repo into a local image build instead.

Does the latest version, v0.0.5, work for you? I examined it, and `/coredhcp` is a binary in this version.

alexlovelltroy commented

> SMD in the chart does not appear to serve TLS. I stuffed a fake cert into the SMD plugin config. The plugin appears to have connected over HTTP fine (it logged level=debug msg="EthernetInterfaces: map[]" prefix="plugins/coresmd" for my empty SMD instance with no errors).

At this point we have not enabled TLS at the SMD level; we rely on the API gateway for TLS termination and on signed tokens for authN/authZ. That said, we have the ACME pieces running, so we could create and rotate TLS certificates for the microservices with that, or we could protect them with an mTLS service mesh. This matters more for k8s deployments than it does for our Podman deployments.

Do you have a proposal for mTLS within k8s for SMD that doesn't preclude the current operations?

alexlovelltroy commented

> I don't think there's any way to handle dynamic population of the server_id or router config or that dynamic handling would even be desirable. AFAIK these will need to be config parameters that we just trust you've set to the correct value. The V4 ID needs to match the spec.loadBalancerIP. I don't think any other in-cluster config cares about the V6 ID.

You're driving at the right stuff here. We may need to explore options outside standard k8s networking to get this to work reliably. I've never understood how networking could bring DHCP properly into a remote k8s cluster without complex and unpleasant VLAN incantations. The solution in CSM only works because of direct connections to the worker nodes and plenty of VLAN tagging.


synackd commented Nov 8, 2024

> > The coresmd image is currently busted after some possibly incomplete file rearrangement upstream. The /coredhcp path in the image is a directory with a README.md; it's apparently supposed to get replaced with a binary built from some templated Go. I hulk smashed the (working) release binary from the repo into a local image build instead.
>
> Does the latest version v0.0.5 work for you? I examined it and the /coredhcp is a binary in this version.

@rainest Ah, I found the issue. We were originally pushing `coresmd` as the container name and then started pushing `coredhcp`. This led to the former not working while the latter did. We have deleted the `coresmd` container to eliminate confusion. Going forward, we should use `ghcr.io/openchami/coredhcp` as the CoreDHCP container that has the coresmd plugins built in.

Thanks for reporting the issue!
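For anyone updating their deployment by hand, the consolidated image reference would look something like this (the `coredhcp.image` values layout and the tag are illustrative, not a pinned recommendation):

```yaml
# Illustrative values.yaml fragment using the consolidated image name.
coredhcp:
  image:
    repository: ghcr.io/openchami/coredhcp
    tag: v0.0.5
```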


synackd commented Nov 8, 2024

I will update the quickstart docker-compose recipe to use the correct container.


synackd commented Nov 8, 2024

The above PR also fixes an issue where `permission denied` would be seen when binding to port 67. Fixed in coresmd v0.0.6.
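In a Kubernetes deployment, the same class of problem (binding a privileged port like UDP/67 as a non-root user) is usually handled by granting a single capability in the pod spec rather than running as root. A hedged sketch of such a container `securityContext`:

```yaml
# Hypothetical securityContext for a non-root CoreDHCP container:
# drop everything, then grant only the capability needed to bind
# ports below 1024.
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]
```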
