Create several services route and svc endpoint overrides #416

stuggi · 2023-07-28T07:19:58Z

Creates the route for the keystoneapi, glance, placement, cinder, neutron, nova, heat, horizon, manila and swift. Also allows to customize the route via override.

Generates the service override for the env with what is configured in the externalEndpoints, or specified in the service template override.

Jira: OSP-26690

stuggi · 2023-07-28T07:20:05Z

/hold

openshift-ci · 2023-07-28T07:20:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [stuggi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

stuggi · 2023-07-28T07:21:46Z

in osctlplane
- the externalEndpoints is now outside the service template
- it has an override which allows to customize the route
- the service template has an override which allows to customize the routes for the service

  keystone:
    enabled: true
    externalEndpoints:
    - endpoint: internal
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.80
      sharedIP: true
      sharedIPKey: ""
    override:
      route:
        metadata:
          annotations:
            "my": "annotation"
    template:
...
      override:
        service:
          - endpoint: public
            metadata:
              annotations:
                "my": "annotation"

the openstack-operator uses the service override to pass in the expected overrides for the endpoint services and results in:

apiVersion: keystone.openstack.org/v1beta1
kind: KeystoneAPI 
...
spec:
...
  override:      
    service:      
    - endpoint: public
      endpointURL: http://keystone-public-openstack.apps-crc.testing
      metadata:              
        annotations:
          dnsmasq.network.openstack.org/hostname: keystone-public.openstack.svc
          my: annotation
        labels:                               
          public: "true"         
          service: keystone
      spec:       
        ports:           
        - name: keystone-public               
          port: 5000                   
          protocol: TCP
          targetPort: 0
        selector:           
          service: keystone                   
        type: ClusterIP         
    - endpoint: internal
      endpointURL: http://keystone-internal.openstack.svc:5000
      metadata:     
        annotations:                          
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.80
        labels:         
          internal: "true"                    
          service: keystone              
      spec:      
        ports:    
        - name: keystone-internal
          port: 5000                          
          protocol: TCP         
          targetPort: 0
        selector: 
          service: keystone
        type: LoadBalancer

it is also possible to just use externalEndpoints to use metallb for public and internal:

  keystone:
    externalEndpoints:
    - endpoint: internal
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.80
    - endpoint: public
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.81
    template:
      databaseInstance: openstack
      secret: osp-secret

results in the following override in keystoneapi:

  override:       
    service:     
    - endpoint: public                        
      endpointURL: http://keystone-public.openstack.svc:5000
      metadata:      
        annotations:
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.81
        labels:                    
          public: "true"
          service: keystone
      spec:              
        ports:                                
        - name: keystone-public          
          port: 5000
          protocol: TCP
          targetPort: 0     
        selector:                             
          service: keystone     
        type: LoadBalancer
    - endpoint: internal
      endpointURL: http://keystone-internal.openstack.svc:5000
      metadata:                               
        annotations:                                   
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.80
        labels:                               
          internal: "true"                 
          service: keystone
      spec:        
        ports:                   
        - name: keystone-internal             
          port: 5000            
          protocol: TCP
          targetPort: 0
        selector:         
          service: keystone                   
        type: LoadBalancer

or use the service override to use another LoadBalance for the public endpoint

  keystone:
    externalEndpoints:
    - endpoint: internal
      ipAddressPool: internalapi
      loadBalancerIPs:
      - 172.17.0.80
    template:
      databaseInstance: openstack
      secret: osp-secret
      override:
        service:
        - endpoint: public
          metadata:
            annotations:
              my: loadbalancerannotation
          spec:
            type: LoadBalancer

results in the following override in keystoneapi:

  override:
    service:
    - endpoint: public
      endpointURL: http://keystone-public.openstack.svc:5000
      metadata:
        annotations:
          my: loadbalancerannotation
        labels:
          public: "true"
          service: keystone
      spec:
        ports:
        - name: keystone-public
          port: 5000
          protocol: TCP
          targetPort: 0
        selector:
          service: keystone
        type: LoadBalancer
    - endpoint: internal
      endpointURL: http://keystone-internal.openstack.svc:5000
      metadata:
        annotations:
          metallb.universe.tf/address-pool: internalapi
          metallb.universe.tf/allow-shared-ip: internalapi
          metallb.universe.tf/loadBalancerIPs: 172.17.0.80
        labels:
          internal: "true"
          service: keystone
      spec:
        ports:
        - name: keystone-internal
          port: 5000
          protocol: TCP
          targetPort: 0
        selector:
          service: keystone
        type: LoadBalancer

softwarefactory-project-zuul · 2023-07-28T08:40:00Z

This change depends on a change that failed to merge.

Changes openstack-k8s-operators/lib-common#313, openstack-k8s-operators/keystone-operator#289 are needed.

apis/core/v1beta1/openstackcontrolplane_types.go

gibizer · 2023-07-28T10:29:08Z

pkg/openstack/keystone.go

+		svc, err := sd.CreateEndpointServiceOverride()
+		if err != nil {
+			instance.Status.Conditions.Set(condition.FalseCondition(
+				condition.ExposeServiceReadyCondition,


I think we need to add this condition to the initConditions as well.

gibizer · 2023-07-28T10:50:43Z

pkg/openstack/keystone.go

I have two structural concerns.

we have a data round trip between openstack-operator and keystone-operator. The keystone-operator/api defines the ports and the service labels used for openstack keystone api service pods. Then openstack-operator reads those out, massage it into a set of overrides, and sends it back to keystone-operator. This feels like an unnecessary coupling between the two operators.

I assumed that we are moving the route management to openstack-operator as it can be done in a generic way and therefore we can move generic logic from each service operator to single place in openstack-operator. However I see that in the current PR it is not the case. This PR moves keystone specific logic from keystone-operator to openstack-operator and stores it in a keystone specific package of the openstack-operator. The keystone specific logic as far as I see is:

the service label generation logic for the pods are still in keystone-operator but now needs to be kept in sync with the label generation for the service in openstack-operator. See https://github.com/openstack-k8s-operators/keystone-operator/blob/2e5fe448c4990abddbd7ba4ee7463c6b0e2e5418/controllers/keystoneapi_controller.go#L719-L721 and L59 here.

While I'm not fully grokking it the comment at L144 in this package
// Do not include data.Path in parsing check because %(project_id)s // is invalid without being encoded, but they should not be encoded in the actual endpoint
seem to be coupled with (or duplicated from?) https://github.com/openstack-k8s-operators/keystone-operator/pull/289/files?diff=split&w=0#diff-89511165055af64058394ac61e20f832806f3f325b9cb3771f723d7e6cb94c8cR532-R542 This feels like a keystone specific detail that should not be in openstack-operator.

I have two structural concerns.

we have a data round trip between openstack-operator and keystone-operator. The keystone-operator/api defines the ports and the service labels used for openstack keystone api service pods. Then openstack-operator reads those out, massage it into a set of overrides, and sends it back to keystone-operator. This feels like an unnecessary coupling between the two operators.

yeah, thanks for that pointer. Now thinking about this, I think we could get rid of this. It was only required when passing both endpoint urls into the service operator. we can basically only pass the public endpoint name into the service operator if it is a route. if it is a service (clusterip, or loadbalancer) let the service operator create the endpoint information.

re the service label, I can improve that, so it is not required to get it from the api module. when the openstack-operator creates a service label and passes it via the override to the service operator, the operator just have to use this instead of the local default one. initially I had used the gkv.Kind as lower string, but then realized its different then the service name const with using the passed in information we can still do that.

I assumed that we are moving the route management to openstack-operator as it can be done in a generic way and therefore we can move generic logic from each service operator to single place in openstack-operator. However I see that in the current PR it is not the case. This PR moves keystone specific logic from keystone-operator to openstack-operator and stores it in a keystone specific package of the openstack-operator. The keystone specific logic as far as I see is:

the service label generation logic for the pods are still in keystone-operator but now needs to be kept in sync with the label generation for the service in openstack-operator. See https://github.com/openstack-k8s-operators/keystone-operator/blob/2e5fe448c4990abddbd7ba4ee7463c6b0e2e5418/controllers/keystoneapi_controller.go#L719-L721 and L59 here.

I think the service labels can be done better, see above.

While I'm not fully grokking it the comment at L144 in this package

// Do not include data.Path in parsing check because %(project_id)s // is invalid without being encoded, but they should not be encoded in the actual endpoint

seem to be coupled with (or duplicated from?) https://github.com/openstack-k8s-operators/keystone-operator/pull/289/files?diff=split&w=0#diff-89511165055af64058394ac61e20f832806f3f325b9cb3771f723d7e6cb94c8cR532-R542 This feels like a keystone specific detail that should not be in openstack-operator.

This is not keystone specific. this is what we have right now in the endpoint.ExposeEndpoints() func. I integrate right now another service ( glance) to see what can be done in a more generic way

as mentioned above, we can do it in a more generic way , but there will remain some logic at least for some of the services. e.g. for nova we have to create route per api and per cell for vnc proxy.

I have two structural concerns.

we have a data round trip between openstack-operator and keystone-operator. The keystone-operator/api defines the ports and the service labels used for openstack keystone api service pods. Then openstack-operator reads those out, massage it into a set of overrides, and sends it back to keystone-operator. This feels like an unnecessary coupling between the two operators.

yeah, thanks for that pointer. Now thinking about this, I think we could get rid of this. It was only required when passing both endpoint urls into the service operator. we can basically only pass the public endpoint name into the service operator if it is a route. if it is a service (clusterip, or loadbalancer) let the service operator create the endpoint information.

Sounds good.

re the service label, I can improve that, so it is not required to get it from the api module. when the openstack-operator creates a service label and passes it via the override to the service operator, the operator just have to use this instead of the local default one. initially I had used the gkv.Kind as lower string, but then realized its different then the service name const with using the passed in information we can still do that.

I think in general "tell don't ask" is a good principle. So if you can tell the service operator which label to put on the pods it creates so that the service in the pods become routeable then we can have a simpler architecture.

I assumed that we are moving the route management to openstack-operator as it can be done in a generic way and therefore we can move generic logic from each service operator to single place in openstack-operator. However I see that in the current PR it is not the case. This PR moves keystone specific logic from keystone-operator to openstack-operator and stores it in a keystone specific package of the openstack-operator. The keystone specific logic as far as I see is:

the service label generation logic for the pods are still in keystone-operator but now needs to be kept in sync with the label generation for the service in openstack-operator. See https://github.com/openstack-k8s-operators/keystone-operator/blob/2e5fe448c4990abddbd7ba4ee7463c6b0e2e5418/controllers/keystoneapi_controller.go#L719-L721 and L59 here.

I think the service labels can be done better, see above.

While I'm not fully grokking it the comment at L144 in this package

// Do not include data.Path in parsing check because %(project_id)s // is invalid without being encoded, but they should not be encoded in the actual endpoint

seem to be coupled with (or duplicated from?) https://github.com/openstack-k8s-operators/keystone-operator/pull/289/files?diff=split&w=0#diff-89511165055af64058394ac61e20f832806f3f325b9cb3771f723d7e6cb94c8cR532-R542 This feels like a keystone specific detail that should not be in openstack-operator.

This is not keystone specific. this is what we have right now in the endpoint.ExposeEndpoints() func. I integrate right now another service ( glance) to see what can be done in a more generic way

You are right, it is there, I missed that. But then my question is do we need to move that logic (or the comment?) from lib-common to both the sevice operator and to openstack-operator.

as mentioned above, we can do it in a more generic way , but there will remain some logic at least for some of the services. e.g. for nova we have to create route per api and per cell for vnc proxy.

I think that is OK. If we express the need for those routes in the Nova CRD per nova api and nova vnc proxy sub CRs then we can easily see that nova needs those routes so whoever creates a Nova CRD needs to prepare them and pass information about them. This would be very similar to how nova express the need of rabbitmqcluster per cell while other operators only ask for a single rabbitmq.

Thanks @stuggi for the answers. I think this discussion is moving to the right direction.

as mentioned above, we can do it in a more generic way , but there will remain some logic at least for some of the services. e.g. for nova we have to create route per api and per cell for vnc proxy.

I think that is OK. If we express the need for those routes in the Nova CRD per nova api and nova vnc proxy sub CRs then we can easily see that nova needs those routes so whoever creates a Nova CRD needs to prepare them and pass information about them. This would be very similar to how nova express the need of rabbitmqcluster per cell while other operators only ask for a single rabbitmq.

I need to walk back on that now that I see more. In case of rabbitmq it is fairly simple that you need to pass a cluster for each cell. However for vnc and metadata we have a lot more logic in the nova-operator.

neither vnc nor metadata is needed in cell0.

the vnc service an optional cell service. If it is requested to be deployed in a given cell then it needs a route, if it is not requested to be depolyed then the route is not needed either

the metadata service is also optional but in a different way. It can be deployed to top level only or can be deployed per cell. There can be cells where it is not deployed while deployed in others. But it can never be deployed both on the top level and in some cells.

To be able to handle these cases openstack-operator needs to grow logic that is fairly nova specific. I would like to avoid that if possible.

Another complication is mentioned by @kk7ds today during our discussion. The openstack-operator now need to know the number and type of the endpoints the service operator will expose. But that is fairly service specific. For example if cinder develops the v4 major API (or for some reason nova needs to expose the legacy v2.0 API) then that can only be done by exposing new endpoints in keystone with different paths. As far as I see supporting the extra endpoints would require not just a service operator change but also the service specific logic change in openstack-operator.

Thats not the case for glance. glanceinternal is for the internal endpoint, routes are only created for public endpoint, so for this example there will not multiple routes. In general if there are multiple endpoints for different type which are served by the same pod via different path, the same route is used. We already have this config. If there are different pods service a service with different endpoints there would be a new service required and then another route. A new endpoint type in keystone get registered with this route. If we stay with the glance (but as said above here we don't create two routes) as an example you have two api templates which each has an override to pass in an endpointURL, so both can register their keystoneEndpoint with their specific route.

Maybe the glance example wasn't perfect. Today we assume that glance internal endpoint does not need a route, but I think that is artificial restriction. I.e. in an edge deployment scenario the internal glance needed for the computes might be connected to the computes over WAN. Anyhow I think this issue is solved. As you said, if we need two differently configured backend Pods then we will anyhow need two API CRs with independent Services and Routes if needed.

late in this conversation about Routes, but yeah, even in an Edge scenario a given Glance Edge Pod needs a service as it's not publicly exposed. A service should be enough to ensure the communication with the Pods that live in Central. Note that this refers to an architecture where a list of GlanceAPI Pods are only "deployed in Central", but logically connected to the storage backend that lives at the Edge.
We can register endpoints (internal only) to allow the Edge Compute node to use this GlancePod, and this will result similar to what we do today for a generic deployment with the API split between internal and external.
/cc @fultonj @konan-abhi ^.
Regarding Manila, yes, for tempest purposes we had the need of configuring both v1 and v2, but they are served by the same Pod, so no concerns on my side about this as long as we can rely on the endpointURL override.

late in this conversation about Routes, but yeah, even in an Edge scenario a given Glance Edge Pod needs a service as it's not publicly exposed. A service should be enough to ensure the communication with the Pods that live in Central. Note that this refers to an architecture where a list of GlanceAPI Pods are only "deployed in Central", but logically connected to the storage backend that lives at the Edge. We can register endpoints (internal only) to allow the Edge Compute node to use this GlancePod, and this will result similar to what we do today for a generic deployment with the API split between internal and external. /cc @fultonj @konan-abhi ^. Regarding Manila, yes, for tempest purposes we had the need of configuring both v1 and v2, but they are served by the same Pod, so no concerns on my side about this as long as we can rely on the endpointURL override.

Are we mixing two things here?

what service needs to be exposed on a public network

what service needs to have a routable service (e.g. needs to be accessed via WAN instead of LAN) regardless of the fact that the service access is private or public

allow the Edge Compute node to use this GlancePod,

Does it mean the IP of this internal glance service needs to be exposed to a routed network?

late in this conversation about Routes, but yeah, even in an Edge scenario a given Glance Edge Pod needs a service as it's not publicly exposed. A service should be enough to ensure the communication with the Pods that live in Central. Note that this refers to an architecture where a list of GlanceAPI Pods are only "deployed in Central", but logically connected to the storage backend that lives at the Edge. We can register endpoints (internal only) to allow the Edge Compute node to use this GlancePod, and this will result similar to what we do today for a generic deployment with the API split between internal and external. /cc @fultonj @konan-abhi ^. Regarding Manila, yes, for tempest purposes we had the need of configuring both v1 and v2, but they are served by the same Pod, so no concerns on my side about this as long as we can rely on the endpointURL override.

Are we mixing two things here?

1. what service needs to be exposed on a public network 2. what service needs to have a routable service (e.g. needs to be accessed via WAN instead of LAN) regardless of the fact that the service access is private or public

allow the Edge Compute node to use this GlancePod,

Does it mean the IP of this internal glance service needs to be exposed to a routed network?

I think so, I suspect we're mixing things here (publicly exposed service which needs a Route vs internal service) and talking about Edge is not the right place/example.

pkg/openstack/common.go

abays · 2023-07-28T16:25:28Z

/test openstack-operator-build-deploy-kuttl

softwarefactory-project-zuul · 2023-08-01T10:29:34Z

Build failed (check pipeline). For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

https://review.rdoproject.org/zuul/buildset/ba0e47361afd41e293bde4cac84c081c

✔️ openstack-operator-content-provider SUCCESS in 1h 20m 10s
❌ openstack-operator-crc-podified-edpm-baremetal FAILURE in 59m 27s
❌ openstack-operator-crc-podified-edpm-deployment FAILURE in 1h 00m 56s

softwarefactory-project-zuul · 2023-08-01T12:18:21Z

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 416,f8932064f7aeaccb1fbce76f1d28468371a10846

softwarefactory-project-zuul · 2023-08-01T12:37:34Z

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 416,7e0f9e6aba055dff4cdd346b662cfaaaaaabdf84

softwarefactory-project-zuul · 2023-08-01T12:37:56Z

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 416,9be40d4f6a544134fc6a830b9e5af265ffb8f458

stuggi · 2023-08-01T12:41:36Z

The PR now has two commits one for keystoneapi and one for glanceapi to show an API which has sub CRDs. There are still things to improve, e.g. use a dedicated condition, set it correct and kuttl.

softwarefactory-project-zuul · 2023-08-01T13:48:24Z

This change depends on a change that failed to merge.

Changes openstack-k8s-operators/keystone-operator#289, openstack-k8s-operators/glance-operator#285 are needed.

softwarefactory-project-zuul · 2023-08-02T08:35:05Z

Build failed (check pipeline). For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

https://review.rdoproject.org/zuul/buildset/bb03046769f9461280b1ab2cf2b3b8b8

❌ openstack-operator-content-provider FAILURE in 5m 51s
⚠️ openstack-operator-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-operator-content-provider
⚠️ openstack-operator-crc-podified-edpm-deployment SKIPPED Skipped due to failed job openstack-operator-content-provider

Creates the route for the cinderapi, also allows to customize the route via override. Generats the service override for the env with what is configured in the externalEndpoints, or specified in the service template override. Depends-On: openstack-k8s-operators/lib-common#313 Depends-On: openstack-k8s-operators/keystone-operator#289 Depends-On: openstack-k8s-operators/cinder-operator#248 Jira: OSP-26690

Creates the route for the neutronapi, also allows to customize the route via override. Generats the service override for the env with what is configured in the externalEndpoints, or specified in the service template override. Depends-On: openstack-k8s-operators/lib-common#313 Depends-On: openstack-k8s-operators/keystone-operator#289 Depends-On: openstack-k8s-operators/neutron-operator#194 Jira: OSP-26690

Creates the route for the nova components, also allows to customize the route via override. Generats the service override for the env with what is configured in the externalEndpoints, or specified in the service template override. Depends-On: openstack-k8s-operators/lib-common#313 Depends-On: openstack-k8s-operators/keystone-operator#289 Depends-On: openstack-k8s-operators/nova-operator#489 Jira: OSP-26690

Creates the route for the heatapi and heatcfnapi, also allows to customize the route via override. Generats the service override for the env with what is configured in the externalEndpoints, or specified in the service template override. Depends-On: openstack-k8s-operators/lib-common#313 Depends-On: openstack-k8s-operators/keystone-operator#289 Depends-On: openstack-k8s-operators/heat-operator#227 Jira: OSP-26690

Creates the route for the horizon, also allows to customize the route via override. Generats the service override for the env with what is configured in the externalEndpoints, or specified in the service template override. Depends-On: openstack-k8s-operators/lib-common#313 Depends-On: openstack-k8s-operators/keystone-operator#289 Depends-On: openstack-k8s-operators/horizon-operator#202 Jira: OSP-26690

Creates the route for the manila, also allows to customize the route via override. Generats the service override for the env with what is configured in the externalEndpoints, or specified in the service template override. Depends-On: openstack-k8s-operators/lib-common#313 Depends-On: openstack-k8s-operators/keystone-operator#289 Depends-On: openstack-k8s-operators/manila-operator#117 Jira: OSP-26690

Creates the route for the swift, also allows to customize the route via override. Generats the service override for the env with what is configured in the externalEndpoints, or specified in the service template override. Depends-On: openstack-k8s-operators/lib-common#313 Depends-On: openstack-k8s-operators/keystone-operator#289 Depends-On: openstack-k8s-operators/swift-operator#43 Jira: OSP-26690

stuggi · 2023-08-17T14:37:18Z

pkg/openstack/nova.go

+		cellOverride := instance.Spec.Nova.CellOverride[cellName]
+
+		// metadata
+		metadataServiceName := novav1.GetMetadataServiceName(&cellName)


@gibizer how about this, really simple approach openstack-k8s-operators/nova-operator@1b769dd

I still think that by taking the cell name as a parameter we force the client (openstack-operator) to know that the name of the service is based on the cellname, and what is that cellname.

Also if any new field of NovaMetadataSpec will be used included in the generated Service name then that cannot be just implemented inside the GetMetadataServiceName function, it needs to be provided as a new function parameter.

openshift-ci · 2023-08-17T15:12:45Z

@stuggi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/openstack-operator-build-deploy-kuttl	`79e3416`	link	true	`/test openstack-operator-build-deploy-kuttl`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

gibizer · 2023-08-19T14:44:12Z

pkg/openstack/common.go

+	}
+
+	sd.hostname = ptr.To(route.GetHostname())
+	sd.endpointURL = "http://" + *sd.hostname


If I understand the resulting CRD model and the code correctly then KeystoneAPISpec.override["public"].endpointURL will be unconditionally populated with the information generated here.

I think this means that OpenStackControlPlane.spec.keystone.template.override[].endpointURL is a field in the CRD that should never be set by the human creating the OpenStackControlPlane CR. I think the best would be not to have such a spec field that is not intended to be ever set.

In the Nova CRD we intentionally not include the service CRD Spec structs directly to NovaSpec but have a parallel template struct. This way we can keep the fields with generated value out of the struct used by the user (the template) but keep it in the struct used by the service reconciler (spec).

If we want to apply that pattern here to hide the unsettable endpointURL from the OpenStackControlPlane then the OpenStackControlPlaneSpec should stop including the KeystoneAPISpec. Instead a KeystoneAPITemplate struct needs to be created that has all the KeystoneAPISpec fields except endpointURL. The value of that field is generated by the openstack-operator so it can provided that extra value when it creates the KeystoneAPISpec from the KeystoneAPITemplate.

(of course KeystoneAPISpec is just an example here, this comment is valid for all the endpointURL fields being introduced now)

If I understand the resulting CRD model and the code correctly then KeystoneAPISpec.override["public"].endpointURL will be unconditionally populated with the information generated here.

I think this means that OpenStackControlPlane.spec.keystone.template.override[].endpointURL is a field in the CRD that should never be set by the human creating the OpenStackControlPlane CR. I think the best would be not to have such a spec field that is not intended to be ever set.

In the Nova CRD we intentionally not include the service CRD Spec structs directly to NovaSpec but have a parallel template struct. This way we can keep the fields with generated value out of the struct used by the user (the template) but keep it in the struct used by the service reconciler (spec).

If we want to apply that pattern here to hide the unsettable endpointURL from the OpenStackControlPlane then the OpenStackControlPlaneSpec should stop including the KeystoneAPISpec. Instead a KeystoneAPITemplate struct needs to be created that has all the KeystoneAPISpec fields except endpointURL. The value of that field is generated by the openstack-operator so it can provided that extra value when it creates the KeystoneAPISpec from the KeystoneAPITemplate.

(of course KeystoneAPISpec is just an example here, this comment is valid for all the endpointURL fields being introduced now)

The idea is that a human/user can set it if the intend is to use it with an ingress other then an OCP route (not implemented in this patch). when endpointURL gets set by a user no route gets created, and just the endpointURL passed into the service operators, to be used for the service registration in keystone. Then the user has to create the ingress manually.

Do we really want to support both use cases (route creation by the openstack operator, and endpoint url passthrough)? Does it mean the user can first configure OpenStackControlPlane with automatic route creation then later specify the endpoint url manually and therefore instruct the openstack-operator to delete the route, and instruct the service operator to update the endpoint url in keystone according the new new user specified value? The reverse direction too to switch from manually created route to openstack-operator created route?

gibizer · 2023-08-19T15:05:25Z

pkg/openstack/nova.go

+		// novncproxy
+		for _, endpointType := range []service.Endpoint{service.EndpointPublic, service.EndpointInternal} {


The novncproxy is not created as a keystone endpoint so there is no concept of public and private novncproxy in nova.

(This again shows that now you need to teach openstack-operator nova specific things)

gibizer · 2023-08-19T15:11:16Z

pkg/openstack/nova.go

+
+		apiServiceOverrides[string(endpointType)] = *svcOverride
+	}
+


I think you missed the case when NovaMetadata is deployed not per cell but with one single service on the top level. While NovaMetadata does not need to be route to the k8s public network, I think the service overrides still need to be passed through. (If I'm mistaken then I don't get while you have explicit NovaMetadat handling below for the case where NovaMetadata deployed per cell)

gibizer · 2023-08-19T15:12:55Z

pkg/openstack/nova.go

@@ -67,6 +173,17 @@ func ReconcileNova(ctx context.Context, instance *corev1beta1.OpenStackControlPl
 		// we need to support either rabbitmq vhosts or deploy a separate
 		// RabbitMQCluster per nova cell.
 		instance.Spec.Nova.Template.DeepCopyInto(&nova.Spec)
+		nova.Spec.APIServiceTemplate.Override.Service = apiServiceOverrides
+


Missing the case when metadata deployed on the top level instead of per cell.

gibizer · 2023-08-19T15:36:53Z

config/samples/core_v1beta1_openstackcontrolplane_network_isolation.yaml

+                  metallb.universe.tf/allow-shared-ip: internalapi
+                  metallb.universe.tf/loadBalancerIPs: 172.17.0.80
+              spec:
+                type: LoadBalancer


Is this something we expect the human user to set or is it more like something that the service operator implementation should decide?

the human/user, this moves into more native k8s interface to configure the MetalLB endpoint. also allows to use different LoadBalancer than MetalLB

gibizer · 2023-08-19T15:46:25Z

I think we need to talk about the goals we want to achieve with this change. So far I heard about the following goals:

make it possible to implement TLS in a single place

I think this is not possible after this change. Based on Update keystoneapi to use service override keystone-operator#289 (comment) For the TLS implementation we still need to adapt each service operator to use TLS (endpoint scheme change and cert secret usage).
allow seamlessly changing the openshift Route to some other way of exposing our services (Gateway API?)

This change makes the switch possible by changing openstack-operator. However before this change we already had every Route creation hidden within lib-common behind ExposeEndpoint() so switching to a new route resource would be a lib-common change.

The openstack-operator based route creation forces us to teach service operator specific knowledge (the number of services to expose with a route, the name of the k8s services etc) to openstack-operator, while the lib-common based solution would allow keeping the service operator specific knowledge to stay only in the service operators. (This is mostly due to the "call path". If openstack-operator creates the route then openstack-operator needs to know the information to create them, if lib-common creates the route then because lib-common is called by the service operator it is enough if the service operator knows this information)
decouple the service deployment step from the external exposure of that service

I think this goal refers to deployment pattern supported by the oc expose command. However I don't think we are actually supporting this pattern with the proposed change. For me the pattern is:
i) create a deployment with a service
ii) check that it works OK
iii) use oc expose to publish the service to the outside world
However the proposed change pre-creates the routes even before any service is deployed.
Also note that openstack itself might not support this pattern today as openstack needs to know the public URL of the service (e.g. for the keystone endpoint or for the compute vnc configuration) even before the openstack service can be fully deployed.

Please let me know if I missed other goals.

openshift-merge-robot · 2023-08-24T22:35:01Z

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

No longer relevant, see openstack-k8s-operators#416 (comment) Isol-net CR to become collapsed cells as well Not clear for cell-local vs global services replicas Turn off unrelated services TODO: this to end up in https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/tests/roles/backend_services/tasks/main.yaml#L41 Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>

bogdando · 2023-09-01T11:32:14Z

I think we need to talk about the goals we want to achieve with this change. So far I heard about the following goals:

make it possible to implement TLS in a single place
I think this is not possible after this change. Based on Update keystoneapi to use service override keystone-operator#289 (comment) For the TLS implementation we still need to adapt each service operator to use TLS (endpoint scheme change and cert secret usage).

allow seamlessly changing the openshift Route to some other way of exposing our services (Gateway API?)
This change makes the switch possible by changing openstack-operator. However before this change we already had every Route creation hidden within lib-common behind ExposeEndpoint() so switching to a new route resource would be a lib-common change.
The openstack-operator based route creation forces us to teach service operator specific knowledge (the number of services to expose with a route, the name of the k8s services etc) to openstack-operator, while the lib-common based solution would allow keeping the service operator specific knowledge to stay only in the service operators. (This is mostly due to the "call path". If openstack-operator creates the route then openstack-operator needs to know the information to create them, if lib-common creates the route then because lib-common is called by the service operator it is enough if the service operator knows this information)

decouple the service deployment step from the external exposure of that service
I think this goal refers to deployment pattern supported by the oc expose command. However I don't think we are actually supporting this pattern with the proposed change. For me the pattern is:
i) create a deployment with a service
ii) check that it works OK
iii) use oc expose to publish the service to the outside world
However the proposed change pre-creates the routes even before any service is deployed.
Also note that openstack itself might not support this pattern today as openstack needs to know the public URL of the service (e.g. for the keystone endpoint or for the compute vnc configuration) even before the openstack service can be fully deployed.

Please let me know if I missed other goals.

this indeed looks like an unwanted coupling, and teaching the CP operator a lot of services-specific things, great summary.

stuggi · 2023-09-04T09:44:35Z

I think we need to talk about the goals we want to achieve with this change. So far I heard about the following goals:

make it possible to implement TLS in a single place
I think this is not possible after this change. Based on Update keystoneapi to use service override keystone-operator#289 (comment) For the TLS implementation we still need to adapt each service operator to use TLS (endpoint scheme change and cert secret usage).

With this change it is possible to:

handle TLS for public endpoints in a single place as all routes get created in a single place
manage TLS certs in a single place for the TLS-E situation when the route should use re-encryption to the backend deployment and only pass in a cert secret to the service (analog to rabbitmq)

In the situation for TLS-E the service operator required change is:

use https if cert secret is provided
use the cert information from the secret in the deployment for e.g. httpd or service direct
consider the cert secret in the hash calc for configuration data to restart if the certs get renewed by cert-manager

allow seamlessly changing the openshift Route to some other way of exposing our services (Gateway API?)
This change makes the switch possible by changing openstack-operator. However before this change we already had every Route creation hidden within lib-common behind ExposeEndpoint() so switching to a new route resource would be a lib-common change.

or use a different ingress than route by passing in the endpointURL. No route would then be created and the user has to create the ingress manually.

The openstack-operator based route creation forces us to teach service operator specific knowledge (the number of services to expose with a route, the name of the k8s services etc) to openstack-operator, while the lib-common based solution would allow keeping the service operator specific knowledge to stay only in the service operators. (This is mostly due to the "call path". If openstack-operator creates the route then openstack-operator needs to know the information to create them, if lib-common creates the route then because lib-common is called by the service operator it is enough if the service operator knows this information)

decouple the service deployment step from the external exposure of that service
I think this goal refers to deployment pattern supported by the oc expose command. However I don't think we are actually supporting this pattern with the proposed change. For me the pattern is:
i) create a deployment with a service
ii) check that it works OK
iii) use oc expose to publish the service to the outside world
However the proposed change pre-creates the routes even before any service is deployed.
Also note that openstack itself might not support this pattern today as openstack needs to know the public URL of the service (e.g. for the keystone endpoint or for the compute vnc configuration) even before the openstack service can be fully deployed.

The current flow and implementation in this patch is only to have a simple process/reconciliation. An alternative workflow idea I have is:

do not pre-create route (like at the moment) in the openstack-operator
create OSP service CR without setting endointURL
let the service operators create their deployment + services
service gets initially registered with service url instead of route url
service-operators to set an "create ingress" annotation on the created service for which a route/ingress should be created and the service type is not LoadBalancer
for default situation with OCP route
- on reconcile of the openstack-operator it checks for the services having the "create ingress" annotation on the service
- openstack-operator creates route and set endpointURL
- service operator updates its keystone-endpoint using the endpointURL information
if a different ingress should be used (might need a switch to disable create of route) user to set the endpointURL of the service and create the ingress manually

Also in this approach the openstack-operator would create certs after the initial create for the services and set the cert secret name. The service then switches to tls.

Please let me know if I missed other goals.
Additional goals:

reduce service operator dependencies (go.mod)
- don't have a dependency of the service operators to OCP specific resource type routes. with this also can be used independent upstream or with native k8s
- don't have a dependency of the service operators to cert-manager and provide just a named secret to be used
allow customization of services via overrides. This allows also the usage of 3rd party LoadBalancer instead of MetalLB
move into being able to use the service operators on native k8s
allow using a different ingress than OCP route

gibizer · 2023-09-04T14:18:05Z

Thanks @stuggi for the discussion both here an over a call. I think we are converging on a solution based on your latest proposal in #416 (comment) That solution can be massaged to drop all the service specific knowledge from the openstack-operator and only pass down service labels first to trigger k8s Service creation and then do a route creation in the openstack-operator based on annotated services and then pass down the resulting endpointURL to the service operators.

gibizer · 2023-09-04T15:17:15Z

@stuggi I discovered that we need to include the novnc public URL into the nova-compute config[1]. So from nova-operator perspective when the endpointURL will be set on NovaNoVNCProxy the EDPM compute config Secret will be updated. Then we expect the dataplane-operator to trigger the rerun of the ansible that copies that updated config to the EDPM nodes. The last part of this is not yet implemented, the dataplane-operator does not watch the config. But we need that anyhow for rabbit password rotation for example.

[1] https://docs.openstack.org/nova/latest/admin/remote-console-access.html#configuration

stuggi · 2023-09-12T09:38:10Z

closing in favor of #457

No longer relevant, see openstack-k8s-operators#416 (comment) Isol-net CR to become collapsed cells as well Not clear for cell-local vs global services replicas Turn off unrelated services TODO: this to end up in https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/tests/roles/backend_services/tasks/main.yaml#L41 Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>

?No longer relevant, see openstack-k8s-operators#416 (comment) Isol-net CRs to become collapsed cells as well Not clear for cell-local vs global services replicas Turn off unrelated services (for my Nova dev specifc needs) TODO: this to end up in https://github.com/openstack-k8s-operators/data-plane-adoption/blob/main/tests/roles/backend_services/tasks/main.yaml#L41 Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>

openshift-ci bot added the do-not-merge/hold label Jul 28, 2023

openshift-ci bot requested review from dprince and viroel July 28, 2023 07:20

openshift-ci bot added the approved label Jul 28, 2023

stuggi requested review from olliewalsh, gibizer and abays and removed request for viroel July 28, 2023 07:24

gibizer reviewed Jul 28, 2023

View reviewed changes

stuggi force-pushed the handle_ingress branch from 4a17665 to eb87850 Compare August 1, 2023 09:06

openshift-merge-robot added the needs-rebase label Aug 1, 2023

stuggi changed the title ~~Create keystoneapi route and svc endpoint overrides~~ Create keystoneapi and glanceapi route and svc endpoint overrides Aug 1, 2023

stuggi force-pushed the handle_ingress branch from f893206 to 7e0f9e6 Compare August 1, 2023 12:37

stuggi force-pushed the handle_ingress branch from 7e0f9e6 to 9be40d4 Compare August 1, 2023 12:37

stuggi force-pushed the handle_ingress branch from 9be40d4 to 99828bd Compare August 1, 2023 12:40

openshift-merge-robot removed the needs-rebase label Aug 1, 2023

stuggi force-pushed the handle_ingress branch from 99828bd to bcf459f Compare August 1, 2023 12:52

stuggi force-pushed the handle_ingress branch from bcf459f to 35dce07 Compare August 2, 2023 08:27

stuggi added 8 commits August 17, 2023 16:35

Remove MetalLB endpoint type and use instead override for rabbitmq

79e3416

stuggi force-pushed the handle_ingress branch from e579698 to 79e3416 Compare August 17, 2023 14:36

stuggi commented Aug 17, 2023

View reviewed changes

gibizer reviewed Aug 19, 2023

View reviewed changes

openshift-merge-robot added the needs-rebase label Aug 24, 2023

stuggi closed this Sep 12, 2023

stuggi mentioned this pull request Oct 2, 2023

Allow customize route via spec.Override.Route openstack-k8s-operators/keystone-operator#269

Closed

		// novncproxy
		for _, endpointType := range []service.Endpoint{service.EndpointPublic, service.EndpointInternal} {

Create several services route and svc endpoint overrides #416

Create several services route and svc endpoint overrides #416

Conversation

stuggi commented Jul 28, 2023 • edited Loading

stuggi commented Jul 28, 2023

openshift-ci bot commented Jul 28, 2023

stuggi commented Jul 28, 2023

softwarefactory-project-zuul bot commented Jul 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abays commented Jul 28, 2023

softwarefactory-project-zuul bot commented Aug 1, 2023

softwarefactory-project-zuul bot commented Aug 1, 2023

softwarefactory-project-zuul bot commented Aug 1, 2023

softwarefactory-project-zuul bot commented Aug 1, 2023

stuggi commented Aug 1, 2023

softwarefactory-project-zuul bot commented Aug 1, 2023

softwarefactory-project-zuul bot commented Aug 2, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openshift-ci bot commented Aug 17, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gibizer commented Aug 19, 2023

openshift-merge-robot commented Aug 24, 2023

bogdando commented Sep 1, 2023

stuggi commented Sep 4, 2023

gibizer commented Sep 4, 2023

gibizer commented Sep 4, 2023 • edited Loading

stuggi commented Sep 12, 2023

stuggi commented Jul 28, 2023 •

edited

Loading

gibizer commented Sep 4, 2023 •

edited

Loading