Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add enablement flag for nim #1330

Open
wants to merge 23 commits into
base: incubation
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions apis/infrastructure/v1/nim_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
package v1

import (
operatorv1 "github.com/openshift/api/operator/v1"
)

// nimSpec enables NVIDIA NIM integration
type NimSpec struct {
// +kubebuilder:validation:Enum=Managed;Removed
// +kubebuilder:default=Managed
trujillm marked this conversation as resolved.
Show resolved Hide resolved
ManagementState operatorv1.ManagementState `json:"managementState,omitempty"`
}
15 changes: 15 additions & 0 deletions apis/infrastructure/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,17 @@ spec:
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
nim:
description: Configures and enables NVIDIA NIM integration
properties:
managementState:
default: Managed
enum:
- Managed
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
type: object
serving:
description: |-
Serving configures the KNative-Serving stack used for model serving. A Service
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ metadata:
categories: AI/Machine Learning, Big Data
certified: "False"
containerImage: quay.io/opendatahub/opendatahub-operator:v2.21.0
createdAt: "2024-11-18T18:28:55Z"
createdAt: "2024-11-21T19:59:21Z"
olm.skipRange: '>=1.0.0 <2.21.0'
operators.operatorframework.io/builder: operator-sdk-v1.31.0
operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
Expand Down
9 changes: 8 additions & 1 deletion components/kserve/kserve.go
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,8 @@ type Kserve struct {
// This field is optional. If no default deployment mode is specified, Kserve will use Serverless mode.
// +kubebuilder:validation:Enum=Serverless;RawDeployment
DefaultDeploymentMode DefaultDeploymentMode `json:"defaultDeploymentMode,omitempty"`
// Configures and enables NVIDIA NIM integration
NIM infrav1.NimSpec `json:"nim,omitempty"`
}

func (k *Kserve) Init(ctx context.Context, _ cluster.Platform) error {
Expand All @@ -63,7 +65,6 @@ func (k *Kserve) Init(ctx context.Context, _ cluster.Platform) error {
var dependentParamMap = map[string]string{
"odh-model-controller": "RELATED_IMAGE_ODH_MODEL_CONTROLLER_IMAGE",
}

// Update image parameters for odh-model-controller
if err := deploy.ApplyParams(DependentPath, dependentParamMap); err != nil {
log.Error(err, "failed to update image", "path", DependentPath)
Expand Down Expand Up @@ -131,6 +132,12 @@ func (k *Kserve) ReconcileComponent(ctx context.Context, cli client.Client,
return err
}
}
extraParamsMap := map[string]string{
lburgazzoli marked this conversation as resolved.
Show resolved Hide resolved
"nim-state": string(k.NIM.ManagementState),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that here we are updating the params.env only in the KServe is Managed, if not, the value in the params.env is untouched. However in the ModelMesh component reconciliation, the params.env value is updated regardless of the status of the KServe component.

It is probably not an issue but wonder if the behavior should be made consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is that NIM depends on Kserve. If Kserve is not managed, NIM shouldn't be either. If Kserve is not managed, is the Kserve component's reconciler running? How would you suggest making the behaviour consistent?

Copy link
Contributor

@lburgazzoli lburgazzoli Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a single reconciler for all the components, when one component is marked as Removed, the specific component reconcile is invoked and it deletes the related resources. In this case it would leave the nim-state as it was defined previously, which I don't think it would be a problem, since KServe would be set as Removed.

In the ModelMesh however, the nim-state is updated regardless of the fact that the ModelMesh is set to Managed or Removed, see https://github.com/opendatahub-io/opendatahub-operator/pull/1330/files#diff-be28ef017be36951371e5f700cafcdad5925b03de9ae41d754825781f213aa78R131-R137.

So for my POV, it should not be a problem, just want to double check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the Kserve's component has some cleanup logic in place, can we use it to set nim-state to removed if kserve is removed? That way, we can make both behaviours consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. it is the same ReconcileComponent metho, there is a check enabled := k.GetManagementState() == operatorv1.Managed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ConfigMap being deleted when Kserve's component is removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, so yes, it is not an issue, my point was mainly about consistency in the code.
But I'm fine to leave it as it is.

}
if err := deploy.ApplyParams(Path, nil, extraParamsMap); err != nil {
return fmt.Errorf("failed to update image from %s : %w", Path, err)
trujillm marked this conversation as resolved.
Show resolved Hide resolved
}
}

if err := k.configureServiceMesh(ctx, cli, owner, dscispec); err != nil {
Expand Down
1 change: 1 addition & 0 deletions components/kserve/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

25 changes: 25 additions & 0 deletions components/modelmeshserving/modelmeshserving.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ import (

operatorv1 "github.com/openshift/api/operator/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
logf "sigs.k8s.io/controller-runtime/pkg/log"

Expand Down Expand Up @@ -126,6 +128,13 @@ func (m *ModelMeshServing) ReconcileComponent(ctx context.Context,
}
}

extraParamsMap := map[string]string{
"nim-state": getNimManagementFlag(owner),
}
lburgazzoli marked this conversation as resolved.
Show resolved Hide resolved
if err := deploy.ApplyParams(Path, nil, extraParamsMap); err != nil {
return fmt.Errorf("failed to update image from %s : %w", Path, err)
}

if err := deploy.DeployManifestsFromPath(ctx, cli, owner, Path, dscispec.ApplicationsNamespace, ComponentName, enabled); err != nil {
return fmt.Errorf("failed to apply manifests from %s : %w", Path, err)
}
Expand Down Expand Up @@ -173,3 +182,19 @@ func (m *ModelMeshServing) ReconcileComponent(ctx context.Context,

return nil
}

func getNimManagementFlag(obj metav1.Object) string {
removed := string(operatorv1.Removed)
un, err := runtime.DefaultUnstructuredConverter.ToUnstructured(obj)
if err != nil {
return removed
}
kserve, foundKserve, _ := unstructured.NestedString(un, "spec", "components", "kserve", "managementState")
if foundKserve && kserve != removed {
nim, foundNim, _ := unstructured.NestedString(un, "spec", "components", "kserve", "nim", "managementState")
if foundNim {
return nim
}
}
return removed
}
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,17 @@ spec:
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
nim:
description: Configures and enables NVIDIA NIM integration
properties:
managementState:
default: Managed
enum:
- Managed
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
type: object
serving:
description: |-
Serving configures the KNative-Serving stack used for model serving. A Service
Expand Down
17 changes: 17 additions & 0 deletions docs/api-overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,7 @@ _Appears in:_
| `Component` _[Component](#component)_ | | | |
| `serving` _[ServingSpec](#servingspec)_ | Serving configures the KNative-Serving stack used for model serving. A Service<br />Mesh (Istio) is prerequisite, since it is used as networking layer. | | |
| `defaultDeploymentMode` _[DefaultDeploymentMode](#defaultdeploymentmode)_ | Configures the default deployment mode for Kserve. This can be set to 'Serverless' or 'RawDeployment'.<br />The value specified in this field will be used to set the default deployment mode in the 'inferenceservice-config' configmap for Kserve.<br />This field is optional. If no default deployment mode is specified, Kserve will use Serverless mode. | | Enum: [Serverless RawDeployment] <br />Pattern: `^(Serverless\|RawDeployment)$` <br /> |
| `nim` _[NimSpec](#nimspec)_ | Configures and enables NVIDIA NIM integration | | |



Expand Down Expand Up @@ -523,6 +524,22 @@ _Appears in:_
| `certificate` _[CertificateSpec](#certificatespec)_ | Certificate specifies configuration of the TLS certificate securing communication<br />for the gateway. | | |


#### NimSpec



nimSpec enables NVIDIA NIM integration



_Appears in:_
- [Kserve](#kserve)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `managementState` _[ManagementState](#managementstate)_ | | Managed | Enum: [Managed Removed] <br /> |


#### ServiceMeshSpec


Expand Down