Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add enablement flag for nim #1330

Merged
merged 24 commits into from
Nov 27, 2024
Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
42089a8
add enablement flag for nim
trujillm Oct 31, 2024
8b09a92
add generated files
trujillm Oct 31, 2024
cf86046
add api generated file
trujillm Oct 31, 2024
59daf4d
revert image for csv
trujillm Oct 31, 2024
11582fa
update NIM with its own spec
trujillm Oct 31, 2024
1395504
create nim struct remove from serverless
trujillm Oct 31, 2024
b19d8bf
updated generated files
trujillm Nov 1, 2024
da745e8
update nim to default removed
trujillm Nov 1, 2024
5dfaee8
update image to correct var
trujillm Nov 1, 2024
6078979
update to managed as default
trujillm Nov 4, 2024
a9321b6
add nim-state to params.env
trujillm Nov 6, 2024
8206fcb
fix NVIDIA typo
trujillm Nov 7, 2024
c4a92ca
Merge branch 'opendatahub-io:incubation' into add_nim_enablement
trujillm Nov 7, 2024
2ee890c
Merge branch 'opendatahub-io:incubation' into add_nim_enablement
trujillm Nov 18, 2024
d5a6a33
resolve conflict
trujillm Nov 20, 2024
44dac58
Merge branch 'opendatahub-io:incubation' into add_nim_enablement
trujillm Nov 20, 2024
7ad5727
add NIM flag check to model mesh
trujillm Nov 21, 2024
ea63a9e
fix condtional for standard practice nim flag
trujillm Nov 21, 2024
473db6c
fix to conditional for standards
trujillm Nov 21, 2024
ba4cc15
update typo for error output nim flag kserve
trujillm Nov 22, 2024
7c57ea4
Fix deploy to dependent path
trujillm Nov 22, 2024
d60bc76
update doc/manifests references for nim flag
trujillm Nov 22, 2024
c96bf75
fix image typo csv
trujillm Nov 25, 2024
9245d19
update if on nim-state
trujillm Nov 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ and configure these applications.
- [Deployment](#deployment)
- [Test with customized manifests](#test-with-customized-manifests)
- [Update API docs](#update-api-docs)
- [Enabled logging](#enabled-logging)
- [Example DSCInitialization](#example-dscinitialization)
- [Example DataScienceCluster](#example-datasciencecluster)
- [Run functional Tests](#run-functional-tests)
Expand Down Expand Up @@ -304,6 +305,8 @@ spec:
managementState: Managed
kserve:
managementState: Managed
nim:
managementState: Managed
serving:
ingressGateway:
certificate:
Expand Down
12 changes: 12 additions & 0 deletions apis/infrastructure/v1/nim_types.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
package v1

import (
operatorv1 "github.com/openshift/api/operator/v1"
)

// nimSpec enables NVIDIA NIM integration
type NimSpec struct {
// +kubebuilder:validation:Enum=Managed;Removed
// +kubebuilder:default=Managed
trujillm marked this conversation as resolved.
Show resolved Hide resolved
ManagementState operatorv1.ManagementState `json:"managementState,omitempty"`
}
15 changes: 15 additions & 0 deletions apis/infrastructure/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,17 @@ spec:
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
nim:
description: Configures and enables NVIDIA NIM integration
properties:
managementState:
default: Managed
enum:
- Managed
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
type: object
serving:
description: |-
Serving configures the KNative-Serving stack used for model serving. A Service
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ metadata:
},
"kserve": {
"managementState": "Managed",
"nim": {
"managementState": "Managed"
},
"serving": {
"ingressGateway": {
"certificate": {
Expand Down Expand Up @@ -103,7 +106,7 @@ metadata:
categories: AI/Machine Learning, Big Data
certified: "False"
containerImage: quay.io/opendatahub/opendatahub-operator:v2.21.0
createdAt: "2024-11-18T18:28:55Z"
createdAt: "2024-11-22T19:16:14Z"
olm.skipRange: '>=1.0.0 <2.21.0'
operators.operatorframework.io/builder: operator-sdk-v1.31.0
operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
Expand Down Expand Up @@ -1082,7 +1085,7 @@ spec:
value: /opt/manifests
- name: ODH_PLATFORM_TYPE
value: OpenDataHub
image: REPLACE_IMAGE:latest
image: quay.io/opendatahub/opendatahub-operator:latest
trujillm marked this conversation as resolved.
Show resolved Hide resolved
imagePullPolicy: Always
livenessProbe:
httpGet:
Expand Down
9 changes: 8 additions & 1 deletion components/kserve/kserve.go
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,8 @@ type Kserve struct {
// This field is optional. If no default deployment mode is specified, Kserve will use Serverless mode.
// +kubebuilder:validation:Enum=Serverless;RawDeployment
DefaultDeploymentMode DefaultDeploymentMode `json:"defaultDeploymentMode,omitempty"`
// Configures and enables NVIDIA NIM integration
NIM infrav1.NimSpec `json:"nim,omitempty"`
}

func (k *Kserve) Init(ctx context.Context, _ cluster.Platform) error {
Expand All @@ -63,7 +65,6 @@ func (k *Kserve) Init(ctx context.Context, _ cluster.Platform) error {
var dependentParamMap = map[string]string{
"odh-model-controller": "RELATED_IMAGE_ODH_MODEL_CONTROLLER_IMAGE",
}

// Update image parameters for odh-model-controller
if err := deploy.ApplyParams(DependentPath, dependentParamMap); err != nil {
log.Error(err, "failed to update image", "path", DependentPath)
Expand Down Expand Up @@ -131,6 +132,12 @@ func (k *Kserve) ReconcileComponent(ctx context.Context, cli client.Client,
return err
}
}
extraParamsMap := map[string]string{
lburgazzoli marked this conversation as resolved.
Show resolved Hide resolved
"nim-state": string(k.NIM.ManagementState),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that here we are updating the params.env only in the KServe is Managed, if not, the value in the params.env is untouched. However in the ModelMesh component reconciliation, the params.env value is updated regardless of the status of the KServe component.

It is probably not an issue but wonder if the behavior should be made consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption is that NIM depends on Kserve. If Kserve is not managed, NIM shouldn't be either. If Kserve is not managed, is the Kserve component's reconciler running? How would you suggest making the behaviour consistent?

Copy link
Contributor

@lburgazzoli lburgazzoli Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a single reconciler for all the components, when one component is marked as Removed, the specific component reconcile is invoked and it deletes the related resources. In this case it would leave the nim-state as it was defined previously, which I don't think it would be a problem, since KServe would be set as Removed.

In the ModelMesh however, the nim-state is updated regardless of the fact that the ModelMesh is set to Managed or Removed, see https://github.com/opendatahub-io/opendatahub-operator/pull/1330/files#diff-be28ef017be36951371e5f700cafcdad5925b03de9ae41d754825781f213aa78R131-R137.

So for my POV, it should not be a problem, just want to double check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the Kserve's component has some cleanup logic in place, can we use it to set nim-state to removed if kserve is removed? That way, we can make both behaviours consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. it is the same ReconcileComponent metho, there is a check enabled := k.GetManagementState() == operatorv1.Managed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ConfigMap being deleted when Kserve's component is removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, so yes, it is not an issue, my point was mainly about consistency in the code.
But I'm fine to leave it as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we update line120-124 to this:

if !enabled {
		if err := deploy.ApplyParams(DependentPath, nil, map[string]string{"nim-state": "removed"}); err != nil {
			return fmt.Errorf("failed to update NIM flag to removed : %w", err)
		}
		if err := k.removeServerlessFeatures(ctx, cli, owner, dscispec); err != nil {
			return err
		}
	} else {
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zdtsw ✅ Done

}
if err := deploy.ApplyParams(DependentPath, nil, extraParamsMap); err != nil {
return fmt.Errorf("failed to update NIM flag from %s : %w", Path, err)
}
}

if err := k.configureServiceMesh(ctx, cli, owner, dscispec); err != nil {
Expand Down
1 change: 1 addition & 0 deletions components/kserve/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

25 changes: 25 additions & 0 deletions components/modelmeshserving/modelmeshserving.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ import (

operatorv1 "github.com/openshift/api/operator/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
"k8s.io/apimachinery/pkg/runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
logf "sigs.k8s.io/controller-runtime/pkg/log"

Expand Down Expand Up @@ -126,6 +128,13 @@ func (m *ModelMeshServing) ReconcileComponent(ctx context.Context,
}
}

extraParamsMap := map[string]string{
"nim-state": getNimManagementFlag(owner),
}
lburgazzoli marked this conversation as resolved.
Show resolved Hide resolved
if err := deploy.ApplyParams(DependentPath, nil, extraParamsMap); err != nil {
return fmt.Errorf("failed to update image from %s : %w", Path, err)
}

if err := deploy.DeployManifestsFromPath(ctx, cli, owner, Path, dscispec.ApplicationsNamespace, ComponentName, enabled); err != nil {
return fmt.Errorf("failed to apply manifests from %s : %w", Path, err)
}
Expand Down Expand Up @@ -173,3 +182,19 @@ func (m *ModelMeshServing) ReconcileComponent(ctx context.Context,

return nil
}

func getNimManagementFlag(obj metav1.Object) string {
removed := string(operatorv1.Removed)
un, err := runtime.DefaultUnstructuredConverter.ToUnstructured(obj)
if err != nil {
return removed
}
kserve, foundKserve, _ := unstructured.NestedString(un, "spec", "components", "kserve", "managementState")
if foundKserve && kserve != removed {
nim, foundNim, _ := unstructured.NestedString(un, "spec", "components", "kserve", "nim", "managementState")
if foundNim {
return nim
}
}
return removed
}
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,17 @@ spec:
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
nim:
description: Configures and enables NVIDIA NIM integration
properties:
managementState:
default: Managed
enum:
- Managed
- Removed
pattern: ^(Managed|Unmanaged|Force|Removed)$
type: string
type: object
serving:
description: |-
Serving configures the KNative-Serving stack used for model serving. A Service
Expand Down
3 changes: 3 additions & 0 deletions config/samples/datasciencecluster_v1_datasciencecluster.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ spec:
managementState: "Managed"
kserve: {
managementState: "Managed",
nim: {
managementState: "Managed"
},
serving: {
ingressGateway: {
certificate: {
Expand Down
17 changes: 17 additions & 0 deletions docs/api-overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,7 @@ _Appears in:_
| `Component` _[Component](#component)_ | | | |
| `serving` _[ServingSpec](#servingspec)_ | Serving configures the KNative-Serving stack used for model serving. A Service<br />Mesh (Istio) is prerequisite, since it is used as networking layer. | | |
| `defaultDeploymentMode` _[DefaultDeploymentMode](#defaultdeploymentmode)_ | Configures the default deployment mode for Kserve. This can be set to 'Serverless' or 'RawDeployment'.<br />The value specified in this field will be used to set the default deployment mode in the 'inferenceservice-config' configmap for Kserve.<br />This field is optional. If no default deployment mode is specified, Kserve will use Serverless mode. | | Enum: [Serverless RawDeployment] <br />Pattern: `^(Serverless\|RawDeployment)$` <br /> |
| `nim` _[NimSpec](#nimspec)_ | Configures and enables NVIDIA NIM integration | | |



Expand Down Expand Up @@ -523,6 +524,22 @@ _Appears in:_
| `certificate` _[CertificateSpec](#certificatespec)_ | Certificate specifies configuration of the TLS certificate securing communication<br />for the gateway. | | |


#### NimSpec



nimSpec enables NVIDIA NIM integration



_Appears in:_
- [Kserve](#kserve)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `managementState` _[ManagementState](#managementstate)_ | | Managed | Enum: [Managed Removed] <br /> |


#### ServiceMeshSpec


Expand Down