NO-ISSUE: Add IBM _watsonx_ flavor to OpenShift AI operator #6996

Open · wants to merge 1 commit into base: master
Conversation

@jhernand (Contributor) commented Nov 14, 2024

Currently, when the OpenShift AI operator is added, it is installed with all of its dependencies and the default configuration. This is suitable for most users, but it brings in dependencies that aren't needed in all cases; IBM watsonx, for example, doesn't require many of them. To simplify that use case, this patch adds support for a new flavor property, which can take the values `default` and `watsonx`. When the value is `watsonx`, the operator is installed without the pipelines, serverless and servicemesh dependencies, and with the following configuration:

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Removed
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment
      serving:
        managementState: Removed
        name: knative-serving
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    ray:
      managementState: Removed
    trainingoperator:
      managementState: Managed
    trustyai:
      managementState: Removed
    workbenches:
      managementState: Removed
```
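
At the API level, the flavor is passed through the operator's `properties` field, which is a JSON-encoded string. For example, the `olm_operators` entry of a cluster-create request looks like this excerpt of the full script shown further below:

```python
import json

# Encode the operator properties as a JSON string:
openshift_ai_properties = json.dumps({
    "flavor": "watsonx",
})

# Entry for the `olm_operators` field of the cluster-create request:
olm_operators = [
    {
        "name": "openshift-ai",
        "properties": openshift_ai_properties,
    },
]
```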

The assisted installer UI doesn't support the operator properties mechanism, so this needs to be done via the API. For example, the following Python script creates a new cluster using the watsonx flavor:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import pathlib
import requests

# Details of the cluster:
name = "mycluster"
base_url = "https://api.openshift.com"
base_dns_domain = "mydomain"
openshift_version = "4.16"
cpu_architecture = "x86_64"

# Find the home directory, as we will take the pull secret and SSH key from there:
home_dir = pathlib.Path.home()

# Read the pull secret. To obtain your pull secret visit this page:
#
# https://console.redhat.com/openshift/install/pull-secret
#
# Then save the result to a `pull.txt` file in your home directory.
with open(home_dir / "pull.txt", "r") as file:
    pull_secret = file.read().strip()

# Read the public SSH key:
with open(home_dir / ".ssh" / "id_rsa.pub", "r") as file:
    ssh_public_key = file.read().strip()

# Prepare the properties for the operator:
openshift_ai_properties = json.dumps({
    "flavor": "watsonx",
})

# Create the cluster:
response = requests.post(
    f"{base_url}/api/assisted-install/v2/clusters",
    json={
        "name": name,
        "openshift_version": openshift_version,
        "base_dns_domain": base_dns_domain,
        "cpu_architecture": cpu_architecture,
        "pull_secret": pull_secret,
        "ssh_public_key": ssh_public_key,
        "machine_networks": [
            {
                "cidr": "192.168.100.0/24",
            },
        ],
        "api_vips": [
            {
              "ip": "192.168.100.20",
            },
        ],
        "ingress_vips": [
            {
              "ip": "192.168.100.21",
            },
        ],
        "olm_operators": [
            {
                "name": "openshift-ai",
                "properties": openshift_ai_properties,
            },
        ],
    },
)
if response.status_code != 201:
    raise Exception(f"Failed to create cluster: {response.content}")
cluster = response.json()
cluster_id = cluster["id"]
print(f"cluster_id: {cluster_id}")

# Create the infrastructure environment:
response = requests.post(
    f"{base_url}/api/assisted-install/v2/infra-envs",
    json={
        "name": name,
        "cluster_id": cluster_id,
        "openshift_version": openshift_version,
        "cpu_architecture": cpu_architecture,
        "pull_secret": pull_secret,
        "ssh_authorized_key": ssh_public_key,
        "image_type": "full-iso",
    },
)
if response.status_code != 201:
    raise Exception(f"Failed to create infrastructure environment: {response.content}")
infra_env = response.json()
infra_env_id = infra_env["id"]
print(f"infra_env_id: {infra_env_id}")

# Print ISO URL:
iso_url = infra_env["download_url"]
print(f"iso_url: {iso_url}")

Related: https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.11/html/installing_and_uninstalling_openshift_ai_self-managed/preparing-openshift-ai-for-ibm-cpd_prepare-openshift-ai-ibm-cpd#installing-openshift-data-science-operator-using-cli-ibm-cpd_prepare-openshift-ai-ibm-cpd

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-19056

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Tested manually by creating a cluster with the watsonx flavor enabled.

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Nov 14, 2024.
@openshift-ci-robot

@jhernand: This pull request explicitly references no jira issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jhernand (Contributor, Author)

/hold

This is experimental.

@openshift-ci bot added the do-not-merge/hold (indicates that a PR should not merge because someone has issued a /hold command) and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) labels on Nov 14, 2024.

openshift-ci bot commented Nov 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 14, 2024.

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.79%. Comparing base (9e03100) to head (6d10412).
Report is 1 commit behind head on master.

Additional details and impacted files


@@             Coverage Diff             @@
##           master    #6996       +/-   ##
===========================================
- Coverage   68.28%   56.79%   -11.50%     
===========================================
  Files         271      172       -99     
  Lines       38650    13823    -24827     
===========================================
- Hits        26394     7851    -18543     
+ Misses       9862     5258     -4604     
+ Partials     2394      714     -1680     

see 100 files with indirect coverage changes


openshift-ci bot commented Nov 14, 2024

@jhernand: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
| --- | --- | --- | --- |
| ci/prow/edge-e2e-metal-assisted-odf-4-17 | 6d10412 | true | /test edge-e2e-metal-assisted-odf-4-17 |
| ci/prow/edge-e2e-metal-assisted-lvm | 6d10412 | true | /test edge-e2e-metal-assisted-lvm |
| ci/prow/edge-e2e-ai-operator-ztp | 6d10412 | true | /test edge-e2e-ai-operator-ztp |
| ci/prow/edge-images | 6d10412 | true | /test edge-images |
| ci/prow/edge-subsystem-aws | 6d10412 | true | /test edge-subsystem-aws |
| ci/prow/edge-verify-generated-code | 6d10412 | true | /test edge-verify-generated-code |
| ci/prow/okd-scos-e2e-aws-ovn | 6d10412 | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-agent-compact-ipv4 | 6d10412 | true | /test e2e-agent-compact-ipv4 |
| ci/prow/edge-e2e-metal-assisted-cnv-4-17 | 6d10412 | true | /test edge-e2e-metal-assisted-cnv-4-17 |
| ci/prow/edge-e2e-metal-assisted-mtv-4-17 | 6d10412 | true | /test edge-e2e-metal-assisted-mtv-4-17 |
| ci/prow/edge-ci-index | 6d10412 | true | /test edge-ci-index |
| ci/prow/images | 6d10412 | true | /test images |
| ci/prow/edge-lint | 6d10412 | true | /test edge-lint |
| ci/prow/edge-subsystem-kubeapi-aws | 6d10412 | true | /test edge-subsystem-kubeapi-aws |
| ci/prow/edge-e2e-metal-assisted | 6d10412 | true | /test edge-e2e-metal-assisted |
| ci/prow/edge-unit-test | 6d10412 | true | /test edge-unit-test |
| ci/prow/mce-images | 6d10412 | true | /test mce-images |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
2 participants