NO-ISSUE: Add IBM _watsonx_ flavor to OpenShift AI operator #6996

Open · wants to merge 1 commit into base: master
Conversation

@jhernand (Contributor) commented Nov 14, 2024

Currently, when the OpenShift AI operator is added, it is installed with all of its dependencies and the default configuration. This is suitable for most users, but it brings in dependencies that aren't needed in all cases; IBM watsonx, for example, doesn't require many of them. To simplify that use case, this patch adds support for a new flavor property, which can take the values `default` and `watsonx`. When the value is `watsonx`, the operator is installed without the pipelines, serverless and servicemesh dependencies, and with the following configuration:

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Removed
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment
      serving:
        managementState: Removed
        name: knative-serving
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    ray:
      managementState: Removed
    trainingoperator:
      managementState: Managed
    trustyai:
      managementState: Removed
    workbenches:
      managementState: Removed
```
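
At the API level, the flavor is passed through the operator's `properties` field, which is a JSON-encoded string. For example, the `olm_operators` entry of a cluster-create request looks like this excerpt of the full script shown further below:

```python
import json

# Encode the operator properties as a JSON string:
openshift_ai_properties = json.dumps({
    "flavor": "watsonx",
})

# Entry for the `olm_operators` field of the cluster-create request:
olm_operators = [
    {
        "name": "openshift-ai",
        "properties": openshift_ai_properties,
    },
]
```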

The assisted installer UI doesn't support the operator properties mechanism, so this needs to be done via the API. For example, the following Python script creates a new cluster using the watsonx flavor:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
import pathlib
import requests

# Details of the cluster:
name = "mycluster"
base_url = "https://api.openshift.com"
base_dns_domain = "mydomain"
openshift_version = "4.16"
cpu_architecture = "x86_64"

# Find the home directory, as we will take the pull secret and SSH key from there:
home_dir = pathlib.Path.home()

# Read the pull secret. To obtain your pull secret visit this page:
#
# https://console.redhat.com/openshift/install/pull-secret
#
# Then save the result to a `pull.txt` file in your home directory.
with open(home_dir / "pull.txt", "r") as file:
    pull_secret = file.read().strip()

# Read the public SSH key:
with open(home_dir / ".ssh" / "id_rsa.pub", "r") as file:
    ssh_public_key = file.read().strip()

# Prepare the properties for the operator:
openshift_ai_properties = json.dumps({
    "flavor": "watsonx",
})

# Create the cluster:
response = requests.post(
    f"{base_url}/api/assisted-install/v2/clusters",
    json={
        "name": name,
        "openshift_version": openshift_version,
        "base_dns_domain": base_dns_domain,
        "cpu_architecture": cpu_architecture,
        "pull_secret": pull_secret,
        "ssh_public_key": ssh_public_key,
        "machine_networks": [
            {
                "cidr": "192.168.100.0/24",
            },
        ],
        "api_vips": [
            {
              "ip": "192.168.100.20",
            },
        ],
        "ingress_vips": [
            {
              "ip": "192.168.100.21",
            },
        ],
        "olm_operators": [
            {
                "name": "openshift-ai",
                "properties": openshift_ai_properties,
            },
        ],
    },
)
if response.status_code != 201:
    raise Exception(f"Failed to create cluster: {response.content}")
cluster = response.json()
cluster_id = cluster["id"]
print(f"cluster_id: {cluster_id}")

# Create the infrastructure environment:
response = requests.post(
    f"{base_url}/api/assisted-install/v2/infra-envs",
    json={
        "name": name,
        "cluster_id": cluster_id,
        "openshift_version": openshift_version,
        "cpu_architecture": cpu_architecture,
        "pull_secret": pull_secret,
        "ssh_authorized_key": ssh_public_key,
        "image_type": "full-iso",
    },
)
if response.status_code != 201:
    raise Exception(f"Failed to create infrastructure environment: {response.content}")
infra_env = response.json()
infra_env_id = infra_env["id"]
print(f"infra_env_id: {infra_env_id}")

# Print ISO URL:
iso_url = infra_env["download_url"]
print(f"iso_url: {iso_url}")

Related: https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.11/html/installing_and_uninstalling_openshift_ai_self-managed/preparing-openshift-ai-for-ibm-cpd_prepare-openshift-ai-ibm-cpd#installing-openshift-data-science-operator-using-cli-ibm-cpd_prepare-openshift-ai-ibm-cpd

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-19056

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Tested manually by creating a cluster with the watsonx flavor enabled.

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Nov 14, 2024.
@openshift-ci-robot

@jhernand: This pull request explicitly references no jira issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jhernand (Contributor, Author)

/hold

This is experimental.

@openshift-ci bot added the do-not-merge/hold (indicates that a PR should not merge because someone has issued a /hold command) and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) labels on Nov 14, 2024.

openshift-ci bot commented Nov 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 14, 2024.

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.79%. Comparing base (9e03100) to head (6d10412).
Report is 1 commit behind head on master.

Additional details and impacted files


@@             Coverage Diff             @@
##           master    #6996       +/-   ##
===========================================
- Coverage   68.28%   56.79%   -11.50%     
===========================================
  Files         271      172       -99     
  Lines       38650    13823    -24827     
===========================================
- Hits        26394     7851    -18543     
+ Misses       9862     5258     -4604     
+ Partials     2394      714     -1680     

see 100 files with indirect coverage changes


openshift-ci bot commented Nov 14, 2024

@jhernand: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
| --- | --- | --- | --- |
| ci/prow/edge-e2e-metal-assisted-odf-4-17 | 6d10412 | true | /test edge-e2e-metal-assisted-odf-4-17 |
| ci/prow/edge-e2e-metal-assisted-lvm | 6d10412 | true | /test edge-e2e-metal-assisted-lvm |
| ci/prow/edge-e2e-ai-operator-ztp | 6d10412 | true | /test edge-e2e-ai-operator-ztp |
| ci/prow/edge-images | 6d10412 | true | /test edge-images |
| ci/prow/edge-subsystem-aws | 6d10412 | true | /test edge-subsystem-aws |
| ci/prow/edge-verify-generated-code | 6d10412 | true | /test edge-verify-generated-code |
| ci/prow/okd-scos-e2e-aws-ovn | 6d10412 | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-agent-compact-ipv4 | 6d10412 | true | /test e2e-agent-compact-ipv4 |
| ci/prow/edge-e2e-metal-assisted-cnv-4-17 | 6d10412 | true | /test edge-e2e-metal-assisted-cnv-4-17 |
| ci/prow/edge-e2e-metal-assisted-mtv-4-17 | 6d10412 | true | /test edge-e2e-metal-assisted-mtv-4-17 |
| ci/prow/edge-ci-index | 6d10412 | true | /test edge-ci-index |
| ci/prow/images | 6d10412 | true | /test images |
| ci/prow/edge-lint | 6d10412 | true | /test edge-lint |
| ci/prow/edge-subsystem-kubeapi-aws | 6d10412 | true | /test edge-subsystem-kubeapi-aws |
| ci/prow/edge-e2e-metal-assisted | 6d10412 | true | /test edge-e2e-metal-assisted |
| ci/prow/edge-unit-test | 6d10412 | true | /test edge-unit-test |
| ci/prow/mce-images | 6d10412 | true | /test mce-images |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
2 participants