
Explore configuring Knative for Nvidia triton #169

Closed
NohaIhab opened this issue Jan 15, 2024 · 16 comments
Labels
enhancement New feature or request

Comments

@NohaIhab
Contributor

NohaIhab commented Jan 15, 2024

Context

Explore the configuration of knative-serving needed to run an ISVC with the Nvidia Triton image.

What needs to get done

Expose what we need to configure in Knative Serving for Nvidia Triton images (see Tensorflow - KServe Documentation Website).

Definition of Done

An ISVC runs successfully with Triton and the configuration needed is documented in this issue.

@NohaIhab NohaIhab added the enhancement New feature or request label Jan 15, 2024

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5198.

This message was autogenerated

@NohaIhab
Contributor Author

NohaIhab commented Feb 4, 2024

To explore running an ISVC with Triton, I followed the guide in the KServe v0.11 documentation.
The guide suggests the following setup to run an ISVC with Triton.

Setup for Triton

  1. Your cluster's Istio Ingress gateway must be network accessible.
  2. Skip tag resolution for nvcr.io
  3. Increase progress deadline to 600s (points 2 and 3 map to Knative configuration keys, sketched below)
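
For reference, points 2 and 3 correspond to keys in Knative Serving's config-deployment ConfigMap. A minimal sketch of the relevant data, assuming the upstream default knative-serving namespace (in CKF this is managed through the KnativeServing CR rather than edited directly):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config-deployment
      namespace: knative-serving
    data:
      # skip tag-to-digest resolution for the Nvidia registry
      registries-skipping-tag-resolving: "nvcr.io"
      # allow up to 600s for the deployment to make progress (large Triton images)
      progress-deadline: "600s"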

To address these setup points in Charmed Kubeflow:

Setup correspondence in Charmed Kubeflow

  1. Currently there's an issue with sending requests to ISVCs using the Istio gateway, see Unable to make requests to an ISVC from outside the cluster kserve-operators#205
  2. The need to skip tag resolution depends on the registry setup in the Knative Serving controller; from the Knative docs:

The Knative Serving controller must be configured to access the container registry to use this feature.

Here, "this feature" refers to tag resolution.
For example, in a MicroK8s deployment using the internal registry, tag resolution was functional.
It's not clear to me why this is mentioned in the guide; it is possibly assuming a private registry that needs to be trusted.

  3. In the Knative docs, the progressDeadline is said to be 600s by default, and since we don't explicitly configure it in the charm, it is 600s in a CKF deployment.

Findings

I was able to create an ISVC and make a prediction with Triton by following the steps in the KServe guide, but without doing the setup above, and using the workaround for Istio mentioned in 1. Based on that, further investigation is needed as to when setup requirements 2 and 3 are actually needed. It's also worth considering that a user may have changed the Knative configuration since deploying it with CKF, so even if the defaults enable Triton, it's better to explicitly set these configurations to make sure the ISVC can run correctly.

@NohaIhab
Contributor Author

NohaIhab commented Feb 6, 2024

I wrote a thread in the KServe Slack channel to ask why skipping tag resolution is needed, and filed an issue against the Knative documentation.

@kimwnasptd
Contributor

Can we also better understand as part of this effort why Knative is doing tag resolution to digest in the first place?
https://knative.dev/docs/serving/tag-resolution/

@kimwnasptd
Contributor

kimwnasptd commented Feb 27, 2024

Cross-referencing here some thoughts on configuring Knative for GPUs, which will also affect how to configure Knative in general for Triton: #171 (comment)

And lastly, adding some comments regarding how to reach the ISVC in the first place. This is the state from when I last tested, but @NohaIhab let's confirm:

  • By default the ISVC pod gets a sidecar, in the user namespace
  • The status.address.url of the ISVC is the externalName SVC in the namespace
  • The virtual services are using the knative-local-gateway, which is using the istio-ingressgateway pod but with port 8081
  • When I tried to hit the URL of the ISVC I got 403, and this was because of the communication between knative-local-gateway <> ISVC pod

Because of this I propose that we explicitly disable the sidecar for now, to confirm we can at least do the inference

@NohaIhab
Contributor Author

NohaIhab commented Feb 27, 2024

Reproducing the behavior in the above comment to confirm:

Steps to reproduce:

  1. deploy CKF 1.8/stable and configure dashboard access

  2. create a kubeflow user namespace by logging into the dashboard

  3. create an ISVC in the kubeflow user namespace:

    kubectl apply -n admin -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF
    
  4. create a temp pod to test requests to the ISVC:

    kubectl run netshoot --rm -i --tty --image nicolaka/netshoot -- /bin/bash
    
  5. from inside the pod, make a prediction request to the ISVC using the ISVC's status.address.url:

    # prepare inference input
    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
    
    # curl isvc
    curl -v -H "Content-Type: application/json" http://sklearn-iris.admin.svc.cluster.local/v1/models/sklearn-iris:predict -d @./iris-input.json
    

When making a request from a pod inside the cluster to the ISVC's status.address.url, I got a 403 response with RBAC: access denied:

curl -v -H "Content-Type: application/json" http://sklearn-iris.admin.svc.cluster.local/v1/models/sklearn-iris:predict -d @./iris-input.json
* Host sklearn-iris.admin.svc.cluster.local:80 was resolved.
* IPv6: (none)
* IPv4: 10.152.183.48
*   Trying 10.152.183.48:80...
* Connected to sklearn-iris.admin.svc.cluster.local (10.152.183.48) port 80
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris.admin.svc.cluster.local
> User-Agent: curl/8.6.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 76
> 
< HTTP/1.1 403 Forbidden
< content-length: 19
< content-type: text/plain
< date: Tue, 27 Feb 2024 13:38:24 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 25
< 
* Connection #0 to host sklearn-iris.admin.svc.cluster.local left intact
RBAC: access denied

The ISVC pod in a Kubeflow user namespace has Istio sidecar injection enabled by default; injection is enforced for all pods in Kubeflow user namespaces via the istio-injection: enabled label.
Running the same ISVC in a non-Kubeflow user namespace (with the Istio sidecar disabled), I am able to make a prediction from a pod inside the cluster:

curl -v -H "Content-Type: application/json" "sklearn-iris.kserve-test.svc.cluster.local/v1/models/sklearn-iris:predict" -d @./iris-input.json
* Host sklearn-iris.kserve-test.svc.cluster.local:80 was resolved.
* IPv6: (none)
* IPv4: 10.152.183.48
*   Trying 10.152.183.48:80...
* Connected to sklearn-iris.kserve-test.svc.cluster.local (10.152.183.48) port 80
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris.kserve-test.svc.cluster.local
> User-Agent: curl/8.6.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 76
> 
< HTTP/1.1 200 OK
< content-length: 21
< content-type: application/json
< date: Tue, 27 Feb 2024 12:49:55 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 24
< 
* Connection #0 to host sklearn-iris.kserve-test.svc.cluster.local left intact
{"predictions":[1,1]}

This confirms that by disabling Istio sidecar injection, we can reach the inference service.
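
For reference, a minimal sketch of how the working case above was set up: apply the same ISVC in a plain namespace that does not carry the istio-injection: enabled label (kserve-test here, matching the output above):

    # namespace created with kubectl, so it has no Istio injection label
    kubectl create namespace kserve-test
    kubectl apply -n kserve-test -f - <<EOF
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
    EOF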

@NohaIhab
Contributor Author

Because of this I propose that we explicitly disable the sidecar for now, to confirm we can at least do the inference

In response to this: how will we disable the sidecar for inference services? The existing istio-injection: enabled label enforces Istio sidecar injection on all pods in Kubeflow user namespaces.
I'm thinking the possible ways to do this are:

  1. tell users in the Triton documentation to create a new namespace with kubectl and apply the ISVC in it; this way it will not have the Istio injection label.
  2. tell users in the documentation to remove the istio-injection: enabled label from the user namespace; this is not feasible because we need the sidecar for other components like notebooks.
  3. tell users to add the annotation "sidecar.istio.io/inject": "false" to the ISVC before creation; this way the ISVC pod will not have a sidecar even in the user namespace. An example of an ISVC with the annotation:
kubectl apply -n admin -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
  annotations:
    "sidecar.istio.io/inject": "false"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF

The resulting ISVC pod:

sklearn-iris-predictor-00001-deployment-9c65bbb9d-kdlgf   2/2     Running   0          34s

The 2 containers are kserve-container and queue-proxy, with no Istio sidecar.
In this example, admin is a Kubeflow user namespace.
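
To double-check that no sidecar was injected, one option is to list the container names of the predictor pod; the sketch below assumes the standard serving.kserve.io/inferenceservice label that KServe sets on predictor pods:

    kubectl get pods -n admin -l serving.kserve.io/inferenceservice=sklearn-iris \
      -o jsonpath='{.items[*].spec.containers[*].name}'
    # expected: kserve-container queue-proxy (no istio-proxy)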

The 3 options listed above all rely on users following the documentation; it would be ideal if this worked out of the box, so I'm open to suggestions in that regard.

@kimwnasptd
Contributor

kimwnasptd commented Feb 28, 2024

I believe the first two proposals are not really feasible, since an end user will only have access to their namespace via the Kubeflow UIs. So they won't have permissions to either

  • Update the annotations/labels in their namespace
  • Create a new namespace in the cluster

So this leaves option 3 as the only viable one. The last piece, then, is whether we could ideally add this annotation to the ISVCs automatically.

We were discussing something similar with @DnPlas for Istio CNI, and IIRC the best approach for this would be to charm something like Kyverno canonical/istio-operators#356 (comment)

So my understanding is that for now we'll have to rely on documentation, similarly to how users should disable sidecars for Katib.
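
For future reference, if we do go the Kyverno route, a mutation policy along these lines could add the annotation automatically. This is only a rough sketch, assuming that mutating the InferenceService metadata is sufficient; the policy and rule names are illustrative:

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: isvc-disable-istio-sidecar
    spec:
      rules:
        - name: add-no-inject-annotation
          match:
            any:
              - resources:
                  kinds:
                    - serving.kserve.io/v1beta1/InferenceService
          mutate:
            # add the same annotation we would otherwise document for users
            patchStrategicMerge:
              metadata:
                annotations:
                  sidecar.istio.io/inject: "false"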

@DnPlas
Contributor

DnPlas commented Feb 28, 2024

Thanks @NohaIhab for all the context.

Please correct me if I am wrong; this is what I understand is needed:

  1. Istio ingress gateway must be network accessible - I will leave this for later.
  2. Skip tag resolution for nvcr.io. This is because the KServe team had issues in the past with that image registry. From the Slack thread you referenced I see that "Tag resolution against nvcr.io still requires an API/Auth key". This means that we somehow have to change that in our knative-serving charm, either by passing a config option or by changing it in our manifest files.
  3. The progress deadline also has to be changed to allow pulling a big image and model. Again, this has to be either a config option or a change somewhere in our manifest files.

Both (2) and (3) are changed in the config-deployment ConfigMap that gets created whenever we deploy knative-serving. I don't expect this to be a huge change, but you have to also check where we can do this because knative-serving configuration values are set here. Maybe this could also be part of your exploration.
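
For a quick manual check outside of the charm, the two values can be patched directly into that ConfigMap; a sketch assuming the knative-serving namespace, with the caveat that the operator reconciling the KnativeServing CR may revert such edits, so the charm-level change remains the proper fix:

    kubectl patch configmap config-deployment -n knative-serving --type merge \
      -p '{"data":{"registries-skipping-tag-resolving":"nvcr.io","progress-deadline":"600s"}}'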

For (1), I understand that the KServe documentation tells you to have a "network accessible" ingress gateway because they perform inference by directly hitting the ISVC URL given by knative-operators, provided that we have a correct Knative gateway configuration.

From @kimwnasptd's comment and from past discussions, we know that our current setup does not allow any ISVC or ksvc to be reachable via the ingress gateway domain name because we are lacking the right configuration: canonical/kserve-operators#205.

I understand that you disabled the Istio sidecar injection in your test namespace and that worked, but I wonder if this is what we want in the long run. We have investigated ways to mutate k8s objects with Kyverno, but I also wonder whether fixing canonical/kserve-operators#205, instead of charming a new component to add annotations/labels automatically, would be our best option here.

@NohaIhab
Contributor Author

NohaIhab commented Feb 28, 2024

Thank you @DnPlas for the review.
First, to conclude the configuration for knative-serving:

  • add a config option to knative-serving charm for registriesSkippingTagResolving (change in config-deployment configmap)
  • add a config option to knative-serving charm for progressDeadline (change in config-deployment configmap)
  • enable configuring affinities, node selectors, and tolerations by default (change in Knative's config-features ConfigMap), i.e. the following:
kubernetes.podspec-affinity: "enabled"
kubernetes.podspec-nodeselector: "enabled"
kubernetes.podspec-tolerations: "enabled"

To your point on where these will be changed, I need to look into how our charm installs KnativeServing. We might have to patch the corresponding ConfigMaps with lightkube after installing, but I will confirm.

@NohaIhab
Contributor Author

Second, for the issue canonical/kserve-operators#205 with Istio:
Indeed we need to resolve the root cause of this in the long run, but due to the time limitation for this epic, we thought to do this workaround as we are doing for Katib. In the future, when we resolve the above issue separately, we can then remove this workaround from the Triton doc. WDYT?

@kimwnasptd
Contributor

kimwnasptd commented Feb 29, 2024

To add some more context on the Gateways situation, there are 2 angles on how users will hit ISVCs:

  1. from the "external" Istio IngressGateway, that has AuthService/Dex checks
  2. from a local IngressGateway, that does not have auth

NOTE: by IngressGateway I refer to the actual Pod that a Gateway object configures. So in this case the istio-ingressgateway-* pods

In both cases we need (not necessarily now) an e2e story for mTLS along the whole traffic path (IngressGateway, Knative, ISVC) as well as authorization policies (which require sidecars everywhere).

The "external" IngressGateway though is more complicated because, additionally to the above, we'll need a way to get programmatic access tokens, for the authentication flow (which would also require AuthorizationPolicies in user namespaces for the identity of those tokens).

So for this effort, as @NohaIhab nicely explained, we propose to

  1. Use the local knative-local-gateway which is created from Knative Serving (no need for tokens)
  2. Disable the Istio sidecar for the ISVC for now, since it results in 403s and avoiding those is too complicated for this effort (it would require sidecars in Knative, and then a way to propagate AuthorizationPolicies in user namespaces)

One thing that we need, and I can create those, is a set of issues for the things I mentioned and their blockers, so that this context is broken down and such decisions are easier to track.

@DnPlas if you have a more efficient approach to unblock this effort let us know; if not, since the clock is ticking, we'll need to move forward with the above approach.

@NohaIhab
Contributor Author

NohaIhab commented Feb 29, 2024

In response to

Both (2) and (3) are changed in the config-deployment ConfigMap that gets created whenever we deploy knative-serving. I don't expect this to be a huge change, but you have to also check where we can do this because knative-serving configuration values are set here. Maybe this could also be part of your exploration.

To modify the configmaps created by knative-serving, we'd need to modify the KnativeServing CR manifest by adding the following to the .spec:

  config:
    deployment:
      progress-deadline: {{ progress_deadline }}
      registries-skipping-tag-resolving: {{ registries_skip_tag_resolving }}
    features:
      kubernetes.podspec-affinity: "enabled"
      kubernetes.podspec-nodeselector: "enabled"
      kubernetes.podspec-tolerations: "enabled"

See the reference in Knative's upstream repo for the configuration keys: config-deployment and config-features.
Each of progress-deadline and registries-skipping-tag-resolving will be populated from a charm config option that we will add to the knative-serving charm as part of this effort.
The defaults of these configs will be the same as upstream's defaults, so that we preserve the same behavior as before.

  • default of progress-deadline: "600s"
  • default of registries-skipping-tag-resolving: "kind.local,ko.local,dev.local"
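
Assuming the new charm options mirror the upstream key names (the option names below are placeholders until the change lands), enabling the Triton setup would then look roughly like:

    juju config knative-serving progress-deadline="600s"
    juju config knative-serving registries-skipping-tag-resolving="kind.local,ko.local,dev.local,nvcr.io"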

@DnPlas
Contributor

DnPlas commented Mar 1, 2024

@kimwnasptd @NohaIhab I am okay with doing this iteratively by:

  1. Providing documentation for running ISVCs in our current setup with istio sidecar disabled
  2. Working on the gateway story down the road. BTW, I proposed to work on that issue for next cycle Enhancements for kserve-operators in 24.10 kserve-operators#211, so I guess a proper fix for the gateways will land in 1.9 or 1.10.

This seems like a good compromise in the interim to unblock this integration.

It is still unclear to me in which of our versions we are going to support this, because it seems like this change will land in main and we'll provide temporary documentation for it. Can we also specify what version we are targeting for this change (or whether it's just latest/edge)?

@NohaIhab
Contributor Author

NohaIhab commented Mar 1, 2024

@DnPlas sounds good, I've created issue canonical/kserve-operators#216 to track the local gateway issue. You can also link it in the enhancements issue.
This feature will be in latest/edge for the time being, and eventually it will land in CKF 1.9 AFAIK. @kimwnasptd can confirm.

@NohaIhab
Contributor Author

NohaIhab commented Mar 1, 2024

Closing this as agreed with @DnPlas: sufficient information has been gathered to move on.
