Explore configuring Knative for Nvidia triton #169
Comments
Thank you for reporting us your feedback! The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5198.
To explore running an ISVC with Triton, I followed the guide in the KServe v0.11 documentation.
Setup for Triton
To address these setup points in Charmed Kubeflow:
Setup correspondence in Charmed Kubeflow
Findings
I was able to create an ISVC and make a prediction with Triton by following the steps in the KServe guide, but without doing the setup, and with the workaround for Istio mentioned in
Can we also better understand, as part of this effort, why Knative is resolving tags to digests in the first place?
Cross-referencing some thoughts on configuring Knative for GPUs, which will also affect how to configure Knative in general for Triton: #171 (comment)
Lastly, adding some comments on how to reach the ISVC in the first place. This is the state from when I last tested, but @NohaIhab let's confirm
Because of this, I propose that we explicitly disable the sidecar for now, to confirm we can at least do the inference.
Reproducing the behavior in the above comment to confirm:
Steps to reproduce
1. Deploy CKF 1.8/stable and configure dashboard access
When making a request from a pod inside the cluster to the ISVC's status.address.url, I got a
The ISVC pod in a Kubeflow user namespace has Istio sidecar injection enabled by default. Injection is enabled for all pods in Kubeflow user namespaces using the label
This confirms that by disabling Istio sidecar injection, we can reach the inference service.
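For reference, the in-cluster request described above can be sketched as follows. This is only a sketch under assumptions: it assumes the Triton ISVC speaks the v2 inference protocol (`/v2/models/<model>/infer`), and the URL, model name, and input tensor are hypothetical placeholders, not values from this test. The request is built but not sent.

```python
import json
import urllib.request


def build_v2_infer_request(base_url, model_name, inputs):
    """Build (but do not send) a v2 inference protocol POST request."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: replace with the ISVC's real status.address.url,
# model name, and an input tensor that matches the model's signature.
request = build_v2_infer_request(
    "http://example-isvc.example-namespace.svc.cluster.local",
    "example-model",
    [{"name": "INPUT__0", "shape": [1, 3], "datatype": "FP32", "data": [0.1, 0.2, 0.3]}],
)
print(request.get_method(), request.full_url)
```

From a pod inside the cluster, the request could then be sent with `urllib.request.urlopen(request)`.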
In response to this, how will we disable the sidecar for inference services? The existing
the isvc pod resulted:
the 2 containers are
The 3 options listed above all rely on users following the documentation. It would be ideal if we had it working out of the box, so I'm open to suggestions in that regard.
I believe the first two proposals are not really feasible, since an end user will only have access to their namespace via the Kubeflow UIs. So they won't have permissions to either
So this leaves us with option 3 as the only viable one. The last piece, then, is whether we could add this annotation to the ISVCs automatically. We were discussing something similar with @DnPlas for Istio CNI, and IIRC the best approach would be to charm something like Kyverno: canonical/istio-operators#356 (comment)
So my understanding is that for now we'll have to rely on documentation, similarly to how users should disable sidecars for Katib.
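As an illustration of what the documentation could ask users to do, here is a minimal sketch of an ISVC manifest that opts out of sidecar injection. It assumes the annotation in question is Istio's standard `sidecar.istio.io/inject: "false"` and that the predictor uses the `triton` field of the `serving.kserve.io/v1beta1` API. The name, namespace, and storage URI are hypothetical.

```python
import json

# Assumption: this is the annotation being discussed in the thread.
SIDECAR_ANNOTATION = "sidecar.istio.io/inject"


def isvc_without_sidecar(name, namespace, storage_uri):
    """Return an InferenceService manifest annotated to skip Istio sidecar injection."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # The value must be the string "false", not a boolean.
            "annotations": {SIDECAR_ANNOTATION: "false"},
        },
        "spec": {"predictor": {"triton": {"storageUri": storage_uri}}},
    }


# Hypothetical name, namespace, and model location.
manifest = isvc_without_sidecar(
    "example-triton", "example-namespace", "gs://example-bucket/example-model"
)
print(json.dumps(manifest, indent=2))
```

Because JSON is valid YAML, the printed manifest can be saved to a file and applied with `kubectl apply -f`.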
Thanks @NohaIhab for all the context. Please correct me if I am wrong, this is what I understand is needed:
Both (2) and (3) are changed in the
For (1), I understand that the KServe documentation tells you to have a "network accessible" ingress gateway because they perform inference by directly hitting the ISVC URL that is given by the knative-operators, provided that we have a correct Knative gateway configuration. From @kimwnasptd's comment and from past discussions, we know that our current setup does not allow any
I understand that you disabled the Istio sidecar injection in your test namespace and that worked, but I wonder if this is what we want in the long run. We have investigated ways to mutate k8s objects with Kyverno, but I also wonder whether fixing canonical/kserve-operators#205, instead of charming a new component to add annotations/labels automatically, would be our best option here.
Thank you @DnPlas for the review.
To your point on where these will be changed, I need to look into it for the way our charm is installing
Second, for the issue canonical/kserve-operators#205 with Istio
To add some more context on the Gateways situation, there are 2 angles on how users will hit ISVCs:
NOTE: by IngressGateway I refer to the actual Pod that a
In both cases we need to (not necessarily now) have an e2e story for mTLS across the whole traffic path (IngressGateway, Knative, ISVC), as well as authorization policies (which require sidecars everywhere). The "external" IngressGateway is more complicated, though, because in addition to the above we'll need a way to get programmatic access tokens for the authentication flow (which would also require AuthorizationPolicies in user namespaces for the identity of those tokens).
So for this effort, as @NohaIhab nicely explained, we propose to
One thing that we need, and I can create them, is issues for the things I mentioned and for the blockers, so that this context is broken down further and such decisions are easier to track.
@DnPlas if you have a more efficient approach to unblock this effort, let us know. If not, because the clock is ticking, we'll need to move forward with the above approach.
In response to
To modify the configmaps created by
see reference in knative's upstream repo for the configuration keys: config-deployment and config-features
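As a rough sketch of how such overrides might be expressed, the snippet below builds a `spec.config` fragment for a KnativeServing custom resource. It assumes the configmaps are managed through the Knative Operator, where each configmap maps to a section named without its `config-` prefix. The keys and values shown (`registries-skipping-tag-resolving`, `progress-deadline`, `kubernetes.podspec-nodeselector`) are assumptions based on upstream Knative; the exact key spelling depends on the Knative version and should be checked against the upstream references above.

```python
import json


def knative_serving_config(overrides):
    """Map configmap names to a KnativeServing spec.config fragment.

    Assumes the Knative Operator convention that the section name is the
    configmap name with its "config-" prefix removed.
    """
    prefix = "config-"
    config = {}
    for configmap_name, data in overrides.items():
        if configmap_name.startswith(prefix):
            section = configmap_name[len(prefix):]
        else:
            section = configmap_name
        config[section] = dict(data)
    return {"spec": {"config": config}}


# Assumed keys and values, for illustration only.
fragment = knative_serving_config(
    {
        "config-deployment": {
            # Skip tag-to-digest resolution for the registry hosting Triton images.
            "registries-skipping-tag-resolving": "nvcr.io",
            # Allow more time for large image and model pulls.
            "progress-deadline": "600s",
        },
        "config-features": {
            # Allow node selectors in the pod spec, e.g. for GPU nodes.
            "kubernetes.podspec-nodeselector": "enabled",
        },
    }
)
print(json.dumps(fragment, indent=2))
```

Keeping the overrides keyed by configmap name makes it easy to compare them with the upstream `config-deployment` and `config-features` references.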
@kimwnasptd @NohaIhab I am okay with doing this iteratively by:
This seems like a good compromise in the interim to unblock this integration.
It is still unclear to me in which of our versions we are going to support this, because it seems like this change will land in
@DnPlas sounds good, I've created issue canonical/kserve-operators#216 to track the local gateway issue. You can also link it in the enhancements issue.
Closing this, as agreed with @DnPlas that sufficient information has been gathered to move on.
Context
Exploring the configuration of knative-serving to run an ISVC with the Nvidia Triton image
What needs to get done
Expose what we need to configure in Knative Serving for Nvidia Triton images (see: Tensorflow - KServe Documentation Website)
Definition of Done
An ISVC is run with Triton successfully and the configuration needed is documented in the issue