Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: initial move of kserve to new structure #1347

Open
wants to merge 3 commits into
base: feature-operator-refactor
Choose a base branch
from

Conversation

grdryn
Copy link
Member

@grdryn grdryn commented Nov 6, 2024


Description

This change moves responsibility of reconciling KServe to the kserve controller, which reacts to a Kserve CR.

JIRA: https://issues.redhat.com/browse/RHOAIENG-13179

How Has This Been Tested?

  • Tested locally in different configurations
  • e2e tests added (open to suggestions for additional tests on top of what's included here)

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link

openshift-ci bot commented Nov 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from grdryn. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

main.go Show resolved Hide resolved
Copy link
Member Author

@grdryn grdryn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review to highlight some new things, and questions

controllers/components/kserve/kserve_controller.go Outdated Show resolved Hide resolved
controllers/components/kserve/kserve_support.go Outdated Show resolved Hide resolved
controllers/components/kserve/kserve_controller.go Outdated Show resolved Hide resolved
Copy link

codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (feature-operator-refactor@d748e02). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...atasciencecluster/datasciencecluster_controller.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@                     Coverage Diff                      @@
##             feature-operator-refactor    #1347   +/-   ##
============================================================
  Coverage                             ?   26.84%           
============================================================
  Files                                ?       60           
  Lines                                ?     4783           
  Branches                             ?        0           
============================================================
  Hits                                 ?     1284           
  Misses                               ?     3350           
  Partials                             ?      149           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@grdryn
Copy link
Member Author

grdryn commented Nov 13, 2024

/retest

@grdryn grdryn force-pushed the RHOAIENG-13179-kserve branch 6 times, most recently from df27878 to 270d2f0 Compare November 18, 2024 16:46
@grdryn grdryn changed the title WIP: initial move of kserve to new structure feat: initial move of kserve to new structure Nov 18, 2024
@grdryn
Copy link
Member Author

grdryn commented Nov 20, 2024

/retest

^the logs from the previous run look like there might have been some infra strangeness, so just going to try again rather than trying to understand further, since the tests appear to pass fine for me locally

image

@lburgazzoli
Copy link
Contributor

@grdryn beside some small points, the PR looks good ! I'd recommend to sync up with @zdtsw as she is doing somethign related to the model controller here #1338

@zdtsw
Copy link
Member

zdtsw commented Nov 21, 2024

@grdryn beside some small points, the PR looks good ! I'd recommend to sync up with @zdtsw as she is doing somethign related to the model controller here #1338

if kserve is working in this PR, i am fine to get this one out first, then i can rebase on it.

controllers/components/kserve/kserve_controller.go Outdated Show resolved Hide resolved

// clusterRoleAggregationRulePredicate is a watch predicate that can be used with
// ClusterRoles to ignore the rules field on update if aggregationRule is set.
var clusterRoleAggregationRulePredicate = predicate.Funcs{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure i understand this part: in which example case, this should be return false as not trigger reconcile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it now to reflect what Luca suggested (and also I've moved it to pkg/controller/predicates/clusterrole/clusterrole.go), so it might make more sense now?

If not: essentially, if aggregationRule is set, then even though the resource we apply has an empty ruleset, k8s will populate it with the rules from the other ClusterRoles that are selected by the aggregationRule section. If we reconcile back to an empty rule set, then it will just be flip-flopping continuously between empty and aggregated, so this predicate will return false when aggregateRule is set AND the change is only to the rules, since the rules are more managed by k8s rather than the operator.

Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make sense to me.

As a follow up improvement, we can try to remove the rules field before the resource gets applied in case it is an aggregation and the rules is empty. But it requires some small experiments and does not need to be done in this pr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in short, this predicate is to return false:
if old .spec.aggregationRule block and the new .spec.aggregationRule block are the same.
(regardless, old has .spec.rules or new has .spec.rules or if both's .spec.rules are the same/or not) ?

but, why this did not happen on "incubation"? the old modelMeshRolePredicates does not catch kserve-admin which is the only one uses aggregationRule

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but, why this did not happen on "incubation"

I think this depends on the SSA vs Merge Patch semantic so it may have happened that the object were left untouched.

@grdryn grdryn changed the title feat: initial move of kserve to new structure WIP: feat: initial move of kserve to new structure Nov 24, 2024
With the old value, opendatahub-odh-auth-provider, the value would be
set, then almost immediately be reset to the value of
{{.AuthExtensionName}} (which appears to be equivalent to
<dsci.ApplicationNamespace>-auth-provider), via the separate manifest
specified in the following file, which gets applied by the same list
of resources in the feature:

z-migrations/kserve-predictor-authorizationpolicy.patch.tmpl.yaml

This causes a complication when watching the AuthorizationPolicy,
because it causes it to continuously update, flip-flopping between the
two values.

It seems to make more sense to just set the value to the expected
value from the outset, so that's what this change attempts to do.
Copy link
Member Author

@grdryn grdryn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lburgazzoli @zdtsw thanks for your reviews & feedback! I think I've addressed everything so far, but I'm calling out a couple of additional things here that have changed since last time, to make sure you see them. Thanks!

@grdryn grdryn changed the title WIP: feat: initial move of kserve to new structure feat: initial move of kserve to new structure Nov 25, 2024
@grdryn
Copy link
Member Author

grdryn commented Nov 26, 2024

I don't currently plan on making any more changes to this pull request, unless there are more requests to change things by reviewers.

There's one open discussion at #1347 (comment), which could be considered non-blocking for merge IMHO (it's not a breaking bug, and any fix could be done async in a follow-up PR).

@lburgazzoli
Copy link
Contributor

LGDM, @zdtsw ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

6 participants