
cluster_config: init cluster variables on startup #1059

Merged: 3 commits, Oct 3, 2024

Conversation

ykaliuta (Contributor) commented:

Cluster configuration is supposed to stay the same during the operator
pod's lifetime. There is no point in detecting it on every reconcile
cycle, keeping in mind that it can cause many API requests.

Do the initialization on startup. Keep GetOperatorNamespace()
returning an error, since it drives some logic in the DSCInitialization
reconciler.

An automatically called init() won't work here, due to the need to
check the error of the initialization.

Leave GetDomain() as is, since it uses the OpenshiftIngress configuration,
which is created when DSCInitialization is instantiated.

Log cluster configuration on success.

Many thanks for submitting your Pull Request ❤️!

Please make sure that your PR meets the following requirements:

  • You have read the contributors guide
  • Pull Request contains a description of the issue
  • Pull Request contains a link to the JIRA issue
  • Pull Request contains a link to any dependent or related Pull Request

Description

JIRA issue:

How Has This Been Tested?

Screenshot or short clip:

Merge criteria:

  • Commit messages are meaningful.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.


openshift-ci bot commented Jun 14, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-merge-robot (Contributor) commented:

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ykaliuta ykaliuta changed the title RFC: cluster_config: init cluster variables on startup cluster_config: init cluster variables on startup Sep 12, 2024
@ykaliuta ykaliuta marked this pull request as ready for review September 12, 2024 14:52
ykaliuta (Author) commented:

/cc @lburgazzoli

ykaliuta (Author) commented:

/test opendatahub-operator-e2e

@@ -22,13 +23,68 @@ import (
// +kubebuilder:rbac:groups="config.openshift.io",resources=ingresses,verbs=get

type Platform string

ykaliuta (Author) commented on the diff:

Unrelated change :( But I will not update it until other changes are needed, due to the test instability.

return err
}

printClusterConfig(log)
Member commented:

nit: Do we need a separate function for adding a log? Or are we planning to use it more widely later?

ykaliuta (Author) replied:

I was changing the format in the meantime and did not want to pollute the main function with the details. But if you like, I can inline it.

lburgazzoli (Contributor) commented:

LGTM

func Init(ctx context.Context, cli client.Client, log logr.Logger) error {
var err error

clusterConfig.Namespace, err = getOperatorNamespace()
Member commented:

Not related to this PR, but we could end up with "" as clusterConfig.Namespace's value,
which is the main cause of err != nil.
Should this not be fatal => os.Exit(1)?

ykaliuta (Author) replied:

In configureAlertManager and in watchMonitoringSecretResource the error causes a return. In other places, where the value is compared with "", it does not change the flow. I did not revisit the callees' logic; it may make sense to.

should this not be fatal => os.Exit(1)?

It's not fatal. Do you mean it should be?

Member replied:

I am thinking it should be fatal,
per this comment: // not fatal, fallback to ""

ykaliuta (Author) replied:

i am thinking it should be fatal. for this comments // not fatal, fallback to ""

It changes the behaviour. I mean, we continued to work if there was no environment variable or the namespace could not be read from the filesystem; but if it's fatal, we don't.

On the other hand, with the environment variable the corner cases (like tests) are covered.

Member replied:

I understand.

  1. env
  2. check /var/...
  3. log as error but continue

But the consequence of (3) will lead to a problem later, so the question is whether it should os.Exit(1) directly in the main process or not. E.g. I can understand someone building the whole product without having (1) set, but the only reason to skip (2) is something really wrong with the pod.

But this is a question out of the scope of your PR.
The "" has been there; we do not need to change anything here.

"Release", clusterConfig.Release)
}

func GetOperatorNamespace() (string, error) {
Member commented:

So:
getOperatorNamespace() runs the real logic and returns the retrieved value;
GetOperatorNamespace(...) returns clusterConfig.Namespace.

Similar to:
getPlatform() and getRelease() run the logic and return the value;
GetPlatform(...) and GetRelease(...) return the field's value.

GetRelease(...) indirectly relies on getPlatform().

zdtsw (Member) left a review:

LGTM

@ykaliuta force-pushed the cluster-config-init branch 4 times, most recently from 8994b24 to 13c401a on September 25, 2024 21:04
@ykaliuta
Copy link
Contributor Author

Can I get a review before it needs a rebase again? :) Added a change preserving rootCtx until mgr.Start.

main.go Outdated
@@ -127,7 +128,8 @@ func main() { //nolint:funlen,maintidx
ctrl.SetLogger(logger.ConfigLoggers(logmode))

// root context
ctx := ctrl.SetupSignalHandler()
rootCtx := ctrl.SetupSignalHandler()
ctx := logf.IntoContext(rootCtx, setupLog)
Member commented:

Not sure I understand the reason for this change:
the old code used rootCtx everywhere;
the new code passes the new ctx, which is context.WithValue(rootCtx, contextKey{}, setupLog), everywhere except to start the manager.

ykaliuta (Author) replied:

The new code should pass the original context without setupLog, since I found it a bit incorrect to pass log.WithName("setup") further. Did I make a mistake?

Member replied:

So at the beginning we do setupLog = ctrl.Log.WithName("setup"),
then later, e.g.:

if err = (&dscictrl.DSCInitializationReconciler{
		Client:                mgr.GetClient(),
		Scheme:                mgr.GetScheme(),
		Log:                   logger.LogWithLevel(ctrl.Log.WithName(operatorName).WithName("controllers").WithName("DSCInitialization"), logmode),
		Recorder:              mgr.GetEventRecorderFor("dscinitialization-controller"),
		ApplicationsNamespace: dscApplicationsNamespace,
	}).SetupWithManager(ctx, mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "DSCInitiatlization")
		os.Exit(1)
	}

it calls setupLog when there is a problem starting the controller.
The context is not used for logging "unable to create controller", "controller", "DSCInitiatlization".
Or?

ykaliuta (Author) replied:

You mean ctx in .SetupWithManager(ctx, mgr)? It is not used at all, IIUC; I planned a patch to remove it.

Member replied:

Yeah, but what does "the new code should pass the original context without setuplog" mean?

Because with this change, except for mgr.Start(rootCtx), which keeps using the root context (regardless of the variable's name now), everywhere else uses the new one (logf.IntoContext(rootCtx, setupLog)). So I think that e.g. upgrade.CreateDefaultDSCI(ctx,...) starts to have "setup" with this PR, whereas it did not before the PR? Which is "the new code should pass the original context with setuplog" 🤔

ykaliuta (Author) replied:

Ignore my last comment :) Let me think

ykaliuta (Author) added:

So upgrade.CreateDefaultDSCI(ctx,...), as executed by the manager after Start(), will have a ctx without the "setup" logger; basically what it uses now, getting the logger directly from the controller.

ykaliuta (Author) added:

Actually, the controller framework passes its own logger with the context, so there is no need to save rootCtx.

ykaliuta (Author) added:

@zdtsw I reverted that change. Any more concerns?

cluster_config: move type definitions to the beginning of the file

Just a bit more clarity

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Cluster configuration is supposed to stay the same during the operator
pod's lifetime. There is no point in detecting it on every reconcile
cycle, keeping in mind that it can cause many API requests.

Do the initialization on startup. Keep GetOperatorNamespace()
returning an error, since it drives some logic in the DSCInitialization
reconciler.

An automatically called init() won't work here, due to the need to
check the error of the initialization.

Wrap the logger into the context and use it in Init() like
controller-runtime does [1][2].

Save the root context without the logger for mgr.Start, since the
"setup" logger does not fit the manager's normal work.

Leave GetDomain() as is, since it uses the OpenshiftIngress configuration,
which is created when DSCInitialization is instantiated.

Log the cluster configuration on success from Init, and remove platform
logging from main.

[1] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L297
[2] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L111

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
GetRelease's return values are defined at startup and the error is
checked in main, so there is no point in returning an error anymore.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
@ykaliuta
Copy link
Contributor Author

ykaliuta commented Oct 3, 2024

/retest-required

@openshift-ci openshift-ci bot added the lgtm label Oct 3, 2024

openshift-ci bot commented Oct 3, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Oct 3, 2024
@zdtsw zdtsw merged commit 8bebb1a into opendatahub-io:incubation Oct 3, 2024
7 of 8 checks passed
VaishnaviHire added a commit that referenced this pull request Oct 9, 2024
* cluster_config: init cluster variables on startup (#1059)

* cluster_config: move type definitions to the beginning of the file

Just a bit more clarity

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

* cluster_config: init cluster variables on startup

Cluster configuration is supposed to stay the same during the operator pod's
lifetime. There is no point in detecting it on every reconcile cycle,
keeping in mind that it can cause many API requests.

Do the initialization on startup. Keep GetOperatorNamespace()
returning an error, since it drives some logic in the DSCInitialization
reconciler.

An automatically called init() won't work here, due to the need to
check the error of the initialization.

Wrap the logger into the context and use it in Init() like
controller-runtime does [1][2].

Save the root context without the logger for mgr.Start, since the
"setup" logger does not fit the manager's normal work.

Leave GetDomain() as is, since it uses the OpenshiftIngress configuration,
which is created when DSCInitialization is instantiated.

Log the cluster configuration on success from Init, and remove platform
logging from main.

[1] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L297
[2] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L111

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

* cluster: do not return error from GetRelease

GetRelease's return values are defined at startup and the error is
checked in main, so there is no point in returning an error anymore.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

---------

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

* components: move params.env image updating to Init stage (#1191)

* components, main: add Component Init method

Add Init() method to the Component interface and call it from main on
startup.

Will be used to move startup-time code from ReconcileComponent
(like adjusting params.env).

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

* components: move params.env image updating to Init stage

Jira: https://issues.redhat.com/browse/RHOAIENG-11592

Image names in the environment are not supposed to change during the
operator's runtime, so it makes sense to update them only on startup.

If manifests are overridden by DevFlags, the DevFlags' version will
be used.

The change is straightforward for most of the components, where only
images are updated and params.env is located in the kustomize root
directory, but some components (dashboard, ray, codeflare,
modelregistry) also update some extra parameters. For them, only the
image part is moved to Init, since the other updates require runtime
DSCI information.

The patch also changes the logic for ray, codeflare, and modelregistry
in this regard to update non-image parameters regardless of DevFlags,
as was recently changed in dashboard.

The DevFlags functionality brings some concerns:

- For most components the code is written in such a way that as soon as
DevFlags are supplied, the global path variables are changed and never
reverted back to the defaults. For some (dashboard, trustyai) there
is a (still global) OverridePath/entryPath pair and manifests are reverted
to the default, BUT there is no transition logic.

- codeflare: when manifests are overridden, the namespace param is
updated in the hardcoded (stock) path.

This logic is preserved.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

---------

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

* update: remove webhook service from bundle (#1275)

- we do not need it in the bundle; CSV auto-generates one during installation
- if we install the operator via OLM, the webhook service still gets used from config/webhook/service.yaml

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: add validation on application and monitoring namespace in DSCI (#1263)

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* logger: add zap command line switches (#1280)

Allow tuning the zap backend preconfigured by --log-mode with the
standard zap command line switches from controller-runtime (zap-devel,
zap-encoder, zap-log-level, zap-stacktrace-level,
zap-time-encoding) [1].

This brings flexibility to the logger setup, first of all for
development environments.

The patch does not change default logger setup and does not change
DSCI override functionality.

[1] https://sdk.operatorframework.io/docs/building-operators/golang/references/logging

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
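The commit above wires controller-runtime's stock zap flag set into the operator's flag parsing. The standard wiring from the kubebuilder scaffolding looks roughly like this fragment (a sketch, not the operator's exact main.go):

```go
// Fragment of main(); assumes the usual controller-runtime imports:
//   "flag"
//   ctrl "sigs.k8s.io/controller-runtime"
//   "sigs.k8s.io/controller-runtime/pkg/log/zap"
opts := zap.Options{}
// BindFlags registers --zap-devel, --zap-encoder, --zap-log-level,
// --zap-stacktrace-level and --zap-time-encoding on the flag set.
opts.BindFlags(flag.CommandLine)
flag.Parse()
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
```

Because the options are applied only when the corresponding flags are passed, the default --log-mode setup and the DSCI override keep working unchanged, as the commit message states.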

* Modify Unit Tests GitHub Actions workflow to get code coverage test reports (#1279)

* Create codecov.yml

* Added to run test coverage also on PRs

* Removing trailing ]

* Update .github/workflows/codecov.yml

Co-authored-by: Wen Zhou <wenzhou@redhat.com>

* Removed go mod install dependency

* Consolidated codecov workflow into unit tests workflow

---------

Co-authored-by: Wen Zhou <wenzhou@redhat.com>

* webhook: move initialization inside of the module (#1284)

Add a webhook.Init() function and hide the webhook setup inside the
module. This makes it possible to replace Init with a NOP (no
operation) via conditional compilation for a no-webhook build.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
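The conditional compilation mentioned above is typically done with Go build constraints: two files provide the same Init symbol and a build tag selects one. A sketch under assumed file, package, and tag names (the operator's actual tag may differ):

```go
// File webhook/init_enabled.go — compiled by default:
//go:build !nowebhook

package webhook

import ctrl "sigs.k8s.io/controller-runtime"

// Init registers the webhooks with the manager.
func Init(mgr ctrl.Manager) error {
	// ... real webhook setup ...
	return nil
}

// File webhook/init_disabled.go — compiled with `go build -tags nowebhook`:
//go:build nowebhook

package webhook

import ctrl "sigs.k8s.io/controller-runtime"

// Init is a no-op for builds without webhooks.
func Init(mgr ctrl.Manager) error { return nil }
```

Callers in main stay identical in both builds; only the linked Init body changes.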

* feat: pass platform from env variable or fall back to use old logic (#1252)

* feat: pass platform from env variables or fall back to use old logic

- introduce a new env var ODH_PLATFORM_TYPE, with the value set at build time
  - if the value does not match a known platform, fall back to detecting managed, then self-managed
- introduce a new env var OPERATOR_RELEASE_VERSION, with the value also set at build time
  - if the value is empty, fall back to the old way of reading the version from the CSV, or use 0.0.0
- add support in the makefile
  - use envsubst to replace the version

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* update: remove release version in the change

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* fix: update release version in DSCI and DSC .status for upgrade case (#1287)

- DSCI: if the current version does not match, update it
- DSC: update it in both the reconcile pass and fail cases

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

* Update version to 2.19.0 (#1291)

Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com>

---------

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Adrián Sanz Gómiz <100594859+asanzgom@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com>
Projects
Status: Done

5 participants