cluster_config: init cluster variables on startup #1059
Conversation
Skipping CI for Draft Pull Request.
Force-pushed from 27b2ff0 to 506570d
Force-pushed from 506570d to 996c598
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from 996c598 to c7f423f
/cc @lburgazzoli
Force-pushed from c7f423f to 20a6860
Force-pushed from fad47e8 to 1e7d1fe
/test opendatahub-operator-e2e
@@ -22,13 +23,68 @@ import (
// +kubebuilder:rbac:groups="config.openshift.io",resources=ingresses,verbs=get

type Platform string
Unrelated change :( But I will not update it until other changes are needed, due to the test instability.
	return err
}

printClusterConfig(log)
nit: Do we need a separate function for adding a log? Or are we planning to use it more widely later?
I was changing the format in the meantime and did not want to pollute the main function with the details. But if you like, I can inline it.
LGTM
func Init(ctx context.Context, cli client.Client, log logr.Logger) error {
	var err error

	clusterConfig.Namespace, err = getOperatorNamespace()
Not related to this PR, but we could end up with "" as clusterConfig.Namespace's value, which is the main cause of err != nil. Should this not be fatal => os.Exit(1)?
In configureAlertManager and in watchMonitoringSecretResource the error causes a return. In other places where the value is compared with "", it does not change the flow. I did not revisit the callees' logic; it may make sense.

should this not be fatal => os.Exit(1) ?

It's not fatal. Do you mean it should be?
I am thinking it should be fatal, because of this comment: // not fatal, fallback to ""
i am thinking it should be fatal. for this comment
// not fatal, fallback to ""

It changes behaviour. I mean, we continued to work if there was no environment variable or we could not read the namespace from the filesystem, but if it's fatal, we don't. On the other hand, with the environment variable the corner cases (like tests) are covered.
I understand. The order is:
1. env
2. check /var/...
3. log as error but continue

But the consequence of (3) will lead to a problem later, so the question is whether it should os.Exit(1) directly in the main process or not. E.g. I can understand someone building the whole product without having (1) set, but the only reason (2) can be skipped is something really wrong with the pod.
But this is a question out of the scope of your PR. The "" has been there; we do not need to change anything here.
	"Release", clusterConfig.Release)
}

func GetOperatorNamespace() (string, error) {
So:
- getOperatorNamespace() is the call with the real logic and returns the retrieved value, while GetOperatorNamespace(...) returns clusterConfig.Namespace.
- Similarly, getPlatform() and getRelease() call the logic and return the value, while GetPlatform(...) and GetRelease(...) return the field's value.
- GetRelease(...) indirectly relies on getPlatform().
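The naming convention summarized above — unexported detect-style functions do the work once, exported getters only read the cached struct — could look roughly like this stdlib-only sketch. The `cfg` type, the detect* bodies, and the literal values are made up for illustration; only the split between detector and accessor mirrors the PR:

```go
package main

import (
	"errors"
	"fmt"
)

type cfg struct {
	Namespace string
	Platform  string
	Release   string
}

var clusterConfig cfg

// Unexported detect* functions stand in for the one-time lookups done
// by Init(); in the real code these would talk to the cluster.
func detectPlatform() string { return "OpenDataHub" }
func detectRelease() string  { return "2.19.0" }

// Exported getters only read the cache. GetOperatorNamespace keeps an
// error return because the DSCInitialization reconciler branches on it.
func GetPlatform() string { return clusterConfig.Platform }
func GetRelease() string  { return clusterConfig.Release }

func GetOperatorNamespace() (string, error) {
	if clusterConfig.Namespace == "" {
		return "", errors.New("operator namespace not initialized")
	}
	return clusterConfig.Namespace, nil
}

func main() {
	clusterConfig = cfg{
		Namespace: "opendatahub-operator-system",
		Platform:  detectPlatform(),
		Release:   detectRelease(),
	}
	ns, err := GetOperatorNamespace()
	fmt.Println(ns, err, GetPlatform(), GetRelease())
}
```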
LGTM
Force-pushed from 1e7d1fe to a579e36
Force-pushed from 8994b24 to 13c401a
Can I get a review before it needs a rebase again? :) Added a change preserving rootCtx till mgr.Start.
main.go (Outdated)

@@ -127,7 +128,8 @@ func main() { //nolint:funlen,maintidx
	ctrl.SetLogger(logger.ConfigLoggers(logmode))

	// root context
-	ctx := ctrl.SetupSignalHandler()
+	rootCtx := ctrl.SetupSignalHandler()
+	ctx := logf.IntoContext(rootCtx, setupLog)
Not sure I understand the reason for this change. The old code used rootCtx everywhere; the new code passes the new ctx, which is context.WithValue(rootCtx, contextKey{}, setupLog), everywhere instead, except to start the manager.
The new code should pass the original context without setupLog, since I found it a bit incorrect to pass log.WithName("setup") further. Did I make a mistake?
So at the beginning we do setupLog = ctrl.Log.WithName("setup"), and then later, e.g.:

	if err = (&dscictrl.DSCInitializationReconciler{
		Client:                mgr.GetClient(),
		Scheme:                mgr.GetScheme(),
		Log:                   logger.LogWithLevel(ctrl.Log.WithName(operatorName).WithName("controllers").WithName("DSCInitialization"), logmode),
		Recorder:              mgr.GetEventRecorderFor("dscinitialization-controller"),
		ApplicationsNamespace: dscApplicationsNamespace,
	}).SetupWithManager(ctx, mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "DSCInitiatlization")
		os.Exit(1)
	}

we call setupLog when there is a problem starting a controller. The context is not used for logging "unable to create controller", "controller", "DSCInitiatlization". Or?
You mean ctx in .SetupWithManager(ctx, mgr)? It is not used at all, IIUC; I planned a patch to remove it.
Yeah, but then what is "the new code should pass the original context without setupLog"? Because with this change, except for mgr.Start(rootCtx), which keeps using the root context (regardless of the variable's name now), everywhere else uses the new one (logf.IntoContext(rootCtx, setupLog)). So I think e.g. upgrade.CreateDefaultDSCI(ctx, ...) starts to have "setup" with this PR, whereas it did not before the PR? Which is "the new code should pass the original context with setupLog" 🤔
Ignore my last comment :) Let me think
So upgrade.CreateDefaultDSCI(ctx, ...), as executed by the manager after Start(), will have a ctx without the "setup" logger. Basically, it will be what it uses now: getting the logger directly from the controller.
Actually, the controller framework passes its own logger with the context, so there is no need to save rootCtx.
@zdtsw I reverted that change. Any more concerns?
Force-pushed from bb2c5c3 to e604740
Just a bit more clarity

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

Cluster configuration is supposed to be the same during the operator pod lifetime. There is no point detecting it on every reconcile cycle, keeping in mind that it can cause many API requests. Do the initialization on startup.

Keep GetOperatorNamespace() returning an error since it defines some logic in the DSCInitialization reconciler. An automatically called init() won't work here due to the need to check the error of the initialization.

Wrap the logger into a context and use it in Init() like controller-runtime does [1][2]. Save the root context without the logger for mgr.Start, since the "setup" logger does not fit the manager's normal work.

Leave GetDomain() as is, since it uses the OpenshiftIngress configuration which is created when DSCInitialization is instantiated.

Log the cluster configuration on success from Init, and remove platform logging from main.

[1] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L297
[2] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L111

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>

GetRelease's return values are defined at startup and the error is checked in main, so there is no point returning an error anymore.

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Force-pushed from e604740 to f8d6e88
/retest-required
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zdtsw

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
* cluster_config: init cluster variables on startup (#1059)
  * cluster_config: move type definitions to the beginning of the file
    Just a bit more clarity
    Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
  * cluster_config: init cluster variables on startup
    Cluster configuration is supposed to be the same during the operator pod lifetime. There is no point detecting it on every reconcile cycle, keeping in mind that it can cause many API requests. Do the initialization on startup. Keep GetOperatorNamespace() returning an error since it defines some logic in the DSCInitialization reconciler. An automatically called init() won't work here due to the need to check the error of the initialization. Wrap the logger into a context and use it in Init() like controller-runtime does [1][2]. Save the root context without the logger for mgr.Start, since the "setup" logger does not fit the manager's normal work. Leave GetDomain() as is, since it uses the OpenshiftIngress configuration which is created when DSCInitialization is instantiated. Log the cluster configuration on success from Init, and remove platform logging from main.
    [1] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L297
    [2] https://github.com/kubernetes-sigs/controller-runtime/blob/38546806f2faf5973e3321a7bd5bb3afdbb5767d/pkg/internal/controller/controller.go#L111
    Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
  * cluster: do not return error from GetRelease
    GetRelease's return values are defined at startup and the error is checked in main, so there is no point returning an error anymore.
    Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
* components: move params.env image updating to Init stage (#1191)
  * components, main: add Component Init method
    Add an Init() method to the Component interface and call it from main on startup. It will be used to move startup-time code out of ReconcileComponent (like adjusting params.env).
    Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
  * components: move params.env image updating to Init stage
    Jira: https://issues.redhat.com/browse/RHOAIENG-11592
    Image names in the environment are not supposed to change during the runtime of the operator, so it makes sense to update them only on startup. If manifests are overridden by DevFlags, the DevFlags' version will be used. The change is straightforward for most of the components, where only images are updated and params.env is located in the kustomize root directory, but some components (dashboard, ray, codeflare, modelregistry) also update some extra parameters. For them, only the image part is moved to Init, since the other updates require runtime DSCI information. The patch also changes the logic for ray, codeflare, and modelregistry in this regard, to update non-image parameters regardless of DevFlags, like it was changed in dashboard recently.
    The DevFlags functionality brings some concerns:
    - For most components the code is written in such a way that as soon as DevFlags are supplied, the global path variables are changed and never reverted back to the defaults. For some (dashboard, trustyai) there is a (still global) OverridePath/entryPath pair and manifests are reverted to the default, BUT there is no logic for the transition.
    - codeflare: when manifests are overridden, the namespace param is updated in the hardcoded (stock) path; this logic is preserved.
    Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
* update: remove webhook service from bundle (#1275)
  - We do not need it in the bundle; the CSV auto-generates one during installation.
  - If we install the operator via OLM, the webhook service still gets used from config/webhook/service.yaml.
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* update: add validation on application and monitoring namespace in DSCI (#1263)
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* logger: add zap command line switches (#1280)
  Allow tuning the zap backend preconfigured by --log-mode with the standard zap command line switches from controller-runtime (zap-devel, zap-encoder, zap-log-level, zap-stacktrace-level, zap-time-encoding) [1]. This brings flexibility in logger setup, first of all for development environments. The patch does not change the default logger setup and does not change the DSCI override functionality.
  [1] https://sdk.operatorframework.io/docs/building-operators/golang/references/logging
  Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
* Modify Unit Tests GitHub Actions workflow to get code coverage test reports (#1279)
  * Create codecov.yml
  * Added to run test coverage also on PRs
  * Removing trailing ]
  * Update .github/workflows/codecov.yml (Co-authored-by: Wen Zhou <wenzhou@redhat.com>)
  * Removed go mod install dependency
  * Consolidated codecov workflow into unit tests workflow
* webhook: move initialization inside of the module (#1284)
  Add a webhook.Init() function and hide webhook setup inside the module. It will make it possible to replace Init with a NOP (no operation) via conditional compilation for a no-webhook build.
  Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
* feat: pass platform from env variable or fall back to use old logic (#1252)
  - Introduce new env var ODH_PLATFORM_TYPE, value set during build time; if the value does not match, fall back to detecting managed, then self-managed.
  - Introduce new env var OPERATOR_RELEASE_VERSION, value also set during build time; if the value is empty, fall back to the old way of reading the version from the CSV, or use 0.0.0.
  - Add support from the makefile; use envsubst to replace the version.
  * update: remove release version in the change
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* fix: update release version in DSCI and DSC .status for upgrade case (#1287)
  - DSCI: if the current version does not match, update it.
  - DSC: update it in both the reconcile pass and fail cases.
  Signed-off-by: Wen Zhou <wenzhou@redhat.com>
* Update version to 2.19.0 (#1291)

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Co-authored-by: Wen Zhou <wenzhou@redhat.com>
Co-authored-by: Adrián Sanz Gómiz <100594859+asanzgom@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: VaishnaviHire <17230536+VaishnaviHire@users.noreply.github.com>
Cluster configuration is supposed to be the same during the operator pod lifetime. There is no point detecting it on every reconcile cycle, keeping in mind that it can cause many API requests.
Do the initialization on startup. Keep GetOperatorNamespace() returning an error since it defines some logic in the DSCInitialization reconciler.
An automatically called init() won't work here due to the need to check the error of the initialization.
Leave GetDomain() as is, since it uses the OpenshiftIngress configuration which is created when DSCInitialization is instantiated.
Log the cluster configuration on success.