✨ Add workspace phase update logic #3183
Conversation
Force-pushed b1944b3 to 08b37fe
Force-pushed e6b7c5c to 9b951a7
@sttts @embik so this might need a discussion. In addition, it's very confusing (I found this from running this in my fork for some time already): if the mountpoint goes down due to network issues, a dead agent, you name it, the Workspace still shows as ready. The solution I proposed was to add a new phase, Unavailable, and moreover to not even serve these workspaces at the proxy layer while they are Unavailable. This allows some third-party component, or in our case a mount controller, to inject a condition of its own, and the phase follows from those changes to the workspace. So I think we should not go further than the original enhancement: https://github.com/kcp-dev/enhancements/pull/6/files#diff-716b46559ae0795860e35aebf72fce98bdf288d935756a40b917410386e10870R265. In addition, to avoid making the workspace unavailable for non-terminal conditions, only conditions with a designated prefix are taken into account.
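To make the proposed flow concrete, here is a minimal, hypothetical sketch of a reconciler that derives the phase from injected conditions. All type and field names are illustrative stand-ins, not the actual kcp API:

```go
package main

import "fmt"

// Illustrative stand-ins for the kcp types; not the real API.
type Phase string

const (
	PhaseReady       Phase = "Ready"
	PhaseUnavailable Phase = "Unavailable"
)

type Condition struct {
	Type   string
	Status bool // true when the condition is ready
}

type Workspace struct {
	Phase      Phase
	Conditions []Condition
}

// reconcilePhase demotes a Ready workspace to Unavailable when any injected
// condition (e.g. one set by a mount controller) is not ready, and promotes
// it back once all conditions are ready again. It returns true if the phase
// changed.
func reconcilePhase(ws *Workspace) bool {
	want := PhaseReady
	for _, c := range ws.Conditions {
		if !c.Status {
			want = PhaseUnavailable
			break
		}
	}
	if ws.Phase == want {
		return false
	}
	ws.Phase = want
	return true
}

func main() {
	ws := &Workspace{Phase: PhaseReady, Conditions: []Condition{{Type: "MountReady", Status: false}}}
	fmt.Println(reconcilePhase(ws), ws.Phase) // true Unavailable
}
```

A proxy layer would then simply refuse to route requests to any workspace whose phase is Unavailable.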
Force-pushed cfc160f to fce06d3
Force-pushed fce06d3 to b36adce
I think we have to distinguish between being switched to the new phase for some hard reason, like maintenance, versus soft reasons like network issues or the agent not being reachable. The former is definitely a phase; the latter is not. There are potentially many actors who might have problems reaching a mount. Think of some HA environment where one component thinks the mount is down and the other thinks it is up. If one of them can mark the mount unavailable for all components, that amplifies errors.
I think the final result of this is that the author needs to decide whether accessing the workspace should be allowed or not. So all in all, is there anything you suggest changing in this proposed implementation? If documentation, would our public docs be enough?
I think this is now left for implementors to deal with. But it's the author's ownership to make that decision.
"Unavailable" as a phase seems to cover the soft reasons too. I want to discourage an implementor to implement health checks in some controller and then set that phase indirectly through a condition. Note: a condition from a health check is fine. Maybe "Inaccessible" is better? Or leave it with Unavailable but be very clear in the docs that this is not meant for health checks. |
Force-pushed 690c6bd to 0db20f1
/retest
// When we promote this to workspace structure, we should make this check smarter and better tested.
if (cluster.String() == ws.Spec.Cluster) && (ws.Annotations[tenancyv1alpha1.ExperimentalWorkspaceMountAnnotationKey] != "" && mountObjString == ws.Annotations[tenancyv1alpha1.ExperimentalWorkspaceMountAnnotationKey]) {
	return
}
These were an "overoptimization", and it backfired: the condition and the annotation are now updated in two different reconcile cycles (so we don't try to update both in one), so once the annotation gets updated, this check prevents us from moving forward and updating the error-codes cache.
Overengineering from my side :/
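As an illustration of the failure mode only (hypothetical names, not the PR's code): a guard should be derived from all of the outputs it protects, not from the annotation alone:

```go
// Sketch only, hypothetical names: when the condition and the annotation are
// written in separate reconcile cycles, an early return keyed on the
// annotation alone skips the cycle that should refresh the error-codes cache.
// Deriving "is there work left?" from every tracked output avoids that.
func needsUpdate(annotationCurrent, errorCacheCurrent bool) bool {
	return !annotationCurrent || !errorCacheCurrent
}
```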
// This should be simplified once we promote this to workspace structure.
clusterWorkspaceMountAnnotation: map[logicalcluster.Name]map[string]string{},
shardClusterWorkspaceMountAnnotation: map[string]map[logicalcluster.Name]map[string]string{},
We have a few duplicated ones here; we could build some nicer abstraction. Most data is shard -> cluster -> workspace -> data, so some in-memory store mini-engine could help (a sketch follows below). Right now it's very hard to make changes and be sure we clean, add, and update everywhere we need to. Something to think about during insomnia nights.
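A rough sketch of that mini-engine idea, assuming plain string keys are enough for illustration (the real caches key clusters by logicalcluster.Name):

```go
package store

import "sync"

// NestedStore maps shard -> cluster -> workspace -> value behind one lock,
// so every cache shares the same add/update/cleanup logic.
type NestedStore[V any] struct {
	mu   sync.RWMutex
	data map[string]map[string]map[string]V
}

func NewNestedStore[V any]() *NestedStore[V] {
	return &NestedStore[V]{data: map[string]map[string]map[string]V{}}
}

func (s *NestedStore[V]) Set(shard, cluster, ws string, v V) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[shard] == nil {
		s.data[shard] = map[string]map[string]V{}
	}
	if s.data[shard][cluster] == nil {
		s.data[shard][cluster] = map[string]V{}
	}
	s.data[shard][cluster][ws] = v
}

func (s *NestedStore[V]) Get(shard, cluster, ws string) (V, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[shard][cluster][ws]
	return v, ok
}

// Delete removes an entry and prunes now-empty parent maps, so stale shards
// and clusters do not accumulate.
func (s *NestedStore[V]) Delete(shard, cluster, ws string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if m := s.data[shard][cluster]; m != nil {
		delete(m, ws)
		if len(m) == 0 {
			delete(s.data[shard], cluster)
		}
		if len(s.data[shard]) == 0 {
			delete(s.data, shard)
		}
	}
}
```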
/retest
Force-pushed 480cd96 to 8c40ccb
/retest
This looks fine for now to me. I agree with @sttts that we should clearly document expectations for implementors once we have stabilised this a bit and are ready for people to integrate with it.
LGTM label has been added. Git tree hash: cea8ad2dfbed0a3dd9f3ee892695151f0f888cc5
// LogicalClusterPhaseUnavailable phase is used to indicate that the logical cluster is unavailable to be used.
// It will not be served via front-proxy when in this state.
// Possible state transitions are from Ready to Unavailable and from Unavailable to Ready.
LogicalClusterPhaseUnavailable LogicalClusterPhaseType = "Unavailable"
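A minimal sketch of a guard enforcing the transition rule stated in that comment; it assumes an existing LogicalClusterPhaseReady constant in the same package and is not part of the PR:

```go
// Sketch: Unavailable may only be entered from Ready and only left back to
// Ready. Assumes LogicalClusterPhaseReady exists alongside the new constant.
func validUnavailableTransition(from, to LogicalClusterPhaseType) bool {
	switch {
	case from == to:
		return true // no-op transition
	case from == LogicalClusterPhaseReady && to == LogicalClusterPhaseUnavailable:
		return true
	case from == LogicalClusterPhaseUnavailable && to == LogicalClusterPhaseReady:
		return true
	default:
		return false
	}
}
```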
I still think we have to be very clear that this is NOT a phase for temporary unavailability backed by e.g. some probing.
Updated
(Resolved review thread on pkg/reconciler/tenancy/workspacemounts/workspacemounts_reconcile_updater.go)
if current.Status == v1.ConditionTrue {
	return reconcileStatusContinue, nil
}
conditions.MarkTrue(workspace, tenancyv1alpha1.MountConditionReady)
isn't this and the if above the same?
(Resolved review thread on pkg/reconciler/tenancy/workspacemounts/workspacemounts_reconcile_updater.go)
I think we might need to move this to WorkspaceType before doing that, but I agree.
// It returns true if the phase was changed, false otherwise.
-func updateTerminalConditionsAndPhase(workspace *tenancyv1alpha1.Workspace) bool {
+func terminalConditionPhase(workspace *tenancyv1alpha1.Workspace) bool {
Not really what I meant. Either call it terminalConditionPhase and make it side-effect-free (i.e. returning the phase string), or updateTerminalConditionPhase (without the "And").
Taking the second one :) Since the reconcilers already play a lot with pointers and edit existing objects, this just felt natural. (A sketch of the two shapes follows below.)
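For illustration, the two shapes discussed above, reusing the toy types from the earlier sketch; the bodies are simplified stand-ins, not the PR's actual logic:

```go
// Side-effect-free shape: compute the desired phase and let the caller
// apply it.
func terminalConditionPhase(ws *Workspace) Phase {
	for _, c := range ws.Conditions {
		if !c.Status {
			return PhaseUnavailable
		}
	}
	return PhaseReady
}

// Mutating shape (the one taken here): update the workspace in place and
// report whether the phase changed.
func updateTerminalConditionPhase(ws *Workspace) bool {
	want := terminalConditionPhase(ws)
	if ws.Phase == want {
		return false
	}
	ws.Phase = want
	return true
}
```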
Force-pushed 7a42025 to 141f375
/lgtm
LGTM label has been added. Git tree hash: 14964f76c011b2238bdad778ec1cf3c71441424e
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: sttts. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Summary
Add a reconciler to switch the workspace phase to "Unavailable" if any workspace conditions are not ready. This way, if some of the auxiliary conditions are Unavailable, the phase will change.
Adds a mounts controller to propagate status to the workspace when the mount is not ready. This way we don't use the workspace at all if the mount is not responding.
Adds code to the index/proxy to not serve workspaces which are in the Unavailable state, so we avoid using workspaces with erroneous backends.
Partially based on kcp-dev/enhancements#6.
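A rough sketch of the proxy-side behaviour described in the summary, again with the illustrative types from the first sketch (the real lookup lives in the kcp index):

```go
// Sketch: a lookup for a workspace whose phase is Unavailable behaves as if
// the workspace does not exist, so requests never reach a broken mount
// backend. shardURL is a hypothetical parameter for illustration.
func lookupShardURL(ws *Workspace, shardURL string) (string, bool) {
	if ws.Phase == PhaseUnavailable {
		return "", false // not served via front-proxy while unavailable
	}
	return shardURL, true
}
```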
Related issue(s)
Fixes #
Release Notes