-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added the logic for quorum loss scenario #382
Conversation
@abdasgupta Labels area/todo, kind/todo do not exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could only review one file. Will push further review comments tomorrow.
type ClusterMgmtController struct { | ||
client.Client | ||
logger logr.Logger | ||
ImageVector imagevector.ImageVector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like ImageVector
is not used anywhere. Do you plan to use it later?
} | ||
|
||
// SetupWithManager sets up manager with a new controller and cmc as the reconcile.Reconciler | ||
func (cmc *ClusterMgmtController) SetupWithManager(mgr ctrl.Manager, workers int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method not used anywhere. Should ideally be called to register the controller with the manager
} | ||
|
||
// NewClusterMgmtController creates a new ClusterMgmtController object | ||
func NewClusterMgmtController(mgr manager.Manager) *ClusterMgmtController { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function not used anywhere currently
type ClusterMgmtController struct { | ||
client.Client | ||
logger logr.Logger | ||
ImageVector imagevector.ImageVector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImageVector
is not used anywhere. Do you intend to use it later?
// SetupWithManager sets up manager with a new controller and cmc as the reconcile.Reconciler | ||
func (cmc *ClusterMgmtController) SetupWithManager(mgr ctrl.Manager, workers int) error { | ||
|
||
ctrl, err := controller.New(clusterMgmtControllerName, mgr, controller.Options{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor:
There is a more concise way to implement it. Then you do not have to explicitly handle the error returned from controller.New
return ctrl.NewControllerManagedBy(mgr).
WithOptions(controller.Options{
MaxConcurrentReconciles: workers,
}).Watches(
&source.Kind{Type: &coordinationv1.Lease{}},
&handler.EnqueueRequestForOwner{OwnerType: &druidv1alpha1.Etcd{}, IsController: true},
builder.WithPredicates(druidpredicates.IsMemberLease()),
).Complete(cmc)
}, fmt.Errorf("cound not scale up statefulset to replica number : %v", err) | ||
} | ||
|
||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the need of continue
here?
|
||
logger := cmc.logger.WithValues("etcd", kutil.Key(etcd.Namespace, etcd.Name).String()) | ||
|
||
// run a loop every 5 minutes that will monitor the cluster health and take action if members in the etcd cluster are down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is written that the loop runs every 5 mins i could not see the delay between 2 loop runs. If there is no error then this loop can run forever and then there is no chance of processing of the next reconcile request. Maybe i miss something. Please check.
}, fmt.Errorf("cound not fetch statefulset: %v", err) | ||
} | ||
|
||
if _, err := controllerutils.GetAndCreateOrStrategicMergePatch(ctx, cmc.Client, sts, func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally these steps of scaling down, deletion of PVC and scaling up to 1 should in separate functions and one can use flow which is also used in gardener for this.
} | ||
|
||
func getMatchingLabels(sts *appsv1.StatefulSet) map[string]string { | ||
labels := make(map[string]string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use make with initial size. Here you have a fixed size of 2.
func getMatchingLabels(sts *appsv1.StatefulSet) map[string]string { | ||
labels := make(map[string]string) | ||
|
||
labels["name"] = sts.Labels["name"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will label name
and instance
be always present in sts
? If not then you should check for existence of key before adding to the new map.
Suggestion:
func getMatchingLabels(sts *appsv1.StatefulSet) map[string]string {
const nameLabelKey = "name"
const instanceLabelKey = "instance"
labels := make(map[string]string, 2)
if v, ok := sts.Labels[nameLabelKey]; ok {
labels[nameLabelKey] = v
}
if v, ok := sts.Labels[instanceLabelKey]; ok {
labels[instanceLabelKey] = v
}
return labels
}
if err := cmc.DeleteAllOf(ctx, &corev1.PersistentVolumeClaim{}, | ||
client.InNamespace(sts.GetNamespace()), | ||
client.MatchingLabels(getMatchingLabels(sts))); err != nil { | ||
return ctrl.Result{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an error during deletion of PVC, then the request will be re-queued after 10s, after 10 seconds you repeat the above step of setting the replicas to 0. This can be avoided as you already have queries the KAPI to get the sts. If the replicas is already 0 then the above step can be skipped.
} | ||
|
||
// scale up the statefulset to ETCD replicas | ||
if _, err := controllerutils.GetAndCreateOrStrategicMergePatch(ctx, cmc.Client, sts, func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous step the scale up is done to 1 replica and immediately after that the scale up is again attempted to be equal to the original etcd replicas. Assuming that the above call only changes the spec in etcd and returns and does not wait for the scale up to complete. Is this intended or should you wait for the first replica to be healthy before scaling up from 1-3?
@abdasgupta You need rebase this pull request with latest master branch. Please check. |
957ab98
to
b127876
Compare
b127876
to
b1fde3a
Compare
b1fde3a
to
1ccf4dd
Compare
1ccf4dd
to
33010f3
Compare
Added a flag in ETCD druid that will not allow Druid to apply quorum-loss annotation in case of quorum-loss. The flag needs to be set true if somebody wants to allow Druid to handle quorum loss automatically |
@@ -108,6 +122,37 @@ func (ec *EtcdCustodian) Reconcile(ctx context.Context, req ctrl.Request) (ctrl. | |||
return ctrl.Result{}, err | |||
} | |||
|
|||
conLength := len(etcd.Status.Conditions) | |||
if conLength > 0 && etcd.Status.Conditions[conLength-1].Reason == "QuorumLost" && etcd.Spec.Replicas > 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please iterate through the condition list and find the condition with type Ready
before using it in the if
condition?
Assuming it to be the last condition in the array might be okay now, but future changes might cause this to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should consider any earlier conditions from the list. It might be very disastrous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mean considering any other condition. What I meant is that the Ready
condition that gives Reason == "QuorumLost"
might not always be at conLength-1
and it might be good if possible to parse through all the conditions and make sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose, there is a quorum loss. Action is taken on it and then there is some other condition appeared. If we are parsing the whole list every time then we may pick old quorum loss case which is already taken care of. Is it not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list would have the same conditions. The reasons would change depending on the state of the cluster.
When we recover from quorum loss for instance, the condition would be updated and hence picking up an old quorum loss case will not happen imo
conLength := len(etcd.Status.Conditions) | ||
if conLength > 0 && etcd.Status.Conditions[conLength-1].Reason == "QuorumLost" && etcd.Spec.Replicas > 1 { | ||
logger.Info("Quorum loss detected. Taking measures to fix it.") | ||
if !ec.config.EnableAutomaticQuorumLossHandling { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rather have the rest of this block inside an if ec.config.EnableAutomaticQuorumLossHandling{...}
so that in cases where quorum is lost and the flag is not set, we shouldn't be blocking the rest the code like updateEtcdStatus
and/or other functions from running
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want to update ETCD status if the quorum is lost? If quorum is lost and ETCD status is updated, ETCD controller may start working to fix the ETCD . This will give totally unintended result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay
Fair enough
@@ -73,7 +73,7 @@ func (r *readyCheck) Check(ctx context.Context, etcd druidv1alpha1.Etcd) []Resul | |||
renew := lease.Spec.RenewTime | |||
if renew == nil { | |||
r.logger.Info("Member hasn't acquired lease yet, still in bootstrapping phase", "name", lease.Name) | |||
continue | |||
return []Result{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
Won't returning an empty result set if any lease has an empty RenewTime
result in an empty member list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't get you what you asked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is, let's say the status already had 2 entries in the member list in the etcd status.
Let's say we add a third member and at the start at lease, the new leave will have a nil renewTime. This returns []Result{}
and will result in the member list in the etcd status being completely removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add permissions for etcds
in charts/etcd/templates/etcd-role.yaml
as etcd-backup-restore is now intended to read the etcd resource (ref: here)
33010f3
to
5f5173a
Compare
Like this ? :
|
yes, but afaik we don't need the
|
/hold pending test results |
@abdasgupta I guess we can close this PR ? |
We are closing this PR as we decided to handle quorum loss differently. We have identified that two types of quorum loss can happen in the ETCD clusters. One is transient quorum loss. Our ETCD cluster in control plane can recover automatically from transient quorum loss which involves unavailability of most of the ETCD pods due to network problem, resource problem etc. Another is permanent quorum loss. This involves permanent loss of ETCD data directory. A human operator needs to intervene in that case and recover the ETCD cluster. We are writing a playbook for the human operator to follow. The playbook mainly contains the following steps:
|
How to categorize this PR?
/area control-plane
/kind enhancement
What this PR does / why we need it:
This PR restores a multinode ETCD cluster after the quorum is lost. The steps are like following:
Which issue(s) this PR fixes:
Fixes [Feature] Handle quorum loss scenario by Druid in ETCD multinode #362
Special notes for your reviewer:
Release note: