Create member, delta/full-snapshot Leases #254

Closed
wants to merge 7 commits into from

timuthy
Member

@timuthy timuthy commented Nov 4, 2021

How to categorize this PR?

/area control-plane
/kind enhancement

What this PR does / why we need it:
This PR adds member, full snapshot and delta snapshot Leases which are maintained by etcd-backup-restore.

Which issue(s) this PR fixes:
Fixes part of #158

Special notes for your reviewer:
A "component" concept was introduced in this PR. In the long run it can replace the chart that we use and instead build the K8s resources in code, which gives better testability and more control over how the client is used to reconcile the objects (e.g. which verb is used, UPDATE vs PATCH) and how merging objects is implemented. The same approach was taken in gardener/gardener.

Please see this comment for more information.

I changed the role, rolebinding and serviceaccount names along the way (721f412) to comply more closely with the scheme used in Gardener components (/cc @aaronfern)

Release note:

Etcd-Druid now creates member `Lease` objects, which enable the heartbeat functionality for etcd members. Along the way, a new flag `--etcd-member-unknown-threshold` was introduced. It determines the duration after which an etcd member's state is considered `unknown` when the member `Lease` is not renewed.
Etcd-Druid now creates delta-snapshot and full-snapshot `Lease` objects if backups are enabled. This is necessary in order to run compaction `Job`s.

@timuthy timuthy requested a review from a team as a code owner November 4, 2021 16:46
@gardener-robot gardener-robot added area/control-plane Control plane related kind/enhancement Enhancement, improvement, extension needs/review Needs review size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else labels Nov 4, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 4, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 4, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 5, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 5, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 5, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 5, 2021
Contributor

@stoyanr stoyanr left a comment

As a general comment, I am a bit intimidated by the Etcd component that is introduced in this PR. Yes, it's fairly small now, but it already makes a commitment to refactoring the Etcd controller in a similar fashion in the future, which is a rather huge refactoring. Until this is done, the unfinished Etcd component will stick out badly and beg for attention.

Also, the way components are used here is markedly different from the way they are used in g/g. Here, the Etcd component is basically taking over part of the Reconcile. In g/g, components are responsible for deploying / destroying / waiting upon specific objects such as extensions, dns resources, k8s resources such as deployments and services, etc. that have to be deployed / destroyed as part of the very complex shoot reconciliation.

I would propose a somewhat different approach that is equally good (or better) in terms of modularity / testability but avoids making such commitments, and is also much more similar to the way components are used in g/g. Instead of an Etcd component, we could introduce a Lease component to actually create / update / delete an individual lease, and perhaps a MemberLease to capture member lease specifics, as well as helper functions such as DeployOrDestroyLease and DeployOrDestroyMemberLeases with the logic currently in the respective Etcd methods. The controller will then call these functions (in the future, perhaps as part of a flow, as in the g/g shoot controller).

Comment on lines 29 to 32
// GetDeltaSnapshotLeaseName returns the `Lease` name for delta snapshots.
GetDeltaSnapshotLeaseName() string
// GetFullSnapshotLeaseName returns the `Lease` name for full snapshots.
GetFullSnapshotLeaseName() string
Contributor

I think these methods should be removed. Instead, DeltaSnapshotLeaseName and FullSnapshotLeaseName could be added to Values, set together with other Values fields and used wherever needed. This is also the approach used in most g/g components. Get methods should be used only when it's not possible to know a value in advance, e.g. you need to actually deploy an object in order to get a value from its status.
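
A minimal sketch of that suggestion, assuming the names can be derived from the Etcd name up front: `EtcdName` and the `full-snapshot-%s` format appear in this PR's diff, while the `NewValues` constructor and the `delta-snapshot-%s` format are assumptions made for the example.

```go
package main

import "fmt"

// Values carries everything the component needs, computed before deployment.
type Values struct {
	EtcdName               string
	DeltaSnapshotLeaseName string
	FullSnapshotLeaseName  string
}

// NewValues is a hypothetical constructor that sets the lease names once,
// so no Get methods are needed on the component.
func NewValues(etcdName string) Values {
	return Values{
		EtcdName:               etcdName,
		DeltaSnapshotLeaseName: fmt.Sprintf("delta-snapshot-%s", etcdName), // assumed format
		FullSnapshotLeaseName:  fmt.Sprintf("full-snapshot-%s", etcdName),
	}
}

func main() {
	v := NewValues("etcd-main")
	fmt.Println(v.DeltaSnapshotLeaseName) // delta-snapshot-etcd-main
	fmt.Println(v.FullSnapshotLeaseName)  // full-snapshot-etcd-main
}
```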

}
}

func (e *etcd) reconcileLease(ctx context.Context, lease *coordinationv1.Lease, enabled bool) error {
Contributor

This method is not reconciling a Lease object, just creating, updating, or deleting it (as part of the reconciliation of an Etcd object). The actual reconciliation of a Lease will be performed by the respective controller. Let's use a name that matches more closely what this method is doing. I propose deployOrDestroy here, though I guess deploy, update or createOrUpdate would also be ok - but not reconcile, which has a very different meaning and is therefore confusing.

Suggested change
func (e *etcd) reconcileLease(ctx context.Context, lease *coordinationv1.Lease, enabled bool) error {
func (e *etcd) deployOrDestroyLease(ctx context.Context, lease *coordinationv1.Lease, enabled bool) error {

return err
}

func (e *etcd) reconcileMemberLease(ctx context.Context) error {
Contributor

This method is not reconciling a Lease object; it's creating, updating, or deleting multiple Lease objects (as part of the reconciliation of an Etcd object). The actual reconciliation of the Leases will be performed by the respective controller. Let's use a name that matches more closely what this method is doing. I propose deployOrDestroy here, though I guess deploy, update or createOrUpdate would also be ok - but not reconcile, which has a very different meaning and is therefore confusing.

Suggested change
func (e *etcd) reconcileMemberLease(ctx context.Context) error {
func (e *etcd) deployOrDestroyMemberLeases(ctx context.Context) error {

@@ -1206,6 +1218,10 @@ func (r *EtcdReconciler) getMapFromEtcd(etcd *druidv1alpha1.Etcd) (map[string]in
etcdValues["etcdDefragTimeout"] = etcd.Spec.Etcd.EtcdDefragTimeout
}

if etcd.Spec.Etcd.MemberHeartbeat != nil {
etcdValues["heartbeatDuration"] = etcd.Spec.Etcd.MemberHeartbeat
Contributor

Nit: Could we have matching names here, e.g. either both "heartbeatDuration" or both "memberHeartbeat" (or "memberHeartbeatDuration")?

return fmt.Sprintf("full-snapshot-%s", e.values.EtcdName)
}

func (e *etcd) Deploy(ctx context.Context) error {
Contributor

Could you add doc comments on these exported methods? (Deploy, Destroy, etc.)

Comment on lines 71 to 73
func (e *etcd) Destroy(_ context.Context) error {
return nil
}
Contributor

Why is Destroy not implemented? Shouldn't it delete all the leases deployed by Deploy?

@timuthy
Member Author

timuthy commented Nov 12, 2021

I will continue working on this PR after the v0.7.0 release and further modularize as suggested by @stoyanr.

@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 25, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 25, 2021
@timuthy
Member Author

timuthy commented Nov 25, 2021

Let's continue with #262, which I filed because of the many changes to the code base that happened in the meantime. @stoyanr I addressed your comments there. Please let me know if it goes in the intended direction.
/close

Labels
area/control-plane Control plane related kind/enhancement Enhancement, improvement, extension needs/changes Needs (more) changes needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/review Needs review needs/second-opinion Needs second review by someone else size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py)
6 participants