[Feature] Better garbage collection of snapshots #621
Comments
/assign @seshachalam-yv |
@gardener/etcd-druid-maintainers, I would like to propose an enhancement to the current
|
Thanks for the neat write-up @seshachalam-yv.
Shall we also capture what good values for these settings would be, e.g. keeping delta snapshots for 14 days?
I've always found this difficult to comprehend. Could you also attach a sample of how this would look over a one-week period? Placeholders in the folder structure as defined for the blob store would be enough.
What's the use case for this? PITR is mostly not possible, so having more backups here doesn't really help much, given the auto-restoration from the last known good delta/full snapshot. Nit Pick |
@seshachalam-yv the current retention algorithm is quite complex. I would really like to understand the reasoning (maybe there is sound reasoning) that led us to make this simple thing so complicated. Could you explain why the following is required? This will add context/background on why things are the way they are:
This certainly increases code complexity. Since we are revisiting this topic, let's take this opportunity to understand the use cases that require such complexity. If there are no such use cases (which means it is just technical debt), then let's simplify this so that understanding, maintaining, and consuming it becomes easy |
I understand the concept might initially seem difficult to grasp. For illustration, let's consider a time period from
gantt
dateFormat YYYY-MM-DD HH:mm
axisFormat %Y-%m-%d-%H:%M
title Garbage collection (GC) Exponential Policy
section Full Snapshots
full 1 :crit, full1, 2023-06-11 11:00, 15m
full 2 :crit, full2, after full1, 15m
full 3 :crit, full3, after full2, 15m
full 4 :active, full4, after full3, 15m
[Retain] Most recent full snapshot from (11-12) - full 4: retain,2023-06-11 11:00, 1h
full 5 :crit, full5, after full4, 15m
full 6 :crit, full6, after full5, 15m
full 7 :crit, full7, after full6, 15m
full 8 :active, full8, after full7, 15m
[Retain] Most recent full snapshot from (12-13) - full 8: retain, after full4, 1h
full 9 :active, full9, after full8, 15m
full 10 :active, full10, after full9, 15m
full 11 :active, full11, after full10, 15m
full 12 :active, full12, after full11, 15m
[Retain] All full snapshots from current hour (13-14): retain, after full8, 1h
fullSnapshotRetentionPeriod is 31 days: fullSnapshotRetentionPeriod, 2023-06-11 11:00, 3h
In line with our policy, only the latest full snapshot,
Snapshots
Snapshot
Thus,
To clarify our terminology:
Latest Snapshot: In our example,
Most Recent Snapshot: The most recent
Current Hour: All full snapshots from the current hour are retained. Here, the current hour is 13:00-14:00, so
This example covers a span of |
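The hourly rule shown in the chart above can be sketched in a few lines of Python. This is only an illustration of the rule as drawn (keep every full snapshot from the current hour, and only the most recent full snapshot from each earlier hour), not the actual etcd-backup-restore code. The function names, the `(name, created_at)` tuple shape, and the GC run time of 13:50 are assumptions made for the example.

```python
from datetime import datetime, timedelta


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def select_retained_hourly(snapshots, now):
    """Return the set of full snapshot names kept by the hourly rule.

    snapshots: list of (name, created_at) tuples (hypothetical shape).
    now: the time at which garbage collection runs (assumed input).
    """
    current = hour_bucket(now)
    retained = set()
    latest_per_hour = {}
    # Walk snapshots oldest first so later ones overwrite earlier ones.
    for name, created_at in sorted(snapshots, key=lambda s: s[1]):
        bucket = hour_bucket(created_at)
        if bucket == current:
            retained.add(name)  # keep everything from the current hour
        else:
            latest_per_hour[bucket] = name  # keep only the newest per past hour
    retained.update(latest_per_hour.values())
    return retained


# Twelve full snapshots, one every 15 minutes starting at 11:00, as in the chart.
start = datetime(2023, 6, 11, 11, 0)
snapshots = [(f"full {i + 1}", start + timedelta(minutes=15 * i)) for i in range(12)]
kept = select_retained_hourly(snapshots, now=datetime(2023, 6, 11, 13, 50))
print(sorted(kept, key=lambda n: int(n.split()[1])))
# ['full 4', 'full 8', 'full 9', 'full 10', 'full 11', 'full 12']
```

The result matches the bars marked as retained in the chart: full 4 for 11:00-12:00, full 8 for 12:00-13:00, and full 9 to full 12 for the current hour.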
Let's consider the same time period from
gantt
dateFormat YYYY-MM-DD HH:mm
axisFormat %Y-%m-%d-%H:%M
title Garbage collection (GC) Exponential Policy (fullSnapshotRetentionPeriod = 2h)
section Full Snapshots
full 1 :crit, full1, 2023-06-11 11:00, 15m
full 2 :crit, full2, after full1, 15m
full 3 :crit, full3, after full2, 15m
full 4 :crit, full4, after full3, 15m
GC [deleted] all snapshots older than fullSnapshotRetentionPeriod : deleted, 2023-06-11 11:00, 1h
full 5 :crit, full5, after full4, 15m
full 6 :crit, full6, after full5, 15m
full 7 :crit, full7, after full6, 15m
full 8 :active, full8, after full7, 15m
[Retain] Most recent full snapshot from (12-13) - full 8: retain, after full4, 1h
full 9 :active, full9, after full8, 15m
full 10 :active, full10, after full9, 15m
full 11 :active, full11, after full10, 15m
full 12 :active, full12, after full11, 15m
[Retain] All full snapshots from current hour (13-14): retain, after full8, 1h
fullSnapshotRetentionPeriod is now 2 hours: fullSnapshotRetentionPeriod, after full4, 2h
In this case, the snapshots
However, snapshots |
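The effect of fullSnapshotRetentionPeriod in the second chart can be sketched as a cutoff applied before the hourly rule. This is a minimal illustration under assumptions: the function name, the tuple shape, the GC run time of 13:50, and the choice to treat a snapshot exactly at the cutoff as still retained are all hypothetical and not taken from etcd-backup-restore.

```python
from datetime import datetime, timedelta


def split_by_retention(snapshots, now, retention_period):
    """Split snapshots into (expired, candidates) around the retention cutoff.

    snapshots: list of (name, created_at) tuples (hypothetical shape).
    Snapshots created before now - retention_period are expired; the rest
    remain candidates for the hourly retention rule.
    """
    cutoff = now - retention_period
    expired = [name for name, created_at in snapshots if created_at < cutoff]
    candidates = [(name, ts) for name, ts in snapshots if ts >= cutoff]
    return expired, candidates


# Same twelve snapshots as in the chart, with a 2 hour retention period.
start = datetime(2023, 6, 11, 11, 0)
snapshots = [(f"full {i + 1}", start + timedelta(minutes=15 * i)) for i in range(12)]
expired, candidates = split_by_retention(
    snapshots,
    now=datetime(2023, 6, 11, 13, 50),
    retention_period=timedelta(hours=2),
)
print(expired)
# ['full 1', 'full 2', 'full 3', 'full 4']
```

The remaining candidates (full 5 to full 12) would still go through the hourly rule, which is why the chart marks full 5 to full 7 for deletion and keeps only full 8 from the 12:00-13:00 hour.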
Feature (What you would like to be added):
Better garbage collection of snapshots.
Motivation (Why is this needed?):
The current Exponential garbage collection policy is hard-coded to a certain schedule of full and delta snapshots. This does not work well for full snapshot schedules configured differently from the expected schedule of "once per hour" or "once per day". Additionally, delta snapshots are retained for only the past 24 hours. This needs to be made configurable so that delta snapshots can be persisted for a longer period, long enough for operators to debug any possible issues/bugs in productive environments.
Etcd CRD to Include DeltaSnapshotRetentionPeriod Field etcd-druid#649
DeltaSnapshotRetentionPeriod Flag to etcd-backup-restore etcd-druid#650