
Impl compaction & TTL system for Streams & Pipelines #99

Closed · 9 tasks done
thedodd opened this issue Sep 28, 2021 · 0 comments · Fixed by #109
Labels: A-crd (Hadron K8s CRDs), A-streams (Hadron server streams)

Comments

thedodd (Collaborator) commented Sep 28, 2021

Streams

  • Record the stream's last compaction timestamp and compare it against the current time on startup, basing the initial compaction delay on the delta (see the compaction sketch following the Observations note below). This ensures that a stream partition which restarts periodically (which shouldn't happen) never misses compaction.
  • Per partition compaction based on id+source.
    • Should we preserve only the latest event? Or should we preserve only the original, since a matching id+source marks the newer event as a duplicate according to CloudEvents 1.0?
    • Should we skip this entirely? There may be no value added, given the other compaction strategies and the fact that this does not actually guarantee that duplicate events will never appear across partitions.
    • Decision: not going to implement this, as it adds little benefit while sacrificing some performance on the write path.
  • Per-partition timestamp-based truncation: as events age past the TTL threshold, they are truncated.
    • Each event batch should write a timestamp into a secondary index, keyed by the offset of the batch's last event.
    • Stream CRDs should be updated to include a retention policy sub-structure (sketched below). Users should be able to specify the retention policy; only time-based retention is currently supported.
    • The Stream controller should check the earliest value in the timestamp secondary index, and when the elapsed time exceeds the configured retention policy, spawn a task to prune the old data.
  • Update the operator to pass retention policy data along to stream StatefulSets.
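As a rough illustration of the CRD sub-structure item above, here is a minimal Rust sketch of what a time-based retention policy config could look like. All type and field names here are assumptions for illustration, not Hadron's actual schema:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical sketch of a retention policy sub-structure for the Stream CRD;
/// per this issue, only time-based retention is supported to start.
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct RetentionPolicy {
    /// The retention strategy in use.
    pub strategy: RetentionStrategy,
    /// Seconds an event is retained before it is eligible for pruning;
    /// only meaningful for the `Time` strategy.
    pub retention_seconds: Option<u64>,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub enum RetentionStrategy {
    /// Prune events older than `retention_seconds`.
    Time,
    /// Retain all data indefinitely; compaction never prunes.
    Retain,
}
```

The operator would then serialize a structure like this out of the Stream CRD and hand it to the stream StatefulSet, per the last checklist item.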

Observations: for any system that does not integrate transactionally with the full Hadron Stream, there is no way to guard against duplicate re-processing other than the transactional processing model.
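Tying the stream-side checklist items above together, here is a minimal sketch of the startup-delay calculation and the prune pass over the timestamp secondary index. The `PartitionIndex` trait and every name below are hypothetical stand-ins for the partition's real storage layer, not Hadron's actual API:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical storage interface standing in for the partition's index access.
trait PartitionIndex {
    /// Timestamp (secs since epoch) recorded at the end of the last compaction run.
    fn last_compaction_ts(&self) -> Option<u64>;
    /// Earliest (timestamp, last_offset_of_batch) entry in the secondary index.
    fn earliest_ts_entry(&self) -> Option<(u64, u64)>;
    /// Delete all batches, and their secondary index entries, up to and
    /// including the given offset.
    fn prune_through(&mut self, offset: u64);
    fn set_last_compaction_ts(&mut self, ts: u64);
}

fn now_secs() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs()
}

/// Compute the initial delay before the first compaction run: if the last
/// recorded compaction is older than the interval, run immediately; otherwise
/// wait out the remainder. This keeps a periodically restarting partition
/// from perpetually missing its compaction window.
fn initial_delay(index: &impl PartitionIndex, interval: Duration) -> Duration {
    match index.last_compaction_ts() {
        None => Duration::ZERO,
        Some(last) => {
            let elapsed = now_secs().saturating_sub(last);
            Duration::from_secs(interval.as_secs().saturating_sub(elapsed))
        }
    }
}

/// One compaction pass: walk the timestamp secondary index from the earliest
/// entry, pruning every batch whose timestamp has aged past the TTL.
fn compact(index: &mut impl PartitionIndex, ttl_seconds: u64) {
    let cutoff = now_secs().saturating_sub(ttl_seconds);
    while let Some((ts, last_offset)) = index.earliest_ts_entry() {
        if ts > cutoff {
            break; // Everything remaining is still within the TTL.
        }
        index.prune_through(last_offset);
    }
    index.set_last_compaction_ts(now_secs());
}
```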


Pipelines

  • Pipeline stage data should be deleted once the entire Pipeline instance is complete.
  • Pipelines need to transactionally copy their root event to account for cases where the root event may be compacted away. Alternatively, we could check Pipeline offsets before compacting a range (a sketch of this check follows).
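For the offsets-check alternative, a minimal sketch, assuming a hypothetical `safe_compaction_upper_bound` helper and that each in-flight Pipeline reports the lowest stream offset it may still need as its root event:

```rust
/// Returns the highest stream offset that may safely be pruned: the TTL-derived
/// candidate, clamped strictly below the lowest offset any in-flight Pipeline
/// instance may still need as its root event. All names here are hypothetical.
fn safe_compaction_upper_bound(
    ttl_candidate: u64,
    pipeline_low_watermarks: impl IntoIterator<Item = u64>,
) -> Option<u64> {
    match pipeline_low_watermarks.into_iter().min() {
        Some(0) => None, // offset 0 is still needed; nothing may be pruned
        Some(lowest) => Some(ttl_candidate.min(lowest - 1)),
        None => Some(ttl_candidate), // no in-flight Pipelines; TTL alone governs
    }
}

fn main() {
    // In-flight Pipelines still hold root events at offsets 10 and 42, while
    // the TTL alone would allow pruning through offset 99: prune through 9.
    assert_eq!(safe_compaction_upper_bound(99, [10, 42]), Some(9));
}
```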
@thedodd added the A-crd (Hadron K8s CRDs) and A-streams (Hadron server streams) labels on Sep 28, 2021
@thedodd mentioned this issue (11 tasks) on Sep 28, 2021
@thedodd added the T-needs-design (Needs additional design work) label on Oct 8, 2021
@thedodd changed the title from "Impl compaction & TTL system for Streams" to "Impl compaction & TTL system for Streams & Pipelines" on Oct 11, 2021
thedodd added a commit that referenced this issue Nov 5, 2021
Compaction routine is now well-tested. Woot woot!

Operator has been updated to pass along retention policy config to
stream.

Updated deps across all components.

closes #99
@thedodd removed the T-needs-design (Needs additional design work) label on Nov 10, 2021
thedodd added a commit that referenced this issue Nov 10, 2021