Streams
Record the stream's last compaction timestamp and compare it on startup, basing the initial delay on the delta. This ensures that a periodically restarting stream partition (which shouldn't happen) doesn't miss compaction.
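A minimal sketch of that startup-delay calculation, assuming a hypothetical persisted `last_compaction_ts` and a configured compaction interval (names are illustrative, not the actual Hadron API):

```rust
use std::time::{Duration, SystemTime};

/// Compute the initial delay before the first compaction run after startup.
/// If part of the interval already elapsed before the restart, only the
/// remainder is waited; if the interval has fully elapsed, run immediately.
fn initial_compaction_delay(last_compaction_ts: Option<SystemTime>, interval: Duration) -> Duration {
    match last_compaction_ts {
        // No compaction recorded yet: wait one full interval.
        None => interval,
        Some(ts) => {
            let elapsed = SystemTime::now()
                .duration_since(ts)
                .unwrap_or(Duration::ZERO); // clock went backwards: treat as freshly compacted
            interval.saturating_sub(elapsed)
        }
    }
}
```

The returned delay would seed the first tick of the compaction timer; subsequent runs fire on the regular interval.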
Per-partition compaction based on id+source.
Should we preserve only the latest event, or only the original, given that any later event with the same id+source is a duplicate according to the CloudEvents 1.0 spec?
Or should we not worry about this at all? There may be no value add given the other compaction strategies, and it would not actually guarantee that duplicate events never appear across partitions.
Not going to implement this bit: we gain little benefit and would sacrifice some performance on the write path.
Per-partition timestamp-based truncation: as events age past the TTL threshold, they are truncated.
Each event batch should have a timestamp written into a secondary index, keyed on the offset of the batch's last event.
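A sketch of the per-batch index write under stated assumptions: a `BTreeMap` stands in here for the on-disk secondary index, and keys are the offset of each batch's last event (all names illustrative):

```rust
use std::collections::BTreeMap;
use std::time::SystemTime;

/// Secondary index: offset of a batch's last event -> timestamp at write time.
/// Because offsets and write times are both monotonic, the first entry is
/// also the oldest, so the retention check only needs to look at the front.
type TimestampIndex = BTreeMap<u64, SystemTime>;

/// Record the timestamp for a freshly written batch.
fn index_batch(index: &mut TimestampIndex, last_offset: u64, written_at: SystemTime) {
    index.insert(last_offset, written_at);
}

/// Earliest (offset, timestamp) entry, consulted by the retention check.
fn earliest_entry(index: &TimestampIndex) -> Option<(u64, SystemTime)> {
    index.iter().next().map(|(off, ts)| (*off, *ts))
}
```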
Stream CRDs should be updated to include a compaction policy sub-structure. Users should be able to specify the retention policy; only time-based retention is currently supported.
The Stream controller should check the earliest value in the timestamp secondary index, and once that value is older than the configured retention period, spawn a task to prune the old data.
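A hedged sketch of how the CRD's retention policy sub-structure and the controller-side check could fit together; the field names and the `should_prune` helper are assumptions for illustration, not Hadron's actual types:

```rust
use std::time::{Duration, SystemTime};

/// Time-based retention policy as it might appear on the Stream CRD.
/// Only time-based retention is supported, so the TTL is the whole policy.
#[derive(Clone, Debug)]
pub struct RetentionPolicy {
    /// Events older than this TTL become eligible for truncation.
    pub ttl: Duration,
}

/// Controller-side check: should a prune task be spawned, given the earliest
/// timestamp currently recorded in the secondary index?
fn should_prune(policy: &RetentionPolicy, earliest: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(earliest) {
        Ok(age) => age > policy.ttl,
        // Earliest entry appears to be in the future (clock skew): nothing to prune.
        Err(_) => false,
    }
}
```

When the check passes, the spawned prune task would truncate every batch whose last offset falls at or before the triggering index entry, then drop those index entries.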
Update the operator to pass retention policy data along to the stream StatefulSets.
Observations: for any system that does not integrate transactionally with the Hadron Stream itself, there is no way to guard against duplicate re-processing other than the transactional processing model.
Pipelines
Pipeline stage data should be deleted once the entire Pipeline instance is complete.
Pipelines need to transactionally copy their root event to account for cases where the root event may be compacted away. Alternatively, we could do a pipeline offsets check before compacting a range, as sketched below.
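A sketch of that offsets-check alternative, assuming each pipeline exposes a low-watermark offset below which its root events are no longer needed (names hypothetical):

```rust
/// Highest offset that can safely be compacted, given the candidate end of
/// the range and the low-watermark offset of every pipeline on the partition
/// (the offset below which a pipeline no longer needs its root events).
fn compaction_upper_bound(pipeline_low_watermarks: &[u64], candidate_end: u64) -> u64 {
    pipeline_low_watermarks
        .iter()
        .copied()
        .min()
        // No pipelines registered: the whole candidate range is safe.
        .map_or(candidate_end, |min_wm| min_wm.min(candidate_end))
}
```

The compaction routine would then only touch offsets below the returned bound, guaranteeing no pipeline loses a root event it still depends on.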
Compaction routine is now well-tested. Woot woot!
Operator has been updated to pass along retention policy config to stream.
Updated deps across all components.
closes #99