This repository has been archived by the owner on Dec 13, 2023. It is now read-only.
Duplicate data in external storage #3384
Closed
SimonMisencik
started this conversation in
Ideas
-
@SimonMisencik Do you have a reason to not hash the payload on
classDiagram
class Payload
Payload: +Id id
Payload: +Blob data
Payload: +Timestamp createdOn
class PayloadHash
PayloadHash: +Id id
PayloadHash: +String hash
PayloadHash: +String path
PayloadHash: +Payload payload
PayloadHash: +Timestamp createdOn
Payload "1" <.. "0..n" PayloadHash
-
Hi,
One weakness of external storage is that it stores a lot of duplicate data. Storing each distinct payload only once would save storage space.
I implemented this functionality here:
This implementation calculates a digest (SHA-256 in this case) of payloadBytes and uses it as the id of the record in the database. If data with the same digest already exists in the database, only created_on is updated, so that the record is not deleted when old data is cleaned up.
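To make the idea concrete, here is a minimal sketch in Python of the approach described above. It is not the linked implementation: the class `DedupPayloadStore`, its methods, and the in-memory dict standing in for the database table are all hypothetical names chosen for illustration.

```python
import hashlib
from datetime import datetime, timezone


class DedupPayloadStore:
    """Hypothetical in-memory stand-in for the external-storage table."""

    def __init__(self):
        # Maps digest (used as the record id) -> {"data", "created_on"}.
        self._rows = {}

    def save(self, payload_bytes, now=None):
        """Store the payload once and return its SHA-256 digest as the id."""
        now = now or datetime.now(timezone.utc)
        digest = hashlib.sha256(payload_bytes).hexdigest()
        row = self._rows.get(digest)
        if row is None:
            # First time this content is seen: write the data.
            self._rows[digest] = {"data": payload_bytes, "created_on": now}
        else:
            # Duplicate content: only refresh created_on so that the
            # cleanup of old data does not delete a record still in use.
            row["created_on"] = now
        return digest

    def load(self, digest):
        """Return the payload bytes stored under the given digest."""
        return self._rows[digest]["data"]

    def purge_older_than(self, cutoff):
        """Delete records whose created_on is before cutoff; return the count."""
        stale = [d for d, r in self._rows.items() if r["created_on"] < cutoff]
        for d in stale:
            del self._rows[d]
        return len(stale)
```

Saving the same bytes twice returns the same id and leaves a single record with a refreshed created_on. One design consequence worth noting: because the id is derived from the content, any change to the payload produces a different record rather than an update in place.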