Is your feature request related to a problem?
In an eventually consistent system, the order in which requests are stored cannot always be guaranteed. This is a problem when the same traffic is replayed against a target whose state is expected to match the source. Even when high-frequency updates land out of order on the source, the final document state can still be made identical on both clusters by writing an incoming update only if the currently stored document is older than the incoming one. Some applications implement this logic in their own codebase; with capture and replay, the original request could instead be replayed through a transform that guarantees consistent state between the source and the target.
What solution would you like?
We propose adding a transformation feature within the capture and replay tool that verifies the timestamp of incoming updates against the stored version of the document. The transformation would ensure that an update is only applied if the timestamp of the incoming request is newer than the existing version on the target cluster. This would help maintain a consistent document state between the source and target, even in scenarios with high-frequency updates or out-of-order requests.
The solution could leverage OpenSearch’s existing document metadata and timestamp features, adding logic in the replay phase to enforce order based on timestamps.
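A small simulation can show why an "apply only if newer" rule makes the final state independent of arrival order. This is a sketch in plain Python, with an in-memory dict standing in for the target cluster; the function names and the `timestamp` field are assumptions for illustration and are not part of the replayer.

```python
from itertools import permutations


def apply_if_newer(store, doc_id, doc, field="timestamp"):
    """Write doc only if nothing is stored yet or the stored copy is strictly older."""
    current = store.get(doc_id)
    if current is None or current[field] < doc[field]:
        store[doc_id] = doc
        return True
    return False  # stale update: skipped


def replay(updates):
    """Apply (doc_id, doc) pairs in the given order and return the final store."""
    store = {}
    for doc_id, doc in updates:
        apply_if_newer(store, doc_id, doc)
    return store


updates = [
    ("1", {"timestamp": 1, "value": "a"}),
    ("1", {"timestamp": 2, "value": "b"}),
    ("1", {"timestamp": 3, "value": "c"}),
]

# Every arrival order ends in the same final state.
final_states = [replay(order) for order in permutations(updates)]
assert all(state == final_states[0] for state in final_states)
```

The guarantee holds only when timestamps are distinct per document. Two updates carrying the same timestamp would need an additional tie-breaker, because whichever arrives first wins.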
What alternatives have you considered?
Custom Application Logic: Applications could be modified to include timestamp-based checks or version control directly in the codebase. However, this approach requires developers to write and maintain custom code, and it doesn’t easily extend to scenarios involving legacy systems or third-party applications.
Eventual Consistency Tuning: Alternatively, users could tune consistency settings within their clusters to reduce the impact of out-of-order requests. However, this often introduces trade-offs with latency and scalability, and may not resolve all cases of out-of-order updates.
Do you have any additional context?
In OpenSearch, a scripted _update request can check conditions such as timestamps or version numbers before applying a change, giving precise control over when an update is applied. This control is valuable for maintaining data consistency during capture and replay migrations.
For example, the following script can be used in OpenSearch to update a document only if the incoming timestamp is newer:
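A minimal sketch of such a request body, built in Python so it can be reused in a transform. The field name `updated_at` (assumed to hold a numeric epoch-millisecond value), the helper name `build_conditional_update`, and the script parameter names are assumptions for illustration; `script`, `upsert`, and `ctx.op` are the standard _update API mechanisms.

```python
import json

# Painless source: write the incoming fields only when the stored timestamp
# is missing or older; otherwise set ctx.op to 'none' so nothing is written.
NEWER_ONLY_SCRIPT = (
    "if (ctx._source[params.field] == null"
    " || ctx._source[params.field] < params.doc[params.field]) {"
    " ctx._source.putAll(params.doc);"
    " } else { ctx.op = 'none'; }"
)


def build_conditional_update(doc, field="updated_at"):
    """Return an _update request body that applies doc only if it is newer."""
    if field not in doc:
        raise KeyError(f"incoming document has no '{field}' field")
    return {
        "script": {
            "lang": "painless",
            "source": NEWER_ONLY_SCRIPT,
            "params": {"field": field, "doc": doc},
        },
        # Used as the document body when the id does not exist yet.
        "upsert": doc,
    }


if __name__ == "__main__":
    incoming = {"title": "example", "updated_at": 1729700000000}
    # Body for: POST /<index>/_update/<id>
    print(json.dumps(build_conditional_update(incoming), indent=2))
```

Note that `putAll` merges the incoming fields into the stored document. Reproducing a full-document index request would require clearing `ctx._source` first.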
This example shows how OpenSearch users today can use scripting within the _update API to manage document versions based on timestamps. A similar approach can be integrated into the capture and replay transformation logic to achieve consistent states across clusters.
Examples
In OpenSearch today, users often address this issue by employing scripting within the _update API to enforce constraints like timestamps or version numbers. For instance, using a script to compare timestamps before applying an update helps ensure that only the latest data is written, thereby maintaining consistency.
Another technique is to leverage OpenSearch’s optimistic concurrency control using the if_seq_no and if_primary_term parameters to control updates based on the document’s sequence number and primary term. This method is commonly used to prevent conflicts when updating documents concurrently.
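As a rough illustration of the optimistic concurrency approach, the sketch below builds the request path for a conditional index operation. The helper name `conditional_index_path` and the example index and id are hypothetical; `if_seq_no` and `if_primary_term` are the query parameters named above.

```python
from urllib.parse import quote, urlencode


def conditional_index_path(index, doc_id, seq_no, primary_term):
    """Build the path for an index request that succeeds only if the stored
    document still has the given sequence number and primary term."""
    query = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    return f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}?{query}"


if __name__ == "__main__":
    # Values would normally come from a prior GET or search response.
    print(conditional_index_path("orders", "42", seq_no=7, primary_term=1))
```

If the stored document has changed since those values were read, the request is rejected with a version conflict instead of overwriting it. Sequence numbers are assigned independently by each cluster, so during a replay the values would have to be read from the target rather than copied from the captured source traffic.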
Summary
Adding this feature to the OpenSearch Migrations repository would help automate the enforcement of consistency rules during migrations. This is particularly beneficial when replaying captured traffic from an Elasticsearch source to an OpenSearch target, where differences in consistency guarantees or order of updates can lead to divergent document states.
sumobrian changed the title from "[FEATURE] Ensure Consistent Document State During Replay with Conditional Update Transformations" to "Ensure Consistent Document State During Replay with Conditional Update Transformations" on Oct 23, 2024