forked from basho/riak_kv
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nhse d34 nhskv.i33 token #35
Open
martinsumner
wants to merge
37
commits into
nhse-develop-3.4
Choose a base branch
from
nhse-d34-nhskv.i33-token
base: nhse-develop-3.4
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Start a token manager on each node. The token manager will on request register a session against a token, and remove the registration when the session terminates. If another request is received for the same token, it will be queued until the previous session releases the token (or the queued session dies). There is a session server that can be started to run a session (which request a token from the local token_manager on startup), and will allow for the session to be used by making calls to riak_client functions. Each call will renew the session. After token timeout, the session will terminate.
Allow, by configuration, for the conditional check to be strengthened to use the new token-based mechanism
Means that if a topology change occurs without node failure, sibling protection is still maintained.
Also catch erpc errors when initial connecting to node, and have back-off and retry. This smooths handling of the case where a node is suddenly killed.
To make it simpler and clearer to remove session token
If the strong conditional check is at the API - becomes much harder to make the check identically and reliably in the HTTP API. Webmachine has lots of potential exit points outside of our control - so it is hard to reliably release a session on exit.
Change the GET run as part of a conditional PUT, so as not ot require a full object fetch.
To help in riak_test
Also allows for the async update to verifications that an upstream node has granted a token
ThomasArts
reviewed
Jun 13, 2024
ThomasArts
reviewed
Jun 13, 2024
ThomasArts
reviewed
Jun 13, 2024
Discussion of the reliability of process monitoring, including the monitoring of processes on remote nodes, led to a new design which simplifies the message passing required by relying instead on monitoring. A request should now be refused if there exists an currently active grant, and the preflist has changed since that grant was activated. In all other situations grants should be made (either immediately or after being queued awaiting a release).
With aim to improve clarity
The naming has been changed following review to try and clarify meaning. The downstream_check was previously sync - but two token_managers could attempt to check in each other at the same time (for different tokens) - resulting in deadlock. These messages are now async not sync.
ThomasArts
reviewed
Jun 17, 2024
ThomasArts
reviewed
Jun 17, 2024
ThomasArts
reviewed
Jun 18, 2024
ThomasArts
reviewed
Jun 18, 2024
The token_manager will no longer monitor remote upstream sessions. Add a backup GC process, whereby if a request is blocked, there is a check of the existence in the remote manager of an association with the session. The only thing that must be reliable, is therefore the monitoring of local sessions by managers. Before using a local session, an association check is now made, that the local token manager has an active association for that session.
Very hard to test the GC process - hard to create a PID which doesn't trigger down on die. There was a discrepancy in sloppy_quorum between PB/WM APIs for the conditional check. This is resolved.
Once a HEAD has been converted to a GET, disqualify it from the find_bestobject calculation. As GETs occur after HEADs in the GET_FSM, it should by default be the case when allow_mult=true that VCget >= VChead. This is not the case with lww=true, wher eobjects may regress in VC terms. However as lww=true, it is destined to prefer the GET which came after the HEAD. The code as-is could cause a HEAD to continuously require re-fetching, and lead to requests timing out. This is demonstrated in the riak_test associated with the conditional PUT token implementation
…k/riak_kv into nhse-d34-nhskv.i33-token
ThomasArts
reviewed
Jun 19, 2024
ThomasArts
approved these changes
Jun 19, 2024
Change following review
lists:uniq introduced in OTP25
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add support for a token manager and token session process to act as a requestor for tokens, and warp riak_client calls within a session.
This is then used to be provide a stronger conditional PUT API to allow for check-and-set operations that will behave reliably when the cluster is healthy, as well as in simple failure scenarios and operational changes. The CAS will fall-back to eventual consistency in extreme failure scenarios.