Fix OSS sealunwrapper adding extra get + put request to all storage get requests #29050
Problem
Starting after commit 0ec3524, for each `GET` request Vault makes to the storage backend, the OSS seal unwrapper will make one additional `GET` request plus one additional `PUT` request. This is because the seal unwrapper now tries to unwrap all storage entries, even if the entry was not wrapped in the first place! Prior to 0ec3524, there was a check to see whether the storage entry was wrapped before proceeding with the unwrap procedure, which prevented this behavior.
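To make the request pattern concrete, here is a minimal Go sketch of a storage wrapper whose `Get` unconditionally unwraps the entry and writes it back. This is not Vault's actual `sealunwrapper` code: the `Entry`/`Storage` types and the `unwrap` helper are illustrative stand-ins, and where exactly the extra `GET` happens inside the persist step is assumed from the description above.

```go
package unwrapsketch

import "context"

// Entry and Storage are simplified stand-ins for Vault's storage types.
type Entry struct {
	Key   string
	Value []byte
}

type Storage interface {
	Get(ctx context.Context, key string) (*Entry, error)
	Put(ctx context.Context, entry *Entry) error
}

// sealUnwrapper wraps the underlying backend. This sketch mirrors the
// post-0ec3524 behavior described above: every Get goes through the
// unwrap-and-persist path, even for entries that were never wrapped.
type sealUnwrapper struct {
	backend Storage
}

func (s *sealUnwrapper) Get(ctx context.Context, key string) (*Entry, error) {
	entry, err := s.backend.Get(ctx, key) // backend request #1: the original GET
	if err != nil || entry == nil {
		return entry, err
	}

	unwrapped := unwrap(entry) // placeholder for the seal-unwrap transformation

	// Persisting the unwrapped value back to storage is what generates the
	// extra traffic: roughly one more GET (re-checking the stored value)
	// and one PUT (writing the unwrapped value back).
	if _, err := s.backend.Get(ctx, key); err != nil { // backend request #2: extra GET
		return nil, err
	}
	if err := s.backend.Put(ctx, unwrapped); err != nil { // backend request #3: extra PUT
		return nil, err
	}
	return unwrapped, nil
}

// unwrap stands in for the real unwrap logic; here it is a no-op.
func unwrap(entry *Entry) *Entry { return entry }
```

In this sketch, a single logical read turns into three backend requests whenever the unwrap-and-persist path runs.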
The increased storage latency, writes, and traffic resulting from the extra requests cause noticeable performance degradation for some workloads. 0ec3524 was first included in Vault `v1.15.0`, and after upgrading from Vault `v1.14.9` to `v1.15.6`, some of our Vault clusters would take 2-3x longer to start up.

Other users have also reported storage-related issues after upgrading to Vault `v1.15+`: #24726.
Description
This PR restores the check to prevent Vault from unwrapping a storage entry if the storage entry is not wrapped. This should eliminate the extra `GET` and `PUT` requests.
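Conceptually, the restored check is an early exit before any unwrap-and-persist work in the `Get` path sketched above. The guard below is only an illustration of that idea; the function name and the marker byte are made up, and Vault's real check inspects its own wrapped-value encoding.

```go
package unwrapsketch

// looksSealWrapped sketches the restored guard: only values that actually
// appear to be in the seal-wrapped format should go through the
// unwrap-and-persist path. The marker byte is purely hypothetical.
func looksSealWrapped(value []byte) bool {
	const sealWrapMarker byte = 0x01 // illustrative, not Vault's real format
	return len(value) > 0 && value[0] == sealWrapMarker
}
```

In the sketched `Get`, this becomes an early `return entry, nil` when `looksSealWrapped(entry.Value)` is false, so non-wrapped entries never trigger the extra `GET` and `PUT`.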
Testing
A small test is included in this PR to confirm that at least the extra `PUT` request is no longer being made.
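The actual test lives in the diff. Purely as an illustration of the approach, a toy counting backend can assert that reading a non-wrapped entry issues no `PUT`; everything below (`countingBackend`, `getThroughUnwrapper`, the marker check) is invented for the sketch and is not Vault's test code.

```go
package unwrapsketch

import (
	"context"
	"testing"
)

// countingBackend is a toy in-memory store that counts Put calls.
type countingBackend struct {
	data map[string][]byte
	puts int
}

func (c *countingBackend) Get(_ context.Context, key string) ([]byte, error) {
	return c.data[key], nil
}

func (c *countingBackend) Put(_ context.Context, key string, value []byte) error {
	c.puts++
	c.data[key] = value
	return nil
}

// getThroughUnwrapper mimics the guarded Get path: only values that look
// seal-wrapped would be rewritten to the backend.
func getThroughUnwrapper(ctx context.Context, b *countingBackend, key string) ([]byte, error) {
	value, err := b.Get(ctx, key)
	if err != nil || value == nil {
		return value, err
	}
	// Stand-in for the restored "is this actually seal-wrapped?" check.
	if wrapped := len(value) > 0 && value[0] == 0x01; !wrapped {
		return value, nil // plain entry: no extra GET or PUT
	}
	// (the unwrap-and-persist path would go here)
	return value, nil
}

func TestNoExtraPutForPlainEntries(t *testing.T) {
	backend := &countingBackend{data: map[string][]byte{
		"core/plain": []byte("not seal-wrapped"),
	}}
	if _, err := getThroughUnwrapper(context.Background(), backend, "core/plain"); err != nil {
		t.Fatal(err)
	}
	if backend.puts != 0 {
		t.Fatalf("expected 0 PUT requests for a non-wrapped entry, got %d", backend.puts)
	}
}
```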
I also ran some benchmarks using `vault-benchmark` (awesome tool BTW!) + a small wrapper script. The chart below shows the p90 of the time needed to perform different benchmark operations on a sample Vault instance. The Vault instance was configured with Amazon S3 for storage, and caching was explicitly disabled.
[Chart: p90 time per benchmark operation, by build.]

📊 Additional benchmark results can be found here.
From the graph, it seems like:

- Operations made on a build of commit 0ec3524 (labeled `post-0ec3524` in the graph) take significantly longer compared to operations made on a build of the parent commit (labeled `pre-0ec3524`).
- The `main` branch continues to exhibit similar slow operations.
- A build of `main` with this fix applied (labeled `main+fix`) shows improved performance, matching the performance level of `pre-0ec3524`.
Other notes
Why did you disable caching during the benchmarks?
Due to how `vault-benchmark` benchmarks are structured, it is difficult to evaluate storage performance without disabling caching. `vault-benchmark` benchmarks start off with an empty cluster and then write a bunch of state to it. Then they might read that state back or perform other write operations.

The problem is that, with a few exceptions, all write operations will write to both the storage and the in-memory cache. So by starting off with an empty cluster and then setting up the state by performing writes, we have effectively written the entire cluster state to the in-memory cache. Therefore most read benchmarks we perform will be benchmarking the in-memory cache instead of the actual storage codepath.
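As a toy illustration of that effect (this is not Vault's cache implementation, just a minimal write-through cache sketched in Go), reads that follow the seeding writes never touch the backend:

```go
package cachesketch

import "context"

// writeThroughCache is a toy stand-in for a write-through cache in front of
// a storage backend. Writes land in both places; reads prefer the cache.
type writeThroughCache struct {
	backend      map[string][]byte // stand-in for the real storage backend
	memory       map[string][]byte // in-memory cache
	backendReads int               // reads that actually reached the backend
}

func newWriteThroughCache() *writeThroughCache {
	return &writeThroughCache{
		backend: map[string][]byte{},
		memory:  map[string][]byte{},
	}
}

func (c *writeThroughCache) Put(_ context.Context, key string, value []byte) {
	c.backend[key] = value // write to "storage"...
	c.memory[key] = value  // ...and to the cache
}

func (c *writeThroughCache) Get(_ context.Context, key string) []byte {
	if v, ok := c.memory[key]; ok {
		return v // cache hit: the backend is never touched
	}
	c.backendReads++
	return c.backend[key]
}
```

If a benchmark seeds its state through `Put` and then reads it back through `Get`, `backendReads` stays at 0, so the read timings reflect the in-memory map rather than the storage backend; disabling the cache is what forces reads through to storage.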
I have still included the benchmark results with the cache enabled, however, in case they are helpful.