Write and run API k6 load tests during staging API deployments #5000

Open
5 tasks
sarayourfriend opened this issue Sep 27, 2024 · 0 comments
Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: infra Related to the Terraform config and other infrastructure 🔒 staff only Restricted to staff members


Problem

We rely on staging to verify that new code deployments are zero-downtime safe. Verifying zero-downtime safety necessarily requires the service to actually handle and respond to requests. Some migrations that are considered not zero-downtime safe would never cause problems in a service that is not handling requests. That is because zero-downtime safety is precisely about whether the service can handle and respond to requests while two versions of the application are running at the same time. Nothing is verified if the versions do not handle requests during that period.

Because the staging API sees virtually no traffic, and certainly no consistent traffic, we cannot rely on staging deployments to verify zero-downtime safety from this perspective.

Description

Write k6 load tests that can run on a timer and exercise all non-deprecated media requests: search, thumbnail, waveform, related, single result. We will rely on HMAC request signing to bypass caching and rate limiting.
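As a rough sketch of the request set described above, the following helper lists one path per request type for a given media type. The helper name, the route shapes (`/v1/<media type>/…`, `thumb/`, `related/`, `waveform/`), and the choice to treat waveform as audio-only are assumptions made for illustration and should be checked against the actual API routes.

```typescript
// Hypothetical helper: names and route shapes are illustrative assumptions,
// not confirmed by this issue.
type MediaType = "images" | "audio";

interface RequestSpec {
  name: string;
  path: string;
}

function mediaRequests(
  mediaType: MediaType,
  id: string,
  query: string,
): RequestSpec[] {
  const base = `/v1/${mediaType}`;
  const requests: RequestSpec[] = [
    { name: "search", path: `${base}/?q=${encodeURIComponent(query)}` },
    { name: "single result", path: `${base}/${id}/` },
    { name: "related", path: `${base}/${id}/related/` },
    { name: "thumbnail", path: `${base}/${id}/thumb/` },
  ];
  // Assumption: waveforms only exist for audio results.
  if (mediaType === "audio") {
    requests.push({ name: "waveform", path: `${base}/${id}/waveform/` });
  }
  return requests;
}
```

Each test iteration could then loop over `mediaRequests(...)` and issue every path through the signed HTTP wrapper, so that adding a new request type is a one-line change.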

Ideally, the tests would also register new OAuth applications and make authenticated requests. However, there is currently no way to register and verify an application programmatically. To work around this, add a new option to the staging API that auto-verifies OAuth applications if the registration request has a valid HMAC signature. With that in place, we can add tests that exercise the authentication workflow and make authenticated requests. (This will require adding the HMAC signing secret as an environment variable to the staging API.)

The tests must be able to run using one of the duration-based k6 executors (probably constant-vus, but maybe ramping-vus if the staging API needs to warm up for request handling before the deployment rather than jumping straight to the peak traffic level of the test).
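A minimal sketch of how the choice between the two executors might be expressed: a helper that returns a `constant-vus` scenario when no warm-up is requested and a `ramping-vus` scenario otherwise. The helper name, its parameters, and the VU counts are hypothetical; the `executor`, `vus`, `duration`, `startVUs`, and `stages` fields follow k6's scenario options.

```typescript
interface Stage {
  duration: string;
  target: number;
}

type Scenario =
  | { executor: "constant-vus"; vus: number; duration: string }
  | { executor: "ramping-vus"; startVUs: number; stages: Stage[] };

// Hypothetical helper: picks the executor based on whether a warm-up is needed.
function buildScenario(
  peakVus: number,
  totalMinutes: number,
  warmupMinutes = 0,
): Scenario {
  if (warmupMinutes <= 0) {
    // No warm-up: hold the peak VU count for the whole run.
    return {
      executor: "constant-vus",
      vus: peakVus,
      duration: `${totalMinutes}m`,
    };
  }
  if (warmupMinutes >= totalMinutes) {
    throw new Error("warm-up must be shorter than the total run");
  }
  // Warm-up: ramp to the peak, then hold it for the remainder of the run.
  return {
    executor: "ramping-vus",
    startVUs: 1,
    stages: [
      { duration: `${warmupMinutes}m`, target: peakVus },
      { duration: `${totalMinutes - warmupMinutes}m`, target: peakVus },
    ],
  };
}

// In a k6 script, the result would be exported as part of `options`, e.g.:
// export const options = { scenarios: { api: buildScenario(10, 15, 1) } };
```

Keeping the total duration as a single parameter means the same script can run briefly in CI and for the full deployment window in staging.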

Like the frontend k6 local tests, they should be executed against the local API during CI on pull requests.

Unlike the frontend k6 staging tests, which execute post-deployment, the API tests will execute during deployment. Initiate the k6 tests as a parallel task to dispatching the staging deployment workflow. The staging API typically takes 8–10 minutes to deploy, so the k6 tests should run long enough before and after the deployment to give the peak traffic a head and a tail around the deployment period. For example, k6 could be started at least 2 minutes before triggering the staging deployment and allowed to run for 15 minutes total. That results in a 2-minute head (give or take the time it takes the deployment GitHub workflow to start and reach the point of deploying) and a tail of roughly 3–5 minutes of traffic after the deployment finishes.
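The head and tail arithmetic can be sketched as a small helper, which makes it easier to check that a chosen total duration still leaves traffic running after the deployment ends. The function and parameter names are hypothetical, and the workflow start-up time is modelled as a simple addition to the head.

```typescript
interface TrafficWindow {
  headMinutes: number;
  tailMinutes: number;
}

// Hypothetical helper: the tail is whatever remains of the run after the
// head and the deployment itself.
function trafficWindow(
  totalMinutes: number,
  headMinutes: number,
  deployMinutes: number,
  workflowStartupMinutes = 0,
): TrafficWindow {
  const head = headMinutes + workflowStartupMinutes;
  const tail = totalMinutes - head - deployMinutes;
  if (tail < 0) {
    throw new Error("load test would end before the deployment finishes");
  }
  return { headMinutes: head, tailMinutes: tail };
}

// The example figures: a 15-minute run, a 2-minute head, and a deployment
// that takes between 8 and 10 minutes.
const fastDeploy = trafficWindow(15, 2, 8);
const slowDeploy = trafficWindow(15, 2, 10);
```

Comparing `fastDeploy.tailMinutes` and `slowDeploy.tailMinutes` gives the range of tail lengths to expect; a check like this could also fail the workflow early if the configured duration is too short for the deployment.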

Steps, to be done in separate PRs:

  • Add HMAC signature verification to OAuth registration route and auto-verify when a valid signature is supplied.
  • Add the HMAC signing secret to the staging API environment variables to support the above.
  • Write API k6 tests and sign all requests with the HMAC signing secret. Follow the example from the frontend tests which already implements this using the http.ts utility. No work needs to be done to enable HMAC signing other than using the custom http.ts wrapper utility instead of k6's http directly.
  • Run API k6 tests against the local API in CI.
  • Run API k6 tests according to the process described above during staging deployments.

Additional context

I've written this issue in response to a recent incident which highlighted the differences between staging and production as a vulnerability to our confidence in staging as a representative environment that we can trust to validate changes to the fullest possible extent.

@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository 🔒 staff only Restricted to staff members 🧱 stack: infra Related to the Terraform config and other infrastructure labels Sep 27, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Sep 27, 2024