Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): Add prototype for long running API calls for Archived Recordings #698

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

Josh-Matsuoka
Copy link
Contributor

@Josh-Matsuoka Josh-Matsuoka commented Oct 29, 2024

Hi,

This adds a prototype for long running API operations (first step of #286).

The workflow proceeds as described in the issue:

  • Cryostat receives a patch request to archive a recording
  • Basic validation is done (check that target is valid and that the recording exists)
  • Following this an ArchiveRequest is created that tracks a job ID and the requested recording
  • The archive request is then passed on to the ArchiveRequestGenerator (works similarly to the InterruptibleReportsGenerator)
  • ArchiveRequestGenerator performs the archiving on a separate thread, firing a notification to the web client for success or failure which includes the job ID

TODO: Web Client needs to be adjusted to account for the new notifications, refreshing the tables when it receives them.
TODO: Implement the same framework for Grafana uploads of active/archived recordings, as well as report generation for active/archived recordings

@mergify mergify bot added the safe-to-test label Oct 29, 2024
@Josh-Matsuoka Josh-Matsuoka added the feat New feature or request label Oct 29, 2024
@Josh-Matsuoka
Copy link
Contributor Author

@andrewazores which other endpoints do you think could use a framework like this? What other operations are likely to block the client for a long time?

@andrewazores
Copy link
Member

andrewazores commented Oct 30, 2024

Archiving an active recording has the obvious long-running concerns where the current architecture can lead to the client waiting a long time for Cryostat to respond. Another similar case right now is uploading recordings to the jfr-datasource for viewing in Grafana. Both the Grafana feature and the report generation feature also run into it for both active and archived recordings.

In the future, features like taking a heap dump should also fit into this same long-running task architecture.

@Inject private RecordingHelper recordingHelper;
@Inject ReportsService reportsService;

private Map<String, Map<String, AnalysisResult>> jobResults;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to avoid adding something like this that hangs on to the report generation results in memory. The report generation system already has tiered in-memory and S3-backed caching, so it'd be best to lean on that.

For archiving recordings, or for uploading recording files to jfr-datasource, it's a simpler design because there is no response payload to send back to the client - the client just needs to know that the job has been completed and whether it completed successfully.

For report generation, where it takes time and the client needs to not only not that it has completed and whether it was a success, but also needs a response payload (report result), I think an API that behaves something like Map.computeIfAbsent() would be best. I'm thinking that the ReportsService interface should gain methods for checking whether a result is already present for a given key. The default implementation would trivially return false, and the two caching tier implementations would check their respective caches and delegates.

image

HTTP 200 responses would contain the Map<String, AnalysisResult> as the response body for the client to use. HTTP 202 responses would contain a Job UUID in the response body, maybe a Location header to tell the client the URL where to find the result when it's ready (1), and that's it. Later, the completion notification containing the Job UUID would be emitted. When the client receives the the notification containing that UUID it sends a follow-up GET to the Location header URL, and this time around the result is ready to go so the response is an HTTP 200 with the expected data. This way, the existing tiered caching infrastructure is reused and there are no new endpoints introduced, only different response status behaviours on the existing endpoints.

(1) right now, the URL would be the same URL that the request was sent to initially, ex. /api/v4/reports/{encodedKey}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a WIP implementation of the suggested design, returning a 200 with the analysis result if the storage or quarkus cache has the report, delegating to the generator otherwise. The web client needs to be updated to react to the notifications which is what I'll work on next, but for now it gets the 202 response when the cache misses with the location URL to check when it gets the notification

@Override
public boolean keyExists(ActiveRecording recording) {
String key = ReportsService.key(recording);
return !Objects.isNull(activeCache.as(CaffeineCache.class).getIfPresent(key));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check delegate.keyExists(recording) here as well (for both signatures, and both cache layers).

Mostly this would support the case where an archived report is cached in storage (S3), but has been dropped from the in-memory cache already. The reportFor implementation should already handle the tiered loading/computing structure, so the keyExists check should mirror the same behaviour - we can consider the key exists if it is in either cache level, and retrieve the result from either level.

@Josh-Matsuoka Josh-Matsuoka marked this pull request as ready for review November 21, 2024 21:25
@Josh-Matsuoka
Copy link
Contributor Author

/build_test

Copy link

Workflow started at 11/21/2024, 5:30:33 PM. View Actions Run.

Copy link

OpenAPI schema change detected:

diff --git a/schema/openapi.yaml b/schema/openapi.yaml
index 4b168c9..8c8c378 100644
--- a/schema/openapi.yaml
+++ b/schema/openapi.yaml
@@ -1,25 +1,13 @@
 ---
 components:
   schemas:
-    AnalysisResult:
-      properties:
-        evaluation:
-          $ref: '#/components/schemas/Evaluation'
-        name:
-          type: string
-        score:
-          format: double
-          type: number
-        topic:
-          type: string
-      type: object
     Annotations:
       properties:
         cryostat:
           additionalProperties:
             type: string
           type: object
         platform:
           additionalProperties:
             type: string
           type: object
@@ -232,33 +220,20 @@ components:
           format: uri
           type: string
         id:
           $ref: '#/components/schemas/UUID_Flat'
         realm:
           $ref: '#/components/schemas/DiscoveryNode_Flat'
       required:
         - id
         - realm
       type: object
-    Evaluation:
-      properties:
-        explanation:
-          type: string
-        solution:
-          type: string
-        suggestions:
-          items:
-            $ref: '#/components/schemas/Suggestion'
-          type: array
-        summary:
-          type: string
-      type: object
     Event:
       properties:
         clazz:
           type: string
         description:
           type: string
         fields:
           items:
             $ref: '#/components/schemas/Field'
           type: array
@@ -302,20 +277,30 @@ components:
         relationKey:
           type: string
       type: object
     FileUpload:
       type: object
     GitInfo:
       properties:
         hash:
           type: string
       type: object
+    HttpServerResponse:
+      properties:
+        chunked:
+          type: boolean
+        statusCode:
+          format: int32
+          type: integer
+        statusMessage:
+          type: string
+      type: object
     Instant:
       example: 2022-03-10T16:15:50Z
       format: date-time
       type: string
     JsonObject:
       items:
         properties:
           key:
             type: string
           value: {}
@@ -547,29 +532,20 @@ components:
       type: object
     SerializableOptionDescriptor:
       properties:
         defaultValue:
           type: string
         description:
           type: string
         name:
           type: string
       type: object
-    Suggestion:
-      properties:
-        name:
-          type: string
-        setting:
-          type: string
-        value:
-          type: string
-      type: object
     Target:
       properties:
         agent:
           readOnly: true
           type: boolean
         alias:
           pattern: \S
           type: string
         annotations:
           $ref: '#/components/schemas/Annotations'
@@ -1218,22 +1194,27 @@ paths:
       tags:
         - Event Templates
   /api/v4/grafana/{encodedKey}:
     post:
       parameters:
         - in: path
           name: encodedKey
           required: true
           schema:
             type: string
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
+        "200":
           content:
             text/plain:
               schema:
                 type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
@@ -1449,34 +1430,27 @@ paths:
       tags:
         - Archived Recordings
   /api/v4/reports/{encodedKey}:
     get:
       parameters:
         - in: path
           name: encodedKey
           required: true
           schema:
             type: string
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
-          content:
-            text/plain:
-              schema:
-                type: string
-          description: OK
         "200":
-          content:
-            application/json:
-              schema:
-                additionalProperties:
-                  $ref: '#/components/schemas/AnalysisResult'
-                type: object
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
         - SecurityScheme: []
       tags:
         - Reports
   /api/v4/rules:
@@ -2126,59 +2100,60 @@ paths:
             format: int64
             type: integer
         - in: path
           name: targetId
           required: true
           schema:
             format: int64
             type: integer
       requestBody:
         content:
-          text/plain:
+          application/json:
             schema:
-              type: string
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
+        "200":
           content:
             text/plain:
               schema:
                 type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
-        "404":
-          description: Not Found
-        "502":
-          description: Target not Reachable
       security:
         - SecurityScheme: []
       tags:
         - Active Recordings
   /api/v4/targets/{targetId}/recordings/{remoteId}/upload:
     post:
       parameters:
         - in: path
           name: remoteId
           required: true
           schema:
             format: int64
             type: integer
         - in: path
           name: targetId
           required: true
           schema:
             format: int64
             type: integer
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
-        "202":
+        "200":
           content:
             text/plain:
               schema:
                 type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
@@ -2193,34 +2168,27 @@ paths:
           required: true
           schema:
             format: int64
             type: integer
         - in: path
           name: targetId
           required: true
           schema:
             format: int64
             type: integer
+      requestBody:
+        content:
+          application/json:
+            schema:
+              $ref: '#/components/schemas/HttpServerResponse'
       responses:
         "200":
-          content:
-            application/json:
-              schema:
-                additionalProperties:
-                  $ref: '#/components/schemas/AnalysisResult'
-                type: object
-          description: OK
-        "202":
-          content:
-            text/plain:
-              schema:
-                type: string
           description: OK
         "401":
           description: Not Authorized
         "403":
           description: Not Allowed
       security:
         - SecurityScheme: []
       tags:
         - Reports
   /api/v4/targets/{targetId}/snapshot:

Copy link

No GraphQL schema changes detected.

Copy link

Schema changes committed by the CI.

Copy link

CI build and push: At least one test failed ❌
https://github.com/cryostatio/cryostat/actions/runs/11962766557

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request safe-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants