feat(api): Add prototype for long running API calls for Archived Recordings #698
base: main
Conversation
Resolved review threads (outdated): src/main/java/io/cryostat/recordings/ArchiveRequestGenerator.java
@andrewazores which other endpoints do you think could use a framework like this? What other operations are likely to block the client for a long time?
Archiving an active recording has the obvious long-running concerns where the current architecture can lead to the client waiting a long time for Cryostat to respond. Another similar case right now is uploading recordings to the jfr-datasource for viewing in Grafana. Both the Grafana feature and the report generation feature also run into it for both active and archived recordings. In the future, features like taking a heap dump should also fit into this same long-running task architecture.
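As a rough illustration, the fire-and-forget shape for those cases could look something like this (the ExecutorService wiring and the JobNotifier callback are assumptions for the sketch, not the actual ArchiveRequestGenerator API):

import java.util.UUID;
import java.util.concurrent.ExecutorService;

class LongRunningJobSketch {

    interface JobNotifier {
        // hypothetical notification emitter, e.g. backed by the existing WebSocket notification channel
        void notifyComplete(String jobId, boolean success);
    }

    private final ExecutorService executor;
    private final JobNotifier notifier;

    LongRunningJobSketch(ExecutorService executor, JobNotifier notifier) {
        this.executor = executor;
        this.notifier = notifier;
    }

    // Submit the long-running work (archive, Grafana upload, ...) and return
    // immediately with a job ID; the HTTP layer can hand this back in a 202.
    String submit(Runnable longRunningTask) {
        String jobId = UUID.randomUUID().toString();
        executor.submit(() -> {
            boolean success = true;
            try {
                longRunningTask.run();
            } catch (RuntimeException e) {
                success = false;
            }
            // emit the completion notification carrying the job ID so the client knows when to refresh
            notifier.notifyComplete(jobId, success);
        });
        return jobId;
    }
}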
@Inject private RecordingHelper recordingHelper;
@Inject ReportsService reportsService;

private Map<String, Map<String, AnalysisResult>> jobResults;
I would prefer to avoid adding something like this that hangs on to the report generation results in memory. The report generation system already has tiered in-memory and S3-backed caching, so it'd be best to lean on that.
For archiving recordings, or for uploading recording files to jfr-datasource, it's a simpler design because there is no response payload to send back to the client - the client just needs to know that the job has been completed and whether it completed successfully.
For report generation, where it takes time and the client not only needs to know that it has completed and whether it was a success, but also needs a response payload (the report result), I think an API that behaves something like Map.computeIfAbsent() would be best. I'm thinking that the ReportsService interface should gain methods for checking whether a result is already present for a given key. The default implementation would trivially return false, and the two caching tier implementations would check their respective caches and delegates.
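For concreteness, a rough sketch of what that could look like (the key type, return type, and the simplified AnalysisResult stand-in here are assumptions, not the real Cryostat signatures):

import java.util.Map;
import java.util.concurrent.CompletionStage;

interface ReportsServiceSketch {

    // simplified stand-in for the real AnalysisResult type referenced elsewhere in this PR
    record AnalysisResult(String name, double score) {}

    // existing-style entry point: load or compute the analysis for a recording key
    CompletionStage<Map<String, AnalysisResult>> reportFor(String key);

    // proposed presence check: the default implementation trivially reports a miss,
    // while each caching tier overrides it to consult its own cache and then its delegate
    default boolean keyExists(String key) {
        return false;
    }
}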
HTTP 200 responses would contain the Map<String, AnalysisResult> as the response body for the client to use. HTTP 202 responses would contain a Job UUID in the response body, maybe a Location header to tell the client the URL where to find the result when it's ready (1), and that's it. Later, the completion notification containing the Job UUID would be emitted. When the client receives the notification containing that UUID it sends a follow-up GET to the Location header URL, and this time around the result is ready, so the response is an HTTP 200 with the expected data. This way, the existing tiered caching infrastructure is reused and there are no new endpoints introduced, only different response status behaviours on the existing endpoints.
(1) Right now, the URL would be the same URL that the request was sent to initially, e.g. /api/v4/reports/{encodedKey}
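And a hedged sketch of that 200/202 endpoint behaviour in JAX-RS style (the nested interfaces are stand-ins for illustration, not the actual Cryostat resource or service code):

import java.net.URI;
import java.util.Map;
import jakarta.ws.rs.core.Response;

class ReportEndpointSketch {

    // minimal stand-in for the real ReportsService discussed above
    interface ReportsService {
        boolean keyExists(String encodedKey);         // proposed presence check
        Map<String, Object> cachedReport(String key); // assumed cached-result lookup
    }

    // hypothetical helper that kicks off async generation and returns a job UUID
    interface JobSubmitter {
        String submit(String encodedKey);
    }

    private final ReportsService reports;
    private final JobSubmitter jobs;

    ReportEndpointSketch(ReportsService reports, JobSubmitter jobs) {
        this.reports = reports;
        this.jobs = jobs;
    }

    Response getReport(String encodedKey) {
        if (reports.keyExists(encodedKey)) {
            // cache hit at some tier: HTTP 200 with the analysis map as the body
            return Response.ok(reports.cachedReport(encodedKey)).build();
        }
        // cache miss: start generation, return HTTP 202 with the job UUID and a
        // Location header pointing back at this same URL for the follow-up GET
        String jobId = jobs.submit(encodedKey);
        return Response.accepted(jobId)
                .location(URI.create("/api/v4/reports/" + encodedKey))
                .build();
    }
}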
Added a WIP implementation of the suggested design: it returns a 200 with the analysis result if the storage or Quarkus cache already has the report, and delegates to the generator otherwise. The web client still needs to be updated to react to the notifications, which is what I'll work on next; for now, on a cache miss it gets the 202 response with the Location URL to check once the notification arrives.
@Override
public boolean keyExists(ActiveRecording recording) {
    String key = ReportsService.key(recording);
    return !Objects.isNull(activeCache.as(CaffeineCache.class).getIfPresent(key));
}
I think we need to check delegate.keyExists(recording) here as well (for both signatures, and both cache layers).
Mostly this would support the case where an archived report is cached in storage (S3) but has already been dropped from the in-memory cache. The reportFor implementation should already handle the tiered loading/computing structure, so the keyExists check should mirror the same behaviour - we can consider the key present if it is in either cache level, and retrieve the result from either level.
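Roughly something like the following, reusing the identifiers from the snippet above (the delegate field and its keyExists signature are assumed for the sketch):

@Override
public boolean keyExists(ActiveRecording recording) {
    String key = ReportsService.key(recording);
    // present in the in-memory Caffeine cache?
    boolean inMemory = !Objects.isNull(activeCache.as(CaffeineCache.class).getIfPresent(key));
    // otherwise fall through to the next tier (e.g. the S3-backed storage cache),
    // so a report evicted from memory still counts as present
    return inMemory || delegate.keyExists(recording);
}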
/build_test
Workflow started at 11/21/2024, 5:30:33 PM. View Actions Run.
OpenAPI schema change detected: diff --git a/schema/openapi.yaml b/schema/openapi.yaml
index 4b168c9..8c8c378 100644
--- a/schema/openapi.yaml
+++ b/schema/openapi.yaml
@@ -1,25 +1,13 @@
---
components:
schemas:
- AnalysisResult:
- properties:
- evaluation:
- $ref: '#/components/schemas/Evaluation'
- name:
- type: string
- score:
- format: double
- type: number
- topic:
- type: string
- type: object
Annotations:
properties:
cryostat:
additionalProperties:
type: string
type: object
platform:
additionalProperties:
type: string
type: object
@@ -232,33 +220,20 @@ components:
format: uri
type: string
id:
$ref: '#/components/schemas/UUID_Flat'
realm:
$ref: '#/components/schemas/DiscoveryNode_Flat'
required:
- id
- realm
type: object
- Evaluation:
- properties:
- explanation:
- type: string
- solution:
- type: string
- suggestions:
- items:
- $ref: '#/components/schemas/Suggestion'
- type: array
- summary:
- type: string
- type: object
Event:
properties:
clazz:
type: string
description:
type: string
fields:
items:
$ref: '#/components/schemas/Field'
type: array
@@ -302,20 +277,30 @@ components:
relationKey:
type: string
type: object
FileUpload:
type: object
GitInfo:
properties:
hash:
type: string
type: object
+ HttpServerResponse:
+ properties:
+ chunked:
+ type: boolean
+ statusCode:
+ format: int32
+ type: integer
+ statusMessage:
+ type: string
+ type: object
Instant:
example: 2022-03-10T16:15:50Z
format: date-time
type: string
JsonObject:
items:
properties:
key:
type: string
value: {}
@@ -547,29 +532,20 @@ components:
type: object
SerializableOptionDescriptor:
properties:
defaultValue:
type: string
description:
type: string
name:
type: string
type: object
- Suggestion:
- properties:
- name:
- type: string
- setting:
- type: string
- value:
- type: string
- type: object
Target:
properties:
agent:
readOnly: true
type: boolean
alias:
pattern: \S
type: string
annotations:
$ref: '#/components/schemas/Annotations'
@@ -1218,22 +1194,27 @@ paths:
tags:
- Event Templates
/api/v4/grafana/{encodedKey}:
post:
parameters:
- in: path
name: encodedKey
required: true
schema:
type: string
+ requestBody:
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/HttpServerResponse'
responses:
- "202":
+ "200":
content:
text/plain:
schema:
type: string
description: OK
"401":
description: Not Authorized
"403":
description: Not Allowed
security:
@@ -1449,34 +1430,27 @@ paths:
tags:
- Archived Recordings
/api/v4/reports/{encodedKey}:
get:
parameters:
- in: path
name: encodedKey
required: true
schema:
type: string
+ requestBody:
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/HttpServerResponse'
responses:
- "202":
- content:
- text/plain:
- schema:
- type: string
- description: OK
"200":
- content:
- application/json:
- schema:
- additionalProperties:
- $ref: '#/components/schemas/AnalysisResult'
- type: object
description: OK
"401":
description: Not Authorized
"403":
description: Not Allowed
security:
- SecurityScheme: []
tags:
- Reports
/api/v4/rules:
@@ -2126,59 +2100,60 @@ paths:
format: int64
type: integer
- in: path
name: targetId
required: true
schema:
format: int64
type: integer
requestBody:
content:
- text/plain:
+ application/json:
schema:
- type: string
+ $ref: '#/components/schemas/HttpServerResponse'
responses:
- "202":
+ "200":
content:
text/plain:
schema:
type: string
description: OK
"401":
description: Not Authorized
"403":
description: Not Allowed
- "404":
- description: Not Found
- "502":
- description: Target not Reachable
security:
- SecurityScheme: []
tags:
- Active Recordings
/api/v4/targets/{targetId}/recordings/{remoteId}/upload:
post:
parameters:
- in: path
name: remoteId
required: true
schema:
format: int64
type: integer
- in: path
name: targetId
required: true
schema:
format: int64
type: integer
+ requestBody:
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/HttpServerResponse'
responses:
- "202":
+ "200":
content:
text/plain:
schema:
type: string
description: OK
"401":
description: Not Authorized
"403":
description: Not Allowed
security:
@@ -2193,34 +2168,27 @@ paths:
required: true
schema:
format: int64
type: integer
- in: path
name: targetId
required: true
schema:
format: int64
type: integer
+ requestBody:
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/HttpServerResponse'
responses:
"200":
- content:
- application/json:
- schema:
- additionalProperties:
- $ref: '#/components/schemas/AnalysisResult'
- type: object
- description: OK
- "202":
- content:
- text/plain:
- schema:
- type: string
description: OK
"401":
description: Not Authorized
"403":
description: Not Allowed
security:
- SecurityScheme: []
tags:
- Reports
/api/v4/targets/{targetId}/snapshot:
No GraphQL schema changes detected.
Schema changes committed by the CI.
CI build and push: At least one test failed ❌
Hi,
This adds a prototype for long-running API operations (the first step of #286). The workflow proceeds as described in the issue.
TODO: the web client needs to be adjusted to account for the new notifications, refreshing the tables when it receives them.
TODO: implement the same framework for Grafana uploads of active/archived recordings, as well as report generation for active/archived recordings.