
Refactor the Run upload process #2172

Open
lampajr wants to merge 3 commits into master from run_upload_refactoring
Conversation

@lampajr (Member) commented Nov 15, 2024

Refactor the process such that runs and related datasets get created/persisted synchronously, whereas the dataset processing (label values calculation and so on) is deferred to the async process by making use of the dataset-event queue.

Fixes Issue

Fixes #1976

Changes proposed

This change proposes to slightly update how the Run upload is managed, moving from a completely synchronous process to a hybrid one.

The process can be summarized as follows:

  1. Persist the Run and any related datasets in the same transaction (assuming everything goes fine, of course).
  2. Once the run and datasets are flushed, send dataset events to the dataset-event queue so that label values (and change point detection and so on) are processed asynchronously.
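The two steps above can be sketched in a framework-free way. This is a hypothetical illustration of the pattern, not Horreum's actual code: the `Run`, `Dataset`, `DatasetEvent`, and `persist` names are invented, and a `BlockingQueue` stands in for the AMQ-backed dataset-event queue.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RunUploadSketch {
    record Dataset(int id) {}
    record Run(int id, List<Dataset> datasets) {}
    record DatasetEvent(int datasetId) {}

    // Stand-in for the dataset-event queue (backed by AMQ in the real system).
    static final BlockingQueue<DatasetEvent> datasetEventQueue = new LinkedBlockingQueue<>();

    static int uploadRun(Run run) {
        // Step 1: persist the run and its datasets synchronously
        // (in the real system this is a single DB transaction).
        persist(run);
        // Step 2: after the flush succeeds, enqueue one event per dataset;
        // label-value calculation happens asynchronously on the consumer side.
        for (Dataset ds : run.datasets()) {
            datasetEventQueue.add(new DatasetEvent(ds.id()));
        }
        // The caller gets the run id back immediately, without waiting
        // for the label values to be computed.
        return run.id();
    }

    static void persist(Run run) { /* DB write elided in this sketch */ }

    public static void main(String[] args) {
        int id = uploadRun(new Run(42, List.of(new Dataset(1), new Dataset(2))));
        System.out.println(id + " " + datasetEventQueue.size()); // 42 2
    }
}
```

The key property is that a failure in step 1 rolls back both the run and its datasets, while step 2 only runs once the data is safely persisted.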

TODOs/Open points:

  • What to do with the datastore uploads: should we follow the same process even if more than 10 runs are passed?
  • Change the API to always return 202 when the process succeeds, together with the runId that was created.
  • Adapt tests
  • Adapt the UI (nothing to do here)

Check List (Check all the applicable boxes)

  • My code follows the code style of this project.
  • My change requires changes to the documentation.
  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@lampajr lampajr force-pushed the run_upload_refactoring branch 4 times, most recently from 40b5218 to 95d8918 Compare November 18, 2024 11:46
@lampajr (Member, Author) commented Nov 18, 2024

The behavior will change as follows:

  • Uploading a new Run
    • If the datastore has more than 10 runs, all of them will be processed asynchronously and the API will return 202 with an empty array.
    • If there are fewer than 10 runs, the process will persist the runs and their datasets and queue the dataset recalculation so that it is processed asynchronously. The API will return the list of run ids.
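The response rule described above can be sketched as follows. This is a hypothetical, self-contained illustration: the `ASYNC_THRESHOLD` constant, the `Response` record, and the `upload` method are invented names, not Horreum's actual API.

```java
import java.util.List;
import java.util.stream.IntStream;

public class UploadResponseSketch {
    // Assumed threshold from the comment above: more than 10 runs -> fully async.
    static final int ASYNC_THRESHOLD = 10;

    record Response(int status, List<Integer> runIds) {}

    static Response upload(List<Integer> runIds) {
        if (runIds.size() > ASYNC_THRESHOLD) {
            // Everything is deferred: the API answers 202 with an empty array.
            return new Response(202, List.of());
        }
        // Runs and datasets persisted synchronously (elided), dataset
        // recalculation queued for async processing (elided); the ids of
        // the persisted runs are returned to the caller.
        return new Response(202, List.copyOf(runIds));
    }

    public static void main(String[] args) {
        System.out.println(upload(IntStream.rangeClosed(1, 12).boxed().toList()).runIds()); // []
        System.out.println(upload(List.of(1, 2, 3)).runIds()); // [1, 2, 3]
    }
}
```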

Given that the transform() method is also called by other processes, if isRecalculation is set to true (which is not the case for a new Run upload) the dataset recalculation is processed synchronously.
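That branch can be sketched as below. This is a hypothetical stand-in, not Horreum's transform() signature: the queues and recalcLabelValues are invented names used only to show the sync-versus-queued split driven by the isRecalculation flag.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class TransformSketch {
    // Stand-ins for the dataset-event queue and the sync recalculation path.
    static final Queue<Integer> queued = new ArrayDeque<>();
    static final Queue<Integer> processedSync = new ArrayDeque<>();

    static void transform(int datasetId, boolean isRecalculation) {
        if (isRecalculation) {
            // Explicit recalculation requested by another flow: stay synchronous.
            recalcLabelValues(datasetId);
        } else {
            // New Run upload: defer the label-value work to the queue.
            queued.add(datasetId);
        }
    }

    static void recalcLabelValues(int datasetId) { processedSync.add(datasetId); }

    public static void main(String[] args) {
        transform(1, false);
        transform(2, true);
        System.out.println(queued + " " + processedSync); // [1] [2]
    }
}
```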

@lampajr lampajr force-pushed the run_upload_refactoring branch 3 times, most recently from f1661fe to 5fdd97d Compare November 19, 2024 10:00
@lampajr lampajr marked this pull request as ready for review November 19, 2024 11:16
@lampajr (Member, Author) commented Nov 19, 2024

Given that with the new implementation uploading a run is no longer fully synchronous, I had to change some tests to explicitly recalculate the labelValues.

Moreover, I created a different in-memory resource for AMQ so that we can skip processing events in the AMQ broker when not actually required, especially because they would throw errors since they do not run in the same transaction as the test.

Refactor the process such that runs and related datasets
get created/persisted synchronously whereas the dataset
processing (label values calculation and so on) is deferred
to the async process by making use of the dataset-event queue

Signed-off-by: Andrea Lamparelli <a.lamparelli95@gmail.com>
Signed-off-by: Andrea Lamparelli <a.lamparelli95@gmail.com>
With this resource we can have more control over async processing
and avoid async processing entirely when it is not needed

Signed-off-by: Andrea Lamparelli <a.lamparelli95@gmail.com>
@johnaohara (Member) commented Nov 21, 2024

Moreover, I created a different in-memory resource for AMQ

What do you mean by this? What happens in the event of a crash? We have an external AMQ instance to persist any messages that have not been processed, so we can recover in the event of a crash. Why do we need an in-memory resource?

@lampajr (Member, Author) commented Nov 21, 2024

Moreover, I created a different in-memory resource for AMQ

What do you mean by this? What happens in the event of a crash? We have an external AMQ instance to persist any messages that have not been processed, so we can recover in the event of a crash. Why do we need an in-memory resource?

Sorry, I should have mentioned that this is just for testing purposes, see horreum-backend/src/test/java/io/hyperfoil/tools/horreum/test/AMQPInMemoryResource.java. All tests that do not require or do not check async processing can use it. This avoids seeing some exceptions during the tests that are "expected" but can cause confusion.

@johnaohara (Member)

Sorry, I should have mentioned that this is just for testing purposes, see horreum-backend/src/test/java/io/hyperfoil/tools/horreum/test/AMQPInMemoryResource.java. All tests that do not require or do not check async processing can use it. This avoids seeing some exceptions during the tests that are "expected" but can cause confusion.

ok, thanks for clarifying

Successfully merging this pull request may close these issues:

Transaction killed while uploading a run