Currently, the average processing Rate for a benchmark is determined from multiple nightly runs. However, variability in an operation's benchmark can come from several sources: source code changes, platform changes, and operational rate fluctuations.
Source Code Changes: Any change to the codebase (especially the DH engine) can affect an operation even if that operation was not the target of the change.
Platform Changes: Hardware changes, JVM version, Docker version, dependency versions, etc. can all affect Rates without any code changes. (Currently, the benchmarks are run on the same bare metal hardware every night.)
Operational Rate Changes: Individual operational rates can swing wildly even when neither source code nor platform changes have occurred.
Possible methods of doing iterations (at least for the operations that show the worst variability when run back-to-back):
Provision more servers: Start with 3, but make the count configurable, and take the median run along with its metrics (see the sketch after this option's pros/cons)
Pros: Iterations for all standard operations
Cons: More expensive
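A minimal sketch of the median-run idea, assuming each provisioned server reports a rate for the same operation; the class and field names here are hypothetical, not part of the existing benchmark code:

```java
import java.util.Comparator;
import java.util.List;

// One benchmark run's published rate from one server (hypothetical shape).
record RunResult(String serverHost, String operation, double rate) {}

class MedianRunSelector {
    // Returns the run whose Rate is the median across all provisioned servers,
    // so a single outlier server does not skew the nightly number.
    static RunResult medianRun(List<RunResult> runs) {
        List<RunResult> sorted = runs.stream()
                .sorted(Comparator.comparingDouble(RunResult::rate))
                .toList();
        return sorted.get(sorted.size() / 2); // upper median for even counts
    }
}
```

Taking the whole median run (rather than averaging rates across servers) keeps the published Rate tied to one concrete run whose logs and metrics can be inspected.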
Iterate on worst offenders: After the nightly run, identify the operations with the worst variability and iterate on only those (see the sketch after this option's pros/cons)
Pros: Cheaper. Keeps some operationally variable benchmarks off the worst-offenders list
Cons: Some operations are iterated. Others are not. Yet all are compared.
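A rough sketch of how the worst offenders could be ranked after a nightly run, using the coefficient of variation of each operation's back-to-back rates; the class, method names, and cutoff parameter are assumptions for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class WorstOffenders {
    // Ranks operations by coefficient of variation (stddev / mean) of their
    // back-to-back rates and returns the topN most variable ones to re-run.
    static List<String> pick(Map<String, List<Double>> ratesByOperation, int topN) {
        return ratesByOperation.entrySet().stream()
                .sorted((a, b) -> Double.compare(cv(b.getValue()), cv(a.getValue())))
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Coefficient of variation: relative spread that is comparable across
    // operations with very different absolute rates.
    static double cv(List<Double> rates) {
        double mean = rates.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = rates.stream()
                .mapToDouble(r -> (r - mean) * (r - mean)).average().orElse(0);
        return mean == 0 ? 0 : Math.sqrt(variance) / mean;
    }
}
```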
Notes:
Logging: Docker logs for nightly runs are per-operation
Should they be co-located with the test results so that when the median run is selected, its logs come with it?
Multi-server Failures: If a test run fails on one server but not on the others,
do we fail the whole run, or use only the successful runs?
Change the origin column to include the server host as well as the component?
Add a result column that includes the iteration count? (A rough schema sketch follows these notes.)
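One way the last two notes could land in a result row, purely as an illustration; the field/column names below are assumptions, not the project's actual result schema:

```java
// Illustrative result-row shape: origin carries component plus server host,
// and an iteration count records how many back-to-back runs produced the
// published (e.g. median) rate.
record BenchmarkResult(
        String operation,   // benchmark/operation name
        String origin,      // component plus server host, e.g. "deephaven-engine@bench-host-1"
        int iterations,     // number of back-to-back runs behind this result
        double rate) {}     // processing Rate selected across those iterations
```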
Most of this ticket is solved by #317. That approach is not as ambitious, but it does the job of providing a way to add extra iterations for problematic benchmarks. The rest will be done with Auto-Provisioning Benchmark Hardware.