Currently, the average processing Rate for a benchmark is determined from multiple nightly runs. However, variability in an operation's benchmark can come from several sources: source code changes, platform changes, and operational rate fluctuations.
Source Code Changes: Any change to the codebase (especially the DH engine) can affect an operation even if that operation was not the target of the change.
Platform Changes: Hardware changes, JVM version, Docker version, dependency versions, etc. can all affect Rates without any code changes. (Currently, the benchmarks are run on the same bare metal hardware every night.)
Operational Rate Changes: Individual operational rates can swing wildly even when neither source code nor platform changes have occurred.
Possible methods of doing iterations (at least for the operations that show the worst variability when run back-to-back):
Provision more servers: Start with 3, but make the count configurable, and take the median run along with its metrics (see the sketch after this option's pros/cons)
Pros: Iterations for all standard operations
Cons: More expensive
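A minimal sketch of the median-run idea, assuming each provisioned server reports a rate for the same operation; the class and field names here are hypothetical, not part of the existing benchmark code:

```java
import java.util.Comparator;
import java.util.List;

// One benchmark run's published rate from one server (hypothetical shape).
record RunResult(String serverHost, String operation, double rate) {}

class MedianRunSelector {
    // Returns the run whose Rate is the median across all provisioned servers,
    // so a single outlier server does not skew the nightly number.
    static RunResult medianRun(List<RunResult> runs) {
        List<RunResult> sorted = runs.stream()
                .sorted(Comparator.comparingDouble(RunResult::rate))
                .toList();
        return sorted.get(sorted.size() / 2); // upper median for even counts
    }
}
```

Taking the whole median run (rather than averaging rates across servers) keeps the published Rate tied to one concrete run whose logs and metrics can be inspected.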
Iterate on worst offenders: After the nightly run, identify the operations with the worst variability and iterate on only those (see the sketch after this option's pros/cons)
Pros: Cheaper. Keeps some operationally variable benchmarks off the worst-offenders list
Cons: Some operations are iterated. Others are not. Yet all are compared.
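A rough sketch of how the worst offenders could be ranked after a nightly run, using the coefficient of variation of each operation's back-to-back rates; the class, method names, and cutoff parameter are assumptions for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class WorstOffenders {
    // Ranks operations by coefficient of variation (stddev / mean) of their
    // back-to-back rates and returns the topN most variable ones to re-run.
    static List<String> pick(Map<String, List<Double>> ratesByOperation, int topN) {
        return ratesByOperation.entrySet().stream()
                .sorted((a, b) -> Double.compare(cv(b.getValue()), cv(a.getValue())))
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Coefficient of variation: relative spread that is comparable across
    // operations with very different absolute rates.
    static double cv(List<Double> rates) {
        double mean = rates.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = rates.stream()
                .mapToDouble(r -> (r - mean) * (r - mean)).average().orElse(0);
        return mean == 0 ? 0 : Math.sqrt(variance) / mean;
    }
}
```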
Notes:
Logging: Docker logs for nightly runs are per-operation
Should they be co-located with the test results so that when the median run is selected, its logs come with it?
Multi-server Failures: If a test run fails on one server but not on the others,
do we fail the whole run, or use only the successful runs?
Change the origin column to include the server host as well as the component?
Add a result column that includes the iteration count? (A rough schema sketch follows these notes.)
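One way the last two notes could land in a result row, purely as an illustration; the field/column names below are assumptions, not the project's actual result schema:

```java
// Illustrative result-row shape: origin carries component plus server host,
// and an iteration count records how many back-to-back runs produced the
// published (e.g. median) rate.
record BenchmarkResult(
        String operation,   // benchmark/operation name
        String origin,      // component plus server host, e.g. "deephaven-engine@bench-host-1"
        int iterations,     // number of back-to-back runs behind this result
        double rate) {}     // processing Rate selected across those iterations
```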
Most of this ticket is solved by #317. That approach is not as ambitious, but it does the job of providing a way to add extra iterations for problematic benchmarks. The rest will be done with Auto-Provisioning Benchmark Hardware.