Disable memory benchmarking (#589)
Summary:
Our tests have been red for a while due to failing memory benchmarks.

## Issue

When benchmarking Opacus, we run the training script multiple times within one process:

```python
for i in range(args.num_runs):
    run_layer_benchmark(
        ...
    )
```

We use built-in PyTorch tools to check memory stats. Crucially, we verify that `torch.cuda.memory_allocated()` is 0 before each run starts. Normally it should be, since all tensors from the previous run are out of scope and should have been garbage-collected.
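
For illustration, the pre-run check is conceptually along these lines (a minimal sketch; the helper name is hypothetical and not part of the benchmark code):

```python
import torch

def assert_clean_gpu_state() -> None:
    # Before a run starts, no tensor from a previous run should still hold GPU memory.
    allocated = torch.cuda.memory_allocated()
    assert allocated == 0, f"expected 0 bytes allocated before the run, got {allocated}"
    # Reset the peak-memory counter so this run's peak is measured in isolation.
    torch.cuda.reset_peak_memory_stats()
```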

This all worked fine until something changed and some GPU memory started staying allocated between runs. I have no idea why, and explicit cache clearing and object deletion didn't help.
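
For reference, the between-run cleanup that was attempted looks roughly like the sketch below (hypothetical function name; neither step brought `torch.cuda.memory_allocated()` back to 0 in this case):

```python
import gc

import torch

def try_release_gpu_memory() -> int:
    # Best-effort cleanup between runs; returns the bytes still allocated.
    # (Explicit `del` of the previous run's model/optimizer objects was also tried.)
    gc.collect()              # collect any unreachable Python objects holding tensors
    torch.cuda.empty_cache()  # release cached, unused blocks back to the CUDA driver
    return torch.cuda.memory_allocated()
```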

So I gave up and disabled memory benchmarking for now; it appears to have been broken by some PyTorch update, and fixing it doesn't look straightforward.

Pull Request resolved: #589

Reviewed By: JohnlNguyen

Differential Revision: D45691684

Pulled By: karthikprasad

fbshipit-source-id: 82006e503240532840d3fb6dc0314f2202780973
Igor Shilov authored and facebook-github-bot committed Aug 1, 2023
1 parent e8bc932 commit 7b28054
Showing 2 changed files with 1 addition and 3 deletions.
1 change: 0 additions & 1 deletion .circleci/config.yml
```diff
@@ -273,7 +273,6 @@ commands:
             python benchmarks/generate_report.py --path-to-results /tmp/report_layers --save-path benchmarks/results/report-${report_id}.pkl --format pkl
             python benchmarks/check_threshold.py --report-path "./benchmarks/results/report-"$report_id".pkl" --metric runtime --threshold <<parameters.runtime_ratio_threshold>> --column <<parameters.report_column>>
-            python benchmarks/check_threshold.py --report-path "./benchmarks/results/report-"$report_id".pkl" --metric memory --threshold <<parameters.memory_ratio_threshold>> --column <<parameters.report_column>>
           when: always
       - store_artifacts:
           path: benchmarks/results/
```
3 changes: 1 addition & 2 deletions benchmarks/utils.py
```diff
@@ -230,7 +230,7 @@ def generate_report(path_to_results: str, save_path: str, format: str) -> None:
     pivot = results.pivot_table(
         index=["batch_size", "num_runs", "num_repeats", "forward_only", "layer"],
         columns=["gsm_mode"],
-        values=["runtime", "memory"],
+        values=["runtime"],
     )
 
     def add_ratio(df, metric, variant):
@@ -245,7 +245,6 @@ def add_ratio(df, metric, variant):
     if "baseline" in results["gsm_mode"].tolist():
         for m in set(results["gsm_mode"].tolist()) - {"baseline"}:
             add_ratio(pivot, "runtime", m)
-            add_ratio(pivot, "memory", m)
     pivot.columns = pivot.columns.set_names("value", level=1)
 
     output = pivot.sort_index(axis=1).sort_values(
```
