
🐛[BUG]: LBFGS optimizer doesn't work for PINN training #492

Closed
hasethinvd opened this issue May 9, 2024 · 2 comments
Labels: ? - Needs Triage, bug

hasethinvd (Contributor) commented May 9, 2024

Version

24.01

On which installation method(s) does this occur?

Docker, Pip, Source

Describe the issue

After setting the optimizer to bfgs in the config file, max_steps is overridden to 0.

Minimum reproducible example

#config
defaults :
  - modulus_default
  - arch:
      - fourier
      - modified_fourier
      - fully_connected
      - multiscale_fourier
  - scheduler: tf_exponential_lr
  - optimizer: bfgs
  - loss: sum


training:
  rec_results_freq: 1000
  max_steps : 150000

Relevant log output

[23:53:04] - lbfgs optimizer selected. Setting max_steps to 0
[23:53:05] - [step:     100000] lbfgs optimization in running
Error executing job with overrides: []
Traceback (most recent call last):
  File "/mount/data/test/eikonal/eikonal.py", line 313, in run
    slv.solve()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/solver/solver.py", line 173, in solve
    self._train_loop(sigterm_handler)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 543, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 730, in _cuda_graph_training_step
    self.apply_gradients()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 185, in bfgs_apply_gradients
    self.optimizer.step(self.bfgs_closure_func)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 379, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lbfgs.py", line 298, in step
    max_iter = group['max_iter']
KeyError: 'max_iter'
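
Editor's note for context: in plain PyTorch (an assumption here, not the Modulus-Sym code path), a freshly constructed LBFGS optimizer always carries the max_iter key in its parameter group, so the KeyError suggests the group the trainer ended up with was not built by LBFGS itself:

# Minimal sketch, plain PyTorch only (not Modulus-Sym internals): a fresh
# LBFGS param group contains 'max_iter', so its absence points to the group
# having been created or restored by something else.
import torch

p = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.LBFGS([p], max_iter=1000)
print(sorted(opt.param_groups[0].keys()))
# includes 'max_iter' (the exact key set may vary by PyTorch version)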

Environment details

No response

hasethinvd added the ? - Needs Triage and bug labels on May 9, 2024
avidcoder123 commented

This issue is still active and needs fixing.

ktangsali self-assigned this on Oct 15, 2024
ktangsali (Collaborator) commented

This is expected behavior of the LBFGS optimizer in Modulus-Sym: when LBFGS is selected, Modulus-Sym sets max_steps to zero. If training is started from scratch, this issue should not show up and training should run successfully. Reference:

[18:49:00] - attempting to restore from: outputs/helmholtz
[18:49:00] - optimizer checkpoint not found
[18:49:00] - model wave_network.0.pth not found
[18:49:00] - lbfgs optimizer selected. Setting max_steps to 0
/usr/local/lib/python3.10/dist-packages/modulus/sym/eq/derivatives.py:120: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
  with torch.cuda.amp.autocast(enabled=False):
[18:49:00] - [step:          0] lbfgs optimization in running
[18:49:58] - lbfgs optimization completed after 1000 steps
[18:49:58] - [step:          0] record constraint batch time:  5.987e-02s
[18:50:00] - [step:          0] record validators time:  2.309e+00s
[18:50:01] - [step:          0] saved checkpoint to outputs/helmholtz
[18:50:01] - [step:          0] loss:  1.007e+04
[18:50:01] - [step:          0] reached maximum training steps, finished training!
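
To make this concrete, here is a minimal plain-PyTorch sketch (an illustration, not the Modulus-Sym trainer code): torch.optim.LBFGS is closure-based, so a single call to optimizer.step(closure) runs up to max_iter internal iterations. That is why the whole optimization above is reported under [step: 0] and max_steps can be set to zero.

# Minimal sketch, plain PyTorch only: one outer step drives the full LBFGS
# optimization through the closure.
import torch

param = torch.nn.Parameter(torch.randn(10))
optimizer = torch.optim.LBFGS([param], max_iter=1000)  # 1000 mirrors the log above

def closure():
    optimizer.zero_grad()
    loss = (param ** 2).sum()  # stand-in for the PINN loss
    loss.backward()
    return loss

optimizer.step(closure)  # up to 1000 LBFGS iterations inside one outer step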

However, the above error occurs if you switch the optimizer in the middle of training, for example going from adam to bfgs after a few steps. While this is technically possible, Modulus-Sym does not currently allow such workflows. For such cases, it is recommended to check the main Modulus library.
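
As a rough illustration of why the mid-training switch fails (an assumption about the restore path, not the actual Modulus-Sym code): if an optimizer checkpoint written by Adam is loaded into a torch.optim.LBFGS instance, the restored parameter group no longer contains LBFGS-specific keys such as max_iter, which reproduces the KeyError from the traceback above.

# Hypothetical repro, NOT the Modulus-Sym restore code: loading an Adam
# checkpoint into LBFGS drops LBFGS-specific keys from the param group.
import torch

param = torch.nn.Parameter(torch.randn(3))

adam = torch.optim.Adam([param], lr=1e-3)
adam_state = adam.state_dict()      # param_groups carry Adam keys only

lbfgs = torch.optim.LBFGS([param])
lbfgs.load_state_dict(adam_state)   # param group now lacks 'max_iter'

def closure():
    lbfgs.zero_grad()
    loss = (param ** 2).sum()
    loss.backward()
    return loss

lbfgs.step(closure)                 # raises KeyError: 'max_iter'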
