[23:53:04] - lbfgs optimizer selected. Setting max_steps to 0
[23:53:05] - [step: 100000] lbfgs optimization in running
Error executing job with overrides: []
Traceback (most recent call last):
  File "/mount/data/test/eikonal/eikonal.py", line 313, in run
    slv.solve()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/solver/solver.py", line 173, in solve
    self._train_loop(sigterm_handler)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 543, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 730, in _cuda_graph_training_step
    self.apply_gradients()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 185, in bfgs_apply_gradients
    self.optimizer.step(self.bfgs_closure_func)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 379, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lbfgs.py", line 298, in step
    max_iter = group['max_iter']
KeyError: 'max_iter'
This is expected behavior of the LBFGS optimizer in Modulus-Sym: when bfgs is selected, Modulus-Sym sets max_steps to zero and runs the entire L-BFGS optimization inside step 0. If training starts from scratch (no earlier checkpoint to restore), this issue does not show up and training completes successfully, as the reference log below shows; a short sketch for clearing a stale checkpoint directory follows the log. Reference:
[18:49:00] - attempting to restore from: outputs/helmholtz
[18:49:00] - optimizer checkpoint not found
[18:49:00] - model wave_network.0.pth not found
[18:49:00] - lbfgs optimizer selected. Setting max_steps to 0
/usr/local/lib/python3.10/dist-packages/modulus/sym/eq/derivatives.py:120: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
[18:49:00] - [step: 0] lbfgs optimization in running
[18:49:58] - lbfgs optimization completed after 1000 steps
[18:49:58] - [step: 0] record constraint batch time: 5.987e-02s
[18:50:00] - [step: 0] record validators time: 2.309e+00s
[18:50:01] - [step: 0] saved checkpoint to outputs/helmholtz
[18:50:01] - [step: 0] loss: 1.007e+04
[18:50:01] - [step: 0] reached maximum training steps, finished training!
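Here "from scratch" means there is nothing under the run's output directory for Modulus-Sym to restore (note the "attempting to restore from: outputs/helmholtz" line above). Below is a minimal sketch for clearing a stale run before selecting bfgs; the outputs/helmholtz path is taken from the reference log and is an assumption for your setup:

```python
import shutil

# Delete old checkpoints so Modulus-Sym cannot try to restore an
# Adam-era optimizer state into a freshly constructed LBFGS optimizer.
# "outputs/helmholtz" is the directory from the reference log above;
# replace it with your example's output directory.
shutil.rmtree("outputs/helmholtz", ignore_errors=True)
```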
However, the above error occurs if you switch optimizers in the middle of training, for example going from adam to bfgs after a few steps. While this is technically possible, Modulus-Sym does not currently support such workflows; for such cases, it is recommended to check the main Modulus library.
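For intuition, the same failure can be reproduced in plain PyTorch, outside Modulus-Sym. This is a sketch under the assumption that restoring the checkpoint boils down to optimizer.load_state_dict: loading an Adam checkpoint's param groups into LBFGS replaces the LBFGS param group, which loses LBFGS-specific keys such as max_iter, and the next step() call then fails exactly as in the traceback above:

```python
import torch

model = torch.nn.Linear(2, 1)

# A run is started with Adam and its optimizer state is checkpointed.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
adam_checkpoint = adam.state_dict()

# The config is then switched to bfgs and the old checkpoint restored.
# load_state_dict swaps LBFGS's param groups for Adam's, which carry
# 'lr' but not 'max_iter', 'history_size', etc.
lbfgs = torch.optim.LBFGS(model.parameters())
lbfgs.load_state_dict(adam_checkpoint)

def closure():
    lbfgs.zero_grad()
    loss = model(torch.randn(4, 2)).pow(2).mean()
    loss.backward()
    return loss

lbfgs.step(closure)  # raises KeyError: 'max_iter'
```

Starting the bfgs run with a clean output directory (or keeping a single optimizer for the whole run) avoids the mismatched param-group keys entirely.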
Version
24.01
On which installation method(s) does this occur?
Docker, Pip, Source
Describe the issue
After specifying the optimizer as bfgs in the config file, Modulus-Sym overrides max_steps to 0.
Minimum reproducible example
Relevant log output
(see the log and traceback at the top of this issue)
Environment details
No response