The config sets the maximum number of training steps to 400k, which is why training stopped there. The epoch counter is misleading: you have actually seen batch_size times as many epochs as the terminal shows. This has to do with how data loading happens here:
Epoch 2: : 17671it [1:05:34, 4.49it/s, loss=2.15, v_num=3hqp]
wandb: Network error (TransientError), entering retry loop.
Epoch 2: : 25832it [1:35:47, 4.49it/s, loss=2.23, v_num=3hqp]
wandb: Network error (TransientError), entering retry loop.
Epoch 2: : 115816it [7:02:37, 4.57it/s, loss=2.1, v_num=3hqp]
Epoch 2, global step 400000: 'val/AP' was not in top 1
self._num_logged_artifact() = 1
num_ckpt_logged_before = 1
num_new_cktps = 1
Trainer.fit stopped: max_steps=400000 reached.
Epoch 2: : 115816it [7:03:13, 4.56it/s, loss=2.1, v_num=3hqp]
wandb: Waiting for W&B process to finish... (success).
The provided code reached max_steps after only two epochs. Is there a problem somewhere? If I want to train for more epochs, what should I do?
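For what it's worth, here is a rough sketch of how the step limit relates to the epoch counter, and how one might pick a larger limit. It assumes the run uses PyTorch Lightning's `Trainer` (the log message suggests so), where training stops as soon as either `max_steps` or `max_epochs` is reached, and that one optimizer step is taken per batch (no gradient accumulation). The helper names and the `steps_per_epoch` value of 150,000 are hypothetical and only for illustration; the real number depends on the dataset and batch size.

```python
def steps_for_epochs(target_epochs: int, steps_per_epoch: int) -> int:
    """Optimizer steps needed to finish `target_epochs` displayed epochs."""
    if target_epochs < 0 or steps_per_epoch <= 0:
        raise ValueError("target_epochs must be >= 0 and steps_per_epoch > 0")
    return target_epochs * steps_per_epoch


def displayed_epochs(max_steps: int, steps_per_epoch: int) -> float:
    """How many displayed epochs fit into a given step budget."""
    if steps_per_epoch <= 0:
        raise ValueError("steps_per_epoch must be > 0")
    return max_steps / steps_per_epoch


def trainer_limits(target_epochs: int, steps_per_epoch: int) -> dict:
    """Keyword arguments that keep both limits consistent with each other."""
    return {
        "max_epochs": target_epochs,
        "max_steps": steps_for_epochs(target_epochs, steps_per_epoch),
    }


if __name__ == "__main__":
    # Hypothetical value, NOT taken from the repository or the logs above.
    steps_per_epoch = 150_000

    # With a 400k step budget, the run ends partway through the third
    # displayed epoch (shown as "Epoch 2" because counting starts at 0).
    print(displayed_epochs(400_000, steps_per_epoch))

    # Limits for a longer run of 10 displayed epochs.
    print(trainer_limits(10, steps_per_epoch))
```

The resulting dictionary could then be passed on as `Trainer(**trainer_limits(...))`, or the equivalent `max_steps` entry could be raised in the config. Note that, per the reply above, each displayed epoch in this code base reportedly covers the data batch_size times, so the displayed count understates how often the model has seen each sample.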