root@ip-172-31-0-126:/Model-References/PyTorch/computer_vision/segmentation/Unet# python main.py --exec_mode predict --task 01 --hpus 1 --fold 3 --seed 123 --val_batch_size 64 --dim 2 --data=/data/pytorch/unet/01_2d --results=/tmp/Unet/results/fold_3 --autocast --inference_mode lazy --ckpt_path pretrained_checkpoint/pretrained_checkpoint.pt
Namespace(framework='pytorch-lightning', exec_mode='predict', data='/data/pytorch/unet/01_2d', results='/tmp/Unet/results/fold_3', logname=None, task='01', gpus=0, hpus=1, learning_rate=0.001, gradient_clip_val=0, negative_slope=0.01, tta=False, gradient_clip=False, gradient_clip_norm=12, amp=False, benchmark=False, deep_supervision=False, drop_block=False, attention=False, residual=False, focal=False, sync_batchnorm=False, save_ckpt=False, nfolds=5, seed=123, skip_first_n_eval=0, ckpt_path='pretrained_checkpoint/pretrained_checkpoint.pt', fold=3, patience=100, lr_patience=70, batch_size=2, val_batch_size=64, steps=None, profile=False, profile_steps='90:95', momentum=0.99, weight_decay=0.0001, save_preds=False, dim=2, resume_training=False, factor=0.3, num_workers=8, min_epochs=30, max_epochs=10000, warmup=5, norm='instance', nvol=1, run_lazy_mode=True, inference_mode='lazy', is_autocast=True, hpu_graphs=True, habana_loader=False, bucket_cap_mb=130, data2d_dim=3, oversampling=0.33, overlap=0.5, affinity='disabled', scheduler='none', optimizer='adamw', blend='gaussian', train_batches=0, test_batches=0, progress_bar_refresh_rate=25, set_aug_seed=False, augment=True, measurement_type='throughput', use_torch_compile=False, enable_tensorboard_logging=False)
Seed set to 123
Seed set to 123
Seed set to 123
Seed set to 773630
Number of test examples: 266
Seed set to 28030
Traceback (most recent call last):
  File "/Model-References/PyTorch/computer_vision/segmentation/Unet/main.py", line 218, in <module>
    main()
  File "/Model-References/PyTorch/computer_vision/segmentation/Unet/main.py", line 209, in main
    ptlrun(args)
  File "/Model-References/PyTorch/computer_vision/segmentation/Unet/lightning_trainer/ptl.py", line 211, in ptlrun
    model = NNUnet.load_from_checkpoint(ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/utilities/model_helpers.py", line 125, in wrapper
    return self.method(cls, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/module.py", line 1581, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/saving.py", line 91, in _load_from_checkpoint
    model = _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lightning/pytorch/core/saving.py", line 158, in _load_state
    obj = cls(**_cls_kwargs)
  File "/Model-References/PyTorch/computer_vision/segmentation/Unet/models/nn_unet.py", line 72, in __init__
    self.build_nnunet()
  File "/Model-References/PyTorch/computer_vision/segmentation/Unet/models/nn_unet.py", line 189, in build_nnunet
    in_channels, n_class, kernels, strides, self.patch_size = get_unet_params(self.args)
  File "/Model-References/PyTorch/computer_vision/segmentation/Unet/utils/utils.py", line 132, in get_unet_params
    config = get_config_file(args)
  File "/Model-References/PyTorch/computer_vision/segmentation/Unet/utils/utils.py", line 102, in get_config_file
    return pickle.load(open(path, "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/weka/data/pytorch/unet/01_2d/config.pkl'
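The missing file is '/mnt/weka/data/pytorch/unet/01_2d/config.pkl', while --data points to '/data/pytorch/unet/01_2d'. Because the failure is raised inside `cls(**_cls_kwargs)`, the model appears to be rebuilt from hyper-parameters stored in the checkpoint rather than from the command-line arguments. Lightning documents that extra keyword arguments passed to `load_from_checkpoint` override saved hyper-parameter values, so one untested workaround would be calling `NNUnet.load_from_checkpoint(ckpt_path, args=args)` in ptl.py. This assumes `NNUnet.__init__` takes an `args` parameter recorded via `save_hyperparameters()`, which is not confirmed here. The stdlib-only sketch below models that override; `resolve_init_kwargs` and `config_path` are hypothetical helpers, and `config_path` assumes `get_config_file` reads `config.pkl` from the data directory.

```python
import posixpath
from argparse import Namespace


def resolve_init_kwargs(saved_hparams: dict, **overrides) -> dict:
    """Hypothetical model of load_from_checkpoint's kwargs handling:
    explicitly passed values replace hyper-parameters saved in the checkpoint."""
    kwargs = dict(saved_hparams)  # copy so the saved hparams are not mutated
    kwargs.update(overrides)
    return kwargs


def config_path(args: Namespace) -> str:
    """Hypothetical: assumes get_config_file() opens <args.data>/config.pkl."""
    return posixpath.join(args.data, "config.pkl")


# Assumed contents of the checkpoint's saved hyper-parameters
# (the /mnt/weka prefix is taken from the error message above).
saved = {"args": Namespace(data="/mnt/weka/data/pytorch/unet/01_2d")}
# The --data value from the command line in this report.
cli_args = Namespace(data="/data/pytorch/unet/01_2d")

# Without an override, the stored path is used, matching the FileNotFoundError.
print(config_path(resolve_init_kwargs(saved)["args"]))
# Passing args explicitly points the lookup at the --data directory instead.
print(config_path(resolve_init_kwargs(saved, args=cli_args)["args"]))
```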
I'm getting the error above when running UNet2D inference. The command comes from the README examples (Single Card Inference Examples / Inference / UNet2D, Lazy mode, BF16 mixed precision, batch size 64, 1 HPU on a single server).
Environment:
AWS DL1 instance + suggested system image. I followed the Gaudi AWS quickstart to start the instance and run the Habana Docker runtime environment.
Ubuntu 22.04.4
Python 3.10.12
The command for benchmark inference works without errors.