train #4

Open
gskgnksjbn opened this issue Oct 29, 2024 · 0 comments

Hello, thank you for your excellent work.
While running the code, I encountered some errors.
Here is the environment I installed:
`name: nerfmatch
channels:
  • pytorch
  • nvidia
  • defaults
dependencies:
  • _libgcc_mutex=0.1=main
  • _openmp_mutex=5.1=1_gnu
  • blas=1.0=mkl
  • brotli-python=1.0.9=py39h6a678d5_8
  • bzip2=1.0.8=h5eee18b_6
  • ca-certificates=2024.9.24=h06a4308_0
  • certifi=2024.8.30=py39h06a4308_0
  • charset-normalizer=3.3.2=pyhd3eb1b0_0
  • cuda-cudart=11.7.99=0
  • cuda-cupti=11.7.101=0
  • cuda-libraries=11.7.1=0
  • cuda-nvrtc=11.7.99=0
  • cuda-nvtx=11.7.91=0
  • cuda-runtime=11.7.1=0
  • cuda-version=12.6=3
  • ffmpeg=4.3=hf484d3e_0
  • filelock=3.13.1=py39h06a4308_0
  • freetype=2.12.1=h4a9f257_0
  • gmp=6.2.1=h295c915_3
  • gmpy2=2.1.2=py39heeb90bb_0
  • gnutls=3.6.15=he1e5248_0
  • idna=3.7=py39h06a4308_0
  • intel-openmp=2023.1.0=hdb19cb5_46306
  • jinja2=3.1.4=py39h06a4308_0
  • jpeg=9e=h5eee18b_3
  • lame=3.100=h7b6447c_0
  • lcms2=2.12=h3be6417_0
  • ld_impl_linux-64=2.40=h12ee557_0
  • lerc=3.0=h295c915_0
  • libcublas=11.10.3.66=0
  • libcufft=10.7.2.124=h4fbf590_0
  • libcufile=1.11.1.6=0
  • libcurand=10.3.7.77=0
  • libcusolver=11.4.0.1=0
  • libcusparse=11.7.4.91=0
  • libdeflate=1.17=h5eee18b_1
  • libffi=3.4.4=h6a678d5_1
  • libgcc-ng=11.2.0=h1234567_1
  • libgomp=11.2.0=h1234567_1
  • libiconv=1.16=h5eee18b_3
  • libidn2=2.3.4=h5eee18b_0
  • libnpp=11.7.4.75=0
  • libnvjpeg=11.8.0.2=0
  • libpng=1.6.39=h5eee18b_0
  • libstdcxx-ng=11.2.0=h1234567_1
  • libtasn1=4.19.0=h5eee18b_0
  • libtiff=4.5.1=h6a678d5_0
  • libunistring=0.9.10=h27cfd23_0
  • libwebp-base=1.3.2=h5eee18b_1
  • lz4-c=1.9.4=h6a678d5_1
  • markupsafe=2.1.3=py39h5eee18b_0
  • mkl=2023.1.0=h213fc3f_46344
  • mkl-service=2.4.0=py39h5eee18b_1
  • mkl_fft=1.3.10=py39h5eee18b_0
  • mkl_random=1.2.7=py39h1128e8f_0
  • mpc=1.1.0=h10f8cd9_1
  • mpfr=4.0.2=hb69a4c5_1
  • mpmath=1.3.0=py39h06a4308_0
  • ncurses=6.4=h6a678d5_0
  • nettle=3.7.3=hbbd107a_1
  • networkx=3.2.1=py39h06a4308_0
  • openh264=2.1.1=h4ff587b_0
  • openjpeg=2.5.2=he7f1fd0_0
  • openssl=3.0.15=h5eee18b_0
  • pillow=10.4.0=py39h5eee18b_0
  • pysocks=1.7.1=py39h06a4308_0
  • python=3.9.20=he870216_1
  • pytorch=2.0.1=py3.9_cuda11.7_cudnn8.5.0_0
  • pytorch-cuda=11.7=h778d358_5
  • pytorch-mutex=1.0=cuda
  • readline=8.2=h5eee18b_0
  • requests=2.32.3=py39h06a4308_0
  • sqlite=3.45.3=h5eee18b_0
  • sympy=1.13.2=py39h06a4308_0
  • tbb=2021.8.0=hdb19cb5_0
  • tk=8.6.14=h39e8969_0
  • torchaudio=2.0.2=py39_cu117
  • torchtriton=2.0.0=py39
  • torchvision=0.15.2=py39_cu117
  • typing_extensions=4.11.0=py39h06a4308_0
  • tzdata=2024b=h04d1e81_0
  • urllib3=2.2.3=py39h06a4308_0
  • wheel=0.44.0=py39h06a4308_0
  • xz=5.4.6=h5eee18b_1
  • zlib=1.2.13=h5eee18b_1
  • zstd=1.5.6=hc292b87_0
  • pip:
    • absl-py==2.1.0
    • aiohappyeyeballs==2.4.3
    • aiohttp==3.10.10
    • aiosignal==1.3.1
    • anyio==4.6.2.post1
    • argon2-cffi==23.1.0
    • argon2-cffi-bindings==21.2.0
    • arrow==1.3.0
    • asttokens==2.4.1
    • async-lru==2.0.4
    • async-timeout==4.0.3
    • attrs==24.2.0
    • babel==2.16.0
    • beautifulsoup4==4.12.3
    • bleach==6.1.0
    • cffi==1.17.1
    • comm==0.2.2
    • contourpy==1.3.0
    • cycler==0.12.1
    • debugpy==1.8.7
    • decorator==5.1.1
    • defusedxml==0.7.1
    • einops==0.8.0
    • exceptiongroup==1.2.2
    • executing==2.1.0
    • fastjsonschema==2.20.0
    • fonttools==4.54.1
    • fqdn==1.5.1
    • frozenlist==1.5.0
    • fsspec==2024.10.0
    • future==1.0.0
    • grpcio==1.67.0
    • h11==0.14.0
    • h5py==3.12.1
    • httpcore==1.0.6
    • httpx==0.27.2
    • huggingface-hub==0.26.1
    • imageio==2.36.0
    • imgviz==1.7.5
    • importlib-metadata==8.5.0
    • importlib-resources==6.4.5
    • ipykernel==6.29.5
    • ipython==8.18.1
    • ipywidgets==8.1.5
    • isoduration==20.11.0
    • jedi==0.19.1
    • joblib==1.4.2
    • json5==0.9.25
    • jsonpointer==3.0.0
    • jsonschema==4.23.0
    • jsonschema-specifications==2024.10.1
    • jupyter==1.1.1
    • jupyter-client==8.6.3
    • jupyter-console==6.6.3
    • jupyter-core==5.7.2
    • jupyter-events==0.10.0
    • jupyter-lsp==2.2.5
    • jupyter-server==2.14.2
    • jupyter-server-terminals==0.5.3
    • jupyterlab==4.2.5
    • jupyterlab-pygments==0.3.0
    • jupyterlab-server==2.27.3
    • jupyterlab-widgets==3.0.13
    • kiwisolver==1.4.7
    • kornia==0.7.3
    • kornia-rs==0.1.5
    • lazy-loader==0.4
    • lightning-utilities==0.11.8
    • loguru==0.7.2
    • markdown==3.7
    • markdown-it-py==3.0.0
    • matplotlib==3.9.2
    • matplotlib-inline==0.1.7
    • mdurl==0.1.2
    • mistune==3.0.2
    • multidict==6.1.0
    • nbclient==0.10.0
    • nbconvert==7.16.4
    • nbformat==5.10.4
    • nerfacc==0.5.3
    • nest-asyncio==1.6.0
    • notebook==7.2.2
    • notebook-shim==0.2.4
    • numpy==1.24.0
    • opencv-contrib-python==4.10.0.84
    • opencv-python==4.10.0.84
    • overrides==7.7.0
    • packaging==24.1
    • pandocfilters==1.5.1
    • parso==0.8.4
    • pexpect==4.9.0
    • pip==23.2.1
    • platformdirs==4.3.6
    • prometheus-client==0.21.0
    • prompt-toolkit==3.0.48
    • propcache==0.2.0
    • protobuf==5.28.3
    • psutil==6.1.0
    • ptyprocess==0.7.0
    • pure-eval==0.2.3
    • pycolmap==0.4.0
    • pycparser==2.22
    • pydeprecate==0.3.1
    • pygments==2.18.0
    • pyparsing==3.2.0
    • python-dateutil==2.9.0.post0
    • python-json-logger==2.0.7
    • pytorch-lightning==1.5.10
    • pyyaml==6.0.2
    • pyzmq==26.2.0
    • referencing==0.35.1
    • rfc3339-validator==0.1.4
    • rfc3986-validator==0.1.1
    • rich==13.9.3
    • rpds-py==0.20.0
    • safetensors==0.4.5
    • scikit-image==0.24.0
    • scipy==1.13.1
    • send2trash==1.8.3
    • setuptools==59.5.0
    • six==1.16.0
    • sniffio==1.3.1
    • soupsieve==2.6
    • stack-data==0.6.3
    • tensorboard==2.18.0
    • tensorboard-data-server==0.7.2
    • terminado==0.18.1
    • tifffile==2024.8.30
    • timm==1.0.11
    • tinycss2==1.4.0
    • tomli==2.0.2
    • torchmetrics==1.5.1
    • tornado==6.4.1
    • tqdm==4.66.5
    • traitlets==5.14.3
    • transforms3d==0.4.2
    • types-python-dateutil==2.9.0.20241003
    • uri-template==1.3.0
    • wcwidth==0.2.13
    • webcolors==24.8.0
    • webencodings==0.5.1
    • websocket-client==1.8.0
    • werkzeug==3.0.6
    • widgetsnbextension==4.0.13
    • yacs==0.1.8
    • yarl==1.16.0
    • zipp==3.20.2
prefix: /data/users/yuxuanhan/anaconda3/envs/nerfmatch
`

No. 1, NeRF Training (Optional): I downloaded the pretrained NeRF model.

No. 2, Cache NeRF Features (Cambridge):

python -m model_eval.eval_nerf --cache_scene_pts --split 'train_test' \
--downsample 8 --img_wh 480 480 --stop_layer 3 \
--ckpt 'pretrained/nerf/cambridge/mip_app/#scene_last.ckpt' \
--scene_anno_path 'data/annotations/cambridge_jsons/transforms_#scene_#split.json' \
--cache_dir 'outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep' \
--dataset 'cambridge'

No. 3, NeRFMatch training:

torchrun --nproc_per_node=8 model_train/train_nerfmatch_c2f.py \
--config configs/nerfmatch/nerfmatch_cambridge_c2f.yaml \
--backbone 'convformer384' --temp_type 'mul' --batch_size 2 \
--max_epochs 50 --clr 0.0004 --cbs 16 --pair_topk 20 --aug_self_pairs 10 \
--scene_dir 'outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin' \
--resume_version 'mip_app_inter3_last' --update_conf \
--prefix 'eccv/repr' --scenes 'ShopFacade'

The only change I made was the number of GPUs: I changed --nproc_per_node=8 to --nproc_per_node=4.
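
For reference, the log further down prints `True batch: 8 lr: 0.0002` with 4 GPUs. Those numbers are consistent with a linear scaling rule, lr = clr * (batch_size * num_gpus) / cbs. This is only inferred from the logged values and the `--clr`, `--cbs`, and `--batch_size` flags, not confirmed from the training code, and the helper name `scaled_lr` is hypothetical. A minimal sketch of that arithmetic:

```python
def scaled_lr(clr, cbs, batch_size, num_gpus):
    """Return (true_batch, lr) under an assumed linear scaling rule.

    clr is the learning rate that corresponds to a reference batch size cbs;
    the effective learning rate is scaled by true_batch / cbs.
    """
    true_batch = batch_size * num_gpus
    lr = clr * true_batch / cbs
    return true_batch, lr


# Values taken from the command line in this issue.
print(scaled_lr(clr=0.0004, cbs=16, batch_size=2, num_gpus=8))  # original 8-GPU setup
print(scaled_lr(clr=0.0004, cbs=16, batch_size=2, num_gpus=4))  # 4-GPU setup used here
```

Under this assumption, halving the GPU count halves both the effective batch size and the learning rate, so the change by itself should not cause a crash.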

Here are the errors I got:

(nerfmatch) y@hello-PowerEdge-T640:~/nerfmatch$ torchrun --nproc_per_node=4 model_train/train_nerfmatch_c2f.py \
--config configs/nerfmatch/nerfmatch_cambridge_c2f.yaml \
--backbone 'convformer384' --temp_type 'mul' --batch_size 2 \
--max_epochs 50 --clr 0.0004 --cbs 16 --pair_topk 20 --aug_self_pairs 10  \
--scene_dir  'outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin'   \
--resume_version 'mip_app_inter3_last' --update_conf \
--prefix 'eccv/repr'   --scenes 'ShopFacade' 

WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Global seed set to 12343
True batch: 8 lr: 0.0002
[2024-10-29 09:13:25|trainer|INFO]: Namespace(data=Namespace(dataset='NeRFMatchPair', data_dir='data/cambridge', scenes=['ShopFacade'], scene_anno_path='data/annotations/cambridge_jsons/transforms_#scene_#split.json', scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', train_pair_txt='data/pairs/cambridge/#scene/pairs-db-covis20.txt', test_pair_txt='data/pairs/cambridge/#scene/pairs-query-netvlad10.txt', pair_topk=20, img_wh=[480, 480], img_dim=3, use_msk=False, model_ds=8, imagenet_norm=True, balanced_pair=True, epoch_sample_num=10000, aug_self_pairs=10), optim=Namespace(optimizer='adam', adapt_lr=True, clr=0.0004, cbs=16, weight_decay=0.0, lr_scheduler='cosine', coarse_only_epochs=0, max_epochs=50, lr=0.0002), model=Namespace(backbone='convformer384', pretrained=True, im_pe=True, im_sa_type='share', im_sa=3, temp_type='mul', pt_sa=3, pt_dim=256, pt_sa_type='full', pt_pe=True, pt_pe_type='fourier', post_pt_pe=True, cfeat_dim=256, ffeat_dim=128, cformer_type='crs', coarse_layers=1, pt_ftype='nerf', fine_sa=1, fsa_type='full', win_sz=5, cat_c_feat=True, fine_loss='match', coarse_percent=0.3, coarse_dthres=10, coarse_ckpt=None, c2f_ckpt=None), exp=Namespace(seed=12343, odir=PosixPath('outputs/nerfmatch/c2f/cambridge'), prefix='eccv/repr', resume_version='mip_app_inter3_last', num_workers=4, max_epochs=50, check_epochs=1, batch_size=2, debug=False, name='eccv/repr/NeRFMatchPair_ShopFacade_wh480-480ds8lin_top20ep10000_bala_imn_slfaug10/convformer384_pre_imp_ptp_ptfull3_pepos_imsa_share_cfcrs1d256_multmp_fsafull1d128w5_catc_match0.3d10/g4clr0.0004cbs16adamcosine_ep50'), gpus=-1, prefix='eccv/repr', debug=False, config='configs/nerfmatch/nerfmatch_cambridge_c2f.yaml', coarse_ckpt=None, c2f_ckpt=None, backbone='convformer384', cformer_type='crs', coarse_layers=1, pt_sa=3, im_sa=3, pt_dim=256, cfeat_dim=256, pt_pe=True, im_pe=True, im_sa_type='share', pt_sa_type='full', pt_ftype='nerf', pt_pe_type='fourier', temp_type='mul', fine_sa=1, 
fsa_type='full', update_conf=True, batch_size=2, clr=0.0004, cbs=16, adapt_lr=False, max_epochs=50, coarse_only_epochs=0, epoch_sample_num=10000, pair_topk=20, aug_self_pairs=10, train_pair_txt=None, scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', scenes=['ShopFacade'], resume_version='mip_app_inter3_last', gpu_num=4)
[2024-10-29 09:13:25|trainer|INFO]: # GPUs=4 <pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7fd0f04e1e80>

Global seed set to 12343
Global seed set to 12343
True batch: 8 lr: 0.0002
[2024-10-29 09:13:25|trainer|INFO]: Namespace(data=Namespace(dataset='NeRFMatchPair', data_dir='data/cambridge', scenes=['ShopFacade'], scene_anno_path='data/annotations/cambridge_jsons/transforms_#scene_#split.json', scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', train_pair_txt='data/pairs/cambridge/#scene/pairs-db-covis20.txt', test_pair_txt='data/pairs/cambridge/#scene/pairs-query-netvlad10.txt', pair_topk=20, img_wh=[480, 480], img_dim=3, use_msk=False, model_ds=8, imagenet_norm=True, balanced_pair=True, epoch_sample_num=10000, aug_self_pairs=10), optim=Namespace(optimizer='adam', adapt_lr=True, clr=0.0004, cbs=16, weight_decay=0.0, lr_scheduler='cosine', coarse_only_epochs=0, max_epochs=50, lr=0.0002), model=Namespace(backbone='convformer384', pretrained=True, im_pe=True, im_sa_type='share', im_sa=3, temp_type='mul', pt_sa=3, pt_dim=256, pt_sa_type='full', pt_pe=True, pt_pe_type='fourier', post_pt_pe=True, cfeat_dim=256, ffeat_dim=128, cformer_type='crs', coarse_layers=1, pt_ftype='nerf', fine_sa=1, fsa_type='full', win_sz=5, cat_c_feat=True, fine_loss='match', coarse_percent=0.3, coarse_dthres=10, coarse_ckpt=None, c2f_ckpt=None), exp=Namespace(seed=12343, odir=PosixPath('outputs/nerfmatch/c2f/cambridge'), prefix='eccv/repr', resume_version='mip_app_inter3_last', num_workers=4, max_epochs=50, check_epochs=1, batch_size=2, debug=False, name='eccv/repr/NeRFMatchPair_ShopFacade_wh480-480ds8lin_top20ep10000_bala_imn_slfaug10/convformer384_pre_imp_ptp_ptfull3_pepos_imsa_share_cfcrs1d256_multmp_fsafull1d128w5_catc_match0.3d10/g4clr0.0004cbs16adamcosine_ep50'), gpus=-1, prefix='eccv/repr', debug=False, config='configs/nerfmatch/nerfmatch_cambridge_c2f.yaml', coarse_ckpt=None, c2f_ckpt=None, backbone='convformer384', cformer_type='crs', coarse_layers=1, pt_sa=3, im_sa=3, pt_dim=256, cfeat_dim=256, pt_pe=True, im_pe=True, im_sa_type='share', pt_sa_type='full', pt_ftype='nerf', pt_pe_type='fourier', temp_type='mul', fine_sa=1, 
fsa_type='full', update_conf=True, batch_size=2, clr=0.0004, cbs=16, adapt_lr=False, max_epochs=50, coarse_only_epochs=0, epoch_sample_num=10000, pair_topk=20, aug_self_pairs=10, train_pair_txt=None, scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', scenes=['ShopFacade'], resume_version='mip_app_inter3_last', gpu_num=4)
True batch: 8 lr: 0.0002
[2024-10-29 09:13:25|trainer|INFO]: # GPUs=4 <pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7f97098d9d30>

[2024-10-29 09:13:25|trainer|INFO]: Namespace(data=Namespace(dataset='NeRFMatchPair', data_dir='data/cambridge', scenes=['ShopFacade'], scene_anno_path='data/annotations/cambridge_jsons/transforms_#scene_#split.json', scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', train_pair_txt='data/pairs/cambridge/#scene/pairs-db-covis20.txt', test_pair_txt='data/pairs/cambridge/#scene/pairs-query-netvlad10.txt', pair_topk=20, img_wh=[480, 480], img_dim=3, use_msk=False, model_ds=8, imagenet_norm=True, balanced_pair=True, epoch_sample_num=10000, aug_self_pairs=10), optim=Namespace(optimizer='adam', adapt_lr=True, clr=0.0004, cbs=16, weight_decay=0.0, lr_scheduler='cosine', coarse_only_epochs=0, max_epochs=50, lr=0.0002), model=Namespace(backbone='convformer384', pretrained=True, im_pe=True, im_sa_type='share', im_sa=3, temp_type='mul', pt_sa=3, pt_dim=256, pt_sa_type='full', pt_pe=True, pt_pe_type='fourier', post_pt_pe=True, cfeat_dim=256, ffeat_dim=128, cformer_type='crs', coarse_layers=1, pt_ftype='nerf', fine_sa=1, fsa_type='full', win_sz=5, cat_c_feat=True, fine_loss='match', coarse_percent=0.3, coarse_dthres=10, coarse_ckpt=None, c2f_ckpt=None), exp=Namespace(seed=12343, odir=PosixPath('outputs/nerfmatch/c2f/cambridge'), prefix='eccv/repr', resume_version='mip_app_inter3_last', num_workers=4, max_epochs=50, check_epochs=1, batch_size=2, debug=False, name='eccv/repr/NeRFMatchPair_ShopFacade_wh480-480ds8lin_top20ep10000_bala_imn_slfaug10/convformer384_pre_imp_ptp_ptfull3_pepos_imsa_share_cfcrs1d256_multmp_fsafull1d128w5_catc_match0.3d10/g4clr0.0004cbs16adamcosine_ep50'), gpus=-1, prefix='eccv/repr', debug=False, config='configs/nerfmatch/nerfmatch_cambridge_c2f.yaml', coarse_ckpt=None, c2f_ckpt=None, backbone='convformer384', cformer_type='crs', coarse_layers=1, pt_sa=3, im_sa=3, pt_dim=256, cfeat_dim=256, pt_pe=True, im_pe=True, im_sa_type='share', pt_sa_type='full', pt_ftype='nerf', pt_pe_type='fourier', temp_type='mul', fine_sa=1, 
fsa_type='full', update_conf=True, batch_size=2, clr=0.0004, cbs=16, adapt_lr=False, max_epochs=50, coarse_only_epochs=0, epoch_sample_num=10000, pair_topk=20, aug_self_pairs=10, train_pair_txt=None, scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', scenes=['ShopFacade'], resume_version='mip_app_inter3_last', gpu_num=4)
[2024-10-29 09:13:25|trainer|INFO]: # GPUs=4 <pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7fb6bdea8040>

Global seed set to 12343
True batch: 8 lr: 0.0002
[2024-10-29 09:13:25|trainer|INFO]: Namespace(data=Namespace(dataset='NeRFMatchPair', data_dir='data/cambridge', scenes=['ShopFacade'], scene_anno_path='data/annotations/cambridge_jsons/transforms_#scene_#split.json', scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', train_pair_txt='data/pairs/cambridge/#scene/pairs-db-covis20.txt', test_pair_txt='data/pairs/cambridge/#scene/pairs-query-netvlad10.txt', pair_topk=20, img_wh=[480, 480], img_dim=3, use_msk=False, model_ds=8, imagenet_norm=True, balanced_pair=True, epoch_sample_num=10000, aug_self_pairs=10), optim=Namespace(optimizer='adam', adapt_lr=True, clr=0.0004, cbs=16, weight_decay=0.0, lr_scheduler='cosine', coarse_only_epochs=0, max_epochs=50, lr=0.0002), model=Namespace(backbone='convformer384', pretrained=True, im_pe=True, im_sa_type='share', im_sa=3, temp_type='mul', pt_sa=3, pt_dim=256, pt_sa_type='full', pt_pe=True, pt_pe_type='fourier', post_pt_pe=True, cfeat_dim=256, ffeat_dim=128, cformer_type='crs', coarse_layers=1, pt_ftype='nerf', fine_sa=1, fsa_type='full', win_sz=5, cat_c_feat=True, fine_loss='match', coarse_percent=0.3, coarse_dthres=10, coarse_ckpt=None, c2f_ckpt=None), exp=Namespace(seed=12343, odir=PosixPath('outputs/nerfmatch/c2f/cambridge'), prefix='eccv/repr', resume_version='mip_app_inter3_last', num_workers=4, max_epochs=50, check_epochs=1, batch_size=2, debug=False, name='eccv/repr/NeRFMatchPair_ShopFacade_wh480-480ds8lin_top20ep10000_bala_imn_slfaug10/convformer384_pre_imp_ptp_ptfull3_pepos_imsa_share_cfcrs1d256_multmp_fsafull1d128w5_catc_match0.3d10/g4clr0.0004cbs16adamcosine_ep50'), gpus=-1, prefix='eccv/repr', debug=False, config='configs/nerfmatch/nerfmatch_cambridge_c2f.yaml', coarse_ckpt=None, c2f_ckpt=None, backbone='convformer384', cformer_type='crs', coarse_layers=1, pt_sa=3, im_sa=3, pt_dim=256, cfeat_dim=256, pt_pe=True, im_pe=True, im_sa_type='share', pt_sa_type='full', pt_ftype='nerf', pt_pe_type='fourier', temp_type='mul', fine_sa=1, 
fsa_type='full', update_conf=True, batch_size=2, clr=0.0004, cbs=16, adapt_lr=False, max_epochs=50, coarse_only_epochs=0, epoch_sample_num=10000, pair_topk=20, aug_self_pairs=10, train_pair_txt=None, scene_dir='outputs/scene_dirs/cambridge/inter_layer3/#scene/mip_app/last_15ep/ds8lin', scenes=['ShopFacade'], resume_version='mip_app_inter3_last', gpu_num=4)
[2024-10-29 09:13:25|trainer|INFO]: # GPUs=4 <pytorch_lightning.plugins.training_type.ddp.DDPPlugin object at 0x7f9058fc6b20>

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 199, in _new_conn
sock = connection.create_connection(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 490, in _make_request
raise new_e
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 466, in _make_request
self._validate_conn(conn)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
conn.connect()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 693, in connect
self.sock = sock = self._new_conn()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 214, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fb69c2114f0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fb69c2114f0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
r = _request_wrapper(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 277, in _request_wrapper
response = _request_wrapper(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 300, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 93, in send
return super().send(request, *args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fb69c2114f0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 8b46855a-41c5-43a8-9924-194ea3638c51)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 110, in <module>
main()
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 106, in main
train(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 863, in train
model = NeRFMatchMSTrainer(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 561, in __init__
self.model = NeRFMatcherMS(model_conf)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 84, in __init__
self.backbone = init_backbone_8_2(
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 112, in init_backbone_8_2
backbone = MetaFormer_MS(name, pretrained=pretrained)
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 28, in __init__
model = timm.create_model(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_factory.py", line 117, in create_model
model = create_fn(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 1015, in convformer_b36
return _create_metaformer('convformer_b36', pretrained=pretrained, **model_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 663, in _create_metaformer
model = build_model_with_cfg(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 427, in build_model_with_cfg
load_pretrained(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 205, in load_pretrained
state_dict = load_state_dict_from_hf(pretrained_loc, weights_only=True)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_hub.py", line 192, in load_state_dict_from_hf
cached_file = hf_hub_download(hf_model_id, filename=filename, revision=hf_revision)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 969, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1487, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
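
The traceback above shows timm trying to download `pytorch_model.bin` for `timm/convformer_b36.sail_in1k_384` from huggingface.co and failing because the network is unreachable and no local copy is cached. Below is a minimal sketch, using only the standard library, for checking whether that file is already in the local Hugging Face cache. The helper names `hf_cache_root` and `find_cached_file` are hypothetical. The directory layout (`models--<org>--<name>/snapshots/<revision>/<file>`) and the `HF_HUB_CACHE` / `HF_HOME` variables are assumed from the usual huggingface_hub cache conventions.

```python
import os
from pathlib import Path


def hf_cache_root():
    """Resolve the hub cache directory from the usual environment variables."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME")
    if hf_home is None:
        xdg = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        hf_home = os.path.join(xdg, "huggingface")
    return Path(hf_home) / "hub"


def find_cached_file(repo_id, filename, cache_root=None):
    """Return every cached snapshot copy of `filename` for `repo_id`."""
    root = hf_cache_root() if cache_root is None else Path(cache_root)
    # Assumed layout: <root>/models--<org>--<name>/snapshots/<revision>/<filename>
    snapshots = root / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(snapshots.glob("*/" + filename))
```

Calling `find_cached_file("timm/convformer_b36.sail_in1k_384", "pytorch_model.bin")` on this machine would presumably return an empty list. One possible workaround, not verified here, is to download the file on a machine with network access, copy the `models--timm--convformer_b36.sail_in1k_384` directory into the cache root, and set `HF_HUB_OFFLINE=1` before launching torchrun.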
Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 199, in _new_conn
sock = connection.create_connection(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 490, in _make_request
raise new_e
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 466, in _make_request
self._validate_conn(conn)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
conn.connect()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 693, in connect
self.sock = sock = self._new_conn()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 214, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fd0e00a5e80>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd0e00a5e80>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
r = _request_wrapper(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 277, in _request_wrapper
response = _request_wrapper(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 300, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 93, in send
return super().send(request, *args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd0e00a5e80>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 043a0871-d157-4b41-af22-78462b5cebe5)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 110, in <module>
main()
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 106, in main
train(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 863, in train
model = NeRFMatchMSTrainer(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 561, in __init__
self.model = NeRFMatcherMS(model_conf)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 84, in __init__
self.backbone = init_backbone_8_2(
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 112, in init_backbone_8_2
backbone = MetaFormer_MS(name, pretrained=pretrained)
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 28, in __init__
model = timm.create_model(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_factory.py", line 117, in create_model
model = create_fn(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 1015, in convformer_b36
return _create_metaformer('convformer_b36', pretrained=pretrained, **model_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 663, in _create_metaformer
model = build_model_with_cfg(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 427, in build_model_with_cfg
load_pretrained(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 205, in load_pretrained
state_dict = load_state_dict_from_hf(pretrained_loc, weights_only=True)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_hub.py", line 192, in load_state_dict_from_hf
cached_file = hf_hub_download(hf_model_id, filename=filename, revision=hf_revision)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 969, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1487, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 199, in _new_conn
sock = connection.create_connection(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 789, in urlopen
response = self._make_request(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 490, in _make_request
raise new_e
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 466, in _make_request
self._validate_conn(conn)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
conn.connect()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 693, in connect
self.sock = sock = self._new_conn()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 214, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f903c3a4cd0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 843, in urlopen
retries = retries.increment(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f903c3a4cd0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
r = _request_wrapper(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 277, in _request_wrapper
response = _request_wrapper(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 300, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 93, in send
return super().send(request, *args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 199, in _new_conn
sock = connection.create_connection(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f903c3a4cd0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: a067b81f-3bb2-480c-82a2-2ef3f404ae32)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 110, in <module>
raise err
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
main()
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 106, in main
sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 789, in urlopen
train(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 863, in train
response = self._make_request(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 490, in _make_request
model = NeRFMatchMSTrainer(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 561, in __init__
raise new_e
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 466, in _make_request
self.model = NeRFMatcherMS(model_conf)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 84, in __init__
self.backbone = init_backbone_8_2(
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 112, in init_backbone_8_2
self._validate_conn(conn)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
backbone = MetaFormer_MS(name, pretrained=pretrained)
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 28, in __init__
model = timm.create_model(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_factory.py", line 117, in create_model
model = create_fn(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 1015, in convformer_b36
conn.connect()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 693, in connect
return _create_metaformer('convformer_b36', pretrained=pretrained, **model_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 663, in _create_metaformer
self.sock = sock = self._new_conn()
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connection.py", line 214, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f97000e1e20>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 667, in send
model = build_model_with_cfg(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 427, in build_model_with_cfg
load_pretrained(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 205, in load_pretrained
resp = conn.urlopen(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/connectionpool.py", line 843, in urlopen
state_dict = load_state_dict_from_hf(pretrained_loc, weights_only=True)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_hub.py", line 192, in load_state_dict_from_hf
cached_file = hf_hub_download(hf_model_id, filename=filename, revision=hf_revision)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
retries = retries.increment( File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download

File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/urllib3/util/retry.py", line 519, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f97000e1e20>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1376, in _get_metadata_or_catch_error
return _hf_hub_download_to_cache_dir(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 969, in _hf_hub_download_to_cache_dir
metadata = get_hf_file_metadata(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1487, in _raise_on_head_call_error
return fn(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1296, in get_hf_file_metadata
raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError : r = _request_wrapper(An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 277, in _request_wrapper
response = _request_wrapper(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 300, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 93, in send
return super().send(request, *args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /timm/convformer_b36.sail_in1k_384/resolve/main/pytorch_model.bin (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f97000e1e20>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 58e87945-68d4-4103-8105-29a6f6748c43)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 110, in <module>
main()
File "/data/users/y/nerfmatch/model_train/train_nerfmatch_c2f.py", line 106, in main
train(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 863, in train
model = NeRFMatchMSTrainer(config)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 561, in __init__
self.model = NeRFMatcherMS(model_conf)
File "/data/users/y/nerfmatch/model_train/nerfmatch/nerfmatch_c2f_trainer.py", line 84, in __init__
self.backbone = init_backbone_8_2(
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 112, in init_backbone_8_2
backbone = MetaFormer_MS(name, pretrained=pretrained)
File "/data/users/y/nerfmatch/model_train/nerfmatch/modules/__init__.py", line 28, in __init__
model = timm.create_model(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_factory.py", line 117, in create_model
model = create_fn(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 1015, in convformer_b36
return _create_metaformer('convformer_b36', pretrained=pretrained, **model_kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/metaformer.py", line 663, in _create_metaformer
model = build_model_with_cfg(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 427, in build_model_with_cfg
load_pretrained(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_builder.py", line 205, in load_pretrained
state_dict = load_state_dict_from_hf(pretrained_loc, weights_only=True)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/timm/models/_hub.py", line 192, in load_state_dict_from_hf
cached_file = hf_hub_download(hf_model_id, filename=filename, revision=hf_revision)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 862, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 969, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1487, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 953941 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 953940) of binary: /data/users/y/anaconda3/envs/nerfmatch/bin/python
Traceback (most recent call last):
File "/data/users/y/anaconda3/envs/nerfmatch/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/data/users/y/anaconda3/envs/nerfmatch/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

model_train/train_nerfmatch_c2f.py FAILED

Failures:
[1]:
time : 2024-10-29_09:13:47
host : hello-PowerEdge-T640
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 953942)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-10-29_09:13:47
host : hello-PowerEdge-T640
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 953943)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-10-29_09:13:47
host : hello-PowerEdge-T640
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 953940)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

I have two questions:
1. The error shows a remote connection to huggingface.co. Is the failure caused by that site being unreachable?
2. In the environment configuration, the installed PyTorch Lightning version does not match the Torch version listed in the official PyTorch Lightning documentation. Torch is 2.0, which according to that documentation requires PL 2.0 or above. Could this mismatch be the cause of the error? Version compatibility reference: https://lightning.ai/docs/pytorch/latest/versioning.html#pytorch-support
If the cause is question 1, how can it be resolved?
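For question 1, one way to confirm whether the training machine can reach huggingface.co at all is a plain TCP check. The following is only a minimal sketch using the Python standard library; the helper name `can_reach` is hypothetical and is not part of the repository or of huggingface_hub.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers "Network is unreachable" (errno 101), refused
        # connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_reach("huggingface.co")` on the training machine and getting `False` would be consistent with the `[Errno 101] Network is unreachable` lines in the log above. It does not by itself show whether a proxy, a mirror, or a pre-downloaded copy of the weights is the right fix.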
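As for question 2, does anything need to be changed? If I upgrade PL, the code also has to change, because switching to a newer PL version produces the following error:
ImportError: cannot import name 'DDPPlugin' from pytorch_lightning.plugins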
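For the import error, a fallback import along the following lines might help. This minimal sketch assumes that newer PyTorch Lightning releases provide `DDPStrategy` in `pytorch_lightning.strategies` in place of `DDPPlugin`. The helper `import_first_available` and the `DDP_CANDIDATES` list are hypothetical names, and none of this has been checked against the repository's trainer code.

```python
import importlib


def import_first_available(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError):
            # Module or attribute is missing in this installed version; try the next one.
            continue
    raise ImportError(f"none of {candidates} could be imported")


# Assumption: newer PL versions expose DDPStrategy, older ones expose DDPPlugin.
DDP_CANDIDATES = [
    ("pytorch_lightning.strategies", "DDPStrategy"),
    ("pytorch_lightning.plugins", "DDPPlugin"),
]
```

With PL installed, `import_first_available(DDP_CANDIDATES)` would return whichever class exists. This only addresses the import itself: how the object is passed to the `Trainer` may also differ between PL versions, so other parts of the training code could still need changes.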

Thank you.
