InvalidArgumentError: PyLong_AsSize_t failure error during sleap train ; ValueError during sleap-track #1998

Open
1 of 4 tasks
rikebuck opened this issue Oct 16, 2024 · 10 comments
Labels: bug (Something isn't working)

@rikebuck

rikebuck commented Oct 16, 2024

Bug description

Hello,
When running SLEAP training remotely (NVIDIA V100 or NVIDIA A10 GPU; Red Hat Linux), i.e.:

labels="labels.v001_large_rf_grayscale.pkg.slp"
config_json="baseline_large_rf.topdown.json"
sleap-train "$config_json" "$labels"

the training runs for the first 49 of 200 epochs, and then I get the error:

Epoch 50/200
Traceback (most recent call last):
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/bin/sleap-train", line 33, in <module>
sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-train')())
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 2014, in main
trainer.train()
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/training.py", line 941, in train
verbose=2,
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3295, in _cache_key
inputs, include_tensor_ranks_only, ENCODE_VARIABLES_BY_RESOURCE_ID)
tensorflow.python.eager.core._NotOkStatusException: InvalidArgumentError: PyLong_AsSize_t failure

If I ignore the error and try to run inference using this model:

model1="/ru-auth/local/home/fbuck/scratch/SLEAP/models/baseline_large_rf.topdown_1/"
input_video="{parent_dir}/SLEAP/ex_inference_vids/${vid_name}.mp4"
model1_predictions="{parent_dir}/SLEAP/predictions/output_predictions_large_rf_topdown_1_50_epochs_${vid_name}.slp"
sleap-track -m "$model1" -o "$model1_predictions" "$input_video"

I get the following error:

Traceback (most recent call last):
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/bin/sleap-track", line 33, in <module>
sys.exit(load_entry_point('sleap==1.3.3', 'console_scripts', 'sleap-track')())
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 5424, in main
labels_pr = predictor.predict(provider)
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 526, in predict
self._make_labeled_frames_from_generator(generator, data)
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 2633, in _make_labeled_frames_from_generator
for ex in generator:
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/inference.py", line 435, in _predict_generator
for ex in self.pipeline.make_dataset():
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/data/pipelines.py", line 276, in make_dataset
self.validate_pipeline()
File "/rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap/lib/python3.7/site-packages/sleap/nn/data/pipelines.py", line 252, in validate_pipeline
f"Missing required keys for transformer (index = {i}, "
ValueError: Missing required keys for transformer (index = 2, type = <class 'sleap.nn.data.instance_centroids.InstanceCentroidFinder'>): ['instances'].
Available: ['frame_ind', 'offset_x', 'raw_image_size', 'image', 'scale', 'video_ind', 'offset_y']
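
For readers unfamiliar with this kind of error: the ValueError above is raised by a validation pass that checks, step by step, whether each stage of the data pipeline will receive the keys it needs. The following is only a minimal sketch of that idea, assuming each step declares the keys it reads and writes. `Step` and `validate_steps` are hypothetical names for illustration and are not SLEAP's actual classes.

```python
from typing import List, Sequence, Set


class Step:
    """Hypothetical pipeline step that declares the keys it reads and writes."""

    def __init__(self, name: str, input_keys: Sequence[str], output_keys: Sequence[str]):
        self.name = name
        self.input_keys = list(input_keys)
        self.output_keys = list(output_keys)


def validate_steps(provider_keys: Sequence[str], steps: List[Step]) -> None:
    """Raise ValueError at the first step whose required keys are unavailable."""
    available: Set[str] = set(provider_keys)
    for i, step in enumerate(steps):
        missing = [k for k in step.input_keys if k not in available]
        if missing:
            raise ValueError(
                f"Missing required keys for step (index = {i}, name = {step.name}): "
                f"{missing}. Available: {sorted(available)}"
            )
        # Keys produced by this step become available to all later steps.
        available.update(step.output_keys)
```

In this sketch, a video source that only provides `image` and `frame_ind` fails validation as soon as a step asks for `instances`, which mirrors the shape of the message in the traceback: the step at the reported index needs a key that no earlier stage supplies.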

Please advise.
Thank you

Expected behaviour

trained model and prediction on input video

Actual behaviour

Please see above

Your personal set up

Environment packages

packages in environment at /rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.0.0 pypi_0 pypi
alsa-lib 1.2.3.2 h166bdaf_0 conda-forge
astunparse 1.6.3 pypi_0 pypi
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
backports-zoneinfo 0.2.1 pypi_0 pypi
blas 1.1 openblas conda-forge
brotli 1.1.0 hb9d3cd8_2 conda-forge
brotli-bin 1.1.0 hb9d3cd8_2 conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.32.3 h4bc722e_0 conda-forge
ca-certificates 2024.8.30 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 4.2.4 pypi_0 pypi
cairo 1.16.0 h6cf1ce9_1008 conda-forge
cattrs 1.1.1 pyhd8ed1ab_0 conda-forge
certifi 2024.8.30 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.0.9 pypi_0 pypi
cloudpickle 2.2.1 pyhd8ed1ab_0 conda-forge
cuda-nvcc 11.3.58 h2467b9f_0 nvidia
cudatoolkit 11.3.1 hb98b00a_13 conda-forge
cudnn 8.2.1.32 h86fa8c9_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cytoolz 0.12.0 py37h540881e_0 conda-forge
dask-core 2022.2.0 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
efficientnet 1.0.0 pypi_0 pypi
expat 2.6.3 h5888daf_0 conda-forge
ffmpeg 4.3.2 h37c90e5_3 conda-forge
fftw 3.3.10 nompi_hf1063bd_110 conda-forge
flatbuffers 2.0 pypi_0 pypi
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonttools 4.38.0 py37h540881e_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
fsspec 2023.1.0 pyhd8ed1ab_0 conda-forge
gast 0.4.0 pypi_0 pypi
geos 3.11.0 h27087fc_0 conda-forge
gettext 0.22.5 he02047a_3 conda-forge
gettext-tools 0.22.5 he02047a_3 conda-forge
gmp 6.3.0 hac33072_2 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
google-auth 2.3.3 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
graphite2 1.3.13 h59595ed_1003 conda-forge
grpcio 1.43.0 pypi_0 pypi
gst-plugins-base 1.18.5 hf529b03_3 conda-forge
gstreamer 1.18.5 h9f60fe5_3 conda-forge
h5py 3.1.0 nompi_py37h1e651dc_100 conda-forge
harfbuzz 2.9.1 h83ec7ef_1 conda-forge
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
icu 68.2 h9c3ff4c_0 conda-forge
idna 3.3 pypi_0 pypi
image-classifiers 1.0.0 pypi_0 pypi
imagecodecs-lite 2019.12.3 py37hc105733_5 conda-forge
imageio 2.35.1 pyh12aca89_0 conda-forge
imgaug 0.4.0 pyhd8ed1ab_1 conda-forge
imgstore 0.2.9 pypi_0 pypi
importlib-metadata 4.2.0 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
jasper 1.900.1 h07fcdf6_1006 conda-forge
joblib 1.3.2 pyhd8ed1ab_0 conda-forge
jpeg 9e h0b41bf4_3 conda-forge
jsmin 3.0.1 pyhd8ed1ab_0 conda-forge
jsonpickle 1.2 py_0 conda-forge
jsonschema 4.17.3 pypi_0 pypi
keras 2.7.0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py37h7cecad7_0 conda-forge
krb5 1.19.3 h3790be6_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.14 h6ed2654_0 conda-forge
ld_impl_linux-64 2.43 h712a8e2_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libasprintf 0.22.5 he8f35ee_3 conda-forge
libasprintf-devel 0.22.5 he8f35ee_3 conda-forge
libblas 3.9.0 24_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hb9d3cd8_2 conda-forge
libbrotlidec 1.1.0 hb9d3cd8_2 conda-forge
libbrotlienc 1.1.0 hb9d3cd8_2 conda-forge
libcblas 3.9.0 24_linux64_openblas conda-forge
libclang 12.0.0 pypi_0 pypi
libcurl 7.86.0 h7bff187_1 conda-forge
libdeflate 1.14 h166bdaf_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libexpat 2.6.3 h5888daf_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc 14.1.0 h77fa898_1 conda-forge
libgcc-ng 14.1.0 h69a702a_1 conda-forge
libgettextpo 0.22.5 he02047a_3 conda-forge
libgettextpo-devel 0.22.5 he02047a_3 conda-forge
libgfortran 14.1.0 h69a702a_1 conda-forge
libgfortran-ng 14.1.0 h69a702a_1 conda-forge
libgfortran5 14.1.0 hc5f4f2c_1 conda-forge
libglib 2.80.2 hf974151_0 conda-forge
libgomp 14.1.0 h77fa898_1 conda-forge
libiconv 1.17 hd590300_2 conda-forge
liblapack 3.9.0 24_linux64_openblas conda-forge
liblapacke 3.9.0 24_linux64_openblas conda-forge
libllvm11 11.1.0 he0ac6c6_5 conda-forge
libnghttp2 1.51.0 hdcd2b5c_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libogg 1.3.5 h4ab18f5_0 conda-forge
libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge
libopencv 4.5.3 py37h25009ff_1 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libpq 13.8 hd77ab85_0 conda-forge
libprotobuf 3.16.0 h780b84a_0 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx 14.1.0 hc0a3c3a_1 conda-forge
libstdcxx-ng 14.1.0 h4852527_1 conda-forge
libtiff 4.4.0 h82bc61c_5 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
libxslt 1.1.33 h15afd5d_2 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
markdown 3.3.6 pypi_0 pypi
markdown-it-py 2.2.0 pyhd8ed1ab_0 conda-forge
matplotlib-base 3.5.3 py37hf395dca_2 conda-forge
mdurl 0.1.2 pyhd8ed1ab_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.0.32 h14678bc_0 conda-forge
mysql-libs 8.0.32 h54cf53e_0 conda-forge
ncurses 6.5 he02047a_1 conda-forge
ndx-pose 0.1.1 pypi_0 pypi
nettle 3.6 he412f7d_0 conda-forge
networkx 2.7 pyhd8ed1ab_0 conda-forge
nixio 1.5.3 pypi_0 pypi
nspr 4.35 h27087fc_0 conda-forge
nss 3.100 hca3bf56_0 conda-forge
numpy 1.19.5 pypi_0 pypi
oauthlib 3.1.1 pypi_0 pypi
openblas 0.3.27 pthreads_h9eca1d5_1 conda-forge
opencv 4.5.3 py37h89c1867_1 conda-forge
opencv-python-headless 4.2.0.34 pypi_0 pypi
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.5.0 h7d73246_1 conda-forge
openssl 1.1.1w hd590300_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 21.3 pypi_0 pypi
pandas 1.3.5 py37he8f5f7f_0 conda-forge
partd 1.4.1 pyhd8ed1ab_0 conda-forge
patsy 0.5.6 pyhd8ed1ab_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
pillow 9.2.0 py37h850a105_2 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pixman 0.43.2 h59595ed_0 conda-forge
pkgutil-resolve-name 1.3.10 pypi_0 pypi
protobuf 3.19.1 pypi_0 pypi
psutil 5.9.3 py37h540881e_0 conda-forge
pthread-stubs 0.4 hb9d3cd8_1002 conda-forge
py-opencv 4.5.3 py37h6531663_1 conda-forge
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pykalman 0.9.7 pyhd8ed1ab_0 conda-forge
pynwb 2.3.3 pypi_0 pypi
pyparsing 3.0.6 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
pyside2 5.13.2 py37hfa98aef_7 conda-forge
python 3.7.12 hb7a2778_100_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-rapidjson 1.9 py37hd23a5d3_0 conda-forge
python_abi 3.7 4_cp37m conda-forge
pytz 2024.2 pyhd8ed1ab_0 conda-forge
pywavelets 1.3.0 py37hda87dfa_1 conda-forge
pyyaml 6.0 py37h540881e_4 conda-forge
pyzmq 24.0.1 py37h0c0c2a8_0 conda-forge
qimage2ndarray 1.10.0 pypi_0 pypi
qt 5.12.9 hda022c4_4 conda-forge
qtpy 2.4.1 pyhd8ed1ab_0 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.26.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rich 13.8.1 pyhd8ed1ab_0 conda-forge
ruamel-yaml 0.17.32 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
scikit-image 0.19.2 py37he8f5f7f_0 conda-forge
scikit-learn 1.0 py37hf0f1638_1 conda-forge
scikit-video 1.1.11 pyh24bf2e0_0 conda-forge
scipy 1.7.3 py37hf838250_2 anaconda
seaborn 0.12.2 hd8ed1ab_0 conda-forge
seaborn-base 0.12.2 pyhd8ed1ab_0 conda-forge
segmentation-models 1.0.1 pypi_0 pypi
setuptools 59.8.0 py37h89c1867_1 conda-forge
setuptools-scm 6.3.2 pypi_0 pypi
shapely 1.8.5 py37ha4e3bd1_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleap 1.3.3 pypi_0 pypi
sqlite 3.46.0 h6d4b2fc_0 conda-forge
statsmodels 0.13.2 py37hda87dfa_0 conda-forge
tensorboard 2.7.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.0 pypi_0 pypi
tensorflow 2.7.0 pypi_0 pypi
tensorflow-estimator 2.7.0 pypi_0 pypi
tensorflow-hub 0.13.0 pyh56297ac_0 conda-forge
tensorflow-io-gcs-filesystem 0.23.1 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tifffile 2020.6.3 py_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.0 pypi_0 pypi
toolz 0.12.1 pyhd8ed1ab_0 conda-forge
typing-extensions 4.0.1 pypi_0 pypi
typing_extensions 4.7.1 pyha770c72_0 conda-forge
tzlocal 5.0.1 pypi_0 pypi
unicodedata2 14.0.0 py37h540881e_1 conda-forge
urllib3 1.26.7 pypi_0 pypi
werkzeug 2.0.2 pypi_0 pypi
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
wrapt 1.13.3 pypi_0 pypi
x264 1!161.3030 h7f98852_1 conda-forge
xorg-kbproto 1.0.7 hb9d3cd8_1003 conda-forge
xorg-libice 1.1.1 hb9d3cd8_1 conda-forge
xorg-libsm 1.2.4 he73a12e_1 conda-forge
xorg-libx11 1.8.4 h0b41bf4_0 conda-forge
xorg-libxau 1.0.11 hb9d3cd8_1 conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 hb9d3cd8_1003 conda-forge
xorg-xextproto 7.3.0 hb9d3cd8_1004 conda-forge
xorg-xproto 7.0.31 hb9d3cd8_1008 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zeromq 4.3.5 h59595ed_1 conda-forge
zipp 3.6.0 pypi_0 pypi
zlib 1.2.13 h4ab18f5_6 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge

Logs

full txt files of the slurm output are attached below.

sleap-train_output.txt
sleap-track_output.txt

@rikebuck rikebuck added the bug Something isn't working label Oct 16, 2024
@eberrigan
Contributor

Hi @rikebuck,

How did you install SLEAP? It looks like you have tensorflow 2.12 but our conda package is tensorflow 2.7 for Windows and Linux.

Here is an image with SLEAP and its dependencies installed. This should work on your cluster. Please follow the directions in the README to use it. https://gitlab.com/salk-tm/sleap-train

Best,

Elizabeth

@rikebuck
Author

rikebuck commented Oct 22, 2024

I installed SLEAP by using the command "conda create -y -n sleap -c conda-forge -c nvidia -c sleap -c anaconda sleap=1.3.3"

Thank you for the image. The instructions say "Make sure to have Docker Desktop running first"; however, I am running sleap-train remotely. Should I install Docker using these instructions instead: https://docs.docker.com/engine/install/rhel/ ? It is a shared cluster and I do not have permission to run "sudo", but I could reach out to the managers of the cluster if installing Docker this way makes sense.

In the meantime, I can look into trying to change the tensorflow version in my sleap conda environment.

Thank you

@rikebuck
Author

rikebuck commented Oct 22, 2024

Wait,
Actually, I just ran "conda list" on my sleap conda environment, and it looks like my tensorflow version is correct: it is 2.7.0 (please see below).
Sorry for the mistake earlier; I edited my initial issue post above to reflect this. However, the issue remains, and sleap-train was run in this correct conda environment, using tensorflow 2.7, as confirmed in the sleap-train output log above:

"INFO:sleap.nn.training:Versions:
SLEAP: 1.3.3
TensorFlow: 2.7.0
Numpy: 1.19.5
Python: 3.7.12"
...

please advise. Thank you.

packages in environment at /rugpfs/fs0/bargmann_lab/scratch/fbuck/miniconda3/envs/sleap:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.0.0 pypi_0 pypi
alsa-lib 1.2.3.2 h166bdaf_0 conda-forge
astunparse 1.6.3 pypi_0 pypi
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
backports-zoneinfo 0.2.1 pypi_0 pypi
blas 1.1 openblas conda-forge
brotli 1.1.0 hb9d3cd8_2 conda-forge
brotli-bin 1.1.0 hb9d3cd8_2 conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.32.3 h4bc722e_0 conda-forge
ca-certificates 2024.8.30 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 4.2.4 pypi_0 pypi
cairo 1.16.0 h6cf1ce9_1008 conda-forge
cattrs 1.1.1 pyhd8ed1ab_0 conda-forge
certifi 2024.8.30 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.0.9 pypi_0 pypi
cloudpickle 2.2.1 pyhd8ed1ab_0 conda-forge
cuda-nvcc 11.3.58 h2467b9f_0 nvidia
cudatoolkit 11.3.1 hb98b00a_13 conda-forge
cudnn 8.2.1.32 h86fa8c9_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cytoolz 0.12.0 py37h540881e_0 conda-forge
dask-core 2022.2.0 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
efficientnet 1.0.0 pypi_0 pypi
expat 2.6.3 h5888daf_0 conda-forge
ffmpeg 4.3.2 h37c90e5_3 conda-forge
fftw 3.3.10 nompi_hf1063bd_110 conda-forge
flatbuffers 2.0 pypi_0 pypi
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonttools 4.38.0 py37h540881e_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
fsspec 2023.1.0 pyhd8ed1ab_0 conda-forge
gast 0.4.0 pypi_0 pypi
geos 3.11.0 h27087fc_0 conda-forge
gettext 0.22.5 he02047a_3 conda-forge
gettext-tools 0.22.5 he02047a_3 conda-forge
gmp 6.3.0 hac33072_2 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
google-auth 2.3.3 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
graphite2 1.3.13 h59595ed_1003 conda-forge
grpcio 1.43.0 pypi_0 pypi
gst-plugins-base 1.18.5 hf529b03_3 conda-forge
gstreamer 1.18.5 h9f60fe5_3 conda-forge
h5py 3.1.0 nompi_py37h1e651dc_100 conda-forge
harfbuzz 2.9.1 h83ec7ef_1 conda-forge
hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge
icu 68.2 h9c3ff4c_0 conda-forge
idna 3.3 pypi_0 pypi
image-classifiers 1.0.0 pypi_0 pypi
imagecodecs-lite 2019.12.3 py37hc105733_5 conda-forge
imageio 2.35.1 pyh12aca89_0 conda-forge
imgaug 0.4.0 pyhd8ed1ab_1 conda-forge
imgstore 0.2.9 pypi_0 pypi
importlib-metadata 4.2.0 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
jasper 1.900.1 h07fcdf6_1006 conda-forge
joblib 1.3.2 pyhd8ed1ab_0 conda-forge
jpeg 9e h0b41bf4_3 conda-forge
jsmin 3.0.1 pyhd8ed1ab_0 conda-forge
jsonpickle 1.2 py_0 conda-forge
jsonschema 4.17.3 pypi_0 pypi
keras 2.7.0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py37h7cecad7_0 conda-forge
krb5 1.19.3 h3790be6_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.14 h6ed2654_0 conda-forge
ld_impl_linux-64 2.43 h712a8e2_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libasprintf 0.22.5 he8f35ee_3 conda-forge
libasprintf-devel 0.22.5 he8f35ee_3 conda-forge
libblas 3.9.0 24_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hb9d3cd8_2 conda-forge
libbrotlidec 1.1.0 hb9d3cd8_2 conda-forge
libbrotlienc 1.1.0 hb9d3cd8_2 conda-forge
libcblas 3.9.0 24_linux64_openblas conda-forge
libclang 12.0.0 pypi_0 pypi
libcurl 7.86.0 h7bff187_1 conda-forge
libdeflate 1.14 h166bdaf_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libexpat 2.6.3 h5888daf_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc 14.1.0 h77fa898_1 conda-forge
libgcc-ng 14.1.0 h69a702a_1 conda-forge
libgettextpo 0.22.5 he02047a_3 conda-forge
libgettextpo-devel 0.22.5 he02047a_3 conda-forge
libgfortran 14.1.0 h69a702a_1 conda-forge
libgfortran-ng 14.1.0 h69a702a_1 conda-forge
libgfortran5 14.1.0 hc5f4f2c_1 conda-forge
libglib 2.80.2 hf974151_0 conda-forge
libgomp 14.1.0 h77fa898_1 conda-forge
libiconv 1.17 hd590300_2 conda-forge
liblapack 3.9.0 24_linux64_openblas conda-forge
liblapacke 3.9.0 24_linux64_openblas conda-forge
libllvm11 11.1.0 he0ac6c6_5 conda-forge
libnghttp2 1.51.0 hdcd2b5c_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libogg 1.3.5 h4ab18f5_0 conda-forge
libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge
libopencv 4.5.3 py37h25009ff_1 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libpq 13.8 hd77ab85_0 conda-forge
libprotobuf 3.16.0 h780b84a_0 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx 14.1.0 hc0a3c3a_1 conda-forge
libstdcxx-ng 14.1.0 h4852527_1 conda-forge
libtiff 4.4.0 h82bc61c_5 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
libxslt 1.1.33 h15afd5d_2 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
markdown 3.3.6 pypi_0 pypi
markdown-it-py 2.2.0 pyhd8ed1ab_0 conda-forge
matplotlib-base 3.5.3 py37hf395dca_2 conda-forge
mdurl 0.1.2 pyhd8ed1ab_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.0.32 h14678bc_0 conda-forge
mysql-libs 8.0.32 h54cf53e_0 conda-forge
ncurses 6.5 he02047a_1 conda-forge
ndx-pose 0.1.1 pypi_0 pypi
nettle 3.6 he412f7d_0 conda-forge
networkx 2.7 pyhd8ed1ab_0 conda-forge
nixio 1.5.3 pypi_0 pypi
nspr 4.35 h27087fc_0 conda-forge
nss 3.100 hca3bf56_0 conda-forge
numpy 1.19.5 pypi_0 pypi
oauthlib 3.1.1 pypi_0 pypi
openblas 0.3.27 pthreads_h9eca1d5_1 conda-forge
opencv 4.5.3 py37h89c1867_1 conda-forge
opencv-python-headless 4.2.0.34 pypi_0 pypi
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.5.0 h7d73246_1 conda-forge
openssl 1.1.1w hd590300_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 21.3 pypi_0 pypi
pandas 1.3.5 py37he8f5f7f_0 conda-forge
partd 1.4.1 pyhd8ed1ab_0 conda-forge
patsy 0.5.6 pyhd8ed1ab_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
pillow 9.2.0 py37h850a105_2 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pixman 0.43.2 h59595ed_0 conda-forge
pkgutil-resolve-name 1.3.10 pypi_0 pypi
protobuf 3.19.1 pypi_0 pypi
psutil 5.9.3 py37h540881e_0 conda-forge
pthread-stubs 0.4 hb9d3cd8_1002 conda-forge
py-opencv 4.5.3 py37h6531663_1 conda-forge
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pykalman 0.9.7 pyhd8ed1ab_0 conda-forge
pynwb 2.3.3 pypi_0 pypi
pyparsing 3.0.6 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
pyside2 5.13.2 py37hfa98aef_7 conda-forge
python 3.7.12 hb7a2778_100_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-rapidjson 1.9 py37hd23a5d3_0 conda-forge
python_abi 3.7 4_cp37m conda-forge
pytz 2024.2 pyhd8ed1ab_0 conda-forge
pywavelets 1.3.0 py37hda87dfa_1 conda-forge
pyyaml 6.0 py37h540881e_4 conda-forge
pyzmq 24.0.1 py37h0c0c2a8_0 conda-forge
qimage2ndarray 1.10.0 pypi_0 pypi
qt 5.12.9 hda022c4_4 conda-forge
qtpy 2.4.1 pyhd8ed1ab_0 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.26.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rich 13.8.1 pyhd8ed1ab_0 conda-forge
ruamel-yaml 0.17.32 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
scikit-image 0.19.2 py37he8f5f7f_0 conda-forge
scikit-learn 1.0 py37hf0f1638_1 conda-forge
scikit-video 1.1.11 pyh24bf2e0_0 conda-forge
scipy 1.7.3 py37hf838250_2 anaconda
seaborn 0.12.2 hd8ed1ab_0 conda-forge
seaborn-base 0.12.2 pyhd8ed1ab_0 conda-forge
segmentation-models 1.0.1 pypi_0 pypi
setuptools 59.8.0 py37h89c1867_1 conda-forge
setuptools-scm 6.3.2 pypi_0 pypi
shapely 1.8.5 py37ha4e3bd1_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleap 1.3.3 pypi_0 pypi
sqlite 3.46.0 h6d4b2fc_0 conda-forge
statsmodels 0.13.2 py37hda87dfa_0 conda-forge
tensorboard 2.7.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.0 pypi_0 pypi
tensorflow 2.7.0 pypi_0 pypi
tensorflow-estimator 2.7.0 pypi_0 pypi
tensorflow-hub 0.13.0 pyh56297ac_0 conda-forge
tensorflow-io-gcs-filesystem 0.23.1 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tifffile 2020.6.3 py_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.0 pypi_0 pypi
toolz 0.12.1 pyhd8ed1ab_0 conda-forge
typing-extensions 4.0.1 pypi_0 pypi
typing_extensions 4.7.1 pyha770c72_0 conda-forge
tzlocal 5.0.1 pypi_0 pypi
unicodedata2 14.0.0 py37h540881e_1 conda-forge
urllib3 1.26.7 pypi_0 pypi
werkzeug 2.0.2 pypi_0 pypi
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
wrapt 1.13.3 pypi_0 pypi
x264 1!161.3030 h7f98852_1 conda-forge
xorg-kbproto 1.0.7 hb9d3cd8_1003 conda-forge
xorg-libice 1.1.1 hb9d3cd8_1 conda-forge
xorg-libsm 1.2.4 he73a12e_1 conda-forge
xorg-libx11 1.8.4 h0b41bf4_0 conda-forge
xorg-libxau 1.0.11 hb9d3cd8_1 conda-forge
xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 hb9d3cd8_1003 conda-forge
xorg-xextproto 7.3.0 hb9d3cd8_1004 conda-forge
xorg-xproto 7.0.31 hb9d3cd8_1008 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zeromq 4.3.5 h59595ed_1 conda-forge
zipp 3.6.0 pypi_0 pypi
zlib 1.2.13 h4ab18f5_6 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge

"

@eberrigan
Contributor

Hi @rikebuck,

You should speak with your cluster managers about the best way to use a container on your cluster. Usually there is singularity or docker already installed. You might have a specific type of workload orchestrator.

Can you also provide the training hyperparameters used from the config file?

Is there a chance you are running out of memory while training?

Best,

Elizabeth

@eberrigan eberrigan self-assigned this Oct 22, 2024
@rikebuck
Author

rikebuck commented Oct 23, 2024

Okay I can reach out to them.

Attached please find the training jsons I have tried. The error for sleap-train only occurs at the 50th epoch. So if I take the model that was saved leading up to this point and run the 1 epoch json, there are no errors in the sleap-train step; however, I get the same error listed above when I run the sleap-track step.

I do not think I am running out of memory (but please let me know if you think otherwise, or if there are other ways to test):

  • when I run nvidia-smi, it looks like there is ample GPU memory
  • the errors I have gotten when running out of GPU memory (albeit for different software) in the past are something like "CUDA out of memory"
  • sleap-train always runs until the 50th epoch
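
One way to make the memory check above less anecdotal is to sample GPU memory periodically while training runs and log it. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH of the compute node; the function names are made up for illustration. Note that `nvidia-smi` only reports GPU memory, so host RAM would need a separate check, and TensorFlow by default tends to reserve most of the GPU memory up front, so a high "used" figure alone is not conclusive.

```python
import subprocess
from typing import List, Tuple

# Standard nvidia-smi query flags: one CSV line per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse lines like '1234, 32510' into (used_MiB, total_MiB) pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def sample_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return per-GPU memory usage in MiB."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_memory(result.stdout)
```

Calling `sample_gpu_memory()` from a small loop in a second job step (or a background process in the same Slurm job) around epoch 50 would show whether usage climbs before the failure.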

thank you,

baseline_large_rf.topdown_1_epoch.json
baseline_large_rf.topdown.json
baseline_medium_rf.topdown.json

@eberrigan
Contributor

Inference shouldn't work at this point since you are not able to localize instances (see https://sleap.ai/tutorials/initial-training.html). It sounds like you are making it to the 50th epoch of the centroid model, which means you are not progressing to the centered instance model. Both models are required to run inference with the top-down model.
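
To make the "both models" point concrete, here is a minimal sketch of assembling a `sleap-track` call that passes one `-m` flag per trained model. The `-m` and `-o` flags are the ones used in the original post; the helper name and the example directory names are hypothetical and should be replaced with the actual centroid and centered-instance model folders.

```python
from typing import List


def build_track_command(model_dirs: List[str], video: str, output: str) -> List[str]:
    """Build a sleap-track argument list with one -m flag per model directory."""
    if not model_dirs:
        raise ValueError("At least one model directory is required.")
    cmd = ["sleap-track"]
    for model_dir in model_dirs:
        cmd += ["-m", model_dir]
    cmd += ["-o", output, video]
    return cmd


# Hypothetical paths: a top-down pipeline needs both trained models.
command = build_track_command(
    ["models/centroid", "models/centered_instance"],
    video="ex_inference_vids/example.mp4",
    output="predictions/example.slp",
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` or joined into a single shell line for a Slurm script.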

Could you provide some screenshots of your data and labels? You can also send us your training package at this link to troubleshoot.

I think you might need to decrease the size of your input scaling in your centroid model.

@rikebuck
Author

rikebuck commented Oct 24, 2024

Attached please find some example screenshots of my data. I have two different label packages, one with 100 pts/frame and another with 16 pts/frame. The error I posted this issue about was for the 100 pts/frame version, although I am realizing I never trained the 16 pt version for several epochs, and I will try this now.

I just uploaded an example labels package to the google form link provided. I uploaded the labels for the 16pt version, because the labels pkg for the 100pt version is >10gb (11.13gb).

Some notes:

  • I should mention I am perhaps using sleap differently than most users. I have many frames (>1e6 frames) whose keypoints were labelled automatically using different software, https://github.com/yuichiiino1/WormTracer . That software works but is very slow, and I need another approach that can infer keypoints more quickly (i.e. sleap?). I was able to use it to generate a lot of training data, however. I am not sure if having this many labelled frames may be affecting sleap-train?
  • I do not care whether I have 16 or 100 labelled keypoints; this is residual from seeing in the deeplabcut docs that more keypoints are better for inference. I am not sure if this is true for sleap, or if there is a saturation point.
  • I only ever need to infer the keypoints of the one animal in the center of the image (although some frames may have more than one animal, only the animal closest to the center will be labeled). However, I am using the multi-animal version, also residual from deeplabcut, and because I do not know how to switch to the single-animal version or whether I need to.
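
If a multi-animal model is kept, the "animal closest to the center" rule from the last bullet can be applied as a post-processing step. The following is a minimal sketch under the assumption that each predicted instance is available as a list of (x, y) keypoints in image pixels, with NaN for missing points; the function names are hypothetical and this does not use any SLEAP API.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean of the visible (non-NaN) keypoints, or (nan, nan) if none are visible."""
    visible = [(x, y) for x, y in points if not (math.isnan(x) or math.isnan(y))]
    if not visible:
        return (math.nan, math.nan)
    n = len(visible)
    return (sum(x for x, _ in visible) / n, sum(y for _, y in visible) / n)


def centermost(instances: Sequence[Sequence[Point]], image_size: Tuple[int, int]) -> Optional[int]:
    """Index of the instance whose centroid is nearest the image center, or None."""
    cx, cy = image_size[0] / 2, image_size[1] / 2
    best_index, best_dist = None, math.inf
    for i, points in enumerate(instances):
        mx, my = centroid(points)
        if math.isnan(mx):
            continue  # skip instances with no visible keypoints
        dist = math.hypot(mx - cx, my - cy)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index
```

For a 120x120 px frame this picks the instance whose keypoint centroid lies nearest (60, 60) and ignores the rest.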

thank you

[Images: four example screenshots of the data]

@eberrigan
Contributor

cool!

I see. You have a lot of nodes in your skeleton. Increasing the complexity of the skeleton makes pose estimation more challenging https://sleap.ai/guides/skeletons.html#skeletons. You can experiment with the number of nodes in your skeleton.

You should try using the bottom-up approach which matches nodes using part-affinity fields. We have had good success with this approach in plants https://spj.science.org/doi/10.34133/plantphenomics.0175.

You will want to optimize your hyperparameters. You can try doing a hyperparameter search to find the hyperparameters that give you the highest accuracy.
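
A simple form of hyperparameter search is a grid: copy the baseline training config, override a few values, and train once per combination. The following is a minimal sketch of generating those variants from a config loaded as a nested dict. The dotted key paths in the example (`data.preprocessing.input_scaling`, `model.backbone.unet.filters`) are assumptions and should be checked against the keys actually present in your own training JSON.

```python
import copy
import itertools
from typing import Any, Dict, Iterator, List


def set_by_path(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config['a']['b']['c'] = value for dotted_key 'a.b.c', creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def grid_variants(base: Dict[str, Any], grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one deep-copied config per combination of the values in `grid`."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        variant = copy.deepcopy(base)
        for key, value in zip(keys, values):
            set_by_path(variant, key, value)
        yield variant


# Assumed key paths; verify them against your own config file.
example_grid = {
    "data.preprocessing.input_scaling": [0.5, 1.0],
    "model.backbone.unet.filters": [16, 32],
}
```

Each variant can be written out with `json.dump` under its own filename and passed to `sleap-train` in a separate job, after which the validation metrics of the runs are compared.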

In general, you want your receptive field size to be about the size of your animal. It looks like your features are pretty small, so you will probably just want to use the most computationally expensive hyperparameters. This means input scaling ~1.0 and increasing the number of filters. Please see the documentation https://sleap.ai/guides/choosing-models.html#choosing-models.

@eberrigan
Contributor

You can also take a look at the suggestions here #1977

@rikebuck
Author

ok! I can try the bottom up approach. How would I perform a hyperparameter search? Are there specific tools?

Do you have a sense of a good place to start with the number of filters (or what counts as a low vs. high number of filters)? Is that part of the hyperparameter search?

How do stride and input scaling relate to receptive field size? I read https://distill.pub/2019/computing-receptive-fields/ as linked in the docs, but I'm not sure I understand how to compute the correct stride yet. I can use an input scaling of ~1.
The bounding box of my worms tends to range from 55x25 px to 30x30 px within a 120x120 px image, depending on the posture.
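
As a rough way to reason about the question above, the receptive field of a stack of layers can be computed with the standard recurrence from the linked article: each layer adds (kernel - 1) times the current cumulative stride, and the cumulative stride is multiplied by the layer's stride. The following is a minimal sketch under an assumed encoder layout (two 3x3 convolutions followed by a 2x2 stride-2 pooling per block); it is not SLEAP's exact backbone and ignores the decoder.

```python
from typing import Iterable, List, Tuple

Layer = Tuple[int, int]  # (kernel_size, stride)


def receptive_field(layers: Iterable[Layer]) -> Tuple[int, int]:
    """Return (receptive_field_px, cumulative_stride) for a sequential layer stack."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth is scaled by the stride accumulated so far
        jump *= stride
    return rf, jump


def encoder_layers(n_blocks: int, convs_per_block: int = 2) -> List[Layer]:
    """Assumed layout: `convs_per_block` 3x3 convs, then a 2x2 stride-2 pool, per block."""
    layers: List[Layer] = []
    for _ in range(n_blocks):
        layers += [(3, 1)] * convs_per_block
        layers.append((2, 2))
    return layers


def rf_in_original_pixels(rf: int, input_scaling: float) -> float:
    """Receptive field expressed in pixels of the unscaled input image."""
    return rf / input_scaling
```

Under these assumed layers, three blocks (cumulative stride 8) give a 36 px receptive field and four blocks (cumulative stride 16) give 76 px, so at an input scaling of 1.0 it takes four blocks to cover a 55 px long side. Lowering the input scaling shrinks the animal in the network's input, which has the same effect as enlarging the receptive field in original pixels: 36 px at a scaling of 0.5 covers 72 original pixels.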

I can look into online mining as well, as you mentioned in the linked post.

I also wanted to mention that, historically, the frames where it is most difficult to get the midline of the worm are the ones where the worm is intersecting itself (please see frames below); this is much less frequent, but occurs regularly. Is there any reason to believe that this would be harder to infer?

thank you!

[Images: three example frames of a worm intersecting itself]
