mlagents_envs.exception.UnityEnvironmentException: Environment shut down with return code -6 (SIGABRT). #3

Open
Pimool opened this issue Aug 7, 2023 · 8 comments

@Pimool

Pimool commented Aug 7, 2023

Hi, I am trying to run reinforcement learning on a GPU runbox.

With your code, I could train the model on Colab and on Saturn Cloud, which is similar to Colab.

However, when I tried to run it on my personal GPU runbox, an error occurred.

mlagents-learn -h showed the options, so I think the problem is with the environment.

How can I handle this error?

~$ mlagents-learn config.yaml --run-id=test --env=ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball.x86_64

Version information:
ml-agents: 0.31.0.dev0,
ml-agents-envs: 0.31.0.dev0,
Communicator API: 1.5.0,
PyTorch: 1.11.0+cu102
[INFO] Learning was interrupted. Please wait while the graph is generated.
Traceback (most recent call last):
File "/home/desktop/venv/bin/mlagents-learn", line 33, in
sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 264, in main
run_cli(parse_command_line())
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 260, in run_cli
run_training(run_seed, options, num_areas)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 136, in run_training
tc.start_learning(env_manager)
File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 197, in start_learning
raise ex
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 172, in start_learning
self._reset_env(env_manager)
File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 105, in _reset_env
env_manager.reset(config=new_config)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 68, in reset
self.first_step_infos = self._reset_env(config)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 446, in _reset_env
ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 101, in recv
raise env_exception
mlagents_envs.exception.UnityEnvironmentException: Environment shut down with return code -6 (SIGABRT).
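
The return code in the exception above follows the POSIX convention that Python's subprocess module uses: a negative value -N means the child process was terminated by signal N. The following is a minimal sketch, not part of ML-Agents, that decodes such a code and runs a command directly so the crash can be observed outside mlagents-learn. The helper names `describe_return_code` and `run_and_describe` are hypothetical.

```python
import signal
import subprocess
import sys


def describe_return_code(return_code: int) -> str:
    """Describe a subprocess return code (negative means killed by a signal on POSIX)."""
    if return_code < 0:
        signal_number = -return_code
        try:
            name = signal.Signals(signal_number).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            name = "unknown signal"
        return f"terminated by signal {signal_number} ({name})"
    return f"exited with status {return_code}"


def run_and_describe(command: list) -> str:
    """Run a command to completion and describe how it ended."""
    completed = subprocess.run(command, check=False)
    return describe_return_code(completed.returncode)


if __name__ == "__main__":
    # Decode the code reported in the traceback.
    print(describe_return_code(-int(signal.SIGABRT)))
    # Harmless demonstration with a child process that exits normally.
    print(run_and_describe([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

To apply it to this issue, the environment binary from the command above (3dball.x86_64) could be passed to `run_and_describe` in place of the demonstration command, assuming it can be started on its own.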

@dhyeythumar
Owner

Hi @Pimool,
Check the ml-agents version (the environment given in this repo was built for release_1).
Also, the SIGABRT return code suggests that the environment process aborted with a critical error.
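
One way to compare the Python side of a working machine (for example Colab) with a failing one is to print the installed package versions on both. This is a small sketch using only the standard library. The distribution names `mlagents`, `mlagents-envs` and `torch` are assumptions about how the packages were installed, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match the local installation.
    for name in ("mlagents", "mlagents-envs", "torch"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this on both machines and comparing the output line by line shows whether the Python packages differ. It says nothing about the Unity build itself.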

@Pimool
Author

Pimool commented Aug 8, 2023

Hi @dhyeythumar, I used ml-agents release 20 (the most recent one).
However, your environment worked on Colab and Saturn Cloud with release 20.
So I don't think the release is the problem.
I don't know why the SIGABRT error occurs only on my personal GPU server.

@dhyeythumar
Owner

Then most probably the environment is exiting with an error. It's possible that the Linux executable is not supported on GPU.
If I remember correctly, on Colab this env works on the CPU instance. I haven't tried it on GPU (try it and see whether the GPU instance on Colab works).

@Pimool
Author

Pimool commented Aug 8, 2023

It works on Colab on a T4 GPU. Saturn Cloud was on GPU, too.
Below is the Colab notebook.
https://colab.research.google.com/drive/1sFY_V-uirL9pCPBlHkme8zBMfp3e1cJQ?usp=sharing

@Pimool
Author

Pimool commented Aug 17, 2023

Below is the Player-0.log file from when I try to start training with the command above. I have no idea what the errors mean or why the handler cannot load these files. Any help is greatly appreciated.

'''
Mono path[0] = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Managed'
Mono config path = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/MonoBleedingEdge/etc'
Preloaded 'lib_burst_generated.so'
Preloaded 'libgrpc_csharp_ext.x64.so'
Initialize engine version: 2019.3.15f1 (59ff3e03856d)
[Subsystems] Discovering subsystems at path /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
Version: NULL 1.0 [1.0]
Renderer: Null Device
Vendor: Unity Technologies
Begin MonoManager ReloadAssembly
Completed reload, in 0.142 seconds
WARNING: Shader Unsupported: 'Autodesk Interactive' - All passes removed
WARNING: Shader Did you use #pragma only_renderers and omit this platform?
UnloadTime: 1.141076 ms
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Caught fatal signal - signo:11 code:1 errno:0 addr:0x561114101530
Obtained 4 stack frames.
0 0x007f8bd7a1a520 in __sigaction
1 0x007f8bd66696b5 in grpc_completion_queue_create_internal(grpc_cq_completion_type, grpc_cq_polling_type)
2 0x007f8bd666abf0 in grpc_completion_queue_create_for_next
3 0x000000405aa870 in (wrapper managed-to-native) object:wrapper_native_0x7f8bd6657df0 ()
'''
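
The log above mixes two kinds of messages: repeated "Fallback handler could not load library" lines and a single "Caught fatal signal" line (signo 11 is SIGSEGV on Linux), followed by stack frames that name gRPC functions. Here is a minimal sketch for condensing such a log so it can be compared with a Player-0.log from a machine where training works. The function name `summarize_player_log` is hypothetical, and the two message formats are taken only from the excerpt above.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

FALLBACK_PREFIX = "Fallback handler could not load library "
FATAL_SIGNAL_RE = re.compile(r"Caught fatal signal - signo:(\d+)")


def summarize_player_log(text: str):
    """Count libraries the fallback handler failed to load and find the fatal signal number.

    Returns (Counter keyed by library file name, signal number or None).
    """
    missing = Counter()
    fatal_signal = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith(FALLBACK_PREFIX):
            # Keep only the file name so repeated attempts are grouped together.
            library_path = line[len(FALLBACK_PREFIX):]
            missing[PurePosixPath(library_path).name] += 1
            continue
        match = FATAL_SIGNAL_RE.search(line)
        if match:
            fatal_signal = int(match.group(1))
    return missing, fatal_signal


if __name__ == "__main__":
    sample = (
        "Fallback handler could not load library /x/Mono/libcoreclr.so\n"
        "Fallback handler could not load library /x/Mono/libdl.so\n"
        "Caught fatal signal - signo:11 code:1 errno:0 addr:0x561114101530\n"
    )
    libraries, signo = summarize_player_log(sample)
    print(dict(libraries), signo)
```

If the same "Fallback handler" lines also appear in the log from a working run, they are less likely to be the cause, and the fatal signal and its stack frames become the more useful lead.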

@dhyeythumar
Owner

Hi @Pimool,
Try this command: !mlagents-learn config.yaml --run-id=$run_id --env=$env_name --no-graphics

I guess on your server it's trying to render the environment.

@Pimool
Author

Pimool commented Aug 18, 2023

Hi, @dhyeythumar

Thanks for your advice, but it raises the same error.

@OmarVector

Yeah, we have the same problem and we couldn't find a solution.
