OSError: [Errno 24] Too many open files during training #51

Open
HsuWanTing opened this issue Feb 3, 2021 · 4 comments

@HsuWanTing

@MhLiao I tried to pretrain a model on the SynthText dataset.
I followed your training script:

python -m torch.distributed.launch --nproc_per_node=4  tools/train_net.py --config-file configs/pretrain/seg_rec_poly_fuse_feature.yaml

But I always get this OSError after a few training iterations:

OSError: [Errno 24] Too many open files

How can I fix this? Is there a way to reduce the number of open files?

@samanthawyf

@HsuWanTing @MhLiao Hello, have you solved this problem, and if so, how? I ran into the same error. Thanks.

@gw00295652

@HsuWanTing This problem is likely caused by distributed training (multiple processes).

@HsuWanTing

@samanthawyf I just ran ulimit -n 65535 on the command line and the error went away.
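
If you prefer to raise the limit from inside the script rather than in the shell, a minimal sketch using Python's standard resource module (Unix-only; it lifts the soft limit up to whatever hard limit the system allows) is:

import resource

# Raise the soft limit on open file descriptors to the current hard limit;
# this has roughly the same effect as running ulimit -n before training.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))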

@lanfeng4659
Copy link

In my case, I solved this error by inserting the following two lines of code in the train_net.py file.

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')
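
For what it's worth, this needs to run before any DataLoader workers are created (e.g. near the top of train_net.py). As far as I understand, PyTorch's default 'file_descriptor' sharing strategy keeps a file descriptor open for every tensor shared with worker processes, which is what exhausts the limit, while 'file_system' shares tensors through named files in shared memory instead. A quick sanity check, if you want one:

import torch.multiprocessing

# Switch to the file-backed sharing strategy and confirm it took effect.
torch.multiprocessing.set_sharing_strategy('file_system')
print(torch.multiprocessing.get_sharing_strategy())  # expected: file_system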
