OSError: [Errno 24] Too many open files during training #51

Open
HsuWanTing opened this issue Feb 3, 2021 · 4 comments

@HsuWanTing

@MhLiao I tried to pretrain a model on the SynthText dataset.
I followed your training script:

python -m torch.distributed.launch --nproc_per_node=4  tools/train_net.py --config-file configs/pretrain/seg_rec_poly_fuse_feature.yaml

But I always get this OSError after a few training iterations:

OSError: [Errno 24] Too many open files

How can I fix this? Is there a way to reduce the number of open files?

@samanthawyf

@HsuWanTing @MhLiao Hello, have you solved this problem, and if so, how? I ran into the same error. Thanks.

@gw00295652

@HsuWanTing This problem is likely caused by distributed training (multiple processes).

@HsuWanTing

@samanthawyf I just ran ulimit -n 65535 on the command line and the error went away.
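
If you prefer to raise the limit from inside the script rather than in the shell, a minimal sketch using Python's standard resource module (Unix-only; it lifts the soft limit up to whatever hard limit the system allows) is:

import resource

# Raise the soft limit on open file descriptors to the current hard limit;
# this has roughly the same effect as running ulimit -n before training.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))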

@lanfeng4659
Copy link

In my case, I solved this error by inserting the following two lines of code in the train_net.py file.

import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')
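
For what it's worth, this needs to run before any DataLoader workers are created (e.g. near the top of train_net.py). As far as I understand, PyTorch's default 'file_descriptor' sharing strategy keeps a file descriptor open for every tensor shared with worker processes, which is what exhausts the limit, while 'file_system' shares tensors through named files in shared memory instead. A quick sanity check, if you want one:

import torch.multiprocessing

# Switch to the file-backed sharing strategy and confirm it took effect.
torch.multiprocessing.set_sharing_strategy('file_system')
print(torch.multiprocessing.get_sharing_strategy())  # expected: file_system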
