Hi,
I am trying to run a pretty simple test with the following args:
args = SentenceTransformerTrainingArguments(
    # Required parameter:
    output_dir=output_dir.as_posix(),
    # Optional training parameters:
    num_train_epochs=num_epochs,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    learning_rate=1e-3,
    warmup_ratio=0.1,
    dataloader_num_workers=2,
    use_mps_device=False,
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    logging_steps=100,
    run_name="mpnet-base-all-nli-triplet",  # Will be used in W&B if `wandb` is installed
)
but I got the following error:
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
File ".../lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File ".../lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File ".../projects/Modules/Training/pretrain-text-encoder.py", line 270, in <module>
train_model(model, ds_train, ds_val, output_dir, num_epochs=3, batch_size=16)
File ".../projects/Modules/Training/pretrain-text-encoder.py", line 217, in train_model
trainer.train()
File ".../lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
return inner_training_loop(
File ".../lib/python3.10/site-packages/transformers/trainer.py", line 2236, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File ".../lib/python3.10/site-packages/accelerate/data_loader.py", line 547, in __iter__
dataloader_iter = self.base_dataloader.__iter__()
File ".../lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 484, in __iter__
return self._get_iterator()
File ".../lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 415, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File ".../lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1138, in __init__
w.start()
File ".../lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File ".../lib/python3.10/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File ".../lib/python3.10/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
File ".../lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File ".../lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File ".../lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File ".../lib/python3.10/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File ".../lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 607, in reduce_storage
metadata = storage._share_filename_cpu_()
File ".../lib/python3.10/site-packages/torch/storage.py", line 437, in wrapper
return fn(self, *args, **kwargs)
File ".../lib/python3.10/site-packages/torch/storage.py", line 516, in _share_filename_cpu_
return super()._share_filename_cpu_(*args, **kwargs)
RuntimeError: _share_filename_: only available on CPU
Of course, if I switch to num_workers=0, everything works.
Setting use_mps_device to True or False makes no difference (I am doing some tests locally).
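For reference, switching back to single-process loading (and optionally setting the environment variable from the tokenizers warning) looks roughly like this; everything else stays as posted above:

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # as suggested by the tokenizers warning

args = SentenceTransformerTrainingArguments(
    output_dir=output_dir.as_posix(),
    dataloader_num_workers=0,  # no worker processes, so nothing needs to be pickled/shared
    # ... all other arguments exactly as in the snippet above ...
)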
If I try with the old .fit I get:
Traceback (most recent call last):
File ".../autotagging/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File ".../autotagging/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File ".../Modules/Training/pretrain-text-encoder.py", line 227, in <module>
train_model(model, ds_train, ds_val, output_dir, num_epochs=3, batch_size=16)
File ".../Modules/Training/pretrain-text-encoder.py", line 175, in train_model
model.fit(
File ".../autotagging/lib/python3.10/site-packages/sentence_transformers/fit_mixin.py", line 260, in fit
for batch in data_loader:
File ".../autotagging/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 479, in __iter__
self._iterator = self._get_iterator()
File ".../autotagging/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 415, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File ".../autotagging/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1138, in __init__
w.start()
File ".../autotagging/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File ".../autotagging/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File ".../autotagging/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
File ".../autotagging/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File ".../autotagging/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File ".../autotagging/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File ".../autotagging/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'FitMixin.fit.<locals>.identity'
I have torch == 2.5.0 and sentence_transformers == 3.2.1.
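If it helps, the .fit traceback looks like the generic Python limitation that locally defined functions cannot be pickled, which spawn-based DataLoader workers require; a minimal sketch of the same failure, with hypothetical names unrelated to sentence_transformers:

import pickle

def outer():
    def identity(x):  # nested function, analogous to FitMixin.fit.<locals>.identity
        return x
    return identity

# raises AttributeError: Can't pickle local object 'outer.<locals>.identity'
pickle.dumps(outer())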
Hmm, that looks like a tricky bug. Could you perhaps try the new Trainer with a small subclass like this:
import torch
from sentence_transformers import SentenceTransformerTrainer

class CustomTrainer(SentenceTransformerTrainer):
    def get_train_dataloader(self):
        dataloader = super().get_train_dataloader()
        dataloader.multiprocessing_context = "fork" if torch.backends.mps.is_available() else None
        return dataloader  # also for eval/test dataloaders
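and then use that subclass in place of SentenceTransformerTrainer; roughly like this, where loss stands in for whatever loss your script already constructs:

trainer = CustomTrainer(
    model=model,
    args=args,
    train_dataset=ds_train,
    eval_dataset=ds_val,
    loss=loss,  # placeholder for the loss built elsewhere in your script
)
trainer.train()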
Yeah, in the end I fixed it like that. Would it be worth fixing this upstream, or at least updating the docs?
Nowadays lots of folks at companies use MacBooks to do dry runs locally before switching to CUDA :)