
Data prefetching does not occur for iterable datasets #34867

Open
Lucaweihs opened this issue Nov 21, 2024 · 1 comment

Lucaweihs commented Nov 21, 2024

System Info

  • transformers version: 4.46.1
  • Platform: macOS-15.1-arm64-arm-64bit
  • Python version: 3.11.10
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No

Who can help?

@muellerzr @SunMarc

Reproduction

PR #28498 was meant to allow specifying the PyTorch dataloader's prefetch_factor argument via the Hugging Face dataloader_prefetch_factor training argument. As we can see on this line, the feature was added inside an

if not isinstance(train_dataset, torch.utils.data.IterableDataset):

statement, which means prefetching never occurs for IterableDatasets (this seems like a mistake). There are two other lines where the same error happens for the test and eval dataloaders. Unless I'm missing something, I believe these lines can be moved out of the if condition and into other logic.
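For context, the surrounding pattern in Trainer.get_train_dataloader looks roughly like the following. This is a simplified sketch rather than the verbatim transformers source; the attribute and helper names (self._train_batch_size, self._get_train_sampler, ...) are approximations:

```python
import torch
from torch.utils.data import DataLoader

def get_train_dataloader(self) -> DataLoader:
    train_dataset = self.train_dataset
    dataloader_params = {
        "batch_size": self._train_batch_size,
        "collate_fn": self.data_collator,
        "num_workers": self.args.dataloader_num_workers,
        "pin_memory": self.args.dataloader_pin_memory,
    }

    # This branch only runs for map-style datasets, so the
    # dataloader_prefetch_factor training argument is silently dropped
    # whenever train_dataset is an IterableDataset.
    if not isinstance(train_dataset, torch.utils.data.IterableDataset):
        dataloader_params["sampler"] = self._get_train_sampler()
        dataloader_params["drop_last"] = self.args.dataloader_drop_last
        dataloader_params["prefetch_factor"] = self.args.dataloader_prefetch_factor

    return DataLoader(train_dataset, **dataloader_params)
```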

Expected behavior

Prefetching should work with iterable datasets.
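One possible fix, sketched under the assumption that the dataloader construction otherwise stays as in the sketch above (a hypothetical rearrangement, not an actual patch), is to set prefetch_factor unconditionally, since torch.utils.data.DataLoader accepts it for iterable datasets as well:

```python
# Set prefetch_factor for every dataset type instead of only map-style ones.
# PyTorch only allows a non-None prefetch_factor when num_workers > 0, and
# dataloader_prefetch_factor defaults to None in TrainingArguments, so
# hoisting this line should be safe for the default configuration.
dataloader_params["prefetch_factor"] = self.args.dataloader_prefetch_factor

if not isinstance(train_dataset, torch.utils.data.IterableDataset):
    dataloader_params["sampler"] = self._get_train_sampler()
    dataloader_params["drop_last"] = self.args.dataloader_drop_last
```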

Lucaweihs added the bug label Nov 21, 2024

SunMarc commented Nov 22, 2024

Thanks for the nice write-up. Would you be open to creating a PR to fix this?
