You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in huggingface/transformers#24565
Loading checkpoint shards: 100% 33/33 [01:13<00:00, 2.22s/it]
Instruction: Tell me about alpacas.
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.1 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.75 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:396: UserWarning: do_sample is set to False. However, top_k is set to 40 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.1 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.75 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:396: UserWarning: do_sample is set to False. However, top_k is set to 40 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k.
  warnings.warn(
Traceback (most recent call last):
File "/content/generate.py", line 150, in
fire.Fire(main)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/content/generate.py", line 144, in main
print("Response:", evaluate(instruction))
File "/content/generate.py", line 119, in evaluate
generation_output = model.generate(
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1034, in generate
outputs = self.base_model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1752, in generate
return self.beam_search(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3091, in beam_search
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 922, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 672, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 382, in forward
key_states = torch.cat([past_key_value[0], key_states], dim=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 6.81 MiB is free. Process 1298046 has 14.74 GiB memory in use. Of the allocated memory 13.56 GiB is allocated by PyTorch, and 309.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The output above was produced by running:

!python generate.py
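The do_sample warnings are not the failure here: with do_sample=False, the configured temperature, top_p, and top_k are simply ignored. If you want to silence them, one option (a minimal sketch, assuming the values shown in the warnings; the variable names and num_beams/max_new_tokens values are illustrative, not taken from generate.py) is to pass a generation config that is internally consistent:

```python
from transformers import GenerationConfig

# Option A: actually sample, so temperature/top_p/top_k take effect.
sampling_config = GenerationConfig(
    do_sample=True,
    temperature=0.1,
    top_p=0.75,
    top_k=40,
)

# Option B: keep deterministic (beam) search and drop the sampling-only knobs.
beam_config = GenerationConfig(
    do_sample=False,
    num_beams=4,  # assumed value; use whatever generate.py currently passes
)

# Then, inside evaluate(), something along these lines:
# generation_output = model.generate(
#     input_ids=input_ids,
#     generation_config=sampling_config,  # or beam_config
#     max_new_tokens=128,                 # assumed budget
# )
```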
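The actual failure is the torch.cuda.OutOfMemoryError raised while beam search grows the key/value cache (the torch.cat on past_key_value) on a ~14.75 GiB GPU. Below is a sketch of things commonly tried in this situation, not a verified fix for this exact script: the checkpoint and adapter IDs are placeholders, load_in_8bit requires bitsandbytes, a single CUDA device is assumed, and num_beams=1 / max_new_tokens=128 are deliberate reductions rather than the script's original values.

```python
import os

# The OOM message itself suggests max_split_size_mb to limit fragmentation;
# it has to be set before the first CUDA allocation in the process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

BASE_MODEL = "path-or-hub-id-of-the-llama-base-model"  # placeholder
LORA_WEIGHTS = "path-or-hub-id-of-the-lora-adapter"    # placeholder

tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)

# 8-bit loading (via bitsandbytes) roughly halves the footprint of an fp16
# 7B model; whether everything fits still depends on prompt length and beams.
model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, LORA_WEIGHTS)
model.eval()

prompt = "Tell me about alpacas."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # single-GPU assumption

# Fewer beams and a smaller generation budget shrink the KV cache, which is
# exactly where the traceback ran out of memory.
with torch.no_grad():
    output = model.generate(
        **inputs,
        num_beams=1,
        max_new_tokens=128,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```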