You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in huggingface/transformers#24565
Loading checkpoint shards: 100% 33/33 [01:13<00:00, 2.22s/it]
Instruction: Tell me about alpacas.
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.1 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.75 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:396: UserWarning: do_sample is set to False. However, top_k is set to 40 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.1 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.75 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:396: UserWarning: do_sample is set to False. However, top_k is set to 40 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k.
  warnings.warn(
Traceback (most recent call last):
File "/content/generate.py", line 150, in
fire.Fire(main)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/content/generate.py", line 144, in main
print("Response:", evaluate(instruction))
File "/content/generate.py", line 119, in evaluate
generation_output = model.generate(
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1034, in generate
outputs = self.base_model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1752, in generate
return self.beam_search(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3091, in beam_search
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1034, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 922, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 672, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 382, in forward
key_states = torch.cat([past_key_value[0], key_states], dim=2)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 6.81 MiB is free. Process 1298046 has 14.74 GiB memory in use. Of the allocated memory 13.56 GiB is allocated by PyTorch, and 309.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The output above was produced by running:

!python generate.py
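The do_sample warnings are not the failure here: with do_sample=False, the configured temperature, top_p, and top_k are simply ignored. If you want to silence them, one option (a minimal sketch, assuming the values shown in the warnings; the variable names and num_beams/max_new_tokens values are illustrative, not taken from generate.py) is to pass a generation config that is internally consistent:

```python
from transformers import GenerationConfig

# Option A: actually sample, so temperature/top_p/top_k take effect.
sampling_config = GenerationConfig(
    do_sample=True,
    temperature=0.1,
    top_p=0.75,
    top_k=40,
)

# Option B: keep deterministic (beam) search and drop the sampling-only knobs.
beam_config = GenerationConfig(
    do_sample=False,
    num_beams=4,  # assumed value; use whatever generate.py currently passes
)

# Then, inside evaluate(), something along these lines:
# generation_output = model.generate(
#     input_ids=input_ids,
#     generation_config=sampling_config,  # or beam_config
#     max_new_tokens=128,                 # assumed budget
# )
```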
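The actual failure is the torch.cuda.OutOfMemoryError raised while beam search grows the key/value cache (the torch.cat on past_key_value) on a ~14.75 GiB GPU. Below is a sketch of things commonly tried in this situation, not a verified fix for this exact script: the checkpoint and adapter IDs are placeholders, load_in_8bit requires bitsandbytes, a single CUDA device is assumed, and num_beams=1 / max_new_tokens=128 are deliberate reductions rather than the script's original values.

```python
import os

# The OOM message itself suggests max_split_size_mb to limit fragmentation;
# it has to be set before the first CUDA allocation in the process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

BASE_MODEL = "path-or-hub-id-of-the-llama-base-model"  # placeholder
LORA_WEIGHTS = "path-or-hub-id-of-the-lora-adapter"    # placeholder

tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)

# 8-bit loading (via bitsandbytes) roughly halves the footprint of an fp16
# 7B model; whether everything fits still depends on prompt length and beams.
model = LlamaForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, LORA_WEIGHTS)
model.eval()

prompt = "Tell me about alpacas."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # single-GPU assumption

# Fewer beams and a smaller generation budget shrink the KV cache, which is
# exactly where the traceback ran out of memory.
with torch.no_grad():
    output = model.generate(
        **inputs,
        num_beams=1,
        max_new_tokens=128,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```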