ValueError due to Input Length Exceeding max_length in Summarization #22

cxx5208 · 2024-03-08T20:06:51Z

Hello,

I've encountered a ValueError while using the project explainer to summarize a GitHub repository. The error arises because the input length (input_ids) is 513, which exceeds the specified max_length of 512 tokens.

Error Message:
ValueError: Input length of input_ids is 513, but max_length is set to 512. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.

Steps to Reproduce:
Run the summarization feature on a GitHub repository with a description that exceeds 512 tokens in length.
The error occurs in the _model_gen function within summarize.py, likely at the generation call, where the tokenized input is already longer than the max_length passed to the model.

Expected Behavior:
The application should handle repositories with descriptions longer than 512 tokens, either by segmenting the text appropriately or by allowing a larger max_length where feasible.
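To illustrate the segmenting option, here is a minimal sketch of splitting an over-long token sequence into overlapping windows that each fit the limit. It is not the project's code: the function name split_into_chunks, the overlap parameter, and the default values are assumptions for illustration, and it works on a plain list of token ids so it does not depend on any particular tokenizer.

```python
from typing import List


def split_into_chunks(
    token_ids: List[int], max_tokens: int = 512, overlap: int = 32
) -> List[List[int]]:
    """Split token_ids into windows of at most max_tokens tokens.

    Consecutive windows share `overlap` tokens so that text cut at a
    boundary still appears with some context in the next window.
    (Hypothetical helper, not part of the project.)
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        # Stop once this window has reached the end of the input.
        if start + max_tokens >= len(token_ids):
            break
    return chunks


if __name__ == "__main__":
    # The failing case from this report: 513 tokens against a 512 limit.
    ids = list(range(513))
    for chunk in split_into_chunks(ids):
        print(len(chunk))
```

Each window could then be summarized separately and the partial summaries combined; whether that preserves enough context for this project would need testing.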

Suggested Fixes:

Increase the max_length parameter in the model's configuration if possible, or set max_new_tokens instead, as the error message recommends.
Implement a mechanism to truncate or segment the input text to adhere to the model's max_length constraints without losing critical information.
Adjust the error handling to provide a more descriptive message or to gracefully manage text inputs that exceed the max_length.
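
As a rough sketch of the truncation and error-handling suggestions, a guard like the following could run before the input reaches the model. The names fit_to_budget and InputTooLongError are hypothetical and not taken from summarize.py; the function operates on a plain list of token ids, and the 512 default simply mirrors the limit in the error above.

```python
from typing import List, Sequence, Tuple


class InputTooLongError(ValueError):
    """Raised when the input exceeds the token limit and truncation is off."""


def fit_to_budget(
    token_ids: Sequence[int], max_input_tokens: int = 512, truncate: bool = True
) -> Tuple[List[int], bool]:
    """Return (token_ids that fit the limit, whether truncation happened).

    Hypothetical helper: if the input is too long it is either cut to
    max_input_tokens or rejected with a descriptive error.
    """
    if len(token_ids) <= max_input_tokens:
        return list(token_ids), False
    if not truncate:
        raise InputTooLongError(
            f"Input is {len(token_ids)} tokens but the limit is "
            f"{max_input_tokens}; shorten the description or enable "
            "truncation or segmentation."
        )
    # Keep the first max_input_tokens tokens and report that text was dropped.
    return list(token_ids[:max_input_tokens]), True


if __name__ == "__main__":
    ids, was_truncated = fit_to_budget(list(range(513)))
    print(len(ids), was_truncated)
```

The returned flag lets the caller warn the user that part of the description was dropped. The length of the generated summary would still need to be bounded separately, for example with max_new_tokens as the error message suggests.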

Additional Information:
Python Version: 3.12
Please let me know if there is any more information I can provide to help resolve this issue.
