ValueError due to Input Length Exceeding max_length in Summarization #22

Open
cxx5208 opened this issue Mar 8, 2024 · 0 comments

cxx5208 commented Mar 8, 2024

Hello,

I hit a ValueError while using the project explainer to summarize a GitHub repository. The tokenized input (input_ids) is 513 tokens long, one more than the configured max_length of 512.

Error Message:
ValueError: Input length of input_ids is 513, but max_length is set to 512. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.
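
For context, the wording of this message appears to match the length check that Hugging Face transformers runs before generation, where the prompt tokens count toward max_length. Below is a rough plain-Python sketch of that arithmetic. `check_budget` is a hypothetical helper written for illustration, not the library's code and not anything taken from summarize.py, and it assumes a model whose prompt counts against the limit.

```python
def check_budget(input_length, max_length=None, max_new_tokens=None):
    """Return how many tokens may still be generated, or raise like the error above.

    Hypothetical helper for illustration only; it imitates the check, it is
    not the library's implementation.
    """
    if max_new_tokens is not None:
        # Assumption: with max_new_tokens the budget is counted on top of the
        # prompt, so the prompt length alone cannot trip this check.
        return max_new_tokens
    if max_length is None:
        raise TypeError("pass either max_length or max_new_tokens")
    if input_length >= max_length:
        # The prompt already uses up the whole budget: nothing can be generated.
        raise ValueError(
            f"Input length of input_ids is {input_length}, "
            f"but max_length is set to {max_length}."
        )
    return max_length - input_length


if __name__ == "__main__":
    print(check_budget(500, max_length=512))       # room left under max_length
    print(check_budget(513, max_new_tokens=128))   # budget independent of prompt
    try:
        check_budget(513, max_length=512)
    except ValueError as err:
        print(err)
```

Note that switching to max_new_tokens only changes how the generation budget is counted. The model's own context window still applies, so a long enough input may still need to be truncated or split.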

Steps to Reproduce:
Run the summarization feature on a GitHub repository whose description is longer than 512 tokens.
The error is raised in the _model_gen function within summarize.py. Since the message refers to max_length and max_new_tokens, it most likely comes from the generation step, where the tokenized input is already longer than the configured max_length.

Expected Behavior:
The application should handle repositories with descriptions longer than 512 tokens, either by segmenting the text appropriately or by allowing a larger max_length where feasible.

Suggested Fixes:

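  • Increase the max_length parameter if the model allows it or, as the error message itself recommends, set max_new_tokens instead so the output budget no longer depends on the input length.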
  • Implement a mechanism to truncate or segment the input text to adhere to the model's max_length constraints without losing critical information.
  • Adjust the error handling to provide a more descriptive message or to gracefully manage text inputs that exceed the max_length.
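
As a starting point for the truncate-or-segment suggestion, here is a minimal sketch that works on a plain list of token ids. The function names, the `reserve` and `overlap` parameters, and their default values are assumptions made for illustration; nothing here is taken from summarize.py, and how the per-window summaries get merged afterwards is left open.

```python
def truncate_ids(token_ids, max_length, reserve=64):
    """Keep the leading tokens that fit, leaving `reserve` tokens for the output."""
    limit = max_length - reserve
    if limit <= 0:
        raise ValueError("reserve must be smaller than max_length")
    return token_ids[:limit]


def split_ids(token_ids, max_length, reserve=64, overlap=16):
    """Split token ids into overlapping windows of at most max_length - reserve tokens.

    Consecutive windows share `overlap` tokens so that text cut at a window
    boundary also appears at the start of the next window.
    """
    limit = max_length - reserve
    if limit <= 0:
        raise ValueError("reserve must be smaller than max_length")
    if not 0 <= overlap < limit:
        raise ValueError("overlap must be non-negative and smaller than the window size")

    step = limit - overlap
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + limit])
        if start + limit >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


if __name__ == "__main__":
    ids = list(range(513))  # stand-in for the 513 input_ids from the report
    print(len(truncate_ids(ids, max_length=512)))
    print([len(w) for w in split_ids(ids, max_length=512)])
```

Truncation is the simpler option but silently drops everything past the cut, while splitting keeps all of the text at the cost of one model call per window. Either way, every piece stays below max_length, so the length check should no longer fail on the input alone.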

Additional Information:
Python Version: 3.12
Please let me know if there is any more information I can provide to help resolve this issue.

Attachments: Screenshot 2024-03-08 at 12 03 19 PM, Screenshot 2024-03-08 at 12 03 29 PM