
Add logic to calculate how much space to allocate for completion requests #205

Merged · 6 commits merged into main from user/adrastogi/prompt-window-fix on Jun 10, 2024

Conversation

@adrastogi (Contributor) commented on Jun 6, 2024

Summary of the pull request

Our implementation doesn't try to right-size the number of tokens that completion requests should use, which sometimes results in failures because the total request is too large. This PR adds logic to calculate how many tokens to specify, which should help mitigate this problem.

References and relevant issues

Closes #194

Detailed description of the pull request / Additional comments

The model we are using (gpt-35-turbo-instruct) has a fixed context window (4096 tokens), which is shared across the input prompt and the response produced by the model. https://platform.openai.com/docs/models/gpt-3-5-turbo

Callers of the API can specify how many tokens the model may allocate to the response via the max_tokens parameter. https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens
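As an illustration (not code from this repository, which uses the .NET SDK), a completion call in Python with the openai package sets this parameter as shown below; the fixed 2000-token value is the one this PR replaces with a computed budget:

```python
# Hypothetical illustration of a completion request against gpt-3.5-turbo-instruct.
# max_tokens caps how many tokens are reserved for the model's response.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Generate a C# class for a tic-tac-toe board.",  # placeholder prompt
    max_tokens=2000,  # the fixed allocation that caused overflows on large prompts
)
print(response.choices[0].text)
```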

We observed that with more complex or larger projects, responses weren't being produced because our completion calls specified a fixed max token amount (2000 tokens). When the prompt for a particular completion is on the larger side, the request is rejected because the model sees that the total number of tokens exceeds its limit. For example, a 2,500-token prompt plus a 2,000-token completion budget adds up to 4,500 tokens, which overflows the 4,096-token window.

OpenAI provides a Python library, tiktoken, for calculating the number of tokens that a particular input string consumes when processed by a particular model family, and Microsoft has a managed implementation here: https://github.com/microsoft/Tokenizer
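The change in this repository uses the managed Microsoft.ML.Tokenizers package, but the counting step it relies on can be sketched with the Python tiktoken package (the prompt string is just a placeholder):

```python
# Minimal sketch of counting prompt tokens with OpenAI's tiktoken package.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo-instruct")
prompt = "Generate a C# class for a tic-tac-toe board."
prompt_tokens = len(encoding.encode(prompt))
print(f"Prompt consumes {prompt_tokens} tokens")
```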

This PR takes advantage of this functionality to calculate how many tokens to allocate for the completion requests.

From analyzing various observed failures, the input prompts are generally just a bit larger than half of the token limit, i.e., just enough that the previous fixed 2000-token allocation would cause an overflow. It is possible that an extremely large input will not leave enough space for the model to produce a response; I updated that case to throw an exception so that we can see whether this is a common occurrence and tune the behavior from there. (In the long term, we may want to move to a model with a larger context window.) A sketch of this budgeting logic appears below.
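Putting the pieces together, the budgeting logic described above can be sketched as follows. This is an illustrative Python version, not the PR's actual C# implementation; the function name, the tokenizer choice, and the minimum-completion threshold are assumptions:

```python
# Sketch: size max_tokens from the space left in the model's context window.
import tiktoken

CONTEXT_WINDOW = 4096        # gpt-3.5-turbo-instruct's fixed context size
MIN_COMPLETION_TOKENS = 256  # assumed floor below which we give up


def compute_max_tokens(prompt: str) -> int:
    """Return the token budget for the completion, given the prompt's size."""
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo-instruct")
    prompt_tokens = len(encoding.encode(prompt))
    available = CONTEXT_WINDOW - prompt_tokens
    if available < MIN_COMPLETION_TOKENS:
        # Surface an error so we can observe how often prompts leave too little room.
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens, leaving only {available} "
            f"of {CONTEXT_WINDOW} for the completion."
        )
    return available
```

The computed value would then be passed as max_tokens on the completion request instead of the previous fixed 2000-token allocation.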

Validation steps performed

I used several test prompts that were previously failing (e.g., generating an Orleans project, generating a tic-tac-toe GUI app), and those no longer generate any errors. I also did some basic scenario tests to ensure that I didn't regress anything.

PR checklist

  • Closes #xxx
  • Tests added/passed
  • Documentation updated

@adrastogi (Contributor, Author) commented:

@EricJohnson327 / @krschau, FYI for you as this PR adds a new package reference that I believe will need to be added to the feed (Microsoft.ML.Tokenizers). Thank you!

@adrastogi adrastogi merged commit efad074 into main Jun 10, 2024
3 checks passed
@adrastogi adrastogi deleted the user/adrastogi/prompt-window-fix branch June 10, 2024 21:16
Linked issue this pull request may close: Token limit errors observed in Quickstart Playground (#194)