Add OpenAI as a Provider for Descriptive Text Generation #828
+431
−44
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description of the Change
In #785, we updated to GPT-4o mini in our OpenAI ChatGPT Provider. This model is multi-modal, which means you can do things with images, video, or audio, not just text.
So far we haven't take advantage of that but this PR brings OpenAI as a Provider for the Descriptive Text Generator Feature. Currently this Feature only runs on the Azure AI Vision Provider, so this brings a second option for that Feature.
Making requests to this model is the same as all of our text generation requests, other than we send the image URL in that request. We have a default prompt that is used and that can be modified from the settings screen, as needed. I tried to keep this prompt fairly generic but open to suggestions on improvements there. It gives decent results right now in the images I tested though does tend to be more verbose than what I'd want in just alt text, though noting the text here can be used as a caption or description, so hard to balance all three:
OpenAI requires images to be at least 512x512, so we return an error message if any image below that threshold is used. It also supports passing in the full image URL or a base64 encoded version of the image. For now I've used the image URL but we could look to go the encoded route, which would make things work in environments where images are publicly accessible (like locally). The downside here is it's slower and more expensive, as it uses more tokens.
Closes #826
How to test the Change
Tools > ClassifAI > Image Processing > Descriptive Text Generator
Descriptive text fields
is turned onChangelog Entry
Credits
Props @dkotter, @jeffpaul
Checklist: