Finetune on multi-image VQA dataset #531

rsr-droid · 2024-10-06T14:15:26Z

Hi, is it possible to finetune this model on a multi-image VQA dataset? The format would be similar to (but not limited to) the following, where a list of images is passed with a single prompt:
[
  {
    "system_prompt": "You are a helpful AI assistant",
    "image": [
      "/path/to/image_1",
      "/path/to/image_2"
    ],
    "conversations": [
      {
        "from": "human",
        "value": "Provide a description of the attached images."
      },
      {
        "from": "gpt",
        "value": "The first image shows a Labradoodle dog running energetically through a grassy field while holding a stick in its mouth. The dog has curly, light-colored fur and floppy ears flying in the air as it sprints with a joyful expression.\n\nThe second image features a close-up of an orange-colored cat with its mouth wide open in a large yawn, showing its sharp teeth and tongue. The background is black, making the cat's fur and facial features stand out vividly."
      }
    ]
  }
]
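
To make the proposed layout concrete, here is a minimal sketch of how records in the format above could be sanity-checked before any finetuning attempt. It is not taken from this repository and says nothing about whether the model's training code accepts multi-image input. The function name `validate_record` is hypothetical, and the rule that turns alternate between "human" and "gpt" is an assumption based only on the sample above.

```python
def validate_record(record):
    """Return a list of problems found in one multi-image record (empty if none).

    Assumed layout (from the sample in this issue, not from any library):
      - "image": non-empty list of path strings
      - "conversations": non-empty list of {"from", "value"} dicts,
        alternating "human" then "gpt" (this alternation is an assumption)
    """
    problems = []

    images = record.get("image")
    if not isinstance(images, list) or not images:
        problems.append("'image' must be a non-empty list of paths")
    elif not all(isinstance(p, str) and p for p in images):
        problems.append("every entry in 'image' must be a non-empty string")

    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        problems.append("'conversations' must be a non-empty list")
    else:
        for i, turn in enumerate(turns):
            # Even-indexed turns are expected from the human, odd ones from the model.
            expected = "human" if i % 2 == 0 else "gpt"
            if not isinstance(turn, dict) or turn.get("from") != expected:
                problems.append(f"turn {i}: expected 'from' == {expected!r}")
            elif not isinstance(turn.get("value"), str):
                problems.append(f"turn {i}: 'value' must be a string")

    return problems
```

Running this over each entry of the parsed JSON list (for example after `json.load`) flags malformed records early, such as an `"image"` field given as a single string instead of a list.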
