
[Draft] Visual Language Model Sample + Replay #884

Open
bwsw opened this issue Nov 18, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

bwsw (Contributor) commented Nov 18, 2024

We deploy a language model and formulate questions. Depending on the answers, we run certain classic models until the next VLM answer arrives.

We run the VLM once per second with the following questions:

  • cars in the viewport?
  • people in the viewport?
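
The VLM returns free-form text, so its answers to these questions have to be mapped to boolean decisions before they can drive the pipeline. A minimal sketch of that step, assuming the answers are plain affirmative/negative sentences (the real response schema of the VLM service may differ):

```python
# Hypothetical helper: interpret free-form VLM answers as boolean
# detections. The answer format (affirmative sentences starting with
# "yes", "there are", etc.) is an assumption, not the service's
# documented schema.

POSITIVE_MARKERS = ("yes", "there are", "there is", "i can see")


def is_positive(answer: str) -> bool:
    """Treat an answer as positive when it starts with an affirmative marker."""
    text = answer.strip().lower()
    return text.startswith(POSITIVE_MARKERS)


def interpret_answers(answers: dict) -> dict:
    """Map each question to a boolean decision."""
    return {question: is_positive(answer) for question, answer in answers.items()}
```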

We use Replay to store results and start a replay every time the model answers positively; while the replay is running and the model continues to answer positively, we prolong the replay processing; when the answer turns negative, we stop it.
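
The start/prolong/stop logic above is a small state machine. A sketch, with the Replay service calls injected as hypothetical callbacks (the actual Replay API is not shown here):

```python
class ReplayController:
    """Start a replay on the first positive VLM answer, prolong it while
    answers stay positive, and stop it on a negative answer.

    start_replay / prolong_replay / stop_replay are hypothetical hooks
    that would wrap calls to the Replay service API.
    """

    def __init__(self, start_replay, prolong_replay, stop_replay):
        self._start = start_replay
        self._prolong = prolong_replay
        self._stop = stop_replay
        self.active = False

    def on_vlm_answer(self, positive: bool) -> None:
        if positive and not self.active:
            # First positive answer: initialize replay processing.
            self.active = True
            self._start()
        elif positive and self.active:
            # Model keeps answering positively: extend the replay window.
            self._prolong()
        elif not positive and self.active:
            # Answer switched to negative: stop the replay.
            self.active = False
            self._stop()
```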

The replayed streams are sent to a secondary pipeline where YOLOv8-KeyPoint and YOLOv8 are deployed; we use only the ROIs corresponding to the VLM decisions to decide whether to launch one model, the other, or both.
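
The routing decision can be sketched as a pure function from the two VLM decisions to the set of secondary models to run. The mapping (cars -> YOLOv8, people -> YOLOv8-KeyPoint) is an assumption for illustration:

```python
# Hypothetical routing helper: which secondary models to launch for the
# replayed ROIs, given the per-question VLM decisions. The model names
# are illustrative identifiers, not real pipeline element names.

def select_models(cars: bool, people: bool) -> list:
    """Return the secondary models to run for the current replay window."""
    models = []
    if cars:
        models.append("yolov8")           # assumed: detector for vehicles
    if people:
        models.append("yolov8-keypoint")  # assumed: pose model for people
    return models
```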

The NVIDIA Jetson Platform Services VLM service can be used to run the model: https://docs.nvidia.com/jetson/jps/inference-services/vlm.html
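
The once-per-second query loop can be sketched independently of the VLM service's actual API by injecting the query and handling callbacks. `query` and `handle` are hypothetical hooks; the real service would be driven through its own HTTP interface:

```python
import time

QUESTIONS = ["cars in the viewport?", "people in the viewport?"]


def poll_vlm(query, handle, interval=1.0, iterations=None):
    """Ask the VLM the two questions once per `interval` seconds.

    query(question) -> str and handle(question, answer) are injected so
    this loop stays independent of the actual VLM service API.
    `iterations=None` runs forever; a number bounds the loop (for tests).
    """
    n = 0
    while iterations is None or n < iterations:
        for question in QUESTIONS:
            handle(question, query(question))
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval)
```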

@bwsw bwsw added the enhancement New feature or request label Nov 18, 2024
@bwsw bwsw changed the title Visual Language Model Sample Visual Language Model Sample + Replay Nov 18, 2024
@bwsw bwsw changed the title Visual Language Model Sample + Replay [Draft] Visual Language Model Sample + Replay Nov 18, 2024