Welcome to SurvBot, a GenAI implementation of image captioning built on the Moondream VLM. Modern surveillance systems are becoming one of the most frustrating sources of data we deal with on a daily basis. With users installing dozens of CCTV cameras across their campuses, objective video tagging at any scale is a nightmare.
We use Moondream, an easy-to-use, modular and, mind you, very tiny VLM, to caption footage as it is fed into the Streamlit-based web application. We then use a combination of Pandas and Streamlit's native CSV functions to let the end user peruse the tagged-video CSV and find points of interest.
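As a rough illustration of the search step, here is a minimal sketch of filtering a caption table with Pandas. The column names `timestamp_s` and `caption`, the helper `search_captions`, and the sample rows are assumptions made for this example and are not taken from the SurvBot code.

```python
import pandas as pd


def search_captions(df: pd.DataFrame, query: str) -> pd.DataFrame:
    """Return rows whose caption contains `query` (case-insensitive, literal match)."""
    # Column name "caption" is a hypothetical choice for this sketch.
    mask = df["caption"].str.contains(query, case=False, regex=False, na=False)
    return df[mask].reset_index(drop=True)


# Hypothetical captions, standing in for what a VLM might produce per key frame.
captions = pd.DataFrame(
    {
        "timestamp_s": [0, 5, 10],
        "caption": [
            "A red car parked near the gate",
            "Two people walking",
            "A CAR leaving the lot",
        ],
    }
)

hits = search_captions(captions, "car")

# Serialize the matches to CSV text, e.g. to hand to a download button.
csv_text = hits.to_csv(index=False)
```

The CSV text produced at the end is the kind of payload a download widget could serve, so the same filtered table can back both the on-screen search and the export.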
git clone https://github.com/yourusername/video-frame-inference.git
pip install -r requirements.txt
streamlit run app.py
- Navigate to the app interface and upload a video file using the file uploader provided.
- The application returns key frames extracted from the video.
- You can then view, search, and download the CSV of the entire surveillance.
- Upload video files in formats such as MP4, MOV, or AVI.
- Key Frames
- Extract frames from the video at specified intervals.
- Display extracted frames with timestamps in the Streamlit app.
- Add frame inference using a pre-trained model.
- Implement frame filtering based on specific criteria.
- Provide options to download extracted frames or data.
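Extracting frames at a fixed interval and labeling them with timestamps comes down to a little arithmetic on the frame rate. The sketch below shows one way to do it using only the standard library; the function names `sample_frame_indices` and `format_timestamp` are hypothetical, and the actual decoding of frames (for example with a video library) is deliberately left out.

```python
def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> list[int]:
    """Return the indices of frames spaced roughly `interval_s` seconds apart."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # Never step by less than one frame, even for very short intervals.
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


def format_timestamp(frame_index: int, fps: float) -> str:
    """Convert a frame index into an HH:MM:SS string, truncating partial seconds."""
    total_seconds = int(frame_index / fps)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Example: a 4-second clip at 25 fps, sampled every 2 seconds.
indices = sample_frame_indices(total_frames=100, fps=25.0, interval_s=2.0)
labels = [format_timestamp(i, 25.0) for i in indices]
```

Keeping the index math separate from the decoding makes it easy to test and lets the sampling interval be exposed as a simple user setting.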
This repository is under the MIT License. Read more here.