# Docker Image ― OpenAI API-Compatible Pre-loaded LLM Server


These Docker images are based on NVIDIA CUDA base images. Each image pre-loads an LLM and serves it via vLLM.

## Environment Variables

- `TENSOR_PARALLEL_SIZE`: Number of GPUs to use for tensor parallelism. Default: `1`.

## Port

The OpenAI-compatible API is exposed on port `8000`.
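As a minimal sketch of how the pieces above fit together (assuming the image accepts standard `docker run` flags, and using one of the tags from the table below), you might start a container and query the server like this:

```shell
# Run the image with 2 GPUs, passing the tensor-parallelism setting
# via the TENSOR_PARALLEL_SIZE environment variable and mapping port 8000.
docker run --gpus all \
  -p 8000:8000 \
  -e TENSOR_PARALLEL_SIZE=2 \
  ivangabriele/llm:lmsys__vicuna-13b-v1.5-16k

# Once the server is up, any OpenAI-compatible client works.
# vLLM's OpenAI-compatible server exposes endpoints such as /v1/models:
curl http://localhost:8000/v1/models
```

Because the API is OpenAI-compatible, existing OpenAI client libraries can be pointed at `http://localhost:8000/v1` instead of the official API endpoint.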

## Tags & Deployment Links

> **Note**
> The VRAM column is the minimum amount of VRAM the model requires on a single GPU.

| Tag | Model | RunPod | Vast.ai | VRAM |
| --- | --- | --- | --- | --- |
| `ivangabriele/llm:lmsys__vicuna-13b-v1.5-16k` | img-huggingface | img-runpod | img-vastai | 26 GB |
| `ivangabriele/llm:open-orca__llongorca-13b-16k` | img-huggingface | img-runpod | img-vastai | 26 GB |

## Roadmap

- Add more popular models.
- Start the server in the background to allow for SSH access.