How to Integrate vLLM Models in Openclaw


Running local LLMs with vLLM gives you full control over your AI inference. This guide shows you how to connect your vLLM server to Openclaw for seamless agent interactions.

Quick Start

Before integrating with Openclaw, ensure your vLLM server is running and exposing the /v1 endpoints. By default, vLLM runs on http://127.0.0.1:8000/v1.

Start Your vLLM Server

Launch your vLLM server with your chosen model:

python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf
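Before wiring the server into Openclaw, you can confirm it is up by listing its models over the OpenAI-compatible API. A minimal stdlib sketch, assuming the default local address:

```python
import json
from urllib.request import urlopen

VLLM_BASE_URL = "http://127.0.0.1:8000/v1"  # vLLM's default address

def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def fetch_model_ids(base_url: str = VLLM_BASE_URL) -> list[str]:
    """Ask a running vLLM server which models it is serving."""
    with urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))
```

With the server launched as above, `fetch_model_ids()` should return a list containing `meta-llama/Llama-2-7b-chat-hf`; a connection error means the server is not reachable at that address.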

Configure Openclaw

You have two options for connecting Openclaw to your vLLM instance.

Option A: Auto-Discovery (Simplest)

If your vLLM server follows standard OpenAI-compatible endpoints, Openclaw can auto-discover available models. Just set your primary model:

{
  "agents": {
    "defaults": {
      "model": { "primary": "vllm/your-model-id" }
    }
  }
}

Option B: Manual Configuration (Full Control)

For explicit control over model definitions, add the vLLM provider block to your ~/.Openclaw/Openclaw.json:

{
  "models": {
    "providers": {
      "vllm": {
        "baseUrl": "http://127.0.0.1:8000/v1",
        "apiKey": "${VLLM_API_KEY}",
        "api": "openai-completions",
        "models": [
          {
            "id": "your-model-id",
            "name": "Local vLLM Model",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 128000,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "vllm/your-model-id" }
    }
  }
}
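A common misconfiguration is a primary model reference that does not match any model `id` under the provider, or a `baseUrl` missing the `/v1` suffix. The following sketch checks a loaded config for both; the key names come from the snippet above, but the validation logic itself is illustrative and not part of Openclaw:

```python
def check_vllm_provider(config: dict) -> list[str]:
    """Return a list of problems found in the vLLM provider block (empty = OK)."""
    problems = []
    provider = config.get("models", {}).get("providers", {}).get("vllm")
    if provider is None:
        return ["no models.providers.vllm block"]
    # vLLM exposes the OpenAI-compatible API under the /v1 root.
    if not provider.get("baseUrl", "").endswith("/v1"):
        problems.append("baseUrl should point at the /v1 root of the vLLM server")
    # The primary reference "vllm/<id>" must match a declared model id.
    ids = {m["id"] for m in provider.get("models", [])}
    primary = config.get("agents", {}).get("defaults", {}).get("model", {}).get("primary", "")
    if primary.startswith("vllm/") and primary.removeprefix("vllm/") not in ids:
        problems.append(f"primary model {primary!r} is not defined under the vllm provider")
    return problems
```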


Testing Your Setup

Once configured, test the connection by sending a message to your Openclaw agent. The agent will route requests to your local vLLM instance.

Example prompt and expected response flow:

User: What is the capital of France?

Agent: Uses the vLLM provider to generate a response through your local model.
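To isolate problems, you can also exercise the vLLM endpoint directly, bypassing Openclaw. A minimal stdlib sketch of the OpenAI-style chat completions call; the model ID and prompt are placeholders:

```python
import json
from urllib.request import Request, urlopen

VLLM_BASE_URL = "http://127.0.0.1:8000/v1"  # vLLM's default address

def build_chat_request(model: str, prompt: str,
                       base_url: str = VLLM_BASE_URL) -> Request:
    """Build an OpenAI-style chat completion request for the vLLM server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(f"{base_url}/chat/completions", data=body,
                   headers={"Content-Type": "application/json"})

def ask(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urlopen(build_chat_request(model, prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

If `ask("your-model-id", "What is the capital of France?")` returns a sensible reply, the server side is healthy and any remaining issue lies in the Openclaw configuration.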

Best Practices & Troubleshooting

  • Remote vLLM: If vLLM runs on a different machine, update baseUrl to the correct IP/hostname.
  • GPU Memory: Monitor VRAM usage. Large context windows require significant GPU memory.
  • Authentication: If your vLLM server requires an API key, set it via the VLLM_API_KEY environment variable.
  • Model Compatibility: Ensure your vLLM version supports the model you are running.
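The remote-server and authentication points combine into a small sketch. Assumptions: the hostname `gpu-box.local` is illustrative, and the server was started with vLLM's `--api-key` flag, so requests need a bearer token:

```python
import json
import os
from urllib.request import Request, urlopen

# Remote vLLM host instead of 127.0.0.1 (hostname is illustrative).
BASE_URL = "http://gpu-box.local:8000/v1"

def auth_headers() -> dict:
    """Bearer-token header for a vLLM server started with --api-key; empty otherwise."""
    key = os.environ.get("VLLM_API_KEY")
    return {"Authorization": f"Bearer {key}"} if key else {}

def list_models(base_url: str = BASE_URL) -> list[str]:
    """List served model IDs, authenticating if VLLM_API_KEY is set."""
    req = Request(f"{base_url}/models", headers=auth_headers())
    with urlopen(req) as resp:
        return [m["id"] for m in json.load(resp)["data"]]
```

Setting `VLLM_API_KEY` in the environment then covers both this script and the `"apiKey": "${VLLM_API_KEY}"` reference in the Openclaw config.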

Looking for other model integrations? Check out our guides on GLM models in Openclaw, or deployment options such as running Openclaw on Fly.io.