Configure LLM Backend

NeoMind's AI Agent and AI Chat rely on an LLM backend to understand natural language and execute instructions. This guide covers configuring local or cloud LLMs via Web UI or CLI.


Backend Overview

NeoMind supports 10+ LLM backends in two deployment modes:

| Category | Backend | Default Model | Notes |
|---|---|---|---|
| Local (recommended) | Ollama | qwen3.5:4b | Default backend, fully offline |
| Local | llama.cpp | Loaded at startup | Self-hosted llama-server |
| Cloud | OpenAI | gpt-4o-mini | API Key required |
| Cloud | Anthropic | claude-3-5-sonnet | API Key required |
| Cloud | Google | gemini-1.5-flash | API Key required |
| Cloud | xAI | grok-beta | API Key required |
| Cloud | Qwen (Alibaba) | qwen-max-latest | DashScope Key required |
| Cloud | DeepSeek | deepseek-v3 | API Key required |
| Cloud | GLM (Zhipu) | glm-4-plus | API Key required |
| Cloud | MiniMax | m2-1-19b | API Key required |
| Cloud | Custom | Any | OpenAI-compatible endpoint |

Recommended: Ollama + qwen3.5:4b (4B parameters; balances speed and quality and runs smoothly on 8 GB of RAM). Add cloud backends when you need a stronger model or multimodal input.


Step 1: Install Ollama and Pull a Model (Local Backend)

Install from ollama.com. After install, Ollama listens on http://localhost:11434 by default.

# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Recommended model (Chinese + tool calling + 128K context)
ollama pull qwen3.5:4b

# For vision capability (image input), also pull a vision model
ollama pull qwen3.5:4b-vl # or llava / minicpm-v etc.

Note: Use qwen3.5:4b. Earlier docs mentioned ministral-3:3b / deepseek-r1:7b; these are no longer recommended (unstable tool calling / too large for edge hardware).

Skip this step if using a cloud backend (OpenAI / Anthropic / GLM, etc.).
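Before moving on, it can help to confirm that the model tag you pulled is the one Ollama reports. The sketch below checks the JSON returned by Ollama's `/api/tags` endpoint for an exact model name. `has_model` is a hypothetical helper written for this guide, not part of Ollama or NeoMind, and it uses plain text matching rather than a JSON parser.

```shell
#!/bin/sh
# Sketch: confirm a model tag appears in an Ollama /api/tags response.
# has_model is a hypothetical helper, not part of Ollama or NeoMind.

# Reads JSON on stdin and succeeds if some "name" field equals $1 exactly.
has_model() {
  # Drop spaces and newlines so '"name": "x"' and '"name":"x"' look the same.
  tr -d ' \n' | grep -qF "\"name\":\"$1\""
}

# Usage (needs a running Ollama at the default address):
#   curl -s http://localhost:11434/api/tags | has_model qwen3.5:4b \
#     && echo "model present" || echo "model missing"
```

Because the match includes the closing quote, `qwen3.5:4b` does not match a longer tag such as `qwen3.5:4b-vl`.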

Step 2: Open LLM Backend Settings

Navigate to Settings → LLM Backends:

[Screenshot: LLM backend list, click Add Backend]

Click Add Backend to open the configuration form.

Step 3: Fill in Backend Details

Ollama (Local)

| Field | Value |
|---|---|
| Type | Ollama |
| Endpoint | http://localhost:11434 (default; use the host IP for remote) |
| Model | qwen3.5:4b (must match the ollama pull name) |
| Stream | Enabled (recommended for better UX) |

Cloud (OpenAI example)

| Field | Value |
|---|---|
| Type | OpenAI (or Anthropic / Google / Qwen / …) |
| API Key | Your API Key (e.g. sk-...) |
| Base URL | Leave empty for official; fill in for custom gateway |
| Model | gpt-4o-mini (or gpt-4o / gpt-4-turbo, etc.) |

Chinese providers: Qwen / DeepSeek / GLM / MiniMax all use OpenAI-compatible protocols. NeoMind has built-in default endpoints, so you only need to fill in the API Key and model name.

Custom (OpenAI-Compatible Endpoint)

If you use vLLM, Together AI, OpenRouter, or another self-hosted/third-party gateway, select Custom:

  • base_url: Gateway URL (e.g. https://openrouter.ai/api/v1)
  • api_key: Gateway key
  • model: Model name exposed by the gateway

[Screenshot: LLM backend configuration form]

After saving, NeoMind probes the backend's capabilities (tool calling, multimodal, context window) and writes capability tags automatically.
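If a Custom backend fails to save or probe, it may be worth checking the gateway by hand first. OpenAI-compatible gateways conventionally serve chat at `<base_url>/chat/completions`. The sketch below builds that URL while tolerating a trailing slash. `chat_url` is a hypothetical helper, and `BASE_URL`, `API_KEY`, and `MODEL` are placeholders you would set yourself; how NeoMind itself joins the URL is not stated in this guide.

```shell
#!/bin/sh
# Sketch: build the chat-completions URL for an OpenAI-compatible gateway.
# chat_url is a hypothetical helper; BASE_URL, API_KEY and MODEL are placeholders.
chat_url() {
  # Strip one trailing slash so both ".../v1" and ".../v1/" work.
  printf '%s/chat/completions\n' "${1%/}"
}

# Usage (sends one real request, so it needs a valid key and network access):
#   curl -s "$(chat_url "$BASE_URL")" \
#     -H "Authorization: Bearer $API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "{\"model\":\"$MODEL\",\"messages\":[{\"role\":\"user\",\"content\":\"ping\"}]}"
```

A JSON reply containing a `choices` array suggests the base URL, key, and model name are all usable in the form.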

Step 4: Set Default and Verify

Click Set Default in the backend list to make it the system default.

Then open AI Chat and send a greeting to verify:

[Screenshot: AI Chat verifying LLM connection]

If AI Chat doesn't respond, check:

  • Is Ollama running? ollama list should show pulled models
  • Cloud backend: Is the API Key valid? Is the network reachable?
  • More in Troubleshooting

Option 2: CLI Setup

Prefer the terminal? These commands cover the full workflow from creation to activation.

1. List Existing Backends

neomind llm list

2. List Available Models (Ollama)

# List models pulled in Ollama
neomind llm models

# Or specify a remote Ollama
neomind llm models --endpoint http://192.168.1.100:11434

3. Create a Backend

# Ollama local
neomind llm create --name local --type ollama \
--endpoint http://localhost:11434 --model qwen3.5:4b

# OpenAI cloud
neomind llm create --name openai --type openai \
--endpoint https://api.openai.com/v1 \
--model gpt-4o-mini --api-key sk-xxxx

# GLM cloud (OpenAI-compatible)
neomind llm create --name glm --type openai \
--endpoint https://open.bigmodel.cn/api/paas/v4 \
--model glm-4-flash --api-key xxx.xxx.xxx

# Custom gateway (OpenRouter etc.)
neomind llm create --name router --type custom \
--endpoint https://openrouter.ai/api/v1 \
--model anthropic/claude-3.5-sonnet --api-key sk-or-xxxx

A backend ID is returned on success (e.g. local or a random ID).

4. Test the Connection

neomind llm test local

If the command returns model info and a response status, the connection is working.

5. Activate as Default

neomind llm activate local
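Steps 4 and 5 can be chained so that a backend only becomes the default once its test passes. The following is a minimal sketch built on the `neomind llm test` and `neomind llm activate` commands shown above. It assumes that `neomind llm test` exits with a non-zero status on failure, which this guide does not confirm; `activate_if_healthy` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch: activate a backend only when its connection test passes.
# Assumption: `neomind llm test` exits non-zero on failure (not confirmed above).
activate_if_healthy() {
  id="$1"
  if neomind llm test "$id"; then
    neomind llm activate "$id"
  else
    echo "backend '$id' failed its test; default left unchanged" >&2
    return 1
  fi
}

# Usage:
#   activate_if_healthy local
```

If the exit-code assumption does not hold for your version, inspect the output of `neomind llm test` instead of relying on its status.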

6. Other Common Commands

# View backend details (with capability tags)
neomind llm get local

# Update model or parameters
neomind llm update local --model qwen3.5:8b --temperature 0.5

# Delete a backend
neomind llm delete local

CLI Command Reference

| Command | Description | Key Flags |
|---|---|---|
| llm list | List all backends | --json for JSON output |
| llm get <id> | View details | (none) |
| llm models | List available Ollama models | --endpoint <url> |
| llm create | Create a backend | --name --type --endpoint --model --api-key --temperature |
| llm update <id> | Update config | --model --endpoint --api-key --temperature |
| llm test <id> | Test connection | (none) |
| llm activate <id> | Set as default | (none) |
| llm delete <id> | Delete | (none) |

Ollama API Endpoint

NeoMind calls Ollama's native /api/chat endpoint (not /v1/chat/completions). This means:

  • Supports the thinking field (chain-of-thought for reasoning models like qwen3.x / deepseek-r1)
  • Supports native multimodal (image input)
  • Streaming and tool calling use the Ollama native protocol

If you're testing with curl, use the correct endpoint:

curl http://localhost:11434/api/chat -d '{
"model": "qwen3.5:4b",
"messages": [{"role": "user", "content": "Hello"}],
"stream": false
}'
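When testing several models or prompts, hand-editing the JSON body gets error-prone. The sketch below generates the same non-streaming `/api/chat` body from a model name and a prompt. `chat_payload` is a hypothetical helper; it escapes only backslashes and double quotes, so prompts with newlines or control characters would need a real JSON encoder.

```shell
#!/bin/sh
# Sketch: build a non-streaming /api/chat request body.
# chat_payload is a hypothetical helper. It escapes backslashes and double
# quotes only; newlines or control characters need a proper JSON encoder.
chat_payload() {
  # $1 = model name, $2 = user prompt
  escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}\n' \
    "$1" "$escaped"
}

# Usage (needs a running Ollama at the default address):
#   curl -s http://localhost:11434/api/chat -d "$(chat_payload qwen3.5:4b 'Hello')"
```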

Multimodal (Vision) Capability

NeoMind supports image input and visual analysis. Vision capability depends on the model:

  • Ollama: After pulling a vision model (e.g. qwen3.5:4b-vl / llava / minicpm-v), you can upload images in AI Chat.
  • Cloud: gpt-4o / gpt-4o-mini / claude-3-5-sonnet / gemini-1.5-flash / qwen-vl / glm-4v natively support vision.

NeoMind auto-detects multimodal capability (via LiteLLM registry + /api/show runtime probe + name heuristic matching). If auto-detection is inaccurate, manually toggle Multimodal in the backend detail page.
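To illustrate why name matching alone can misfire, here is a rough sketch of what a name-based check might look like. The patterns are illustrative guesses drawn from the model names mentioned in this guide, not NeoMind's actual heuristic, and `looks_multimodal` is a hypothetical function name.

```shell
#!/bin/sh
# Sketch of a name-based vision check. The patterns are illustrative guesses
# based on model names in this guide, not NeoMind's real heuristic.
looks_multimodal() {
  case "$1" in
    *-vl*|*llava*|*minicpm-v*|*vision*|gpt-4o*|glm-4v*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   looks_multimodal qwen3.5:4b-vl && echo "probably vision-capable"
```

A check like this misses vision-capable models whose names carry no hint, such as claude-3-5-sonnet, which is one reason a registry lookup, a runtime probe, and a manual Multimodal toggle are useful alongside it.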


Setting the Default Backend

A NeoMind instance can have multiple LLM backends, but only one is marked as default. The default backend is used for:

  • Initial AI Chat conversations
  • Scheduled Agent executions
  • LLM analysis in the rule engine

Switching the default

  • Web UI: Backend list → click Set Default
  • CLI:

# List all backends and see which is default
neomind llm list

# Set a backend as default
neomind llm activate local


Last updated: 2026-06-15