experimental AI setup

If you can privately afford some more hardware, consider looking into NVIDIA DGX Spark. Here also a German article Heise-News about Nvidia DGX Spark. Or as an alternative, check out AMD Strix Halo hardware.

provide OpenAI-compatibel API for local llm

With local LLM you want to provide OpenAI-compatibel API. This can be done via e.g. llama.cpp, vllm, etc.

llama.cpp

Install llama.cpp, see also https://llama-cpp.com/. (The above startup.sh script also installs llama.cpp.)

For CPU-only setups, please also check: https://github.com/ikawrakow/ik_llama.cpp.

vllm

An alternative to llama.cpp is vllm. It is often used in server setups, but supports fewer llm models and often also lacks newer features.

huggingface

llama.cpp can download llm models automatically on startup, but you might also want to download models separately from https://huggingface.co/.

All downloads are stored by default in ~/.cache/huggingface/hub.

The huggingface software support can be installed with ./startup.sh –install-hf or via the following few lines:

sudo apt-get update
sudo apt-get install -y python3-venv
python3 -m venv venv
. venv/bin/activate
pip3 install huggingface_hub hf_transfer

hf cache list
hf models list
MODEL="unsloth/Qwen3.6-27B-MTP-GGUF"
#MODEL="unsloth/Qwen3.6-27B-GGUF"
#MODEL="unsloth/Qwen3.6-35B-A3B-GGUF"
hf models info $MODEL
hf download $MODEL --include "*mmproj-BF16*" --include "*UD-Q6_K_XL*"

large language model (llm)

Depending on hardware and on task, you might choose between different llm models. qwen3.6 is pretty new and has good quality.

North Mini Code from Canadian company cohere:

qwen3.6 from Alibaba:

If you want to speed things up, consider changing from Q6 to Q4 and also downgrading from Qwen3.6-27B-MTP-GGUF to Qwen3.6-35B-A3B-GGUF.

GLM (Open Source):

Gemma from Google:

Gemma 4 26B

Nemotron:

Nemotron 3 Nano 30B

hermes agent

For local llama.cpp configuration, use http://127.0.0.1:8080/v1.

Some commands:

hermes update    # to update the software stack

# configuration/setup:
hermes setup
hermes model     # just the setup for llm models

hermes status
hermes doctor
hermes doctor --fix

opencode

See https://opencode.ai/.

curl -fsSL https://opencode.ai/install | bash
opencode config set model http://localhost:8080/v1
opencode config set api-key "not-needed"
opencode

You can add your local LLM in ~/.config/opencode/opencode.jsonc with:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Local Model Runner",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1",
        "apiKey": "no-api-key"
      },
      "models": {
        "unsloth/GLM-4.7-Flash": {
          "name": "GLM-4.7-Flash"
        }
      }
    }
  }
}

openclaw

Not running this myself, but you might want to check out: https://openclaw.ai/

sashiko

Sashiko is an agentic Linux kernel code review system.

Impressum