experimental AI setup

If you can privately afford some more hardware, consider looking into NVIDIA DGX Spark. Hier also a German article Heise-News about Nvidia DGX Spark. Or as an alternative, check out AMD Strix Halo hardware.

Other links:

For a shell startup script that downloads/updates software and models, please look at https://github.com/laroche/laroche.github.io/blob/master/startup.sh.

llama.cpp

Install llama.cpp, see also https://llama-cpp.com/. (The above startup.sh script also installs llama.cpp.)

For CPU-only setups, please also check: https://github.com/ikawrakow/ik_llama.cpp.

vllm

An alternative to llama.cpp is vllm. It is often used in server setups, but supports fewer llm models and often also lacks newer features.

huggingface

llama.cpp can download llm models automatically on startup, but you might also want to download models separately from https://huggingface.co/.

All downloads are stored by default in ~/.cache/huggingface/hub.

The huggingface software support can be installed with ./startup.sh –install-hf or via the following few lines:

sudo apt-get update
sudo apt-get install -y python3-venv
python3 -m venv venv
. venv/bin/activate
pip3 install huggingface_hub hf_transfer

hf cache list
hf models list
MODEL="unsloth/Qwen3.6-27B-MTP-GGUF"
#MODEL="unsloth/Qwen3.6-27B-GGUF"
#MODEL="unsloth/Qwen3.6-35B-A3B-GGUF"
hf models info $MODEL
hf download $MODEL --include "*mmproj-BF16*" --include "*UD-Q6_K_XL*"

large language model (llm)

Depending on hardware and on task, you might choose between different llm models. qwen3.6 is pretty new and has good quality.

qwen3.6 from Alibaba:

If you want to speed things up, consider changing from Q6 to Q4 and also downgrading from Qwen3.6-27B-MTP-GGUF to Qwen3.6-35B-A3B-GGUF.

GLM:

hermes agent

For local llama.cpp configuration, use http://127.0.0.1:8080/v1.

Some commands:

hermes update    # to update the software stack

# configuration/setup:
hermes setup
hermes model     # just the setup for llm models

hermes status
hermes doctor
hermes doctor --fix

opencode

See https://opencode.ai/.

npm install -g @opencode/cli
opencode config set model http://localhost:8080/v1
opencode config set api-key "not-needed"
opencode

openclaw

Not running this myself, but you might want to check out: https://openclaw.ai/

sashiko

Sashiko is an agentic Linux kernel code review system.

Impressum