litellm: drop claude-*/gpt-* shadow aliases

Honest model names only — local picks up real Ollama names (qwen2.5-72b,
llama-3.3-70b, llama-3.1-8b, nomic-embed-text), Claude via *-max only.

The shadows were briefly useful (paperless-ai wizard probe quirk) and
then briefly used to make the ALL-LOCAL cutover transparent to clients,
but having "claude-sonnet-4-6" silently route to llama3.3:70b in the
Open WebUI picker was a constant foot-gun. Pulse re-pointed to a clean
alias in its UI prior to this push; paperless-ai was already on
qwen2.5-72b. Trade-off captured in [[litellm-openai-alias-shadowing]].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
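
A minimal sketch of a client audit for this change, not part of the commit itself. It flags any client-configured model name that was one of the removed shadow aliases and suggests the honest local name for the backend that alias used to route to. The alias-to-backend table is copied from the entries removed in the diff. The backend-to-honest-name table is an assumption based on the names listed in the message above, and `audit_client_models` is a hypothetical helper name. Clients that actually need Claude should move to a `*-max` alias instead of the suggested local name.

```python
# Shadow aliases removed by this commit, mapped to the Ollama backend each one
# routed to (copied from the removed litellm_models entries).
REMOVED_SHADOWS = {
    "claude-haiku-4-5": "llama3.1:8b",
    "gpt-4o-mini": "llama3.1:8b",
    "claude-sonnet-4-6": "llama3.3:70b",
    "claude-opus-4-7": "llama3.3:70b",
    "gpt-4o": "llama3.3:70b",
    "gpt-4-turbo": "llama3.3:70b",
}

# ASSUMPTION: the honest names from the commit message point at these backends,
# following the qwen2.5-72b -> qwen2.5:72b pattern visible in the diff.
HONEST_NAME_FOR_BACKEND = {
    "llama3.1:8b": "llama-3.1-8b",
    "llama3.3:70b": "llama-3.3-70b",
}


def audit_client_models(configured):
    """Return {stale_alias: suggested_honest_name} for removed shadow aliases.

    Names that are not removed shadows (for example qwen2.5-72b) are ignored.
    """
    return {
        name: HONEST_NAME_FOR_BACKEND[REMOVED_SHADOWS[name]]
        for name in configured
        if name in REMOVED_SHADOWS
    }


if __name__ == "__main__":
    # Hypothetical client settings, for illustration only.
    clients = ["gpt-4o-mini", "qwen2.5-72b", "claude-sonnet-4-6"]
    for stale, replacement in audit_client_models(clients).items():
        print(f"{stale} -> {replacement}")
```

Running a check like this before deploying would surface clients that will start getting `Invalid model name` once the shadows are gone.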
Your Name
2026-05-29 20:07:49 -04:00
parent e866d0c89f
commit b64a95a71b
+9 -32
@@ -57,42 +57,19 @@ litellm_package_spec: "litellm[proxy]==1.55.10"
 # the inference path off the resolver. Ollama has no auth (placeholder api_key).
 anvil_ollama_base: "http://192.168.1.150:11434"
-# ALL homelab LLM load routes LOCAL as of 2026-05-28. Every standard alias that
-# clients already use (claude-*, gpt-*) now resolves to Anvil/Ollama — no client
-# reconfig needed. Meridian still runs, but Claude/Max is reachable ONLY via the
-# explicit *-max escape-hatch aliases below (use them for vision or hard
-# reasoning — llama3.x is text-only and weaker on complex tasks).
+# Model list — honest names only. Shadow aliases (claude-*/gpt-*) were removed
+# 2026-05-29 because the dual meaning (was-Claude, now-local) was a constant
+# foot-gun in the Open WebUI picker. Local models keep their real names; Claude
+# is reached only via the explicit *-max aliases.
+#
+# Trade-off: any client that hard-codes a claude-*/gpt-* model name in a probe
+# (paperless-ai wizard hits gpt-4o-mini; see [[litellm-openai-alias-shadowing]])
+# will 400 with `Invalid model name` until that alias is re-added.
 #
-# Size split: mini/haiku-class → llama3.1:8b; everything else → llama3.3:70b.
 # Single GPU, OLLAMA_NUM_PARALLEL=1 — concurrent/mixed requests queue and the
 # 70B+8B can't both stay resident in the ~62 GB budget (expect model swaps).
 litellm_models:
-  # ---- Default aliases → LOCAL (Anvil/Ollama) ----
+  # ---- Local (Anvil/Ollama) ----
-  - name: claude-haiku-4-5
-    backend: ollama_chat/llama3.1:8b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: gpt-4o-mini
-    backend: ollama_chat/llama3.1:8b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: claude-sonnet-4-6
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: claude-opus-4-7
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: gpt-4o
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: gpt-4-turbo
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  # Direct local model names (explicit)
   - name: qwen2.5-72b
     backend: ollama_chat/qwen2.5:72b
     api_base: "{{ anvil_ollama_base }}"