litellm: drop claude-*/gpt-* shadow aliases

Honest model names only: local models keep their real Ollama names (qwen2.5-72b, llama-3.3-70b, llama-3.1-8b, nomic-embed-text), and Claude is reachable via *-max only.

The shadows were briefly useful (paperless-ai wizard probe quirk) and then briefly used to make the ALL-LOCAL cutover transparent to clients, but having "claude-sonnet-4-6" silently route to llama3.3:70b in the Open WebUI picker was a constant foot-gun. Pulse was re-pointed to a clean alias in its UI prior to this push; paperless-ai was already on qwen2.5-72b. Trade-off captured in [[litellm-openai-alias-shadowing]].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+9
-32
@@ -57,42 +57,19 @@ litellm_package_spec: "litellm[proxy]==1.55.10"
 # the inference path off the resolver. Ollama has no auth (placeholder api_key).
 anvil_ollama_base: "http://192.168.1.150:11434"
 
-# ALL homelab LLM load routes LOCAL as of 2026-05-28. Every standard alias that
-# clients already use (claude-*, gpt-*) now resolves to Anvil/Ollama — no client
-# reconfig needed. Meridian still runs, but Claude/Max is reachable ONLY via the
-# explicit *-max escape-hatch aliases below (use them for vision or hard
-# reasoning — llama3.x is text-only and weaker on complex tasks).
+# Model list — honest names only. Shadow aliases (claude-*/gpt-*) were removed
+# 2026-05-29 because the dual meaning (was-Claude, now-local) was a constant
+# foot-gun in the Open WebUI picker. Local models keep their real names; Claude
+# is reached only via the explicit *-max aliases.
+#
+# Trade-off: any client that hard-codes a claude-*/gpt-* model name in a probe
+# (paperless-ai wizard hits gpt-4o-mini; see [[litellm-openai-alias-shadowing]])
+# will 400 with `Invalid model name` until that alias is re-added.
 #
-# Size split: mini/haiku-class → llama3.1:8b; everything else → llama3.3:70b.
 # Single GPU, OLLAMA_NUM_PARALLEL=1 — concurrent/mixed requests queue and the
 # 70B+8B can't both stay resident in the ~62 GB budget (expect model swaps).
 litellm_models:
-  # ---- Default aliases → LOCAL (Anvil/Ollama) ----
-  - name: claude-haiku-4-5
-    backend: ollama_chat/llama3.1:8b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: gpt-4o-mini
-    backend: ollama_chat/llama3.1:8b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: claude-sonnet-4-6
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: claude-opus-4-7
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: gpt-4o
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  - name: gpt-4-turbo
-    backend: ollama_chat/llama3.3:70b
-    api_base: "{{ anvil_ollama_base }}"
-    api_key: ollama-no-auth
-  # Direct local model names (explicit)
+  # ---- Local (Anvil/Ollama) ----
   - name: qwen2.5-72b
     backend: ollama_chat/qwen2.5:72b
     api_base: "{{ anvil_ollama_base }}"
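
The rule this commit enforces (no vendor-style name may route to a local backend) and the trade-off it accepts (clients that still request a removed name get rejected) can be sketched as a small check. This is an illustrative sketch only, not part of the repository: the helper names find_shadow_aliases and missing_models are hypothetical, the entries are plain dicts holding just the name and backend keys seen in the diff, and the prefixes tested are assumptions drawn from the commit message.

```python
# Prefixes that look like hosted-vendor model names (assumption for this sketch).
VENDOR_PREFIXES = ("claude-", "gpt-")
# Backend prefix used by the local Ollama entries in the diff.
LOCAL_BACKEND_PREFIX = "ollama_chat/"


def find_shadow_aliases(models):
    """Return names of entries that carry a vendor-style name but route locally."""
    shadows = []
    for entry in models:
        name = entry.get("name", "")
        backend = entry.get("backend", "")
        if name.startswith(VENDOR_PREFIXES) and backend.startswith(LOCAL_BACKEND_PREFIX):
            shadows.append(name)
    return shadows


def missing_models(models, requested):
    """Return requested names that no entry defines (a client probe for these would fail)."""
    defined = {entry.get("name") for entry in models}
    return [name for name in requested if name not in defined]


# Two entries copied from the diff: one removed shadow alias, one kept local model.
before = [
    {"name": "gpt-4o-mini", "backend": "ollama_chat/llama3.1:8b"},
    {"name": "qwen2.5-72b", "backend": "ollama_chat/qwen2.5:72b"},
]
shadow_names = find_shadow_aliases(before)
after = [entry for entry in before if entry["name"] not in shadow_names]

print(shadow_names)                                           # ['gpt-4o-mini']
print(find_shadow_aliases(after))                             # []
print(missing_models(after, ["gpt-4o-mini", "qwen2.5-72b"]))  # ['gpt-4o-mini']
```

The last line mirrors the trade-off noted in the config comment: once the shadow alias is gone, a client that still asks for gpt-4o-mini has nothing to resolve against until that alias is re-added.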