Commit Graph

19 Commits

Author SHA1 Message Date
Your Name a39323db70 litellm: fix direct_* model IDs — gemini 2.5, drop o3-mini
Verified the direct_* providers end-to-end after billing was enabled.
- OpenAI direct_gpt-4o / direct_gpt-4o-mini: working.
- Gemini: gemini-2.0-flash 404s (LiteLLM 1.55.10 rewrites it to a retired
  experimental name) and gemini-1.5-pro is retired -> switch to the current GA
  gemini-2.5-flash / gemini-2.5-pro (both verified).
- Drop direct_o3-mini: o-series models need max_completion_tokens, and LiteLLM
  1.55.10 won't translate the max_tokens that clients (Open WebUI) send, so
  requests 400. Re-add after a LiteLLM bump.
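The missing translation described in the last bullet can be sketched as a small payload rewrite. This is an illustration only, not LiteLLM's code: the function name `adapt_token_limit` and the model-ID pattern are assumptions made for the example.

```python
import re

# Assumption: o-series model IDs look like "o1" or "o3-mini", optionally with
# an "openai/" provider prefix. Adjust the pattern to your own naming.
_O_SERIES = re.compile(r"^(?:openai/)?o\d+(?:-|$)")


def adapt_token_limit(payload: dict) -> dict:
    """Return a copy of payload with max_tokens renamed for o-series models."""
    adapted = dict(payload)
    if _O_SERIES.match(adapted.get("model", "")) and "max_tokens" in adapted:
        limit = adapted.pop("max_tokens")
        # Keep an explicit max_completion_tokens if the client already sent one.
        adapted.setdefault("max_completion_tokens", limit)
    return adapted
```

Requests for other models (for example gpt-4o) pass through unchanged, so a rewrite like this would only affect the o-series path.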

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 12:39:17 -04:00
Your Name 211d26cc63 litellm: re-index models with local_/proxy_/direct_ prefixes + scaffold OpenAI+Gemini
Backend-prefix taxonomy so the Open WebUI picker is self-documenting and a
model name can't lie about where it routes:
  local_*  -> Anvil/Ollama (free)        e.g. local_qwen2.5-72b
  proxy_*  -> Claude via Meridian/Max     e.g. proxy_claude-sonnet-4-6
  direct_* -> metered OpenAI/Gemini       e.g. direct_gpt-4o, direct_gemini-2.0-flash

Drops the redundant -max suffix (proxy_ already implies Max). api_base is now
emitted only when a model defines it, so direct_* hit the provider default
endpoint instead of Meridian. direct_* are SCAFFOLDED (no live keys): litellm.env
writes a placeholder so the proxy boots; deploy.sh pulls OPENAI_API_KEY/
GEMINI_API_KEY from Infisical /meridian if present (non-fatal). They 401 until
real keys land.
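A minimal sketch of the two ideas above: the prefix decides the backend, and api_base is emitted only when a model defines it. The helper names (`backend_for`, `litellm_params`) are hypothetical and the real template in this repo may be structured differently; the prefixes come from this commit message and the Meridian port from the initial LiteLLM commit.

```python
from typing import Optional

# Prefix -> backend description, taken from the taxonomy in the commit message.
BACKENDS = {
    "local_": "Anvil/Ollama (free)",
    "proxy_": "Claude via Meridian/Max",
    "direct_": "metered OpenAI/Gemini",
}


def backend_for(model_name: str) -> str:
    """Look up the backend implied by a model name's prefix."""
    for prefix, backend in BACKENDS.items():
        if model_name.startswith(prefix):
            return backend
    raise ValueError(f"unprefixed model name: {model_name!r}")


def litellm_params(upstream_model: str, api_base: Optional[str] = None) -> dict:
    """Build a params dict, emitting api_base only when the model defines one."""
    params = {"model": upstream_model}
    if api_base:
        # Omitting the key lets the provider's default endpoint apply.
        params["api_base"] = api_base
    return params
```

For example, `litellm_params("openai/gpt-4o")` carries no api_base, while passing `api_base="http://localhost:3456"` would pin a model to Meridian.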

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 12:05:55 -04:00
Your Name b64a95a71b litellm: drop claude-*/gpt-* shadow aliases
Honest model names only — local picks up real Ollama names (qwen2.5-72b,
llama-3.3-70b, llama-3.1-8b, nomic-embed-text), Claude via *-max only.

The shadows were briefly useful (paperless-ai wizard probe quirk) and
then briefly used to make the ALL-LOCAL cutover transparent to clients,
but having "claude-sonnet-4-6" silently route to llama3.3:70b in the
Open WebUI picker was a constant foot-gun. Pulse re-pointed to a clean
alias in its UI prior to this push; paperless-ai was already on
qwen2.5-72b. Trade-off captured in [[litellm-openai-alias-shadowing]].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 20:07:49 -04:00
Your Name e866d0c89f litellm: add qwen2.5-72b alias (Anvil) as the best-quality local model
Replaces the short-lived mistral-large alias. Backed by ollama_chat/qwen2.5:72b
on Anvil. Consumers (paperless-ai, RAG chat, HA, morning-report) target this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 22:26:34 -04:00
Your Name c29e24b51b litellm: route all homelab LLM load to Anvil/Ollama by default
Per-model api_base/api_key overrides in the template (default stays
Meridian's local port). All standard aliases (claude-*, gpt-*) now point
at Anvil's Ollama (mini/haiku-class -> llama3.1:8b, rest -> llama3.3:70b).
Claude/Max reachable only via new *-max escape-hatch aliases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 11:16:46 -04:00
Your Name 8adecb417a site.yml: drop leftover node_exporter role refs
Earlier Pass 2 cleanup sed only matched bare '- node_exporter' lines;
these used '{ role: ..., tags: [...] }' syntax which fell through.
Deploys via Semaphore were erroring with 'role node_exporter not found'.
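The miss can be reproduced with two regular expressions. This is a sketch rather than the sed command that was actually run, and the sample lines (including the `monitoring` tag) are made up for illustration.

```python
import re

# Roughly what the earlier cleanup matched: a bare list item only.
BARE = re.compile(r"^\s*-\s*node_exporter\s*$")
# Broader pattern that also catches the inline-mapping form.
ANY_FORM = re.compile(r"^\s*-\s*(?:node_exporter\s*$|\{\s*role:\s*node_exporter\b)")

lines = [
    "    - node_exporter",
    "    - { role: node_exporter, tags: [monitoring] }",
    "    - alloy",
]

# Lines the bare pattern leaves behind even though they reference the role.
missed = [ln for ln in lines if ANY_FORM.match(ln) and not BARE.match(ln)]
# What remains after removing every form of the reference.
kept = [ln for ln in lines if not ANY_FORM.match(ln)]
```

Here `missed` holds only the `{ role: ... }` line and `kept` holds only the alloy entry.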
2026-05-21 22:35:38 -04:00
Your Name 53df2ced67 node_exporter: retire standalone role (replaced by Alloy embedded)
Embedded prometheus.exporter.unix in Alloy has been pushing identical
metrics fleet-wide since 2026-05-21 cutover; the standalone binary
and systemd service have been removed from each host. Drops the
role + site.yml entry so future deploys don't reinstate them.

See homelab-docs/docs/audit/alloy-consolidation-2026-05-21.md.
2026-05-21 21:49:52 -04:00
Your Name bee546cea8 alloy: cutover prometheus.exporter.unix to standard job names
Drops the _canary suffix on alloy_prom_job. Prometheus retired its
static node_* scrape jobs in the same release; Alloy's remote_write
fills the gap with identical job/instance/group/hostname labels.
2026-05-21 20:52:50 -04:00
Your Name 40af073d9c alloy: add prometheus.exporter.unix canary (Track A fleet rollout)
Embeds node_exporter inside Alloy alongside Loki shipping; pushes
metrics via remote_write to the Prometheus on observe with
job=node_lxc_canary, so it runs side-by-side with the existing
node_exporter scrape until cutover.
See homelab-docs/docs/audit/alloy-consolidation-2026-05-21.md.
2026-05-21 19:21:22 -04:00
Your Name b33148e010 README: add Logging section (ships to Loki via Alloy)
2026-05-19 23:45:04 -04:00
Your Name 8e462beea8 alloy: max_age 12h → 1m (Loki rejects journal entries > ~1h old on first deploy)
2026-05-19 23:02:16 -04:00
Your Name 03d1d4630f alloy: bare-metal systemd shipper for journald → Loki
Meridian + LiteLLM both run as systemd services on this LXC (no docker)
so the Docker-container Alloy pattern from other repos doesn't apply.
Apt-install grafana/alloy via apt.grafana.com, journald-only scrape,
ships to Loki on observe.lan.balders.ca.

Side benefit: Meridian.service + LiteLLM.service logs (including the
gpt-* alias shadowing requests from paperless-ai) are now searchable in
Loki, not just via journalctl on the LXC.
2026-05-19 22:49:44 -04:00
Your Name 49c6e10574 litellm: shadow gpt-4o-mini / gpt-4o / gpt-4-turbo aliases onto Claude backends
paperless-ai's setup wizard validates the OpenAI provider by hardcoding
model=gpt-4o-mini in the probe, regardless of the OPENAI_MODEL env. Without
the alias LiteLLM 400s ("Invalid model name") and the wizard rejects the
key. Shadow common OpenAI names onto our Claude backends so any client that
probes gpt-* gets a healthy response (and routes to the Max sub).
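The effect of the shadow aliases can be sketched as a lookup that either resolves a requested name or fails the way the wizard probe did. The alias targets below are placeholders: the commit names the shadowed OpenAI models but not which Claude backend each one maps to, and `resolve` is a hypothetical helper, not LiteLLM's router.

```python
# Hypothetical alias table; the targets are placeholders, not the repo's map.
SHADOW_ALIASES = {
    "gpt-4o-mini": "claude-haiku",
    "gpt-4o": "claude-sonnet",
    "gpt-4-turbo": "claude-opus",
}
NATIVE_MODELS = {"claude-haiku", "claude-sonnet", "claude-opus"}


class InvalidModelName(ValueError):
    """Stands in for the proxy's 400 "Invalid model name" response."""


def resolve(requested: str, aliases: dict) -> str:
    """Map a requested model name to a backend, or fail like the probe did."""
    if requested in NATIVE_MODELS:
        return requested
    try:
        return aliases[requested]
    except KeyError:
        raise InvalidModelName(requested) from None
```

With an empty alias table, `resolve("gpt-4o-mini", {})` raises, which mirrors the rejected wizard probe; with `SHADOW_ALIASES` it resolves to a Claude backend.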

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 13:39:51 -04:00
Your Name 26f2ce4848 README: LiteLLM section + Pulse wiring recipe + dual-endpoint client table
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:55:07 -04:00
Your Name a6b26c500f litellm: add OpenAI→Meridian shim role (venv + systemd, port 4000)
LiteLLM sits in front of Meridian for clients that can't talk Anthropic's
/v1/messages format (Pulse OpenAI provider, paperless-ai, etc.). Routes
OpenAI-shaped requests to localhost:3456 (Meridian) which forwards to the
Max sub.

- New roles/litellm/ — Python venv, pip install litellm[proxy], systemd
- vars/main.yml — model map (haiku/sonnet/opus) + LITELLM_MASTER_KEY env lookup
- site.yml — adds litellm role + sanity-check assert
- deploy.sh — pulls LITELLM_MASTER_KEY from Infisical (/meridian/) on the
  controller and exports it for the playbook
- New Infisical secret /meridian/vault_litellm_master_key

Smoke: Pulse → LiteLLM /v1/chat/completions → Meridian /v1/messages → Max sub
returns "pong" through both the LiteLLM master key auth and the Claude Code
SDK OAuth.
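The format gap that LiteLLM bridges here can be illustrated with a simplified translation from an OpenAI chat-completions body to an Anthropic Messages body. This is a sketch under assumptions, not LiteLLM's implementation: it handles plain string content only, and `default_max_tokens` is a made-up fallback (the Messages API requires max_tokens, while OpenAI-style clients may omit it).

```python
def to_anthropic_messages(openai_body: dict, default_max_tokens: int = 1024) -> dict:
    """Convert an OpenAI chat-completions body to an Anthropic Messages body."""
    system_parts = []
    messages = []
    for msg in openai_body["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    body = {
        "model": openai_body["model"],
        "messages": messages,
        "max_tokens": openai_body.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body
```

A body built this way would be what travels on the Meridian /v1/messages leg of the smoke path above.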

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:23:52 -04:00
Your Name 4ab85f0227 README: replace scp-from-Mac OAuth bootstrap with claude auth login --claudeai
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:01:41 -04:00
Your Name ee178ef013 inventory: change IP .184 → .164 (Chuck's preference; .184 was unreachable)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:39:49 -04:00
Your Name 5e16fee73b initial scaffold: Meridian LXC (Node 22 + npm @rynfar/meridian + systemd)
Deploys @rynfar/meridian on a Debian 12 LXC, bound to 0.0.0.0:3456.
OAuth credentials transferred manually after first deploy (claude login on
Mac, scp ~/.claude to /opt/meridian/.claude). systemd unit is enabled but
gated on credentials.json existence so the first deploy doesn't crash-loop.

LXC has no auth layer — security model is LAN-only reachability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:20:41 -04:00
cbalders 94fad75007 Initial commit
2026-05-17 21:14:51 -04:00