Designing an LLM broker without selling arbitrage hype

Problem

The initial idea sounds simple: put an OpenAI-compatible API in front of several providers and route requests by cost, latency, or quality. The risky part is assuming that this is enough to be a product. Providers change models, limits, and terms; a serious broker cannot depend on avoiding quotas or promising almost-free tokens.

Decision

The repo frames it as an executable v1: FastAPI exposes /v1/chat/completions, clients use public aliases such as auto, fast, and smart, and LiteLLM acts as the routing and fallback layer. The user is not buying a specific provider; they are buying a stable interface.

For billing, the core is a Postgres credit_ledger. The API estimates cost before calling a model, checks balance, and then debits against reported usage. Redis + ARQ separate batch execution, and the Telegram bot works as an onboarding channel without maintaining a parallel balance.

Tradeoffs

The v1 prioritizes contract and accounting over flashier features such as streaming.
USDC topups are modeled, but real onchain reconciliation is still a production requirement.
Aliases simplify UX, but final pricing needs provider/model granularity to avoid opaque margins.
Batch is a real opportunity if it charges for convenience and priority, not just token proxying.

Validation

The code already leaves some verifiable contracts: tests for alias resolution and the premium multiplier, an API health check, deterministic bot helpers, and a flow that records each request as queued, completed, or failed before touching the ledger.

There is also an explicit risk list: migrating deprecated models, live providers in LiteLLM, exact billing, rate limits, streaming, observability, and an internal admin surface. That list is part of the value: it avoids presenting a scaffold as if it were production infrastructure.

Outcome

Scaffold with FastAPI, Postgres, Redis, ARQ, SQLAlchemy/Alembic, LiteLLM, and a Telegram bot.
OpenAI-compatible surface for integrating with existing SDKs and agents.
Ledger-based accounting instead of treating a materialized balance as the only source of truth.
Clear roadmap toward onchain integration, exact pricing, rate limiting, and real batch execution.

The technical and commercial conclusion is the same: the horizontal broker only makes sense as infrastructure. The more defensible product appears on top of it: vertical workflows that charge for outcomes, traceability, and convenience rather than raw inference resale.