LLM Infrastructure
Designing an LLM broker without selling arbitrage hype
Problem
The initial idea sounds simple: put an OpenAI-compatible API in front of several providers and route requests by cost, latency, or quality. The risky part is assuming that this is enough to be a product. Providers change models, limits, and terms; a serious broker cannot depend on avoiding quotas or promising almost-free tokens.
Decision
The repo frames it as an executable v1: FastAPI exposes /v1/chat/completions, clients use public
aliases such as auto, fast, and smart, and LiteLLM acts as the routing
and fallback layer. The user is not buying a specific provider; they are buying a stable interface.
For billing, the core is a Postgres credit_ledger. The API estimates cost before calling a model,
checks balance, and then debits against reported usage. Redis + ARQ separate batch execution, and the Telegram
bot works as an onboarding channel without maintaining a parallel balance.
Tradeoffs
- The v1 prioritizes contract and accounting over flashier features such as streaming.
- USDC topups are modeled, but real onchain reconciliation is still a production requirement.
- Aliases simplify UX, but final pricing needs provider/model granularity to avoid opaque margins.
- Batch is a real opportunity if it charges for convenience and priority, not just token proxying.
Validation
The code already leaves some verifiable contracts: tests for alias resolution and the premium multiplier, an API health check, deterministic bot helpers, and a flow that records each request as queued, completed, or failed before touching the ledger.
There is also an explicit risk list: migrating deprecated models, live providers in LiteLLM, exact billing, rate limits, streaming, observability, and an internal admin surface. That list is part of the value: it avoids presenting a scaffold as if it were production infrastructure.
Outcome
- Scaffold with FastAPI, Postgres, Redis, ARQ, SQLAlchemy/Alembic, LiteLLM, and a Telegram bot.
- OpenAI-compatible surface for integrating with existing SDKs and agents.
- Ledger-based accounting instead of treating a materialized balance as the only source of truth.
- Clear roadmap toward onchain integration, exact pricing, rate limiting, and real batch execution.
Next
The technical and commercial conclusion is the same: the horizontal broker only makes sense as infrastructure. The more defensible product appears on top of it: vertical workflows that charge for outcomes, traceability, and convenience rather than raw inference resale.