spotinference
priority-ordered GPU inference. hibernate when idle. pay only for tokens.
A tiny OpenAI-compatible gateway that fans requests out across a ranked lineup of on-demand GPU VMs — H100, A100, L40 — and hibernates them five minutes after the last token. Each request walks your lineup top-to-bottom; whichever tier is warm (or wakes fastest) gets the job. Usage is logged per token; the dashboard shows live tok/s, VM state, and cost to the penny.
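The selection walk above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the `Tier` type, `pick_tier` helper, state names, and `wake_seconds` estimates are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical VM states; real cloud APIs expose richer lifecycle states.
WARM, HIBERNATED = "warm", "hibernated"

@dataclass
class Tier:
    name: str          # e.g. "h100", "a100", "l40"
    state: str
    wake_seconds: int  # estimated cold-start time (assumed, per tier)

def pick_tier(lineup: list[Tier]) -> Tier:
    """Walk the ranked lineup top-to-bottom: the first warm tier wins.
    If nothing is warm, wake the tier with the shortest estimated cold start."""
    for tier in lineup:
        if tier.state == WARM:
            return tier
    return min(lineup, key=lambda t: t.wake_seconds)

lineup = [
    Tier("h100", HIBERNATED, 90),
    Tier("a100", WARM, 60),
    Tier("l40", HIBERNATED, 45),
]
print(pick_tier(lineup).name)  # a100 — the only warm tier in this lineup
```

If every tier is hibernated, the walk falls through to the fastest waker (here, the L40), trading a lower rank for less cold-start latency.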