About llm-speed
llm-speed is the canonical, crowdsourced source of truth for how fast LLMs actually run — across hosted APIs, consumer GPUs, Apple Silicon, and prosumer rigs. One reproducible CLI, one methodology, every backend.
Why this exists
Existing answers are inadequate. Enterprise leaderboards focus on H100/B200 racks. Reddit folklore is unstructured. Vendor blogs are SEO-driven, not methodology-driven. Nobody owns the union of consumer-local plus hosted-API benchmarks under one consistent protocol. That's the gap we fill.
How it works
- `pipx install llm-speed` installs a single CLI that fingerprints your hardware, runs a fixed workload suite, and uploads a signed result (see the example session after this list).
- The website renders every submission, with per-(model × hardware × backend) pages and run-level permalinks.
- Read the methodology for the full workload spec and dispute process.
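For a quick feel, a typical session might look like the sketch below. Only `pipx install llm-speed` comes from this page; the `run` and `upload` subcommands and their behavior are illustrative assumptions, not the confirmed interface. The methodology page is the authoritative reference.

```sh
# Install the CLI in an isolated environment.
pipx install llm-speed

# Hypothetical subcommands -- names are illustrative, not confirmed:
llm-speed run       # fingerprint hardware, run the fixed workload suite
llm-speed upload    # sign the result and submit it to the site
```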
Peer sources
We complement, not replace, the great work others have done:
- Artificial Analysis (AA) — enterprise H100/B200 and agent benchmarks.
- LocalScore — a Mozilla Builders project; Llamafile-only, single-score local benchmarks.
- r/LocalLLaMA — the community.
Contact
Issues, ideas, disputes: GitHub issues. The project is Apache-2.0 licensed; results belong to their submitters.