About llm-speed
llm-speed is the canonical, crowdsourced source of truth for how fast LLMs actually run — across hosted APIs, consumer GPUs, Apple Silicon, and prosumer rigs. One reproducible CLI, one methodology, every backend.
Why this exists
Existing answers are inadequate. Enterprise leaderboards focus on H100/B200 racks. Reddit folklore is unstructured. Vendor blogs are SEO-driven, not methodology-driven. Nobody owns the union of consumer-local plus hosted-API benchmarks under one consistent protocol. That's the gap we fill.
How it works
- `pipx install llm-speed` installs a single CLI that fingerprints your hardware, runs a fixed workload suite, and uploads a signed result (see the example session after this list).
- The website renders every submission, with per-(model × hardware × backend) pages and run-level permalinks.
- Read the methodology for the full workload spec and dispute process.
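For a quick feel, a typical session might look like the sketch below. Only `pipx install llm-speed` comes from this page; the `run` and `upload` subcommands and their behavior are illustrative assumptions, not the confirmed interface. The methodology page is the authoritative reference.

```sh
# Install the CLI in an isolated environment.
pipx install llm-speed

# Hypothetical subcommands -- names are illustrative, not confirmed:
llm-speed run       # fingerprint hardware, run the fixed workload suite
llm-speed upload    # sign the result and submit it to the site
```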
Peer sources
We complement, not replace, the great work others have done:
- Artificial Analysis (AA) — enterprise H100/B200 and agent benchmarks.
- LocalScore — a Mozilla Builders project; Llamafile-only, single-score local benchmarks.
- r/LocalLLaMA — the community.
Contact
Issues, ideas, disputes: GitHub issues. The project is Apache-2.0 licensed; results belong to their submitters.