ModelMeter
GPU capacity planner

Estimate GPUs, throughput, and latency for self-hosted LLM serving.

Pick a model, choose a GPU tier, and get quick guidance on fit, concurrency, and expected latency. Built for vLLM-style deployments.

Supported models: 54
GPU profiles: 28
Values are seeded; plug in your own measurements later.
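The fit check behind these numbers can be sketched as a simple memory heuristic. The function below is a hypothetical illustration, not ModelMeter's actual formula: it assumes fp16 weights (2 bytes per parameter) and a flat overhead fraction for KV cache and activations, both of which you should replace with your own measurements.

```python
import math

def estimate_gpus(params_billions: float, gpu_mem_gb: float,
                  bytes_per_param: int = 2, overhead: float = 0.2) -> int:
    """Rough GPU count from weight memory alone (illustrative heuristic).

    params_billions: model size, e.g. 70 for a 70B model
    gpu_mem_gb:      memory per GPU, e.g. 80 for an 80 GB card
    bytes_per_param: 2 for fp16/bf16 weights (assumption)
    overhead:        headroom fraction for KV cache and activations (assumption)
    """
    weights_gb = params_billions * bytes_per_param
    needed_gb = weights_gb * (1 + overhead)
    return math.ceil(needed_gb / gpu_mem_gb)

# A 70B fp16 model on 80 GB GPUs needs 140 GB of weights plus headroom,
# so this heuristic recommends 3 GPUs.
print(estimate_gpus(70, 80))
```

Quantized weights change `bytes_per_param` (e.g. 1 for int8, 0.5 for 4-bit), which is often the cheapest way to drop a GPU tier.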

Inputs

Active users: 8 · GPUs: 1

Sliders are independent; results below reflect your chosen active users and GPU count.
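The relationship the two sliders explore can be sketched as a shared-throughput model. This is a hypothetical simplification, assuming decode throughput scales linearly with GPU count and is split evenly across concurrent users; real vLLM batching behaves better than an even split, so treat it as a pessimistic bound.

```python
def estimate_latency_s(active_users: int, gpus: int,
                       tokens_per_request: int = 256,
                       gpu_tokens_per_s: float = 2500.0) -> float:
    """Per-request decode latency under an even-split throughput model.

    tokens_per_request: typical response length (assumption)
    gpu_tokens_per_s:   aggregate decode rate of one GPU (seeded assumption)
    """
    aggregate_tps = gpus * gpu_tokens_per_s
    per_user_tps = aggregate_tps / max(active_users, 1)
    return tokens_per_request / per_user_tps

# 8 users sharing one GPU at 2500 tok/s each get 312.5 tok/s,
# so a 256-token response takes about 0.82 s of decode time.
print(round(estimate_latency_s(8, 1), 2))
```

Doubling GPUs halves the estimate here, which is why the sliders are kept independent: you can see how many GPUs it takes to hold latency flat as users grow.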

Results
Run an estimate to see GPU recommendations.