Catalog
Backends
6 entriesBackend list
| Name | Category | Target HW | Multiuser | OpenAI API | Configurable parameters |
|---|---|---|---|---|---|
| Hugging Face Transformers | library | nvidia,amd,intel,cpu,apple | No | No | dtypedevice_mapmax_new_tokensnum_beams |
| MLX-LM | apple | apple | No | No | max_tokenstemperaturetop_p |
| Ollama | local | apple,nvidia,amd,cpu | No | Yes | num_ctxnum_predicttemperaturetop_p |
| Text Generation Inference | server | nvidia,amd | Yes | Yes | dtypequantizenum_shardcuda_memory_fractionmax_concurrent_requestsmax_input_tokensmax_total_tokens |
| llama.cpp | local | cpu,metal,cuda,hip,vulkan | No | No | ctx_sizen_predictn_gpu_layers |
| vLLM | server | nvidia,amd,intel,cpu,apple | Yes | Yes | dtypequantizationgpu_memory_utilizationmax_model_lentensor_parallel_sizemax_num_seqs |