State-of-the-art language and multimodal models.
| On-Demand Model | Type | Precision | Context (tokens) | Input Price ($/1M tokens) | Output Price ($/1M tokens) |
|---|---|---|---|---|---|
| Google Gemma-3-4b-it | Serverless | BF16 | 128,000 | 0.13 | 0.13 |
| Meta Llama-3.3-70B-Instruct_Q8_0 | Serverless | 8-bit | 128,000 | 0.60 | 0.60 |
| Microsoft Phi4-mini-instruct | Serverless | BF16 | 128,000 | 0.12 | 0.12 |
| OpenAI GPT-oss-20B | Serverless | BF16 | 128,000 | 0.17 | 0.17 |
| Alibaba Qwen3-14B-Q8_0 | Serverless | 8-bit | 32,768 | 0.19 | 0.19 |
*Contact us to host your private models or other public models.
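Since input and output tokens are billed separately per million, a rough cost estimate is simple arithmetic. The sketch below is only an illustration: the prices are copied from the table above, while the `estimate_cost` helper and the token counts are hypothetical and not part of any Suiri API.

```python
# Prices copied from the table above, in USD per 1M tokens: (input, output).
PRICES_PER_MILLION = {
    "Google Gemma-3-4b-it": (0.13, 0.13),
    "Meta Llama-3.3-70B-Instruct_Q8_0": (0.60, 0.60),
    "Microsoft Phi4-mini-instruct": (0.12, 0.12),
    "OpenAI GPT-oss-20B": (0.17, 0.17),
    "Alibaba Qwen3-14B-Q8_0": (0.19, 0.19),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD (hypothetical helper, for illustration)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    # Each side is billed per 1M tokens, so scale the weighted sum down by 1e6.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example workload (made-up numbers): 2M input tokens and 500K output tokens.
cost = estimate_cost("Meta Llama-3.3-70B-Instruct_Q8_0", 2_000_000, 500_000)
print(f"Estimated cost: ${cost:.2f}")
```

Actual invoices may differ, for example if free credits apply, so treat the result as a ballpark figure.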
State-of-the-art language and multimodal models.
| On-Demand Model | Type | Context | Weights Precision | Input Price ($/1M tokens) | Output Price ($/1M tokens) |
|---|---|---|---|---|---|
| Alibaba Qwen2.5-VL-3B-Instruct | Image / Video to Text | Coming soon... | | | |
Apply to join our Beta program and receive free credits.
From model selection to global inference delivery to 24/7 operations,
Suiri ensures your AI performs reliably, cost-efficiently, and close to your users.