Blackwell GPUs. Rupee rates.
Dedicated GPU instances for LLM inference, ML training and professional visualization. The whole card is yours — billed per hour, on-demand, no upfront commitment.
Three cards. Three rates.
Every instance gets a dedicated RTX Pro Blackwell card — no time-slicing, no fractional vGPU. Pick the VRAM you need, pay by the hour, terminate when you're done.
NVMe-backed storage, on the same platform — and the same invoice — as the rest of your compute. No upfront commitment.
GPU pricing in the docs
Fig. G-1 · The whole card
| Instance | GPU | VRAM | vCPU | RAM | ₹ / hr |
|---|---|---|---|---|---|
| nv2a.xlarge | RTX 4500 Pro Blackwell | 32 GiB | 4 | 16 GiB | ₹44.554 |
| nv3a.2xlarge | RTX 5000 Pro Blackwell | 48 GiB | 8 | 32 GiB | ₹63.849 |
| nv1a.4xlarge | RTX 6000 Pro Blackwell | 96 GiB | 16 | 64 GiB | ₹126.784 |
Billed per hour · on-demand · no upfront commitment
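To turn the hourly rates into a budget, here is a minimal sketch that multiplies each rate by a number of hours and also works out the rate per GiB of VRAM. The rates are copied from the table; the 720-hour month (30 days), the whole-hour granularity and the function names are assumptions for illustration, and taxes are not modelled.

```python
from decimal import Decimal

# Hourly on-demand rates in INR, copied from the pricing table.
RATES = {
    "nv2a.xlarge": {"vram_gib": 32, "inr_per_hr": Decimal("44.554")},
    "nv3a.2xlarge": {"vram_gib": 48, "inr_per_hr": Decimal("63.849")},
    "nv1a.4xlarge": {"vram_gib": 96, "inr_per_hr": Decimal("126.784")},
}

HOURS_PER_MONTH = 720  # assumption: a 30-day month, running non-stop


def instance_cost(instance: str, hours: int) -> Decimal:
    """Cost in INR of one instance for a whole number of hours."""
    return RATES[instance]["inr_per_hr"] * hours


def inr_per_gib_hour(instance: str) -> Decimal:
    """Hourly rate divided by VRAM, rounded to three decimal places."""
    spec = RATES[instance]
    return (spec["inr_per_hr"] / spec["vram_gib"]).quantize(Decimal("0.001"))


if __name__ == "__main__":
    for name in RATES:
        monthly = instance_cost(name, HOURS_PER_MONTH)
        print(f"{name}: ₹{monthly} per 720 h, ₹{inr_per_gib_hour(name)} per GiB-hour")
```

On these numbers the smallest card running around the clock comes to ₹32,078.88 for 720 hours, and the rate per GiB of VRAM falls slightly as the cards get larger.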
Rent the whole card. Or just buy tokens.
The whole card
A dedicated GPU instance. Your weights, your runtime, your quantization — fine-tune, train, render, serve any model you like. Capacity is fixed and the meter is the clock.
from ₹44.554/hr
per instance · on-demand
Just the tokens
Token-priced LLM inference on a hosted Qwen model (Qwen3.6-27B). No instance to manage, no idle hours, no minimum commit — you pay only for what the model reads and writes.
₹20 input · ₹60 output per 1M tokens
no minimum commit
LLM inference pricing → docs.excloud.in
Rule of thumb: steady high-volume or custom models → rent the card. Bursty traffic on a stock model → buy the tokens.
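The rule of thumb can be put into rough numbers. The sketch below finds the hourly token volume at which token billing costs the same as one card-hour, using the listed prices (₹20 input and ₹60 output per 1M tokens, ₹44.554/hr for the smallest card). The function names and the `output_share` parameter are illustrative assumptions, and the comparison only holds if the card can actually serve your model at that throughput, which this sketch does not check.

```python
# Listed prices, in INR.
INPUT_INR_PER_M = 20.0    # per 1M input tokens
OUTPUT_INR_PER_M = 60.0   # per 1M output tokens
CARD_INR_PER_HR = 44.554  # nv2a.xlarge hourly rate


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in INR of a workload billed per token."""
    return (input_tokens / 1e6) * INPUT_INR_PER_M + (output_tokens / 1e6) * OUTPUT_INR_PER_M


def breakeven_tokens_per_hour(card_rate: float, output_share: float) -> float:
    """Total tokens per hour at which token billing equals the card's hourly rate.

    output_share is the fraction of all tokens that are output tokens
    (an assumption about your traffic mix, between 0 and 1).
    """
    blended_per_m = INPUT_INR_PER_M * (1 - output_share) + OUTPUT_INR_PER_M * output_share
    return card_rate / blended_per_m * 1e6


if __name__ == "__main__":
    tokens = breakeven_tokens_per_hour(CARD_INR_PER_HR, output_share=0.25)
    print(f"Break-even: about {tokens:,.0f} tokens per hour")
```

With a quarter of the traffic being output tokens, the blended price is ₹30 per 1M tokens and the break-even sits at roughly 1.49 million tokens per hour, sustained. Below that, or with long idle gaps, tokens are likely cheaper; well above it, the card is.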
Access by requisition.
GPU capacity is allocated by quota so every instance maps to a real card. Send a quota request, get approved, then provision from the console like any other instance — same hourly billing, same bill.
Request GPU quota
- To: support@excloud.dev
- Subject: GPU quota request
- State: instance type(s) · quantity · workload
- Then: provision at console.excloud.dev
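If you would rather script the request than type it, the fields above map directly onto an email message. This is a minimal sketch using Python's standard `email.message.EmailMessage`; it only builds the message and does not send it. The sender address, the body layout and the function name are assumptions, since the page asks only that the request state instance type(s), quantity and workload.

```python
from email.message import EmailMessage


def build_quota_request(sender: str, instances: dict[str, int], workload: str) -> EmailMessage:
    """Build a GPU quota request with the recipient and subject given on this page.

    instances maps an instance type to the quantity requested.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "support@excloud.dev"
    msg["Subject"] = "GPU quota request"

    # Body layout is illustrative; only the three stated facts are required.
    lines = [f"- {name} x {qty}" for name, qty in instances.items()]
    body = "Instance type(s) and quantity:\n" + "\n".join(lines) + f"\n\nWorkload: {workload}\n"
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    request = build_quota_request(
        sender="you@example.com",  # hypothetical sender address
        instances={"nv1a.4xlarge": 2},
        workload="LLM fine-tuning",
    )
    print(request)
```

Sending is left to whatever mail client or SMTP relay you already use.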
Provision it now.
Console, CLI, API or Terraform — same prices everywhere.