
Agents don’t solve single prompts. They execute long-running tasks: planning, calling tools, spawning sub-agents, evaluating intermediate results, correcting mistakes, and iterating across hundreds of steps before producing something useful.
Every step consumes time and compute. As a result, the relevant unit of measurement is no longer the cost of a single response but the cost of completing a task. The question is not just how a model ranks on a benchmark, but how much it costs, and how long it takes, to get the job done.
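As a rough illustration of why per-task cost is the number that matters, here is a minimal sketch that totals cost and wall-clock time across every step of an agent run. The step counts, token counts, and per-million-token prices are hypothetical placeholders, not Fireworks or NVIDIA pricing, and the `Step` and `task_cost` names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One agent step (hypothetical): a model call with its token usage and latency."""
    input_tokens: int
    output_tokens: int
    seconds: float


def task_cost(steps, usd_per_m_input, usd_per_m_output):
    """Return (total dollars, total seconds) for a task made of many steps.

    Prices are per million tokens; the values used below are made up.
    """
    cost = sum(
        s.input_tokens / 1e6 * usd_per_m_input + s.output_tokens / 1e6 * usd_per_m_output
        for s in steps
    )
    seconds = sum(s.seconds for s in steps)
    return cost, seconds


# Hypothetical run: 200 steps, each re-reading a context that grows by 500 tokens per step.
steps = [Step(input_tokens=20_000 + 500 * i, output_tokens=400, seconds=2.0) for i in range(200)]
cost, seconds = task_cost(steps, usd_per_m_input=0.50, usd_per_m_output=2.00)
print(f"per-task cost: ${cost:.2f}, wall-clock: {seconds / 60:.1f} min")
```

Each individual call in this sketch costs only a few cents, but the growing context and the number of steps dominate the total. Running the same function with two models' prices and step counts compares them on cost per completed task rather than on price per response.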
That is what NVIDIA built Nemotron 3 Ultra to optimize for.
It is an open model for frontier reasoning and orchestration in long-running autonomous agents, designed for use cases such as coding agents, deep research, and complex enterprise workflows. It has 550B total parameters with 55B active, uses a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture, and supports a context window of up to 1M tokens. NVIDIA reports 5x faster inference and up to 30% lower cost for agentic tasks compared with other open models in its class.
And starting today, Nemotron 3 Ultra is available on Fireworks with day-zero support, ready to deploy on dedicated GPUs through on-demand deployments.
We build the best-performing inference platform, running on the latest NVIDIA GPUs, including B300 and B200. On top of the hardware, proprietary optimizations such as our custom FireAttention kernels deliver up to 4x higher throughput while fully preserving model quality. For a family of models built around fast, efficient task completion across long agent runs, that's the environment you want them in.
Day-zero support means you don't wait. Nemotron 3 Ultra is deployable on Fireworks the moment it launches.
On-demand deployments give you dedicated GPUs: lower latency, no hard rate limits, and predictable performance unaffected by other users. You're billed by the GPU-second, so the more traffic you serve on those GPUs, the lower your effective cost per request.
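To see why GPU-second billing rewards sustained load, here is a minimal sketch that converts a per-GPU rate and a measured throughput into an effective cost per million tokens. The GPU count, hourly rate, throughput, and utilization figures are hypothetical assumptions for illustration only, not Fireworks prices or Nemotron 3 Ultra benchmarks, and `usd_per_million_tokens` is an invented helper.

```python
def usd_per_million_tokens(num_gpus, usd_per_gpu_hour, tokens_per_second, utilization):
    """Effective cost per 1M tokens on a dedicated deployment billed by GPU-second.

    tokens_per_second: aggregate throughput while the deployment is busy.
    utilization: fraction of billed time actually spent serving traffic (0 < u <= 1).
    All inputs are hypothetical; substitute your own measured numbers.
    """
    # Billing accrues every second the GPUs are provisioned, busy or idle.
    usd_per_second = num_gpus * usd_per_gpu_hour / 3600
    # Idle time lowers the tokens produced per billed second.
    tokens_per_billed_second = tokens_per_second * utilization
    return usd_per_second / tokens_per_billed_second * 1e6


# Hypothetical deployment: 8 GPUs at an assumed $6/GPU-hour, 10,000 tokens/s when busy.
high = usd_per_million_tokens(8, 6.0, 10_000, utilization=1.0)
low = usd_per_million_tokens(8, 6.0, 10_000, utilization=0.25)
print(f"busy: ${high:.2f}/M tokens, mostly idle: ${low:.2f}/M tokens")
```

Under these assumptions, a deployment that sits idle three quarters of the time pays four times as much per token as one kept busy, which is why dedicated, per-GPU-second billing pays off under real production load.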
Deploy Nemotron 3 Ultra on Fireworks today with a single command. We're excited to work with NVIDIA to make these breakthrough models available to developers worldwide.
Questions? Join our Discord or contact us to schedule a meeting with our solutions team.