NVIDIA Nemotron 3 Ultra NVFP4 API & Playground

Nemotron-3-Ultra-550B-A55B-NVFP4 is a frontier-scale large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for the most demanding workloads, including complex multi-step agents, long-context analysis, and high-accuracy reasoning over code, math, and science. The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Like the Super model, the Ultra model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is trained using an NVFP4 pre-training recipe to maximize compute efficiency. The model has 55B active parameters and 550B parameters in total.

NVIDIA Nemotron 3 Ultra NVFP4 API Features

Serverless Docs	NVIDIA Nemotron 3 Ultra NVFP4 is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.
On-demand Deployment Docs	On-demand deployments allow you to use NVIDIA Nemotron 3 Ultra NVFP4 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

$0.60 / $0.12 / $2.40

Per 1M Tokens (input/cached input/output)

Metadata

State

Ready

Created on

6/2/2026

Kind

Base model

Provider

NVIDIA

Hugging Face

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4

Specification

Calibrated

Mixture-of-Experts

Yes

Parameters

549B

Supported Functionality

Fine-tuning

Not supported

Serverless

Supported

Context Length

262k tokens

Function Calling

Supported

Embeddings

Not supported

Rerankers

Not supported

Support image input

Not supported

NVIDIA Nemotron 3 Ultra NVFP4

NVIDIA Nemotron 3 Ultra NVFP4 API Features

Serverless

On-demand Deployment

Available Serverless

Metadata

Specification

Supported Functionality