NVIDIA Nemotron 3 Ultra BF16 API & Playground

Nemotron-3-Ultra-550B-A55B-BF16 is a frontier-scale large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for the most demanding workloads, including complex multi-step agents, long-context analysis, and high-accuracy reasoning over code, math, and science. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.

NVIDIA Nemotron 3 Ultra BF16 API Features

Fine-tuning Docs	NVIDIA Nemotron 3 Ultra BF16 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model
On-demand Deployment Docs	On-demand deployments allow you to use NVIDIA Nemotron 3 Ultra BF16 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Metadata

State

Ready

Created on

6/2/2026

Kind

Base model

Provider

NVIDIA

Hugging Face

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

Specification

Calibrated

Mixture-of-Experts

Yes

Parameters

549B

Supported Functionality

Fine-tuning

Supported

Serverless

Not supported

Context Length

262k tokens

Function Calling

Supported

Embeddings

Not supported

Rerankers

Not supported

Support image input

Not supported

NVIDIA Nemotron 3 Ultra BF16

NVIDIA Nemotron 3 Ultra BF16 API Features

Fine-tuning

On-demand Deployment

Metadata

Specification

Supported Functionality