GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/NVIDIA/NVIDIA Nemotron 3 Ultra BF16
NVIDIA icon

NVIDIA Nemotron 3 Ultra BF16

Ready
model path:accounts/fireworks/models/nemotron-3-ultra-bf16

Nemotron-3-Ultra-550B-A55B-BF16 is a frontier-scale large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for the most demanding workloads, including complex multi-step agents, long-context analysis, and high-accuracy reasoning over code, math, and science. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.

NVIDIA Nemotron 3 Ultra BF16 API Features

Fine-tuning

Docs

NVIDIA Nemotron 3 Ultra BF16 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use NVIDIA Nemotron 3 Ultra BF16 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Metadata

State
Ready
Created on
6/2/2026
Kind
Base model
Provider
NVIDIA

Specification

Calibrated
No
Mixture-of-Experts
Yes
Parameters
549B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
262k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported