
Nemotron-3-Ultra-550B-A55B-BF16 is a frontier-scale large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for the most demanding workloads, including complex multi-step agents, long-context analysis, and high-accuracy reasoning over code, math, and science. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template.
Fine-tuningDocs | NVIDIA Nemotron 3 Ultra BF16 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model |
On-demand DeploymentDocs | On-demand deployments allow you to use NVIDIA Nemotron 3 Ultra BF16 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits. |