Serverless 2.0 is live: control reliability & speed without reserved capacity. Get Started.

Model Library
/NVIDIA/NVIDIA Nemotron Nano 2 VL
NVIDIA icon

NVIDIA Nemotron Nano 2 VL

Ready
model path:accounts/fireworks/models/nemotron-nano-v2-12b-vl

NVIDIA Nemotron Nano 2 VL is an open 12B multimodal reasoning model for document intelligence and video understanding. It enables AI assistants to extract, interpret, and act on information across text, images, tables, and videos.

NVIDIA Nemotron Nano 2 VL API Features

On-demand Deployment

Docs

On-demand deployments allow you to use NVIDIA Nemotron Nano 2 VL on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

NVIDIA Nemotron Nano 2 VL FAQs

What is NVIDIA Nemotron Nano 2 VL and who developed it?

NVIDIA Nemotron Nano 2 VL is a 12B parameter open multimodal reasoning model developed by NVIDIA. It is designed for document intelligence, visual question answering, long video understanding, and complex multimodal reasoning. It builds on the Nemotron Nano V2 LLM and incorporates the RADIOv2.5 vision encoder with a hybrid Mamba-Transformer architecture.

What applications and use cases does NVIDIA Nemotron Nano 2 VL excel at?

Nemotron Nano 2 VL is optimized for:

  • Multi-modal document intelligence (e.g., invoices, receipts, manuals)
  • Multi-image reasoning (up to 5 document images)
  • Visual question answering (VQA)
  • Video understanding and summarization
What is the maximum context length for NVIDIA Nemotron Nano 2 VL?

The model supports a maximum of 128K tokens (input + output combined) at inference time.

What is the usable context window for NVIDIA Nemotron Nano 2 VL?

Although training included sequences up to 311K tokens, the usable context length during inference is 128K tokens.

Does NVIDIA Nemotron Nano 2 VL support quantized formats (4-bit/8-bit)?

Yes. The model is released in:

  • FP8
  • FP4 (using Quantization-aware Distillation)
  • BF16
What are known failure modes of NVIDIA Nemotron Nano 2 VL?
  • Performance degradation on text reasoning tasks after vision training (mitigated in later SFT stages)
  • Repetition loops in reasoning-on mode for OOD tasks (addressed via reasoning budget control)
  • Performance sensitivity in OCR tasks based on image preprocessing method
How many parameters does NVIDIA Nemotron Nano 2 VL have?

The model has 12.6 billion parameters.

Is fine-tuning supported for NVIDIA Nemotron Nano 2 VL?

No, fine-tuning is not currently supported on Fireworks AI.

What rate limits apply on the shared endpoint?

It is available via on-demand deployment only, which provides dedicated GPUs with no rate limits.

What license governs commercial use of NVIDIA Nemotron Nano 2 VL?

Use of this model is governed by the NVIDIA Open Model License Agreement.

Metadata

State
Ready
Created on
10/23/2025
Kind
Base model
Provider
NVIDIA

Specification

Calibrated
No
Mixture-of-Experts
No
Parameters
13B

Supported Functionality

Fine-tuning
Not supported
Serverless
Not supported
Context Length
131k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Supported