NVIDIA Nemotron Nano 2 VL is an open 12B multimodal reasoning model for document intelligence and video understanding. It enables AI assistants to extract, interpret, and act on information across text, images, tables, and videos.
On-demand DeploymentDocs | On-demand deployments give you dedicated GPUs for NVIDIA Nemotron Nano 2 VL using Fireworks' reliable, high-performance system with no rate limits. |
NVIDIA Nemotron Nano 2 VL is a 12B parameter open multimodal reasoning model developed by NVIDIA. It is designed for document intelligence, visual question answering, long video understanding, and complex multimodal reasoning. It builds on the Nemotron Nano V2 LLM and incorporates the RADIOv2.5 vision encoder with a hybrid Mamba-Transformer architecture.
Nemotron Nano 2 VL is optimized for:
The model supports a maximum of 128K tokens (input + output combined) at inference time.
Although training included sequences up to 311K tokens, the usable context length during inference is 128K tokens.
Yes. The model is released in:
The model has 12.6 billion parameters.
No, fine-tuning is not currently supported on Fireworks AI.
Use of this model is governed by the NVIDIA Open Model License Agreement.