

NVIDIA Nemotron Nano 2 VL

fireworks/nemotron-nano-v2-12b-vl

    NVIDIA Nemotron Nano 2 VL is an open 12B multimodal reasoning model for document intelligence and video understanding. It enables AI assistants to extract, interpret, and act on information across text, images, tables, and videos.

    NVIDIA Nemotron Nano 2 VL API Features

    On-demand Deployment


    On-demand deployments give you dedicated GPUs for NVIDIA Nemotron Nano 2 VL using Fireworks' reliable, high-performance system with no rate limits.
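
    Below is a minimal sketch of querying an on-demand deployment through Fireworks' OpenAI-compatible chat completions endpoint. The base URL follows Fireworks' documented convention; the exact model path, image URL, and question are illustrative assumptions, so use the identifiers shown on your own deployment page.

    # Sketch: document VQA against an on-demand deployment of
    # NVIDIA Nemotron Nano 2 VL via Fireworks' OpenAI-compatible API.
    # The model path and image URL are illustrative assumptions.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",
        api_key=os.environ["FIREWORKS_API_KEY"],
    )

    response = client.chat.completions.create(
        model="accounts/fireworks/models/nemotron-nano-v2-12b-vl",  # assumed path
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is the invoice total and due date?"},
                    {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
                ],
            }
        ],
        max_tokens=512,
    )
    print(response.choices[0].message.content)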

    NVIDIA Nemotron Nano 2 VL FAQs

    What is NVIDIA Nemotron Nano 2 VL and who developed it?

    NVIDIA Nemotron Nano 2 VL is a 12B parameter open multimodal reasoning model developed by NVIDIA. It is designed for document intelligence, visual question answering, long video understanding, and complex multimodal reasoning. It builds on the Nemotron Nano V2 LLM and incorporates the RADIOv2.5 vision encoder with a hybrid Mamba-Transformer architecture.

    What applications and use cases does NVIDIA Nemotron Nano 2 VL excel at?

    Nemotron Nano 2 VL is optimized for:

    • Multimodal document intelligence (e.g., invoices, receipts, manuals)
    • Multi-image reasoning (up to 5 document images; see the sketch after this list)
    • Visual question answering (VQA)
    • Video understanding and summarization
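
    For the multi-image case, up to 5 document images can be passed as separate image_url entries in a single user message. The sketch below assumes the same OpenAI-compatible request format as above; the file paths and model path are placeholders, and local files are base64-encoded into data URLs.

    # Sketch: multi-image document reasoning (up to 5 images per prompt).
    # Paths and the model identifier are illustrative assumptions.
    import base64
    import os
    from openai import OpenAI

    def to_data_url(path: str) -> str:
        # Encode a local PNG as a base64 data URL for the image_url field.
        with open(path, "rb") as f:
            return "data:image/png;base64," + base64.b64encode(f.read()).decode()

    client = OpenAI(
        base_url="https://api.fireworks.ai/inference/v1",
        api_key=os.environ["FIREWORKS_API_KEY"],
    )

    pages = ["page1.png", "page2.png", "page3.png"]  # up to 5 document images
    content = [{"type": "text", "text": "Summarize the key figures across these pages."}]
    content += [{"type": "image_url", "image_url": {"url": to_data_url(p)}} for p in pages]

    response = client.chat.completions.create(
        model="accounts/fireworks/models/nemotron-nano-v2-12b-vl",  # assumed path
        messages=[{"role": "user", "content": content}],
    )
    print(response.choices[0].message.content)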

    What is the maximum context length for NVIDIA Nemotron Nano 2 VL?

    The model supports a maximum of 128K tokens (input + output combined) at inference time.

    What is the usable context window for NVIDIA Nemotron Nano 2 VL?

    Although training included sequences up to 311K tokens, the usable context length during inference is 128K tokens.

    Does NVIDIA Nemotron Nano 2 VL support quantized formats (4-bit/8-bit)?

    Yes. The model is released in:

    • FP8
    • FP4 (using Quantization-aware Distillation)
    • BF16

    What are known failure modes of NVIDIA Nemotron Nano 2 VL?

    • Performance degradation on text reasoning tasks after vision training (mitigated in later SFT stages)
    • Repetition loops in reasoning-on mode for out-of-distribution (OOD) tasks (addressed via reasoning budget control)
    • Performance sensitivity in OCR tasks, depending on the image preprocessing method

    How many parameters does NVIDIA Nemotron Nano 2 VL have?

    The model has 12.6 billion parameters.

    Is fine-tuning supported for NVIDIA Nemotron Nano 2 VL?

    No, fine-tuning is not currently supported on Fireworks AI.

    What rate limits apply on the shared endpoint?

    NVIDIA Nemotron Nano 2 VL is not available on a shared serverless endpoint. It is offered via on-demand deployment only, which provides dedicated GPUs with no rate limits.

    What license governs commercial use of NVIDIA Nemotron Nano 2 VL?

    Use of this model is governed by the NVIDIA Open Model License Agreement.

    Metadata

    State
    Ready
    Created on
    10/23/2025
    Kind
    Base model
    Provider
    NVIDIA
    Hugging Face
    nemotron-nano-v2-12b-vl

    Specification

    Calibrated
    No
    Mixture-of-Experts
    No
    Parameters
    12.6B

    Supported Functionality

    Fine-tuning
    Not supported
    Serverless
    Not supported
    Serverless LoRA
    Not supported
    Context Length
    128K
    Function Calling
    Not supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Supported