Qwen 3.7 Plus is now available on Serverless, exclusively on Fireworks. Try it today.

Model Library
/Meta/Llama 3.2 90B Vision Instruct
Meta Mark

Llama 3.2 90B Vision Instruct

Ready
model path:accounts/fireworks/models/llama-v3p2-90b-vision-instruct

Instruction-tuned image reasoning model with 90B parameters from Meta. Optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The model can understand visual data, such as charts and graphs and also bridge the gap between vision and language by generating text to describe images details

Llama 3.2 90B Vision Instruct API Features

On-demand Deployment

Docs

On-demand deployments allow you to use Llama 3.2 90B Vision Instruct on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Llama 3.2 90B Vision Instruct FAQs

What is Llama 3.2 90B Vision Instruct and who developed it?

Llama 3.2 90B Vision Instruct is a multimodal instruction-tuned model developed by Meta. It combines the Llama 3.1 text model with a vision adapter for image reasoning and supports both image and text input. It is part of the Llama 3.2-Vision model collection.

What applications and use cases does Llama 3.2 90B Vision Instruct excel at?

This model excels in:

  • Visual Question Answering (VQA)
  • Chart, diagram, and infographic analysis
  • Document understanding (DocVQA)
  • Image captioning and image-text retrieval
  • Visual grounding and reasoning tasks
  • Agentic systems that process image + text input
What is the maximum context length for Llama 3.2 90B Vision Instruct?

It supports a maximum context length of 131,072 tokens on Fireworks.

What is the usable context window for Llama 3.2 90B Vision Instruct?

Fireworks supports the full 131K token window on dedicated GPU deployments (on-demand).

What are known failure modes of Llama 3.2 90B Vision Instruct?
  • No support for structured function calling or tool use
  • May hallucinate or show reduced precision in adversarial visual prompts
  • Supports only English for image-text tasks
  • Ethical risks include child safety, cyber attacks, and content misuse; mitigated with fine-tuning and Llama Guard
How many parameters does Llama 3.2 90B Vision Instruct have?

The model has 88.6 billion parameters.

Is fine-tuning supported for Llama 3.2 90B Vision Instruct?

Fine-tuning is supported via LoRA on Fireworks' serverless interface, but standard fine-tuning is not available.

How are tokens counted (prompt vs completion)?

Tokens are counted across combined input and output, up to the 131K limit.

What rate limits apply on the shared endpoint?
  • Serverless: Not supported
  • On-demand: Available with no rate limits on dedicated infrastructure
What license governs commercial use of Llama 3.2 90B Vision Instruct?

The model is governed by the Llama 3.2 Community License, a custom commercial license that allows research and commercial use under specified terms.

Metadata

State
Ready
Created on
9/23/2024
Kind
Base model
Provider
Meta

Specification

Calibrated
No
Mixture-of-Experts
No
Parameters
88.5B

Supported Functionality

Fine-tuning
Not supported
Serverless
Not supported
Context Length
131k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Supported