New to Fireworks? Start with the Serverless Quickstart to see a vision model example, then return here for more details.
Chat Completions API
Provide images via URL or base64 encoding:
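A minimal Python sketch using the OpenAI-compatible SDK pointed at the Fireworks endpoint; the model name, API key, and image URL below are placeholders to replace with your own:

```python
# Minimal sketch: Chat Completions request with an image URL.
# The model name, API key, and image URL are placeholders.
import openai

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p2-11b-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                # Image first, then text (see the Llama 3.2 Vision note under Known limitations)
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```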
Using base64-encoded images
Instead of URLs, you can provide base64-encoded images prefixed with MIME types:
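As a sketch, you can read a local file, base64-encode it, and pass it as a data URI with the appropriate MIME prefix; the file path, model name, and API key are placeholders:

```python
# Sketch: sending a local PNG as a base64 data URI.
import base64

import openai

client = openai.OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Read and encode the image, then prefix it with its MIME type.
with open("diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p2-11b-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": "What does this diagram show?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```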
Working with images
Vision-language models support prompt caching to improve performance for requests with repeated content. Both the text and image portions of a prompt can benefit from caching, reducing time to first token by up to 80%.

Tips for optimal performance:
- Use URLs for long conversations – Reduces latency compared to base64 encoding
- Downsize images – Smaller images use fewer tokens and process faster
- Structure prompts for caching – Place static instructions at the beginning and variable content at the end (see the sketch after this list)
- Include metadata in prompts – Add context about the image directly in your text prompt
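To illustrate the caching tip above, here is a sketch of a message layout in which the static system instructions lead and the per-request image and question follow; the instructions, model, and URL are illustrative only:

```python
# Sketch: caching-friendly message order.
# The static system prompt comes first so it can be reused across requests;
# the per-request image and question come last.
image_url = "https://example.com/product-123.jpg"  # varies per request

messages = [
    {
        "role": "system",
        "content": "You are a product catalog assistant. Answer in JSON with "
                   "fields 'item', 'color', and 'condition'.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": "Describe the item in this photo."},
        ],
    },
]
```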
Advanced capabilities
- Vision fine-tuning – Fine-tune VLMs for specialized visual tasks
- LoRA adapters – Deploy custom LoRA adapters for vision models
- Dedicated deployments – Deploy VLMs on dedicated GPUs for better performance
Alternative query methods
Completions API (advanced)
For the Completions API, manually insert the image token <image> in your prompt and supply images as an ordered list:
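A sketch using a raw HTTP request; it assumes the request body accepts an images array whose order matches the <image> tokens in the prompt, and the model name, prompt template, and image URL are placeholders:

```python
# Sketch: Completions API request with an inline <image> token.
# Assumes an "images" list ordered to match the <image> tokens in the prompt;
# the model, prompt template, and image URL are placeholders.
import requests

resp = requests.post(
    "https://api.fireworks.ai/inference/v1/completions",
    headers={"Authorization": "Bearer <FIREWORKS_API_KEY>"},
    json={
        "model": "accounts/fireworks/models/llama-v3p2-11b-vision-instruct",
        "prompt": "USER: <image>\nDescribe this image.\nASSISTANT:",
        "images": ["https://example.com/photo.jpg"],
        "max_tokens": 256,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```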
Known limitations
- Maximum images per request: 30, regardless of format (base64 or URL)
- Base64 size limit: Total base64-encoded images must be less than 10MB
- URL size and timeout: Each image URL must be smaller than 5MB and download within 1.5 seconds
- Supported formats: .png, .jpg, .jpeg, .gif, .bmp, .tiff, .ppm
- Llama 3.2 Vision models: Pass images before text in the content field to avoid refusals (temporary limitation)