Molmo2 is a family of open vision-language models developed by the Allen Institute for AI (Ai2) that support image, video, and multi-image understanding and grounding. Molmo2-8B is built on Qwen 3 and is Ai2's best overall model for video grounding and QA.
On-demand Deployment: On-demand deployments give you dedicated GPUs for Molmo2-8B using Fireworks' reliable, high-performance system with no rate limits.