The InternVL3 collection comprises advanced multimodal large language models that combine strong vision and language understanding. Built on a ViT-MLP-LLM architecture, these models excel at multimodal reasoning, document analysis, video understanding, and complex visual tasks, and they support dynamic-resolution processing and extended-context understanding.
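To make the ViT-MLP-LLM data flow concrete, here is a toy sketch in plain Python. It only illustrates the shape of the pipeline: a vision encoder produces one feature vector per image patch, an MLP projector maps those vectors to the language model's embedding width, and the projected visual tokens are joined with text embeddings into a single input sequence. Every name, dimension, and the ReLU activation below is a hypothetical stand-in chosen for illustration, with random weights; none of it reflects InternVL3's actual layers or configuration.

```python
import random

# Hypothetical toy sizes, chosen only for illustration.
VIT_DIM, HIDDEN_DIM, LLM_DIM = 8, 16, 12
NUM_PATCHES, NUM_TEXT_TOKENS = 4, 3


def make_weights(out_dim, in_dim, rng):
    """Random weight matrix stored as out_dim rows of length in_dim."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(weights, vec):
    """Multiply a weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def vit_encode(num_patches, rng):
    """Stand-in for the vision transformer: one feature vector per patch."""
    return [[rng.uniform(-1, 1) for _ in range(VIT_DIM)] for _ in range(num_patches)]


def mlp_project(features, w1, w2):
    """Two-layer MLP projector: vision width -> hidden -> LLM embedding width."""
    projected = []
    for vec in features:
        hidden = [max(0.0, h) for h in matvec(w1, vec)]  # ReLU, an assumption
        projected.append(matvec(w2, hidden))
    return projected


def build_llm_input(visual_tokens, text_embeddings):
    """Place visual tokens ahead of the text embeddings in one sequence."""
    return visual_tokens + text_embeddings


rng = random.Random(0)
w1 = make_weights(HIDDEN_DIM, VIT_DIM, rng)
w2 = make_weights(LLM_DIM, HIDDEN_DIM, rng)

patch_features = vit_encode(NUM_PATCHES, rng)
visual_tokens = mlp_project(patch_features, w1, w2)
text_embeddings = [[rng.uniform(-1, 1) for _ in range(LLM_DIM)]
                   for _ in range(NUM_TEXT_TOKENS)]

sequence = build_llm_input(visual_tokens, text_embeddings)
print(len(sequence), len(sequence[0]))
```

The point of the projector is that the language model sees visual and text tokens at the same width, so it can attend over both in one sequence. In a real model the weights are trained and the token ordering follows the prompt template rather than a fixed image-first layout.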
On-demand Deployment | On-demand deployments let you run InternVL3 78B on dedicated GPUs using Fireworks' high-performance serving stack, with high reliability and no rate limits. |