Llama 4 Maverick is Meta's first natively multimodal Mixture-of-Experts (MoE) model.
The model processes both text and images, routing each token through specialized expert blocks. Notably, it ships with a 1-million-token context window, roughly a 10x increase over comparable models, which makes it practical to keep entire code repositories, complete product specifications, or long-running user conversations in context.
Minutes after Meta published the weights, the model appeared in the Fireworks AI catalogue (accounts/fireworks/models/llama4-maverick-instruct-basic). Early adopters, including many of the edge-AI researchers who benchmarked the model, were already hitting the endpoint before most providers had finished their container builds.
To get the best performance out of Llama 4, we leveraged multiple components of the Fireworks platform. That flexibility is what allowed Fireworks AI to ship the first public Llama 4 API.
Independent testing by Artificial Analysis on April 27, 2025 measured Fireworks AI delivering 145 output tokens per second of streaming throughput for Llama 4 Maverick on H200 GPUs, 10-20% faster than the closest competitor and more than double the speed of managed Azure endpoints (Artificial Analysis).
Figure 1. Llama 4 Maverick Output-Token Speed (27 Apr 2025).
Fireworks AI exposes an OpenAI-compatible function-calling interface: pass a JSON schema via the tools parameter and receive a structured function_call object back.
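Here's a minimal sketch using the standard OpenAI Python client pointed at the Fireworks base URL; the get_weather tool and its schema are made-up examples for illustration, not part of the Fireworks API:

```python
import json
from openai import OpenAI

# The API key is a placeholder; the model ID comes from the catalogue above.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

# Hypothetical tool schema, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama4-maverick-instruct-basic",
    messages=[{"role": "user", "content": "What's the weather in Lisbon right now?"}],
    tools=tools,
)

# Instead of free text, the model returns a structured tool call.
call = response.choices[0].message.tool_calls[0]
print(call.function.name)                   # "get_weather"
print(json.loads(call.function.arguments))  # e.g. {"city": "Lisbon"}
```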
If you need the fastest, largest-context, multimodal Llama 4 endpoint with production-grade function calling, Fireworks AI is the current engineering sweet spot.
Spin up the API, point your existing OpenAI client to it, and enjoy 145 tokens-per-second chat with a million-token brain: https://fireworks.ai/models/fireworks/llama4-maverick-instruct-basic
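Because the endpoint speaks the OpenAI wire protocol, migrating is usually just a base_url swap; below is a minimal streaming sketch under that assumption:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # same OpenAI-compatible endpoint
    api_key="<FIREWORKS_API_KEY>",
)

stream = client.chat.completions.create(
    model="accounts/fireworks/models/llama4-maverick-instruct-basic",
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
    stream=True,  # tokens arrive incrementally as they are generated
)

# Print tokens as they stream in.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```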
PS: llama4-maverick on Serverless runs on the public tier, so performance may vary with traffic. If you need consistently optimal speeds, or want to customize the setup for your workload, we recommend running it on an on-demand deployment.
Happy building! 🚀