Llama 3.3 70B Instruct is the December 2024 update to Llama 3.1 70B (released July 2024), with advances in tool calling, multilingual text support, math, and coding. The model achieves industry-leading results in reasoning, math, and instruction following, and delivers performance comparable to Llama 3.1 405B at significantly lower latency and cost.
Fine-tuning (Docs): Llama 3.3 70B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.

Serverless (Docs): Llama 3.3 70B Instruct is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, and OpenAI's Python client.

On-demand deployment (Docs): On-demand deployments let you run Llama 3.3 70B Instruct on dedicated GPUs with Fireworks' high-performance serving stack, with high reliability and no rate limits.
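The serverless option can be called over an OpenAI-compatible REST endpoint. The sketch below builds such a request using only the standard library; the endpoint URL and model identifier are assumptions based on Fireworks' OpenAI-compatible API and should be checked against the official docs:

```python
import json
import os
import urllib.request

# Assumed endpoint and model id for Fireworks' serverless API.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/llama-v3p3-70b-instruct"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON payload for a chat-completion call."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize LoRA in one sentence.")

api_key = os.environ.get("FIREWORKS_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape works with OpenAI's Python client by pointing its `base_url` at the Fireworks endpoint.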
Llama 3.3 70B Instruct is a multilingual, instruction-tuned large language model developed by Meta AI. It is the December 2024 update to Llama 3.1 70B, offering improvements in reasoning, tool use, math, code generation, and multilingual capabilities.
The model is optimized for multilingual dialogue, tool use, math, and code generation.
Fireworks supports the model's full context length of 131,072 tokens; all 131.1K tokens are usable on Fireworks on-demand deployments.
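As a practical guard, a rough pre-flight check against the context window can be sketched like this; the four-characters-per-token ratio is a heuristic assumption, not the model's actual tokenizer:

```python
# Rough pre-flight check that a prompt fits the 131,072-token context window.
CONTEXT_LIMIT = 131_072
CHARS_PER_TOKEN = 4  # heuristic average for English text, not the real tokenizer

def estimate_tokens(text: str) -> int:
    """Cheap token-count estimate from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(prompt: str, max_new_tokens: int) -> bool:
    """Leave room for the completion as well as the prompt."""
    return estimate_tokens(prompt) + max_new_tokens <= CONTEXT_LIMIT

print(fits_in_context("hello " * 100, max_new_tokens=256))  # True
```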
Yes, quantization is supported: the model is available in 4-bit and 8-bit formats.
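To illustrate what an 8-bit format does, here is a minimal sketch of symmetric absmax quantization. This is a generic technique shown for intuition, not Fireworks' actual quantization scheme:

```python
# Symmetric "absmax" 8-bit quantization: map floats to int8 with one scale.

def quantize_8bit(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integer values in [-127, 127] with a single scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(qweights: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the int8 values."""
    return [q * scale for q in qweights]

q, s = quantize_8bit([0.5, -1.0, 0.25])
restored = dequantize(q, s)  # close to the originals, small rounding error
```

4-bit formats work the same way with a narrower integer range, trading a little accuracy for half the memory.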
Meta outlines the model's known limitations in its official model card.
No, function calling is not supported for this model.
The model has 70.6 billion parameters.
Yes. Fireworks supports LoRA-based fine-tuning for this model.
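The LoRA idea behind this fine-tuning can be sketched in a few lines: instead of updating the full weight matrix W, two small low-rank factors B and A are trained and their product is added to W. The dimensions and rank below are illustrative, not the model's real shapes:

```python
import random

# LoRA sketch: W (d_out x d_in) stays frozen; train B (d_out x r) and
# A (r x d_in), and serve with W + B @ A.

def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d_out, d_in, rank = 4, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
B = [[0.0] * rank for _ in range(d_out)]  # zero-initialized, so B @ A = 0 at start
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]

delta = matmul(B, A)  # the low-rank update
W_adapted = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# Far fewer trainable parameters than full fine-tuning:
full_params = d_out * d_in            # 24 in this toy example
lora_params = rank * (d_out + d_in)   # 20 here; the gap is huge at LLM scale
```

Because only B and A are trained, many personalized adapters can share one copy of the base weights at serving time.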
The model is distributed under the Llama 3.3 Community License, a custom commercial license from Meta. Full license details are available from Meta.