
Llama 3.3 70B Instruct
fireworks/llama-v3p3-70b-instruct

    Llama 3.3 70B Instruct is the December 2024 update to Llama 3.1 70B (released July 2024), with improvements in tool calling, multilingual text support, math, and coding. The model achieves industry-leading results in reasoning, math, and instruction following, and delivers performance comparable to Llama 3.1 405B at significantly lower latency and cost.

    Llama 3.3 70B Instruct API Features

    Fine-tuning

    Docs

    Llama 3.3 70B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.

    Serverless

    Docs

    Run the model immediately on pre-configured GPUs and pay per token.
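As a sketch of what a serverless request looks like, the snippet below assembles a chat-completions payload for this model. Fireworks exposes an OpenAI-compatible API; the endpoint URL and the `accounts/fireworks/models/...` model path are assumptions inferred from the model id shown on this page, so verify them against the linked docs.

```python
import json

# Assumed OpenAI-compatible endpoint (check the Fireworks docs before use).
FIREWORKS_CHAT_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a chat-completions request body for Llama 3.3 70B Instruct."""
    return {
        # Model path is an assumption based on the id fireworks/llama-v3p3-70b-instruct.
        "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Summarize LoRA fine-tuning in one sentence.")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the endpoint with an `Authorization: Bearer <api-key>` header.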

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for Llama 3.3 70B Instruct using Fireworks' reliable, high-performance system with no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.90 / $0.90
    Per 1M Tokens (input/output)
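The per-token pricing above makes cost estimation a one-line calculation; a minimal helper:

```python
# Serverless pricing from this page: $0.90 per 1M tokens, input and output.
INPUT_PRICE_PER_M = 0.90   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.90  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply:
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.002250
```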

    Llama 3.3 70B Instruct FAQs

    What is Llama 3.3 70B Instruct and who developed it?

    Llama 3.3 70B Instruct is a multilingual, instruction-tuned large language model developed by Meta AI. It is the December 2024 update to Llama 3.1 70B, offering improvements in reasoning, tool use, math, code generation, and multilingual capabilities.

    What applications and use cases does Llama 3.3 70B Instruct excel at?

    The model is optimized for:

    • Conversational AI
    • Code assistance
    • Agentic systems
    • Search and Enterprise RAG
    • Tool use and multilingual dialogue (supports 8 languages)

    What is the maximum context length for Llama 3.3 70B Instruct?

    Fireworks supports a context length of 131,072 tokens.

    What is the usable context window for Llama 3.3 70B Instruct?

    The full 131.1K tokens are usable on Fireworks on-demand deployments.
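A rough pre-flight check against the 131,072-token window can be done without the tokenizer. The chars-per-token ratio below is a heuristic assumption (roughly 4 characters per token for English text), not the real Llama tokenizer; use the actual tokenizer for exact counts.

```python
CONTEXT_LIMIT = 131_072   # tokens, per the Fireworks context length
CHARS_PER_TOKEN = 4       # heuristic assumption, not the real tokenizer

def fits_in_context(prompt: str, max_output_tokens: int) -> bool:
    """Estimate whether prompt + requested output fit in the context window."""
    est_prompt_tokens = len(prompt) // CHARS_PER_TOKEN + 1
    return est_prompt_tokens + max_output_tokens <= CONTEXT_LIMIT

print(fits_in_context("hello " * 100, 4_096))  # small prompt: True
```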

    Does Llama 3.3 70B Instruct support quantized formats (4-bit/8-bit)?

    Yes. The model supports 4-bit and 8-bit formats.

    What are known failure modes of Llama 3.3 70B Instruct?

    Meta outlines these limitations:

    • May produce inaccurate, biased, or objectionable content
    • Elevated risk of unsafe outputs in unsupported languages
    • Tool calling can pose a security risk if not sandboxed
    • Potential for misuse in cyberattack or CBRNE scenarios
    • Requires additional safeguards for production use (e.g., Prompt Guard, Code Shield, Llama Guard 3)

    Does Llama 3.3 70B Instruct support function-calling schemas?

    No, function calling is not supported for this model.

    How many parameters does Llama 3.3 70B Instruct have?

    The model has 70.6 billion parameters.

    Is fine-tuning supported for Llama 3.3 70B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.
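As a hedged sketch of preparing training data, the snippet below writes a chat-format JSONL file. The `messages` schema is an assumption about what the Fireworks fine-tuning service expects; confirm the exact format in the fine-tuning docs linked above.

```python
import json

# Assumed chat-format training examples; verify the schema against the docs.
examples = [
    {
        "messages": [
            {"role": "user", "content": "What is LoRA?"},
            {"role": "assistant", "content": "LoRA trains small low-rank "
             "adapter matrices instead of updating all model weights."},
        ]
    },
]

# Each line of train.jsonl is one self-contained JSON training example.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```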

    What rate limits apply on the shared endpoint?

    On the shared serverless endpoint, usage is billed at $0.90 per million tokens (input and output). On-demand deployments provide dedicated GPUs with no rate limits.

    What license governs commercial use of Llama 3.3 70B Instruct?

    The model is distributed under the Llama 3.3 Community License, a custom commercial license from Meta. Full license details are available from Meta.

    Metadata

    State: Ready
    Created on: 12/5/2024
    Kind: Base model
    Provider: Meta
    Hugging Face: Llama-3.3-70B-Instruct

    Specification

    Calibrated: Yes
    Mixture-of-Experts: No
    Parameters: 70.6B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Supported
    Serverless LoRA: Supported
    Context Length: 131.1k tokens
    Function Calling: Not supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported