

Llama 3.1 70B Instruct

fireworks/llama-v3p1-70b-instruct

    The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes. The Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many available open-source and closed chat models on common industry benchmarks.
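Fireworks serves this model through an OpenAI-compatible REST API. The sketch below builds a minimal chat-completion request body; the model identifier follows the `accounts/fireworks/models/` convention used by Fireworks, but verify the exact id and endpoint against the current docs before use.

```python
import json

# Minimal sketch of an OpenAI-compatible chat request body for the
# Fireworks inference API. The model id follows Fireworks' published
# naming convention; confirm against the docs for your account.
def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    return {
        "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_chat_request("Summarize LoRA fine-tuning in one sentence.")
print(json.dumps(payload, indent=2))
```

The body would be POSTed to the chat-completions endpoint with your API key in the `Authorization` header.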

    Llama 3.1 70B Instruct API Features

    Fine-tuning


    Llama 3.1 70B Instruct can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
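Fine-tuning jobs typically take a JSONL dataset of chat transcripts. A minimal sketch, assuming the OpenAI-style `messages` format that Fireworks' fine-tuning docs describe (check the current docs for the exact field names your account expects):

```python
import json

# Sketch: serialize chat examples as JSONL for a LoRA fine-tuning upload.
# The "messages"/"role"/"content" field names follow the common
# OpenAI-style convention and are an assumption to verify.
examples = [
    {"messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]},
]

def to_jsonl(rows: list[dict]) -> str:
    # One JSON object per line, as expected for a .jsonl file.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in rows)

dataset = to_jsonl(examples)
print(dataset)
```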

    On-demand Deployment


    On-demand deployments give you dedicated GPUs for Llama 3.1 70B Instruct using Fireworks' reliable, high-performance system with no rate limits.

    Llama 3.1 70B Instruct FAQs

    What is Llama 3.1 70B Instruct and who developed it?

    Llama 3.1 70B Instruct is a multilingual instruction-tuned large language model developed by Meta AI. It is part of the Llama 3.1 family, which includes 8B, 70B, and 405B models. The 70B Instruct variant is fine-tuned with supervised data and RLHF for assistant-like use cases.

    What applications and use cases does Llama 3.1 70B Instruct excel at?

    The model is optimized for:

    • Conversational AI
    • Code assistance
    • Agentic systems
    • Search
    • Enterprise RAG

    It supports multilingual dialogue, with performance benchmarks across the 8 officially supported languages (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai).

    What is the maximum context length for Llama 3.1 70B Instruct?

    Fireworks supports a maximum context length of 131,072 tokens.

    What is the usable context window for Llama 3.1 70B Instruct?

    The full 131,072-token window is usable on Fireworks AI’s infrastructure.

    Does Llama 3.1 70B Instruct support quantized formats (4-bit/8-bit)?

    Yes, it supports 4-bit and 8-bit formats.

    What are known failure modes of Llama 3.1 70B Instruct?

    Meta reports risks in adversarial settings such as:

    • Producing inaccurate, biased, or harmful outputs
    • Security vulnerabilities in agentic use
    • Multilingual degradation outside the 8 supported languages

    Refer to Meta’s Responsible Use Guide and red teaming reports for further details.

    Does Llama 3.1 70B Instruct support function-calling schemas?

    Yes, function calling is supported for this model.
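Function calling uses the common OpenAI-style tool schema (a JSON Schema description of each callable function). A hedged sketch of a request body with one hypothetical tool; confirm the exact schema against Fireworks' function-calling docs:

```python
# Sketch of an OpenAI-style function-calling request body.
# "get_weather" is a hypothetical tool invented for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
```

The model responds with a `tool_calls` entry naming the function and its JSON arguments, which your application then executes.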

    How many parameters does Llama 3.1 70B Instruct have?

    The model has 70.6 billion parameters.

    Is fine-tuning supported for Llama 3.1 70B Instruct?

    Yes. Fireworks supports LoRA-based fine-tuning for this model.

    What rate limits apply on the shared endpoint?

    The serverless (shared) endpoint is priced at $0.90 per million tokens; on-demand deployments use dedicated GPU allocation and have no rate limits.
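At the quoted serverless rate, cost scales linearly with token volume. A quick estimate helper (the rate is taken from the answer above; token counts here are illustrative):

```python
# Cost estimate at the quoted serverless rate of $0.90 per million tokens.
RATE_PER_MILLION_USD = 0.90

def estimate_cost_usd(total_tokens: int) -> float:
    return total_tokens / 1_000_000 * RATE_PER_MILLION_USD

print(estimate_cost_usd(250_000))  # → 0.225
```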

    What license governs commercial use of Llama 3.1 70B Instruct?

    The model is governed by the Llama 3.1 Community License, a custom commercial license available via Meta’s GitHub.

    Metadata

    State: Ready
    Created on: 7/18/2024
    Kind: Base model
    Provider: Meta
    Hugging Face: Llama-3.1-70B-Instruct

    Specification

    Calibrated: Yes
    Mixture-of-Experts: No
    Parameters: 70.6B

    Supported Functionality

    Fine-tuning: Supported
    Serverless: Not supported
    Serverless LoRA: Supported
    Context Length: 131,072 tokens
    Function Calling: Supported
    Embeddings: Not supported
    Rerankers: Not supported
    Image input: Not supported