

Mixtral MoE 8x7B Instruct

fireworks/mixtral-8x7b-instruct

    Mixtral MoE 8x7B Instruct is the instruction-tuned version of Mixtral MoE 8x7B and has the chat completions API enabled.
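
    Because the chat completions API is enabled, the model can be queried over Fireworks' OpenAI-compatible REST interface. The sketch below is illustrative only: the endpoint URL and the fully qualified model name (expanded from the fireworks/mixtral-8x7b-instruct identifier above) are assumptions to verify against the official docs or playground.

```python
# Minimal sketch of a chat completions request (endpoint URL and the fully
# qualified model name are assumptions; confirm both in the Fireworks docs).
import os
import requests

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"  # assumed endpoint
MODEL = "accounts/fireworks/models/mixtral-8x7b-instruct"           # assumed full model ID

response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": MODEL,
        "messages": [
            {"role": "user", "content": "Summarize the Mixture-of-Experts idea in two sentences."}
        ],
        "max_tokens": 256,
        "temperature": 0.6,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```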

    Mixtral MoE 8x7B Instruct API Features

    On-demand Deployment


    On-demand deployments give you dedicated GPUs for Mixtral MoE 8x7B Instruct using Fireworks' reliable, high-performance system with no rate limits.
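
    Since the endpoint is OpenAI-compatible, the official openai Python client can also target a Fireworks deployment by overriding its base URL. The following sketch assumes the same base URL and model identifier as above and streams tokens as they arrive; the exact identifiers for a dedicated deployment should be confirmed in the deployment docs.

```python
# Sketch: streaming a chat completion via the OpenAI Python client pointed
# at Fireworks' OpenAI-compatible base URL (assumed values below).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed base URL
    api_key=os.environ["FIREWORKS_API_KEY"],
)

stream = client.chat.completions.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",  # assumed full model ID
    messages=[{"role": "user", "content": "Write a haiku about dedicated GPUs."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta; print text as it is generated.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```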

    Mixtral MoE 8x7B Instruct FAQs

    What is Mixtral MoE 8x7B Instruct and who developed it?

    Mixtral MoE 8x7B Instruct is an instruction-tuned sparse Mixture-of-Experts (MoE) model developed by Mistral AI. It is a fine-tuned version of the base Mixtral-8x7B model, optimized for conversational and instruction-following tasks.

    What applications and use cases does Mixtral MoE 8x7B Instruct excel at?

    The model is designed for:

    • Conversational AI
    • Code assistance
    • Agentic systems
    • Search and enterprise RAG
    • Multilingual reasoning (text-only)

    It outperforms Llama 2 70B across several benchmarks, per Mistral’s internal evaluations.

    What is the maximum context length for Mixtral MoE 8x7B Instruct?

    The model supports a context window of 32,768 tokens.

    What is the usable context window for Mixtral MoE 8x7B Instruct?

    Fireworks supports the full 32.8K token context window on on-demand GPU deployments, with no rate limits.

    What are known failure modes of Mixtral MoE 8x7B Instruct?

    • No function calling support
    • No image input or multimodal support
    • No embeddings or reranking capabilities
    • Unmoderated outputs: no safety alignment or content filtering is applied

    How many parameters does Mixtral MoE 8x7B Instruct have?

    The model has 46.7 billion total parameters, organized as 8 experts per layer. Only 2 experts are routed per token, so roughly 12.9 billion parameters are active on any given forward pass.
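
    As a rough sanity check, the arithmetic below reconstructs both figures from the published Mixtral-8x7B architecture hyperparameters (hidden size 4096, 32 layers, FFN size 14336, 32 query / 8 key-value heads, 8 experts with 2 routed per token). It ignores small terms such as layer norms and the router, so the results are approximate.

```python
# Back-of-the-envelope parameter count for Mixtral-8x7B (approximate;
# ignores layer norms, the router, and other small terms).
hidden, ffn, layers = 4096, 14336, 32
vocab, n_experts, active_experts = 32000, 8, 2
heads, kv_heads, head_dim = 32, 8, 128

attn_per_layer = hidden * (2 * heads * head_dim + 2 * kv_heads * head_dim)  # Wq, Wo + Wk, Wv (GQA)
expert_params = 3 * hidden * ffn        # gate, up, down projections per expert
embeddings = 2 * vocab * hidden         # input embeddings + output head

total = embeddings + layers * (attn_per_layer + n_experts * expert_params)
active = embeddings + layers * (attn_per_layer + active_experts * expert_params)

print(f"total  ~ {total / 1e9:.1f}B")   # ~46.7B
print(f"active ~ {active / 1e9:.1f}B")  # ~12.9B
```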

    Is fine-tuning supported for Mixtral MoE 8x7B Instruct?

    • Standard fine-tuning: Not supported
    • LoRA fine-tuning: Supported on Fireworks via Serverless LoRA

    How are tokens counted (prompt vs completion)?

    Prompt (input) and completion (output) tokens both count toward the 32.8K-token context limit.
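
    When budgeting prompts against that limit, token counts can be estimated locally before sending a request. The sketch below uses the Hugging Face tokenizer for the Mixtral-8x7B-Instruct-v0.1 repository listed in the metadata (requires the transformers package); the usage field returned by the API remains the authoritative count.

```python
# Sketch: estimating prompt tokens locally so prompt + max_tokens stays under
# the 32,768-token context window.
from transformers import AutoTokenizer

CONTEXT_LIMIT = 32768

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Explain sparse Mixture-of-Experts routing."}]
prompt_tokens = len(tokenizer.apply_chat_template(messages, tokenize=True))

room_for_completion = CONTEXT_LIMIT - prompt_tokens
print(f"prompt tokens: {prompt_tokens}, room left for completion: {room_for_completion}")
```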

    What rate limits apply on the shared endpoint?

    • Serverless: Not supported
    • On-demand: Supported with no rate limits via dedicated GPUs

    What license governs commercial use of Mixtral MoE 8x7B Instruct?

    The model is licensed under the Apache 2.0 license, which permits commercial use, modification, and redistribution.

    Metadata

    State: Ready
    Created on: 12/11/2023
    Kind: Base model
    Provider: Mistral
    Hugging Face: Mixtral-8x7B-Instruct-v0.1

    Specification

    Calibrated: No
    Mixture-of-Experts: Yes
    Parameters: 46.7B

    Supported Functionality

    Fine-tuning: Not supported
    Serverless: Not supported
    Serverless LoRA: Supported
    Context Length: 32.8k tokens
    Function Calling: Not supported
    Embeddings: Not supported
    Rerankers: Not supported
    Support image input: Not supported