GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/Mistral/Mixtral MoE 8x7B Instruct
Mistral Logo Icon

Mixtral MoE 8x7B Instruct

Ready
model path:accounts/fireworks/models/mixtral-8x7b-instruct

Mixtral MoE 8x7B Instruct is the instruction-tuned version of Mixtral MoE 8x7B and has the chat completions API enabled.

Mixtral MoE 8x7B Instruct API Features

On-demand Deployment

Docs

On-demand deployments allow you to use Mixtral MoE 8x7B Instruct on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Mixtral MoE 8x7B Instruct FAQs

What is Mixtral MoE 8x7B Instruct and who developed it?

Mixtral MoE 8x7B Instruct is an instruction-tuned sparse Mixture-of-Experts (MoE) model developed by Mistral AI. It fine-tunes the base Mixtral-8x7B model for conversational and instruction-following tasks.

What applications and use cases does Mixtral MoE 8x7B Instruct excel at?

The model is designed for:

  • Conversational AI
  • Code assistance
  • Agentic systems
  • Search and enterprise RAG
  • Multimedia reasoning (text-only)

It outperforms Llama 2 70B across several benchmarks, per Mistral’s internal evaluations.

What is the maximum context length for Mixtral MoE 8x7B Instruct?

The model supports a context window of 32,768 tokens.

What is the usable context window for Mixtral MoE 8x7B Instruct?

Fireworks supports the full 32.8K token context window on on-demand GPU deployments, with no rate limits.

What are known failure modes of Mixtral MoE 8x7B Instruct?
  • No function calling support
  • No image input or multimodal support
  • No embeddings or reranking capabilities
  • Unmoderated outputs: No safety alignment or content filtering is applied
How many parameters does Mixtral MoE 8x7B Instruct have?

The model has 46.7 billion active parameters, drawn from 8 experts of 7B each, with 2 experts activated per forward pass.

Is fine-tuning supported for Mixtral MoE 8x7B Instruct?
  • Standard fine-tuning: Not supported
  • LoRA fine-tuning: Supported on Fireworks via Serverless LoRA
How are tokens counted (prompt vs completion)?

Tokens are counted across input + output, within the 32.8K context limit.

What rate limits apply on the shared endpoint?
  • Serverless: Not supported
  • On-demand: Supported with no rate limits via dedicated GPUs
What license governs commercial use of Mixtral MoE 8x7B Instruct?

The model is licensed under the Apache 2.0 license, which allows unrestricted commercial use.

Metadata

State
Ready
Created on
12/11/2023
Kind
Base model
Provider
Mistral

Specification

Calibrated
No
Mixture-of-Experts
Yes
Parameters
46.7B

Supported Functionality

Fine-tuning
Not supported
Serverless
Not supported
Context Length
32.7k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported