Mixtral MoE 8x7B Instruct is the instruction-tuned version of Mixtral MoE 8x7B and has the chat completions API enabled.
On-demand Deployment | On-demand deployments allow you to use Mixtral MoE 8x7B Instruct on dedicated GPUs with Fireworks' high-performance serving stack, with high reliability and no rate limits.
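Since the model has the chat completions API enabled, a request can be sketched as below. This is a minimal sketch assuming an OpenAI-compatible endpoint and the model identifier `accounts/fireworks/models/mixtral-8x7b-instruct`; both are assumptions, so check the Fireworks docs for the exact values.

```python
import json
import urllib.request

# Assumed endpoint, based on common OpenAI-compatible conventions.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_request(messages,
                       model="accounts/fireworks/models/mixtral-8x7b-instruct",
                       max_tokens=256):
    """Build the JSON payload for a chat completions request."""
    return {
        "model": model,       # assumed model identifier
        "messages": messages,
        "max_tokens": max_tokens,
    }

def send_chat_request(payload, api_key):
    """POST the payload with a bearer token; returns the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but do not send) a sample request.
payload = build_chat_request([{"role": "user", "content": "Hello!"}])
```

In practice you would pass your Fireworks API key to `send_chat_request(payload, api_key)` and read the reply from the response's `choices` field.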
Mixtral MoE 8x7B Instruct is an instruction-tuned sparse Mixture-of-Experts (MoE) model developed by Mistral AI. It fine-tunes the base Mixtral-8x7B model for conversational and instruction-following tasks.
The model is designed for conversational and instruction-following use. Per Mistral's internal evaluations, it outperforms Llama 2 70B on several benchmarks.
The model supports a context window of 32,768 tokens. Fireworks supports this full context window on on-demand GPU deployments, with no rate limits.
The model has 46.7 billion total parameters across 8 experts of roughly 7B each; with 2 experts activated per forward pass, about 12.9 billion parameters are active per token.
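The routing step above can be sketched as follows: a router scores all 8 experts for each token and only the 2 highest-scoring experts run. This is an illustrative simplification (plain top-2 softmax gating on made-up logits), not Mixtral's exact implementation.

```python
import math

NUM_EXPERTS = 8  # experts in the MoE layer
TOP_K = 2        # experts activated per token

def route(gate_logits):
    """Pick the top-k experts and softmax-normalize their gate weights."""
    top = sorted(range(NUM_EXPERTS), key=lambda i: gate_logits[i], reverse=True)[:TOP_K]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Example logits: experts 3 and 0 score highest, so only they are activated.
selected = route([2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0, 0.25])
```

The two returned `(expert_index, weight)` pairs determine which expert feed-forward networks run and how their outputs are mixed; the other six experts are skipped entirely, which is why far fewer than 46.7B parameters are active per token.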
Tokens are counted across input and output combined, within the 32,768-token context limit.
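Because input and output share one budget, the room left for generation shrinks as the prompt grows. A minimal sketch of this arithmetic, assuming token counts are already known (in practice they come from the model's tokenizer):

```python
CONTEXT_LIMIT = 32768  # shared budget for prompt + completion tokens

def max_completion_tokens(prompt_tokens, context_limit=CONTEXT_LIMIT):
    """Return how many output tokens remain after the prompt is counted."""
    return max(context_limit - prompt_tokens, 0)

# A 30,000-token prompt leaves 2,768 tokens for the completion.
remaining = max_completion_tokens(30000)  # → 2768
```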
The model is licensed under the Apache 2.0 license, a permissive license that permits commercial use.