Mistral Small 24B Instruct 2501 API

What is Mistral Small 24B Instruct 2501 and who developed it?

Mistral Small 24B Instruct 2501 is an instruction-tuned version of the base Mistral Small 24B model, developed by Mistral AI. It is designed as a high-performance, "small" LLM (under 70B parameters) that competes with much larger models. It supports multilingual tasks and is well-suited for chat, reasoning, and structured output generation.

What applications and use cases does Mistral Small 24B Instruct 2501 excel at?

Conversational AI
Code assistance
Agentic systems
Search and Enterprise RAG
Tool calling (via vLLM)
Multilingual tasks

Its performance is validated across generalist, reasoning, and coding benchmarks.

What is the maximum context length for Mistral Small 24B Instruct 2501?

The model supports a context window of 32,768 tokens.

What is the usable context window for Mistral Small 24B Instruct 2501?

The full 32.8K token context window is available on Fireworks' on-demand deployments with no rate limits.

What are known failure modes of Mistral Small 24B Instruct 2501?

No image input, embeddings, or reranker support
Function calling is supported, but only in vLLM-compatible setups, not in Fireworks-native API
Requires 55–60 GB GPU RAM for FP16 inference
Does not support streaming responses

How many parameters does Mistral Small 24B Instruct 2501 have?

The model has 23.6 billion parameters.

Is fine-tuning supported for Mistral Small 24B Instruct 2501?

Yes. Fireworks supports LoRA-based fine-tuning through its RFT infrastructure.

How are tokens counted (prompt vs completion)?

Fireworks charges based on combined input + output token usage.

What rate limits apply on the shared endpoint?

Serverless: Not supported
On-demand: Available with no rate limits using dedicated GPUs

What license governs commercial use of Mistral Small 24B Instruct 2501?

The model is released under the Apache 2.0 license, allowing unrestricted commercial use.

Fine-tuning Docs	Mistral Small 24B Instruct 2501 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model
On-demand Deployment Docs	On-demand deployments give you dedicated GPUs for Mistral Small 24B Instruct 2501 using Fireworks' reliable, high-performance system with no rate limits.

Mistral Small 24B Instruct 2501

Mistral Small 24B Instruct 2501 API Features

Fine-tuning

On-demand Deployment

Mistral Small 24B Instruct 2501 FAQs

Metadata

Specification

Supported Functionality