GLM 5.2 is now available on Serverless. Try it today.

Model Library
/Mistral/Mistral Small 24B Instruct 2501
Mistral Logo Icon

Mistral Small 24B Instruct 2501

Ready
model path:accounts/fireworks/models/mistral-small-24b-instruct-2501

Mistral Small 3 ( 2501 ) sets a new benchmark in the "small" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models!

Mistral Small 24B Instruct 2501 API Features

Fine-tuning

Docs

Mistral Small 24B Instruct 2501 can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

On-demand Deployment

Docs

On-demand deployments allow you to use Mistral Small 24B Instruct 2501 on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Mistral Small 24B Instruct 2501 FAQs

What is Mistral Small 24B Instruct 2501 and who developed it?

Mistral Small 24B Instruct 2501 is an instruction-tuned version of the base Mistral Small 24B model, developed by Mistral AI. It is designed as a high-performance, "small" LLM (under 70B parameters) that competes with much larger models. It supports multilingual tasks and is well-suited for chat, reasoning, and structured output generation.

What applications and use cases does Mistral Small 24B Instruct 2501 excel at?
  • Conversational AI
  • Code assistance
  • Agentic systems
  • Search and Enterprise RAG
  • Tool calling (via vLLM)
  • Multilingual tasks

Its performance is validated across generalist, reasoning, and coding benchmarks.

What is the maximum context length for Mistral Small 24B Instruct 2501?

The model supports a context window of 32,768 tokens.

What is the usable context window for Mistral Small 24B Instruct 2501?

The full 32.8K token context window is available on Fireworks' on-demand deployments with no rate limits.

What are known failure modes of Mistral Small 24B Instruct 2501?
  • No image input, embeddings, or reranker support
  • Function calling is supported, but only in vLLM-compatible setups, not in Fireworks-native API
  • Requires 55–60 GB GPU RAM for FP16 inference
  • Does not support streaming responses
How many parameters does Mistral Small 24B Instruct 2501 have?

The model has 23.6 billion parameters.

Is fine-tuning supported for Mistral Small 24B Instruct 2501?

Yes. Fireworks supports LoRA-based fine-tuning through its RFT infrastructure.

How are tokens counted (prompt vs completion)?

Fireworks charges based on combined input + output token usage.

What rate limits apply on the shared endpoint?
  • Serverless: Not supported
  • On-demand: Available with no rate limits using dedicated GPUs
What license governs commercial use of Mistral Small 24B Instruct 2501?

The model is released under the Apache 2.0 license, allowing unrestricted commercial use.

Metadata

State
Ready
Created on
1/30/2025
Kind
Base model
Provider
Mistral

Specification

Calibrated
No
Mixture-of-Experts
No
Parameters
23.5B

Supported Functionality

Fine-tuning
Supported
Serverless
Not supported
Context Length
32.7k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported