Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b is the smaller model in the series, intended for lower-latency, local, or specialized use cases.
| Option | Description |
| --- | --- |
| Fine-tuning (Docs) | OpenAI gpt-oss-20b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model. |
| Serverless (Docs) | Run the model immediately on pre-configured GPUs and pay per token. |
| On-demand Deployment (Docs) | On-demand deployments give you dedicated GPUs for OpenAI gpt-oss-20b on Fireworks' reliable, high-performance infrastructure, with no rate limits. |
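For example, a minimal serverless query through Fireworks' OpenAI-compatible Chat Completions endpoint might look like the sketch below. The base URL and the `accounts/fireworks/models/gpt-oss-20b` model identifier follow Fireworks' usual naming conventions but are assumptions here; confirm the exact values in the Docs linked above.

```python
# Minimal sketch: query gpt-oss-20b on Fireworks' serverless tier via the
# OpenAI-compatible Chat Completions API. Base URL and model identifier
# are assumed; verify them against the Fireworks docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # Fireworks' OpenAI-compatible endpoint
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/gpt-oss-20b",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize what a mixture-of-experts model is."}],
)
print(response.choices[0].message.content)
```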
gpt-oss-20b is an open-weight, 21.5B-parameter model developed by OpenAI. It is part of the gpt-oss series and is optimized for lower latency and for local or specialized tasks. The model was trained on OpenAI's Harmony response format, which it also expects at inference time, and supports configurable reasoning depth for agentic applications.
gpt-oss-20b is designed for:
- Lower-latency applications and local or specialized deployments
- Agentic workflows, including function calling and tool use
- Tasks that benefit from configurable reasoning depth and inspectable reasoning output
It is particularly suited for scenarios where developers need customization and transparency in reasoning processes.
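As a sketch of that configurability: with gpt-oss models, reasoning effort is selected in the system prompt, which the Harmony format renders as a `Reasoning: low|medium|high` line. Whether a given endpoint also exposes a dedicated request parameter for this varies, so the system-message approach below is the conservative choice; it reuses the `client` from the quickstart above.

```python
# Sketch: request deeper reasoning from gpt-oss-20b. The Harmony format
# encodes reasoning effort in the system message; valid levels are
# low, medium, and high. `client` is the OpenAI-compatible client above.
response = client.chat.completions.create(
    model="accounts/fireworks/models/gpt-oss-20b",  # assumed model identifier
    messages=[
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Plan a three-step test strategy for a parser."},
    ],
)
print(response.choices[0].message.content)
```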
**What is the maximum context length?**
The maximum context length is 131,072 tokens on Fireworks AI.
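If you want a pre-flight length check, a rough sketch follows. gpt-oss uses the o200k_harmony tokenizer; tiktoken's `o200k_base` encoding is assumed here as a close approximation, since the two differ mainly in Harmony's format-control tokens.

```python
# Rough sketch: estimate whether a prompt fits the 131,072-token window.
# o200k_base approximates gpt-oss's o200k_harmony tokenizer (assumption:
# the counts differ only by a handful of Harmony formatting tokens).
import tiktoken

MAX_CONTEXT = 131_072
enc = tiktoken.get_encoding("o200k_base")

prompt = open("prompt.txt").read()  # hypothetical input file
n_tokens = len(enc.encode(prompt))
print(f"{n_tokens} tokens; ~{MAX_CONTEXT - n_tokens} left for the completion")
```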
**Can gpt-oss-20b run within 16GB of memory?**
Yes. The model was post-trained with MXFP4 quantization of its MoE weights (a 4-bit microscaling format), making it compatible with 16GB-memory deployments.
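A rough, illustrative back-of-envelope (not a deployment guarantee; real footprints also include the KV cache, activations, and the non-MoE weights kept at higher precision):

```python
# Illustrative only: approximate weight memory if the parameter count is
# dominated by MoE weights stored in MXFP4 (~4.25 bits/param once block
# scale factors are included). KV cache and activations come on top.
total_params = 21.5e9
bits_per_param = 4.25
weight_gb = total_params * bits_per_param / 8 / 1e9
print(f"~{weight_gb:.1f} GB of weights")  # ~11.4 GB, leaving headroom under 16 GB
```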
**Does the model support function calling and streaming?**
Yes. gpt-oss-20b natively supports function calling with defined schemas and works well in streaming scenarios, particularly behind OpenAI-compatible servers such as vLLM.
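For example, a streamed tool-call request through the OpenAI-compatible API could look like the sketch below; the `get_weather` tool is a made-up illustration, and `client` is the one from the quickstart above.

```python
# Sketch: function calling with a defined schema, with streaming enabled.
# The get_weather tool is hypothetical; `client` is defined in the quickstart.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="accounts/fireworks/models/gpt-oss-20b",  # assumed model identifier
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:  # tool-call arguments stream in incrementally
        print(delta.tool_calls[0].function.arguments or "", end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)
```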
**How many parameters does the model have?**
gpt-oss-20b has 21.5 billion parameters in total, of which about 3.6 billion are active per token due to its mixture-of-experts (MoE) architecture.
**Is fine-tuning supported?**
Yes. Fine-tuning for gpt-oss-20b is supported on Fireworks AI using LoRA; a sketch of a typical training file follows.
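Chat fine-tuning services commonly accept JSONL files of conversations. The `{"messages": [...]}` schema below is that common convention, not a confirmed Fireworks requirement, so check the Fine-tuning docs linked above for the exact format.

```python
# Sketch: write a chat-style JSONL training file for LoRA fine-tuning.
# The {"messages": [...]} schema is a common convention (assumption);
# verify the exact field names against the Fireworks fine-tuning docs.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "What is our refund window?"},
        {"role": "assistant", "content": "Purchases can be refunded within 30 days of delivery."},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```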
**What license is the model released under?**
gpt-oss-20b is released under the Apache 2.0 license, which permits free use, modification, and commercial deployment and includes an explicit patent grant.