Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b is optimized for lower latency and for local or specialized use cases.
OpenAI gpt-oss-20b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Run the model immediately on pre-configured GPUs with pay-per-token serverless pricing.
On-demand deployments give you dedicated GPUs for OpenAI gpt-oss-20b using Fireworks' reliable, high-performance system with no rate limits.
OpenAI gpt-oss-20b is an open-weight 21.5B-parameter model developed by OpenAI. It is part of the gpt-oss series, optimized for lower latency and local or specialized tasks. The model was trained using OpenAI's Harmony response format and supports configurable reasoning depth for agentic applications.
gpt-oss-20b is designed for:
Function calling with schemas
Web browsing and browser automation
Agentic tasks
Chain-of-thought reasoning
Local and low-latency deployments
It is particularly suited for scenarios where developers need customization and transparency in reasoning processes.
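The function-calling support above can be sketched with a minimal tool schema in the OpenAI-compatible format. The `get_weather` function, its fields, and the model id below are illustrative assumptions, not part of the official API surface:

```python
import json

# Hypothetical tool schema in the OpenAI function-calling format.
# The function name, parameters, and model id are illustrative only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body as it would be sent to an OpenAI-compatible chat endpoint.
request_body = {
    "model": "accounts/fireworks/models/gpt-oss-20b",  # model id is an assumption
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
}

print(json.dumps(request_body, indent=2))
```

The model can then respond with a tool call whose arguments conform to the declared JSON schema, which the caller executes before returning the result in a follow-up message.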
What is the maximum context length?
The maximum context length is 131,072 tokens on Fireworks AI.
Can gpt-oss-20b run within 16GB of memory?
Yes. gpt-oss-20b supports 8-bit precision and was post-trained using MXFP4 quantization of the MoE weights, making it compatible with 16GB memory deployments.
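As a rough sanity check on the 16GB figure: MXFP4 stores 4-bit values with a shared 8-bit scale per 32-element block, i.e. about 4.25 bits per weight. The sketch below optimistically assumes all 21.5B parameters are stored this way (in practice only the MoE weights are), so treat it as a lower bound:

```python
params = 21.5e9                 # total parameter count from the model card
bits_per_weight = 4 + 8 / 32    # MXFP4: 4-bit elements + one 8-bit scale per 32-block
weight_bytes = params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.1f} GB for weights alone")  # -> ~11.4 GB
```

That leaves headroom within 16GB for activations and the KV cache, consistent with the deployment claim above.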
Does the model support function calling and streaming?
Yes. The model natively supports function calling with defined schemas and is suitable for streaming scenarios, particularly when using OpenAI-compatible APIs such as vLLM.
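Streaming from an OpenAI-compatible endpoint arrives as server-sent events, one JSON chunk per `data:` line. The helper below (a hypothetical name, shown as a minimal sketch of the client side) extracts the incremental text from each chunk:

```python
import json

def parse_sse_line(line: str):
    """Extract the delta text from one streaming chat-completion SSE line.

    Returns None for non-data lines and for the terminal [DONE] marker.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    delta = json.loads(payload)["choices"][0]["delta"]
    return delta.get("content")

# Example chunk in the OpenAI streaming response format:
chunk = 'data: {"choices": [{"delta": {"content": "Hello"}}]}'
print(parse_sse_line(chunk))  # -> Hello
```

A client concatenates the non-None results in order to reconstruct the full reply as it streams in.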
How many parameters does the model have?
The model has 21.5 billion parameters, of which 3.6 billion are active during inference (MoE architecture).
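The MoE numbers imply that only a small fraction of the weights participate in each forward pass. Taking the stated counts at face value:

```python
total_params = 21.5e9   # total parameters (from the model card)
active_params = 3.6e9   # parameters active during inference
ratio = active_params / total_params
print(f"{ratio:.1%} of parameters active per token")  # -> 16.7%
```

This sparsity is what lets a 21.5B-parameter model run with the latency profile of a much smaller dense model.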
Can I fine-tune gpt-oss-20b on Fireworks AI?
Yes. Fine-tuning is supported on Fireworks AI using LoRA.
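LoRA keeps the base weights frozen and trains two small low-rank factors per adapted matrix, which is why fine-tuned variants are cheap to train and deploy. The sketch below uses illustrative dimensions, not gpt-oss-20b's actual layer sizes:

```python
# For a d x k weight matrix, LoRA trains B (d x r) and A (r x k) with r << min(d, k),
# so the effective update is W + B @ A while W itself stays frozen.
d, k, r = 4096, 4096, 16        # hypothetical layer size and LoRA rank
full_update = d * k             # parameters a full fine-tune would update
lora_update = d * r + r * k     # parameters LoRA actually trains
print(full_update, lora_update, f"{lora_update / full_update:.2%}")  # -> 16777216 131072 0.78%
```

At rank 16 the trainable footprint is under 1% of the full matrix, and the adapter can be merged into or swapped out of the base model at serving time.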
What license is gpt-oss-20b released under?
The model is released under the Apache 2.0 license, which permits free use, modification, and commercial deployment without patent restrictions.
Provider: OpenAI
Context length: 131,072 tokens
Serverless: Available
On-demand: Available
Pricing (input / output per 1M tokens): $0.07 / $0.30