Serverless 2.0 is live: control reliability & speed without reserved capacity. Get Started.

Model Library
/OpenAI/OpenAI gpt-oss-20b
model path:accounts/fireworks/models/gpt-oss-20b

Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b is used for lower latency, and local or specialized use-cases.

OpenAI gpt-oss-20b API Features

Fine-tuning

Docs

OpenAI gpt-oss-20b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

Serverless

Docs

OpenAI gpt-oss-20b is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.

On-demand Deployment

Docs

On-demand deployments allow you to use OpenAI gpt-oss-20b on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

$0.07 / $0.04 / $0.30
Per 1M Tokens (input/cached input/output)

gpt-oss-20b FAQs

What is gpt-oss-20b and who developed it?

gpt-oss-20b is an open-weight 21.5B parameter model developed by OpenAI. It is part of the "gpt-oss" series, optimized for lower latency and local or specialized tasks. The model was trained using OpenAI's Harmony response format and supports configurable reasoning depth for agentic applications.

What applications and use cases does gpt-oss-20b excel at?

gpt-oss-20b is designed for:

  • Function calling with schemas
  • Web browsing and browser automation
  • Agentic tasks
  • Chain-of-thought reasoning
  • Local and low-latency deployments

It is particularly suited for scenarios where developers need customization and transparency in reasoning processes.

What is the maximum context length for gpt-oss-20b?

The maximum context length is 131,072 tokens on Fireworks AI.

Does gpt-oss-20b support quantized formats (4-bit/8-bit)?

Yes. gpt-oss-20b supports 8-bit precision and was post-trained using MXFP4 quantization of the MoE weights, making it compatible with 16GB memory deployments.

Does gpt-oss-20b support streaming responses and function-calling schemas?

Yes. The model natively supports function calling with defined schemas and is suitable for streaming scenarios, particularly when using OpenAI-compatible APIs such as vLLM.

How many parameters does gpt-oss-20b have?

The model has 21.5 billion parameters, of which 3.6 billion are active during inference (MoE architecture).

Is fine-tuning supported for gpt-oss-20b?

Yes. Fine-tuning for gpt-oss-20b is supported on Fireworks AI using LoRA.

What license governs commercial use of gpt-oss-20b?

The model is released under the Apache 2.0 license, which permits free use, modification, and commercial deployment without patent restrictions.

Metadata

State
Ready
Created on
8/4/2025
Kind
Base model
Provider
OpenAI
Hugging Face
openai/gpt-oss-20b

Specification

Calibrated
No
Mixture-of-Experts
Yes
Parameters
20.9B

Supported Functionality

Fine-tuning
Supported
Serverless
Supported
Context Length
131k tokens
Function Calling
Not supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported