
Model Library
fireworks/gpt-oss-20b

    Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b is optimized for lower latency and for local or specialized use cases.

    OpenAI gpt-oss-20b API Features

    Fine-tuning

    Docs

    OpenAI gpt-oss-20b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token.

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for OpenAI gpt-oss-20b using Fireworks' reliable, high-performance system with no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.07 / $0.30
    Per 1M Tokens (input/output)
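
    At these rates, the cost of a request is straightforward to estimate. A minimal sketch (the token counts in the example are hypothetical):

    ```python
    # Serverless pricing for gpt-oss-20b (USD per 1M tokens, from the table above)
    INPUT_PRICE_PER_M = 0.07
    OUTPUT_PRICE_PER_M = 0.30

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Estimated cost in USD for a single serverless request."""
        return (input_tokens * INPUT_PRICE_PER_M
                + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

    # Example: a 2,000-token prompt with a 500-token completion
    print(f"${request_cost(2_000, 500):.6f}")  # $0.000290
    ```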

    gpt-oss-20b FAQs

    What is gpt-oss-20b and who developed it?

    gpt-oss-20b is an open-weight ~21B parameter model developed by OpenAI. It is part of the "gpt-oss" series, optimized for lower latency and local or specialized tasks. The model was trained using OpenAI's Harmony response format and supports configurable reasoning depth for agentic applications.

    What applications and use cases does gpt-oss-20b excel at?

    gpt-oss-20b is designed for:

    • Function calling with schemas
    • Web browsing and browser automation
    • Agentic tasks
    • Chain-of-thought reasoning
    • Local and low-latency deployments

    It is particularly suited for scenarios where developers need customization and transparency in reasoning processes.
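
    Function calling follows the familiar OpenAI tool schema: the client declares tools as JSON Schema, and the model emits a tool call with JSON-encoded arguments for the client to dispatch locally. A minimal offline sketch (the get_weather tool and its arguments are hypothetical):

    ```python
    import json

    # Hypothetical tool declared in the OpenAI function-calling schema
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    def dispatch(tool_call: dict) -> str:
        """Route a model-emitted tool call to a local implementation."""
        args = json.loads(tool_call["arguments"])
        if tool_call["name"] == "get_weather":
            return f"Sunny in {args['city']}"  # stubbed result
        raise ValueError(f"unknown tool: {tool_call['name']}")

    # Simulated tool call as it would appear in a model response
    print(dispatch({"name": "get_weather", "arguments": '{"city": "Paris"}'}))
    ```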

    What is the maximum context length for gpt-oss-20b?

    The maximum context length is 131,072 tokens on Fireworks AI.
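
    When assembling long prompts, it is worth checking against this limit before sending a request. A trivial guard, assuming you already have a token count from your tokenizer of choice:

    ```python
    MAX_CONTEXT = 131_072  # gpt-oss-20b context window on Fireworks, in tokens

    def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
        """True if the prompt plus the requested completion fits the window."""
        return prompt_tokens + max_new_tokens <= MAX_CONTEXT

    print(fits_context(120_000, 8_000))  # True  (128,000 <= 131,072)
    print(fits_context(130_000, 4_096))  # False (134,096 > 131,072)
    ```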

    Does gpt-oss-20b support quantized formats (4-bit/8-bit)?

    Yes. The MoE weights were post-trained with MXFP4 (4-bit microscaling) quantization, which lets the model run within 16GB of memory; higher-precision formats such as 8-bit are also supported.

    Does gpt-oss-20b support streaming responses and function-calling schemas?

    Yes. The model natively supports function calling with defined schemas and streaming responses, particularly through OpenAI-compatible endpoints such as Fireworks' serverless API or self-hosted servers like vLLM.
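
    In the OpenAI-compatible streaming format, each server-sent chunk carries a small content delta that the client concatenates as it arrives. An offline sketch with mocked chunks (real chunks are SDK objects rather than plain dicts, but the shape is the same):

    ```python
    # Mocked streaming chunks mirroring the OpenAI-compatible delta format
    mock_chunks = [
        {"choices": [{"delta": {"content": "Hello"}}]},
        {"choices": [{"delta": {"content": ", "}}]},
        {"choices": [{"delta": {"content": "world"}}]},
        {"choices": [{"delta": {}}]},  # final chunk may carry no content
    ]

    def consume(stream) -> str:
        """Concatenate content deltas from a streamed chat completion."""
        parts = []
        for chunk in stream:
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                parts.append(delta)
        return "".join(parts)

    print(consume(mock_chunks))  # Hello, world
    ```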

    How many parameters does gpt-oss-20b have?

    The model has approximately 21 billion total parameters, of which 3.6 billion are active per token thanks to its Mixture-of-Experts (MoE) architecture.

    Is fine-tuning supported for gpt-oss-20b?

    Yes. Fine-tuning for gpt-oss-20b is supported on Fireworks AI using LoRA.
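
    Fine-tuning jobs typically consume chat-formatted JSONL training data, one example per line. A minimal sketch of preparing such a file (the example content is hypothetical, and the exact schema may differ, so check the Fireworks fine-tuning docs):

    ```python
    import json

    # Hypothetical chat-format training examples: each JSONL line holds a
    # "messages" list of role/content turns.
    examples = [
        {"messages": [
            {"role": "user", "content": "What is LoRA?"},
            {"role": "assistant",
             "content": "Low-Rank Adaptation: small trainable matrices "
                        "added alongside frozen base weights."},
        ]},
    ]

    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
    ```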

    What license governs commercial use of gpt-oss-20b?

    The model is released under the Apache 2.0 license, which permits free use, modification, and commercial deployment, and includes an express patent grant.

    Metadata

    State
    Ready
    Created on
    8/4/2025
    Kind
    Base model
    Provider
    OpenAI
    Hugging Face
    gpt-oss

    Specification

    Calibrated
    No
    Mixture-of-Experts
    Yes
    Parameters
    20.9B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Serverless LoRA
    Not supported
    Context Length
    131.1k tokens
    Function Calling
    Not supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported