
fireworks/gpt-oss-120b

    Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b is built for production, general-purpose, high-reasoning use cases and fits on a single H100 GPU.

    OpenAI gpt-oss-120b API Features

    Fine-tuning

    Docs

    OpenAI gpt-oss-120b can be customized with your data to improve responses. Fireworks uses LoRA to train and deploy your personalized model efficiently.
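    Fine-tuning jobs generally consume chat-formatted JSONL training data. A minimal sketch of preparing such a file, assuming a "messages" record schema (check the Fireworks fine-tuning docs for the exact format; the example content is made up):

    ```python
    import json

    # Hypothetical chat-format training examples. Fine-tuning datasets are
    # commonly JSONL files of "messages" records (an assumption here --
    # verify the exact schema against the Fireworks fine-tuning docs).
    examples = [
        {
            "messages": [
                {"role": "user", "content": "What is the capital of France?"},
                {"role": "assistant", "content": "Paris."},
            ]
        },
    ]

    # JSONL: one serialized record per line.
    jsonl_lines = [json.dumps(record) for record in examples]
    training_file_text = "\n".join(jsonl_lines)
    print(training_file_text)
    ```

    The resulting file would then be uploaded as the training dataset for a LoRA fine-tuning job.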

    Serverless

    Docs

    Immediately run the model on pre-configured GPUs and pay per token.
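    Serverless access is OpenAI-compatible. A minimal sketch that builds a chat-completion request payload without making a network call; the endpoint URL and `accounts/fireworks/models/gpt-oss-120b` model id follow Fireworks' usual conventions but should be checked against the docs:

    ```python
    import json

    # Assumed Fireworks serverless endpoint and model id (verify in the docs):
    API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
    MODEL_ID = "accounts/fireworks/models/gpt-oss-120b"

    def build_request(prompt: str, max_tokens: int = 100,
                      temperature: float = 0.7) -> dict:
        """Build an OpenAI-compatible chat-completion payload (no network call)."""
        return {
            "model": MODEL_ID,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "temperature": temperature,
        }

    payload = build_request("Explain mixture-of-experts in one sentence.")
    print(json.dumps(payload, indent=2))

    # Actually sending it would require an API key, e.g.:
    # headers = {"Authorization": f"Bearer {FIREWORKS_API_KEY}"}
    ```

    The same payload shape works for dedicated on-demand deployments; only the target deployment differs.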

    On-demand Deployment

    Docs

    On-demand deployments give you dedicated GPUs for OpenAI gpt-oss-120b using Fireworks' reliable, high-performance system with no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.15 / $0.60
    Per 1M Tokens (input/output)
    What is gpt-oss-120b and who developed it?

    gpt-oss-120b is an open-weight large language model developed by OpenAI and released on August 5, 2025. It is designed for high-performance reasoning, agentic tasks, and general-purpose applications.

    What applications and use cases does gpt-oss-120b excel at?

    gpt-oss-120b is optimized for:

    • Complex reasoning and structured problem-solving (especially with chain-of-thought)
    • Agentic workflows (tool use, web browsing, function calling)
    • Production-grade general-purpose tasks (e.g., coding, math, science)
    • Use cases that benefit from adjustable reasoning levels
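    The adjustable reasoning levels above are typically selected through the system prompt; gpt-oss documents low/medium/high effort via a "Reasoning: ..." instruction. A minimal sketch (the helper function is ours, and the exact instruction wording should be confirmed against the gpt-oss model card):

    ```python
    def reasoning_messages(level: str, user_prompt: str) -> list[dict]:
        """Build a message list that selects a gpt-oss reasoning level.

        gpt-oss models accept low/medium/high reasoning effort via the
        system prompt; treat the instruction format here as a sketch.
        """
        assert level in {"low", "medium", "high"}
        return [
            {"role": "system", "content": f"Reasoning: {level}"},
            {"role": "user", "content": user_prompt},
        ]

    msgs = reasoning_messages("high", "Prove that the square root of 2 is irrational.")
    print(msgs[0]["content"])  # → Reasoning: high
    ```

    Higher effort trades latency and output tokens for deeper chain-of-thought; "low" suits simple lookups, "high" suits multi-step math or code.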
    What is the maximum context length for gpt-oss-120b?

    128K tokens.

    What is the usable context window for gpt-oss-120b?

    The full 128K context is supported on Fireworks AI, though usable context depends on prompt length and model memory limits.

    Does gpt-oss-120b support quantized formats?

    Yes. The model supports quantized versions, including 8-bit and MXFP4 precision for the MoE layer.

    What is the default temperature of gpt-oss-120b on Fireworks AI?

    The default temperature of gpt-oss-120b is 0.7.

    What is the maximum output length Fireworks allows for gpt-oss-120b?

    The code example defaults to 100 tokens, but the output length can be raised via the max_tokens parameter.

    Does gpt-oss-120b support streaming responses and function-calling schemas?

    Yes. gpt-oss-120b supports agentic workflows including function calling (via schemas), tool use, and Harmony-format structured outputs.
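    Function calling uses OpenAI-style tool schemas passed in the request's "tools" field. A minimal sketch, assuming the model id above; the weather function itself is made up for illustration:

    ```python
    import json

    # An OpenAI-style function-calling schema of the kind gpt-oss-120b
    # accepts via the "tools" field (get_weather is a hypothetical tool).
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            },
        }
    ]

    # The schema rides along with a normal chat request:
    request = {
        "model": "accounts/fireworks/models/gpt-oss-120b",  # assumed model id
        "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
        "tools": tools,
    }
    print(json.dumps(request, indent=2))
    ```

    When the model decides a tool is needed, the response contains a tool call with JSON arguments matching the declared parameters, which your code executes and feeds back as a tool message.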

    How many parameters does gpt-oss-120b have?

    117 billion total parameters, with 5.1 billion active parameters per forward pass (Mixture-of-Experts).

    Is fine-tuning supported for gpt-oss-120b?

    Yes. Fine-tuning is supported and available for gpt-oss-120b on Fireworks AI.

    How are tokens counted (prompt vs completion)?

    Fireworks AI charges per 1M tokens: $0.15 for input and $0.60 for output.
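    At those rates, the cost of a request is a simple weighted sum of prompt and completion token counts:

    ```python
    # Token-based cost at the listed serverless rates ($ per 1M tokens).
    INPUT_RATE = 0.15   # $ per 1M prompt (input) tokens
    OUTPUT_RATE = 0.60  # $ per 1M completion (output) tokens

    def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
        """Prompt and completion tokens are billed at separate rates."""
        return (prompt_tokens * INPUT_RATE
                + completion_tokens * OUTPUT_RATE) / 1_000_000

    # Example: a 2,000-token prompt with a 500-token answer.
    print(round(cost_usd(2_000, 500), 6))  # → 0.0006
    ```

    Output tokens cost 4x input tokens here, so long generations (e.g., high-effort reasoning traces) dominate the bill.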

    What license governs commercial use of gpt-oss-120b?

    The Apache 2.0 license, which is permissive and allows commercial use.

    Metadata

    State
    Ready
    Created on
    8/4/2025
    Kind
    Base model
    Provider
    OpenAI
    Hugging Face
    gpt-oss

    Specification

    Calibrated
    No
    Mixture-of-Experts
    Yes
    Parameters
    116.8B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Serverless LoRA
    Not supported
    Context Length
    131.1k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported