DeepSeek V4 Pro is Live → Try it now.

Model Library
/OpenAI/OpenAI gpt-oss-120b
accounts/fireworks/models/gpt-oss-120b

    Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b is used for production, general purpose, high reasoning use-cases that fits into a single H100 GPU.

    OpenAI gpt-oss-120b API Features

    Fine-tuning

    Docs

    OpenAI gpt-oss-120b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

    Serverless

    Docs

    OpenAI gpt-oss-120b is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.

    On-demand Deployment

    Docs

    On-demand deployments allow you to use OpenAI gpt-oss-120b on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

    Available Serverless

    Run queries immediately, pay only for usage

    $0.15 / $0.01 / $0.60
    Per 1M Tokens (input/cached input/output)
    What is gpt-oss-120b and who developed it?

    gpt-oss-120b is an open-weight large language model developed by OpenAI and released on August 5, 2025. It is designed for high-performance reasoning, agentic tasks, and general-purpose applications.

    What applications and use cases does gpt-oss-120b excel at?

    gpt-oss-120b is optimized for:

    • Complex reasoning and structured problem-solving (especially with chain-of-thought)
    • Agentic workflows (tool use, web browsing, function calling)
    • Production-grade general-purpose tasks (e.g., coding, math, science)
    • Use cases that benefit from adjustable reasoning levels
    What is the maximum context length for gpt-oss-120b?

    128K tokens.

    What is the usable context window for gpt-oss-120b?

    The full 128K context is supported on Fireworks AI, though usable context depends on prompt length and model memory limits.

    Does gpt-oss-120b support quantized formats?

    Yes. The model supports quantized versions, including 8-bit and MXFP4 precision for the MoE layer.

    What is the default temperature of gpt-oss-120b on Fireworks AI?

    The default temperature of gpt-oss-120b is 0.7.

    What is the maximum output length Fireworks allows for gpt-oss-120b?

    100 tokens (default in code example), but can be adjusted via the max_tokens parameter.

    Does gpt-oss-120b support streaming responses and function-calling schemas?

    Yes. gpt-oss-120b supports agentic workflows including function calling (via schemas), tool use, and Harmony-format structured outputs.

    How many parameters does gpt-oss-120b have?

    117 billion total parameters, with 5.1 billion active parameters per forward pass (Mixture-of-Experts).

    Is fine-tuning supported for gpt-oss-120b?

    Yes. Fine-tuning is supported and available for gpt-oss-120b on Fireworks AI.

    How are tokens counted (prompt vs completion)?

    Fireworks AI charges per 1M tokens: $0.15 for input and $0.60 for output.

    What license governs commercial use of gpt-oss-120b?

    Apache 2.0 license — permissive for commercial use without restriction.

    Metadata

    State
    Ready
    Created on
    8/4/2025
    Kind
    Base model
    Provider
    OpenAI

    Specification

    Calibrated
    No
    Mixture-of-Experts
    Yes
    Parameters
    116B

    Supported Functionality

    Fine-tuning
    Supported
    Serverless
    Supported
    Context Length
    131k tokens
    Function Calling
    Supported
    Embeddings
    Not supported
    Rerankers
    Not supported
    Support image input
    Not supported