GLM 5.2 is live! Opus-level intelligence at open-source rates. Pay per token on serverless. Try it today.

Model Library
/OpenAI/OpenAI gpt-oss-120b
model path:accounts/fireworks/models/gpt-oss-120b

Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b is used for production, general purpose, high reasoning use-cases that fits into a single H100 GPU.

OpenAI gpt-oss-120b API Features

Fine-tuning

Docs

OpenAI gpt-oss-120b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model

Serverless

Docs

OpenAI gpt-oss-120b is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client.

On-demand Deployment

Docs

On-demand deployments allow you to use OpenAI gpt-oss-120b on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits.

Available Serverless

Run queries immediately, pay only for usage

$0.15 / $0.01 / $0.60
Per 1M Tokens (input/cached input/output)
What is gpt-oss-120b and who developed it?

gpt-oss-120b is an open-weight large language model developed by OpenAI and released on August 5, 2025. It is designed for high-performance reasoning, agentic tasks, and general-purpose applications.

What applications and use cases does gpt-oss-120b excel at?

gpt-oss-120b is optimized for:

  • Complex reasoning and structured problem-solving (especially with chain-of-thought)
  • Agentic workflows (tool use, web browsing, function calling)
  • Production-grade general-purpose tasks (e.g., coding, math, science)
  • Use cases that benefit from adjustable reasoning levels
What is the maximum context length for gpt-oss-120b?

128K tokens.

What is the usable context window for gpt-oss-120b?

The full 128K context is supported on Fireworks AI, though usable context depends on prompt length and model memory limits.

Does gpt-oss-120b support quantized formats?

Yes. The model supports quantized versions, including 8-bit and MXFP4 precision for the MoE layer.

What is the default temperature of gpt-oss-120b on Fireworks AI?

The default temperature of gpt-oss-120b is 0.7.

What is the maximum output length Fireworks allows for gpt-oss-120b?

100 tokens (default in code example), but can be adjusted via the max_tokens parameter.

Does gpt-oss-120b support streaming responses and function-calling schemas?

Yes. gpt-oss-120b supports agentic workflows including function calling (via schemas), tool use, and Harmony-format structured outputs.

How many parameters does gpt-oss-120b have?

117 billion total parameters, with 5.1 billion active parameters per forward pass (Mixture-of-Experts).

Is fine-tuning supported for gpt-oss-120b?

Yes. Fine-tuning is supported and available for gpt-oss-120b on Fireworks AI.

How are tokens counted (prompt vs completion)?

Fireworks AI charges per 1M tokens: $0.15 for input and $0.60 for output.

What license governs commercial use of gpt-oss-120b?

Apache 2.0 license — permissive for commercial use without restriction.

Metadata

State
Ready
Created on
8/4/2025
Kind
Base model
Provider
OpenAI

Specification

Calibrated
No
Mixture-of-Experts
Yes
Parameters
116B

Supported Functionality

Fine-tuning
Supported
Serverless
Supported
Context Length
131k tokens
Function Calling
Supported
Embeddings
Not supported
Rerankers
Not supported
Support image input
Not supported