

OpenAI gpt-oss-120b

Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b targets production-grade, general-purpose, high-reasoning use cases and fits on a single H100 GPU.


Fireworks Features

Fine-tuning

OpenAI gpt-oss-120b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.


Serverless

Run the model immediately on pre-configured GPUs and pay per token.

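As a minimal sketch of what a serverless call looks like: Fireworks exposes an OpenAI-compatible chat completions endpoint, and model ids follow the `accounts/fireworks/models/<name>` convention. The exact endpoint path and model id below are assumptions; check the Fireworks docs for your account.

```python
import json

# Hedged sketch of a serverless request body for Fireworks' OpenAI-compatible
# chat completions API. The model id follows Fireworks' usual
# "accounts/fireworks/models/<name>" convention (assumption).
payload = {
    "model": "accounts/fireworks/models/gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}
    ],
    "max_tokens": 256,
}

# POST this JSON to https://api.fireworks.ai/inference/v1/chat/completions
# with an "Authorization: Bearer <FIREWORKS_API_KEY>" header.
print(json.dumps(payload, indent=2))
```

Because the API is OpenAI-compatible, the official OpenAI Python client also works if you point its `base_url` at Fireworks and pass your Fireworks API key.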

On-demand Deployment

On-demand deployments give you dedicated GPUs for OpenAI gpt-oss-120b using Fireworks' reliable, high-performance system with no rate limits.


FAQs

What is gpt-oss-120b and who developed it?

gpt-oss-120b is an open-weight large language model developed by OpenAI and released on August 5, 2025. It is designed for high-performance reasoning, agentic tasks, and general-purpose applications.

What applications and use cases does gpt-oss-120b excel at?

gpt-oss-120b is optimized for:

Complex reasoning and structured problem-solving (especially with chain-of-thought)

Agentic workflows (tool use, web browsing, function calling)

Production-grade general-purpose tasks (e.g., coding, math, science)

Use cases that benefit from adjustable reasoning levels
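The adjustable reasoning levels above are selected through the prompt: the gpt-oss models read an effort hint (low, medium, or high) from the system message. The `Reasoning: <level>` phrasing below follows OpenAI's published convention for these models, but verify it against the current model card before relying on it.

```python
# Sketch of selecting a reasoning level for gpt-oss-120b via the system
# prompt. The "Reasoning: <level>" wording is OpenAI's documented
# convention for gpt-oss models (assumption; confirm with the model card).
def build_messages(question: str, reasoning: str = "high") -> list[dict]:
    assert reasoning in {"low", "medium", "high"}
    return [
        {"role": "system", "content": f"Reasoning: {reasoning}"},
        {"role": "user", "content": question},
    ]

msgs = build_messages("Prove that the sum of two even integers is even.")
print(msgs[0]["content"])
```

Lower effort trades some accuracy on hard problems for faster, cheaper completions; higher effort spends more chain-of-thought tokens.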

What is the maximum context length for gpt-oss-120b?

128K tokens (131,072).

What is the usable context window for gpt-oss-120b?

Fireworks states the full 128K context is supported, though usable context depends on prompt length and model memory limits.

Does gpt-oss-120b support quantized formats?

Yes. The model supports quantized versions, including 8-bit and MXFP4 precision for the MoE layer.

What is the default temperature of gpt-oss-120b on Fireworks AI?

The default temperature of gpt-oss-120b is 0.7.

What is the maximum output length Fireworks allows for gpt-oss-120b?

The default in the code example is 100 tokens, but the limit can be raised via the max_tokens parameter.
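Both knobs mentioned above are standard OpenAI-compatible sampling parameters, so overriding them is just a matter of setting fields in the request body. A hedged sketch (the model id is an assumption, as before):

```python
# Overriding max_tokens and temperature in an OpenAI-compatible request body.
request = {
    "model": "accounts/fireworks/models/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "temperature": 0.7,   # Fireworks' stated default for this model
    "max_tokens": 1024,   # raised from the example default to avoid truncation
}
```

If completions come back cut off mid-sentence, raising max_tokens is usually the first thing to check.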

Does gpt-oss-120b support streaming responses and function-calling schemas?

Yes. gpt-oss-120b supports agentic workflows including function calling (via schemas), tool use, and Harmony-format structured outputs.
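A function-calling schema is passed in the OpenAI-compatible `tools` format: each tool is described with a name, a description, and a JSON Schema for its arguments. The `get_weather` tool below is purely illustrative, not a real API.

```python
# Hypothetical function-calling setup in the OpenAI-compatible "tools"
# format. The tool name and fields are illustrative assumptions.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]
```

This list goes in the request body alongside `messages`; when the model decides to call a tool, the response contains the function name and JSON arguments for your code to execute.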

How many parameters does gpt-oss-120b have?

117 billion total parameters, with 5.1 billion active parameters per forward pass (Mixture-of-Experts).

Is fine-tuning supported for gpt-oss-120b?

Yes. Fine-tuning of gpt-oss-120b is supported and available on Fireworks AI.

How are tokens counted (prompt vs completion)?

Prompt (input) and completion (output) tokens are counted and billed separately. Fireworks AI charges $0.15 per 1M input tokens and $0.60 per 1M output tokens.
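At those rates, estimating the cost of a request is simple arithmetic:

```python
# Cost estimate at the listed serverless rates: $0.15 per 1M input tokens
# and $0.60 per 1M output tokens, billed separately.
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1_000_000 * 0.15 + output_tokens / 1_000_000 * 0.60

# e.g. a 10K-token prompt with a 2K-token completion:
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0027
```

Note that with high reasoning effort, chain-of-thought tokens count toward output, so reasoning-heavy requests cost more than their visible answers suggest.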

What license governs commercial use of gpt-oss-120b?

The Apache 2.0 license, which is permissive and allows unrestricted commercial use.

Info

Provider

OpenAI

Model Type

LLM

Context Length

131072

Serverless

Available

Fine-Tuning

Available

Pricing Per 1M Tokens Input/Output

$0.15 / $0.60