Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b targets production-grade, general-purpose, high-reasoning use cases and fits on a single H100 GPU.
OpenAI gpt-oss-120b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Immediately run the model on pre-configured GPUs and pay per token.
On-demand deployments give you dedicated GPUs for OpenAI gpt-oss-120b using Fireworks' reliable, high-performance system with no rate limits.
gpt-oss-120b is an open-weight large language model developed by OpenAI and released on August 5, 2025. It is designed for high-performance reasoning, agentic tasks, and general-purpose applications.
gpt-oss-120b is optimized for:
Complex reasoning and structured problem-solving (especially with chain-of-thought)
Agentic workflows (tool use, web browsing, function calling)
Production-grade general-purpose tasks (e.g., coding, math, science)
Use cases that benefit from adjustable reasoning levels
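The adjustable reasoning levels above can be selected per request. A minimal sketch, assuming an OpenAI-compatible chat-completions payload where the level is passed as a reasoning_effort field (the field name and the model id are assumptions; check the provider's API reference):

```python
# Sketch: building a chat request with an adjustable reasoning level.
# ASSUMPTIONS: the "reasoning_effort" field name and the model id follow
# common OpenAI-compatible conventions and may differ in practice.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Return a chat-completion payload with the chosen reasoning level."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": "accounts/fireworks/models/gpt-oss-120b",  # assumed id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

payload = build_request("Plan a 3-step proof outline.", effort="high")
print(payload["reasoning_effort"])  # high
```

Higher effort trades latency and output tokens for deeper chain-of-thought; "medium" is a reasonable starting point.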
128K tokens (131,072). Fireworks states the full 128K context is supported, though usable context depends on prompt length and memory limits.
Yes. Quantized versions are supported, including 8-bit and MXFP4 precision for the MoE layers.
The default temperature of gpt-oss-120b is 0.7.
100 tokens (the default in the code example), but this can be adjusted via the max_tokens parameter.
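For example, a sketch of overriding the 100-token default through max_tokens in an OpenAI-style payload (the model id is an assumed placeholder):

```python
# Sketch: raising the output cap from the 100-token default used in the
# code example. ASSUMPTION: the model id is a placeholder.

DEFAULT_MAX_TOKENS = 100  # default from the code example

def with_output_cap(prompt: str, max_tokens: int = DEFAULT_MAX_TOKENS) -> dict:
    """Return a chat-completion payload with an explicit output-token cap."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "model": "accounts/fireworks/models/gpt-oss-120b",  # assumed id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

print(with_output_cap("Summarize MoE routing.", max_tokens=512)["max_tokens"])  # 512
```

Note that max_tokens caps only the generated output; it does not extend the 128K context window.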
Yes. gpt-oss-120b supports agentic workflows including function calling (via schemas), tool use, and Harmony-format structured outputs.
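As an illustration, a function-calling tool can be declared with a JSON schema in the common OpenAI-style format; the get_weather function below is purely hypothetical, not a built-in model tool:

```python
# Illustrative function-calling tool definition. ASSUMPTION: get_weather
# and its parameters are made up for this example; only the schema shape
# follows the common OpenAI-style "tools" format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

request = {
    "model": "accounts/fireworks/models/gpt-oss-120b",  # assumed id
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
}
print(weather_tool["function"]["name"])  # get_weather
```

When the model decides to call the tool, it returns the function name and JSON arguments for your code to execute.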
117 billion total parameters, with 5.1 billion active parameters per forward pass (Mixture-of-Experts).
Yes. Fine-tuning is supported and available for gpt-oss-120b on Fireworks AI.
Fireworks AI charges per 1M tokens: $0.15 for input and $0.60 for output.
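As a worked example of that pricing, a small cost calculator using the rates above:

```python
# Worked example of the per-1M-token pricing: $0.15 input, $0.60 output.

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at the listed rates."""
    return input_tokens / 1_000_000 * 0.15 + output_tokens / 1_000_000 * 0.60

# 200K input tokens and 50K output tokens:
print(round(cost_usd(200_000, 50_000), 4))  # 0.06
```

Output tokens cost 4x input tokens, so long generations dominate the bill for most workloads.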
Apache 2.0 license: permissive for commercial use without restriction.
Provider: OpenAI
Context length: 131,072 tokens
Fine-tuning: Available
Serverless: Available
Pricing (per 1M tokens): $0.15 input / $0.60 output