Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b targets production, general-purpose, high-reasoning use cases and fits on a single H100 GPU.
Fine-tuning (Docs): OpenAI gpt-oss-120b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model.
Serverless (Docs): Immediately run the model on pre-configured GPUs and pay per token.
On-demand Deployment (Docs): On-demand deployments give you dedicated GPUs for OpenAI gpt-oss-120b using Fireworks' reliable, high-performance system with no rate limits.
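To make the serverless option concrete, here is a minimal sketch of a chat completion request. It assumes Fireworks' OpenAI-compatible REST endpoint and the usual `accounts/fireworks/models/...` model-id convention; the exact model identifier should be confirmed in the Fireworks docs.

```python
import json
import os
import urllib.request

# Assumptions: Fireworks' OpenAI-compatible chat endpoint and a
# hypothetical model id following the usual naming convention.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/gpt-oss-120b"

def build_chat_request(prompt: str, max_tokens: int = 100,
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion payload for gpt-oss-120b."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,    # default used in the code example
        "temperature": temperature,  # the model's default temperature
    }

payload = build_chat_request("Explain mixture-of-experts in one paragraph.")

# Only send the request when an API key is configured.
api_key = os.environ.get("FIREWORKS_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape works with the official OpenAI Python SDK by pointing its `base_url` at the Fireworks endpoint.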
gpt-oss-120b is an open-weight large language model developed by OpenAI and released on August 5, 2025. It is designed for high-performance reasoning, agentic tasks, and general-purpose applications.
What is gpt-oss-120b optimized for?
High-performance reasoning, agentic tasks, and general-purpose developer applications.

What is the context length of gpt-oss-120b?
128K tokens. The full 128K context is supported on Fireworks AI, though usable context depends on prompt length and model memory limits.
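Because the prompt and the requested completion share the context window, a quick budget check helps avoid truncation. The sketch below assumes 128K means 131,072 tokens (K = 1024):

```python
CONTEXT_WINDOW = 131_072  # 128K tokens, assuming K = 1024

def fits_in_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Prompt plus requested completion must fit in the context window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW

# A 120,000-token prompt leaves roughly 11K tokens for the completion.
print(fits_in_context(120_000, 10_000))  # → True
print(fits_in_context(120_000, 12_000))  # → False
```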
Does gpt-oss-120b support quantization?
Yes. The model supports quantized versions, including 8-bit and MXFP4 precision for the MoE layer.
What is the default temperature of gpt-oss-120b?
The default temperature of gpt-oss-120b is 0.7.
What is the default maximum output length?
100 tokens (the default in the code example), which can be adjusted via the max_tokens parameter.
Does gpt-oss-120b support agentic workflows?
Yes. gpt-oss-120b supports agentic workflows including function calling (via schemas), tool use, and Harmony-format structured outputs.
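A function-calling request declares each tool as a JSON schema alongside the messages. The sketch below uses a hypothetical `get_weather` tool in the OpenAI-style tools format:

```python
import json

# Hypothetical weather tool, declared in the OpenAI-style
# function-calling schema used for agentic workflows.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string",
                         "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# The tools list travels alongside messages in the chat request body.
request_fragment = {"tools": [get_weather_tool], "tool_choice": "auto"}
print(json.dumps(request_fragment, indent=2))
```

The model responds with a tool call naming the function and its arguments; your code executes the tool and returns the result in a follow-up message.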
How many parameters does gpt-oss-120b have?
117 billion total parameters, with 5.1 billion active parameters per forward pass (Mixture-of-Experts).
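A back-of-envelope calculation shows why the Mixture-of-Experts design lets a 117B-parameter model fit on a single H100. Treating all weights at roughly 4 bits per parameter is a simplification (MXFP4 applies to the MoE layer; other weights are stored at higher precision):

```python
total_params = 117e9   # total parameters
active_params = 5.1e9  # active parameters per forward pass

# Fraction of the network used per token under MoE routing.
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per token")  # ~4.4%

# Rough weight-memory estimate at ~4 bits per parameter
# (a simplification; only the MoE layer is MXFP4 in practice).
weight_gb = total_params * 4 / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # well under an 80 GB H100
```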
Can gpt-oss-120b be fine-tuned?
Yes. Fine-tuning is supported and available for gpt-oss-120b on Fireworks AI.
How much does gpt-oss-120b cost on Fireworks AI?
Fireworks AI charges per 1M tokens: $0.15 for input and $0.60 for output.
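At those rates, the serverless cost of a request is a simple linear function of token counts:

```python
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Serverless cost at $0.15 / 1M input and $0.60 / 1M output tokens."""
    return input_tokens / 1e6 * 0.15 + output_tokens / 1e6 * 0.60

# One million tokens each of input and output costs $0.75 in total.
print(f"${request_cost_usd(1_000_000, 1_000_000):.2f}")
```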
What license does gpt-oss-120b use?
Apache 2.0, a permissive license that allows commercial use.