Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b is used for production, general purpose, high reasoning use-cases that fits into a single H100 GPU.
Fine-tuningDocs | OpenAI gpt-oss-120b can be customized with your data to improve responses. Fireworks uses LoRA to efficiently train and deploy your personalized model |
ServerlessDocs | OpenAI gpt-oss-120b is available via Fireworks' serverless API, where you pay per token. There are several ways to call the Fireworks API, including Fireworks' Python client, the REST API, or OpenAI's Python client. |
On-demand DeploymentDocs | On-demand deployments allow you to use OpenAI gpt-oss-120b on dedicated GPUs with Fireworks' high-performance serving stack with high reliability and no rate limits. |
Run queries immediately, pay only for usage
gpt-oss-120b is an open-weight large language model developed by OpenAI and released on August 5, 2025. It is designed for high-performance reasoning, agentic tasks, and general-purpose applications.
gpt-oss-120b is optimized for:
128K tokens.
The full 128K context is supported on Fireworks AI, though usable context depends on prompt length and model memory limits.
Yes. The model supports quantized versions, including 8-bit and MXFP4 precision for the MoE layer.
The default temperature of gpt-oss-120b is 0.7.
100 tokens (default in code example), but can be adjusted via the max_tokens parameter.
Yes. gpt-oss-120b supports agentic workflows including function calling (via schemas), tool use, and Harmony-format structured outputs.
117 billion total parameters, with 5.1 billion active parameters per forward pass (Mixture-of-Experts).
Yes. Fine-tuning is supported and available for gpt-oss-120b on Fireworks AI.
Fireworks AI charges per 1M tokens: $0.15 for input and $0.60 for output.
Apache 2.0 license — permissive for commercial use without restriction.