This is a deep dive analysis of gpt-oss (20b & 120b), released by OpenAI on 5th Aug 2025. This blog explores its capabilities, technical architecture, benchmarks, and practical applications for developers.
OpenAI is finally back to living up to its name by building open models. These are the first open-weight LLMs released by OpenAI since GPT-2.
OpenAI's new open-source models, gpt-oss-20b and gpt-oss-120b, are very strong reasoning models that excel at problem solving and tool calling. Both models support long context windows and adjustable reasoning levels. That makes them a great choice for agentic use cases.
Try out the new OpenAI gpt-oss-120b & gpt-oss-20b on Fireworks AI!
The following table is an evaluation across multiple benchmarks and reasoning levels for both gpt-oss-20b and gpt-oss-120b.
It showcases the main capability evaluations, comparing the gpt-oss models at high reasoning level against OpenAI's closed models o3, o3-mini, and o4-mini on canonical benchmarks.
The gpt-oss-120b model surpasses OpenAI o3-mini and approaches OpenAI o4-mini accuracy. The smaller gpt-oss-20b model is also surprisingly competitive, despite being 6 times smaller than gpt-oss-120b.
The next table offers a fair comparison of gpt-oss against other leading models, including Kimi, GLM, Qwen, and DeepSeek, highlighting areas where it excels and areas for improvement.
After pre-training on massive text data, the gpt-oss models go through a dedicated post-training phase to refine their reasoning abilities and tool usage, drawing on Chain-of-Thought (CoT) reinforcement learning techniques similar to those used for OpenAI's o3 models.
This phase trains the models on complex, multi-step tasks across coding, math, and science, helping them develop structured problem-solving capabilities and a personality similar to ChatGPT.
OpenAI introduced a new format called the Harmony Chat Format: a flexible, role-aware, message-based structure for interactive conversations.
💡 If you're deploying gpt-oss models, using Harmony Format correctly is essential for unlocking their full capabilities, especially in multi-turn chats.
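To make the structure concrete, here is a minimal sketch of Harmony-style role-based messages. The role names follow OpenAI's published description of the format; the special tokens that Harmony uses under the hood are applied by the model's chat template, so an OpenAI-compatible API typically only needs plain message dicts like these.

```python
# Illustrative sketch of the role-based message structure behind the
# Harmony Chat Format. The special tokens are added by the chat
# template at serving time; callers supply role/content pairs.
messages = [
    {"role": "system", "content": "You are a helpful assistant.\nReasoning: medium"},
    {"role": "developer", "content": "Answer in at most two sentences."},
    {"role": "user", "content": "What is the Harmony Chat Format?"},
]

# Multi-turn chats simply append assistant replies and new user turns:
messages.append({"role": "assistant", "content": "Harmony is the chat format gpt-oss was trained on."})
messages.append({"role": "user", "content": "Why does it matter?"})

roles = [m["role"] for m in messages]
print(roles)
```

Keeping the full turn history in this shape is what lets the model track multi-turn context correctly.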
The models are trained to support three reasoning levels: low, medium, and high, configured in the system prompt (e.g., `Reasoning: high`).
As you increase the reasoning level, the model produces longer and more structured CoT traces, allowing it to think through problems with greater depth.
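A quick sketch of how the reasoning level is selected in practice: the `Reasoning: <level>` line goes into the system prompt of an ordinary chat request. The model identifier and helper function here are illustrative assumptions, not a fixed API.

```python
# Minimal sketch: choosing a reasoning level via the system prompt.
# "Reasoning: high" follows the documented gpt-oss convention; the
# model name below is a hypothetical placeholder.
def build_request(prompt: str, level: str = "medium") -> dict:
    assert level in ("low", "medium", "high")
    return {
        "model": "gpt-oss-120b",  # hypothetical identifier
        "messages": [
            {"role": "system", "content": f"Reasoning: {level}"},
            {"role": "user", "content": prompt},
        ],
    }

req = build_request("Prove that sqrt(2) is irrational.", level="high")
print(req["messages"][0]["content"])
```

Higher levels trade latency and token usage for deeper CoT traces, so pick the level per task rather than defaulting to high everywhere.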
4.3 Agentic Tool Use
gpt-oss models are also trained to work with a range of tools in agentic workflows:
These tools can be turned on or off using system prompts, and OpenAI provides basic harnesses and an open-source implementation to help developers integrate them into real-world apps.
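As a sketch of what tool integration looks like, here is a tool definition in the OpenAI-style function-calling schema that most gpt-oss serving stacks accept. The tool name and parameters are purely illustrative assumptions.

```python
# Sketch: exposing a custom tool to the model in the OpenAI-compatible
# function-calling schema. The "get_weather" tool is hypothetical.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# This dict would be passed as `tools=[get_weather_tool]` in a chat
# completion request; the system prompt can enable or disable tools.
print(get_weather_tool["function"]["name"])
```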
You can run both gpt-oss models (gpt-oss-120b and gpt-oss-20b) from the Fireworks AI Model Library via the UI.
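You can also call the models over Fireworks AI's OpenAI-compatible API. The sketch below uses only the Python standard library; the endpoint URL and the `accounts/fireworks/models/...` model path follow Fireworks' usual naming, but treat both as assumptions and confirm the exact identifier in the model library.

```python
import json
import os
import urllib.request

# Sketch of calling gpt-oss-120b through Fireworks AI's
# OpenAI-compatible chat completions endpoint. Model path and env
# var name are assumptions for illustration.
URL = "https://api.fireworks.ai/inference/v1/chat/completions"
payload = {
    "model": "accounts/fireworks/models/gpt-oss-120b",
    "messages": [
        {"role": "system", "content": "Reasoning: medium"},
        {"role": "user", "content": "Summarize the gpt-oss release in one sentence."},
    ],
    "max_tokens": 256,
}

api_key = os.environ.get("FIREWORKS_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
else:
    print("Set FIREWORKS_API_KEY to run this request.")
```

Swap the model path for the 20b variant to compare cost and latency on the same prompt.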
We’re also excited to announce a joint effort between Fireworks AI and AMD to bring OpenAI models to AMD’s latest MI355 GPUs. This collaboration will make powerful AI models more accessible and cost-efficient, coming soon to the Fireworks AI platform.
Try out the new OpenAI gpt-oss models now!