
Introducing OpenAI gpt-oss (20b & 120b)

GPT-OSS Models

This is a deep dive analysis of gpt-oss (20b & 120b), released by OpenAI on 5th Aug 2025. This blog explores its capabilities, technical architecture, benchmarks, and practical applications for developers.

1. TL;DR Summary

OpenAI is finally living up to its name by building “open models” again. After GPT-2, these are the first open-weight LLMs released by OpenAI.

OpenAI's new open-source models, gpt-oss-20b and gpt-oss-120b, are very strong reasoning models that excel at problem solving and tool calling. Both models support long context windows and adjustable reasoning levels. That makes them a great choice for agentic use cases.

Here are the key features you need to know:

  • The models’ performance is at the level of o3 and o4-mini (see section 2 for benchmarks)
  • The models support both built-in tools (code interpreter, browser) and user-provided tools, and can generate consistent trajectories over dozens of turns
  • The models allow you to select a low/medium/high reasoning level (as in o4-mini-high)
  • The model architecture is a fairly standard mixture-of-experts transformer; the performance gains come primarily from the training data and reinforcement-learning tuning

Try out the new OpenAI gpt-oss-120b & gpt-oss-20b on Fireworks AI!

2. Benchmarking Information

The following table shows an evaluation across multiple benchmarks and reasoning levels for both gpt-oss-20b and gpt-oss-120b.

Evaluations across multiple benchmarks and reasoning levels.

3. Comparison to other OpenAI Models

The following table showcases the main capabilities evaluations, comparing the gpt-oss models at high reasoning level against OpenAI’s closed models o3, o3-mini, and o4-mini on canonical benchmarks.

The gpt-oss-120b model surpasses OpenAI o3-mini and approaches OpenAI o4-mini accuracy. The smaller gpt-oss-20b model is also surprisingly competitive, despite being 6 times smaller than gpt-oss-120b.

Main capabilities evaluations

Performance vs. other leading open-weight models

A fair comparison of gpt-oss against leading open-weight models, including Kimi, GLM, Qwen, and DeepSeek, highlighting areas where it excels and areas for improvement.


Other benchmarks

4. Technical Details

Post-Training for Reasoning, Tools, and Agentic Behavior (GPT-OSS Models)

After pre-training on massive text data, the gpt-oss models go through a dedicated post-training phase to refine their reasoning abilities and tool usage, drawing on Chain-of-Thought (CoT) reinforcement-learning techniques similar to those used for OpenAI's o3 models.

This phase trains the models on complex, multi-step tasks across coding, math, and science, helping them develop structured problem-solving capabilities and a personality similar to ChatGPT.

4.1 Harmony Chat Format: A New Chat Protocol for Smarter Agents

OpenAI introduced a new format called the Harmony Chat Format: a flexible, role-aware, message-based structure for interactive conversations.

  • It uses labeled roles like System, Developer, User, Assistant, and Tool to enforce a hierarchy when resolving conflicting instructions.
  • Special “channels” like analysis, commentary, and final help guide how reasoning traces, tool calls, and final answers are shown to the user.
  • This structure enables the model to perform more advanced agentic tasks, like embedding tool calls directly within reasoning steps or sharing step-by-step action plans.

💡 If you're deploying gpt-oss models, using Harmony Format correctly is essential for unlocking their full capabilities, especially in multi-turn chats.
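To make the roles and channels concrete, here is a minimal conceptual sketch in Python. The role and channel names come from the description above; the message contents, the dict-based representation, and the priority table are illustrative assumptions (the real Harmony format is rendered with special tokens, e.g. via OpenAI's harmony tooling), not the wire format itself.

```python
# Conceptual sketch of a Harmony-style conversation (illustrative only;
# the actual format is rendered with special tokens, not plain dicts).
conversation = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "developer", "content": "You are a concise math tutor."},
    {"role": "user", "content": "What is 17 * 24?"},
    # The assistant first emits its reasoning trace on the "analysis" channel...
    {"role": "assistant", "channel": "analysis",
     "content": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408"},
    # ...then the user-facing answer on the "final" channel.
    {"role": "assistant", "channel": "final", "content": "408"},
]

# The role hierarchy (System > Developer > User) decides which instruction
# wins when messages conflict; lower number = higher priority.
priority = {"system": 0, "developer": 1, "user": 2, "assistant": 3, "tool": 4}
highest = min(conversation, key=lambda m: priority[m["role"]])
print(highest["role"])  # system
```

The key design point is that conflicting instructions are resolved by role rank, so a user message can never override a system-level constraint.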

4.2 Variable Reasoning Levels

The models are trained to support three reasoning levels: low, medium, and high, configured in the system prompt (e.g., Reasoning: high).

As you increase the reasoning level, the model produces longer and more structured CoT traces, allowing it to think through problems with greater depth.
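A minimal sketch of how the reasoning level could be wired into a request. The "Reasoning: high" system-prompt convention is from the text above; the helper function and message shape are hypothetical, not an official API.

```python
# Sketch: selecting a reasoning level via the system prompt.
# The helper name and message structure here are illustrative assumptions.
def build_messages(user_prompt: str, level: str = "medium") -> list:
    """Build a chat message list with the reasoning level set in the system prompt."""
    if level not in {"low", "medium", "high"}:
        raise ValueError(f"unsupported reasoning level: {level!r}")
    return [
        {"role": "system", "content": f"Reasoning: {level}"},
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("Prove that the sum of two even numbers is even.", level="high")
print(msgs[0]["content"])  # Reasoning: high
```

Higher levels trade latency and token usage for deeper chains of thought, so "low" is a sensible default for simple lookups and "high" for multi-step math or code.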

4.3 Agentic Tool Use

gpt-oss models are also trained to work with a range of tools in agentic workflows:

  • Web browsing, to fetch real-time information and increase factual grounding
  • Python execution, in a stateful notebook-style environment for live code reasoning
  • Custom developer functions, defined in-system using schemas (similar to OpenAI’s function calling)

These tools can be turned on or off using system prompts, and OpenAI provides basic harnesses and an open-source implementation to help developers integrate them into real-world apps.
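As a sketch of the third bullet, here is what a custom developer function defined via a JSON schema can look like, in the style of OpenAI function calling. The tool name, fields, and sample arguments are hypothetical, used purely for illustration.

```python
import json

# Sketch of a custom developer tool described by a JSON schema
# (OpenAI function-calling style); the tool and its fields are hypothetical.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# A model that supports tool calling emits arguments matching this schema;
# here we just simulate decoding one such call payload.
tool_call_args = json.loads('{"city": "Paris", "unit": "celsius"}')
print(tool_call_args["city"])  # Paris
```

The model never executes the tool itself: it emits the name plus JSON arguments, and the harness runs the function and feeds the result back as a Tool-role message.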

5. Implementation Guidance

You can run both gpt-oss models (gpt-oss-120b & gpt-oss-20b) on the Fireworks AI Model Library via the UI, or programmatically via the API:

# Example implementation using the Fireworks AI Python client
# (pip install fireworks-ai); the exact model path may differ on your account.
from fireworks.client import Fireworks

client = Fireworks(api_key="your_api_key")
response = client.chat.completions.create(
    # Use "accounts/fireworks/models/gpt-oss-20b" for the smaller model
    model="accounts/fireworks/models/gpt-oss-120b",
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=100,
    temperature=0.7,
)
print(response.choices[0].message.content)

AMD Partnership

We’re also excited to announce a joint effort between Fireworks AI and AMD to bring OpenAI models to AMD’s latest MI355 GPUs. This collaboration will make powerful AI models more accessible and cost-efficient, coming soon to the Fireworks AI platform.

Try out the new OpenAI gpt-oss models now!

Resources