OpenAI o3 and o4-mini released

The best LLMs so far?

OpenAI strikes back, and how! OpenAI has dropped two major models: the full version of o3 and the smaller o4-mini, and the benchmark results look strong.


Though o3 is only for paid ChatGPT users, o4-mini is available on the free tier as well.

Key Features of OpenAI o3:

Advanced Reasoning Capabilities

  • o3 is OpenAI’s most powerful reasoning model, excelling in coding, math, science, and visual perception.
  • Sets new state-of-the-art (SOTA) results on Codeforces, SWE-bench (without scaffolding), and MMMU.
  • Makes 20% fewer major errors than o1 on real-world tasks, particularly in programming, business consulting, and creative ideation.

Full Tool Integration

  • Can agentically use and combine all tools within ChatGPT, including:
      ◦ Web search
      ◦ Python code execution
      ◦ Visual reasoning (image analysis, charts, graphics)
      ◦ Image generation
  • Trained to reason about when and how to use tools effectively.
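As a sketch of what agentic tool use looks like through the API, here is a minimal chat-completions request body that declares a function tool the model may decide to call. The `web_search` function name and its schema are hypothetical stand-ins for illustration, not OpenAI's built-in tool:

```python
import json

# Hypothetical function tool the model may choose to call.
# The name "web_search" and its parameters are illustrative only.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}

# Request body for POST https://api.openai.com/v1/chat/completions
payload = {
    "model": "o3",
    "messages": [
        {"role": "user", "content": "What changed in the latest o-series release?"}
    ],
    "tools": [web_search_tool],
    "tool_choice": "auto",  # let the model reason about when to use the tool
}

print(json.dumps(payload, indent=2))
```

With `"tool_choice": "auto"`, the model itself decides whether a given prompt warrants a tool call, which is exactly the "reason about when and how to use tools" behaviour described above.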

Multimodal & Visual Reasoning

  • Can integrate images into its reasoning process, enabling problem-solving that blends visual and textual analysis.
  • Excels in interpreting whiteboards, textbook diagrams, and hand-drawn sketches, even if blurry or low-quality.
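Multimodal inputs like the whiteboard photos mentioned above are sent as mixed text-and-image content parts. The sketch below follows the chat-completions image format with a placeholder byte string standing in for a real image file:

```python
import base64

# A few PNG header bytes stand in for a real whiteboard photo or diagram.
tiny_png = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")

# A chat message mixing a text part and an image part (as a data URL).
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this diagram show?"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{tiny_png}"},
        },
    ],
}
```

Because the image sits in the same `content` list as the text, the model can interleave visual and textual reasoning in a single turn.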

Improved Efficiency & Performance Scaling

  • Benefits from reinforcement-learning scaling, showing consistent performance gains with increased compute.
  • More efficient than o1 at equal latency and cost.

Safety & Refusal Improvements

  • Completely rebuilt safety training data with enhanced refusal mechanisms for biorisk, malware, and jailbreaks.
  • Passes OpenAI’s Preparedness Framework evaluations (below “High” risk in biorisk, cybersecurity, and AI self-improvement).

Key Features of OpenAI o4-Mini:

Optimised for Speed & Cost-Efficiency

  • A smaller, faster model designed for high-throughput reasoning.
  • Achieves remarkable performance for its size, especially in math, coding, and visual tasks.
  • Outperforms o3-mini in both STEM and non-STEM tasks (e.g., data science).

Strong Benchmark Performance

  • Best-performing model on AIME 2024 & 2025 (competition math).
  • Excels in real-world tasks and supports higher usage limits than o3.

Improved Instruction Following & Natural Responses

  • More natural and conversational compared to previous models.
  • Better at referencing memory and past conversations for personalised responses.

Tool Use & Agentic Capabilities

  • Like o3, it can strategically use tools (web search, Python, image generation).
  • Optimized for fast, multi-step workflows (typically under a minute).

Safety & Compliance

  • Shares o3’s enhanced safety mitigations, including refusal training and reasoning-based monitoring.

Common Features (o3 & o4-mini):

  • Available in ChatGPT (Plus, Pro, Team, Enterprise) and via the API; o4-mini is available on the free tier as well.
  • Unified reasoning and conversational abilities, blending o-series problem-solving with GPT-series natural dialogue.
  • Codex CLI support: works with OpenAI’s new terminal-based coding agent for local code execution and reasoning.
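The API access mentioned above can be sketched with only the Python standard library. The snippet builds (but does not send) a chat-completions request; the endpoint and header shapes follow OpenAI's public HTTP API, and the prompt is purely illustrative:

```python
import json
import os
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for an o-series model."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumes an OPENAI_API_KEY environment variable is set.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )

req = build_request("o4-mini", "Summarise the o3 release in one sentence.")
print(req.full_url)
```

Actually dispatching it is a one-liner, `urllib.request.urlopen(req)`, though in practice you would use OpenAI's official SDK rather than raw HTTP.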

Benchmarks and metrics

  • o3 (no tools) nailed AIME math with up to 91.6% accuracy — top-tier reasoning in competition math.
  • o4-mini (no tools) nearly matched o3 on AIME and Codeforces tasks, showing it’s lean yet lethal.
  • In Codeforces-style programming, both o3 and o4-mini scored above 2700 Elo, which is elite-coder territory.
  • o3 (no tools) crushed GPQA (PhD-level science) with 83.3% accuracy, outperforming even newer rival models.
  • o4-mini (no tools) followed closely with 81.4%, showing great reasoning without external help.
  • On “Humanity’s Last Exam”, o3 jumped from 20.3% to 24.9% with Python + browsing — great at tool use.
  • o4-mini (with tools) hit 17%, very respectable for a compact model with generalist capabilities.
  • Overall, o3 is the best all-rounder, especially with tools — smart and resourceful.
  • o4-mini is the surprise underdog, consistently delivering near-o3 performance at a lighter footprint.

Bottom line?
o3 is your go-to academic overachiever, while o4-mini is the budget genius that can almost keep up.

In conclusion,

OpenAI’s o3 and o4-mini redefine AI intelligence — o3 as the powerhouse for complex reasoning, and o4-mini as the fast, cost-efficient alternative. Both excel in coding, math, and multimodal tasks while integrating tools seamlessly. With top-tier benchmarks and enhanced safety, they set a new standard for AI performance. The future of smart, efficient AI is here.


OpenAI o3 and o4-mini released was originally published in Data Science in Your Pocket on Medium, where people are continuing the conversation by highlighting and responding to this story.
