Beats OpenAI o3-mini, Grok 3, Claude 3.7 Sonnet, and DeepSeek-R1

First DeepSeek dropped v3-0324 earlier this week, and now Google has released its new model, Gemini 2.5 Pro. OpenAI must surely be feeling the heat. Don't mistake this for a minor release;
it's the best LLM released to date.
What is Gemini 2.5 Pro?
Gemini 2.5 Pro (the initial release is labeled Gemini 2.5 Pro Experimental) is the first model in the Gemini 2.5 generation from Google DeepMind. It is described as Google DeepMind's most intelligent and advanced AI model, designed as a “thinking model” that reasons through its thoughts before responding, and aimed at tackling increasingly complex problems. It is state-of-the-art on many benchmarks.
What are Gemini 2.5 Pro's key features?
- Thinking Capabilities: Reasons internally before responding, leading to enhanced performance and accuracy.
- Enhanced Reasoning: State-of-the-art performance on benchmarks requiring advanced reasoning, including math and science (GPQA, AIME 2025) and knowledge/reasoning (Humanity’s Last Exam).
- Advanced Coding: Strong code capabilities, excelling at creating web apps, agentic code applications, code transformation, and editing. High score on SWE-Bench Verified.
- Native Multimodality: Builds on Gemini’s ability to understand and process information from text, audio, images, video, and entire code repositories.
- Long Context Window: Ships with a 1 million token context window (with 2 million planned soon), allowing it to comprehend vast datasets (see the sketch after this list).
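To make the long-context claim concrete, here is a minimal sketch of feeding a very large input through the Gemini API using the `google-generativeai` Python package. The model id `gemini-2.5-pro-exp-03-25` and the file `whole_repo.txt` are assumptions for illustration; check Google AI Studio for the model name currently exposed to you.

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Model id is an assumption (the experimental id at the time of writing);
# check Google AI Studio for the currently available name.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

# Hypothetical large file, e.g. an entire code repository dumped to text.
with open("whole_repo.txt", "r", encoding="utf-8") as f:
    big_input = f.read()

# count_tokens verifies the input fits the 1M-token window
# before you spend a full generate_content call on it.
token_count = model.count_tokens(big_input)
print(f"Input size: {token_count.total_tokens} tokens")

if token_count.total_tokens <= 1_000_000:
    response = model.generate_content(
        f"Summarize the architecture of this codebase:\n\n{big_input}"
    )
    print(response.text)
```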
Benchmarks

- Reasoning & Knowledge (Humanity’s Last Exam — no tools): This benchmark tests deep reasoning and broad knowledge using questions from diverse experts, without allowing the AI external tools. Gemini 2.5 Pro achieves the top score of 18.8%, indicating state-of-the-art performance in unaided reasoning and knowledge recall compared to the other models listed.
- Science (GPQA diamond): This evaluates understanding and complex reasoning on graduate-level questions in Physics, Chemistry, and Biology. Gemini 2.5 Pro scores 84.0% on single attempt (pass@1), the highest among the models shown for this method, showcasing strong scientific reasoning.
- Mathematics (AIME 2025): This uses problems from the challenging American Invitational Mathematics Examination (2025 version) to test advanced mathematical problem-solving. Gemini 2.5 Pro leads with 86.7% on single attempt (pass@1), demonstrating superior performance in complex math without relying on multiple tries.
- Mathematics (AIME 2024): Similar to the above, but using problems from the 2024 AIME competition. Gemini 2.5 Pro again scores the highest on single attempt (pass@1) with 92.0%, reinforcing its strong mathematical reasoning capability.
- Code Generation (LiveCodeBench v5): This benchmark assesses the ability to write functional code based on given problems. Gemini 2.5 Pro scores 70.4% (pass@1), showing strong code generation ability, competitive with the top performer (OpenAI o3-mini) on this specific metric.
- Code Editing (Aider Polyglot): This measures how well the model can modify or debug existing code across different programming languages. Gemini 2.5 Pro achieves the leading scores of 74.0% / 68.6% (whole/diff), indicating it’s highly proficient at editing code compared to peers.
- Agentic Coding (SWE-bench verified): This tests the model’s capacity to handle complex, multi-step software engineering tasks autonomously. Gemini 2.5 Pro scores a high 63.8%, demonstrating strong agentic capabilities, though slightly behind Claude 3.7 Sonnet in this benchmark.
- Factuality (SimpleQA): This measures accuracy in answering relatively straightforward factual questions. Gemini 2.5 Pro scores 52.9%, a solid performance, although lower than OpenAI GPT-4.5’s score (62.5%) on this specific test.
- Visual Reasoning (MMMU): This tests the ability to understand and reason about combined visual (image) and text inputs across multiple disciplines. Gemini 2.5 Pro achieves the highest single-attempt (pass@1) score of 81.7%, demonstrating leading capability in multimodal understanding.
- Image Understanding (Vibe-Eval (Reka)): This focuses specifically on comprehending the content within images. Gemini 2.5 Pro scores 69.4%, leading among models that support this multimodal benchmark.
- Long Context (MRCR): This assesses reading comprehension and information retrieval over very long documents (128k and 1M tokens). Gemini 2.5 Pro significantly outperforms others with 91.5% (128k) and 83.1% (1M), showcasing an exceptional ability to handle and utilize vast amounts of context.
- Multilingual Performance (Global MMLU (Lite)): This measures understanding and knowledge across various subjects in multiple languages. Gemini 2.5 Pro achieves the top score of 89.8%, indicating superior multilingual and multi-subject capabilities.
Putting it all together
Gemini 2.5 Pro demonstrates state-of-the-art or highly competitive performance across the board, particularly excelling in complex reasoning (Humanity’s Last Exam), single-attempt math and science problems (AIME, GPQA), code editing (Aider), visual reasoning (MMMU), image understanding (Vibe-Eval), processing extremely long contexts (MRCR), and multilingual tasks (Global MMLU). This positions it as one of the most capable and versatile AI models currently available according to these benchmarks.
What should it be used for?
- Tackling complex tasks that require advanced reasoning.
- Solving problems in math and science.
- Advanced coding tasks like creating visually compelling web apps, developing agentic code applications, code transformation, and editing.
- Analyzing and comprehending vast amounts of information across different formats (text, audio, image, video, code); see the multimodal sketch after this list.
- Experimentation by developers and enterprises.
- Scaled production use (once pricing and higher rate limits are available).
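As an example of the multimodal analysis mentioned above, here is a small sketch that sends an image together with a text question in a single request. As before, the model id and the file `sales_chart.png` are assumptions; any local chart, screenshot, or photo works.

```python
# pip install google-generativeai pillow
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")  # assumed model id

# Hypothetical local image to analyze.
image = PIL.Image.open("sales_chart.png")

# Text and image parts go together in one contents list.
response = model.generate_content(
    ["What trend does this chart show? Answer in two sentences.", image]
)
print(response.text)
```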
How to use Google Gemini 2.5 Pro?
Currently:
- Available in Google AI Studio, where developers and enterprises can experiment with it and generate an API key (see the quickstart sketch below).
- Available for Gemini Advanced users within the Gemini app (selectable in the model dropdown on desktop and mobile).
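For developers, the fastest route is an API key from Google AI Studio plus the `google-generativeai` Python package. A minimal quickstart sketch, assuming `gemini-2.5-pro-exp-03-25` is the 2.5 Pro entry listed in your AI Studio account:

```python
# pip install google-generativeai
import google.generativeai as genai

# Create a free API key in Google AI Studio and paste it here.
genai.configure(api_key="YOUR_API_KEY")

# Model id is an assumption; pick the 2.5 Pro entry AI Studio lists for you.
model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")

response = model.generate_content(
    "Write a Python function that checks whether a string is a palindrome."
)
print(response.text)
```

The same `GenerativeModel` object handles the multimodal and long-context calls sketched earlier; only the contents you pass to `generate_content` change.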
Conclusion
So, Gemini 2.5 Pro is clearly a big step forward in smart AI. It’s not just about getting answers, but thinking them through, which really helps with accuracy on tricky stuff. We’re seeing it shine when dealing with complex reasoning, creative coding, understanding images, and even digesting huge documents. It’s ready for developers and Gemini Advanced users to try now, so dive in and see how this extra brainpower can help you out!