Google Unveils Gemini 2.5 Pro, Setting a New Record on Humanity’s Last Exam

Google has launched Gemini 2.5 Pro, an AI model that scored 18.8% on Humanity’s Last Exam (HLE) without relying on web search or external tools. HLE, written by subject matter experts and academics worldwide, is a rigorous benchmark designed to assess deep knowledge across many domains. For comparison, OpenAI’s o3-mini-high previously scored 14% on the same test without using any tools.

Gemini 2.5 Pro is a reasoning-based AI model, built on top of a larger base LLM using reinforcement learning and chain-of-thought prompting. Prior to this, Google had introduced the smaller Gemini 2.0 Flash Thinking model.

According to Google, Gemini 2.5 Pro can “analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions.”

Gemini 2.5 Pro was tested on LMArena under the codename “Nebula” and has now claimed the top spot on the LMArena leaderboard with a record score of 1,443, surpassing Grok 3 and GPT-4.5. Beyond LMArena, Google states that Gemini 2.5 Pro performs strongly on coding, math, and science benchmarks.

On the GPQA Diamond benchmark, Gemini 2.5 Pro scored 84%, and on AIME 2025 it achieved 86.7%. On SWE-bench Verified, which evaluates real-world software problem-solving, Gemini 2.5 Pro scored 63.8%, second only to Claude 3.7 Sonnet Extended Thinking, which led with 70.3%.

Google highlights Gemini 2.5 Pro’s advanced coding and reasoning capabilities. The model is now rolling out to Gemini Advanced users. Those interested in testing it for free can visit Google AI Studio and select the “Gemini 2.5 Pro Experimental 03-25” model from the drop-down menu.
