At I/O 2025, Google unveiled a series of updates to its Gemini 2.5 models, along with an experimental feature called Deep Think, designed to enhance the reasoning capabilities of the 2.5 Pro model. These advancements mark a significant step forward, offering developers and users new levels of performance, efficiency, and versatility.
The Gemini 2.5 Pro model has garnered widespread acclaim from developers as the premier solution for coding tasks, while the 2.5 Flash model is set to receive a substantial upgrade. Furthermore, Google is introducing a range of new capabilities across its models, including Deep Think, an experimental enhanced reasoning mode specifically tailored for the 2.5 Pro model.
In a prior announcement, Google unveiled Gemini 2.5 Pro, its most intelligent model to date, and expedited the release of its I/O update to help developers build exceptional web applications. Today, the company is sharing further refinements to the Gemini 2.5 model series, with several notable highlights:
Gemini 2.5 Pro has surpassed all expectations, demonstrating exceptional performance on academic benchmarks. It now holds the top position on the WebDev Arena and LMArena leaderboards, solidifying its status as the world’s leading model for coding and learning assistance.
New features are being integrated into both 2.5 Pro and 2.5 Flash, including native audio output for a more natural and engaging conversational experience, advanced security safeguards, and Project Mariner’s computer use capabilities. The 2.5 Pro model will also gain Deep Think, an experimental mode designed to improve reasoning on intricate mathematical and coding problems.
Google is also improving the developer experience: thought summaries in the Gemini API and Vertex AI offer greater transparency, thinking budgets are being extended to 2.5 Pro for more control, and support for MCP tools in the Gemini API and SDK gives access to a broader range of open-source tools.
The 2.5 Flash model is now available to everyone in the Gemini app. The updated version will be generally available in Google AI Studio for developers and in Vertex AI for enterprises in early June, with 2.5 Pro following shortly after.
This remarkable progress is the result of the relentless dedication of Google’s teams, who are committed to continuously improving its technologies and deploying them in a safe and responsible manner.
Unveiling the Superior Performance of 2.5 Pro
The 2.5 Pro model has recently been updated to empower developers in creating more interactive and feature-rich web applications. The positive feedback received from users and developers is greatly appreciated, and ongoing improvements will continue to be implemented based on user input.
In addition to its strong performance on academic benchmarks, the latest iteration of 2.5 Pro has captured the top spot on the popular coding leaderboard WebDev Arena, with an Elo score of 1415. It also leads across all leaderboards on LMArena, which evaluates human preference across various criteria. Equipped with a 1 million-token context window, 2.5 Pro delivers state-of-the-art performance in long-context and video understanding, enabling it to handle tasks that require extensive memory and detailed analysis, such as document summarization, video analysis, and long-form content creation.
By integrating LearnLM, a family of models developed in collaboration with educational experts, 2.5 Pro has also become the leading model for learning. In direct comparisons evaluating its pedagogy and effectiveness, educators and experts preferred Gemini 2.5 Pro over other models across a diverse range of scenarios, and it surpassed top models on all five principles of learning science used to build AI systems for learning. Its ability to adapt to different learning styles, provide personalized feedback, and explain complex concepts clearly makes it a capable educational companion for students and educators alike.
Deep Think: Pushing the Boundaries of Reasoning
Google is actively exploring the limits of Gemini’s reasoning capabilities and beginning to test an enhanced reasoning mode called Deep Think. This mode uses new research techniques that allow the model to consider multiple hypotheses before responding, leading to more accurate and nuanced conclusions on complex problems. This is particularly valuable in fields such as scientific research, financial analysis, and strategic planning, where weighing different options carefully produces better outcomes.
Gemini 2.5 Pro Deep Think achieved an impressive score on the 2025 USAMO, widely recognized as one of the most challenging mathematics benchmarks. It also excels on LiveCodeBench, a demanding benchmark for competition-level coding, and scores 84.0% on MMMU, which tests multimodal reasoning. These results show how far Deep Think can push AI problem-solving across mathematics, coding, and multimodal tasks.
Because 2.5 Pro Deep Think pushes the frontier of what is possible, Google is taking additional time to conduct thorough safety evaluations and gather further input from safety experts. As part of that, the company will make Deep Think available to select testers via the Gemini API to collect feedback before a wider release. This deliberate approach reflects Google’s commitment to deploying advanced AI responsibly.
Introducing an Enhanced 2.5 Flash
The 2.5 Flash model, known for its efficiency and cost-effectiveness, has been refined across several dimensions. It shows improvements on key benchmarks for reasoning, multimodality, code, and long context, while also becoming more efficient, using 20-30% fewer tokens in evaluations. These gains make 2.5 Flash a versatile, cost-effective choice for tasks such as real-time data analysis, natural language processing, and automated decision-making.
The new 2.5 Flash is currently available in preview in Google AI Studio for developers, in Vertex AI for enterprises, and in the Gemini app for everyone, with general availability for production use scheduled for early June. The preview is an opportunity to experiment with the model and share feedback ahead of wider adoption.
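For developers who want to try the updated Flash model through the Gemini API, a request can be as small as the sketch below. It uses the google-genai Python SDK; the model id shown is a placeholder for whichever 2.5 Flash preview name is current, so treat this as an illustration rather than the official quickstart.

```python
# Minimal sketch: calling a Gemini 2.5 Flash model via the Gemini API.
# Assumes the google-genai Python SDK is installed (`pip install google-genai`)
# and an API key is set in the environment; the model id is a placeholder.
from google import genai

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",  # substitute the current preview model id
    contents="Summarize the main updates announced for the Gemini 2.5 series.",
)
print(response.text)
```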
New Capabilities of Gemini 2.5
Enhancements to Native Audio Output and the Live API
The Live API introduces a preview version of audio-visual input and native audio output dialogue, enabling developers to build conversational experiences with a more natural and expressive Gemini. Lifelike audio responses make interactions more intuitive and engaging, which is especially valuable in applications such as virtual assistants, customer service chatbots, and educational platforms.
The Live API also lets users steer the model’s tone, accent, and speaking style; for example, the model can be instructed to adopt a dramatic voice when narrating a story. It supports tool use as well, allowing the model to run searches on the user’s behalf. This combination of voice control and access to external tools makes it useful across scenarios such as storytelling, virtual training, and personalized assistants; a minimal connection sketch follows the feature list below.
Users can experiment with various early features, including:
Affective Dialogue: The model detects emotion in the user’s voice and responds accordingly, adding a layer of emotional intelligence that makes interactions more natural and personalized, for example in mental health support or customer service.
Proactive Audio: The model ignores background conversations and knows when to respond, minimizing interruptions and keeping exchanges clear even in noisy environments.
Thinking in the Live API: The model leverages Gemini’s thinking capabilities during live conversations, supporting more complex tasks that require deeper analysis and more precise, insightful responses.
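As promised above, here is a minimal sketch of opening a Live API session that requests native audio output and steers the speaking style through a system instruction. The model id, config fields, and session methods follow the google-genai Python SDK as assumptions and may differ from the shipped preview.

```python
# Minimal sketch: a Live API session with native audio output.
# Model id, config fields, and SDK details are assumptions and may differ
# from the current preview of the google-genai Python SDK.
import asyncio
from google import genai
from google.genai import types

client = genai.Client()
MODEL = "gemini-2.5-flash-preview-native-audio-dialog"  # placeholder preview id

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],  # ask for native audio output
    system_instruction="Narrate answers in a calm, dramatic storytelling voice.",
)

async def main():
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Tell me a short bedtime story.")],
            ),
            turn_complete=True,
        )
        async for message in session.receive():
            if message.data:  # raw audio bytes to stream to a speaker
                print(f"received {len(message.data)} bytes of audio")

asyncio.run(main())
```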
Google is also releasing new previews of text-to-speech in both 2.5 Pro and 2.5 Flash. These provide first-of-their-kind support for multiple speakers, enabling text-to-speech with two voices via native audio output, which is especially useful for creating engaging narratives and dialogues in multimedia applications.
Like native audio dialogue, text-to-speech is expressive and can capture subtle nuances such as whispers. It works in over 24 languages and switches seamlessly between them, making it a versatile tool for global use cases such as language learning and cross-cultural communication.
This text-to-speech capability will be available later today in the Gemini API.
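The sketch below shows what a two-speaker text-to-speech request could look like through the Gemini API. The TTS model id, voice names, and config field names follow the google-genai Python SDK as assumptions; the speaker names and dialogue are illustrative only.

```python
# Minimal sketch: two-speaker text-to-speech via the Gemini API.
# Model id and voice names are placeholders; config fields are assumptions
# based on the google-genai Python SDK and may differ in the preview.
from google import genai
from google.genai import types

client = genai.Client()

prompt = """TTS the following conversation between Alice and Bob:
Alice: Did you see the I/O announcements?
Bob: (whispering) I did, the multi-speaker audio is my favorite part."""

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # placeholder preview id
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(
                        speaker="Alice",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                        ),
                    ),
                    types.SpeakerVoiceConfig(
                        speaker="Bob",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
                        ),
                    ),
                ]
            )
        ),
    ),
)

# The audio comes back as inline data on the first candidate part (raw PCM here).
audio_bytes = response.candidates[0].content.parts[0].inline_data.data
with open("dialogue.pcm", "wb") as f:
    f.write(audio_bytes)
```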
Enhanced Computer Use Capabilities
Google is bringing Project Mariner’s computer use capabilities into the Gemini API and Vertex AI. Companies such as Automation Anywhere, UiPath, Browserbase, Autotab, The Interaction Company, and Cartwheel are already exploring its potential, and a broader rollout for developers to experiment with is planned for this summer. Letting AI models interact directly with computer interfaces opens the door to more automated, streamlined workflows across industries such as manufacturing, logistics, and healthcare.
Superior Security Measures
Google has significantly strengthened protections against security threats such as indirect prompt injection, in which malicious instructions are embedded in data an AI model retrieves. The new security approach substantially increases Gemini’s protection rate against indirect prompt injection attacks during tool use, making Gemini 2.5 Google’s most secure model family to date. This matters most in applications that process sensitive data or drive consequential decisions.
An Enhanced Developer Experience
Thought Summaries
Both 2.5 Pro and Flash now include thought summaries in the Gemini API and in Vertex AI. These summaries take the model’s raw thoughts and organize them into a clear format with headers, key details, and information about model actions, such as when tools are used. By exposing the model’s analytical process, thought summaries help developers understand and debug AI systems, which is crucial for building trust and using them responsibly.
With a more structured, streamlined view of the model’s thinking process, developers and users will find interactions with Gemini models easier to understand and debug.
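A request for thought summaries might look like the sketch below, which asks the model to include summarized thoughts and then separates them from the final answer. The field names (thinking_config, include_thoughts, part.thought) and the model id follow the google-genai Python SDK as assumptions and may differ per release.

```python
# Minimal sketch: requesting thought summaries alongside the final answer.
# Field names and model id are assumptions based on the google-genai Python SDK.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder model id
    contents="Plan the schema for a small inventory-tracking database.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

for part in response.candidates[0].content.parts:
    if not part.text:
        continue
    if part.thought:   # summarized reasoning, organized with headers
        print("Thought summary:\n", part.text)
    else:              # the model's final answer
        print("Answer:\n", part.text)
```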
Thinking Budgets
Google launched 2.5 Flash with thinking budgets to give developers greater control over cost by balancing latency and quality, and this capability is now being extended to 2.5 Pro. By capping the tokens a model spends on thinking, developers can strike the right balance between computational cost and answer quality, which is especially useful when working with limited resources or strict performance requirements.
This allows full control over the number of tokens a model uses to think before it responds, or even to turn off its thinking capabilities.
Thinking budgets for Gemini 2.5 Pro will roll out for stable production use in the coming weeks, alongside the generally available model.
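In practice, setting a budget is a single config field, as in the sketch below: one call caps thinking tokens for a latency-sensitive request, and another turns thinking off entirely. The thinking_budget field follows the google-genai Python SDK; exact model ids and accepted ranges are assumptions that may vary by release.

```python
# Minimal sketch: capping (or disabling) thinking with a token budget.
# Model ids and accepted budget ranges are assumptions and may differ.
from google import genai
from google.genai import types

client = genai.Client()

# Cap internal reasoning at roughly 1,024 thinking tokens for a quick call.
budgeted = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me three subject lines for a product-launch email.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)

# A budget of 0 turns thinking off entirely (where the model allows it).
no_thinking = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Convert 72 degrees Fahrenheit to Celsius.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(budgeted.text, no_thinking.text, sep="\n")
```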
Support for MCP Tools
Google has added native SDK support for Model Context Protocol (MCP) definitions in the Gemini API, making it easier to integrate with open-source tools. The company is also exploring deployment options such as MCP servers and hosted tools to simplify building agentic applications. Broader options for tool integration foster innovation and collaboration across the AI community.
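One way this can look in code is the sketch below, which connects to a local MCP server and hands the session to Gemini as a tool. It assumes the `mcp` and `google-genai` Python packages; the server command is a hypothetical placeholder, and passing a ClientSession directly in GenerateContentConfig(tools=[...]) is an assumption based on the SDK's documented MCP support.

```python
# Minimal sketch: passing an MCP client session to Gemini as a tool.
# The server command is a hypothetical placeholder; direct ClientSession
# support in tools=[...] is an assumption based on the google-genai SDK docs.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from google import genai
from google.genai import types

client = genai.Client()

# Hypothetical local MCP server exposing a few tools over stdio.
server_params = StdioServerParameters(command="my-mcp-server", args=[])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # The SDK can call the session's tools on the model's behalf.
            response = await client.aio.models.generate_content(
                model="gemini-2.5-flash",
                contents="Use the available tools to answer: what is on my calendar today?",
                config=types.GenerateContentConfig(tools=[session]),
            )
            print(response.text)

asyncio.run(main())
```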
Continuous innovation remains central to Google’s commitment to improving its models and the developer experience, making them more efficient, performant, and responsive to developer feedback. The company is also doubling down on the breadth and depth of fundamental research to push the frontier of Gemini’s capabilities, with more to come.