Unveiling Ironwood’s Unprecedented Capabilities
The relentless march of progress in artificial intelligence (AI) is inextricably linked to advancements in hardware. Google, a perennial leader in AI innovation, has recently introduced its seventh-generation Tensor Processing Unit (TPU), codenamed Ironwood, representing a monumental leap in AI computational power. In its largest deployments, this state-of-the-art AI accelerator delivers compute that exceeds the world’s fastest supercomputer by a factor of 24.
Formally announced at the Google Cloud Next ‘25 conference, Ironwood signifies a strategic evolution in Google’s decade-long commitment to AI chip development. Unlike its predecessors, which were designed to handle both AI training and inference workloads, Ironwood is the first TPU engineered specifically for inference, inaugurating a new epoch of AI-driven applications.
According to Amin Vahdat, Vice President and General Manager of Machine Learning, Systems, and Cloud AI at Google: ‘Ironwood is architected to underpin the next wave of generative AI and its immense computational and communication demands. We term this the “Inference Era,” wherein AI agents will proactively access and generate data to collaboratively furnish insights and responses, transcending the mere provision of data.’
The technical attributes of Ironwood are nothing short of extraordinary. Scaled to a pod of 9,216 chips, it delivers a staggering 42.5 exaflops of AI compute, dwarfing the 1.7 exaflops of El Capitan, currently the world’s most powerful supercomputer. Each individual Ironwood chip offers a peak compute capacity of 4,614 TFLOPs.
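These figures are mutually consistent, which is worth a quick back-of-the-envelope check; the numbers below come from the announcement, the arithmetic is ours:

```python
# Sanity-checking the pod-scale figures cited above.
PER_CHIP_TFLOPS = 4_614     # peak compute per Ironwood chip
CHIPS_PER_POD = 9_216       # chips in the largest Ironwood pod
EL_CAPITAN_EXAFLOPS = 1.7   # cited figure for El Capitan

pod_exaflops = PER_CHIP_TFLOPS * CHIPS_PER_POD / 1e6  # 1 exaflop = 1e6 TFLOPs
print(f"Pod peak: {pod_exaflops:.1f} exaflops")                      # ~42.5
print(f"vs. El Capitan: {pod_exaflops / EL_CAPITAN_EXAFLOPS:.1f}x")  # ~25, in line with the cited factor of 24
```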
Beyond its sheer processing prowess, Ironwood incorporates significant enhancements in memory and bandwidth. Each chip carries 192GB of high-bandwidth memory (HBM), a six-fold increase over the preceding-generation TPU, Trillium, which was unveiled just last year. Memory bandwidth per chip reaches 7.2 TB/s (terabytes per second), a 4.5-fold improvement over Trillium.
In an era of expanding data centers and escalating concern over power consumption, Ironwood also distinguishes itself through energy efficiency. Its performance per watt is twice that of Trillium and nearly 30 times greater than Google’s first Cloud TPU, introduced in 2018.
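The stated multipliers also let us recover the implied previous-generation baselines, a useful cross-check on the spec sheet:

```python
# Deriving the implied Trillium baselines from Ironwood's stated specs
# and improvement factors (illustrative arithmetic only).
IRONWOOD_HBM_GB = 192    # HBM capacity per Ironwood chip
IRONWOOD_BW_TBPS = 7.2   # HBM bandwidth per Ironwood chip (TB/s)

trillium_hbm_gb = IRONWOOD_HBM_GB / 6      # 6x improvement -> 32 GB
trillium_bw_tbps = IRONWOOD_BW_TBPS / 4.5  # 4.5x improvement -> 1.6 TB/s

print(f"Implied Trillium HBM: {trillium_hbm_gb:.0f} GB per chip")
print(f"Implied Trillium bandwidth: {trillium_bw_tbps:.1f} TB/s per chip")
```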
The emphasis on inference marks a pivotal paradigm shift in the AI landscape. In recent years, leading AI research institutions have concentrated on constructing ever-larger foundation models with progressively higher parameter counts. Google’s focus on inference optimization signals a transition to a new phase centered on deployment efficiency and inference capability.
While model training remains indispensable, the number of training iterations is inherently finite. Conversely, as AI technologies become increasingly integrated into diverse applications, inference operations are anticipated to occur billions of times daily. As models grow in complexity, the economic sustainability of these applications becomes inextricably linked to inference costs.
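A toy cost model makes this concrete. Every number below is hypothetical, chosen only to illustrate why recurring inference spend quickly dominates a one-time training cost:

```python
# Illustrative (entirely made-up) numbers: training is a one-time expense,
# while inference cost recurs with every query served.
TRAINING_COST_USD = 50e6     # hypothetical one-time training cost
COST_PER_QUERY_USD = 0.002   # hypothetical cost of a single inference
QUERIES_PER_DAY = 1e9        # "billions of times daily"

daily_inference = COST_PER_QUERY_USD * QUERIES_PER_DAY
print(f"Daily inference spend: ${daily_inference:,.0f}")                        # $2,000,000
print(f"Days to eclipse training cost: {TRAINING_COST_USD / daily_inference:.0f}")  # 25
```

Under these stand-in assumptions, inference spending overtakes the entire training budget in under a month, which is why per-query efficiency drives the economics at scale.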
Over the past eight years, Google’s demand for AI compute has grown roughly tenfold each year, a cumulative increase of about 100 million times. Without specialized architectures such as Ironwood, even the relentless progress of Moore’s Law would struggle to keep pace with this exponential expansion.
Notably, Google’s announcement underscores its focus on ‘thinking models’ capable of executing complex reasoning tasks, rather than merely recognizing patterns. This suggests that Google envisions a future wherein AI extends beyond ever-larger models to models that can decompose problems, perform multi-step reasoning, and emulate human-like thought processes.
Powering the Next Generation of Large Models
Google is positioning Ironwood as the foundational infrastructure for its most advanced AI models, including Gemini 2.5, which boasts natively integrated reasoning capabilities.
Google has also recently unveiled Gemini 2.5 Flash, a smaller iteration of its flagship model designed for latency-sensitive, everyday applications. Gemini 2.5 Flash possesses the ability to dynamically adjust its reasoning depth based on the complexity of the prompt.
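For developers, this adjustable reasoning depth surfaces as a thinking budget in the Gemini API. A minimal sketch using the google-genai Python SDK (the model name and ThinkingConfig parameter follow Google’s public docs at the time of writing; verify against the current SDK release):

```python
# Minimal sketch with the google-genai SDK (pip install google-genai).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Plan a three-course dinner menu around seasonal produce.",
    config=types.GenerateContentConfig(
        # Cap how many tokens the model may spend "thinking";
        # a budget of 0 disables thinking for latency-sensitive calls.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```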
Google also showcased its comprehensive suite of multimodal generative models, encompassing text-to-image, text-to-video, and the newly introduced text-to-music feature, Lyria. A demonstration illustrated how these tools could be seamlessly integrated to produce a complete promotional video for a concert.
Ironwood represents merely one component of Google’s broader AI infrastructure strategy. Google also announced Cloud WAN, a managed wide area network service that empowers enterprises to access Google’s global-scale private network infrastructure.
Furthermore, Google is expanding its software offerings for AI workloads, including Pathways, a machine learning runtime developed by Google DeepMind. Pathways now enables customers to scale model serving across hundreds of TPUs.
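Pathways is surfaced to Cloud customers through JAX, so a generic JAX sharding sketch conveys the flavor of spreading one computation across many TPU chips. Note this is standard JAX, not a Pathways-specific API:

```python
# Generic JAX sketch: shard one computation across all available devices.
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# Arrange whatever accelerators are present into a 1-D logical mesh.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Shard a batch along the "data" axis; each chip holds one slice.
x = jnp.ones((8 * jax.device_count(), 1024))
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))

# jit-compiled ops then run in parallel, one shard per device.
y = jax.jit(lambda a: jnp.tanh(a @ a.T))(x)
print(y.shape, y.sharding)
```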
Fostering AI Agent Collaboration with A2A
Beyond hardware advancements, Google has also articulated its vision for an AI ecosystem centered around multi-agent systems. To facilitate the development of intelligent agents, Google has introduced the Agent-to-Agent (A2A) protocol, designed to enable secure and standardized communication between disparate AI agents.
Google anticipates that 2025 will mark a transformative year for AI, with generative AI applications evolving from answering single questions to solving complex problems through agent systems.
The A2A protocol enables interoperability between agents across different platforms and frameworks, providing them with a common ‘language’ and secure communication channels. This protocol can be regarded as a network layer for intelligent agents, aiming to simplify agent collaboration in complex workflows. By enabling specialized AI agents to collaboratively address tasks of varying complexity and duration, A2A seeks to enhance overall capabilities through synergy.
A2A operates by establishing a standardized mechanism for agents to exchange information and coordinate actions, without requiring them to share underlying code or data structures. This facilitates the creation of more modular and flexible AI systems, where agents can be readily added, removed, or reconfigured as needed.
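Concretely, the early public A2A draft has a client discover a remote agent through its ‘Agent Card’ and then delegate work as a JSON-RPC task over plain HTTP. A rough sketch (the endpoint, method, and field names follow that draft and should be treated as illustrative):

```python
# Rough sketch of an A2A client exchange; endpoints and schemas follow
# the early public A2A draft and are illustrative, not authoritative.
import uuid
import requests

AGENT_BASE = "https://agent.example.com"  # hypothetical remote agent

# 1. Discovery: agents advertise capabilities via a public "Agent Card".
card = requests.get(f"{AGENT_BASE}/.well-known/agent.json").json()
print("Remote agent:", card.get("name"))

# 2. Delegate work as a JSON-RPC 2.0 task.
task = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
        "id": str(uuid.uuid4()),  # task id, reused for follow-ups
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Find three candidate resumes"}],
        },
    },
}
result = requests.post(AGENT_BASE, json=task).json()
print(result["result"]["status"])
```

Because the exchange is ordinary HTTP plus JSON, neither side needs to know anything about the other’s internal framework, which is precisely the decoupling L2A2A is after.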
Google drew a comparison between the MCP and A2A protocols in a blog post:
- MCP (Model Context Protocol) is designed for tool and resource management:
  - It connects agents to tools, APIs, and resources through structured input/output.
  - The Google ADK supports MCP tools, enabling various MCP servers to work with agents.
- A2A (Agent2Agent Protocol) is designed for collaboration between agents:
  - It enables dynamic, multimodal communication between agents without sharing memory, resources, or tools.
  - It is an open standard driven by the community.
  - Examples can be viewed using Google ADK, LangGraph, Crew.AI, and other tools.
In essence, A2A and MCP are complementary: MCP equips agents with tool support, while A2A enables these tool-equipped agents to communicate and collaborate with each other.
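A schematic sketch of that layering, with hypothetical helper objects standing in for a real MCP client session and real A2A peer connections:

```python
# Schematic only: `mcp_session` and `peers` are hypothetical stand-ins
# for a real MCP client library and real A2A connections.

class ToolUsingAgent:
    """Uses MCP 'inward' for tools and data, A2A 'outward' for peers."""

    def __init__(self, mcp_session, peers):
        self.mcp = mcp_session  # MCP: structured access to tools/resources
        self.peers = peers      # A2A: handles to collaborating agents

    def handle_task(self, request: str) -> str:
        # Gather facts with a local MCP tool...
        facts = self.mcp.call_tool("search_docs", {"query": request})
        # ...then delegate synthesis to a specialist peer over A2A.
        return self.peers["summarizer"].send_task(str(facts))
```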
Judging by its initial partners, A2A appears poised to attract attention comparable to MCP. More than 50 companies have joined the initial collaboration, including leading technology firms and top global consulting and systems-integration providers.
Google emphasizes the openness of the protocol, positioning it as a standard mechanism for agents to collaborate, irrespective of the underlying technology framework or service provider. Google outlined five key principles that guided the design of the protocol in collaboration with its partners:
- Embrace Agent Capabilities: A2A focuses on enabling agents to collaborate in their natural, unstructured way, even when they don’t share memory, tools, or context. The protocol aims to enable true multi-agent scenarios rather than restricting agents to acting as mere ‘tools.’
- Build on Existing Standards: The protocol builds upon existing popular standards, including HTTP, SSE, and JSON-RPC, making it easier to integrate with existing IT stacks commonly used by enterprises.
- Secure by Default: A2A is designed to support enterprise-grade authentication and authorization, on par with OpenAPI’s authentication schemes at launch.
- Support Long-Running Tasks: A2A is designed to be flexible, supporting everything from quick tasks to in-depth research that may take hours or even days (particularly when humans are in the loop). Throughout, A2A can provide users with real-time feedback, notifications, and status updates (see the streaming sketch after this list).
- Modality Agnostic: The world of agents is not limited to text, which is why A2A is designed to support various modalities, including audio and video streams.
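For the long-running case, the early A2A draft defines an SSE-based subscription so a client can watch a task progress. A rough sketch (the method name and event shapes follow that draft; treat them as illustrative):

```python
# Sketch of subscribing to streaming task updates over Server-Sent Events;
# "tasks/sendSubscribe" is per the early public A2A draft (illustrative).
import json
import requests

payload = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tasks/sendSubscribe",
    "params": {
        "id": "task-123",
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Run a deep background check"}],
        },
    },
}

# The connection stays open; the agent pushes status updates and
# artifacts as the task progresses, possibly over hours or days.
with requests.post("https://agent.example.com", json=payload, stream=True) as r:
    for line in r.iter_lines():
        if line.startswith(b"data:"):
            event = json.loads(line[len(b"data:"):])
            print("update:", event)
```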
Google provides a compelling example of how A2A can significantly streamline the hiring process.
Within a unified interface such as Agentspace, a hiring manager can delegate an agent to identify suitable candidates based on the job requirements. This agent can interact with specialized agents to source candidates, schedule interviews, and even engage other specialized agents to assist with background checks, thereby enabling intelligent automation of the entire hiring process across diverse systems. This orchestration highlights the transformative potential of agent collaboration. The ability for agents to dynamically connect and leverage each other’s capabilities unlocks new levels of efficiency and intelligence.
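A toy orchestration of that workflow might look like the following; the agent endpoints and the send_task stub are hypothetical, standing in for real A2A calls like the one sketched earlier:

```python
# Hypothetical orchestration of the hiring workflow described above.

AGENTS = {
    "sourcing":   "https://sourcing.example.com",
    "scheduling": "https://scheduling.example.com",
    "background": "https://background.example.com",
}

def send_task(agent_url: str, text: str) -> dict:
    """Stub standing in for the JSON-RPC 'tasks/send' exchange sketched earlier."""
    print(f"-> {agent_url}: {text}")
    return {"shortlist": ["Candidate A", "Candidate B"]}  # stand-in reply

def run_hiring_workflow(job_description: str) -> None:
    result = send_task(AGENTS["sourcing"], f"Find candidates for: {job_description}")
    for candidate in result["shortlist"]:
        send_task(AGENTS["scheduling"], f"Schedule interview with {candidate}")
        send_task(AGENTS["background"], f"Run background check on {candidate}")

run_hiring_workflow("Staff ML engineer, infrastructure team")
```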
Embracing the Model Context Protocol (MCP)
In addition to its endeavors in developing A2A, Google is also embracing the Model Context Protocol (MCP). Just weeks after OpenAI announced its adoption of MCP, Google followed suit. This swift adoption underscores the industry-wide recognition of MCP’s importance and its potential to standardize interactions between AI models and external tools.
Demis Hassabis, CEO of Google DeepMind, recently announced on X that Google will add support for MCP to its Gemini models and SDKs, though he did not provide a specific timeline. Even without concrete dates, the announcement signals Google’s commitment to adopting and promoting the MCP standard, and it reinforces the idea that MCP is becoming a central element in the evolving landscape of AI development.
Hassabis stated: ‘MCP is an excellent protocol that is rapidly becoming the open standard for the AI agent era. I look forward to working with the MCP team and other partners in the industry to advance this technology.’ His endorsement further solidifies MCP’s position as a key enabler for the next generation of AI applications, and his invitation to the MCP team and industry partners highlights the importance of community involvement in shaping AI standards.
Since its release in November 2024, MCP has rapidly gained traction, becoming a simple and standardized way to connect language models with tools and data. This rapid adoption is a testament to the protocol’s effectiveness and ease of use. It has quickly filled a crucial gap in the AI development ecosystem, providing a much-needed common language for models and external resources.
MCP enables AI models to access data from sources such as enterprise tools and software to complete tasks, as well as content libraries and application development environments. The protocol allows developers to establish bidirectional connections between data sources and AI-powered applications, such as chatbots. This bidirectionality is crucial for real-time interaction and dynamic updates between models and their environment.
Developers can expose data interfaces through MCP servers and build MCP clients (such as applications and workflows) to connect to these servers. This server-client architecture provides a flexible and scalable framework for integrating AI models with a wide variety of data sources and applications. The decoupling of data sources and AI models through MCP servers allows for independent development and updates, minimizing dependencies and increasing maintainability. Since Anthropic open-sourced MCP, multiple companies have integrated MCP support into their platforms. This widespread adoption further solidifies MCP’s role as an industry standard.
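A minimal MCP server illustrates that decoupling. This sketch uses the FastMCP helper from the official Python SDK (pip install "mcp"); the server name, tool, and resource here are hypothetical stand-ins:

```python
# Minimal MCP server using the official Python SDK's FastMCP helper.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("hr-data")  # hypothetical server exposing an HR data source

@mcp.tool()
def lookup_employee(name: str) -> dict:
    """Return a (stubbed) employee record for the given name."""
    return {"name": name, "department": "Engineering"}  # stand-in data

@mcp.resource("employees://{name}")
def employee_resource(name: str) -> str:
    """Expose the same record as a readable resource."""
    return f"Employee record for {name}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default; any MCP client can connect
```

Because the server only speaks the protocol, the data source behind it can evolve independently of whichever models or applications consume it.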
The convergence of Ironwood’s hardware advancements, the A2A protocol for agent collaboration, and the adoption of MCP represents a holistic approach to advancing the capabilities and applicability of AI. Google’s commitment to both hardware and software innovation positions it at the forefront of the AI revolution, poised to unlock new possibilities and transform industries across the globe.