The Era of Interconnected AI Agents: MCP & A2A Lead the Way

The Emergence of the Agent Concept

The Agent field has recently drawn unprecedented market attention, as shown by Microsoft’s launch of the GitHub MCP server, Google’s release of the A2A (Agent-to-Agent) communication protocol, and Alipay’s launch of its own MCP server. Although there is still no universally accepted definition of an Agent, the three core components proposed by former OpenAI researcher Lilian Weng, “planning,” “memory,” and “tool use,” are widely recognized as the key to understanding Agents.

The concept of Agents is not new in artificial intelligence, but the rapid development of Large Language Models (LLMs) has greatly expanded what Agents can be used for. An Agent can be seen as an intelligent system that perceives its environment, plans autonomously, and executes tasks. Its core is the ability to simulate human decision-making and to use tools and resources to achieve a given goal.
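
The three components named above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `ToyAgent`, `search`, and `summarize` are hypothetical names, and the "plan" is a hard-coded rule standing in for what a real Agent would ask an LLM to produce.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyAgent:
    """Hypothetical agent showing planning, memory, and tool use."""
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)

    def plan(self, goal: str) -> List[Tuple[str, str]]:
        # Planning: a real Agent would ask an LLM to decompose the goal.
        # Assumption: a fixed two-step plan keeps the sketch runnable.
        return [("search", goal), ("summarize", goal)]

    def run(self, goal: str) -> str:
        result = ""
        for tool_name, argument in self.plan(goal):
            # Tool use: look the tool up by name and call it.
            result = self.tools[tool_name](argument)
            # Memory: record each step so later steps could consult it.
            self.memory.append(f"{tool_name}({argument!r}) -> {result}")
        return result


def search(query: str) -> str:
    # Stand-in for a real search tool.
    return f"3 documents about {query}"


def summarize(topic: str) -> str:
    # Stand-in for a real summarization tool.
    return f"summary of {topic}"


agent = ToyAgent(tools={"search": search, "summarize": summarize})
final = agent.run("MCP")
print(final)         # summary of MCP
print(agent.memory)  # one entry per executed step
```

Keeping the tools in a plain dictionary means new capabilities can be added without touching the loop itself, which is the same separation that tool protocols aim to standardize.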

Current Status of Agent Development: Huge Potential, Low Penetration Rate

Most current Agent applications are an evolution of the chatbot and are bundled into the paid tiers of large model services; only a few Agents, such as Manus and Devin, are sold as standalone paid products. Even Agents with autonomous planning capabilities, such as Deep Research and Manus, still come with many usage limitations. The number of users who can truly experience them is likely small, and there is significant room for improvement before a “blockbuster” application emerges.

However, as the reasoning capabilities of large models continue to improve, Agents are becoming the focus of application innovation. More and more developers and researchers are exploring Agent applications in areas such as intelligent assistants, process automation, and data analysis. The potential of Agents is gradually being unlocked, and there is ample room for future growth.

Large-Scale Agent Applications Imminent: Driven by Multiple Favorable Conditions

Breakthroughs in Model Training

  • Rapid Growth of Context Window: The context window of a large model is the maximum amount of text, measured in tokens, that the model can consider at once. Context windows are growing rapidly, which means that models can better understand long texts and therefore make more accurate decisions. An extended context allows Agents to maintain a more coherent understanding of ongoing tasks and user intentions, which is crucial for complex, multi-step operations.

  • In-depth Application of Reinforcement Learning: Reinforcement learning is a method of training Agents through rewards and punishments. In recent years, reinforcement learning has been widely used in Agent training, enabling Agents to better adapt to complex environments and learn optimal strategies. By interacting with their environment and receiving feedback, Agents can learn to optimize their behavior over time, leading to more effective and efficient task execution.

  • Increasingly Mature Reasoning Models: The reasoning model is the core component of an Agent, responsible for reasoning and judging based on the input information. With in-depth research, reasoning models are becoming increasingly mature, better supporting the various applications of Agents. Advanced reasoning models can handle uncertainty, identify relevant information, and draw logical conclusions, essential for making informed decisions in dynamic environments.
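
To make the context window point concrete, the sketch below shows one simple way an Agent might fit its conversation history into a fixed token budget by keeping only the most recent messages. It is an illustration, not how any particular model or framework does it: `count_tokens` and `fit_to_window` are hypothetical helpers, and counting whitespace-separated words is a stand-in for a real subword tokenizer.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word equals one token.
    # Real models use subword tokenizers, so actual counts differ.
    return len(text.split())


def fit_to_window(messages: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent messages whose combined size fits the window."""
    kept: List[str] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "book a flight to Tokyo",      # 5 tokens under the assumption above
    "prefer a morning departure",  # 4 tokens
    "also reserve a hotel",        # 4 tokens
]
print(fit_to_window(history, max_tokens=8))   # the two most recent messages
print(fit_to_window(history, max_tokens=13))  # all three messages
```

With the smaller budget the original goal ("book a flight to Tokyo") is dropped, which illustrates why a larger window helps an Agent stay coherent across multi-step tasks.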

Thriving Ecosystem

  • Rapid Development of Protocols such as MCP and A2A: MCP (Model Context Protocol) and A2A (Agent-to-Agent) are two important protocols in the Agent ecosystem. Their rapid development makes it easier for Agents to call tools and services and to work with other Agents, thereby achieving more complex functions. Standardized protocols facilitate seamless integration and interoperability between Agents and external resources.

  • Increasingly Convenient Agent Tool Invocation: With technological advancements, the way Agents call external tools and services is becoming increasingly convenient. For example, through APIs (Application Programming Interfaces), Agents can easily access various data sources and online services, thereby expanding their capabilities. Easy access to a wide range of tools and services enables Agents to perform complex tasks that would otherwise be impossible.

In November 2024, Anthropic released and open-sourced MCP, aiming to standardize how external data and tools provide context to models. By reducing the friction of integrating new tools and data sources, this standardization should greatly promote the Agent ecosystem and allow Agents to make better use of external resources.

MCP and A2A: Key to Agent Interconnection

MCP Protocol: Connecting Agents to the External World

The main goal of MCP is to achieve “one-click interconnection” between Agents and external data and tools. Through MCP, Agents can access external resources such as databases, APIs, and web services in a uniform way. This enables Agents to better understand their environment and make more informed decisions. The protocol also allows Agents to dynamically discover and use relevant resources based on their current needs, improving their adaptability and effectiveness.
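
The discover-then-use pattern described above can be sketched as follows. MCP messages are JSON-RPC 2.0, and the method names `tools/list` and `tools/call` come from the MCP specification; everything else here is an assumption for illustration. The `get_weather` tool, its schema, and the `handle` function are hypothetical, and `handle` is a toy in-process stand-in, not a real MCP server or SDK.

```python
import json
from typing import Any, Dict

# Hypothetical tool catalog; the tool name and schema are illustrative only.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_weather": {
        "description": "Return a canned weather report for a city.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}


def error(request_id: Any, code: int, message: str) -> Dict[str, Any]:
    # Standard JSON-RPC 2.0 error envelope.
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Toy dispatcher for two MCP-style methods (not a real server)."""
    method = request["method"]
    if method == "tools/list":
        result: Dict[str, Any] = {
            "tools": [{"name": name, **spec} for name, spec in TOOLS.items()]
        }
    elif method == "tools/call":
        params = request["params"]
        if params["name"] not in TOOLS:
            return error(request["id"], -32602, "Unknown tool")
        city = params["arguments"]["city"]
        result = {"content": [{"type": "text", "text": f"Sunny in {city}"}]}
    else:
        return error(request["id"], -32601, "Method not found")
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


# Discovery: ask which tools exist.
listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
tool_names = [tool["name"] for tool in listing["result"]["tools"]]

# Invocation: call one of the discovered tools by name.
reply = handle({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": tool_names[0], "arguments": {"city": "Hangzhou"}},
})
print(tool_names)
print(json.dumps(reply["result"]))
```

Because the Agent learns tool names and input schemas at runtime, a new tool can be exposed without changing the Agent's code, which is the practical meaning of dynamic discovery here.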

A2A Protocol: Building a Communication Bridge between Agents

The goal of the A2A protocol is to enable communication between Agents. Through the A2A protocol, Agents can collaborate with each other to complete complex tasks. This is of great significance for building distributed intelligent systems. By sharing information, coordinating actions, and delegating tasks, Agents can work together to solve problems that are too complex for any single Agent to handle.
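
The delegation idea above can be illustrated with a small in-process sketch. It does not follow the A2A wire format or use any A2A library; `Task`, `WorkerAgent`, and `Coordinator` are hypothetical names chosen only to show one Agent routing sub-tasks to others according to the skills they advertise.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    skill: str    # which capability the task needs
    payload: str  # free-form task description


class WorkerAgent:
    """Hypothetical Agent that advertises skills and handles matching tasks."""

    def __init__(self, name: str, skills: List[str]) -> None:
        self.name = name
        self.skills = skills

    def handle(self, task: Task) -> str:
        return f"{self.name} completed {task.skill}: {task.payload}"


class Coordinator:
    """Hypothetical Agent that delegates tasks it cannot do itself."""

    def __init__(self, workers: List[WorkerAgent]) -> None:
        # Index workers by advertised skill; the first worker wins on ties.
        self.by_skill: Dict[str, WorkerAgent] = {}
        for worker in workers:
            for skill in worker.skills:
                self.by_skill.setdefault(skill, worker)

    def delegate(self, tasks: List[Task]) -> List[str]:
        results = []
        for task in tasks:
            worker = self.by_skill.get(task.skill)
            if worker is None:
                results.append(f"no agent offers {task.skill}")
            else:
                results.append(worker.handle(task))
        return results


coordinator = Coordinator([
    WorkerAgent("flight-agent", ["book_flight"]),
    WorkerAgent("hotel-agent", ["book_hotel"]),
])
results = coordinator.delegate([
    Task("book_flight", "Shanghai to Tokyo"),
    Task("book_hotel", "2 nights in Tokyo"),
    Task("rent_car", "Tokyo"),
])
for line in results:
    print(line)
```

In a real deployment the workers would be separate services reached over the network, and the skill index would be built from whatever capability descriptions those services publish.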

A2A targets agent-to-agent communication, while MCP targets the connection between Agents and external tools and data. In the more complex case where “tools may also be packaged as Agents,” however, the functions of the two may overlap. This overlap creates competition, which helps reduce the cost of tool invocation and communication for large models. It should also drive technological progress and lead to more efficient and versatile solutions for Agent interaction, ultimately benefiting the entire Agent ecosystem.

Agent Development Prospects

End-to-End Agents: No Human Intervention Required

There are currently many “intelligent Agents” on the market, but a significant portion of them are built on platforms such as Coze and Dify and require humans to write workflows in advance. These Agents are closer to layered prompt engineering and sit at the more elementary end of the Agent spectrum. Such platforms simplify the process of building Agents but often require significant human intervention in defining the Agent’s behavior.

More advanced Agents are “end-to-end”: a task is given to the Agent, and the Agent automatically delivers the result the human requires. The user only needs to state a goal, and the Agent autonomously plans and executes the task until the goal is complete. Advanced Agents of this kind, such as those at levels L3, L4, and L5, are more in line with human needs and will become an important direction for future Agent development. They can handle complex tasks with minimal human oversight, adapting to changing circumstances and learning from experience.

Agents Assisting Robots and Autonomous Driving

When the definition of Agent is applied to embodied intelligence, robots and vehicles controlled by large models turn out to be Agents as well. This is especially true of robots: the current bottleneck in robot development is not the “cerebellum,” which handles “how to make physical actions,” but the “brain,” which decides “what kind of physical actions to make,” and the latter falls within the scope of Agents. By integrating advanced reasoning, planning, and perception capabilities, Agents can enable robots to perform complex tasks in unstructured environments.

In the field of robotics, Agents can help robots better understand the environment and make more reasonable decisions. For example, Agents can autonomously plan the robot’s movement path and perform various tasks based on the objects and people in the environment. This allows robots to adapt to dynamic environments and perform tasks with greater autonomy and efficiency.

In the field of autonomous driving, Agents can help vehicles better perceive the surrounding environment and make safer driving decisions. For example, Agents can autonomously adjust the vehicle’s speed and direction based on traffic signals, other vehicles, and pedestrians, thereby avoiding traffic accidents. Agents can process vast amounts of sensor data in real-time, enabling vehicles to make informed decisions and navigate safely.

Agent Interconnection and AI Native Networks

In the future, perhaps all Agents will be able to communicate with each other, self-organize, and self-negotiate, building a collaborative network that is cheaper and more efficient than the existing Internet. The Chinese developer community is also building protocols such as ANP, which aims to become the HTTP of the Agent Internet era. For Agent identity authentication, technologies such as DID can be used. This vision of interconnected Agents promises to unlock new levels of automation, collaboration, and innovation.

  • Agent Interconnection: The interconnection between Agents can realize resource sharing and collaboration, thereby improving the efficiency of the entire system. For example, different Agents can share data, tools, and services, thereby jointly completing complex tasks. Interconnected Agents can leverage each other’s strengths and capabilities, leading to more efficient and effective problem-solving.

  • AI Native Networks: An AI native network is a network designed specifically for artificial intelligence applications. Such a network can provide higher bandwidth, lower latency, and stronger security, thereby better supporting the various applications of Agents. These networks are optimized for the demanding requirements of AI workloads, enabling Agents to communicate and collaborate with minimal delay and maximum reliability.

  • DID Technology: DID (Decentralized Identifier) is a decentralized identity technology. With a DID, each Agent can have its own verifiable identity, which makes communication more secure and reliable. DID technology can help Agents establish trust and interact securely, even in decentralized environments.
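
As a small illustration of what a DID looks like, the sketch below splits an identifier of the form `did:<method>:<method-specific-id>` into its parts. The regular expression is a simplified approximation of the W3C DID syntax, and `parse_did` and `ParsedDID` are hypothetical names. It only checks the shape of the string; real authentication would also involve resolving the DID to its DID document and verifying signatures, which is not shown here.

```python
import re
from typing import NamedTuple, Optional

# Simplified approximation of the DID syntax: a lowercase method name,
# then a method-specific id that must not end with a colon.
DID_PATTERN = re.compile(
    r"did:([a-z0-9]+):([A-Za-z0-9._:%-]*[A-Za-z0-9._%-])"
)


class ParsedDID(NamedTuple):
    method: str
    identifier: str


def parse_did(did: str) -> Optional[ParsedDID]:
    """Return the method and identifier, or None if the shape is invalid."""
    match = DID_PATTERN.fullmatch(did)
    if match is None:
        return None
    return ParsedDID(method=match.group(1), identifier=match.group(2))


print(parse_did("did:example:123456789abcdefghi"))
print(parse_did("did:web:agent.example.com"))
print(parse_did("not-a-did"))  # None
```

The method segment tells a resolver which mechanism to use when looking up the identity, so an Agent receiving a message can decide how to verify the sender before trusting it.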

The development of Agent technology will bring tremendous changes. The future Internet will no longer be a simple information transmission network but a collaborative network full of intelligence, enabling new forms of communication, collaboration, and automation. As Agents become more sophisticated and interconnected, they will play an increasingly important role in many aspects of our lives, from personal assistants to industrial automation to scientific discovery. The era of interconnected AI Agents is poised to change the way we live, work, and interact with the world around us.