Securing MCP via Prompt-Injection-Style Defenses | en

Tenable Research has unveiled groundbreaking research that redefines the approach to a widely discussed AI vulnerability. In a detailed analysis, Tenable’s Ben Smith demonstrates how techniques akin to prompt injection can be effectively repurposed to audit, monitor, and even create firewalls for Large Language Model (LLM) tool calls operating within the increasingly popular Model Context Protocol (MCP).

The Model Context Protocol (MCP), a novel standard developed by Anthropic, facilitates the integration of AI chatbots with external tools, enabling them to perform tasks autonomously. This convenience, however, introduces new security challenges. Attackers can insert hidden instructions, known as prompt injection, or introduce malicious tools to manipulate the AI into violating its own rules. Tenable’s research comprehensively examines these risks and proposes a unique solution: leveraging the same techniques used in attacks to create robust defenses that monitor, inspect, and control every tool an AI attempts to use.

The Critical Importance of Understanding MCP Security

As businesses increasingly integrate LLMs with critical business tools, it is crucial for CISOs, AI engineers, and security researchers to fully understand the risks and defensive opportunities presented by MCP.

Ben Smith, senior staff research engineer at Tenable, notes that "MCP is a rapidly evolving and immature technology that’s reshaping how we interact with AI. MCP tools are easy to develop and plentiful, but they do not embody the principles of security by design and should be handled with care. So, while these new techniques are useful for building powerful tools, those same methods can be repurposed for nefarious means. Don’t throw caution to the wind; instead, treat MCP servers as an extension of your attack surface.”

Key Highlights from the Research

Cross-Model Behavior Varies:
- Claude Sonnet 3.7 and Gemini 2.5 Pro Experimental consistently invoked the logger and exposed parts of the system prompt.
- GPT-4o also inserted the logger but produced varying (and sometimes hallucinated) parameter values in each run.
Security Upside: The same mechanisms used by attackers can be used by defenders to audit toolchains, detect malicious or unknown tools, and build guardrails within MCP hosts.
Explicit User Approval: MCP already requires explicit user approval before any tool executes. This research emphasizes the need for strict least-privilege defaults and thorough individual tool review and testing.

Deep Dive into the Model Context Protocol (MCP)

The Model Context Protocol (MCP) represents a paradigm shift in how AI models interact with the external world. Unlike traditional AI systems that operate in isolation, MCP allows AI models to seamlessly integrate with external tools and services, enabling them to perform a wide range of tasks, from accessing databases and sending emails to controlling physical devices. This integration opens up new possibilities for AI applications, but it also introduces new security risks that must be carefully addressed.

Understanding the Architecture of MCP

At its core, MCP consists of several key components that work together to facilitate communication between AI models and external tools. These components include:

The AI Model: This is the central intelligence that drives the system. It could be a large language model (LLM) like GPT-4 or a specialized AI model designed for a specific task. This model is responsible for generating requests for external tools based on user input and its internal logic. The choice of AI model can significantly impact the security posture of the MCP, as different models may exhibit varying levels of susceptibility to prompt injection attacks. For instance, more advanced models might be better at discerning malicious prompts, while simpler models could be more easily manipulated.
The MCP Server: This acts as an intermediary between the AI model and the external tools. It receives requests from the AI model, validates them, and forwards them to the appropriate tool. The MCP server is a crucial component for security, as it is responsible for enforcing access controls, validating inputs, and preventing unauthorized actions. A well-designed MCP server should incorporate robust security features, such as input sanitization, authentication, and authorization mechanisms. It also needs to be able to handle a wide variety of external tools and services in a secure manner. The server might also perform rate limiting to prevent denial-of-service attacks.
The External Tools: These are the services and applications that the AI model interacts with. They can include databases, APIs, web services, and even physical devices. The security of the external tools is paramount, as they are the ultimate target of potential attacks. Each external tool should be carefully vetted to ensure that it is secure and does not contain any vulnerabilities that could be exploited. Regular security audits and penetration testing should be performed to identify and address any potential weaknesses. Secure coding practices should be followed during development, and appropriate security controls should be implemented to protect sensitive data.
The User Interface: This provides a way for users to interact with the AI system and control its behavior. It may also provide a way for users to approve or deny tool requests. The user interface should be designed with security in mind, providing clear and concise information to the user about the actions that the AI system is taking. It should also allow users to easily review and approve or deny tool requests, providing an important layer of protection against unauthorized actions. Furthermore, the user interface should be protected against cross-site scripting (XSS) and other common web vulnerabilities.

The Benefits of MCP

The Model Context Protocol offers several significant advantages over traditional AI systems:

Increased Functionality: By integrating with external tools, AI models can perform a much wider range of tasks than they could on their own. This allows AI to be used in a much broader range of applications, from automating complex business processes to controlling physical devices. For example, an AI model could use an external database to look up customer information, send an email to confirm an order, or control a robot to perform a physical task.
Improved Efficiency: MCP can automate tasks that would otherwise require human intervention, saving time and resources. This can lead to significant cost savings and increased productivity. For example, an AI model could automatically respond to customer inquiries, process invoices, or schedule appointments, freeing up human employees to focus on more complex and strategic tasks.
Enhanced Flexibility: MCP allows AI models to adapt to changing circumstances and respond to new information in real-time. This is particularly important in dynamic environments where conditions can change rapidly. For example, an AI model could use real-time weather data to adjust the settings of a climate control system or use traffic data to optimize the route of a delivery truck.
Greater Scalability: MCP can be easily scaled to accommodate a growing number of users and tools. This makes it a suitable solution for organizations of all sizes, from small businesses to large enterprises. The scalability of MCP allows organizations to deploy AI solutions that can grow with their needs, without requiring significant infrastructure investments.

The Emerging Security Risks in MCP

Despite its benefits, MCP introduces several security risks that must be carefully considered. These risks stem from the fact that MCP allows AI models to interact with the external world, which opens up new avenues for attackers to exploit. The interconnected nature of the system makes it vulnerable to various attacks that target different components, including the AI model, the MCP server, the external tools, and the user interface. Therefore, a layered security approach is essential to mitigate these risks effectively.

Prompt Injection Attacks

Prompt injection attacks are a particularly concerning threat to MCP systems. In a prompt injection attack, an attacker crafts a malicious input that manipulates the AI model into performing unintended actions. This can be done by injecting malicious commands or instructions into the AI model’s input, which the model then interprets as legitimate commands. The attacker’s ability to craft such prompts relies on understanding the AI model’s behavior and vulnerabilities, as well as the structure of the MCP.

For example, an attacker could inject a command that tells the AI model to delete all of the data in a database or to send sensitive information to an unauthorized party. The potential consequences of a successful prompt injection attack can be severe, including data breaches, financial losses, and reputational damage. The impact of prompt injection can extend beyond data breaches; it can also lead to the manipulation of physical systems if the AI model controls such devices. For example, an attacker could manipulate the AI to alter production parameters in a manufacturing plant, leading to product defects or even safety hazards.

Malicious Tool Integration

Another significant risk is the integration of malicious tools into the MCP ecosystem. An attacker could create a tool that appears to be legitimate but actually contains malicious code. When the AI model interacts with this tool, the malicious code could be executed, potentially compromising the entire system. This highlights the importance of robust tool vetting processes.

For example, an attacker could create a tool that steals user credentials or installs malware on the system. It is crucial to carefully vet all tools before integrating them into the MCP ecosystem to prevent the introduction of malicious code. This vetting process should include code reviews, security audits, and penetration testing. Furthermore, organizations should implement a mechanism for monitoring the behavior of external tools and detecting any suspicious activity. A robust sandbox environment can be used to test new tools before they are deployed in a production environment.

Privilege Escalation

Privilege escalation is another potential security risk in MCP systems. If an attacker can gain access to an account with limited privileges, they may be able to exploit vulnerabilities in the system to gain higher-level privileges. This could allow the attacker to access sensitive data, modify system configurations, or even take control of the entire system. Proper access control mechanisms are paramount.

Exploiting vulnerabilities might involve leveraging flaws in the AI model’s access control logic, the MCP server’s authentication mechanisms, or the external tools’ authorization procedures. To prevent privilege escalation, organizations should enforce the principle of least privilege, granting users only the minimum level of access necessary to perform their job functions. Regular security audits and penetration testing can help identify and address any potential vulnerabilities that could be exploited for privilege escalation.

Data Poisoning

Data poisoning involves injecting malicious data into the training data used to build AI models. This can corrupt the model’s behavior, causing it to make incorrect predictions or take unintended actions. In the context of MCP, data poisoning could be used to manipulate the AI model into interacting with malicious tools or to perform other harmful actions. This is a long-term attack that can be difficult to detect.

The effects of data poisoning can be subtle and may not be immediately apparent, making it difficult to detect. To mitigate the risk of data poisoning, organizations should carefully vet the data sources used to train AI models and implement data validation techniques to detect and remove malicious data. Regular retraining of AI models with fresh, clean data can also help to mitigate the effects of data poisoning. Furthermore, organizations should monitor the performance of AI models and investigate any unexpected or unusual behavior.

Lack of Visibility and Control

Traditional security tools are often ineffective at detecting and preventing attacks against MCP systems. This is because MCP traffic is often encrypted and can be difficult to distinguish from legitimate traffic. As a result, it can be challenging to monitor AI model activity and identify malicious behavior. Enhanced monitoring and logging are critical.

The dynamic and complex nature of MCP systems makes it difficult to apply traditional security monitoring techniques. Organizations need to implement specialized security tools that can monitor AI model activity, analyze MCP traffic, and detect suspicious behavior. This requires a deep understanding of the MCP architecture and the potential attack vectors. Furthermore, organizations should implement robust logging and auditing mechanisms to track all activity within the MCP system. These logs can be used to investigate security incidents and identify potential vulnerabilities.

Turning the Tables: Using Prompt Injection for Defense

Tenable’s research demonstrates that the same techniques used in prompt injection attacks can be repurposed to create robust defenses for MCP systems. By crafting carefully designed prompts, security teams can monitor AI model activity, detect malicious tools, and build guardrails to prevent attacks. This innovative approach provides a powerful means of defending against prompt injection and other AI-related threats.

Auditing Toolchains

One of the key defensive applications of prompt injection is auditing toolchains. By injecting specific prompts intothe AI model’s input, security teams can track which tools the AI model is using and how it is interacting with them. This information can be used to identify suspicious activity and to ensure that the AI model is only using authorized tools. This proactive approach provides valuable insights into the AI model’s behavior.

The auditing process involves injecting prompts that trigger the AI model to reveal the tools it is using and the data it is exchanging with them. This information can be used to create a comprehensive inventory of all tools used by the AI model and to identify any unauthorized or suspicious tools. The audit logs can be analyzed to detect any anomalies or patterns that might indicate malicious activity. This provides an effective way to ensure that the AI model is only using authorized tools and that it is interacting with them in a secure manner.

Detecting Malicious or Unknown Tools

Prompt injection can also be used to detect malicious or unknown tools. By injecting prompts that trigger specific behaviors, security teams can identify tools that are acting suspiciously or that are not authorized to be used. This can help to prevent the AI model from interacting with malicious tools and to protect the system from attack. Anomaly detection is a key element of this defense.

The detection process involves injecting prompts that are designed to trigger specific behaviors in the AI model, such as accessing sensitive data or performing unauthorized actions. If the AI model attempts to use a tool that is not authorized or if it exhibits suspicious behavior, the security team can be alerted. This allows them to take immediate action to prevent the AI model from interacting with the malicious tool and to protect the system from attack.

Building Guardrails Inside MCP Hosts

Perhaps the most powerful defensive application of prompt injection is building guardrails inside MCP hosts. By injecting prompts that enforce specific security policies, security teams can prevent the AI model from performing unauthorized actions or accessing sensitive data. This can help to create a secure environment for AI model execution and to protect the system from attack. These guardrails act as a dynamic firewall.

The guardrails can be implemented by injecting prompts that define the boundaries of acceptable AI model behavior. These prompts can be used to restrict the AI model’s access to sensitive data, prevent it from performing unauthorized actions, and enforce security policies. This creates a secure environment for AI model execution and helps to protect the system from attack. The guardrails can be dynamically adjusted based on the evolving threat landscape.

The Importance of Explicit User Approval

The research underscores the critical need for explicit user approval before any tool executes within the MCP environment. MCP already incorporates this requirement, but the findings reinforce the necessity of strict least-privilege defaults and thorough individual tool review and testing. This approach ensures that users retain control over the AI system and can prevent it from performing unintended actions. Human oversight remains critical.

Least-Privilege Defaults

The principle of least privilege dictates that users should only be granted the minimum level of access necessary to perform their job functions. In the context of MCP, this means that AI models should only be granted access to the tools and data that they absolutely need to perform their tasks. This reduces the potential impact of a successful attack and limits the attacker’s ability to escalate privileges. Limiting the scope of access reduces the attack surface.

This can be implemented by carefully configuring the AI model’s access controls and by using role-based access control (RBAC) to manage user permissions. Furthermore, organizations should regularly review and update access controls to ensure that they remain appropriate and effective. This helps to prevent unauthorized access to sensitive data and to limit the potential damage from a successful attack.

Thorough Tool Review and Testing

Before integrating any tool into the MCP ecosystem, it is crucial to thoroughly review and test it to ensure that it is secure and does not contain any malicious code. This should involve a combination of automated and manual testing techniques, including code analysis, penetration testing, and vulnerability scanning. This is a crucial step in securing the MCP.

The review process should involve a team of security experts who can assess the tool’s security posture and identify any potential vulnerabilities. The testing process should include a variety of techniques, such as static code analysis, dynamic analysis, and penetration testing. The results of the review and testing should be carefully documented and used to make informed decisions about whether to integrate the tool into the MCP ecosystem.

Implications and Recommendations

Tenable’s research has significant implications for organizations that are using or planning to use MCP. The findings highlight the importance of understanding the security risks associated with MCP and of implementing appropriate security measures to mitigate those risks. A proactive security posture is essential.

Key Recommendations

Implement robust input validation: All input to the AI model should be carefully validated to prevent prompt injection attacks. This should include filtering out malicious commands and instructions and limiting the length and complexity of input. Input sanitization is key.
Enforce strict access controls: Access to sensitive data and tools should be strictly controlled to prevent unauthorized access. This should involve using strong authentication mechanisms and implementing the principle of least privilege. Access controls are foundational.
Monitor AI model activity: AI model activity should be closely monitored to detect suspicious behavior. This should include logging all tool requests and responses and analyzing the data for anomalies. Monitoring provides valuable insights.
Implement a robust incident response plan: Organizations should have a robust incident response plan in place to deal with security incidents involving MCP systems. This should include procedures for identifying, containing, and recovering from attacks. Preparation is paramount.
Stay informed: The MCP landscape is constantly evolving, so it is important to stay informed about the latest security risks and best practices. This can be done by subscribing to security mailing lists, attending security conferences, and following security experts on social media. Continuous learning is essential.

By following these recommendations, organizations can significantly reduce the risk of attacks against their MCP systems and protect their sensitive data. The future of AI depends on our ability to build secure and trustworthy systems, and that requires a proactive and vigilant approach to security. Security must be a priority.

updated at 2025-05-06

# Prompt Engineering # Anthropic # Claude