LLMs Generate Insecure Code by Default

LLMs and Insecure Code Generation: The Default Scenario

Backslash Security’s recent research highlights a significant issue: Large Language Models (LLMs), including prominent models such as GPT-4.1, Claude, and Gemini, tend to generate insecure code by default. Unless they are given explicit security-focused guidance, the code these advanced AI systems produce is frequently susceptible to common weaknesses and potential exploits. The encouraging finding is that the security posture of LLM-generated code improves substantially when additional security guidance is provided or rules-based governance is applied.

In response to these findings, Backslash Security has introduced the Backslash Model Context Protocol (MCP) Server, along with Rules and Extensions designed for agentic Integrated Development Environments (IDEs). These tools are engineered to address the vulnerabilities identified in LLM-generated code and to give developers the means to build more secure, robust applications that are less prone to exploits.

The core issue is that, without proper direction, LLMs do not prioritize security when generating code. The research underscores the need for developers and organizations to proactively build security considerations into their AI-driven development workflows.

Naive Prompts: A Recipe for Vulnerability

The research methodology involved subjecting seven different versions of popular LLMs, including OpenAI’s GPT models, Anthropic’s Claude, and Google’s Gemini, to a series of tests. The primary objective was to evaluate the extent to which different prompting techniques influenced the models’ capacity to generate secure code. The security of the code generated was assessed based on its resilience against ten Common Weakness Enumeration (CWE) use cases. These CWEs represent a diverse range of common software vulnerabilities that are frequently exploited by attackers.

The consistent pattern across these tests was that code security improved as the prompting techniques became more sophisticated. The overarching finding, however, was that every LLM tested tended to produce insecure code when operating without specific security guidance. In their default configurations, these models do not prioritize security and often fail to address prevalent coding weaknesses that can lead to significant vulnerabilities.

When presented with simple, “naive” prompts that made no mention of security considerations or requirements, every LLM under evaluation generated insecure code vulnerable to at least four of the ten common CWEs. This starkly illustrates the lack of security awareness these models exhibit without specific, intentional guidance: they appear to prioritize functionality and efficiency over security, a gap that must be addressed to leverage these AI tools responsibly.
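
To make this concrete, consider a hypothetical illustration (not an example taken from the study itself): asked simply to “write a function that looks up a user by username,” a model will often return string-concatenated SQL like the Python sketch below, which is vulnerable to SQL injection (CWE-89), one of the most common weakness classes.

    import sqlite3

    def get_user(conn: sqlite3.Connection, username: str):
        # Vulnerable: untrusted input is concatenated directly into the query,
        # so an attacker can inject arbitrary SQL (CWE-89).
        query = "SELECT id, email FROM users WHERE username = '" + username + "'"
        return conn.execute(query).fetchone()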

The Impact of Security-Focused Prompts

The study revealed that prompts that generally specified a need for security led to more secure results. This indicates that LLMs are, in fact, capable of producing more secure code when explicitly instructed to do so. Furthermore, the study found that prompts that specifically requested code compliant with Open Web Application Security Project (OWASP) best practices yielded even better results. OWASP, as a non-profit foundation, plays a critical role in improving the security of software by providing standards, tools, and resources.
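
Continuing the illustrative example above (again, not drawn from the study’s own outputs), a prompt that explicitly asks for secure, OWASP-aligned code typically steers a model toward a parameterized query, which removes the injection path:

    import sqlite3

    def get_user(conn: sqlite3.Connection, username: str):
        # Safer: a parameterized query keeps user input out of the SQL text,
        # the standard OWASP-recommended defense against injection (CWE-89).
        query = "SELECT id, email FROM users WHERE username = ?"
        return conn.execute(query, (username,)).fetchone()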

Despite these improvements achieved through more sophisticated prompts, the research also found that some code vulnerabilities persisted in five out of the seven LLMs tested. This underscores the ongoing challenges associated with consistently generating secure code with LLMs. Even when models are explicitly instructed to prioritize security and adhere to industry best practices, there remains a risk of vulnerabilities slipping through, highlighting the need for robust security testing and review processes.

The research demonstrated that the most effective approach was to use prompts bound to specific rules defined by Backslash to address particular CWEs. These rules-based prompts produced code that was not vulnerable to any of the tested CWEs, strongly suggesting that specific, targeted guidance, with clear instructions on how to avoid particular weaknesses, is the crucial factor in ensuring the security of generated code.
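
Backslash has not published the exact prompts used in the study, but the three tiers it describes can be roughly paraphrased as follows; the wording below is illustrative only.

    # Illustrative paraphrases of the three prompt tiers described in the research;
    # the actual prompts used by Backslash are not public.
    NAIVE_PROMPT = "Write a Flask endpoint that saves an uploaded file to disk."

    SECURITY_AWARE_PROMPT = (
        "Write a Flask endpoint that saves an uploaded file to disk. "
        "Make sure the code is secure and follows OWASP best practices."
    )

    RULE_BOUND_PROMPT = (
        "Write a Flask endpoint that saves an uploaded file to disk.\n"
        "Follow these rules:\n"
        "- Sanitize the file name to prevent path traversal (CWE-22).\n"
        "- Restrict uploads to an allow-list of file extensions (CWE-434).\n"
        "- Never build filesystem paths by concatenating raw user input."
    )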

Performance Variations Among LLMs

Across all prompts, OpenAI’s GPT-4o demonstrated the lowest performance, achieving a secure code result of only 1 out of 10 when using “naive” prompts. Even when prompted to generate secure code, it still produced insecure outputs vulnerable to eight out of ten issues. GPT-4.1 did not perform significantly better with naive prompts, scoring 1.5 out of 10. These results suggest that certain LLMs may be less adept at handling security considerations, even when prompted to do so.

In contrast, Claude 3.7 Sonnet emerged as the best performer among the GenAI tools tested. It scored 6 out of 10 using naive prompts and a perfect 10 out of 10 when using security-focused prompts. This suggests that some LLMs are inherently better equipped to handle security considerations, even in the absence of explicit instructions. The superior performance of Claude 3.7 Sonnet highlights the importance of carefully selecting the right LLM for security-sensitive applications.

The variations in performance observed across different LLMs underscore the need for developers and organizations to carefully evaluate the security capabilities of different models and to select the models that are best suited for their specific needs. Not all LLMs are created equal when it comes to security, and a thorough evaluation is essential for making informed decisions.

Backslash Security’s Solutions for Safe Vibe Coding

To address the issues revealed by its LLM prompt testing, Backslash Security is introducing several new features designed to enable safe vibe coding. Vibe coding refers to the increasingly popular practice of generating code by prompting AI tools such as LLMs; however, this approach also introduces new security risks.

Backslash AI Rules & Policies

Backslash AI Rules & Policies provide machine-readable rules that can be injected into prompts to ensure CWE coverage. These rules can be used with tools like Cursor, a popular AI-powered code editor. Additionally, AI policies control which AI rules are active in IDEs through the Backslash platform, allowing organizations to customize their security settings. This offers a structured approach to enforcing security best practices and mitigating common vulnerabilities.
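
Backslash has not published its rule format, so the sketch below only illustrates the general mechanism of rule injection: machine-readable rules prepended to the developer’s prompt before it reaches the model. The rule fields, rule texts, and helper function are hypothetical.

    # Hypothetical sketch of rule injection; the actual Backslash rule format
    # and delivery mechanism are not public.
    RULES = [
        {"cwe": "CWE-89", "text": "Use parameterized queries; never concatenate input into SQL."},
        {"cwe": "CWE-798", "text": "Never hard-code credentials; read secrets from the environment or a vault."},
    ]

    def build_prompt(task: str, rules: list[dict]) -> str:
        # Prepend the active rules so the model sees them before the task itself.
        rule_lines = "\n".join(f"- [{r['cwe']}] {r['text']}" for r in rules)
        return f"Follow these security rules:\n{rule_lines}\n\nTask: {task}"

    print(build_prompt("Write a login handler for our Flask app.", RULES))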

Backslash IDE Extension

The Backslash IDE Extension integrates directly into developers’ existing workflows, allowing them to receive Backslash security reviews on code written by both humans and AI. This integration is crucial for ensuring that security considerations are addressed throughout the development process, not just at the end. By embedding security checks into the IDE, developers can identify and fix vulnerabilities early in the development cycle, before they make their way into production.

Backslash Model Context Protocol (MCP) Server

The Backslash Model Context Protocol (MCP) Server is a context-aware API that conforms to the MCP standard. It connects Backslash to AI tools, enabling secure code generation, scanning, and fixes. MCP gives AI assistants a common protocol for connecting to external tools and data sources, which facilitates the development of secure AI-powered applications; the Backslash MCP Server provides a centralized, consistent way to manage security across different AI tools and platforms.
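
At the protocol level, MCP clients and servers exchange JSON-RPC 2.0 messages: an IDE or agent typically discovers a server’s tools via tools/list and invokes one via tools/call. The request below sketches only that generic shape; the tool name and arguments are hypothetical placeholders, not Backslash’s actual API.

    import json

    # Generic shape of an MCP tools/call request (JSON-RPC 2.0). The tool name
    # "scan_snippet" and its arguments are hypothetical, not tools documented
    # for the Backslash MCP Server.
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "scan_snippet",
            "arguments": {"language": "python", "code": "eval(user_input)"},
        },
    }

    print(json.dumps(request, indent=2))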

Addressing the Challenges of AI-Generated Code

Yossi Pik, co-founder and CTO of Backslash Security, emphasizes the challenges that AI-generated code poses to security teams. He notes that “AI-generated code – or vibe coding – can feel like a nightmare for security teams. It creates a flood of new code and brings LLM risks like hallucinations and prompt sensitivity.” Hallucinations refer to instances where LLMs generate incorrect or nonsensical information, while prompt sensitivity refers to the tendency of LLMs to produce different outputs based on subtle variations in the input prompt. These factors can make it difficult to ensure the security and reliability of AI-generated code.

However, Pik also believes that AI can be a valuable tool for AppSec teams when used with the right controls. He argues that “with the right controls – like org-defined rules and a context-aware MCP server plugged into a purpose-built security platform – AI can actually give AppSec teams more control from the start.” Backslash Security aims to provide these controls through its dynamic policy-based rules, context-sensitive MCP server, and IDE extension, all of which are designed for the new coding era. The key is to strike a balance between leveraging the benefits of AI and mitigating the associated risks.

The Implications of Insecure AI-Generated Code

The findings from Backslash Security’s research have significant implications for the software development industry. As AI-powered code generation tools become increasingly prevalent, it is crucial to understand the risks associated with relying on these tools without proper security measures in place. The widespread adoption of AI code generation requires a fundamental shift in how security is approached in the software development lifecycle.

Increased Vulnerability to Cyberattacks

Insecure AI-generated code can create new vulnerabilities that cybercriminals can exploit. These vulnerabilities can lead to data breaches, system compromise, and other security incidents. The more code that is generated by AI without proper security checks, the greater the attack surface becomes, increasing the likelihood of successful cyberattacks.

Difficulty in Identifying and Remediating Vulnerabilities

The sheer volume of AI-generated code can make it challenging to identify and remediate vulnerabilities. Security teams may struggle to keep up with the rapid pace of code generation, leading to a backlog of security issues. The speed at which AI can generate code can overwhelm traditional security processes, making it difficult to maintain a secure development environment.

Lack of Security Awareness Among Developers

Many developers may not be fully aware of the security risks associated with AI-generated code. This lack of awareness can lead to developers inadvertently introducing vulnerabilities into their applications. It is crucial for developers to receive training on the specific security challenges associated with AI-generated code and to understand how to use AI tools safely.

Regulatory Compliance Challenges

Organizations that rely on AI-generated code may face regulatory compliance challenges. Many regulations require organizations to implement adequate security measures to protect sensitive data. Insecure AI-generated code can make it difficult to meet these requirements. Organizations need to be aware of the regulatory implications of using AI-generated code and to ensure that they are compliant with all applicable laws and regulations.

Best Practices for Secure AI-Powered Code Generation

To mitigate the risks of insecure AI-generated code, organizations should adopt a proactive, multi-faceted approach built on the following best practices.

Provide Security Training to Developers

Developers should receive training on the security risks associated with AI-generated code. This training should cover topics such as common CWEs, secure coding practices, and how to use security tools. Investing in developer education is crucial for building a security-conscious development team.

Implement Security Policies and Procedures

Organizations should implement security policies and procedures that address the use of AI-generated code. These policies should define acceptable use cases, security requirements, and processes for reviewing and approving AI-generated code. Clear and well-defined policies are essential for guiding the responsible use of AI code generation tools.

Use Security Tools to Scan AI-Generated Code

Organizations should use security tools to scan AI-generated code for vulnerabilities. These tools can help identify common CWEs and other security issues. Automated security scanning is a critical component of a secure AI-powered development workflow.
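
As a tool-agnostic illustration (Bandit is an open-source Python scanner and is not part of Backslash’s offering), a CI step could scan generated code and block the merge on findings roughly like this:

    import subprocess

    # Minimal sketch: run the open-source Bandit scanner over a source directory
    # and fail the build if it reports findings. Any SAST tool, commercial or
    # open source, can fill this role; Bandit is only an example.
    result = subprocess.run(
        ["bandit", "-r", "src/", "-f", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stdout)
        raise SystemExit("Security findings detected; review before merging.")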

Implement a Secure Development Lifecycle (SDLC)

Organizations should implement a secure development lifecycle (SDLC) that incorporates security considerations throughout the development process. This includes conducting security reviews of AI-generated code, performing penetration testing, and implementing security monitoring. A comprehensive SDLC is essential for ensuring that security is integrated into every stage of the development process.

Establish a Bug Bounty Program

Organizations should establish a bug bounty program to encourage security researchers to find and report vulnerabilities in AI-generated code. This can help identify vulnerabilities that may have been missed by internal security teams. Bug bounty programs can provide valuable insights into potential security weaknesses.

Stay Informed About the Latest Security Threats

Organizations should stay informed about the latest security threats and vulnerabilities that affect AI-generated code. This can help them proactively address potential security issues. Continuous monitoring of the threat landscape is essential for staying ahead of emerging security risks.

Collaborate with Security Experts

Organizations should collaborate with security experts to assess the security of their AI-generated code and develop strategies for mitigating risks. Engaging with security experts can provide valuable guidance and support.

The Future of Secure AI-Powered Code Generation

As AI-powered code generation tools continue to evolve, it is crucial to prioritize security. By implementing the best practices outlined above, organizations can harness the benefits of AI-powered code generation while mitigating the risks associated with insecure code. The future of software development will undoubtedly be shaped by AI, and it is essential to ensure that security is a central consideration in this evolution.

Advances in AI Security

Ongoing research and development efforts are focused on improving the security of AI systems. These efforts include developing new techniques for detecting and preventing adversarial attacks, improving the robustness of AI models, and creating more secure AI architectures. Innovation in AI security is essential for keeping pace with the rapidly evolving threat landscape.

Integration of Security into AI Development

Security is becoming increasingly integrated into the AI development process. This includes incorporating security considerations into the design of AI models, using secure coding practices, and conducting security testing throughout the development lifecycle. Shifting security left in the development process is crucial for preventing vulnerabilities from being introduced in the first place.

Collaboration Between AI and Security Experts

Collaboration between AI and security experts is essential for ensuring the security of AI systems. This collaboration can help identify potential security risks and develop effective mitigation strategies. Cross-functional collaboration is key to addressing the complex security challenges posed by AI.

Increased Awareness of AI Security Risks

Increased awareness of AI security risks is driving the development of new security tools and techniques. This includes tools for detecting adversarial attacks, analyzing the security of AI models, and monitoring AI systems for suspicious activity. Greater awareness is leading to increased investment in AI security research and development.

By addressing the security challenges associated with AI-generated code, organizations can unlock the full potential of AI-powered development while protecting their systems and data from cyberattacks. The ability to leverage AI effectively while maintaining a strong security posture will be a key differentiator for organizations in the digital age.