30 May 2026

Securing the Agentic Development Lifecycle

Amir Kavousian

Appsec
AI
Threat Modeling

This article is co-authored with Rami McCarthy, Principal Security Researcher and prolific author of security blogs (ramimac.me). You can also find him on LinkedIn (linkedin.com/in/ramimac/).

AI coding agents are great at generating code and executing what is in front of them, but lack a persistent understanding (an “organizational memory”) of your architecture, your threat landscape, and the decisions your team has already made on risk.

In application security, we have a name for that organizational memory. It’s called a threat model. As the agentic development lifecycle becomes the de facto way to ship software, the threat model will become the most important artifact in the entire security program.

Traditional AppSec vendors barely survived the shift from Waterfall to DevOps. Now, they’ve been fully left behind. This post lays out a thesis for what the AppSec stack looks like in the era of the Agentic Development Lifecycle (ADLC), where code generation is autonomous, development velocity is 10–100x what it was, and security must fundamentally rethink when, where, and how it operates.

A Brief History of AppSec in the SDLC Era

For the past two decades, application security followed a predictable arc. In the early 2000s, security was an afterthought, most often done as penetration testing at the end of the cycle, if at all. The mid-2010s brought the “shift left” movement: integrate SAST, SCA, and secrets scanning into CI/CD pipelines. Give developers the findings, and hope (and “pray”) they fix issues before production.

On paper, it worked. In practice, teams didn’t actually shift left. Instead, they “took all the existing work, handed it to developers, and said, ‘this is your problem now.’” The result: alert fatigue, friction between security and engineering, and an ever-growing backlog of findings that nobody had time to fix. CISA research has even eroded the foundational claim behind shift-left: the idea that fixing vulnerabilities early is cheaper was never empirically validated, but it “spread like a fairy tale.”

The “Shift-Left” stack (SAST + SCA + DAST + secrets scanning, orchestrated through ASPM) served a specific era: one where humans wrote code at human speed, and security tools had time to catch up. That era is over.

Enter the ADLC: What Is Actually Changing

The Agentic Development Lifecycle is a structural transformation in how software is produced. As with any transformation, there are levels to how much (and how fast) teams are embracing AI:

  • AI-assisted coding (humans write and review, AI enhances)
  • vibe coding (developers describe intent, AI generates)
  • agentic coding (AI agents plan, execute, and iterate autonomously with limited human direction).

Most organizations are now operating across all three modes simultaneously. And security has to account for each.

Developers now attribute 41% of their application code to AI generation on average, and report accepting nearly 40% of AI-generated code without revision. Vibe coding is becoming the default development paradigm for a growing share of the industry, and that is exacerbating a serious scaling problem for security teams. When most companies have roughly 300 developers per security engineer, and AI triples code output per developer, Scan-First Security becomes a losing proposition.
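To make the scaling arithmetic concrete, here is a back-of-the-envelope sketch. It uses only the two figures quoted above (roughly 300 developers per security engineer, and a 3x output multiplier). The function name and the notion of "developer-equivalents" are illustrative assumptions, not a measured metric.

```python
def effective_review_load(devs_per_sec_engineer: int, output_multiplier: float) -> float:
    """Code output one security engineer must cover, in pre-AI developer-equivalents."""
    return devs_per_sec_engineer * output_multiplier


# Figures quoted in the text: ~300 developers per security engineer, 3x output.
load = effective_review_load(300, 3)
print(f"One security engineer now covers roughly {load:.0f} developer-equivalents of code")
```

Under these assumptions the ratio moves from 300:1 to about 900:1 without a single new hire, which is why scanning harder does not close the gap.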

What It Takes to Secure the ADLC

Securing the Agentic Development Lifecycle requires rethinking security across three fundamental dimensions: when security happens, who (or what) enforces it, and what we are actually protecting against.

(A) When: Security must move to where humans still have control. 

In the ADLC, humans control two things: the design decisions that shape what gets built, and the governance policies that constrain how agents operate. Everything in between (code generation, testing, iteration) is increasingly autonomous. Security that only operates at the code layer is operating on the output of a process it can’t influence.

This leads to a fundamental shift. Security has always operated at three levels: program (organizational risk appetite, compliance posture, security culture), system (architecture decisions, trust boundaries, data classification), and tactical (individual code changes, vulnerability fixes, configuration tweaks). Security tools have spent two decades siloed into the tactical layer: scanners, linters, SCA, DAST. AI agents are about to commoditize that entire layer. When frontier models can self-scan and self-fix, tactical tooling hits a ceiling. The organizations that pull ahead will be the ones that invested in what tactical tools cannot reach: program-level and system-level security context.

The compounding effect

When you invest in program and system-level security, every tactical decision gets better. Without that investment, each agent session starts from zero.

(B) Who: Agents become both the attack surface and the enforcement mechanism. 

Agents can be powerful security enforcers. But they’re also “first-class identities” requiring the same IAM rigor we apply to human users. Non-human identities already outnumber human identities in most organizations, and are growing quickly. The tools generating our code are themselves attack vectors. For instance, nearly one million developer endpoints were exposed to a prompt injection attack delivered through the Amazon Q Developer VS Code extension.

(C) What: The threat model expands beyond code vulnerabilities. 

Traditional AppSec protects against code-level flaws: injection, XSS, broken authentication. The ADLC introduces a different class of risk entirely: business logic flaws that emerge in the drift between imprecise specification and agentic implementation.

For example, an agent may turn a refund flow specified as “process eligible returns” into an implementation that treats returns as “eligible” in ways the product owner never sanctioned. Other examples include a PostgreSQL view that bypasses row-level security, or an authentication provider that remains active even after registration was removed from the frontend. These are not code-level bugs that a SAST tool would catch; they are architectural decisions that need to be threat-modeled. They are semantic gaps between what the business meant and what the agent built, and they only surface as behavioral failures in production.

A Layered Approach to ADLC Security

Securing the ADLC requires a four-layer stack: threat architecture, agent governance, generation-time guardrails, and adversarial validation. We’ll go through each layer in this section.

Securing the agentic development lifecycle requires a new four-layer AppSec stack anchored by a living threat model.

Layer 1: Threat Architecture

Threat architecture is the living context engine that serves your system's trust boundaries, architectural decisions, and risk context to every downstream control. Without it, governance is generic policy, guardrails catch generic bugs, and validation tests a generic checklist. With it, each layer becomes precise, architecture-aware, and compounding in value.

Once established, the threat model becomes the organizational memory: a living store of your architecture’s trust boundaries, your team’s risk decisions, and the security patterns your system requires. Every other layer in this stack should consume it:

  • Your guardrails enforce the boundaries that the threat model defines. 
  • Your generation-time controls embed the patterns it requires. 
  • Your adversarial tests validate the assumptions it made. 

When you treat the threat model as a living context engine rather than a compliance checkbox, the entire stack gets smarter and the returns compound with every iteration.

Layer 2: Agent Governance

Organizations need deterministic guardrails around non-deterministic systems. This means input sanitization (redacting secrets from prompts), processing-layer checks (blocking hardcoded credentials during generation), and output validation (verifying generated code against security policies before commit). 
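As a minimal sketch of what "deterministic guardrails" can look like, the snippet below covers two of the checkpoints named above: redacting secrets from a prompt before it reaches the model, and flagging hardcoded credentials in generated output. The two regular expressions and both function names are hypothetical and chosen for illustration; a real deployment would rely on a maintained secrets detector rather than two hand-written patterns.

```python
import re

# Hypothetical patterns for illustration only; not an exhaustive secrets detector.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(password|api[_-]?key|secret)\s*[:=]\s*['\"]?[^\s'\"]{8,}"),
]


def redact_prompt(prompt: str) -> str:
    """Input sanitization: replace anything that looks like a secret before it reaches the model."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt


def violations_in_output(generated_code: str) -> list[str]:
    """Output validation: return lines of generated code that appear to hardcode a credential."""
    return [
        line.strip()
        for line in generated_code.splitlines()
        if any(p.search(line) for p in SECRET_PATTERNS)
    ]
```

The point of the design is that both checks are plain, repeatable functions: the same input always yields the same verdict, regardless of how the non-deterministic agent behaved.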

This layer also controls how AI coding tools are used across the environment: which assistants, agents, models, extensions, and MCP-connected services are sanctioned, and what policies govern their use. The goal is to curb shadow deployments by making the sanctioned, happy-path deployment the easy one.

Elevating security beyond a checkbox activity and into an enabler is only possible when governance consumes the threat model to define what “secure and auditable” means for your specific application. A generic governance policy says “don’t use unapproved tools.” A design-informed governance policy says “agents working on the payment service can only access the payments schema and must route all external API calls through the approved gateway.” 
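The design-informed policy in that example can be expressed as data plus a deterministic check. The sketch below is one possible shape, not a reference to any particular product: the `AgentPolicy` class, the action names (`db_access`, `http_call`), and the gateway hostname are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AgentPolicy:
    service: str
    allowed_schemas: frozenset[str]
    approved_gateway_host: str


# Hypothetical policy mirroring the payment-service example in the text.
PAYMENTS_POLICY = AgentPolicy(
    service="payment-service",
    allowed_schemas=frozenset({"payments"}),
    approved_gateway_host="gateway.internal.example",
)


def is_allowed(policy: AgentPolicy, action: str, target: str) -> bool:
    """Deterministically allow or deny one agent action under the policy."""
    if action == "db_access":
        # target is a schema-qualified table name such as "payments.refunds"
        schema = target.split(".", 1)[0]
        return schema in policy.allowed_schemas
    if action == "http_call":
        # External calls are allowed only when they go to the approved gateway host.
        return urlparse(target).hostname == policy.approved_gateway_host
    return False  # default deny for anything the policy does not name
```

For instance, `is_allowed(PAYMENTS_POLICY, "db_access", "hr.salaries")` is denied because the `hr` schema is outside the policy, and any action type the policy does not recognize is denied by default.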

Layer 3: Generation-Time Guardrails

This layer operates through rules files, prompt controls, hooks, skills, harnesses, and MCP-connected security services that steer agents toward safer outputs by embedding secure coding rules, organizational policies, and architectural context before code is produced.

Claude Code, Codex, and Cursor already embed pattern-based security analysis in real time during code generation. What persists is the control plane that orchestrates, prioritizes, and enforces policy across the pipeline. The focus will shift from “here are 20,000 vulnerabilities to triage, prioritize, and remediate” to “these 100 code changes will reduce your AppSec risk by 98%.” This is decidedly outcome-oriented, not alert-count-oriented.

Threat modeling can amplify the impact of generation-time guardrails by focusing on the authorization gaps, trust boundary violations, and data flow risks that matter for your application. 
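One way to picture this is a rules file that is generated from threat model entries rather than written by hand, so an agent working on a component sees only the rules that matter for it. The sketch below assumes a very simple entry shape; `ThreatEntry`, `render_rules`, and the sample entries are hypothetical, and the output format is not tied to any specific tool's rules-file syntax.

```python
from dataclasses import dataclass


@dataclass
class ThreatEntry:
    component: str
    threat: str
    required_pattern: str


def render_rules(entries: list[ThreatEntry], component: str) -> str:
    """Build rules-file text containing only the patterns relevant to one component."""
    relevant = [e for e in entries if e.component == component]
    lines = [f"# Security rules for {component} (generated from the threat model)"]
    lines += [f"- {e.required_pattern} (mitigates: {e.threat})" for e in relevant]
    return "\n".join(lines)


# Hypothetical threat model entries.
entries = [
    ThreatEntry("refund-api", "unauthorized refund", "Check order ownership before issuing a refund"),
    ThreatEntry("refund-api", "refund replay", "Require an idempotency key on refund requests"),
    ThreatEntry("search-api", "data exfiltration", "Apply tenant filters to every query"),
]
print(render_rules(entries, "refund-api"))
```

Because the rules are derived, updating a threat model entry changes what the agent is told on its next session, with no separate file to keep in sync.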

Layer 4: Adversarial Validation

The non-deterministic nature of the new stack necessitates an active validation plan, driven by the threat model, that includes penetration testing, red teaming, and agent behavior analysis.

Threat-model-informed pen testing not only improves the quality of the output, it can also create a feedback loop that continuously improves the ADLC: adversarial validation tests the assumptions made at the design phase, and findings flow back to update the threat model for the next iteration. Design informs testing. Testing refines design. The stack becomes a loop, not a pipeline.
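The loop can be sketched as a small data flow: each design-phase assumption has a status, adversarial findings reference the assumption they tested, and folding findings back in tells you which parts of the design need rework. All class names, fields, and sample data here are hypothetical and only illustrate the shape of the loop.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    id: str
    statement: str
    status: str = "unverified"  # becomes "holds" or "violated" after testing
    evidence: list[str] = field(default_factory=list)


@dataclass
class Finding:
    assumption_id: str
    exploited: bool
    note: str


def apply_findings(model: dict[str, Assumption], findings: list[Finding]) -> list[str]:
    """Fold test results back into the threat model; return IDs that need redesign."""
    for f in findings:
        a = model[f.assumption_id]
        a.evidence.append(f.note)
        # One successful exploit outweighs any number of passing tests.
        if f.exploited:
            a.status = "violated"
        elif a.status != "violated":
            a.status = "holds"
    return sorted(a.id for a in model.values() if a.status == "violated")


# Hypothetical assumptions and pen-test findings.
model = {
    "A1": Assumption("A1", "Only order owners can request refunds"),
    "A2": Assumption("A2", "Reporting views enforce row-level security"),
}
findings = [
    Finding("A1", False, "IDOR probe on refund endpoint was rejected"),
    Finding("A2", True, "Reporting view returned another tenant's rows"),
    Finding("A2", False, "Direct table query was correctly filtered"),
]
needs_redesign = apply_findings(model, findings)
```

Here `needs_redesign` contains only `A2`: the violated assumption goes back to the design phase, while `A1` is recorded as holding along with the evidence for it.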

Organizational Memory Is the Bottleneck

Every layer in this stack matters, but design-phase security has the highest leverage and the least tooling. In the ADLC, context is king. And in AppSec, the threat model has always been the gold standard for security context.

The threat model is the living input that drives everything downstream: guardrails and point-of-generation controls should enforce what it defines, and adversarial testing should validate the assumptions it makes. When you treat the threat model as an organizational memory that acts as the security context fabric of the ADLC, the entire stack gets smarter. The key insight is that treating threat models as documents is a category error; a threat model is a model, not a report. Treat a threat model as a structured knowledge graph.
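A minimal sketch of the "model, not report" idea follows: components and data flows are stored as a small graph, and security questions become queries over it. The class names, trust zones, and sample flows are hypothetical, and a production system would likely use a real graph store; plain Python structures are enough to show the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    trust_zone: str


@dataclass(frozen=True)
class DataFlow:
    source: str
    target: str
    data_class: str
    controls: tuple[str, ...] = ()


class ThreatModelGraph:
    def __init__(self) -> None:
        self.components: dict[str, Component] = {}
        self.flows: list[DataFlow] = []

    def add_component(self, name: str, trust_zone: str) -> None:
        self.components[name] = Component(name, trust_zone)

    def add_flow(self, flow: DataFlow) -> None:
        self.flows.append(flow)

    def boundary_crossings(self) -> list[DataFlow]:
        """Flows whose endpoints sit in different trust zones."""
        return [
            f for f in self.flows
            if self.components[f.source].trust_zone != self.components[f.target].trust_zone
        ]

    def unprotected_crossings(self) -> list[DataFlow]:
        """Boundary-crossing flows with no recorded control: candidates for review."""
        return [f for f in self.boundary_crossings() if not f.controls]


# Hypothetical architecture.
graph = ThreatModelGraph()
graph.add_component("web-frontend", "internet")
graph.add_component("partner-webhook", "third-party")
graph.add_component("refund-api", "internal")
graph.add_component("payments-db", "internal")
graph.add_flow(DataFlow("web-frontend", "refund-api", "order data", ("authn", "authz")))
graph.add_flow(DataFlow("partner-webhook", "refund-api", "refund events"))
graph.add_flow(DataFlow("refund-api", "payments-db", "card tokens"))
```

With this data, `graph.unprotected_crossings()` returns only the `partner-webhook` flow. A document could state the same fact, but it could not be queried by a guardrail or an agent at generation time.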

The idea is to provide AI tools with AppSec context, and define secure design patterns and architecture guardrails that AI tools are expected to follow during implementation. 

In practice, this remains the most manual, inconsistent, and under-tooled phase of the entire stack. We have maturing tooling for point-of-generation controls (Layer 3), growing investment in governance (Layer 2), and emerging solutions for validation (Layer 4). Design-phase security remains largely manual: whiteboards, spreadsheets, and inconsistent threat modeling exercises that happen once and are never updated. In a world where code is generated in minutes, manual design-phase security is the bottleneck.

The New Landscape: What Emerges, What Gets Absorbed, What Gets Vibe-Coded

The security market follows a predictable pattern: point solutions emerge, then consolidate into platforms. But AI introduces a challenge: the underlying infrastructure is still shifting, forcing founders to place bets on models and architectures that may not exist in 18 months. It feels like “by the time the analyst PDF is published, the category has already shifted.”

Here’s how I see the tool landscape shaking out:

Absorbed into Frontier Models and IDEs

Basic pattern-matching SAST, common vulnerability detection, secrets scanning, and simple SCA checks are being absorbed directly into AI coding assistants. Anthropic’s Claude Code security scanner and OpenAI’s Codex already reason contextually through code, in ways that surpass traditional pattern-matching scanners. The counterpoint of “separation of code generation from validation” is still valid, but in a world where security teams have tool sprawl and fatigue, the arc of the industry curves toward more consolidation.

Emerging as Distinct Categories

Several new tool categories are crystallizing:

  • Design-phase security platforms that review architecture and business logic before code generation
  • Agent governance and guardrail systems that constrain what AI agents can do
  • Non-human identity management for agent credentials and permissions
  • Cloud Application Detection & Response (CADR) for runtime behavioral validation

These categories address threats that didn’t exist in the traditional SDLC and can’t be solved by extending existing tools.

Vibe-Coded by Internal Teams

Internal security teams are already using agentic tools to build bespoke security automation: custom guardrail policies, internal vulnerability triage agents, compliance documentation generators, and pipeline enforcement hooks. Ramp’s autonomous patching pipeline is a leading example: a multi-agent architecture that patched 100 vulnerabilities in six days with zero human involvement. Not every team can build Ramp-quality infrastructure, since building high-quality security tooling requires knowing “what good looks like.” But the trend is clear: security-mature teams will increasingly vibe-code their own operational tooling, creating a market bifurcation between teams that consume vendor platforms and teams that build on primitives.

The Thesis, Simply

The ADLC demands a new AppSec stack. This is not an evolution of the old one, but a fundamentally different architecture that maps security controls to where decisions are actually made.

Traditional layers (controls, scanners, testing) are necessary, but not sufficient. They assume the design is sound, and operate on code that is already being generated, tools that are already deployed, and outputs that are already produced. In short, they are reactive to the specification rather than shaping it.

The highest-leverage investment in ADLC security is the layer that most organizations have neglected: design-phase security, anchored by a living threat model that serves as the organizational memory of your security program. It’s where humans still have control, where architectural decisions propagate automatically, and where the current tooling gap is widest. 

This is also the layer that can't be vibe-coded. Generation-time guardrails can be assembled from rules files and prompt templates. Governance policies can be stitched together from AGENTS.md files and CI hooks. But threat architecture requires synthesizing system knowledge across teams, codebases, and architectural decisions, and then keeping that knowledge current as the system evolves. A static CLAUDE.md file written once by a senior engineer captures a snapshot; it doesn't update when services get decomposed, new trust boundaries emerge, or risk decisions change. The organizational memory problem is inherently a graph problem: relationships between components, data flows, trust boundaries, and threat scenarios that shift with every architectural change. That's a product, not a prompt or rule file.
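The snapshot problem is easy to demonstrate. The sketch below compares the components a threat model knows about with the components actually deployed; the function name and both inventories are hypothetical, and a real system would pull the deployed set from a service catalog or infrastructure state rather than a hardcoded set.

```python
def model_drift(modeled: set[str], deployed: set[str]) -> dict[str, set[str]]:
    """Compare components named in the threat model with those actually deployed."""
    return {
        "unmodeled": deployed - modeled,  # running, but never threat-modeled
        "stale": modeled - deployed,      # modeled, but no longer deployed
    }


# Hypothetical inventories: the monolith was decomposed after the model was written.
modeled = {"web-frontend", "monolith-api", "payments-db"}
deployed = {"web-frontend", "refund-api", "orders-api", "payments-db"}
drift = model_drift(modeled, deployed)
```

In this example two new services are running with no threat model coverage and one modeled component no longer exists. A static file cannot detect either condition; a model that is continuously reconciled against the system can.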

Organizations that invest in encoding program-level and system-level security context before the first line of code is generated will see compounding returns. Once that context is treated as a structured knowledge graph, every agent session gets smarter, every governance policy gets more precise, and every generated module inherits institutional knowledge that would otherwise evaporate between sessions.

The organizations that don’t will keep buying tactical tools to scan AI-generated code faster. They’ll optimize the wrong layer and spend their time chasing machine-generated symptoms of human-made design flaws.

The ADLC is already here for many teams, and it’s growing fast. The question is whether your security stack was built for the world that’s emerging, or the one that is quickly disappearing.
