The Zero-Code Security Team: Shifting Left with Prompt-Native AI Agents

Here is a pattern that plays out at most engineering organizations operating at scale. A developer writes a feature, opens a pull request, and somewhere between the CI run and the security certification process, they find out they’ve introduced vulnerabilities. Sometimes it’s 12 issues. Sometimes it’s 50. Either way, they’re now context-switching back into code they mentally closed days ago, making fixes under time pressure, and often introducing new problems while patching old ones.

This is the classic shift-left failure. We talk about catching issues early, but in practice, “early” still means post-commit for most security tooling. GitHub Actions, SAST pipelines, security reviews: all of them fire after the code leaves the developer’s machine.

I recently entered GoDaddy’s “Compress the Cycle” hackathon, which focused on building solutions that increase developer productivity and reduce developer cycle time. Our team was selected as a runner-up for the solution we created.

The question we posed was: can AI agent teams move that feedback all the way to pre-commit, running locally, before a single line is pushed?

The answer is yes, and the architecture to do it is surprisingly lightweight. This post walks through the solution we built and the lessons we took away from the process.

Security review is a fan-out problem

Manual code security review has always been a parallelism problem in disguise. A thorough review needs several lenses at once: static analysis for known vulnerability patterns, logic review for authentication bypass and injection risks, infrastructure configuration review for IAM and network policy issues, and architectural compliance review for approved patterns and design decisions. A single reviewer (human or AI) switching between these domains one at a time produces worse results than specialists working in parallel.

That is exactly the problem multi-agent orchestration solves. The architecture is hub-and-spoke: one orchestrator and N specialized domain agents running in parallel:

 Developer trigger (pre-commit or CI)
                   │
                   ▼
          ┌──────────────────┐
          │   ORCHESTRATOR   │
          │  • Reads diff    │
          │  • Spawns lanes  │
          │  • Aggregates    │
          └────────┬─────────┘
                   │ fan-out (parallel)
    ┌─────────┬────┴────┬─────────┐
    ▼         ▼         ▼         ▼
[SAST   ] [Logic  ] [IaaC   ] [Policy ]
[Agent  ] [Review ] [Agent  ] [Agent  ]
    ↕         ↕         ↕         ↕
[Valid. ] [Valid. ] [Valid. ] [Valid. ]
    └─────────┴────┬────┴─────────┘
                   │ fan-in
                   ▼
          Structured findings
       (CRITICAL blocks commit)

Each domain agent is narrowly scoped: it owns one review domain and nothing else. The static analysis agent doesn’t do logic review. The IaaC agent doesn’t opine on application code. Strict scope boundaries prevent agents from stepping on each other’s findings and producing conflicting, redundant output.

But the fan-out topology alone isn’t what makes this work. The Validator paired with each domain agent is.

The Devil’s Advocate pattern

AI agents hallucinate; it’s a known, well-documented problem. In most domains, a hallucinated answer is an inconvenience. In security code review, a hallucinated finding (a false positive surfaced with high confidence) erodes developer trust immediately. Once a developer dismisses three findings as noise, they start dismissing all findings as noise, and the tool becomes useless faster than it became useful.

The standard approach to this problem is prompt engineering: ask the agent to be more careful, add a self-reflection step, tune the confidence threshold. These help at the margins but don’t address the root issue. You’re asking the same agent that produced the possibly wrong finding to also judge whether it was wrong, and that isn’t a reliable check.

The more robust architectural answer is adversarial pairing: every domain agent runs alongside a dedicated Validator agent whose default assumption is that the domain agent got something wrong.

┌────────────────────────────────────────────────┐
│              Domain Agent (SAST)               │
│  • Scans code for vulnerability patterns       │
│  • Produces findings with severity + location  │
└──────────────────┬─────────────────────────────┘
                   │ findings
                   ▼
┌────────────────────────────────────────────────┐
│           Validator Agent (SAST)               │
│  • Default assumption: finding is WRONG        │
│  • Checks against false positive registry      │
│  • Verifies rule currency + policy scope       │
│  • Failure mode: NEEDS_HUMAN (not REJECT)      │
└──────────────────┬─────────────────────────────┘
                   │ confirmed | rejected | needs_human
                   ▼
              Aggregator

This separation of concerns, one agent for breadth and one for precision, is what brings false positive rates into the range where developers actually trust the output.

The following design rules are critical:

The Validator’s failure mode must be NEEDS_HUMAN, not REJECT. If the policy registry is unavailable and the Validator defaults to rejecting every finding it can’t verify, an infrastructure outage will silently suppress real vulnerabilities. Unavailability is not evidence that a finding is wrong.
The Validator needs a different toolset than the domain agent. The domain agent needs code analysis tools. The Validator needs policy verification tools: access to false positive registries, active policy decisions, and rule currency checks. Same finding, different evidence sources. This is why they are separate agents rather than a single self-critique prompt.
Rejection reasons are more valuable than confirmations. A confirmation tells you the finding is real. A rejection tells you the domain agent has a systematic blind spot: it consistently misidentifies a pattern, cites a rule that no longer exists, or applies a policy to code outside its scope. Categorize rejections weekly and you have a direct improvement roadmap for your prompts, with no model retraining required.
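In our system these rules live in the Validator’s prompt, not in application code. Purely as an illustration, the Python sketch below restates the decision order: try to disprove the finding, confirm only if every check passes, and escalate rather than reject when the registry can’t be reached. `Finding`, `Verdict`, `RegistryUnavailable`, and both registry classes are hypothetical stand-ins, not part of any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "confirmed"
    REJECTED = "rejected"
    NEEDS_HUMAN = "needs_human"


@dataclass(frozen=True)
class Finding:
    lane: str
    rule_id: str
    file: str
    line: int
    severity: str


class RegistryUnavailable(Exception):
    """Raised when the (hypothetical) policy registry cannot be reached."""


class StaticRegistry:
    """In-memory stand-in for the policy registry the Validator queries."""

    def __init__(self, active_rules, false_positives):
        self._active = set(active_rules)
        self._false_positives = set(false_positives)

    def active_rules(self):
        return self._active

    def known_false_positives(self):
        return self._false_positives


class DownRegistry:
    """Simulates an outage: every lookup fails."""

    def active_rules(self):
        raise RegistryUnavailable("registry timeout")

    def known_false_positives(self):
        raise RegistryUnavailable("registry timeout")


def validate(finding, registry):
    """Return (verdict, reason); confirm only if every disproving check fails."""
    try:
        active = registry.active_rules()
        false_positives = registry.known_false_positives()
    except RegistryUnavailable:
        # Unavailability is not evidence the finding is wrong: escalate, never reject.
        return Verdict.NEEDS_HUMAN, "registry_unavailable"
    if finding.rule_id not in active:
        return Verdict.REJECTED, "rule_not_active"
    if (finding.rule_id, finding.file, finding.line) in false_positives:
        return Verdict.REJECTED, "known_false_positive"
    return Verdict.CONFIRMED, "verified"
```

The returned reason string is what feeds the rejection categories described in the third rule.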

The architecture has almost no code in it

This is the part that surprises most engineers when they first encounter it: the orchestrator described above is not a Python application, not a LangGraph StateGraph, and not a containerized microservice. It’s a Markdown file.

Claude Code’s Agent Teams feature executes natural-language control flow natively. The orchestrator prompt reads like a do-while loop in plain English: analyze the diff, conditionally spawn domain agent tasks in parallel, collect inbox messages from each lane, and aggregate the findings. Claude Code reads this, understands the conditional logic, and executes it through its Task and Teammate primitives.
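The orchestrator itself is a prompt, and Claude Code does the actual spawning. To make the do-while control flow concrete, here is a rough Python analogue, assuming threads stand in for agent tasks and plain callables stand in for each domain agent and its Validator. `LaneSpec` and `orchestrate` are hypothetical names; this is not how Claude Code is implemented.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable


@dataclass
class LaneSpec:
    # Should this lane spawn for the current diff?
    applies: Callable[[list[str]], bool]
    # Stand-in for the domain agent + Validator pair; returns confirmed findings.
    run: Callable[[list[str]], list[str]]


def orchestrate(changed_files: list[str], lanes: dict[str, LaneSpec]) -> dict[str, list[str]]:
    """Conditionally fan out lanes in parallel, then fan in their results."""
    selected = {name: spec for name, spec in lanes.items() if spec.applies(changed_files)}
    if not selected:
        return {}  # nothing relevant changed, so no lanes are spawned
    inbox: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        futures = {pool.submit(spec.run, changed_files): name for name, spec in selected.items()}
        # Keep collecting until every spawned lane has reported back.
        for future in as_completed(futures):
            inbox[futures[future]] = future.result()
    return inbox
```

The loop over `as_completed` plays the role of collecting inbox messages: nothing is aggregated until every spawned lane has reported.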

The entire deployable system we built consists of fewer than 10 Markdown files and zero lines of application code.

The practical implications of this architecture are significant:

Distribution is a file copy. Rolling out to a new team means adding these files to their repository. There’s no service to deploy, no infrastructure to provision, and no SDK to install beyond Claude Code itself.

Iteration is prompt editing. Improving the system means editing Markdown files. The feedback loop from “the validator is rejecting too aggressively” to “fixed” takes minutes, not a build/deploy cycle.

Domain experts can contribute without writing code. A security engineer who has never written a Python script can improve the SAST agent’s detection logic by editing its system prompt. In fact, one of our team members was a technical writer. The barrier to contribution is writing clearly, not software engineering.

The same artifact runs locally and in CI. The CLAUDE.md, agent definitions, and MCP server connections are identical whether the review is invoked by a developer at pre-commit or by a GitHub Actions workflow when a PR opens. The trigger changes; nothing else does.

Handling policy context without drowning every agent in it

One of the non-obvious architectural challenges in this kind of system is shared policy context. Organizations accumulate design decisions, approved patterns, and architectural constraints over time. This context is relevant to every domain agent: a static analysis agent needs to know about approved cryptography libraries, an IaaC agent needs to know about required resource tags, and a logic review agent needs to know about approved authentication patterns.

The naive approach is to inject all of this into every agent’s context window. This creates two problems: token cost scales badly, and agents start producing findings that conflict with policy decisions they received but misapplied.

A cleaner approach handles policy context in two modes:

Mode 1: Scoped injection at spawn time. Before the orchestrator fans out, it queries the policy store for decisions relevant to the changed files and injects a scoped summary into each agent’s startup prompt. The IaaC agent gets IaaC-relevant policies. The logic review agent gets authentication and library policies. Agents start with the right knowledge, not all the knowledge.
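A minimal sketch of Mode 1, assuming each policy decision is tagged with the lanes it concerns and with glob patterns for the files it governs. The `PolicyDecision` record, its fields, and the ADR-style IDs in any sample data are hypothetical; a real policy store would be queried live rather than passed in as a list.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class PolicyDecision:
    id: str
    domains: frozenset[str]  # lanes the decision is relevant to, e.g. {"iaac"}
    paths: tuple[str, ...]   # glob patterns for files the decision governs
    summary: str             # one-line summary injected into the prompt


def scoped_context(lane: str, changed_files: list[str], decisions: list[PolicyDecision]) -> str:
    """Build the policy preamble for one lane's startup prompt."""
    relevant = [
        d for d in decisions
        if lane in d.domains
        and any(fnmatch(path, pattern) for path in changed_files for pattern in d.paths)
    ]
    if not relevant:
        return ""  # spawn the lane with no policy preamble at all
    bullets = "\n".join(f"- [{d.id}] {d.summary}" for d in relevant)
    return f"Policy decisions relevant to this change:\n{bullets}"
```

Note that `fnmatch` lets `*` match path separators, so a pattern like `*.tf` also matches `infra/main.tf`.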

Mode 2: Standalone policy compliance lane. A dedicated policy agent runs in parallel, checking specifically for architectural drift: cases where the implementation diverges from what was formally decided. Its scope is narrow (compliance checking only, not security vulnerability analysis), and its Validator deduplicates against findings from other lanes before confirming anything.

Together, the two modes mean policy knowledge is always current (queried live, not embedded statically in CLAUDE.md), always scoped (agents don’t wade through irrelevant decisions), and always checked once (the dedup step prevents three different lanes from surfacing the same policy violation).
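The dedup step in Mode 2 can be as simple as a set lookup. This sketch assumes every finding that cites a policy carries `file`, `line`, and `policy_id` keys; those key names and `dedupe_policy_findings` are hypothetical. Matching on exact file and line is a deliberate simplification, and a real Validator may need fuzzier matching.

```python
def dedupe_policy_findings(policy_findings: list[dict], other_lane_findings: list[dict]) -> list[dict]:
    """Drop policy-lane findings that another lane already reported at the same spot."""
    already_reported = {
        (f["file"], f["line"], f["policy_id"])
        for f in other_lane_findings
        if f.get("policy_id")  # only findings that cite a policy decision can collide
    }
    return [
        f for f in policy_findings
        if (f["file"], f["line"], f["policy_id"]) not in already_reported
    ]
```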

What the output actually looks like

The goal is not comprehensive documentation of every possible issue. It’s the minimum information a developer needs to unblock themselves, in priority order.

╔══ Security Review ═══════════════════════════════════════╗
║  2 CRITICAL  |  3 HIGH  |  1 MEDIUM  |  5 suppressed     ║
╠══ CRITICAL ══════════════════════════════════════════════╣
║  [Static Analysis + Logic Review — consensus finding]    ║
║  Hardcoded credential — src/config/database.js:14        ║
║  Rule: SAST-SEC-001 | Fix: use environment variable      ║
╠══ HIGH ══════════════════════════════════════════════════╣
║  [Logic Review]                                          ║
║  Admin route bypasses auth middleware — routes/admin:89  ║
║  Policy requires authentication on all /admin paths     ║
╚══════════════════════════════════════════════════════════╝

Two design decisions are worth calling out:

Consensus findings are flagged explicitly. When two or more domain agents independently confirm the same issue, the output says so. It tells the developer this isn’t a borderline call from one agent: multiple independent analyses reached the same conclusion.

The suppressed count is shown, not hidden. The developer knows five findings were filtered out by validators. This builds trust in the filtering mechanism because it’s an auditable layer that can be queried rather than a black box.
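Both behaviors fall out of a simple aggregation step. This sketch assumes confirmed findings arrive as dicts with `lane`, `file`, `line`, `issue`, and `severity` keys (hypothetical field names). It groups them by location and issue, marks any group confirmed by two or more lanes as consensus, and always includes the suppressed count in the header line.

```python
from collections import defaultdict

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def summarize(confirmed: list[dict], suppressed_count: int) -> tuple[str, set]:
    """Return the summary header line and the set of consensus findings."""
    lanes_by_issue = defaultdict(set)  # (file, line, issue) -> lanes that confirmed it
    severity = {}
    for f in confirmed:
        key = (f["file"], f["line"], f["issue"])
        lanes_by_issue[key].add(f["lane"])
        # If lanes disagree on severity, keep the most severe rating.
        prev = severity.get(key)
        if prev is None or SEVERITY_ORDER.index(f["severity"]) < SEVERITY_ORDER.index(prev):
            severity[key] = f["severity"]
    counts = {level: 0 for level in SEVERITY_ORDER}
    for key in lanes_by_issue:
        counts[severity[key]] += 1
    parts = [f"{n} {level}" for level, n in counts.items() if n]
    parts.append(f"{suppressed_count} suppressed")  # always shown, even when zero
    consensus = {key for key, lanes in lanes_by_issue.items() if len(lanes) >= 2}
    return "  |  ".join(parts), consensus
```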

CRITICAL findings exit with a non-zero code, blocking the commit. HIGH findings require acknowledgment. Enforcement is proportional to severity, not binary. A developer who hits a wall of unblockable warnings on every commit will find a way around the tool. Proportional enforcement keeps the tool in the path without making it the obstacle.
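Git aborts a commit when the pre-commit hook exits non-zero, so proportional enforcement reduces to choosing an exit code. A minimal sketch, assuming the hook already has the severities of the confirmed findings and some way for the developer to acknowledge HIGH findings; how that acknowledgment is captured (a prompt, a flag, an environment variable) is left open, and `commit_gate` and its distinct exit codes are hypothetical.

```python
def commit_gate(severities: list[str], high_acknowledged: bool) -> int:
    """Map confirmed-finding severities to a pre-commit hook exit code.

    Git aborts the commit whenever a pre-commit hook exits non-zero.
    """
    if "CRITICAL" in severities:
        return 1  # always blocks; acknowledgment does not unblock it
    if "HIGH" in severities and not high_acknowledged:
        return 2  # blocks until the developer acknowledges the HIGH findings
    return 0      # MEDIUM/LOW findings are reported but never block
```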

The feedback loop that makes the system improve over time

Surprisingly, the most durable part of this architecture is the rejection log, not the agents.

Every time a Validator rejects a domain agent’s finding, it records the reason: wrong file/line match, rule not active, known false positive, policy exemption applied, or severity overclaimed. Aggregated over weeks of real PR reviews, these reasons produce a precise improvement backlog for the domain agent prompts.

For example: the static analysis agent cited SAST-AUTH-003 seventeen times this week, and the validator rejected all seventeen because that rule was deprecated in the last ruleset update.

That’s a one-line fix to the agent prompt. No model retraining, no infrastructure change: edit the Markdown file to reference the current rule ID.
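A weekly rejection report needs little more than a counter. The sketch below assumes each Validator verdict is logged as a record with the lane, rule ID, verdict, and reason (a hypothetical schema), and computes a per-lane rejection rate plus the most frequent rejection causes. Any sample data run through it here is synthetic, shaped like the SAST-AUTH-003 example rather than taken from real logs.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ValidatorRecord(NamedTuple):
    lane: str      # e.g. "sast"
    rule_id: str   # rule the domain agent cited
    verdict: str   # "confirmed", "rejected", or "needs_human"
    reason: str    # rejection reason such as "rule_not_active"; "" otherwise


def weekly_report(records: Iterable[ValidatorRecord], top: int = 3):
    """Return (rejection rate per lane, most common (lane, rule, reason) rejections)."""
    records = list(records)
    rejected = [r for r in records if r.verdict == "rejected"]
    total_by_lane = Counter(r.lane for r in records)
    rejected_by_lane = Counter(r.lane for r in rejected)
    rates = {lane: rejected_by_lane[lane] / total for lane, total in total_by_lane.items()}
    blind_spots = Counter((r.lane, r.rule_id, r.reason) for r in rejected).most_common(top)
    return rates, blind_spots
```

A single rule dominating `blind_spots` with a `rule_not_active` reason is exactly the kind of one-line prompt fix described above.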

This feedback loop compounds. Each iteration of the domain agent prompts reduces the validator’s rejection rate. A lower rejection rate means more confirmed findings per scan. More confirmed findings per scan means developers encounter fewer false positives. Fewer false positives mean higher trust, and higher trust means higher adoption.

The system improves at roughly the rate you’re willing to read rejection logs and edit prompts. For an internal platform team, that’s a sustainable operating model.

When not to use this pattern

Multi-agent orchestration has real costs. Each domain agent is a separate LLM call with its own context window. A full five-lane review with validators runs 8–10 LLM instances concurrently. For a small diff, that’s significant token spend relative to the signal produced.

The conditional fan-out logic mitigates this: only spawn agents relevant to the changed files. A CSS-only change doesn’t need IaaC validation. A pure Terraform change doesn’t need logic review. But the orchestration overhead is real and shouldn’t be glossed over.
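Conditional fan-out can start as a routing table from file type to lanes. The table here is hypothetical and only encodes the two examples from the text (CSS changes skip the IaaC lane; Terraform changes skip logic review). In the system described here the orchestrator prompt makes this call from the diff itself; the table is just a way to reason about which lanes a change really needs.

```python
import os

# Hypothetical routing table: which lanes each file type needs.
LANE_ROUTES = {
    ".tf": {"iaac", "policy"},
    ".js": {"sast", "logic", "policy"},
    ".ts": {"sast", "logic", "policy"},
    ".py": {"sast", "logic", "policy"},
    ".css": {"sast"},
}
DEFAULT_LANES = {"sast", "policy"}  # hypothetical conservative default for unknown file types


def select_lanes(changed_files: list[str]) -> set[str]:
    """Union of the lanes required by every changed file."""
    lanes: set[str] = set()
    for path in changed_files:
        _, ext = os.path.splitext(path)
        lanes |= LANE_ROUTES.get(ext.lower(), DEFAULT_LANES)
    return lanes
```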

Use the following heuristics to decide when this pattern earns its cost (and when it’s counterproductive):

Use this pattern when:
Multi-domain review surface: findings in one domain affect decisions in another. Security review genuinely benefits from parallel specialists with cross-lane correlation.
False positive rate is a hard constraint: the validator layer exists specifically to push precision high enough that developers trust the output. If you can tolerate noisy output, a single well-prompted agent is cheaper and faster.

Don’t use it when:
The task is inherently sequential: if your review pipeline requires each step to depend on the last, fan-out adds coordination overhead without saving time. Use subagents instead.
The problem is well solved by a single focused agent: multi-agent orchestration solves the context window problem for large, multi-domain tasks. For a focused, narrow task, it’s architectural overengineering.

The broader principle

What makes this architecture interesting beyond the security use case is what it says about where the value sits in AI-powered systems.

The instinct when building internal AI tooling is to reach for frameworks, infrastructure, and application code: LangGraph pipelines, vector databases, custom APIs, containerized deployments. That instinct often produces systems that are hard to iterate on, hard to contribute to, and tightly coupled to specific runtime environments.

The prompt-native approach inverts this. The intelligence lives in the prompts: the precise scoping of each agent’s domain, the adversarial posture of the validator, the do-while control flow of the orchestrator. The runtime (Claude Code’s Agent Teams) handles execution. The MCP servers are thin wrappers around existing internal APIs, not net-new application logic.

The result is a system where the primary engineering challenge is thinking clearly about agent responsibilities, scope boundaries, and validation logic, then encoding that thinking in structured natural language. The deployable artifact is a directory of Markdown files. The contribution model is open to anyone who can write clearly, not just engineers who can navigate a complex codebase.

That’s a different kind of leverage than most engineering infrastructure delivers.

If you’re building similar systems or have pushed this architecture into production, the most useful thing to share is what breaks at scale: specifically, how validator accuracy degrades as codebase complexity increases, and which prompt patterns have held up. That’s where the interesting engineering still lives.

Originally published on Medium.
