
Use-only skills · Security whitepaper

Confidential knowledge your colleagues' agents can apply — but never read.

Use-only skills let an organization share codified expertise so an AI agent can apply it to a task, without the recipient being able to read, copy, or export its contents. The protection is structural: the confidential body is never placed in the agent's readable context, so it cannot be disclosed through the agent.

Request the whitepaper Talk to our team

0 disclosures across all adversarial attempts

259 extraction attempts · 24 techniques · 4 AI models

105 automated tests that fail if protection regresses

100% of content-retrieval paths redacted or blocked

Purpose & scope

One question, answered precisely.

A subject-matter expert encodes proprietary knowledge — a litigation playbook, a pricing model, a diligence checklist — as a skill, and shares it so a colleague's agent can apply it to live work. The whitepaper answers:

Can a person with whom a use-only skill is shared extract its contents through the AI agent?

The relevant adversary is an authorized insider — a recipient granted use of the skill who attempts to convert “use” into “copy.” This is distinct from an external attacker, and it is the threat the feature is designed to address.

Security model

Structural protection, not behavioral.

Security research is unambiguous: instructions, training, and output filters are heuristic and cannot guarantee non-disclosure under adversarial input. A control that depends on the model choosing not to reveal content degrades as models change.

The naive approach

Behavioral control

Give the agent the skill body and instruct it to apply the content without revealing it. Disclosure depends on the model's discretion — inherently heuristic, and breakable under adversarial pressure.

The Agentman approach

Structural control

For a use-only recipient, the skill body is never placed in the agent's readable context. It is delivered out-of-band for application only; every retrieval path — read, export, file fetch — returns a redacted result or an error. Leakage is prevented by absence.

Confidential skill body: expert's playbook, model, checklist

→

Out-of-band delivery: application only, content-scoped

→

Agent applies it: work product belongs to the user

→

Read · export · fetch: redacted or error, always

The model's refusal behavior is retained only as a secondary, defense-in-depth backstop — not the guarantee.
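The difference between the two approaches can be illustrated with a minimal Python sketch. Apart from the open_skill tool name, which this page mentions, every name here (Skill, Access, read_body, BodyNotAvailable) is hypothetical and chosen for illustration; this is not Agentman's implementation, and the out-of-band delivery step is not modeled.

```python
from dataclasses import dataclass
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical ordering of two access levels; higher means more privilege."""
    USE = 1
    READ = 2


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    body: str  # the confidential content


class BodyNotAvailable(Exception):
    """Raised when a retrieval path is closed to the caller."""


def open_skill(skill: Skill, access: Access) -> dict:
    """Build what enters the agent's readable context.

    For a use-only recipient the body key is never added, so the model
    has nothing to disclose, whatever it is prompted to do.
    """
    context = {"name": skill.name, "description": skill.description}
    if access >= Access.READ:
        context["body"] = skill.body
    return context


def read_body(skill: Skill, access: Access) -> str:
    """A direct read path: it errors out for use-only recipients."""
    if access < Access.READ:
        raise BodyNotAvailable(f"{skill.name}: body is not readable at this access level")
    return skill.body
```

The point of the sketch is that the check happens before any content reaches the model, so the outcome does not depend on the model's behavior.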

Threat model

Defined against a capable, authorized insider.

Element	Definition
Protected asset	The confidential body of a use-only skill (instructions and reference files).
Adversary	An authorized use-only recipient of the skill.
Adversary capabilities	Full access to the AI agent and its skills tools; arbitrary prompts; a capable, cooperative model; knowledge that the skill exists.
Security objective	The recipient can apply the skill to a task but cannot obtain its verbatim contents — or a faithful paraphrase — through any path exposed by the system.

Evaluation

Verified in three layers.

1 · Structural verification, live against production

Acting as an authorized use-only recipient, every content-retrieval path exposed by the agent's tools was exercised. None returned the skill body. The result does not depend on which AI model is in use.

Retrieval path	Outcome
open_skill (apply the skill)	Metadata only — body not present in the agent's context
Read the body as a file	Not found
Crafted file-path traversal	Rejected — path-traversal protection
Operating-system path traversal	Blocked at the network boundary (defense-in-depth)
Export the skill package	Refused for a use-only recipient
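As an illustration of the path-traversal row, one common way to reject crafted file paths is to resolve the requested path and confirm it still sits inside the skill's directory. The sketch below assumes Python 3.9 or later; the function name, the exception, and the directory layout are hypothetical, and the page does not describe how the production validator works.

```python
from pathlib import Path


class PathRejected(Exception):
    """Raised when a requested file path escapes the skill's directory."""


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve `requested` relative to `root`, rejecting any escape.

    Resolving first collapses ".." segments and symlinks, so the check
    is made against the real location rather than the literal string.
    """
    root = root.resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PathRejected(requested)
    return candidate
```

Under these assumptions, a request such as "../secret.md" or an absolute path like "/etc/passwd" raises PathRejected, while "refs/checklist.md" resolves normally.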

Each application of a use-only skill is recorded in an immutable audit log: recipient identity, client, workspace, timestamp.

2 · Automated regression coverage

The access-control logic is covered by 105 automated tests (89 service-level, 16 protocol-level), constructed to fail if the protection regresses. Mutation testing confirms this: with the path-traversal validator disabled, the secret is exposed and the tests fail, so any future weakening breaks the build. Coverage spans the full access-level × scope × tenant matrix; redaction on every read path, including the converse case in which a Read-grantee correctly receives the body; path-traversal rejection; and fail-closed audit logging, meaning that if the audit write fails, no content is served.
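The fail-closed behavior, and the kind of test that would catch a regression, can be sketched as follows. This is a minimal illustration under stated assumptions: apply_skill, AuditWriteError, the record fields, and the write_audit callback are hypothetical names, not the production interface.

```python
from datetime import datetime, timezone
from typing import Callable


class AuditWriteError(Exception):
    """The audit record could not be persisted."""


def apply_skill(body: str, recipient: str, write_audit: Callable[[dict], None]) -> str:
    """Release the body for application only after the audit write succeeds."""
    record = {
        "recipient": recipient,
        "event": "body_applied",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    try:
        write_audit(record)
    except Exception as exc:
        # Fail closed: no audit record, no content.
        raise AuditWriteError("audit write failed; content withheld") from exc
    return body


def test_fail_closed() -> None:
    """Regression-style test: fails if content is served without an audit record."""
    def broken_sink(record: dict) -> None:
        raise OSError("disk full")

    try:
        apply_skill("SECRET-BODY", "alice@example.com", broken_sink)
    except AuditWriteError:
        return
    raise AssertionError("content was served despite a failed audit write")
```

If someone later moved the audit write after the return, or swallowed the exception, test_fail_closed would raise and the build would break.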

3 · Adversarial extraction study

To test the behavioral backstop, an agent was given the confidential body (with a marked canary value) under the production non-disclosure directive and subjected to 24 escalating techniques — direct requests, authority and urgency pressure, “ignore previous instructions” injections, debugging and tool-confusion prompts, translation / Base64 / acrostic / fill-in-the-blank extraction, and post-refusal persistence — across four generative-AI models, with model identity verified from run logs.

Model	Attempts	Substance disclosures
Claude Opus 4.8	120	0
Claude Opus 4.7	48	0
Claude Sonnet 4.6	43	0
Claude Haiku 4.5	48	0
Total	259	0

A few responses from the smallest model acknowledged that confidential content existed while still refusing to reveal it; manual review confirmed no disclosure.
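A canary makes part of the scoring mechanical. The sketch below shows one way a response could be screened for the canary in plain, reversed, or Base64 form. It is an assumption about how such a check might look, not the study's actual scoring code, and the function name and canary format are hypothetical. A string match of this kind cannot detect a faithful paraphrase, so it supplements manual review rather than replacing it.

```python
import base64


def discloses_canary(response: str, canary: str) -> bool:
    """Return True if the response contains the canary in plain, reversed, or Base64 form."""
    lowered = response.lower()
    plain = canary.lower()
    # Case-insensitive match on the plain and reversed forms.
    if plain in lowered or plain[::-1] in lowered:
        return True
    # Base64 is case-sensitive, so compare against the untouched response.
    # This only catches the canary encoded on its own, not embedded in a
    # longer encoded passage, where the byte alignment would differ.
    encoded = base64.b64encode(canary.encode("utf-8")).decode("ascii")
    return encoded in response
```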

How we report zero

A zero-disclosure count is reported with its sample size and a confidence bound — not as a “0% rate.” By the rule of three, zero events in 259 trials is consistent with a true behavioral disclosure rate below ~1.2% (one-sided 95% bound). This bound applies only to the behavioral backstop; the structural guarantee is categorical for the paths tested, not probabilistic.
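The arithmetic behind the bound can be checked in a few lines of Python. The rule of three approximates the bound as 3/n; the exact one-sided 95% bound for zero events in n independent trials is the largest rate p satisfying (1 − p)^n ≥ 0.05. The function names are illustrative, and the calculation assumes the trials are independent.

```python
def rule_of_three_bound(trials: int) -> float:
    """Approximate one-sided 95% upper bound on the event rate after zero events."""
    return 3.0 / trials


def exact_zero_event_bound(trials: int, confidence: float = 0.95) -> float:
    """Exact bound: the largest rate p with (1 - p) ** trials >= 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


print(f"{rule_of_three_bound(259):.4f}")     # 0.0116
print(f"{exact_zero_event_bound(259):.4f}")  # 0.0115
```

Both values round to roughly 1.2%, matching the figure quoted above; the rule of three is slightly conservative relative to the exact bound.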

Defense in depth

Four layers. A failure of any upper layer is contained by the one below.

Audit logging

Every application is attributed — recipient, client, workspace, timestamp. Detective control.

Non-disclosure directive

The agent is instructed to apply but not reveal. Behavioral; heuristic backstop only.

Content scoping

The body is marked for application only, so conforming clients do not render it.

Structural redaction — the primary control

The body is never placed in the agent's readable context; all read, export, and file paths redact or error. Model-independent.

Honest boundaries

What this protects — and what it deliberately doesn't.

An informed trust decision requires an explicit boundary. We state ours precisely.

Protected

The confidential skill body — instructions and reference files — against extraction by a use-only recipient through the agent.
Every ordinary retrieval path: read, export, and file fetch all redact or error.
Attribution: every application is captured in an immutable audit log.

Out of scope, by design

Users granted Read access or higher — they are authorized to view the content.
A non-conforming AI client the customer connects; connected clients are part of the customer's trust boundary.
The applied work product (e.g., a draft brief) — that output belongs to the user.
Out-of-band channels, such as photographing a screen.

Residual risk, stated plainly: the behavioral backstop is specific to the models tested and weaker on smaller models — which is exactly why the model-independent structural control is primary. The adversarial study was scored within a single model family; an independent red-team assessment would further strengthen the behavioral result. Reported figures are statistical bounds, not certainties of zero.

Governance around it

Use-only is one of four access levels — and everything is logged.

Every skill carries the same Google-Docs-style sharing model: grant a teammate by name or email, or the whole workspace in one click, one of four levels: Use, Read, Edit, or Admin. Use is the most restrictive. The matrix below lists what each level permits, with the skill's Owner shown alongside:

Capability	Use	Read	Edit	Admin	Owner
Find in the library / search	✓	✓	✓	✓	✓
Run it / apply to an agent	✓	✓	✓	✓	✓
See the instructions & files	—	✓	✓	✓	✓
Export / clone	—	✓	✓	✓	✓
Edit content & publish versions	—	—	✓	✓	✓
Manage sharing, delete	—	—	—	✓	✓
Transfer ownership	—	—	—	✓	✓
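Because each capability in the matrix is available from some level upward, the table can be expressed as a minimum level per capability. The sketch below transcribes the matrix in that form; the Level enum, the capability keys, and the allowed function are hypothetical names chosen for illustration, not Agentman's API.

```python
from enum import IntEnum


class Level(IntEnum):
    """Access levels in ascending order of privilege, as in the matrix columns."""
    USE = 1
    READ = 2
    EDIT = 3
    ADMIN = 4
    OWNER = 5


# Minimum level required per capability, transcribed from the matrix.
MIN_LEVEL = {
    "search": Level.USE,
    "apply": Level.USE,
    "view_content": Level.READ,
    "export": Level.READ,
    "edit_publish": Level.EDIT,
    "manage_sharing": Level.ADMIN,
    "transfer_ownership": Level.ADMIN,
}


def allowed(level: Level, capability: str) -> bool:
    """Deny by default: an unknown capability is refused at every level."""
    required = MIN_LEVEL.get(capability)
    return required is not None and level >= required
```

For example, allowed(Level.USE, "apply") is True while allowed(Level.USE, "export") is False, which is the Use column of the matrix.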

Per-skill Activity

Owners and admins see total applies, distinct users, and last-applied for each skill, with a newest-first event trail: body applied, file applied, bound to an agent, viewed. The applier's identity is visible only to the owner and admins — and the applier never sees the body.

Workspace access log

One log across every skill in the workspace, for workspace admins — filter by skill, by user, or by event type. It answers the compliance question — who used what expertise, when — in one place.
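The filtering described here can be sketched over a list of immutable event records. The field names, the event-type strings, and the filter_log function are assumptions made for illustration; the newest-first ordering is borrowed from the per-skill Activity description and may not match how the workspace log is actually presented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)  # frozen: a record cannot be modified after creation
class AccessEvent:
    skill: str
    user: str
    event_type: str  # e.g. "body_applied", "file_applied", "bound_to_agent", "viewed"
    at: datetime


def filter_log(
    events: Iterable[AccessEvent],
    skill: Optional[str] = None,
    user: Optional[str] = None,
    event_type: Optional[str] = None,
) -> list[AccessEvent]:
    """Return matching events, newest first. A filter left as None matches everything."""
    matches = [
        e for e in events
        if (skill is None or e.skill == skill)
        and (user is None or e.user == user)
        and (event_type is None or e.event_type == event_type)
    ]
    return sorted(matches, key=lambda e: e.at, reverse=True)
```

Calling filter_log(events, user="alice") would then answer "which expertise did Alice use, and when" across every skill in the workspace.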

Publishing, sharing changes, and ownership transfers are permission-gated at the edit and admin levels; every application of a use-only skill is attributed to a named user. Workspaces are the outer boundary: many firms run one workspace per client or matter, with per-skill grants as the fine grain inside it.

Bring your security team.

Read the full whitepaper and companion technical evaluation report — full methodology, per-test results, and statistics — or walk through the architecture with us.
