Agent Security

Cyrus is designed for teams that care deeply about security, predictability, and control. Modern AI agents introduce new attack surfaces — especially around prompt injection, tool execution, and autonomous behavior. Cyrus is built to reduce those risks by design, not by policy alone. This page explains how Cyrus protects your organization, and how those protections align with enterprise security expectations.

Our Security Philosophy

Cyrus follows three core principles:

No hidden autonomy
No privileged instruction sources
No silent tool execution

Cyrus does not attempt to “outsmart” or override the underlying AI model’s safety mechanisms. Instead, it inherits and preserves them, and adds structural guardrails at the system level.

We Do Not Override Claude’s System Prompt

Cyrus does not replace or weaken Claude Code’s native system prompt. Instead:

Claude’s built-in prompt injection defenses remain fully intact
Cyrus only appends contextual guidance, never overrides safety rules
All instruction hierarchy, trust boundaries, and verification logic remain Claude-native

This means Cyrus benefits directly from Claude’s strongest protections, including:

Instruction origin tracking
Untrusted content isolation
Mandatory user verification for action-like content
Rule immutability
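
The append-only relationship to Claude's system prompt described in this section can be pictured with a minimal sketch. The helper name `compose_prompt` and the placeholder base prompt are hypothetical and are not Cyrus or Claude Code APIs. The only point illustrated is that contextual guidance is added after the base prompt and the base text itself is never edited.

```python
def compose_prompt(base_prompt: str, guidance: list[str]) -> str:
    """Return the base prompt followed by extra contextual guidance.

    Hypothetical helper: the base prompt is passed through unchanged,
    and guidance is only ever appended after it.
    """
    parts = [base_prompt]
    # Skip empty entries so blank guidance never produces stray sections.
    parts.extend(item.strip() for item in guidance if item.strip())
    return "\n\n".join(parts)


# Placeholder text standing in for the Claude Code system prompt.
base = "BASE SYSTEM PROMPT"
prompt = compose_prompt(base, ["This repository uses a monorepo layout."])
assert prompt.startswith(base)
```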

Protection Against Prompt Injection Attacks

Indirect & Zero-Click Prompt Injection

Threat:
Malicious instructions embedded in emails, documents, web pages, logs, or tool outputs.

Cyrus Protection:

Instructions from tools, documents, MCP servers, and web content are treated as untrusted data
Claude is required to stop execution and surface the content to the user
Explicit user approval is required before any action

Cyrus never executes instructions simply because they appear relevant.
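
The stop, surface, and approve flow above can be sketched as a small gate. This is an illustration under assumptions, not Cyrus's implementation: the types `UntrustedContent` and `Decision` and the function `review_embedded_instruction` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UntrustedContent:
    """Content that arrived from a tool, document, MCP server, or web page."""
    source: str
    text: str


@dataclass(frozen=True)
class Decision:
    proceed: bool
    message: str


def review_embedded_instruction(content: UntrustedContent, user_approved: bool) -> Decision:
    """Hypothetical gate: never act on embedded instructions without approval."""
    quoted = f'"{content.text}"'
    if not user_approved:
        # Stop and surface the content verbatim so the user can judge it.
        return Decision(
            proceed=False,
            message=f"Paused: {content.source} content contains an instruction: {quoted}. Approve to continue.",
        )
    return Decision(proceed=True, message=f"User approved the action quoted from {content.source}.")
```

In this sketch the default path is refusal: relevance of the embedded text plays no role, and only an explicit approval flag changes the outcome.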

Web & RAG-Based Injection

Threat:
Malicious instructions hidden in search results, scraped pages, or retrieved documents.

Cyrus Protection:

Web and retrieval results are always treated as data, never authority
Claims of “system”, “admin”, or “developer” instructions from content are ignored
Conflicting instructions always defer to safety rules

Cyrus will surface suspicious content rather than act on it.
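
As a rough illustration of surfacing rather than acting, the sketch below flags lines in retrieved text that claim elevated authority. It is a simplification: the protections described on this page are model-level, and the pattern and function name here are hypothetical and shown only to make the idea concrete.

```python
import re

# Illustrative pattern only; it is an assumption for this example and is
# not how Claude or Cyrus detects injected instructions.
AUTHORITY_CLAIM = re.compile(
    r"\b(system|admin(?:istrator)?|developer)\s+(?:message|instruction|override|notice)s?\b",
    re.IGNORECASE,
)


def flag_authority_claims(retrieved_text: str) -> list[str]:
    """Return the lines of retrieved content that claim elevated authority."""
    return [
        line.strip()
        for line in retrieved_text.splitlines()
        if AUTHORITY_CLAIM.search(line)
    ]
```

Flagged lines would be quoted back to the user for review; nothing in the retrieved text is executed.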

Defense Against Goal Drift & Silent Behavior Changes

Threat:
Subtle prompt changes over time that steer an agent toward unintended goals.

Cyrus Protection:

Cyrus does not allow background or scheduled prompt mutation
All work is triggered by explicit Linear issues
Instructions are not persisted or compounded across runs unless explicitly restated

There is no concept of “quiet mode”, recurring hidden prompts, or silent objective reweighting.
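
One way to picture the absence of persisted instructions is a run whose instructions are derived from the triggering issue alone. The `Issue` type and `instructions_for_run` function below are hypothetical names for this sketch and do not reflect Cyrus internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Minimal stand-in for a Linear issue (hypothetical fields)."""
    identifier: str
    description: str


def instructions_for_run(issue: Issue) -> str:
    """Build a run's instructions from the triggering issue only.

    No state from earlier runs is read, so nothing can accumulate
    silently between runs.
    """
    return f"[{issue.identifier}] {issue.description}"
```

Because the function is pure, an instruction from an earlier run reappears only if the user restates it in the new issue.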

Tool & MCP Execution Safety

Explicit Tool Allow-Listing

Cyrus enforces a strict allow-list for all tools, including MCP servers.

Tools must be explicitly permitted per repository
MCP tools require both:
- A defined server (.mcp.json)
- An allow-listed namespace (mcp__servername)

If a tool is not allowed, Cyrus will refuse to call it — even if instructed.
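
The two-part requirement for MCP tools can be sketched as a check that runs before any tool call. This is a sketch under stated assumptions: `is_tool_allowed` and the `allowed_tools` list are hypothetical names, the `.mcp.json` content is assumed to have a top-level `mcpServers` object, and MCP tool names are assumed to follow the `mcp__servername__toolname` pattern implied by the namespace above.

```python
def is_tool_allowed(tool_name: str, allowed_tools: list[str], mcp_config: dict) -> bool:
    """Check a requested tool against a per-repository allow-list (hypothetical)."""
    if not tool_name.startswith("mcp__"):
        # Non-MCP tools only need to appear in the allow-list.
        return tool_name in allowed_tools

    # Assumed naming: mcp__<server>__<tool>.
    parts = tool_name.split("__")
    if len(parts) < 3 or not parts[1]:
        return False
    server = parts[1]

    # Requirement 1: the server is defined in the parsed .mcp.json content.
    server_defined = server in mcp_config.get("mcpServers", {})
    # Requirement 2: the namespace (or the exact tool) is allow-listed.
    namespace_allowed = f"mcp__{server}" in allowed_tools or tool_name in allowed_tools
    return server_defined and namespace_allowed


# Example values; the server entry is a placeholder, not a real package.
mcp_config = {"mcpServers": {"linear": {"command": "example-server"}}}
allowed_tools = ["Read", "mcp__linear"]
assert is_tool_allowed("mcp__linear__create_issue", allowed_tools, mcp_config)
assert not is_tool_allowed("mcp__github__create_pr", allowed_tools, mcp_config)
```

Both conditions must hold, so defining a server without allow-listing its namespace (or the reverse) leaves its tools unavailable.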

No Silent Tool Execution

Cyrus does not:

Auto-discover tools
Execute tools opportunistically
Escalate permissions dynamically

All tool use is:

Explicit
Traceable
Bounded by configuration

Instruction Source Isolation

Cyrus enforces a clear instruction hierarchy:

Source	Trust Level
User messages (Linear issue)	Trusted
System prompt (Claude)	Trusted
Tool results / MCP output	Untrusted
Web content / documents	Untrusted

Only user-authored instructions can trigger actions. Claims of authority that arrive through untrusted sources (tool results, MCP output, web content, or documents) are treated as malicious by default.
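
The hierarchy in the table can be expressed as a small lookup. The enum and function names below are hypothetical and exist only to restate the table in code; they are not part of Cyrus.

```python
from enum import Enum


class Source(Enum):
    USER_MESSAGE = "User messages (Linear issue)"
    SYSTEM_PROMPT = "System prompt (Claude)"
    TOOL_RESULT = "Tool results / MCP output"
    WEB_CONTENT = "Web content / documents"


# Trust levels copied from the table above.
TRUSTED = {Source.USER_MESSAGE, Source.SYSTEM_PROMPT}


def is_trusted(source: Source) -> bool:
    return source in TRUSTED


def may_trigger_action(source: Source) -> bool:
    """Only user-authored instructions can trigger actions."""
    return source is Source.USER_MESSAGE
```

Keeping trust and action-triggering as separate checks reflects the text: the system prompt is trusted, but only user messages start work.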

Social Engineering & Authority Impersonation Defense

Cyrus inherits Claude’s protections against:

Fake “admin” or “system” messages
Urgent or emergency manipulation
Claims of pre-authorization
Attempts to redefine agent roles or permissions

If encountered, Cyrus will:

Stop execution
Quote the suspicious content
Ask the user for explicit confirmation

Auditability & Transparency

Cyrus is designed to be inspectable:

Actions map directly to Linear issues
Tool calls are explicit and permission-bound
No background or autonomous behavior outside declared workflows

This makes Cyrus suitable for:

Security reviews
Internal audits
Regulated environments
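
To make the mapping from actions to Linear issues concrete, the sketch below shows one possible shape for an audit record. The record fields, the `ToolCallRecord` type, and the issue identifier format are assumptions for illustration; the page does not specify Cyrus's actual log format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolCallRecord:
    """Hypothetical audit entry tying one tool call to its triggering issue."""
    issue_id: str    # Linear issue that triggered the run
    tool: str        # tool that was called
    allowed_by: str  # allow-list entry that permitted the call


def to_log_line(record: ToolCallRecord) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


line = to_log_line(ToolCallRecord("ENG-123", "mcp__linear__create_issue", "mcp__linear"))
assert json.loads(line)["issue_id"] == "ENG-123"
```

A reviewer reading such a line can answer two questions directly: which issue caused the call, and which configuration entry permitted it.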

Compliance & Controls

Cyrus complements its agent-level protections with organizational controls:

SOC 2 compliant infrastructure
Principle of least privilege
No secret material committed to repositories
Environment-scoped credentials

Security is enforced at both the system and agent levels.

Summary

Cyrus is not a “black box autonomous agent”. It is a controlled, auditable, and safety-first system that:

Preserves Claude’s strongest defenses
Adds explicit tool and instruction boundaries
Prevents silent escalation or drift
Makes all meaningful actions user-directed

Cyrus Community

Ask security questions or talk directly with the team on Discord
