Our Security Philosophy
Cyrus follows three core principles:
- No hidden autonomy
- No privileged instruction sources
- No silent tool execution
We Do Not Override Claude’s System Prompt
Cyrus does not replace or weaken Claude Code’s native system prompt. Instead:
- Claude’s built-in prompt injection defenses remain fully intact
- Cyrus only appends contextual guidance, never overrides safety rules
- All instruction hierarchy, trust boundaries, and verification logic remain Claude-native
The Claude-native protections that remain in place include:
- Instruction origin tracking
- Untrusted content isolation
- Mandatory user verification for action-like content
- Rule immutability
Protection Against Prompt Injection Attacks
Indirect & Zero-Click Prompt Injection
Threat: Malicious instructions embedded in emails, documents, web pages, logs, or tool outputs.
Cyrus Protection:
- Instructions from tools, documents, MCP servers, and web content are treated as untrusted data
- Claude is required to stop execution and surface the content to the user
- Explicit user approval is required before any action
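
The stop, surface, and approve flow described above can be sketched roughly as follows. This is an illustrative sketch only, not Cyrus's or Claude's actual implementation: the `Content` type, the source labels, and the `ask_user` callback are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Content:
    source: str  # hypothetical labels: "user", "tool", "document", "mcp", "web"
    text: str


def may_act_on(content: Content, ask_user: Callable[[str], bool]) -> bool:
    """Return True only if acting on this content is permitted."""
    if content.source == "user":
        # Direct user messages are the only instruction source in this sketch.
        return True
    # Anything else is data: stop, quote it verbatim, and wait for approval.
    prompt = (
        f"Content from '{content.source}' looks like an instruction:\n"
        f"> {content.text}\n"
        "Proceed?"
    )
    return ask_user(prompt)
```

In this sketch an unrecognized source label falls through to the approval path, so the default is to ask rather than to act.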
Web & RAG-Based Injection
Threat: Malicious instructions hidden in search results, scraped pages, or retrieved documents.
Cyrus Protection:
- Web and retrieval results are always treated as data, never authority
- Claims of “system”, “admin”, or “developer” instructions from content are ignored
- Conflicting instructions always defer to safety rules
Defense Against Goal Drift & Silent Behavior Changes
Threat: Subtle prompt changes over time that steer an agent toward unintended goals.
Cyrus Protection:
- Cyrus does not allow background or scheduled prompt mutation
- All work is triggered by explicit Linear issues
- Instructions are not persisted or compounded across runs unless explicitly restated
Tool & MCP Execution Safety
Explicit Tool Allow-Listing
Cyrus enforces a strict allow-list for all tools, including MCP servers.
- Tools must be explicitly permitted per repository
- MCP tools require both:
  - A defined server (`.mcp.json`)
  - An allow-listed namespace (`mcp__servername`)
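
The two-part MCP requirement can be illustrated with a small check. This is a sketch under stated assumptions rather than Cyrus's real code: the function name `is_tool_allowed` is hypothetical, the `mcp_config` dict is assumed to mirror the contents of `.mcp.json` with servers listed under an `mcpServers` key, and MCP tool names are assumed to take the form `mcp__servername__toolname`.

```python
def is_tool_allowed(tool_name: str, mcp_config: dict, allowed_tools: set[str]) -> bool:
    """Allow a tool only if the configuration names it explicitly."""
    if not tool_name.startswith("mcp__"):
        # Non-MCP tools must appear in the allow-list by exact name.
        return tool_name in allowed_tools

    parts = tool_name.split("__")
    if len(parts) < 3 or not parts[1]:
        # Malformed MCP name: no server segment or no tool segment.
        return False

    server = parts[1]
    server_defined = server in mcp_config.get("mcpServers", {})
    namespace_allowed = f"mcp__{server}" in allowed_tools
    # Both conditions are required; either one alone is not enough.
    return server_defined and namespace_allowed
```

For example, with a server named `linear` defined in the configuration and `mcp__linear` in the allow-list, `mcp__linear__create_issue` passes, while a tool from any server missing from either place is rejected.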
No Silent Tool Execution
Cyrus does not:
- Auto-discover tools
- Execute tools opportunistically
- Escalate permissions dynamically
All tool usage is:
- Explicit
- Traceable
- Bounded by configuration
Instruction Source Isolation
Cyrus enforces a clear instruction hierarchy:

| Source | Trust Level |
|---|---|
| User messages (Linear issue) | Trusted |
| System prompt (Claude) | Trusted |
| Tool results / MCP output | Untrusted |
| Web content / documents | Untrusted |
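
One way to picture the table is as a lookup that defaults to untrusted. The sketch below is illustrative only: the source keys, the `Trust` enum, and both helper functions are hypothetical names and do not come from Cyrus or Claude Code.

```python
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


# Mirrors the table above; the key names are assumptions for illustration.
TRUST_BY_SOURCE = {
    "user_message": Trust.TRUSTED,   # Linear issue
    "system_prompt": Trust.TRUSTED,  # Claude
    "tool_result": Trust.UNTRUSTED,
    "mcp_output": Trust.UNTRUSTED,
    "web_content": Trust.UNTRUSTED,
    "document": Trust.UNTRUSTED,
}


def trust_level(source: str) -> Trust:
    """Look up a source; anything not listed is treated as untrusted."""
    return TRUST_BY_SOURCE.get(source, Trust.UNTRUSTED)


def can_issue_instructions(source: str) -> bool:
    """Only trusted sources may direct the agent's actions."""
    return trust_level(source) is Trust.TRUSTED
```

Defaulting unknown sources to untrusted means a new content channel cannot gain authority simply because nobody classified it.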
Social Engineering & Authority Impersonation Defense
Cyrus inherits Claude’s protections against:
- Fake “admin” or “system” messages
- Urgent or emergency manipulation
- Claims of pre-authorization
- Attempts to redefine agent roles or permissions
When such an attempt is detected, Claude will:
- Stop execution
- Quote the suspicious content
- Ask the user for explicit confirmation
Auditability & Transparency
Cyrus is designed to be inspectable:
- Actions map directly to Linear issues
- Tool calls are explicit and permission-bound
- No background or autonomous behavior outside declared workflows
This makes Cyrus suitable for:
- Security reviews
- Internal audits
- Regulated environments
Compliance & Controls
Cyrus complements its agent-level protections with organizational controls:
- SOC 2 compliant infrastructure
- Principle of least privilege
- No secret material committed to repositories
- Environment-scoped credentials
Summary
Cyrus is not a “black box autonomous agent”. It is a controlled, auditable, and safety-first system that:
- Preserves Claude’s strongest defenses
- Adds explicit tool and instruction boundaries
- Prevents silent escalation or drift
- Makes all meaningful actions user-directed


