
Paperclip v0.3.1: Independent Security Assessment

Five findings from a static source code analysis of an open-source AI agent orchestration platform — mapped to OWASP LLM Top 10 (2025) and MITRE ATLAS.

Executive Summary

Paperclip is an open-source, self-hosted orchestration platform (MIT license, github.com/paperclipai/paperclip) that coordinates teams of AI agents — Claude Code, OpenClaw, Codex, Cursor, and arbitrary HTTP-reachable runtimes — into structured company hierarchies with org charts, budgets, goals, and governance. It launched in early March 2026 and accumulated over 14,200 GitHub stars in its first week.

The platform's value proposition is compelling: bring your own AI agents, assign them roles (CEO, CTO, Engineer, Designer), set company-level goals, and let Paperclip handle coordination, budget enforcement, and accountability. Agents wake on scheduled heartbeats, check for work, execute tasks, and report back — all traced through an immutable audit log.

I performed a static source code analysis of version 0.3.1 to assess whether the platform's security posture matches its governance marketing. The short answer: it does not. While Paperclip demonstrates thoughtful security engineering in several subsystems, the default deployment configuration bypasses nearly all of it.

Assessment Methodology

This assessment was conducted through static analysis of the Paperclip v0.3.1 source code cloned from the public GitHub repository on March 14, 2026. The following server components were reviewed: authentication middleware, JWT implementation, secrets management, agent adapter execution logic, configuration defaults, and log redaction.

Findings are mapped to the OWASP Top 10 for LLM Applications (2025 edition) and the MITRE ATLAS framework. CVSS scores are estimated based on assessed attack vectors and impacts. No dynamic testing, penetration testing, or live exploitation was performed. All findings are based on source code analysis and documented behavior.

Findings

Five findings were identified across the authentication, authorization, and agent execution subsystems. Two are rated Critical, two High, and one Medium.

F-01: Default Authentication Bypass via local_trusted Mode

Severity: CRITICAL | OWASP: LLM06 Excessive Agency | MITRE ATLAS: AML.T0040 | CVSS Est: 9.1

The default deployment mode is local_trusted, set in server/src/config.ts at line 117. When this mode is active, the authentication middleware in server/src/middleware/auth.ts (lines 22-25) automatically promotes every incoming HTTP request to a full board-level actor with isInstanceAdmin: true — regardless of whether any authentication credentials are provided. There is no login screen, no API key check, no session validation. Every request is treated as coming from the system owner.
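In outline, the vulnerable pattern looks like this. This is a simplified sketch; the identifiers are illustrative, not the exact names from server/src/middleware/auth.ts:

```typescript
// Sketch of the local_trusted promotion pattern described above.
// Identifiers are illustrative, not Paperclip's actual names.
type Actor = { role: string; isInstanceAdmin: boolean };

function resolveActor(mode: string, authHeader?: string): Actor {
  if (mode === "local_trusted") {
    // Every request becomes a board-level instance admin,
    // with or without credentials.
    return { role: "board", isInstanceAdmin: true };
  }
  if (!authHeader) {
    throw new Error("401: authentication required");
  }
  // ...session/JWT validation would happen here in authenticated mode...
  return { role: "member", isInstanceAdmin: false };
}
```

The branch ordering is the whole vulnerability: the trust-all check runs before any credential is ever examined.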

Any process, script, or application running on the same machine — or any device on the same network if the server binds to 0.0.0.0 — can execute any API call with full administrative privileges. This includes creating and terminating agents, modifying token budgets, accessing the full audit log, and reading encrypted secrets. The Paperclip documentation recommends Tailscale for mobile access, which would extend this unrestricted admin access to every device on the user's Tailscale network.

This finding maps to OWASP LLM06 (Excessive Agency) because the system grants maximum privileges by default with no authentication challenge. It maps to MITRE ATLAS AML.T0040 (ML Model Inference API Access) because an adversary with network access can interact with the full agent orchestration API without credentials, enabling unauthorized model invocation, task creation, and data exfiltration.

The default deployment mode should be authenticated, not local_trusted. The platform should require explicit opt-in for trust-all mode via an environment variable, with a prominent console warning when it is enabled.
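A hardened default could look something like the following sketch. The PAPERCLIP_TRUST_LOCAL variable name is invented here for illustration; the point is the shape of the fix, not the exact spelling:

```typescript
// Hypothetical hardened default: trust-all mode requires explicit opt-in
// via an environment variable, and the server warns loudly when enabled.
// PAPERCLIP_TRUST_LOCAL is an invented name, not an existing config option.
function resolveDeploymentMode(env: Record<string, string | undefined>): string {
  if (env.PAPERCLIP_TRUST_LOCAL === "1") {
    console.warn(
      "WARNING: local_trusted mode enabled. Every request to this server " +
        "is treated as an instance admin. Do not expose this port.",
    );
    return "local_trusted";
  }
  return "authenticated"; // secure default
}
```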

F-02: Hardcoded Fallback Authentication Secret

Severity: CRITICAL | OWASP: LLM02 Sensitive Info Disclosure | MITRE ATLAS: AML.T0025 | CVSS Est: 8.6

The Better Auth instance creation in server/src/auth/better-auth.ts at line 70 uses a hardcoded fallback secret: the literal string "paperclip-dev-secret". This value is used for session cookie signing when neither the BETTER_AUTH_SECRET nor the PAPERCLIP_AGENT_JWT_SECRET environment variable is set. Because the source code is publicly available under the MIT license, anyone who reads the repository knows this secret.

On any Paperclip instance deployed in authenticated mode without explicitly setting a unique secret, an attacker can craft valid session cookies using the known default, impersonate any user including administrators, and gain full control of every company and agent managed by that instance. This effectively reduces the authenticated mode — which is supposed to be the secure deployment option — to the same security posture as local_trusted.

This maps to OWASP LLM02 (Sensitive Information Disclosure) because the authentication secret is embedded in public source code. It maps to MITRE ATLAS AML.T0025 (Exfiltration via Cyber Means) because forged admin sessions enable extraction of all agent configurations, API keys stored in the secrets vault, audit logs, and any data processed by agents.

Remove the hardcoded fallback entirely. The platform should refuse to start in authenticated mode if BETTER_AUTH_SECRET is not set. On first run, it should generate a cryptographically random secret and persist it to a local file with restrictive permissions (similar to how it already handles the master encryption key).
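One way the first-run bootstrap could look, mirroring the master-key handling the report describes. The function name and file location are hypothetical:

```typescript
import { randomBytes } from "node:crypto";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical first-run secret bootstrap: generate a cryptographically
// random secret and persist it owner-readable only (0o600), instead of
// falling back to a publicly known string.
function loadOrCreateSecret(path: string): string {
  if (existsSync(path)) {
    return readFileSync(path, "utf8").trim();
  }
  const secret = randomBytes(32).toString("base64url");
  writeFileSync(path, secret, { mode: 0o600 });
  return secret;
}
```

A startup guard would then refuse to boot in authenticated mode if neither the environment variable nor the persisted file yields a secret.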

F-03: Dangerous Permission Skip Flag in Agent Adapter

Severity: HIGH | OWASP: LLM06 Excessive Agency | MITRE ATLAS: AML.T0051 | CVSS Est: 7.8

The Claude local adapter in packages/adapters/claude-local/src/server/execute.ts at line 398 supports a configuration flag called --dangerously-skip-permissions. When enabled, this flag is passed directly to the Claude Code runtime and disables all permission checks — meaning the agent can execute arbitrary file system operations, run shell commands, install packages, and modify system state without any human-in-the-loop confirmation.

A misconfigured or compromised agent running with this flag has unrestricted access to the host operating system. Combined with Finding F-01 (default admin mode with no authentication), any network-adjacent attacker could use the API to create a new agent with this flag enabled and achieve remote code execution on the host machine. This turns an AI orchestration platform into an unauthenticated RCE vector.

This maps to OWASP LLM06 (Excessive Agency) because the platform provides a mechanism to completely remove execution guardrails from AI agents. While the flag is clearly labeled as dangerous and is intended for trusted development environments, the combination with weak default authentication creates a privilege escalation path from network access to arbitrary code execution. It also maps to MITRE ATLAS AML.T0051 (LLM Prompt Injection) because an attacker who can influence agent prompts can leverage the unrestricted execution environment to perform actions far beyond the intended scope.

Require explicit per-agent opt-in for permission skipping through the governance approval system. Log every invocation of this flag to the immutable audit trail with alert triggers. In authenticated deployment mode, prevent this flag from being set via the API without board-level approval.
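A minimal sketch of that gate, assuming a hypothetical approval record keyed by agent ID. Only the flag name comes from the source; the surrounding types and function are invented for illustration:

```typescript
// Hypothetical guard: --dangerously-skip-permissions is only passed to the
// agent runtime when a board-level approval is on record for this agent.
type AgentConfig = { id: string; skipPermissions?: boolean };

function buildArgs(cfg: AgentConfig, approvedAgents: Set<string>): string[] {
  const args = ["run"];
  if (cfg.skipPermissions) {
    if (!approvedAgents.has(cfg.id)) {
      throw new Error(
        `agent ${cfg.id}: --dangerously-skip-permissions requires board approval`,
      );
    }
    // An audit-log write with an alert trigger would go here.
    args.push("--dangerously-skip-permissions");
  }
  return args;
}
```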

F-04: Environment Variable Exposure to Agent Child Processes

Severity: HIGH | OWASP: LLM02 Sensitive Info Disclosure | MITRE ATLAS: AML.T0024 | CVSS Est: 7.2

When Paperclip executes an agent task, the adapter spawns a child process using runChildProcess and passes an extensive set of environment variables to the agent runtime. These variables — constructed across lines 148-226 of the Claude adapter's execute.ts — include the Paperclip API URL, agent authentication tokens, run IDs, task IDs, workspace file paths, and potentially the ANTHROPIC_API_KEY. All of these values are directly accessible to any code the agent writes or executes.

A prompt-injected or malicious agent task can trivially read process.env or /proc/self/environ to harvest every credential and internal URL passed by Paperclip. These credentials can then be exfiltrated to external services through the agent's normal output channels — HTTP requests, file writes, or even encoded in task completion messages. Since agents are designed to make network requests and write files, there is no behavioral anomaly to detect.

This maps to OWASP LLM02 (Sensitive Information Disclosure) because sensitive credentials are exposed to the AI agent runtime without scoping, time-limiting, or access controls. It maps to MITRE ATLAS AML.T0024 (Exfiltration via ML Inference API) because an adversary who achieves prompt injection on an agent can instruct it to read its own environment and include the harvested secrets in outputs, API calls, or generated files.

Replace environment variable injection with short-lived, scope-limited tokens delivered via temporary files with restrictive file permissions (0o600). Implement secret rotation after each agent run completes. Consider using a sidecar process or Unix domain socket for credential access rather than exposing secrets in the process environment.
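The file-based handoff could be sketched as follows. The function name, token format, and TTL are assumptions for illustration, not Paperclip's design:

```typescript
import { randomBytes } from "node:crypto";
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical credential handoff: instead of putting secrets into the
// child's process.env, write a short-lived, run-scoped token to an
// owner-only (0o600) file and pass just the file path to the child.
function issueRunToken(runId: string): { path: string; expiresAt: number } {
  const token = randomBytes(24).toString("hex");
  const path = join(tmpdir(), `paperclip-run-${runId}.token`);
  writeFileSync(path, JSON.stringify({ token, runId }), { mode: 0o600 });
  return { path, expiresAt: Date.now() + 15 * 60 * 1000 }; // 15-minute TTL
}
```

The server would revoke the token (and delete the file) when the run completes, so anything the agent exfiltrates is stale within minutes.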

F-05: Cross-Agent Prompt Injection via Shared Goal Hierarchy

Severity: MEDIUM | OWASP: LLM01 Prompt Injection | MITRE ATLAS: AML.T0051 | CVSS Est: 6.5

Paperclip's core architecture traces every task back to a company mission through a goal hierarchy. When an agent wakes to perform work, its prompt context is constructed from multiple sources: the task description, project-level goals, company mission statement, inter-agent delegation notes, @-mention messages from other agents, and session handoff markdown from previous runs. All of these text fields are controlled by other agents or users and are included in the prompt without any sanitization, escaping, or structural separation from system instructions.

A malicious or compromised agent operating in a lower-privilege role — say, an "SEO Analyst" — could embed prompt injection payloads in its task outputs, delegation notes, or @-mention responses. When a downstream agent with broader permissions (say, a "CTO" role with code execution access) processes these messages as part of its context window, the injection payload executes in the higher-privilege agent's context. This creates a transitive trust chain where any agent's output becomes another agent's trusted input.

This maps directly to OWASP LLM01 (Prompt Injection) — specifically the indirect variant where adversarial content is planted in data that the LLM later processes. It maps to MITRE ATLAS AML.T0051 because the multi-agent architecture creates propagation paths for adversarial prompts across privilege boundaries. This is a novel attack surface that is specific to multi-agent orchestration platforms and is not adequately addressed by traditional application security frameworks.

Implement structural prompt boundary markers (such as XML-style delimiters or special tokens) between system instructions and inter-agent content. Apply content filtering and anomaly detection on agent-to-agent message passing. Enforce role-based output validation before any agent's output is injected into another agent's context. Consider implementing a "clean room" pattern where inter-agent messages are summarized by a separate, sandboxed model before being passed to the receiving agent.
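The boundary-marker idea can be sketched in a few lines. This is illustrative only, assuming a hypothetical wrapper applied to inter-agent messages before they enter a receiving agent's context; Paperclip does not currently do this:

```typescript
// Sketch of structural prompt boundaries: untrusted inter-agent content is
// escaped and wrapped in delimiters so it cannot masquerade as system
// instructions or close its own envelope early.
function wrapAgentMessage(senderId: string, body: string): string {
  // Neutralize delimiter look-alikes inside the untrusted body.
  const escaped = body.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return [
    `<agent_message sender="${senderId}" trust="untrusted">`,
    escaped,
    `</agent_message>`,
  ].join("\n");
}
```

Delimiters alone are not a complete defense against a capable injection payload, which is why the content-filtering and clean-room patterns above matter as additional layers.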

What They Did Right

This assessment is not entirely critical. Paperclip demonstrates intentional security engineering in several subsystems that deserve recognition.

Secrets Encryption (AES-256-GCM): The local encrypted secrets provider uses AES-256-GCM with random 12-byte initialization vectors and authentication tags. The master key file is created with 0o600 permissions (owner-only read/write), and key derivation supports hex, base64, and raw 32-byte formats. This implementation follows current cryptographic best practices.
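For readers unfamiliar with the construction, the pattern described above looks roughly like this in Node.js. The seal/open names and the iv-tag-ciphertext layout are illustrative, not Paperclip's actual API:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Sketch of AES-256-GCM with a random 12-byte IV and authentication tag,
// as described above. Output layout: iv (12) | tag (16) | ciphertext.
function seal(key: Buffer, plaintext: string): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]);
}

function open(key: Buffer, sealed: Buffer): string {
  const iv = sealed.subarray(0, 12);
  const tag = sealed.subarray(12, 28);
  const ct = sealed.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // tampering with iv, tag, or ct makes final() throw
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```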

Timing-Safe JWT Verification: The agent JWT implementation uses Node.js crypto.timingSafeEqual for signature comparison, preventing timing side-channel attacks that could allow an attacker to incrementally guess valid signatures. Tokens are properly scoped with agent ID, company ID, adapter type, and run ID, with a configurable TTL defaulting to 48 hours.
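The timing-safe comparison idiom is worth showing, since the naive `===` comparison is the common mistake it prevents. A simplified sketch, not Paperclip's exact implementation:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of timing-safe HMAC-SHA256 signature verification. A plain
// string comparison would leak how many leading bytes matched; a
// constant-time comparison does not.
function verifySignature(payload: string, secret: string, sig: Buffer): boolean {
  const expected = createHmac("sha256", secret).update(payload).digest();
  // timingSafeEqual throws on length mismatch, so check length first.
  return sig.length === expected.length && timingSafeEqual(sig, expected);
}
```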

Automatic Log Redaction: The log redaction module automatically detects and strips OS usernames and home directory paths from all log output across multiple operating systems. This reduces the risk of accidental PII exposure in audit trails, error logs, and diagnostic output.

Immutable Audit Trail: All agent actions, tool calls, API requests, and decisions are logged to an append-only audit system with no edit or delete capability. This provides forensic capability for incident response and creates an accountability record for every action taken by every agent.

Per-Agent Budget Enforcement: Token spending limits are enforced per agent with atomic checkout operations to prevent double-work and runaway costs. When an agent hits its budget limit, execution stops. This is a practical control that prevents the "paperclip maximizer" scenario where an agent consumes unlimited resources pursuing its objective.
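Conceptually, the checkout is a reserve-before-spend operation. The sketch below is a single-process illustration; a real implementation would use a database transaction or row lock so concurrent runs cannot both reserve the same remaining budget:

```typescript
// Sketch of atomic budget checkout: tokens are reserved before a run
// starts, so a second run cannot spend the same remaining budget.
class AgentBudget {
  constructor(private remaining: number) {}

  // Reserves `amount` tokens; returns false (and reserves nothing)
  // if the agent would exceed its budget.
  checkout(amount: number): boolean {
    if (amount > this.remaining) return false;
    this.remaining -= amount;
    return true;
  }

  get balance(): number {
    return this.remaining;
  }
}
```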

Framework Cross-References

OWASP LLM Top 10 (2025):

  • LLM01 (Prompt Injection): F-05 — Cross-agent prompt injection via goal hierarchy
  • LLM02 (Sensitive Information Disclosure): F-02 — Hardcoded auth secret; F-04 — Env var exposure to agents
  • LLM06 (Excessive Agency): F-01 — Default admin access; F-03 — Permission skip flag

MITRE ATLAS:

  • AML.T0040 (ML Model Inference API Access): F-01 — Unauthenticated API access to full agent orchestration
  • AML.T0025 (Exfiltration via Cyber Means): F-02 — Forged sessions enable full data extraction
  • AML.T0051 (LLM Prompt Injection): F-03 and F-05 — Unrestricted execution via injection; cross-agent injection
  • AML.T0024 (Exfiltration via ML Inference API): F-04 — Credential theft through agent output channels

NIST AI RMF: Findings F-01 and F-03 map to GOVERN 1.1 (Legal and regulatory requirements) and MAP 3.5 (Scientific integrity and information quality). The default-insecure configuration contradicts NIST's guidance that AI systems should implement "secure by default" configurations.

Google SAIF: Finding F-05 maps to Principle 3 (Automate defenses to keep pace with threats). The lack of inter-agent prompt sanitization violates SAIF's guidance on maintaining trust boundaries between AI components in multi-model architectures.

Conclusion

Paperclip v0.3.1 demonstrates a fundamental disconnect between marketing and defaults. The platform sells governance, approval gates, and human oversight as core features — but ships with a configuration that bypasses all of them. A developer who follows the quickstart guide (npx paperclipai onboard --yes) gets a system where every request is auto-promoted to admin and session secrets are publicly known.

For organizations evaluating Paperclip for production use, the platform requires explicit hardening before deployment: setting the deployment mode to authenticated, generating unique secrets for BETTER_AUTH_SECRET and PAPERCLIP_AGENT_JWT_SECRET, disabling the dangerous permissions flag, and implementing network segmentation to restrict API access.

For the AI security community, Paperclip represents an important case study in how multi-agent orchestration platforms introduce novel attack surfaces — particularly cross-agent prompt injection — that traditional application security frameworks do not fully address. As these platforms mature and adoption grows, the security community will need new patterns for agent isolation, prompt boundary enforcement, and inter-agent trust verification.

Download Full Assessment

The complete assessment with detailed evidence references, CVSS scoring rationale, and additional framework mappings is available as a PDF.


Methodology Note

This assessment was conducted through static analysis of publicly available source code (MIT license) cloned from GitHub on March 14, 2026. No dynamic testing, penetration testing, or live exploitation was performed. All CVSS scores are estimated based on assessed attack vectors and impacts. The author is not affiliated with Paperclip or its maintainers. Responsible disclosure practices should be followed for any vulnerabilities identified in active open-source projects.

Paul Holder is a security professional with operational AI evaluation experience who conducts independent security assessments of AI systems. This assessment was performed as independent research on publicly available source code.
