Security And Governance

Approvals, redaction, ToolRouter, quarantine, credential boundaries, and audit proof.

Status: implemented | Version: latest | Review: source-backed | Last scanned: 2026-06-25T00:00:00Z | Review required: false

Governance principles

Trinity defaults to safe, human-controlled operation. Agents may research, propose, draft, classify, prepare artifacts, and create action intents. Sensitive execution requires account scope, provider readiness, policy checks, idempotency, human approval, redaction, and audit proof.

Security is layered because the threat model is layered. The system must protect money, email reputation, private documents, inbound poisoned content, provider credentials, model context, account/team boundaries, generated skills, and public proof surfaces at the same time.

Control matrix

Risk | Control | Evidence
Indirect prompt injection | Quarantine and Jido-backed content firewall | Release audit and firewall decision metadata
Poisoned skill creation | Skills only created from scoped, reviewed, source-backed work | Skill scope, source run, review status, promotion proof
Cross-tenant memory leakage | Project/account/global-core Hermes scope constraints | hermes_skills indexes and hermes_agent_profiles shape checks
Unsafe outbound email | Gmail readiness, alias, suppression, safety review, approval | Draft, ModelDecision, ApprovalEvent, ToolCall
Unauthorized spend | Account caps, spend policy, approval, idempotency | ApprovalRequest, ToolCall, ledger entry
Provider secrets leakage | Credential vault path, redaction, no secrets in assigns/logs/proof | Safe readiness JSON and audit summaries
Tool bypass | ToolRouter as side-effect boundary | ToolCall status and AuditEvent
Model ambiguity | Structured schemas and ModelDecision records | Validated decision status and route proof
Webhook replay | Signed event verification and event processing records | StripeWebhookEvent and idempotent order/revenue writes
Browser-side authority | LiveView server state and context-level authorization | Protected routes and scoped context calls
Fake/demo proof confusion | Sample/fallback/live labels in records and UI | Ledger/status labels and docs disclosure
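
The webhook replay row pairs two controls: verify the signature, then process each event ID at most once. The following is a minimal Python sketch of that pairing, not Trinity's implementation. It assumes a plain HMAC-SHA256 over the raw payload (Stripe's real scheme also signs a timestamp), and the class name, return strings, and in-memory set standing in for StripeWebhookEvent records are all hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, payload: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw payload (illustrative scheme)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


class WebhookProcessor:
    """Hypothetical processor: verify first, then record each event ID once."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._seen_event_ids = set()  # stand-in for a unique-keyed event table
        self.revenue_writes = []      # stand-in for order/revenue writes

    def handle(self, event_id: str, payload: bytes, signature: str) -> str:
        expected = sign(self._secret, payload)
        # Constant-time comparison avoids leaking how much of the value matched.
        if not hmac.compare_digest(expected, signature):
            return "rejected"
        if event_id in self._seen_event_ids:
            return "duplicate"  # replayed event: acknowledge, write nothing
        self._seen_event_ids.add(event_id)
        self.revenue_writes.append(event_id)
        return "processed"
```

Checking the signature before the duplicate lookup means an unsigned request cannot even probe which event IDs have been seen.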

Human-in-the-middle content release

Inbound emails, uploaded docs, and attachments remain metadata-only until a human reviews and releases them. Hermes can see that an object exists and reference it, but cannot read the original text before release. This protects against malicious instructions hidden in email bodies, PDFs, images, encoded text, or copied documents.

Content state | Hermes access | Operator action
Quarantined inbound email | Metadata, sender, subject, source ID | Review and release selected content
Quarantined attachment | Filename, type, hash, storage ref | Preview/download and release if safe
Released document | Bounded content or source ref | Use in project context and artifacts
Generated artifact | Artifact summary and proof links | Approve/share/download
Skill candidate | Summary, diff, source run | Approve, promote, retire, or delete
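
The quarantine rule for inbound email can be pictured as a projection that never copies the body unless the item has been released. This is a minimal Python sketch under stated assumptions: the `InboundItem` record, the `agent_view` function, and the `max_chars` bound are hypothetical names chosen for illustration, not Trinity's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundItem:
    """Hypothetical record for an inbound email."""
    source_id: str
    sender: str
    subject: str
    body: str
    released: bool = False


def agent_view(item: InboundItem, max_chars: int = 2000) -> dict:
    """Return only the fields an agent may see for this item."""
    view = {
        "source_id": item.source_id,
        "sender": item.sender,
        "subject": item.subject,
        "state": "released" if item.released else "quarantined",
    }
    if item.released:
        # Released content is still bounded rather than passed through whole.
        view["content"] = item.body[:max_chars]
    return view
```

Because the body is added only on the released branch, a quarantined item cannot leak text through a forgotten field.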

Redaction posture

Public docs and proof surfaces expose:

  • Record IDs, statuses, timestamps, hashes, provider labels, and summaries.
  • Source paths and code modules for audit.
  • Approval outcomes and policy denial reasons.

They do not expose:

  • Credential values.
  • Private customer payloads.
  • Raw model context.
  • Hidden model reasoning.
  • Unreleased inbound email or document content.
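
One way to enforce the two lists above is an allowlist projection: a record field reaches a public surface only if it is explicitly named as public. The Python sketch below is illustrative only; the field names in `PUBLIC_FIELDS` and the function name are assumptions, not Trinity's actual record shape.

```python
# Hypothetical field names mirroring the "expose" list above.
PUBLIC_FIELDS = frozenset({
    "record_id", "status", "timestamp", "hash", "provider",
    "summary", "source_path", "approval_outcome", "denial_reason",
})


def to_public_proof(record: dict) -> dict:
    """Keep allowlisted fields only; anything unrecognized is dropped."""
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}
```

An allowlist fails closed: a newly added private field stays hidden until someone deliberately marks it public, whereas a denylist would expose it by default.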

Approval and action boundaries

Action | Default | Required before execution
Gmail draft creation | Allowed only through configured Gmail path | OAuth mailbox, alias readiness, policy check
Gmail send/reply | Blocked by default | LIVE_OUTREACH, alias, suppression check, safety review, ApprovalEvent
Stripe checkout/revenue | Allowed when configured for public offer | Signed webhook and idempotent processing
Agent spend/provisioning | Blocked by default | LIVE_SPEND, cap check, approval, idempotency
Tool execution | Blocked if direct or malformed | ToolRouter policy and adapter readiness
Model review | Required for risky copy/deliverables | Valid ModelDecision schema and fail-closed parsing
Secret changes | Never autonomous | Account owner action outside agent authority
Refunds/disputes/legal incidents | Never autonomous | Human/manual escalation
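
The "blocked by default" and "never autonomous" rows can be read as a gate that denies unless every precondition holds. The Python sketch below is a simplified illustration, not the ToolRouter. It assumes LIVE_OUTREACH and LIVE_SPEND behave as boolean flags, uses hypothetical action names and denial strings, and leaves out the alias, suppression, safety review, and cap checks.

```python
from dataclasses import dataclass

# Assumed mapping from a gated action to the flag it requires.
REQUIRED_FLAG = {"gmail_send": "LIVE_OUTREACH", "agent_spend": "LIVE_SPEND"}
# Actions that no flag or approval can unlock for an agent.
NEVER_AUTONOMOUS = frozenset({"secret_change", "refund"})


@dataclass(frozen=True)
class ActionIntent:
    """Hypothetical action intent created by an agent."""
    action: str
    idempotency_key: str
    approved: bool = False


def decide(intent: ActionIntent, flags: dict, executed_keys: set) -> tuple:
    """Return (allowed, reason); every unrecognized case is a denial."""
    if intent.action in NEVER_AUTONOMOUS:
        return False, "never_autonomous"
    flag = REQUIRED_FLAG.get(intent.action)
    if flag is None:
        return False, "unknown_action"
    if not flags.get(flag, False):
        return False, f"{flag}_disabled"
    if not intent.approved:
        return False, "approval_required"
    if intent.idempotency_key in executed_keys:
        return False, "already_executed"
    executed_keys.add(intent.idempotency_key)  # record only on the allow path
    return True, "allowed"
```

Returning a reason with each denial matches the redaction posture, which lists policy denial reasons among the things proof surfaces may expose.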

Credential and readiness boundaries

Trinity separates provider readiness from secret values. UI surfaces can show that NVIDIA, Stripe, Gmail, Hermes, Tigris, or other providers are configured or missing, but secret values are never rendered in LiveView assigns, logs, readiness JSON, docs, ToolCall rows, prompts, model context, or public proof.

Provider family | Safe public/readiness data | Private data
NVIDIA | Route name, provider status, model label, decision status | API key, raw prompt context, private payload
Stripe | Mode, webhook status, event IDs, checkout IDs | Secret key, webhook signing secret
Gmail | Mailbox status, alias, thread/message IDs | OAuth refresh token, raw unreleased inbound body
Hermes | Runtime health, profile scope, skill scope | Runtime credentials, private memory content
Storage | Object key/hash/status | Unreleased private document body
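
Separating readiness from secret values can be as simple as reporting whether each required setting is present and never copying the value itself. This is a minimal Python sketch, assuming configuration arrives as a dictionary; the function name and the environment variable names in the usage example are hypothetical.

```python
import json


def readiness_summary(env: dict, required: dict) -> str:
    """Report per-provider readiness as JSON without including any values."""
    report = {}
    for provider, keys in required.items():
        # A provider is configured only if every required key is non-empty.
        configured = all(env.get(key) for key in keys)
        report[provider] = "configured" if configured else "missing"
    return json.dumps(report, sort_keys=True)
```

For example, `readiness_summary({"STRIPE_SECRET_KEY": "sk_test_123"}, {"stripe": ["STRIPE_SECRET_KEY"]})` yields a JSON object that says only that `stripe` is configured, so the output is safe to render in a UI or log.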

Official references

System | Use in Trinity | Official docs
Gmail | OAuth scope and send/draft boundaries | Gmail API scopes
Gmail drafts | Draft creation and send flow | Gmail drafts guide
Stripe | Signed webhook verification | Stripe webhooks
Jido | Typed firewall action seam | Jido Action

Source paths

  • docs/security/threat-model.md
  • docs/security/ai-governance-posture.md
  • lib/autonomous_agency/tools/tool_router.ex
  • lib/autonomous_agency/approvals.ex
  • lib/autonomous_agency/audit.ex
  • lib/autonomous_agency/security/redaction.ex
  • lib/autonomous_agency/security/content_firewall.ex