Security And Governance
Approvals, redaction, ToolRouter, quarantine, credential boundaries, and audit proof.
Governance principles
Trinity defaults to safe, human-controlled operation. Agents may research, propose, draft, classify, prepare artifacts, and create action intents. Sensitive execution requires account scope, provider readiness, policy checks, idempotency, human approval, redaction, and audit proof.
Security is layered because the threat model is layered. The system must protect money, email reputation, private documents, inbound poisoned content, provider credentials, model context, account/team boundaries, generated skills, and public proof surfaces at the same time.
Control matrix
| Risk | Control | Evidence |
|---|---|---|
| Indirect prompt injection | Quarantine and Jido-backed content firewall | Release audit and firewall decision metadata |
| Poisoned skill creation | Skills only created from scoped, reviewed, source-backed work | Skill scope, source run, review status, promotion proof |
| Cross-tenant memory leakage | Project/account/global-core Hermes scope constraints | hermes_skills indexes and hermes_agent_profiles shape checks |
| Unsafe outbound email | Gmail readiness, alias, suppression, safety review, approval | Draft, ModelDecision, ApprovalEvent, ToolCall |
| Unauthorized spend | Account caps, spend policy, approval, idempotency | ApprovalRequest, ToolCall, ledger entry |
| Provider secrets leakage | Credential vault path, redaction, no secrets in assigns/logs/proof | Safe readiness JSON and audit summaries |
| Tool bypass | ToolRouter as side-effect boundary | ToolCall status and AuditEvent |
| Model ambiguity | Structured schemas and ModelDecision records | Validated decision status and route proof |
| Webhook replay | Signed event verification and event processing records | StripeWebhookEvent and idempotent order/revenue writes |
| Browser-side authority | LiveView server state and context-level authorization | Protected routes and scoped context calls |
| Fake/demo proof confusion | Sample/fallback/live labels in records and UI | Ledger/status labels and docs disclosure |
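The webhook replay row pairs two checks: verify the signature, then record the event so a repeated delivery does not write twice. The sketch below illustrates that pairing in Python. It is not Trinity's code (the source modules are Elixir), and `WebhookProcessor`, `sign`, the in-memory set, and the 300-second tolerance are all hypothetical names and values chosen for illustration. The signing format (HMAC-SHA256 over `timestamp.payload`) is a generic pattern assumed here, not a statement about how Trinity verifies Stripe events.

```python
import hashlib
import hmac


def sign(secret: bytes, timestamp: int, payload: bytes) -> str:
    """Hex HMAC-SHA256 over "<timestamp>.<payload>" (assumed signing format)."""
    message = str(timestamp).encode() + b"." + payload
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class WebhookProcessor:
    """Hypothetical processor: verify first, then deduplicate by event ID."""

    def __init__(self, secret: bytes, tolerance_seconds: int = 300):
        self.secret = secret
        self.tolerance_seconds = tolerance_seconds
        # Stand-in for a durable event-processing table.
        self.processed_event_ids = set()

    def handle(self, event_id: str, timestamp: int, payload: bytes,
               signature: str, now: int) -> str:
        expected = sign(self.secret, timestamp, payload)
        # Constant-time comparison avoids leaking how much of the signature matched.
        if not hmac.compare_digest(expected, signature):
            return "rejected_bad_signature"
        # A valid but old signature is treated as a possible replay.
        if abs(now - timestamp) > self.tolerance_seconds:
            return "rejected_stale"
        # A second delivery of the same event must not repeat the write.
        if event_id in self.processed_event_ids:
            return "duplicate_ignored"
        self.processed_event_ids.add(event_id)
        return "processed"
```

In a real deployment the processed-event record would live in the database and be written in the same transaction as the order or revenue row, so a crash between the two cannot produce a double write.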
Human-in-the-middle content release
Inbound emails, uploaded docs, and attachments remain metadata-only until a human reviews them. Hermes can reference that an object exists, but cannot read the original text before release. This protects against malicious instructions hidden in email bodies, PDFs, images, encoded text, or copied documents.
| Content state | Hermes access | Operator action |
|---|---|---|
| Quarantined inbound email | Metadata, sender, subject, source ID | Review and release selected content |
| Quarantined attachment | Filename, type, hash, storage ref | Preview/download and release if safe |
| Released document | Bounded content or source ref | Use in project context and artifacts |
| Generated artifact | Artifact summary and proof links | Approve/share/download |
| Skill candidate | Summary, diff, source run | Approve, promote, retire, or delete |
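The first and third rows of the table can be read as a projection rule: the agent-facing view of an object depends on its release state. A minimal sketch in Python follows, assuming a hypothetical `InboundEmail` record and an arbitrary `MAX_RELEASED_CHARS` bound. Neither name comes from Trinity, whose own firewall lives in Elixir.

```python
from dataclasses import dataclass

# Hypothetical cap standing in for "bounded content" in the table above.
MAX_RELEASED_CHARS = 2000


@dataclass
class InboundEmail:
    source_id: str
    sender: str
    subject: str
    body: str
    released: bool = False  # flipped only by an operator review step


def agent_view(email: InboundEmail) -> dict:
    """Return what an agent may see: metadata always, body only after release."""
    view = {
        "source_id": email.source_id,
        "sender": email.sender,
        "subject": email.subject,
        "state": "released" if email.released else "quarantined",
    }
    if email.released:
        # Even released content is truncated to a bounded size.
        view["body"] = email.body[:MAX_RELEASED_CHARS]
    return view
```

The point of the design is that the quarantined body never enters the returned structure at all, so nothing downstream can include it in model context by accident.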
Redaction posture
Public docs and proof surfaces expose:
- Record IDs, statuses, timestamps, hashes, provider labels, and summaries.
- Source paths and code modules for audit.
- Approval outcomes and policy denial reasons.
They do not expose:
- Credential values.
- Private customer payloads.
- Raw model context.
- Hidden model reasoning.
- Unreleased inbound email or document content.
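One common way to enforce a posture like this is to redact by field name before a record reaches any public surface. The Python sketch below is illustrative only: the key fragments in `SENSITIVE_KEY_PARTS`, the `[REDACTED]` marker, and the `redact` function are assumptions, not the behavior of Trinity's `redaction.ex`.

```python
# Hypothetical list of key fragments that mark a field as secret.
SENSITIVE_KEY_PARTS = ("secret", "token", "api_key", "password")
REDACTED = "[REDACTED]"


def _is_sensitive(key) -> bool:
    """True when the key name contains any sensitive fragment (case-insensitive)."""
    lowered = str(key).lower()
    return any(part in lowered for part in SENSITIVE_KEY_PARTS)


def redact(value):
    """Return a copy of value with sensitive fields replaced, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: REDACTED if _is_sensitive(key) else redact(inner)
            for key, inner in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars pass through unchanged; IDs, statuses, and hashes stay visible.
    return value
```

Name-based redaction only catches secrets stored under recognizable keys. Secrets embedded in free text, such as a token pasted into a summary, need a separate check.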
Approval and action boundaries
| Action | Default | Required before execution |
|---|---|---|
| Gmail draft creation | Allowed only through configured Gmail path | OAuth mailbox, alias readiness, policy check |
| Gmail send/reply | Blocked by default | LIVE_OUTREACH, alias, suppression check, safety review, ApprovalEvent |
| Stripe checkout/revenue | Allowed when configured for public offer | Signed webhook and idempotent processing |
| Agent spend/provisioning | Blocked by default | LIVE_SPEND, cap check, approval, idempotency |
| Tool execution | Blocked if direct or malformed | ToolRouter policy and adapter readiness |
| Model review | Required for risky copy/deliverables | Valid ModelDecision schema and fail-closed parsing |
| Secret changes | Never autonomous | Account owner action outside agent authority |
| Refunds/disputes/legal incidents | Never autonomous | Human/manual escalation |
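The table above describes a fail-closed gate: an action runs only when every listed precondition holds, and some actions never run autonomously. A minimal Python sketch of that shape follows. The action names, requirement labels, and `authorize` function are hypothetical stand-ins loosely based on the table rows, not Trinity's ToolRouter API.

```python
# Hypothetical requirement labels, loosely mirroring two rows of the table.
REQUIREMENTS = {
    "gmail_send": (
        "live_outreach", "alias_ready", "suppression_checked",
        "safety_reviewed", "approval_event",
    ),
    "agent_spend": ("live_spend", "cap_checked", "approved", "idempotency_key"),
}

# Actions the table marks as never autonomous.
NEVER_AUTONOMOUS = {"secret_change", "refund"}


def authorize(action: str, satisfied: set) -> tuple:
    """Return (allowed, reasons). Anything unknown or incomplete is denied."""
    if action in NEVER_AUTONOMOUS:
        return (False, ["human_escalation_required"])
    required = REQUIREMENTS.get(action)
    if required is None:
        # Fail closed: an action with no policy entry is not allowed.
        return (False, ["unknown_action"])
    missing = [name for name in required if name not in satisfied]
    return (not missing, missing)
```

Returning the missing preconditions alongside the decision gives an audit record something concrete to store as the denial reason.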
Credential and readiness boundaries
Trinity separates provider readiness from secret values. UI surfaces can show that NVIDIA, Stripe, Gmail, Hermes, Tigris, or other providers are configured or missing, but secret values are never rendered in LiveView assigns, logs, readiness JSON, docs, ToolCall rows, prompts, model context, or public proof.
| Provider family | Safe public/readiness data | Private data |
|---|---|---|
| NVIDIA | Route name, provider status, model label, decision status | API key, raw prompt context, private payload |
| Stripe | Mode, webhook status, event IDs, checkout IDs | Secret key, webhook signing secret |
| Gmail | Mailbox status, alias, thread/message IDs | OAuth refresh token, raw unreleased inbound body |
| Hermes | Runtime health, profile scope, skill scope | Runtime credentials, private memory content |
| Storage | Object key/hash/status | Unreleased private document body |
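The separation between readiness and secret values can be sketched as a report that records only whether each required setting is present. The Python below is an illustration under assumptions: the setting names in `REQUIRED_KEYS` and the report shape are invented for the example and are not Trinity's actual readiness JSON.

```python
import json

# Hypothetical setting names per provider; only the names appear in output.
REQUIRED_KEYS = {
    "stripe": ("STRIPE_SECRET_KEY", "STRIPE_WEBHOOK_SECRET"),
    "gmail": ("GMAIL_REFRESH_TOKEN",),
}


def readiness_report(config: dict) -> dict:
    """Summarize provider readiness without copying any secret value."""
    report = {}
    for provider, keys in REQUIRED_KEYS.items():
        # Presence is checked, but the value itself is never stored in the report.
        missing = [key for key in keys if not config.get(key)]
        report[provider] = {
            "status": "missing" if missing else "configured",
            "missing_keys": missing,
        }
    return report


def readiness_json(config: dict) -> str:
    """Serialize the report; safe to render because it holds names and statuses only."""
    return json.dumps(readiness_report(config), sort_keys=True)
```

Because the report is built from presence checks rather than by filtering a copy of the config, a newly added secret cannot leak through a forgotten filter rule.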
Primary source links
Official references
| System | Use in Trinity | Official docs |
|---|---|---|
| Gmail | OAuth scope and send/draft boundaries | Gmail API scopes |
| Gmail drafts | Draft creation and send flow | Gmail drafts guide |
| Stripe | Signed webhook verification | Stripe webhooks |
| Jido | Typed firewall action seam | Jido Action |
Source paths
- docs/security/threat-model.md
- docs/security/ai-governance-posture.md
- lib/autonomous_agency/tools/tool_router.ex
- lib/autonomous_agency/approvals.ex
- lib/autonomous_agency/audit.ex
- lib/autonomous_agency/security/redaction.ex
- lib/autonomous_agency/security/content_firewall.ex