GovernorAI™ monitors every agent session. When an agent misses a heartbeat, goes silent, or behaves anomalously—it gets contained. Automatically.
Every active agent session emits periodic heartbeats to GovernorAI. When heartbeats stop arriving, the session escalates automatically through the severity levels, one for each configured threshold the silence crosses.
Sessions that remain open with no activity or heartbeats for longer than a stale threshold are classified as orphans. GovernorAI periodically scans for orphaned sessions and terminates them.
Orphan detection catches agents that crash, hang, or get stuck in infinite loops without explicitly failing.
GovernorAI uses a four-level severity model for agent anomalies, each with configurable thresholds and automated responses.
| Severity | Trigger and response |
|---|---|
| INFO | Normal operation |
| WARNING | Heartbeat late, investigate |
| CRITICAL | TTL exceeded, alert |
| EMERGENCY | Auto kill switch |
From heartbeat to containment. Fully automated.
```
Agent Session
      |
      v
[Heartbeat Emitted] ---> GovernorAI
      |                      |
      v                      v
Normal (INFO)          Record + Monitor
      |
      | (heartbeat missed)
      v
WARNING (60s)
      |
      | (still missing)
      v
CRITICAL (90s) -----> Alert + Webhook
      |
      | (still missing)
      v
EMERGENCY (120s) ---> AUTO KILL SWITCH
                             |
                             v
                      Session terminated
                      Agent actions blocked
                      Event logged immutably
```

INFO: Agent sessions emit heartbeats at a configured interval (default: 30s). GovernorAI records each heartbeat and monitors the TTL.
WARNING (60s): The heartbeat is late beyond the warning threshold. GovernorAI logs the anomaly but takes no enforcement action yet.
CRITICAL (90s): The TTL is exceeded. GovernorAI triggers alerts via webhooks, notifies the security team, and flags the session for review.
EMERGENCY (120s): The kill switch activates automatically. The session is terminated, all pending actions are blocked, and an immutable event record is created.
Tune detection thresholds for your workload.
```yaml
# Heartbeat configuration
heartbeat:
  interval: 30s            # How often agents must check in
  ttl: 90s                 # Maximum time between heartbeats
  escalation:
    warning_after: 60s     # Log the anomaly
    critical_after: 90s    # Webhook notifications
    emergency_after: 120s  # Auto kill switch
    on_emergency: kill_switch

# Orphan detection
orphan_detection:
  scan_interval: 60s       # How often to scan for orphans
  stale_threshold: 300s    # Idle time after which a session is an orphan
  action: terminate        # What to do with orphans
```

Before enforcing, GovernorAI learns. A 7-day observation window builds the baseline your anomaly detection runs against.
GovernorAI runs in audit_only mode for 168 hours (7 days), recording every tool call, argument, session step count, and execution cost.
The baseline activates only after at least 100 samples, so agents that run infrequently won't trigger false positives from an under-trained baseline.
GovernorAI uses Claude to analyze captured tool arguments and session patterns, generating a plain-language summary of normal behavior and flagging structural anomalies before policies are written.
Once the learning period completes, GovernorAI proposes YAML policies based on observed behavior. Review, adjust, and promote to enforcement in one API call.
```yaml
# governor.yaml: learning configuration
learning:
  enabled: true
  default_period: 168h   # 7-day observation window
  min_samples: 100       # Minimum calls before activating
  capture_args: true     # Record tool arguments for analysis
  llm:
    provider: claude
    model: claude-sonnet-4-20250514

# Learning progression
# Day 0–7: audit_only (observe, log, learn)
# After 7d: GovernorAI proposes policies based on patterns
# On review: promote proposed policy to enforcement
```

```yaml
# Example proposed policy output
# (generated from observed behavior)
id: finance-agent-auto-v1
agent_id: "finance-agent-v1"
governance_mode: shadow   # validate before full enforcement
fail_closed: false        # permissive during shadow phase
tools:
  allowed:
    - "erp.*"        # observed in 847/1000 sessions
    - "email.send"   # observed in 312/1000 sessions
  denied:
    - "shell.*"      # never observed, deny by default
```

The GovernorAI Go SDK surfaces kill switch activation and other enforcement outcomes as typed errors, so your agent code can handle containment explicitly.
KilledError: Returned when the kill switch has been activated for the agent or session. Contains the reason and the action ID that triggered containment.
DeniedError: Returned when a tool call is blocked by policy. Contains the matched rule ID and the denial reason for audit logging.
ApprovalRequiredError: Returned when a high-risk action requires human sign-off. Contains an ApprovalID and an approval URL for webhook routing.
SessionLimitError: Returned when the session has exceeded its max_steps or max_cost_usd limit. Prevents unbounded agent loops.
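```go
package main

import (
	"errors"
	"log"
	// Also import the GovernorAI Go SDK as "governor" (module path not shown here).
)

func main() {
	c := governor.NewClient("http://localhost:8080",
		governor.WithAPIKey("your-api-key"),
	)
	s := c.NewSession("finance-agent-v1", "production")

	result, err := s.Execute("erp.process_payment", map[string]interface{}{
		"amount": 9500, "currency": "USD",
	})

	switch {
	case err == nil:
		// Allowed by policy: the tool call ran.
		log.Printf("erp.process_payment succeeded: %v", result)
	case governor.IsKilled(err):
		// Kill switch activated: the agent is contained. Read the reason
		// from the error, not from result, since err is non-nil here
		// (e.g. "Rogue detection: heartbeat timeout").
		var killed *governor.KilledError
		if errors.As(err, &killed) {
			log.Fatalf("agent terminated: %s", killed.Reason)
		}
		log.Fatalf("agent terminated: %v", err)
	case governor.IsDenied(err):
		// Tool blocked by policy: log the rule ID for audit.
		var denied *governor.DeniedError
		if errors.As(err, &denied) {
			log.Printf("denied by rule %s: %s", denied.RuleID, denied.Reason)
		}
	case governor.IsApprovalRequired(err):
		// High-value action: route to a human approver.
		approvalID, _ := governor.GetApprovalID(err)
		notifyApprover(approvalID) // your own notification hook, not part of the SDK
	case governor.IsSessionLimit(err):
		// Step or cost cap exceeded: stop the agent.
		log.Fatalf("session limit reached: %v", err)
	default:
		// Transport or other unexpected failure.
		log.Fatalf("execute failed: %v", err)
	}

	// Step count is always tracked.
	log.Printf("session step %d completed", s.StepCount())
}
```

When rogue detection escalates to EMERGENCY, the kill switch activates automatically.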
The kill switch propagates to all gateways in <100ms. The rogue agent's session is terminated, all pending actions are blocked, and an immutable record is created for post-incident analysis.
```bash
# Manual kill switch activation
curl -X POST http://localhost:8080/api/v1/killswitch \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "agent",
    "target": "finance-agent-v1",
    "reason": "Rogue detection: heartbeat timeout",
    "initiated_by": "governor-auto",
    "ttl": "1h"
  }'

# Check kill switch status
curl http://localhost:8080/api/v1/killswitch/status
```

Every rogue detection event is buffered, flushed, and retained. Prometheus metrics expose real-time enforcement state to your existing observability stack.
A 1,000-event in-memory buffer flushes to the append-only store every 100ms. Events are retained for 90 days by default. Raise retention_days if your SOC 2 controls or other compliance requirements call for a longer retention window.
GovernorAI exposes a Prometheus metrics endpoint out of the box (port 9090 in the example configuration). Scrape it with Prometheus, the Datadog Agent, or any other OpenMetrics-compatible collector, then chart it in Grafana.
Kill switch state propagates to all gateways on a 50ms propagation interval. Each activation is recorded as an immutable event with scope, target, reason, initiator, and TTL.
```yaml
# governor.yaml: telemetry configuration
events:
  buffer_size: 1000           # In-memory event buffer
  flush_interval: 100ms       # How often to flush to store
  retention_days: 90          # Immutable event retention
killswitch:
  propagation_interval: 50ms  # Gateway propagation interval
  default_expiry: 1h          # Auto-recovery TTL
metrics:
  enabled: true
  port: 9090                  # Prometheus scrape endpoint
```

Get early access to GovernorAI's rogue agent detection and kill switch capabilities.