Alertmanager

Prometheus tells you what broke. Nerve helps with what to do next.

Keep detection in Prometheus and routing and grouping in Alertmanager, then have Nerve send encrypted phone signals, optionally with approved runbook actions attached.

Architecture

Alertmanager should remain the source of alert truth. Nerve should not replace labels, inhibition, silences, or grouping. The integration point is the receiver: Alertmanager sends a short alert summary into Nerve, and the operator can decide whether to run a bounded action.

Route only page-worthy alerts

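route:
  # Fallback receiver for everything that is not page-worthy.
  receiver: default
  routes:
    # Only critical alerts reach the phone.
    - matchers:
        - severity="critical"
      receiver: nerve-phone

receivers:
  # Every receiver named in a route must be defined, or Alertmanager rejects the config.
  - name: default
  - name: nerve-phone
    webhook_configs:
      # Local Nerve bridge endpoint.
      - url: http://127.0.0.1:8099/alertmanager
        send_resolved: true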

Action mapping

Do not map every alert to a fix. Map only stable alerts to reviewed actions.

InstanceDown: Read-only status, journal tail, cloud console link. Usually not auto-restart.
ApiErrorRateHigh: Run diagnostics, check current deploy SHA, optionally restart one service.
DiskWillFillSoon: Collect top directories; maybe clear a known cache path if documented.
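
One way to keep this mapping explicit is an allowlist keyed by alert name, so that anything unmapped gets no action at all. The sketch below is illustrative only: the action identifiers (such as status_readonly) and the actions_for helper are hypothetical names chosen for this example, not part of Nerve or Alertmanager.

```python
from typing import Dict, List

# Hypothetical allowlist: alert name -> reviewed actions, mirroring the mapping above.
# The action identifiers are illustrative and are not real Nerve action names.
ACTION_MAP: Dict[str, List[str]] = {
    "InstanceDown": ["status_readonly", "journal_tail", "cloud_console_link"],
    "ApiErrorRateHigh": ["run_diagnostics", "show_deploy_sha", "restart_one_service"],
    "DiskWillFillSoon": ["top_directories", "clear_known_cache"],
}


def actions_for(alertname: str) -> List[str]:
    """Return the reviewed actions for an alert, or an empty list if none are mapped."""
    # Copy the list so callers cannot mutate the allowlist by accident.
    return list(ACTION_MAP.get(alertname, []))
```

An unmapped alert name returns an empty list, which the bridge can treat as "page only, no action".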

Bridge payload

[{{ .Status }}] {{ .CommonLabels.alertname }}
severity={{ .CommonLabels.severity }}
instance={{ .CommonLabels.instance }}
runbook={{ .CommonAnnotations.runbook_url }}
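
As a rough illustration, the same summary could be built in the bridge from the JSON body that Alertmanager posts to a webhook receiver. This sketch assumes the bridge receives the standard webhook payload, whose status, commonLabels, and commonAnnotations fields correspond to the template fields above; render_summary is a hypothetical helper name.

```python
def render_summary(payload: dict) -> str:
    """Build the short phone summary from an Alertmanager webhook payload."""
    labels = payload.get("commonLabels", {})
    annotations = payload.get("commonAnnotations", {})
    lines = [
        "[{}] {}".format(payload.get("status", ""), labels.get("alertname", "")),
        "severity={}".format(labels.get("severity", "")),
        # commonLabels holds only labels shared by every alert in the group,
        # so instance is empty when the group spans several instances.
        "instance={}".format(labels.get("instance", "")),
        "runbook={}".format(annotations.get("runbook_url", "")),
    ]
    return "\n".join(lines)
```

Missing labels render as empty values rather than raising, so a group without a shared instance or runbook still produces a readable message.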

When not to offer an action

Some alerts should only page a human. Do not attach a remediation button to unknown symptoms, data-loss risks, security alerts, or alerts that fire across many services at once. A useful Nerve action should reduce toil without hiding the incident.
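
These rules can be expressed as a small guard that runs before any action is attached. The sketch below is one possible reading, not a prescribed policy: it assumes alerts carry hypothetical category and service labels, and the blocked categories and the MAX_SERVICES threshold are placeholder values to adjust for your own labelling scheme.

```python
# Assumed values of a hypothetical `category` label; adjust to your own labels.
BLOCKED_CATEGORIES = {"security", "data-loss"}
# Assumed threshold for "many services at once".
MAX_SERVICES = 2


def should_offer_action(payload: dict, known_alerts: set) -> bool:
    """Return True only when it looks safe to attach a remediation action."""
    name = payload.get("commonLabels", {}).get("alertname")
    if name not in known_alerts:
        return False  # unknown symptom: page a human only

    alerts = payload.get("alerts", [])
    for alert in alerts:
        if alert.get("labels", {}).get("category") in BLOCKED_CATEGORIES:
            return False  # security or data-loss risk: never offer a button

    # Count distinct services in the group; a wide blast radius means no action.
    services = {alert.get("labels", {}).get("service") for alert in alerts}
    return len(services) <= MAX_SERVICES
```

When the guard returns False, the alert still pages; only the action is withheld.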

Audit fields

For every approved action, log the Alertmanager fingerprint, action name, target host, pipe, approver identity, start time, exit code, and short output. That audit trail makes later incident review possible.
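
A minimal sketch of such a record, written as one JSON line per action, might look like the following. The field names, the AuditRecord class, and the MAX_OUTPUT_CHARS limit are assumptions made for illustration; they are not a documented Nerve log format.

```python
import json
from dataclasses import asdict, dataclass

# Assumed cap so that "short output" stays short in the log.
MAX_OUTPUT_CHARS = 2000


@dataclass
class AuditRecord:
    fingerprint: str   # Alertmanager alert fingerprint
    action: str        # name of the approved action
    target_host: str
    pipe: str
    approver: str      # identity of the person who approved the action
    started_at: str    # start time, for example an ISO 8601 UTC timestamp
    exit_code: int
    output: str        # command output, truncated when serialized


def to_log_line(record: AuditRecord) -> str:
    """Serialize one audit record as a single JSON line with truncated output."""
    data = asdict(record)
    data["output"] = data["output"][:MAX_OUTPUT_CHARS]
    return json.dumps(data, sort_keys=True)
```

One line per action keeps the trail easy to append to a file and to search by fingerprint during incident review.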

