The gap after alerting
Alertmanager, Zabbix, Datadog, Grafana, and CI systems are good at detecting trouble. The weak part is often the next action: someone opens a laptop, finds the right SSH key, runs a command from memory, and hopes they are on the right host.
Nerve keeps your existing monitoring stack as the source of truth. It adds a phone-native path for receiving the alert and, where appropriate, approving a narrow runbook action through an agent you control.
Correct model: signal first, action second
Good actions are boring
The safest first actions are small, reversible, and already documented in a runbook:
- restart a single systemd service;
- run a health check or collect diagnostics;
- clear a known safe cache directory;
- trigger a read-only status script;
- start a pre-approved rollback wrapper.
A mobile action should not be an open shell. It should be an approved operation with a name, a scope, and a known blast radius.
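One way to picture "a name, a scope, and a known blast radius" is a fixed registry that the agent consults before doing anything. The sketch below is illustrative only: the `Action` type, the `ACTIONS` table, the `resolve` helper, and the paths under `/usr/local/lib/nerve-actions/` are hypothetical names chosen for this example, not part of Nerve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str           # stable identifier shown to the approver
    argv: tuple         # fixed command line; never assembled from request data
    scope: str          # what the action is allowed to touch
    blast_radius: str   # worst plausible impact, written down in advance
    reversible: bool


# Hypothetical allowlist: every entry maps to one documented runbook step.
ACTIONS = {
    "restart-web": Action(
        name="restart-web",
        argv=("/usr/local/lib/nerve-actions/restart-web",),
        scope="web.service on this host",
        blast_radius="a few seconds of dropped requests",
        reversible=True,
    ),
    "collect-diagnostics": Action(
        name="collect-diagnostics",
        argv=("/usr/local/lib/nerve-actions/collect-diagnostics",),
        scope="read-only, this host",
        blast_radius="none beyond brief extra load",
        reversible=True,
    ),
}


def resolve(requested: str) -> Action:
    """Return the approved action, or refuse if the name is not on the allowlist."""
    action = ACTIONS.get(requested)
    if action is None:
        raise PermissionError(f"action not approved: {requested!r}")
    return action
```

Because the request carries only a name, a string such as `"rm -rf /"` is simply an unknown key and is rejected; nothing from the phone ever reaches a shell.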
Isolated agent host
Run the agent as a dedicated Unix user with least privilege. Put remediation logic in wrapper scripts and grant only those wrappers through sudoers or file permissions.
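[Service]
# Dedicated, unprivileged account for the agent.
User=nerve-agent
Group=nerve-agent
# Blocks privilege gain through setuid binaries, including sudo. If your wrappers
# are granted through sudoers, drop this line; keep it when access is granted
# through file permissions instead.
NoNewPrivileges=true
PrivateTmp=true
ProtectHome=true
# Makes most of the file system read-only for this service and its child
# processes, so any directory an action writes to must be in ReadWritePaths.
ProtectSystem=strict
ReadWritePaths=/var/lib/nerve-agent /run/nerve-actions
# TOKEN is a placeholder for your agent token.
ExecStart=/usr/local/bin/nerve-agent -server api.nerve.ink:443 -token TOKEN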
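A wrapper script in this spirit might look like the following. It is a minimal sketch, not a Nerve component: the script name `restart-web` and the unit `web.service` are assumptions for illustration. How the agent user is authorized to restart the unit (sudoers, polkit, or something else) is left out and depends on your host.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: restart exactly one systemd unit and report the result."""
import subprocess
import sys

UNIT = "web.service"  # assumption: hard-coded here, never supplied by the caller


def main(argv, run=subprocess.run):
    # Refuse any arguments so the caller cannot point the action elsewhere.
    if len(argv) > 1:
        print("usage: restart-web (takes no arguments)", file=sys.stderr)
        return 2
    # Passing a list avoids shell interpretation entirely.
    restart = run(["systemctl", "restart", UNIT], timeout=60)
    if restart.returncode != 0:
        return restart.returncode
    # Exit status 0 from is-active means the unit is running again.
    check = run(["systemctl", "is-active", "--quiet", UNIT], timeout=10)
    return check.returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The wrapper does one thing and returns an exit code, which keeps the grant small: the agent user needs permission for this file only, not for `systemctl` in general.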
Do not auto-fix by default
Automatic remediation is tempting, but it can hide symptoms or make incidents worse. Start with human-approved actions. Once a runbook has months of safe history, you can decide whether a narrow automatic action belongs in your monitoring system itself.
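If you keep a record of approved runs, "months of safe history" can be turned into an explicit check before anyone proposes automation. The sketch below assumes a hypothetical `Run` record and illustrative thresholds (90 days, 10 runs); pick values that match your own risk tolerance.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Run:
    day: date          # when the human-approved action ran
    succeeded: bool    # did the action exit cleanly
    rolled_back: bool  # did someone have to undo it afterwards


def ready_for_automation(runs, today, min_days=90, min_runs=10):
    """Return True only if the history is long, frequent, and free of failures."""
    if len(runs) < min_runs:
        return False
    # Any failure or rollback disqualifies the action outright.
    if any(not r.succeeded or r.rolled_back for r in runs):
        return False
    first = min(r.day for r in runs)
    return (today - first) >= timedelta(days=min_days)
```

A `True` result is a reason to start the conversation, not a switch: the decision to automate still belongs to the people who own the runbook.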