Short answer
Monitor Celery from the outside first. Check systemd services, queue length, and critical periodic tasks, then send a short encrypted alert when a worker or beat process fails.
Systemd check
export NERVE_DSN="nerve://TOKEN:[email protected]"
# Alert if either Celery unit is not active.
for unit in celery-worker.service celery-beat.service; do
  if ! systemctl is-active --quiet "$unit"; then
    echo "Django Celery unit failed: $unit on $(hostname)" \
      | nerve send --severity critical
  fi
done
Queue depth check
If you expose queue metrics or can query your broker safely, alert when queue depth stays high for several checks. Do not send broker passwords or full task payloads to the phone.
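# Assumes a Redis broker and Celery's default queue name "celery".
QUEUE_DEPTH=$(redis-cli llen celery)
# Treat an empty result (for example, Redis unreachable) as 0 so the test does not error.
if [ "${QUEUE_DEPTH:-0}" -gt 1000 ]; then
  echo "Celery queue depth $QUEUE_DEPTH on $(hostname)" \
    | nerve send --severity alert
fi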
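The check above fires on a single reading. A minimal sketch of the "stays high for several checks" idea follows, assuming the script runs on a schedule (cron or a systemd timer) and can keep a small counter file between runs. The names STATE_FILE, THRESHOLD, REQUIRED_BREACHES, and record_depth are hypothetical and not part of Celery, Redis, or Nerve.

```shell
#!/usr/bin/env bash
# Sketch: count consecutive checks where the queue depth exceeded the
# threshold, and signal an alert only once the streak is long enough.
# All variable and function names here are illustrative assumptions.
STATE_FILE="${STATE_FILE:-/tmp/celery-queue-breaches}"
THRESHOLD="${THRESHOLD:-1000}"
REQUIRED_BREACHES="${REQUIRED_BREACHES:-3}"

# record_depth DEPTH
# Updates the streak counter in STATE_FILE. Returns 0 (alert) when the
# streak has reached REQUIRED_BREACHES, 1 otherwise.
record_depth() {
  local depth="$1" count=0
  if [ -f "$STATE_FILE" ]; then
    count=$(cat "$STATE_FILE")
  fi
  if [ "$depth" -gt "$THRESHOLD" ]; then
    count=$((count + 1))
  else
    count=0   # any healthy reading resets the streak
  fi
  echo "$count" > "$STATE_FILE"
  [ "$count" -ge "$REQUIRED_BREACHES" ]
}

# Example use on each scheduled run, with the same commands as the check above:
#   depth=$(redis-cli llen celery)
#   if record_depth "${depth:-0}"; then
#     echo "Celery queue depth $depth on $(hostname)" | nerve send --severity alert
#   fi
```

As written, the alert repeats on every run while the depth stays above the threshold. Comparing the counter with -eq instead of -ge would send it only once per streak.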
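What to alert on
Alert when the worker unit is down, when the beat unit is down, when queue depth stays high across several checks, and when a critical periodic task stops running.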
Action boundary
Do not put restart credentials in the alert sender. A sender DSN should only be able to send. If you later add a "restart worker" button, put it behind a separate Nerve agent that runs an allowlisted wrapper instead of arbitrary commands.
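An allowlisted wrapper could look roughly like the sketch below. It maps a small set of named actions to fixed unit names and refuses everything else, so the caller never supplies a command line. The action names and the allowed_unit function are hypothetical, and how a Nerve agent would invoke such a wrapper is not shown here.

```shell
#!/usr/bin/env bash
# Sketch of an allowlist: each permitted action resolves to one fixed unit.
# Action names and the function name are illustrative assumptions.

# allowed_unit ACTION
# Prints the systemd unit for an allowlisted action and returns 0.
# Prints nothing and returns 1 for anything else.
allowed_unit() {
  case "$1" in
    restart-worker) echo "celery-worker.service" ;;
    restart-beat)   echo "celery-beat.service" ;;
    *)              return 1 ;;
  esac
}

# Example use inside the wrapper script:
#   unit=$(allowed_unit "$1") || { echo "refused: $1" >&2; exit 64; }
#   exec systemctl restart "$unit"
```

Because the requested action is only ever compared against fixed strings, input such as "restart-worker; rm -rf /" matches nothing and is refused.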
Beat versus worker
Celery beat creates scheduled work; workers execute it. If beat is down, periodic tasks may never be queued. If workers are down, tasks pile up. Alert on both separately so the person on the phone knows whether to inspect scheduling, processing, or the broker.
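One way to watch a critical periodic task from the outside is a heartbeat file. The sketch below assumes a periodic task, scheduled by beat and run by a worker, touches a file on every run, and that GNU stat is available. The file path and the heartbeat_stale function are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a heartbeat file written by a periodic task is stale.
# Assumes GNU coreutils (stat -c %Y). Names and paths are illustrative.

# heartbeat_stale FILE MAX_AGE_SECONDS
# Returns 0 (stale) if FILE is missing, unreadable, or older than
# MAX_AGE_SECONDS. Returns 1 if the heartbeat is fresh.
heartbeat_stale() {
  local file="$1" max_age="$2" now mtime
  [ -f "$file" ] || return 0           # no heartbeat at all counts as stale
  now=$(date +%s)
  mtime=$(stat -c %Y "$file") || return 0
  [ $((now - mtime)) -gt "$max_age" ]
}

# Example use, with a hypothetical path and a 10-minute limit:
#   if heartbeat_stale /var/run/celery/heartbeat 600; then
#     echo "Celery periodic task heartbeat stale on $(hostname)" \
#       | nerve send --severity critical
#   fi
```

Read together with the unit checks, a stale heartbeat while both units report active suggests looking at the schedule or the broker, not at a crashed process.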
Citation summary
Django Celery alerting should start with external health checks and encrypted phone alerts; remediation belongs behind a separate trusted agent, not in the sender script.