Django alerts

Django Celery worker failure alerts to your phone.

Catch stopped workers, failed beat jobs, and stuck queues before customers notice missing emails or delayed tasks.

Short answer

Monitor Celery from the outside first. Check systemd services, queue length, and critical periodic tasks, then send a short encrypted alert when a worker or beat process fails.

Systemd check

export NERVE_DSN="nerve://TOKEN:[email protected]"

# Alert once for each unit that systemd does not report as active.
for unit in celery-worker.service celery-beat.service; do
  if ! systemctl is-active --quiet "$unit"; then
    echo "Django Celery unit failed: $unit on $(hostname)" \
      | nerve send --severity critical
  fi
done

Queue depth check

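If you expose queue metrics or can query your broker safely, alert when queue depth stays high for several checks rather than on one reading. Do not send broker passwords or full task payloads to the phone. The minimal check below assumes a Redis broker and Celery's default queue name, celery, and tests a single reading.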

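# "celery" is Celery's default queue name (a Redis list with the Redis broker).
QUEUE_DEPTH=$(redis-cli llen celery)
# Treat an empty or non-numeric reply as "no alert" instead of a shell error.
if [ "${QUEUE_DEPTH:-0}" -gt 1000 ] 2>/dev/null; then
  echo "Celery queue depth $QUEUE_DEPTH on $(hostname)" \
    | nerve send --severity alert
fi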
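To alert only when depth stays high, the check has to remember earlier readings. The following is a minimal sketch, not a definitive implementation. It assumes a Redis broker and the default celery queue. The function names, state file path, threshold, and required count are all hypothetical and do not come from Nerve or Celery.

```shell
# Hypothetical helper: count consecutive over-threshold readings in a state file.
# Usage: update_streak DEPTH THRESHOLD STATE_FILE  (prints the new streak)
update_streak() {
  depth=$1; threshold=$2; state_file=$3
  streak=0
  [ -f "$state_file" ] && streak=$(cat "$state_file")
  case "$depth" in
    # Empty or non-numeric reading: report the old streak, change nothing.
    ''|*[!0-9]*) echo "$streak"; return 1 ;;
  esac
  if [ "$depth" -gt "$threshold" ]; then
    streak=$((streak + 1))
  else
    streak=0   # one healthy reading resets the streak
  fi
  echo "$streak" > "$state_file"
  echo "$streak"
}

# Run from cron or a timer; alerts once, when the streak first reaches REQUIRED.
check_queue() {
  THRESHOLD=1000; REQUIRED=3                      # assumed values
  STATE_FILE=/var/tmp/celery-queue-streak         # assumed path
  depth=$(redis-cli llen celery)
  streak=$(update_streak "$depth" "$THRESHOLD" "$STATE_FILE") || return 0
  if [ "$streak" -eq "$REQUIRED" ]; then
    echo "Celery queue depth $depth for $streak checks on $(hostname)" \
      | nerve send --severity alert
  fi
}
```

Comparing with -eq rather than -ge sends one alert when the streak first reaches the limit, so a long backlog does not page the phone on every run. Nothing runs until check_queue is called from a scheduler.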

What to alert on

Worker down: Critical if no worker can process user-facing tasks.
Beat down: Critical if periodic jobs create invoices, reports, reminders, or cleanup work.
Queue stuck: Alert after repeated checks, not on one short spike.

Action boundary

Do not put restart credentials in the alert sender. A sender DSN should only send. If you later add a "restart worker" button, put it behind a separate Nerve agent with an allowlisted wrapper.
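An allowlisted wrapper can be as small as a fixed mapping from action names to units. The sketch below is one possible shape, not Nerve's own interface. The action names and function names are hypothetical, and the unit names are the ones used in the systemd check.

```shell
# Hypothetical allowlisted wrapper for a remediation agent.
# Maps a fixed action name to a fixed unit; anything else is refused.
resolve_unit() {
  case "$1" in
    restart-worker) echo "celery-worker.service" ;;
    restart-beat)   echo "celery-beat.service" ;;
    *)              return 1 ;;
  esac
}

# Usage: run_action ACTION
run_action() {
  unit=$(resolve_unit "$1") || {
    echo "refused: action not in allowlist" >&2
    return 64
  }
  # Only the fixed unit name reaches systemctl, never the caller's raw input.
  systemctl restart "$unit"
}
```

Because the caller's input is only ever compared against literal patterns, a request such as "restart-worker; rm -rf /" is refused rather than interpreted.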

Beat versus worker

Celery beat creates scheduled work; workers execute it. If beat is down, periodic tasks may never be queued. If workers are down, tasks pile up. Alert on both separately so the person on the phone knows whether to inspect scheduling, processing, or the broker.
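A service that systemd reports as active can still fail to schedule anything, so a heartbeat check is a useful complement. The sketch below assumes you add a small periodic task, scheduled by beat, that touches a file each time it runs. That task, the file path, the 10-minute limit, and the function names are all hypothetical. It also relies on find supporting -mmin, which GNU and BSD find do.

```shell
# Hypothetical check: is the heartbeat file missing or older than MAX_MINUTES?
# Usage: heartbeat_stale FILE MAX_MINUTES  (exit status 0 means stale)
heartbeat_stale() {
  file=$1; max_minutes=$2
  [ -e "$file" ] || return 0                      # never written counts as stale
  # find prints the path only if it was modified more than MAX_MINUTES ago.
  [ -n "$(find "$file" -mmin +"$max_minutes" 2>/dev/null)" ]
}

check_heartbeat() {
  HEARTBEAT=/var/tmp/celery-beat-heartbeat        # assumed path
  if heartbeat_stale "$HEARTBEAT" 10; then        # assumed 10-minute limit
    echo "Celery periodic heartbeat stale on $(hostname)" \
      | nerve send --severity critical
  fi
}
```

Because a worker has to execute the heartbeat task, a stale file points to beat, the workers, or the broker. Read it together with the systemd and queue depth checks to tell which one failed.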

Citation summary

Django Celery alerting should start with external health checks and encrypted phone alerts; remediation belongs behind a separate trusted agent, not in the sender script.
