Celery beat is a single, separate scheduler process. If it crashes or its schedule file is corrupted, every periodic task silently stops firing, while the workers stay healthy and report nothing wrong.
Celery's worker pool and its scheduler are separate processes with separate failure modes. Workers can be healthy (accepting tasks, reporting success, passing every liveness probe) while celery beat itself is down, was never restarted after a deploy, or is stuck because its schedule file (by default a shelve database named celerybeat-schedule) was corrupted or lost its lock after an ungraceful kill. Because beat's only job is to enqueue periodic tasks on schedule, its death produces no task failure to alert on: there is simply no task at all.
Running more than one beat instance creates the opposite, quieter problem. This is common after a bad Kubernetes rollout that doesn't set replicas: 1, or when a supervisor config restarts a duplicate. Tasks then fire twice, which corrupts anything that assumes single delivery, again with no worker-side error to flag it.
Per-task success/failure signals only tell you what happened to a task that was actually enqueued; they can't tell you that beat stopped enqueueing anything.
For each real periodic task, ping at the end of the task body (or wrap it, matching the Python guide's decorator):
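A minimal sketch of the wrapping approach, assuming that a plain HTTP GET to the check's ping URL counts as a ping. The decorator name ping_on_success and the URL https://monitor.example/ping/<check-uuid> are placeholders for illustration, not names taken from the Python guide; substitute your check's real ping URL.

```python
import functools
import urllib.request


def ping_on_success(ping_url, timeout=10):
    """Ping ping_url after the wrapped function returns without raising.

    ping_url is a placeholder for your check's real ping URL.
    """
    def decorator(func):
        # functools.wraps keeps __name__ and __module__, so the task name
        # Celery generates for the function is unchanged.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # an exception here skips the ping
            try:
                with urllib.request.urlopen(ping_url, timeout=timeout):
                    pass
            except OSError:
                # A failed ping should not fail the task itself.
                pass
            return result
        return wrapper
    return decorator


# Usage with Celery, where app is your existing Celery instance. Keep
# @app.task outermost so the ping runs inside the task body:
#
# @app.task
# @ping_on_success("https://monitor.example/ping/<check-uuid>")
# def nightly_report():
#     ...
```

Because the ping happens only after the function returns, a task that raises sends nothing and its check goes late, which is the signal you want.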
If you'd rather not touch every task body, hook task_success and task_failure globally and filter by task name — this also works for tasks you don't own the source of:
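One way to sketch the global hook, again assuming a GET to the ping URL. PING_URLS, the task names in it, and the /fail suffix used to report a failure are all assumptions for illustration; check what your check's ping endpoint actually accepts. task_success and task_failure are real Celery signals, and both pass the task object as sender.

```python
import urllib.request

# Hypothetical mapping: Celery task name -> ping URL of its check.
PING_URLS = {
    "reports.tasks.nightly_report": "https://monitor.example/ping/<uuid-1>",
    "billing.tasks.sync_invoices": "https://monitor.example/ping/<uuid-2>",
}


def _ping(url, timeout=10):
    """Send one GET; never let a monitoring problem break the worker."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    except OSError:
        pass


def on_task_success(sender=None, **kwargs):
    # sender is the task that just succeeded; ignore tasks without a check.
    url = PING_URLS.get(getattr(sender, "name", None))
    if url:
        _ping(url)


def on_task_failure(sender=None, **kwargs):
    # Assumption: the monitor accepts an explicit failure ping at "/fail".
    # If yours does not, drop this handler and let the check go late.
    url = PING_URLS.get(getattr(sender, "name", None))
    if url:
        _ping(url + "/fail")


def connect_handlers():
    """Attach both handlers to Celery's signals."""
    from celery.signals import task_failure, task_success

    task_success.connect(on_task_success)
    task_failure.connect(on_task_failure)
```

Call connect_handlers() from a module the worker imports at startup, since both signals are dispatched in the worker process rather than in beat.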
Match each check's schedule to the corresponding entry in your beat_schedule: a Cron schedule, in beat's configured timezone, for each crontab(...) entry, and a Simple schedule with the same period for each timedelta(...) entry.
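To make the mapping concrete, here is a small sketch that turns crontab-style keywords into the five-field cron expression you would enter on a check, and a timedelta into a period in seconds. The helpers cron_expression and period_seconds are hypothetical; they only mirror the keyword names and "*" defaults of Celery's crontab and pass values through unchanged, so they do not validate anything.

```python
from datetime import timedelta


def cron_expression(minute="*", hour="*", day_of_month="*",
                    month_of_year="*", day_of_week="*"):
    """Arrange crontab-style keywords in standard cron field order:
    minute, hour, day of month, month, day of week."""
    return f"{minute} {hour} {day_of_month} {month_of_year} {day_of_week}"


def period_seconds(interval):
    """Reduce a timedelta schedule to a whole number of seconds."""
    return int(interval.total_seconds())


# crontab(minute=0, hour=3) in beat_schedule
print(cron_expression(minute=0, hour=3))        # 0 3 * * *

# crontab(minute="*/15") in beat_schedule
print(cron_expression(minute="*/15"))           # */15 * * * *

# timedelta(minutes=10) in beat_schedule
print(period_seconds(timedelta(minutes=10)))    # 600
```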
Everything above only proves an individual task ran; none of it proves beat is alive, because a dead beat process enqueues nothing, including the tasks pinging above. Add a trivial periodic task, scheduled every couple of minutes, whose only job is to ping:
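A sketch of such a heartbeat, assuming the same GET-to-ping convention. The task name monitoring.beat_heartbeat, the schedule key beat-heartbeat, the helper install_heartbeat, and the ping URL are all placeholders. The helper takes your existing Celery app, registers the task on it, and adds a 120-second entry to beat_schedule without touching the entries already there.

```python
import urllib.request

# Placeholders: use your own task name and your beat check's real ping URL.
HEARTBEAT_TASK_NAME = "monitoring.beat_heartbeat"
HEARTBEAT_PING_URL = "https://monitor.example/ping/<beat-check-uuid>"


def beat_heartbeat():
    """Do no real work. Reaching this ping means beat enqueued the task
    and a worker picked it up."""
    try:
        with urllib.request.urlopen(HEARTBEAT_PING_URL, timeout=10):
            pass
    except OSError:
        # Swallow network errors; a missing ping makes the check go late.
        pass


def install_heartbeat(app):
    """Register the heartbeat task on app and schedule it every 120 seconds."""
    app.task(name=HEARTBEAT_TASK_NAME)(beat_heartbeat)
    app.conf.beat_schedule = {
        **(app.conf.beat_schedule or {}),
        "beat-heartbeat": {
            "task": HEARTBEAT_TASK_NAME,
            "schedule": 120.0,  # seconds
        },
    }


# Usage, next to where your Celery app is created:
#
# app = Celery("proj")
# install_heartbeat(app)
```

Note that this check also goes late when no worker is consuming the queue, so a missed heartbeat means "beat or the workers", which is still worth a page.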
Give that check a Simple schedule of 120 seconds and a short grace (2–3 minutes). If beat dies, gets stuck, or a deploy forgets to restart it, this check goes late within minutes, long before the check on a nightly task would. Nothing else in the stack detects "the scheduler stopped scheduling."
Every check has a public SVG badge that shows its live status (updates within ~1 minute). Paste this into any README — it doubles as a heartbeat anyone on the team can see:
Copy the exact markdown from your check's detail page. Add ?label=your-text to customize the left label.
Ready to wire this up? Create a free check — 20 checks, all alert channels, no card.