A CronJob can silently never run: someone left suspend: true, the schedule was missed past startingDeadlineSeconds, concurrencyPolicy: Forbid skipped it behind a stuck run, or the scheduler had no node to place the pod on. Nothing inside the container can report a run that never started, but a dead man's switch can.
Most CronJob monitoring setups (a Prometheus alert on kube-state-metrics' kube_job_status_failed, an in-cluster alert on pod restarts, an APM check) depend on a Job object existing in the first place, and usually on a container actually running. Several very common situations never get that far: a CronJob left with suspend: true after a maintenance window and forgotten; the schedule missed past startingDeadlineSeconds because the controller manager itself was briefly unavailable; concurrencyPolicy: Forbid skipping the new run entirely because the previous one is still stuck; or the cluster having no free capacity for the pod at the scheduled tick, with the autoscaler unable to add a node in time. In the first three cases no Job is created at all. In the last, the Job exists but its Pod sits in Pending, never starts a container, and never fails (unless the Job sets activeDeadlineSeconds). Either way there are no container logs, no exit code, no failure metric: just silence, exactly like a server that never woke up to run its crontab. The way to detect "nothing happened when something should have" is to expect a signal on a schedule and alert when it does not arrive, from a watcher outside the cluster. That is what a ping-based check does.
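For reference, here are the CronJob fields behind the first three cases, as a spec fragment rather than a complete manifest (the values are illustrative). Each one can stop a scheduled tick from creating a Job, so an in-cluster failure alert has nothing to see:

```yaml
# CronJob spec fragment with illustrative values, not a complete manifest.
spec:
  schedule: "0 4 * * *"
  suspend: true                  # while true, the controller creates no new Jobs
  startingDeadlineSeconds: 300   # a tick that cannot start within 300 s of its time is skipped as missed
  concurrencyPolicy: Forbid      # a tick is skipped while the previous Job is still running
```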
The simplest pattern: ping the check's URL on success and its /fail URL on any error, straight from the CronJob spec.
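A minimal sketch of that pattern, with placeholders throughout: the image registry.example.com/backup:1.0, the /app/run-backup command, and the ping URL https://ping.example.com/YOUR-CHECK-ID all stand in for your own image, job command, and your check's ping URL. The image is assumed to include /bin/sh and curl.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup                 # hypothetical name
spec:
  schedule: "0 4 * * *"                # use the same expression on the check
  timeZone: "Etc/UTC"                  # and the same time zone on the check
  jobTemplate:
    spec:
      backoffLimit: 0                  # one attempt per tick, so at most one ping per tick
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: registry.example.com/backup:1.0   # placeholder; must contain /bin/sh and curl
              command: ["/bin/sh", "-c"]
              args:
                - |
                  # /app/run-backup is a placeholder for the real job command.
                  if /app/run-backup; then
                    curl -fsS -m 10 --retry 3 "https://ping.example.com/YOUR-CHECK-ID"
                  else
                    curl -fsS -m 10 --retry 3 "https://ping.example.com/YOUR-CHECK-ID/fail"
                    exit 1   # keep the Job marked Failed in Kubernetes as well
                  fi
```

Exiting non-zero after the /fail ping keeps Kubernetes' own record of the Job consistent with what the check reports.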
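Give the check a Cron schedule matching spec.schedule (the same expression) and set the check's timezone to the CronJob's spec.timeZone. The check defaults to UTC; a CronJob without spec.timeZone is interpreted in the kube-controller-manager's local time zone, which is typically UTC but not guaranteed, so it is safer to set it explicitly. Then choose a grace period that covers pod scheduling, image pull, and the job's normal runtime.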
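As a worked example with made-up timings: a CronJob in Europe/Berlin whose pod takes up to 2 minutes to schedule (including an autoscaler scale-up), about 1 minute to pull its image, and around 10 minutes to run needs a grace period of at least 2 + 1 + 10 = 13 minutes. Rounding up to 15 leaves some headroom for a slow night. The CronJob side is shown below, with the matching check settings in comments:

```yaml
# CronJob spec fragment; comments show the matching check settings.
spec:
  schedule: "0 4 * * *"        # check schedule: 0 4 * * *
  timeZone: "Europe/Berlin"    # check timezone: Europe/Berlin (not the UTC default)
  # check grace period: 15 minutes (2 scheduling + 1 image pull + 10 runtime = 13, rounded up)
```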
Prefer one line? Report the shell exit status in the ping instead: 0 counts as success, and anything else alerts immediately:
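A sketch of the one-line form, assuming the check accepts the exit status as a trailing path segment of the ping URL (the same shape as the /fail suffix; confirm the exact form on your check's page). The line saves the job's exit status, reports it, and then exits with it so Kubernetes sees the same result. /app/run-backup and the URL are placeholders.

```yaml
# Container fragment: the whole job and its ping in one shell line.
command: ["/bin/sh", "-c"]
args:
  # ASSUMPTION: the check reads a trailing /<exit-code> segment (0 = success).
  - '/app/run-backup; rc=$?; curl -fsS -m 10 --retry 3 "https://ping.example.com/YOUR-CHECK-ID/$rc"; exit $rc'
```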
Put the ping URL in a Secret and reference it from the container's environment, so manifests can be shared or committed without exposing the URL:
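A sketch of that setup with hypothetical names: a Secret called backup-ping, created in the CronJob's namespace, holds the URL under the key url and is exposed to the container as the environment variable PING_URL. The shell expands $PING_URL at run time, so the URL never appears in the CronJob manifest. The image and /app/run-backup are placeholders as before.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: backup-ping                                # hypothetical name
type: Opaque
stringData:
  url: "https://ping.example.com/YOUR-CHECK-ID"    # placeholder: your check's ping URL
---
# Container fragment for the CronJob (jobTemplate.spec.template.spec.containers[0]),
# not a standalone manifest.
name: backup
image: registry.example.com/backup:1.0             # placeholder; must contain /bin/sh and curl
env:
  - name: PING_URL
    valueFrom:
      secretKeyRef:
        name: backup-ping
        key: url
command: ["/bin/sh", "-c"]
args:
  - |
    if /app/run-backup; then
      curl -fsS -m 10 --retry 3 "$PING_URL"
    else
      curl -fsS -m 10 --retry 3 "$PING_URL/fail"
      exit 1
    fi
```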
The run that never happened. Prometheus alerts on kube_job_status_failed fire only once a Job exists and fails. A suspended CronJob, a missed startingDeadlineSeconds, or a Forbid concurrency skip creates no Job at all, and a pod stuck in Pending because no node was free at 4am never fails, so none of them trips a failure metric. CronCanary alerts on the missing ping either way, watching from outside the cluster.
Every check has a public SVG badge showing its live status (updated within about a minute). Paste the badge markdown into any README, and it doubles as a heartbeat anyone on the team can see:
Copy the exact markdown from your check's detail page. To change the left-hand label, append ?label=your-text to the badge URL.
Ready to wire this up? Create a free check — 20 checks, all alert channels, no card.