Monitor a Kubernetes CronJob

A CronJob can silently fail to run: someone left suspend: true, the schedule was missed past startingDeadlineSeconds, concurrencyPolicy: Forbid skipped the run behind a stuck one, or the scheduler had no node to place the pod on. Nothing inside the container can report a run that never started, but a dead man's switch can.

A check has a ping URL, e.g. https://croncanary-ping.sleeezydesigns.workers.dev/8f14e45f-ceea-467e-bd7a-2e71c4a91e35. Your job calls it on every successful run. If a ping is late past the grace period, CronCanary alerts you. Copy your check's exact URL from its detail page — examples below use $URL for that address.

Why Kubernetes CronJobs fail silently

Every mainstream CronJob monitoring approach (a Prometheus alert on kube_job_status_failed, an in-cluster alert on pod restarts, an APM check) depends on a Job being created and its container actually running. Several common situations prevent that: a CronJob left with suspend: true after a maintenance window and forgotten; a schedule missed past startingDeadlineSeconds because the controller manager was briefly unavailable; concurrencyPolicy: Forbid skipping the new run because the previous one is still stuck; or the cluster having no capacity to schedule the pod at the tick. In the first three cases there is no Job, no Pod, no container logs, and no metric spike. In the last, a Job and a Pending Pod exist, but the container never starts, so nothing fails and nothing logs. Either way the result is silence, exactly like a server that never woke up to run its crontab. The only way to detect "nothing happened when something should have" is to expect a signal from outside the cluster and alert on its absence, which is what a ping-based check does.
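
For a manual spot check from inside the cluster, the fields involved can be read straight off the CronJob object. The sketch below assumes kubectl is already configured for the cluster; cronjob_state is a hypothetical helper name, not part of any tool.

```shell
# cronjob_state NAME: print the suspend flag, last schedule time, and last
# successful time of a CronJob, tab-separated (empty fields mean "never").
# Hypothetical helper; assumes kubectl points at the right cluster/namespace.
cronjob_state() {
  kubectl get cronjob "$1" \
    -o jsonpath='{.spec.suspend}{"\t"}{.status.lastScheduleTime}{"\t"}{.status.lastSuccessfulTime}{"\n"}'
}
```

Calling cronjob_state nightly-backup and seeing true in the first column, or a stale timestamp in the others, points at one of the situations above. It only helps when someone remembers to look, which is why the external check is still needed.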

Wrap the container command

The simplest pattern: ping /start when the run begins, ping the bare URL on success, and ping /fail on any error, straight from the CronJob spec.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 4 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: yourrepo/backup:latest
              command: ["/bin/sh", "-c"]
              args:
                - |
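                  # $URL must be set in the container's environment.
                  # Signal the start; never let a failed ping block the job.
                  curl -fsS -m 10 --retry 3 "$URL/start" || true
                  if /app/backup.sh; then
                    # Success ping; "|| true" keeps a ping outage from failing the Job.
                    curl -fsS -m 10 --retry 3 "$URL" || true
                  else
                    # Failure ping, then exit non-zero so the Job is marked failed.
                    curl -fsS -m 10 --retry 3 "$URL/fail"; exit 1
                  fi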

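Give the check a Cron schedule matching spec.schedule: use the same expression and set the check's timezone to the CronJob's spec.timeZone. The check defaults to UTC; a CronJob without spec.timeZone follows the kube-controller-manager's local time zone, which is usually UTC but not guaranteed. Then set a grace period that covers pod scheduling, image pull, and normal runtime.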
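
As a rough worked example of sizing the grace period, the arithmetic might look like this. All timings are hypothetical placeholders, and the 20% margin is an arbitrary choice; measure your own job before settling on a value.

```shell
# Hypothetical timings in seconds; replace with measurements from your cluster.
sched_wait=120   # worst-case wait for the pod to be scheduled
image_pull=90    # worst-case image pull
runtime=600      # normal runtime of the job

# Arbitrary safety margin: 20% of the normal runtime.
margin=$(( runtime / 5 ))

grace=$(( sched_wait + image_pull + runtime + margin ))
echo "grace period: ${grace}s"   # prints "grace period: 930s"
```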

Exit-code reporting

Prefer one line? Report the shell exit status — 0 counts as success, anything else alerts immediately:

/app/backup.sh; rc=$?; curl -fsS "$URL/$rc"; exit $rc
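
If the one-liner grows, the same idea can be wrapped in a small function. This is a sketch rather than anything CronCanary ships: report_run is a hypothetical helper name, the 10-second timeout and 3 retries are arbitrary, and it assumes URL is set in the container environment.

```shell
# report_run CMD [ARGS...]: run a command, report its exit status to the
# check's ping URL, and return that same status to the caller.
# Hypothetical helper; assumes URL holds the check's ping URL.
report_run() {
  rc=0
  "$@" || rc=$?
  # -m 10 caps each attempt at 10 seconds; --retry 3 retries transient errors.
  # "|| true" keeps a failed ping from changing the job's own result.
  curl -fsS -m 10 --retry 3 "$URL/$rc" >/dev/null || true
  return "$rc"
}
```

In the container script this would be used as report_run /app/backup.sh as the last command, so the pod exits with the backup's own status.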

Keep the URL out of the spec

Put the ping URL in a Secret and reference it, so manifests stay shareable:

kubectl create secret generic croncanary --from-literal=url="$URL"

# then in the container spec:
env:
  - name: URL
    valueFrom:
      secretKeyRef: { name: croncanary, key: url }

What this catches that in-cluster alerting can't

The run that never happened. Prometheus alerts on kube_job_status_failed only fire once a Job object exists and fails. A suspended CronJob, a missed startingDeadlineSeconds, or a Forbid concurrency skip produces no Job at all, and a cluster with no free node at 4am leaves the pod Pending without ever failing. None of these trips a failure metric. CronCanary alerts on the missing ping either way, watching from outside the cluster.

Add a live status badge to your README

Every check has a public SVG badge that shows its live status (updates within ~1 minute). Paste this into any README — it doubles as a heartbeat anyone on the team can see:

[![CronCanary](https://croncanary.fluxath.app/badge/<your-check-id>.svg)](https://croncanary.fluxath.app)

Copy the exact markdown from your check's detail page. Add ?label=your-text to customize the left label.

Ready to wire this up? Create a free check — 20 checks, all alert channels, no card.