Overall Analogy: Part-Time Work and Scheduled Cleaning#

Jobs and CronJobs are easy to understand when compared to part-time work and scheduled cleaning:

| Analogy | Kubernetes | Role |
| --- | --- | --- |
| One-time part-time work | Job | A task that runs once and finishes |
| Scheduled cleaning routine | CronJob | A task that runs periodically on a schedule |
| Hire 3 workers | completions: 3 | Number of successful completions required |
| Deploy 2 workers at once | parallelism: 2 | Number of Pods to run simultaneously |
| Maximum retry attempts | backoffLimit | Retry limit on failure |
| Work deadline | activeDeadlineSeconds | Overall time limit for the task |

In this way, a Job is “a task that finishes once the assigned work is done,” and a CronJob is “a scheduled task that runs repeatedly at set times.”


Target Audience: Developers who want to run batch or scheduled jobs on Kubernetes
Prerequisites: Pod, Deployment concepts
After reading this: You will understand how Jobs and CronJobs work and how to configure them

TL;DR
  • A Job runs Pods until the specified number of them have completed successfully, then finishes
  • A CronJob creates Jobs periodically based on a cron expression
  • Use backoffLimit and activeDeadlineSeconds to control failure handling

What Is a Job?#

A Job is a resource that creates one or more Pods and terminates once the specified task is complete. Unlike Deployments, the goal is for Pods to terminate successfully.

| Property | Deployment | Job |
| --- | --- | --- |
| Purpose | Keep Pods running | Terminate after task completion |
| When Pod terminates | Recreate new Pod | Mark as complete if successful |
| Use case | Web servers, API servers | Data migration, batch processing |

Job Execution Flow#

flowchart TB
    JOB["Job Created"]
    JOB --> P1["Create Pod"]
    P1 --> Q1{"Success?"}
    Q1 -->|Yes| CHK{"completions<br>reached?"}
    CHK -->|Yes| DONE["Job Complete"]
    CHK -->|No| P2["Create Next Pod"]
    P2 --> Q2{"Success?"}
    Q2 -->|Yes| CHK
    Q1 -->|No| RETRY{"backoffLimit<br>exceeded?"}
    Q2 -->|No| RETRY
    RETRY -->|No| P3["Create Retry Pod"]
    P3 --> Q1
    RETRY -->|Yes| FAIL["Job Failed"]

Job YAML#

Basic Job#

apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1        # Number of successful completions
  parallelism: 1        # Number of concurrent Pods
  backoffLimit: 3       # Maximum retry count
  activeDeadlineSeconds: 600  # Overall time limit (seconds)
  template:
    spec:
      containers:
      - name: migration
        image: my-app:1.0
        command: ["python", "migrate.py"]
        env:
        - name: DB_HOST
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: host
      restartPolicy: Never    # Must be Never or OnFailure for Jobs

Key fields explained.

| Field | Default | Description |
| --- | --- | --- |
| completions | 1 | Number of Pods that must complete successfully |
| parallelism | 1 | Number of Pods to run concurrently |
| backoffLimit | 6 | Maximum retry count on failure |
| activeDeadlineSeconds | None | Overall execution time limit for the Job (seconds) |
| restartPolicy | - | Never or OnFailure |

restartPolicy Differences
  • Never: Leaves the failed Pod and creates a new one (useful for debugging)
  • OnFailure: Restarts the container within the same Pod (saves resources)
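
To make the difference concrete, here is a minimal sketch of a Job that uses OnFailure instead of Never. The Job name report-export and the script export.py are hypothetical and exist only for illustration; the image is reused from the basic example above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-export          # hypothetical name, for illustration only
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: export
        image: my-app:1.0
        command: ["python", "export.py"]   # hypothetical script
      restartPolicy: OnFailure   # the failed container is restarted inside the same Pod
```

Because the container is restarted in place, the output of the previous attempt is typically read with kubectl logs <pod-name> --previous, whereas with Never each failed attempt remains as its own Pod.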

Parallel Processing Job#

Run multiple Pods simultaneously for faster task completion.

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processing
spec:
  completions: 10       # 10 total successes needed
  parallelism: 3        # Run 3 Pods concurrently
  backoffLimit: 5
  template:
    spec:
      containers:
      - name: worker
        image: batch-worker:1.0
        command: ["./process.sh"]
      restartPolicy: Never

| completions | parallelism | Behavior |
| --- | --- | --- |
| 1 | 1 | Single Pod execution (default) |
| N | 1 | N Pods run sequentially |
| N | M | Up to M Pods run concurrently, N total successes |
| None | M | Work Queue mode (Pods decide termination themselves) |
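
The Work Queue mode listed last can be sketched as follows. completions is left unset and only parallelism is given. The Job name queue-consumer, the image queue-worker:1.0, and the script consume.sh are hypothetical, and the worker is assumed to exit with status 0 once the queue is empty.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-consumer         # hypothetical name
spec:
  # completions is intentionally omitted: this selects work queue mode
  parallelism: 3               # 3 workers pull items from a shared queue
  backoffLimit: 5
  template:
    spec:
      containers:
      - name: worker
        image: queue-worker:1.0        # hypothetical image
        command: ["./consume.sh"]      # assumed to exit 0 when the queue is drained
      restartPolicy: Never
```

In this mode the Job counts as complete once at least one Pod has exited successfully and all Pods have terminated, so each worker needs its own way of detecting that no work remains.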

What Is a CronJob?#

A CronJob is a resource that creates Jobs periodically according to a cron schedule.

flowchart LR
    CJ["CronJob<br>Daily 02:00"]
    CJ -->|Day 1| J1["Job 1"]
    CJ -->|Day 2| J2["Job 2"]
    CJ -->|Day 3| J3["Job 3"]
    J1 --> P1["Pod"]
    J2 --> P2["Pod"]
    J3 --> P3["Pod"]
    P1 -->|Complete| D1["Done"]
    P2 -->|Complete| D2["Done"]
    P3 -->|Complete| D3["Done"]

CronJob YAML#

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"           # Daily at 2 AM
  concurrencyPolicy: Forbid        # Skip if previous Job is still running
  successfulJobsHistoryLimit: 3    # Number of successful Jobs to retain
  failedJobsHistoryLimit: 3        # Number of failed Jobs to retain
  startingDeadlineSeconds: 200     # A missed run may still start up to 200s after its scheduled time
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:1.0
            command: ["./backup.sh"]
          restartPolicy: OnFailure

Cron Expression#

+------------------- minute (0 - 59)
| +----------------- hour (0 - 23)
| | +--------------- day of month (1 - 31)
| | | +------------- month (1 - 12)
| | | | +----------- day of week (0 - 6, Sunday=0)
| | | | |
* * * * *

| Expression | Meaning |
| --- | --- |
| 0 2 * * * | Daily at 2 AM |
| */15 * * * * | Every 15 minutes |
| 0 9 * * 1-5 | Weekdays at 9 AM |
| 0 0 1 * * | First day of every month at midnight |
| 0 */6 * * * | Every 6 hours |
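
A cron expression by itself carries no time zone. As a sketch, the weekday 9 AM schedule can be pinned to a zone with the timeZone field, which is available in recent Kubernetes versions (stable since v1.27). Without it, the schedule is interpreted in the local time zone of the kube-controller-manager. The CronJob name, image, and script below are hypothetical.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: weekday-report           # hypothetical name
spec:
  schedule: "0 9 * * 1-5"        # Weekdays at 9 AM
  timeZone: "Etc/UTC"            # IANA time zone name; assumes a cluster that supports this field
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: report-tool:1.0       # hypothetical image
            command: ["./report.sh"]     # hypothetical script
          restartPolicy: OnFailure
```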

concurrencyPolicy#

| Value | Description |
| --- | --- |
| Allow (default) | Create new Job even if previous is still running |
| Forbid | Skip new Job if previous is still running |
| Replace | Cancel previous Job and create new one |

Warning
With the Allow policy, if Job execution time exceeds the schedule interval, Jobs can accumulate. Use Forbid or Replace in production.
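
As an illustration of this warning, the sketch below runs every 15 minutes with Forbid and also caps each run below the interval, so a stuck run cannot block the schedule indefinitely. The name, image, script, and the 840-second cap are assumptions chosen for the example.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-refresh              # hypothetical name
spec:
  schedule: "*/15 * * * *"         # Every 15 minutes
  concurrencyPolicy: Forbid        # Skip a run while the previous Job is still active
  jobTemplate:
    spec:
      activeDeadlineSeconds: 840   # 14 minutes, just under the 15-minute interval
      backoffLimit: 1
      template:
        spec:
          containers:
          - name: refresh
            image: cache-tool:1.0          # hypothetical image
            command: ["./refresh.sh"]      # hypothetical script
          restartPolicy: Never
```

Replace would be the alternative when starting the newest run matters more than letting the old one finish.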

Retry Policy#

Settings that control retry behavior when a Job fails.

| Setting | Description |
| --- | --- |
| backoffLimit | Maximum retry count on failure (default 6) |
| activeDeadlineSeconds | Overall execution time limit for the Job |

Retry intervals increase with exponential backoff: 10s, 20s, 40s, … up to 6 minutes.

spec:
  backoffLimit: 3              # Job is marked failed once more than 3 retries would be needed
  activeDeadlineSeconds: 600   # Job forcefully terminated after 10 minutes

Relationship Between backoffLimit and activeDeadlineSeconds
The Job is marked as failed if either condition is met. activeDeadlineSeconds limits the total time regardless of retry count.
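
The interaction can be checked with simple arithmetic. Taking the delay sequence quoted above at face value (an approximation, since real timing also includes Pod scheduling and run time), the comments in this fragment add up the back-off waits for the default backoffLimit.

```yaml
spec:
  backoffLimit: 6              # delays before retries 1-6: 10s, 20s, 40s, 80s, 160s, 320s
  activeDeadlineSeconds: 600   # 10+20+40+80+160+320 = 630s of back-off alone, so the
                               # deadline would be hit before all 6 retries are used
```

With backoffLimit: 3, the waits total only 10+20+40 = 70s, so the retry limit is likely to be reached first unless each attempt is long-running.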

Hands-on: Deploying Jobs and CronJobs#

Run and Verify a Job#

# Create Job
kubectl apply -f job.yaml

# Check Job status
kubectl get jobs

# Expected output:
# NAME              COMPLETIONS   DURATION   AGE
# data-migration    1/1           15s        30s

# Check Pod status (Completed state)
kubectl get pods -l job-name=data-migration

# Check Job logs
kubectl logs job/data-migration

Create and Verify a CronJob#

# Create CronJob
kubectl apply -f cronjob.yaml

# Check CronJob status
kubectl get cronjobs

# Expected output (columns vary by kubectl version):
# NAME           SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE
# daily-backup   0 2 * * *   False     0        <none>

# Manually trigger immediately
kubectl create job --from=cronjob/daily-backup manual-backup

# Check created Jobs
kubectl get jobs

Debugging Failed Jobs#

# Check failed Pods
kubectl get pods -l job-name=data-migration --field-selector=status.phase=Failed

# Check failed Pod logs
kubectl logs <pod-name>

# Check Job events
kubectl describe job data-migration

Frequently Used kubectl Commands#

| Command | Description |
| --- | --- |
| kubectl get jobs | List Jobs |
| kubectl get cronjobs | List CronJobs |
| kubectl describe job <name> | Job details |
| kubectl logs job/<name> | Job logs |
| kubectl delete job <name> | Delete Job |
| kubectl create job --from=cronjob/<name> <job-name> | Manually trigger CronJob |

Next Steps#

Now that you understand Jobs and CronJobs, proceed to the following:

| Goal | Recommended Document |
| --- | --- |
| Network policies | NetworkPolicy |
| Resource isolation | Namespace |
| Access control | RBAC |