Overall Analogy: Part-Time Work and Scheduled Cleaning#
Jobs and CronJobs are easy to understand when compared to part-time work and scheduled cleaning:
| Analogy | Kubernetes | Role |
|---|---|---|
| One-time part-time work | Job | A task that runs once and finishes |
| Scheduled cleaning routine | CronJob | A task that runs periodically on a schedule |
| Hire 3 workers | completions: 3 | Number of successful completions required |
| Deploy 2 workers at once | parallelism: 2 | Number of Pods to run simultaneously |
| Maximum retry attempts | backoffLimit | Retry limit on failure |
| Work deadline | activeDeadlineSeconds | Overall time limit for the task |
In this way, a Job is “a task that finishes once the assigned work is done,” and a CronJob is “a scheduled task that runs repeatedly at set times.”
Target Audience: Developers who want to run batch or scheduled jobs on Kubernetes
Prerequisites: Pod, Deployment concepts
After reading this: You will understand how Jobs and CronJobs work and how to configure them
TL;DR
- A Job finishes once the specified number of Pods have completed successfully
- A CronJob creates Jobs periodically based on a cron expression
- Use backoffLimit and activeDeadlineSeconds to control failure handling
What Is a Job?#
A Job is a resource that creates one or more Pods and finishes once the specified task is complete. Unlike a Deployment, which keeps its Pods running, a Job aims for its Pods to run to successful completion and then stop.
| Property | Deployment | Job |
|---|---|---|
| Purpose | Keep Pods running | Terminate after task completion |
| When Pod terminates | Recreate new Pod | Mark as complete if successful |
| Use case | Web servers, API servers | Data migration, batch processing |
Job Execution Flow#
flowchart TB
JOB["Job Created"]
JOB --> P1["Create Pod"]
P1 --> Q1{"Success?"}
Q1 -->|Yes| CHK{"completions<br>reached?"}
CHK -->|Yes| DONE["Job Complete"]
CHK -->|No| P2["Create Next Pod"]
P2 --> Q2{"Success?"}
Q2 -->|Yes| CHK
Q1 -->|No| RETRY{"backoffLimit<br>exceeded?"}
Q2 -->|No| RETRY
RETRY -->|No| P3["Create Retry Pod"]
P3 --> Q1
RETRY -->|Yes| FAIL["Job Failed"]
Job YAML#
Basic Job#
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1                # Number of successful completions
  parallelism: 1                # Number of concurrent Pods
  backoffLimit: 3               # Maximum retry count
  activeDeadlineSeconds: 600    # Overall time limit (seconds)
  template:
    spec:
      containers:
      - name: migration
        image: my-app:1.0
        command: ["python", "migrate.py"]
        env:
        - name: DB_HOST
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: host
      restartPolicy: Never      # Must be Never or OnFailure for Jobs
Key fields explained.
| Field | Default | Description |
|---|---|---|
| `completions` | 1 | Number of Pods that must complete successfully |
| `parallelism` | 1 | Number of Pods to run concurrently |
| `backoffLimit` | 6 | Maximum retry count on failure |
| `activeDeadlineSeconds` | None | Overall execution time limit for the Job (seconds) |
| `restartPolicy` | - | `Never` or `OnFailure` |
restartPolicy Differences
- `Never`: Leaves the failed Pod in place and creates a new one (useful for debugging)
- `OnFailure`: Restarts the container within the same Pod (saves resources)
Parallel Processing Job#
Run multiple Pods simultaneously for faster task completion.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processing
spec:
  completions: 10    # 10 total successes needed
  parallelism: 3     # Run 3 Pods concurrently
  backoffLimit: 5
  template:
    spec:
      containers:
      - name: worker
        image: batch-worker:1.0
        command: ["./process.sh"]
      restartPolicy: Never
| completions | parallelism | Behavior |
|---|---|---|
| 1 | 1 | Single Pod execution (default) |
| N | 1 | N Pods run sequentially |
| N | M | Up to M Pods run concurrently, N total successes |
| None | M | Work Queue mode (Pods decide termination themselves) |
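As a rough way to reason about the table above, the following shell sketch estimates how many rounds of Pods a fixed-completion Job needs. It assumes every Pod succeeds on its first attempt and that Pods take a similar amount of time. `min_waves` is a hypothetical helper written for this illustration, not a Kubernetes or kubectl command.

```shell
# min_waves COMPLETIONS PARALLELISM
# Prints ceil(COMPLETIONS / PARALLELISM) using integer arithmetic.
# Hypothetical helper for illustration only.
min_waves() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

min_waves 10 3   # prints 4 (3 + 3 + 3 + 1 Pods)
min_waves 5 1    # prints 5 (sequential execution)
```

In practice the Job controller starts a new Pod as soon as a slot frees up rather than waiting for a whole round to end, so treat the number as an estimate of elapsed rounds, not an exact schedule.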
What Is a CronJob?#
A CronJob is a resource that creates Jobs periodically according to a cron schedule.
flowchart LR
CJ["CronJob<br>Daily 02:00"]
CJ -->|Day 1| J1["Job 1"]
CJ -->|Day 2| J2["Job 2"]
CJ -->|Day 3| J3["Job 3"]
J1 --> P1["Pod"]
J2 --> P2["Pod"]
J3 --> P3["Pod"]
P1 -->|Complete| D1["Done"]
P2 -->|Complete| D2["Done"]
P3 -->|Complete| D3["Done"]
CronJob YAML#
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"              # Daily at 2 AM
  concurrencyPolicy: Forbid          # Skip if previous Job is still running
  successfulJobsHistoryLimit: 3      # Number of successful Jobs to retain
  failedJobsHistoryLimit: 3          # Number of failed Jobs to retain
  startingDeadlineSeconds: 200       # Grace period for schedule start
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 3600
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:1.0
            command: ["./backup.sh"]
          restartPolicy: OnFailure
Cron Expression#
+------------------- minute (0 - 59)
| +----------------- hour (0 - 23)
| | +--------------- day of month (1 - 31)
| | | +------------- month (1 - 12)
| | | | +----------- day of week (0 - 6, Sunday=0)
| | | | |
* * * * *
| Expression | Meaning |
|---|---|
| `0 2 * * *` | Daily at 2 AM |
| `*/15 * * * *` | Every 15 minutes |
| `0 9 * * 1-5` | Weekdays at 9 AM |
| `0 0 1 * *` | First day of every month at midnight |
| `0 */6 * * *` | Every 6 hours |
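To make the field positions concrete, here is a minimal shell sketch that checks whether a given time matches a cron expression. `field_matches` and `cron_matches` are hypothetical helpers written for this illustration. They support only the forms used in the table above (`*`, `*/N`, `A-B`, and a single number). Lists such as `1,15` and the special day-of-month/day-of-week handling of real cron implementations are out of scope, and the input is not validated.

```shell
# field_matches FIELD VALUE
# Succeeds (exit status 0) if VALUE satisfies one cron FIELD.
# Hypothetical helper; supports "*", "*/N", "A-B", and a plain number.
field_matches() {
  case "$1" in
    "*")   return 0 ;;                              # wildcard: any value
    "*/"*) [ $(( $2 % ${1#*/} )) -eq 0 ] ;;         # step, e.g. */15
    *-*)   [ "$2" -ge "${1%-*}" ] && [ "$2" -le "${1#*-}" ] ;;  # range, e.g. 1-5
    *)     [ "$2" -eq "$1" ] ;;                     # single number
  esac
}

# cron_matches "EXPR" MINUTE HOUR DAY_OF_MONTH MONTH DAY_OF_WEEK
# Succeeds only if every field of EXPR matches the corresponding value.
cron_matches() {
  cm_expr=$1
  shift
  set -f                      # keep "*" literal while splitting EXPR into fields
  for cm_field in $cm_expr; do
    field_matches "$cm_field" "$1" || { set +f; return 1; }
    shift
  done
  set +f
  return 0
}

# Monday (day of week 1), 14 March, 09:00
cron_matches "0 9 * * 1-5" 0 9 14 3 1 && echo "weekday 09:00 matches"
# 02:30 does not satisfy the minute field "0"
cron_matches "0 2 * * *" 30 2 1 1 0 || echo "02:30 does not match"
```

The arguments after the expression follow the same left-to-right order as the diagram: minute, hour, day of month, month, day of week.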
concurrencyPolicy#
| Value | Description |
|---|---|
| `Allow` (default) | Create new Job even if previous is still running |
| `Forbid` | Skip new Job if previous is still running |
| `Replace` | Cancel previous Job and create new one |
Warning
With the `Allow` policy, if Job execution time exceeds the schedule interval, Jobs can accumulate. Use `Forbid` or `Replace` in production.
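The decision made at each scheduled time can be summarized with a small shell sketch. `next_action` is a hypothetical helper that only mirrors the table above. It is not how the CronJob controller is implemented, and the output words `create`, `skip`, and `replace` are labels chosen for this example.

```shell
# next_action POLICY PREVIOUS_ACTIVE
#   POLICY:          Allow | Forbid | Replace
#   PREVIOUS_ACTIVE: yes | no  (is the previous Job still running?)
# Prints what happens at the next scheduled time. Hypothetical helper.
next_action() {
  if [ "$2" = "no" ]; then
    echo "create"             # nothing is running, so every policy creates a Job
    return 0
  fi
  case "$1" in
    Allow)   echo "create" ;;   # run alongside the previous Job
    Forbid)  echo "skip" ;;     # leave the previous Job running, skip this run
    Replace) echo "replace" ;;  # cancel the previous Job, then create a new one
    *)       echo "unknown policy: $1" >&2; return 1 ;;
  esac
}

next_action Forbid yes    # prints skip
next_action Replace yes   # prints replace
next_action Allow no      # prints create
```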
Retry Policy#
Settings that control retry behavior when a Job fails.
| Setting | Description |
|---|---|
| `backoffLimit` | Maximum retry count on failure (default 6) |
| `activeDeadlineSeconds` | Overall execution time limit for the Job |
Retry intervals increase with exponential backoff: 10s, 20s, 40s, … up to 6 minutes.
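The doubling pattern can be sketched in shell as follows. `backoff_delay` is a hypothetical helper that reproduces only the sequence stated above (start at 10 seconds, double each time, cap at 6 minutes). The actual delays observed in a cluster may differ, so treat the numbers as approximate.

```shell
# backoff_delay RETRY_NUMBER
# Prints the approximate delay in seconds before the given retry (1-based):
# 10s, doubled for each earlier retry, capped at 360s (6 minutes).
# Hypothetical helper for illustration only.
backoff_delay() {
  bd_delay=10
  bd_n=1
  while [ "$bd_n" -lt "$1" ]; do
    bd_delay=$(( bd_delay * 2 ))
    if [ "$bd_delay" -ge 360 ]; then
      bd_delay=360            # cap reached, no need to keep doubling
      break
    fi
    bd_n=$(( bd_n + 1 ))
  done
  echo "$bd_delay"
}

for retry in 1 2 3 4 5 6 7; do
  backoff_delay "$retry"
done
# prints 10, 20, 40, 80, 160, 320, 360 (one value per line)
```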
spec:
  backoffLimit: 3               # Allow up to 3 retries before the Job is marked failed
  activeDeadlineSeconds: 600    # Job forcefully terminated after 10 minutes
Relationship Between backoffLimit and activeDeadlineSeconds
The Job is marked as failed if either condition is met. `activeDeadlineSeconds` limits the total time regardless of retry count.
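A minimal shell sketch of this "whichever comes first" rule is shown below. `job_state` is a hypothetical helper, and it assumes the Job fails once the number of failed Pods exceeds `backoffLimit`. The deadline is checked first because it applies regardless of how many retries remain. `DeadlineExceeded` and `BackoffLimitExceeded` are the reasons Kubernetes reports on a failed Job's condition.

```shell
# job_state FAILED_PODS BACKOFF_LIMIT ELAPSED_SECONDS DEADLINE_SECONDS
# Prints the Job state under the two limits. Hypothetical helper;
# assumes failure once FAILED_PODS exceeds BACKOFF_LIMIT.
job_state() {
  if [ "$3" -ge "$4" ]; then
    echo "Failed (DeadlineExceeded)"       # time limit hit, retries irrelevant
  elif [ "$1" -gt "$2" ]; then
    echo "Failed (BackoffLimitExceeded)"   # retries used up
  else
    echo "Running"
  fi
}

job_state 1 3 700 600   # prints Failed (DeadlineExceeded)
job_state 4 3 100 600   # prints Failed (BackoffLimitExceeded)
job_state 2 3 100 600   # prints Running
```

On a real cluster, the reason for a failed Job appears in the Conditions and Events sections of `kubectl describe job <name>`.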
Hands-on: Deploying Jobs and CronJobs#
Run and Verify a Job#
# Create Job
kubectl apply -f job.yaml
# Check Job status
kubectl get jobs
# Expected output:
# NAME             COMPLETIONS   DURATION   AGE
# data-migration   1/1           15s        30s
# Check Pod status (Completed state)
kubectl get pods -l job-name=data-migration
# Check Job logs
kubectl logs job/data-migration
Create and Verify a CronJob#
# Create CronJob
kubectl apply -f cronjob.yaml
# Check CronJob status
kubectl get cronjobs
# Expected output:
# NAME           SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE
# daily-backup   0 2 * * *   False     0        <none>
# Manually trigger immediately
kubectl create job --from=cronjob/daily-backup manual-backup
# Check created Jobs
kubectl get jobs
Debugging Failed Jobs#
# Check failed Pods
kubectl get pods -l job-name=data-migration --field-selector=status.phase=Failed
# Check failed Pod logs
kubectl logs <pod-name>
# Check Job events
kubectl describe job data-migration
Frequently Used kubectl Commands#
| Command | Description |
|---|---|
| `kubectl get jobs` | List Jobs |
| `kubectl get cronjobs` | List CronJobs |
| `kubectl describe job <name>` | Job details |
| `kubectl logs job/<name>` | Job logs |
| `kubectl delete job <name>` | Delete Job |
| `kubectl create job --from=cronjob/<name> <job-name>` | Manually trigger CronJob |
Next Steps#
Now that you understand Jobs and CronJobs, proceed to the following:
| Goal | Recommended Document |
|---|---|
| Network policies | NetworkPolicy |
| Resource isolation | Namespace |
| Access control | RBAC |