Introduction
This article is part of an ongoing series designed to help you prepare for the Certified Kubernetes Application Developer (CKAD) exam through small, focused labs.
This article continues our exploration of the “Application Observability and Maintenance” domain. We’re covering the requirement:
Implement probes and health checks
Probes and health checks are fundamental to building resilient applications in Kubernetes. They allow the cluster to understand when your application is ready to serve traffic, when it’s still alive, and when it needs to be restarted.
You can start from the beginning of the series here: CKAD Preparation — What is Kubernetes.
Prerequisites
A running Kubernetes cluster (e.g., Minikube, Docker Desktop, or one of the KillerCoda Kubernetes Playgrounds) and basic familiarity with Pods and YAML manifests.
Getting the Resources
Clone the lab repository and navigate to this article’s folder:
git clone https://github.com/SupaaHiro/schwifty-lab.git
cd schwifty-lab/blog-posts/20260129-ckad
Understanding Kubernetes Probes
Kubernetes provides three types of probes:
- Liveness Probe: Determines if a container is running properly. If the liveness probe fails, Kubernetes kills the container and restarts it according to the restart policy.
- Readiness Probe: Determines if a container is ready to accept traffic. If the readiness probe fails, the Pod’s IP address is removed from the Service endpoints, preventing traffic from reaching it.
- Startup Probe: Determines if the application inside a container has started. While the startup probe is running, liveness and readiness probes are disabled. This is useful for applications that require long startup times.
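All three probes can coexist on the same container, each playing its distinct role. A minimal sketch to illustrate the shape (the image, paths, and timings are illustrative, not a recommended configuration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: all-probes-demo      # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.27      # any HTTP-serving image works for this sketch
      ports:
        - containerPort: 80
      startupProbe:          # runs first; gates the other two probes
        httpGet:
          path: /
          port: 80
        failureThreshold: 30
        periodSeconds: 5
      livenessProbe:         # failure → container is restarted
        httpGet:
          path: /
          port: 80
        periodSeconds: 10
      readinessProbe:        # failure → Pod removed from Service endpoints
        httpGet:
          path: /
          port: 80
        periodSeconds: 5
```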
Probe Mechanisms
Kubernetes supports several mechanisms for implementing probes:
- HTTP GET: Performs an HTTP GET request against a specified path and port. Success is indicated by a response code between 200 and 399.
- TCP Socket: Attempts to open a TCP connection to a specified port. Success is indicated by the ability to establish the connection.
- Exec: Executes a command inside the container. Success is indicated by an exit code of 0.
- gRPC: Performs a gRPC health check against a container that implements the gRPC Health Checking Protocol (enabled by default since Kubernetes 1.24, stable since 1.27).
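Unlike HTTP probes, a gRPC probe targets a port rather than a path, and the container must serve the standard gRPC Health Checking Protocol on that port. A hedged sketch of the relevant fragment (the port number is illustrative):

```yaml
livenessProbe:
  grpc:
    port: 50051          # port where the gRPC server listens
    # service: my-service  # optional: service name passed to the health-check RPC
  periodSeconds: 10
```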
Probe Configuration Parameters
All probes support the following configuration parameters:
- initialDelaySeconds: Number of seconds after the container starts before the probe is initiated (default: 0)
- periodSeconds: How often (in seconds) to perform the probe (default: 10)
- timeoutSeconds: Number of seconds after which the probe times out (default: 1)
- successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed (default: 1)
- failureThreshold: Number of consecutive failures before the probe is considered failed (default: 3)
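These parameters together determine how quickly a failure is detected: in the worst case, roughly periodSeconds × failureThreshold after the first failure. An annotated fragment showing the defaults in place (the /healthz endpoint is illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /healthz        # illustrative endpoint
    port: 8080
  initialDelaySeconds: 0  # wait before the first probe
  periodSeconds: 10       # probe every 10 seconds
  timeoutSeconds: 1       # each attempt must answer within 1 second
  successThreshold: 1     # must be 1 for liveness and startup probes
  failureThreshold: 3     # 3 × 10 ≈ 30 seconds before the container is restarted
```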
Hands-On Challenge: Implementing Different Types of Probes
Let’s explore each probe type with practical examples that demonstrate common scenarios and best practices.
Step 1: Liveness Probe - Detecting and Recovering from Deadlocks
A liveness probe ensures that your application is running correctly. Let’s create a web application that can simulate a deadlock condition, then use a liveness probe to detect and recover from it.
Create the file manifests/01-liveness-http.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  labels:
    app: liveness-demo
spec:
  containers:
    - name: web-app
      image: docker.io/supaahiro/ckad-test-liveness:1.0.0
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
          httpHeaders:
            - name: Custom-Header
              value: Awesome
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 3
      resources:
        limits:
          memory: 128Mi
          cpu: 100m
        requests:
          memory: 64Mi
          cpu: 50m
Note: You can find the source code for the supaahiro/ckad-test-liveness image in the src/ directory of this repository. It is a simple FastAPI application that simulates a failure after 30 seconds.
Apply the manifest (k is used throughout as a shorthand alias for kubectl):
k apply -f manifests/01-liveness-http.yaml
Watch the Pod status:
k get pod liveness-http --watch
The supaahiro/ckad-test-liveness image is designed to fail its liveness probe after 30 seconds: the HTTP server returns a 200 status code for the first 30 seconds, then starts returning 500 errors. After 3 consecutive failures (as specified by failureThreshold), Kubernetes restarts the container.
You can see the restart count increasing:
k describe pod liveness-http | grep "Restart Count:"
Look for the Restart Count field and the events showing the container being killed and restarted.
Let’s understand what’s happening:
- Container starts, and the liveness probe waits 10 seconds (initialDelaySeconds)
- Every 5 seconds (periodSeconds), Kubernetes checks the /healthz endpoint
- After 30 seconds, the application starts failing the health check
- After 3 consecutive failures (failureThreshold × periodSeconds = 15 seconds), the container is killed
- Kubernetes restarts the container, and the cycle repeats
Step 2: Readiness Probe - Controlling Traffic Flow
A readiness probe determines when a container is ready to accept traffic.
Let’s create an example with both liveness and readiness probes:
Create the file manifests/02-readiness-probe.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo
  labels:
    app: readiness-demo
spec:
  containers:
    - name: nginx
      image: nginx:1.27
      ports:
        - containerPort: 80
      volumeMounts:
        - name: html
          mountPath: /usr/share/nginx/html
      readinessProbe:
        httpGet:
          path: /ready
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 3
        successThreshold: 1
        failureThreshold: 3
      resources:
        limits:
          memory: 128Mi
          cpu: 100m
        requests:
          memory: 64Mi
          cpu: 50m
  volumes:
    - name: html
      emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: readiness-svc
spec:
  selector:
    app: readiness-demo
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
Apply the manifest:
k apply -f manifests/02-readiness-probe.yaml
Check the Pod status:
k get pod readiness-demo
You’ll notice the Pod shows 0/1 in the READY column. This is because the readiness probe is failing (the /ready endpoint doesn’t exist yet).
Check the Service endpoints:
k describe svc readiness-svc | grep Endpoints:
The endpoints will be empty because the Pod is not ready. Even though the container is running, Kubernetes won’t send traffic to it.
Now let’s make the Pod ready by creating the /ready file:
k exec readiness-demo -- sh -c 'echo "ready" > /usr/share/nginx/html/ready'
Wait a few seconds and check the Pod again:
k get pod readiness-demo
Now it should show 1/1 READY. Check the endpoints again:
k describe svc readiness-svc | grep Endpoints:
You should now see the Pod’s IP address listed in the endpoints.
Let’s simulate the application becoming unready:
k exec readiness-demo -- rm /usr/share/nginx/html/ready
After a few seconds, check the Pod and endpoints:
k get pod readiness-demo
k describe svc readiness-svc | grep Endpoints:
The Pod will show 0/1 READY again, and the endpoints will be empty. However, notice that the container is NOT restarted - it’s still running. This is the key difference between readiness and liveness probes:
- Liveness probe failure → Container is killed and restarted
- Readiness probe failure → Traffic is stopped, but container keeps running
Step 3: Startup Probe - Handling Slow-Starting Applications
Some applications require significant time to start up - they might need to load large datasets, establish database connections, or perform complex initialization. A startup probe allows you to give these applications more time to start without interfering with liveness probe settings.
Create the file manifests/03-startup-probe.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: startup-demo
  labels:
    app: startup-demo
spec:
  containers:
    - name: slow-starter
      image: nginx:1.27
      ports:
        - containerPort: 80
      startupProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 0
        periodSeconds: 5
        failureThreshold: 30 # 30 * 5 = 150 seconds max startup time
      resources:
        limits:
          memory: 128Mi
          cpu: 100m
        requests:
          memory: 64Mi
          cpu: 50m
Apply the manifest:
k apply -f manifests/03-startup-probe.yaml
Watch the Pod:
k get pod startup-demo --watch
In this configuration:
- The startup probe runs first, checking every 5 seconds for up to 150 seconds (30 × 5)
- During startup probe execution, liveness and readiness probes are disabled
- Once the startup probe succeeds, it never runs again
- Liveness and readiness probes then take over
This pattern is essential for legacy applications or microservices with long initialization times. Without a startup probe, you’d need to set a very high initialDelaySeconds on the liveness probe, which would delay failure detection during normal operation.
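For comparison, covering the same 150-second startup window without a startup probe would look roughly like the fragment below. The cost: after any restart, no deadlock can be detected for a full 150 seconds, even once the application is long up and running (values are illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 150  # must cover the slowest possible startup
  periodSeconds: 5          # only then does normal failure detection begin
```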
Step 4: TCP Socket Probe - Checking Port Availability
Not all applications expose HTTP endpoints. For applications that communicate over raw TCP (like databases or message queues), TCP socket probes are appropriate.
Create the file manifests/04-tcp-probe.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: tcp-probe-demo
  labels:
    app: tcp-demo
spec:
  containers:
    - name: redis
      image: redis:7-alpine
      ports:
        - containerPort: 6379
      livenessProbe:
        tcpSocket:
          port: 6379
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        tcpSocket:
          port: 6379
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
      resources:
        limits:
          memory: 256Mi
          cpu: 200m
        requests:
          memory: 128Mi
          cpu: 100m
Apply the manifest:
k apply -f manifests/04-tcp-probe.yaml
Verify the Pod is ready:
k get pod tcp-probe-demo
The TCP socket probe simply attempts to establish a connection to the specified port. If the connection succeeds, the probe succeeds.
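Probes can also reference a named container port instead of a number, which keeps the manifest consistent if the port ever changes. A sketch of the relevant fragment (the port name is illustrative):

```yaml
ports:
  - name: redis           # name the port once here...
    containerPort: 6379
livenessProbe:
  tcpSocket:
    port: redis           # ...and reference it by name in the probe
```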
You can verify Redis is working:
k exec tcp-probe-demo -- redis-cli ping
This should return PONG.
Step 5: Exec Probe - Custom Health Check Commands
For maximum flexibility, exec probes run custom commands inside the container. This is useful when you need complex health check logic that can’t be expressed with HTTP or TCP checks.
Create the file manifests/05-exec-probe.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: exec-probe-demo
  labels:
    app: exec-demo
spec:
  containers:
    - name: postgres
      image: postgres:16-alpine
      env:
        - name: POSTGRES_PASSWORD
          value: mysecretpassword
        - name: POSTGRES_USER
          value: testuser
        - name: POSTGRES_DB
          value: testdb
      ports:
        - containerPort: 5432
      livenessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - pg_isready -U testuser -d testdb
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        exec:
          command:
            - /bin/sh
            - -c
            - pg_isready -U testuser -d testdb
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 3
      resources:
        limits:
          memory: 512Mi
          cpu: 500m
        requests:
          memory: 256Mi
          cpu: 250m
Apply the manifest:
k apply -f manifests/05-exec-probe.yaml
Wait for the Pod to be ready:
k get pod exec-probe-demo --watch
The pg_isready command checks if PostgreSQL is ready to accept connections. The command exits with status 0 if the database is ready, making it perfect for an exec probe.
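Since success is determined solely by the exit code, any command works. A classic pattern from the Kubernetes documentation checks for a sentinel file that the application itself creates when healthy and removes when not:

```yaml
livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy      # succeeds (exit code 0) only while this file exists
  periodSeconds: 5
```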
You can verify PostgreSQL is working:
k exec exec-probe-demo -- psql -U testuser -d testdb -c "SELECT version();"
Wrapping Up: What We’ve Covered
In this article, we explored Kubernetes probes and health checks as part of the “Application Observability and Maintenance” domain for CKAD preparation. We covered both theoretical concepts and practical implementations:
- Three types of probes: Liveness (detect deadlocks), Readiness (control traffic), and Startup (handle slow starts)
- Four probe mechanisms: HTTP GET, TCP Socket, Exec commands, and gRPC health checks
- Probe configuration parameters: initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold
- Hands-on implementations: HTTP probes, TCP socket probes for Redis, exec probes for PostgreSQL, and timing configurations
Mastering probes is essential for building resilient Kubernetes applications. They enable self-healing, proper traffic management, and graceful handling of application lifecycle events. These skills are not only crucial for the CKAD exam but also for designing production-grade applications that can withstand failures and maintain high availability.
Final Cleanup
To clean up all resources created in this lab:
k delete -f manifests/