Debug Workflow

Systematic debugging workflow for Kubernetes pods and resources.

When to Use

Pod is failing, crashing, or in error state
Application not behaving as expected in cluster
Need to troubleshoot connectivity or performance issues
Investigating resource problems

Debugging Steps

1. Check Pod Status

# Get all pods in current namespace
kubectl get pods

# Get pods in specific namespace
kubectl get pods -n <namespace>

# Get pods with more details
kubectl get pods -o wide

# Watch pod status changes
kubectl get pods -w

Common Pod States:

Pending: Waiting to be scheduled
ContainerCreating: Container image being pulled/created
Running: Pod is running
CrashLoopBackOff: Container keeps crashing
Error: Container exited with error
ImagePullBackOff: Cannot pull container image
Evicted: Pod evicted due to resource pressure
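
To pick out only the pods that need attention from a long listing, the states above can be filtered with a small helper. This is a minimal sketch, not a kubectl feature: unhealthy_pods is a hypothetical function name, and it assumes the default column order of kubectl get pods --no-headers (NAME, READY, STATUS, ...). With -A the namespace column comes first and the field numbers would shift.

```shell
# Print NAME and STATUS for every pod whose STATUS is neither Running nor Completed.
# Expects the output of `kubectl get pods --no-headers` on stdin.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage against a live cluster:
#   kubectl get pods --no-headers | unhealthy_pods
```

A pod can report Running while its containers are not ready, so the READY column is still worth checking by eye.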

2. Describe Pod for Details

# Get detailed pod information
kubectl describe pod <pod-name>

# Focus on specific sections
kubectl describe pod <pod-name> | grep -A 10 Events
kubectl describe pod <pod-name> | grep -A 5 State

Look for:

Events (errors, warnings, scheduling issues)
Container states and restart counts
Resource requests and limits (compare them with actual usage from kubectl top)
Node assignment and conditions
Volume mount issues
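
Restart counts and the reason for the last termination can also be read directly from the pod status instead of grepping the describe output. The sketch below assumes a hypothetical helper name, container_restarts, and uses kubectl's JSONPath output; the reason column is empty for containers that have never terminated.

```shell
# Print "<container>\t<restart count>\t<last termination reason>" for one pod.
container_restarts() {
  pod="$1"
  kubectl get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
}

# Usage:
#   container_restarts <pod-name>
```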

3. Check Logs

# View current logs
kubectl logs <pod-name>

# View logs from specific container in multi-container pod
kubectl logs <pod-name> -c <container-name>

# Follow logs in real-time
kubectl logs <pod-name> -f

# View previous container logs (after crash)
kubectl logs <pod-name> --previous

# Get last N lines
kubectl logs <pod-name> --tail=100

# Show timestamps
kubectl logs <pod-name> --timestamps

4. Check Events

# Get recent events sorted by time
kubectl get events --sort-by=.metadata.creationTimestamp

# Watch events in real-time
kubectl get events -w

# Get events for specific namespace
kubectl get events -n <namespace>

# Filter events by type
kubectl get events --field-selector type=Warning
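
Field selectors can be combined with commas, which narrows the event stream to warnings about a single pod. A minimal sketch follows, with pod_warnings as a hypothetical helper name. It sorts by .lastTimestamp so that recurring events appear at their most recent occurrence.

```shell
# List Warning events that refer to one pod, oldest first.
pod_warnings() {
  pod="$1"
  kubectl get events \
    --field-selector "involvedObject.name=${pod},type=Warning" \
    --sort-by=.lastTimestamp
}

# Usage:
#   pod_warnings <pod-name>
```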

5. Exec into Pod

# Get shell in pod
kubectl exec -it <pod-name> -- /bin/sh
kubectl exec -it <pod-name> -- /bin/bash

# Run specific command
kubectl exec <pod-name> -- ls /app
kubectl exec <pod-name> -- env
kubectl exec <pod-name> -- cat /etc/resolv.conf

# Exec into specific container
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh

Inside pod, check:

Environment variables: env
DNS resolution: nslookup <service-name>
Network connectivity: wget -O- http://<service-name>:<port>
File permissions: ls -la /app
Disk usage: df -h
Process list: ps aux
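
The checks listed above can also be run without an interactive shell. The sketch below loops over them with kubectl exec; pod_checks is a hypothetical helper name, and it assumes the container image ships the tools involved (minimal images often lack nslookup or ps).

```shell
# Run a fixed set of in-pod checks non-interactively.
# $1 = pod name, $2 = service name to resolve via DNS.
pod_checks() {
  pod="$1"
  service="$2"
  for check in "env" "cat /etc/resolv.conf" "nslookup $service" "df -h" "ps aux"; do
    echo "== $check"
    # $check is left unquoted on purpose so it splits into command and arguments.
    kubectl exec "$pod" -- $check || echo "(check failed or tool missing in image)"
  done
}

# Usage:
#   pod_checks <pod-name> <service-name>
```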

6. Port Forward for Testing

# Forward local port to pod port
kubectl port-forward pod/<pod-name> 8080:8080

# Forward to service
kubectl port-forward service/<service-name> 8080:80

# Forward in background
kubectl port-forward pod/<pod-name> 8080:8080 &

# Test with curl
curl http://localhost:8080
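
A background port-forward keeps running until it is stopped, so a one-off test is easier to repeat when setup, request, and cleanup live in one function. The following is a sketch under assumptions: probe_via_port_forward is a hypothetical name, the fixed two-second wait is a crude stand-in for a proper readiness loop, and the request goes to the root path.

```shell
# Forward a local port, print the HTTP status code of one request, then clean up.
# $1 = target (pod/<pod-name> or service/<service-name>), $2 = local port, $3 = remote port.
probe_via_port_forward() {
  target="$1"
  local_port="$2"
  remote_port="$3"

  kubectl port-forward "$target" "${local_port}:${remote_port}" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # crude wait for the tunnel; a retry loop would be more robust

  status=0
  curl -sS -o /dev/null -w '%{http_code}\n' "http://localhost:${local_port}/" || status=$?

  # Stop the port-forward so the local port is free again.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$status"
}

# Usage:
#   probe_via_port_forward pod/<pod-name> 8080 8080
```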

7. Check Resource Usage

# Get pod resource usage (requires metrics-server)
kubectl top pod <pod-name>

# Get all pods resource usage
kubectl top pods

# Get node resource usage
kubectl top nodes
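
To spot pods that are close to their memory limit, the output of kubectl top can be filtered against a threshold. This sketch assumes memory is reported in Mi, which is what kubectl top normally prints. The helper name pods_over_memory is hypothetical, and rows in other units are skipped.

```shell
# Print pods using more than N MiB of memory.
# Expects `kubectl top pods --no-headers` (NAME CPU MEMORY) on stdin; $1 = threshold in MiB.
pods_over_memory() {
  awk -v limit="$1" '
    $3 ~ /Mi$/ {
      mem = $3
      sub(/Mi$/, "", mem)          # strip the unit so the value compares numerically
      if (mem + 0 > limit) print $1, $3
    }'
}

# Usage:
#   kubectl top pods --no-headers | pods_over_memory 512
```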

8. Check Related Resources

# Check deployment
kubectl get deployment <deployment-name>
kubectl describe deployment <deployment-name>

# Check replica set
kubectl get rs
kubectl describe rs <replicaset-name>

# Check service
kubectl get svc
kubectl describe svc <service-name>

# Check endpoints
kubectl get endpoints <service-name>

# Check configmaps and secrets
kubectl get configmap
kubectl get secret
kubectl describe configmap <name>
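
A Service without endpoints is a frequent reason for connection failures, usually because its selector matches no ready pods. The sketch below makes that check explicit; service_has_endpoints is a hypothetical helper name and it reads the classic Endpoints object via JSONPath.

```shell
# Report the pod IPs behind a Service, or fail if there are none.
service_has_endpoints() {
  svc="$1"
  ips=$(kubectl get endpoints "$svc" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "$svc -> $ips"
  else
    echo "$svc has no ready endpoints; compare the Service selector with the pod labels"
    return 1
  fi
}

# Usage:
#   service_has_endpoints <service-name>
```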

9. Network Debugging

# Check service endpoints
kubectl get endpoints

# Test DNS resolution from within pod
kubectl exec <pod-name> -- nslookup kubernetes.default
kubectl exec <pod-name> -- nslookup <service-name>

# Check network policies
kubectl get networkpolicies
kubectl describe networkpolicy <policy-name>

# Test connectivity between pods
kubectl exec <pod-name> -- wget -O- http://<service-name>:<port>

10. Check RBAC Permissions

# Check if service account can perform action
kubectl auth can-i get pods --as=system:serviceaccount:<namespace>:<sa-name>

# Get service account details
kubectl get serviceaccount <sa-name> -o yaml

# Check role bindings
kubectl get rolebindings
kubectl describe rolebinding <binding-name>
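
Checking one verb at a time gets tedious, so the can-i call above can be wrapped in a loop. This is a sketch with an assumed verb list and a hypothetical helper name, sa_permissions. Since kubectl auth can-i exits non-zero when the answer is "no", the exit status is deliberately ignored here.

```shell
# Print yes/no for common verbs on one resource, as seen by a service account.
# $1 = namespace, $2 = service account name, $3 = resource (e.g. pods).
sa_permissions() {
  namespace="$1"
  sa="$2"
  resource="$3"
  for verb in get list watch create update delete; do
    answer=$(kubectl auth can-i "$verb" "$resource" \
      --as="system:serviceaccount:${namespace}:${sa}" -n "$namespace" 2>/dev/null) || true
    echo "${verb}: ${answer:-unknown}"
  done
}

# Usage:
#   sa_permissions <namespace> <sa-name> pods
```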

Common Issues and Solutions

ImagePullBackOff

Causes:

Image doesn’t exist
Image tag is wrong
Private registry authentication failed
Network issues pulling image

Debug:

kubectl describe pod <pod-name> | grep -A 5 "Failed"
kubectl get events | grep "Failed to pull"

Solutions:

Verify image name and tag
Check imagePullSecrets
Test image pull manually: docker pull <image>

CrashLoopBackOff

Causes:

Application exits immediately
Missing dependencies or configuration
Resource limits too low
Failed health checks

Debug:

kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>

Solutions:

Check application logs for errors
Verify environment variables and config
Increase resource limits
Check liveness probe configuration
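
The exit code of the last terminated container often points at the cause before any logs are read. The lookup below is an illustrative sketch: explain_exit_code is a hypothetical helper, and the interpretations follow the common shell convention that codes above 128 mean "killed by signal (code - 128)".

```shell
# Translate a container exit code into a likely cause.
explain_exit_code() {
  code="$1"
  case "$code" in
    0)   echo "exited cleanly; the main process may finish instead of staying in the foreground" ;;
    1)   echo "generic application error; read the previous logs" ;;
    126) echo "command found but not executable; check file permissions" ;;
    127) echo "command not found; check the image and entrypoint" ;;
    137) echo "killed by SIGKILL (128+9); often the OOM killer or a forced kill after the grace period" ;;
    143) echo "terminated by SIGTERM (128+15); usually a shutdown request" ;;
    *)
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application-specific exit code; read the previous logs"
      fi ;;
  esac
}

# The exit code itself can be read with:
#   kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.exitCode}'
```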

Pending Pods

Causes:

Insufficient node resources
Node selector/affinity not matching
PersistentVolumeClaim not bound
Pod Security admission (PodSecurityPolicy on clusters older than 1.25) blocking the pod

Debug:

kubectl describe pod <pod-name> | grep -A 10 Events
kubectl get nodes
kubectl top nodes

Solutions:

Scale cluster or reduce resource requests
Check node labels and selectors
Verify PVC status
Review Pod Security admission settings (PodSecurityPolicies on clusters older than 1.25)
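
The causes above can be gathered in one pass. The sketch below uses the hypothetical helper name pending_report to list pending pods, the scheduler's FailedScheduling events, and any PersistentVolumeClaim that is not Bound. It assumes the default kubectl get pvc column order (NAME, STATUS, ...).

```shell
# Summarize the usual suspects behind Pending pods in the current namespace.
pending_report() {
  echo "== Pending pods"
  kubectl get pods --field-selector=status.phase=Pending
  echo "== FailedScheduling events"
  kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp
  echo "== Unbound PersistentVolumeClaims"
  kubectl get pvc --no-headers | awk '$2 != "Bound" { print $1, $2 }'
}

# Usage:
#   pending_report
```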

OOMKilled (Out of Memory)

Causes:

Memory limit too low
Memory leak in application
Unexpected memory usage spike

Debug:

kubectl describe pod <pod-name> | grep -i "OOMKilled"
kubectl top pod <pod-name>

Solutions:

Increase memory limits
Investigate application memory usage
Add memory profiling to application

Debugging Checklist

Check pod status and events
Review pod description for errors
Examine current and previous logs
Verify resource requests and limits
Test network connectivity
Validate environment variables and config
Check RBAC permissions
Verify image and pull secrets
Review health probe configuration
Check node resources and scheduling

Advanced Debugging

Enable Debug Container (Kubernetes 1.23+)

# Add ephemeral debug container to running pod
kubectl debug <pod-name> -it --image=busybox --target=<container-name>

# Create debug copy of pod
kubectl debug <pod-name> -it --image=busybox --copy-to=<debug-pod-name> --container=debug

Node Debugging

# List nodes with their IP addresses (then SSH to the node, if accessible)
kubectl get nodes -o wide

# Create privileged pod on specific node
kubectl debug node/<node-name> -it --image=ubuntu

Check API Server Logs

# Get API server logs (if running as pod)
kubectl logs -n kube-system kube-apiserver-<node-name>

# Get controller manager logs
kubectl logs -n kube-system kube-controller-manager-<node-name>

# Get scheduler logs
kubectl logs -n kube-system kube-scheduler-<node-name>

Monitoring Recommendations

Install metrics-server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Use kubectl plugins:

kubectl-debug: Enhanced debugging capabilities
kubectl-trace: Trace syscalls and kernel events
kubectl-df: Show disk usage in pods
kubectl-view-allocations: View resource allocations

Example Debugging Session

# 1. Find problematic pod
kubectl get pods | grep -v Running

# 2. Get details
kubectl describe pod myapp-5d4b6c8f9-xz2k4

# 3. Check logs
kubectl logs myapp-5d4b6c8f9-xz2k4 --previous

# 4. Check events
kubectl get events --field-selector involvedObject.name=myapp-5d4b6c8f9-xz2k4

# 5. Test connectivity
kubectl exec myapp-5d4b6c8f9-xz2k4 -- wget -O- http://backend-service:8080

# 6. Check resources
kubectl top pod myapp-5d4b6c8f9-xz2k4

# 7. If needed, exec into pod
kubectl exec -it myapp-5d4b6c8f9-xz2k4 -- /bin/sh
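
When the same session has to be repeated for several pods, or the output needs to be attached to a ticket, the non-interactive steps can be captured to files. This is a minimal sketch: triage_pod and the file names are hypothetical, and every command is allowed to fail so that one missing piece (for example no previous container, or no metrics-server) does not stop the rest.

```shell
# Save describe output, logs, events, and resource usage for one pod into a directory.
# $1 = pod name, $2 = output directory (default ./triage-<pod>).
triage_pod() {
  pod="$1"
  out_dir="${2:-./triage-$pod}"
  mkdir -p "$out_dir"

  kubectl describe pod "$pod"                 > "$out_dir/describe.txt"      2>&1 || true
  kubectl logs "$pod" --timestamps            > "$out_dir/logs-current.txt"  2>&1 || true
  kubectl logs "$pod" --previous --timestamps > "$out_dir/logs-previous.txt" 2>&1 || true
  kubectl get events --field-selector "involvedObject.name=${pod}" \
                                              > "$out_dir/events.txt"        2>&1 || true
  kubectl top pod "$pod"                      > "$out_dir/top.txt"           2>&1 || true

  echo "wrote diagnostics to $out_dir"
}

# Usage:
#   triage_pod myapp-5d4b6c8f9-xz2k4
```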
