Debug Workflow
Systematic debugging workflow for Kubernetes pods and resources.
When to Use
- Pod is failing, crashing, or in error state
- Application not behaving as expected in cluster
- Need to troubleshoot connectivity or performance issues
- Investigating resource problems
Debugging Steps
1. Check Pod Status
# Get all pods in current namespace
kubectl get pods
# Get pods in specific namespace
kubectl get pods -n <namespace>
# Get pods with more details
kubectl get pods -o wide
# Watch pod status changes
kubectl get pods -w
Common Pod States:
Pending: Waiting to be scheduledContainerCreating: Container image being pulled/createdRunning: Pod is runningCrashLoopBackOff: Container keeps crashingError: Container exited with errorImagePullBackOff: Cannot pull container imageEvicted: Pod evicted due to resource pressure
2. Describe Pod for Details
# Get detailed pod information
kubectl describe pod <pod-name>
# Focus on specific sections
kubectl describe pod <pod-name> | grep -A 10 Events
kubectl describe pod <pod-name> | grep -A 5 State
Look for:
- Events (errors, warnings, scheduling issues)
- Container states and restart counts
- Resource requests/limits vs actual usage
- Node assignment and conditions
- Volume mount issues
3. Check Logs
# View current logs
kubectl logs <pod-name>
# View logs from specific container in multi-container pod
kubectl logs <pod-name> -c <container-name>
# Follow logs in real-time
kubectl logs <pod-name> -f
# View previous container logs (after crash)
kubectl logs <pod-name> --previous
# Get last N lines
kubectl logs <pod-name> --tail=100
# Show timestamps
kubectl logs <pod-name> --timestamps
4. Check Events
# Get recent events sorted by time
kubectl get events --sort-by=.metadata.creationTimestamp
# Watch events in real-time
kubectl get events -w
# Get events for specific namespace
kubectl get events -n <namespace>
# Filter events by type
kubectl get events --field-selector type=Warning
5. Exec into Pod
# Get shell in pod
kubectl exec -it <pod-name> -- /bin/sh
kubectl exec -it <pod-name> -- /bin/bash
# Run specific command
kubectl exec <pod-name> -- ls /app
kubectl exec <pod-name> -- env
kubectl exec <pod-name> -- cat /etc/resolv.conf
# Exec into specific container
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh
Inside pod, check:
- Environment variables:
env - DNS resolution:
nslookup service-name - Network connectivity:
wget -O- http://service:port - File permissions:
ls -la /app - Disk usage:
df -h - Process list:
ps aux
6. Port Forward for Testing
# Forward local port to pod port
kubectl port-forward pod/<pod-name> 8080:8080
# Forward to service
kubectl port-forward service/<service-name> 8080:80
# Forward in background
kubectl port-forward pod/<pod-name> 8080:8080 &
# Test with curl
curl http://localhost:8080
7. Check Resource Usage
# Get pod resource usage (requires metrics-server)
kubectl top pod <pod-name>
# Get all pods resource usage
kubectl top pods
# Get node resource usage
kubectl top nodes
8. Check Related Resources
# Check deployment
kubectl get deployment <deployment-name>
kubectl describe deployment <deployment-name>
# Check replica set
kubectl get rs
kubectl describe rs <replicaset-name>
# Check service
kubectl get svc
kubectl describe svc <service-name>
# Check endpoints
kubectl get endpoints <service-name>
# Check configmaps and secrets
kubectl get configmap
kubectl get secret
kubectl describe configmap <name>
9. Network Debugging
# Check service endpoints
kubectl get endpoints
# Test DNS resolution from within pod
kubectl exec <pod-name> -- nslookup kubernetes.default
kubectl exec <pod-name> -- nslookup <service-name>
# Check network policies
kubectl get networkpolicies
kubectl describe networkpolicy <policy-name>
# Test connectivity between pods
kubectl exec <pod-name> -- wget -O- http://<service-name>:<port>
10. Check RBAC Permissions
# Check if service account can perform action
kubectl auth can-i get pods --as=system:serviceaccount:<namespace>:<sa-name>
# Get service account details
kubectl get serviceaccount <sa-name> -o yaml
# Check role bindings
kubectl get rolebindings
kubectl describe rolebinding <binding-name>
Common Issues and Solutions
ImagePullBackOff
Causes:
- Image doesn’t exist
- Image tag is wrong
- Private registry authentication failed
- Network issues pulling image
Debug:
kubectl describe pod <pod-name> | grep -A 5 "Failed"
kubectl get events | grep "Failed to pull"
Solutions:
- Verify image name and tag
- Check imagePullSecrets
- Test image pull manually:
docker pull <image>
CrashLoopBackOff
Causes:
- Application exits immediately
- Missing dependencies or configuration
- Resource limits too low
- Failed health checks
Debug:
kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>
Solutions:
- Check application logs for errors
- Verify environment variables and config
- Increase resource limits
- Check liveness probe configuration
Pending Pods
Causes:
- Insufficient node resources
- Node selector/affinity not matching
- PersistentVolumeClaim not bound
- Pod security policy blocking
Debug:
kubectl describe pod <pod-name> | grep -A 10 Events
kubectl get nodes
kubectl top nodes
Solutions:
- Scale cluster or reduce resource requests
- Check node labels and selectors
- Verify PVC status
- Review pod security policies
OOMKilled (Out of Memory)
Causes:
- Memory limit too low
- Memory leak in application
- Unexpected memory usage spike
Debug:
kubectl describe pod <pod-name> | grep -i "OOMKilled"
kubectl top pod <pod-name>
Solutions:
- Increase memory limits
- Investigate application memory usage
- Add memory profiling to application
Debugging Checklist
- Check pod status and events
- Review pod description for errors
- Examine current and previous logs
- Verify resource requests and limits
- Test network connectivity
- Validate environment variables and config
- Check RBAC permissions
- Verify image and pull secrets
- Review health probe configuration
- Check node resources and scheduling
Advanced Debugging
Enable Debug Container (Kubernetes 1.23+)
# Add ephemeral debug container to running pod
kubectl debug <pod-name> -it --image=busybox --target=<container-name>
# Create debug copy of pod
kubectl debug <pod-name> -it --copy-to=<debug-pod-name> --container=debug
Node Debugging
# SSH to node (if accessible)
kubectl get nodes -o wide
# Create privileged pod on specific node
kubectl debug node/<node-name> -it --image=ubuntu
Check API Server Logs
# Get API server logs (if running as pod)
kubectl logs -n kube-system kube-apiserver-<node-name>
# Get controller manager logs
kubectl logs -n kube-system kube-controller-manager-<node-name>
# Get scheduler logs
kubectl logs -n kube-system kube-scheduler-<node-name>
Monitoring Recommendations
Install metrics-server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Use kubectl plugins:
kubectl-debug: Enhanced debugging capabilitieskubectl-trace: Trace syscalls and kernel eventskubectl-df: Show disk usage in podskubectl-view-allocations: View resource allocations
Example Debugging Session
# 1. Find problematic pod
kubectl get pods | grep -v Running
# 2. Get details
kubectl describe pod myapp-5d4b6c8f9-xz2k4
# 3. Check logs
kubectl logs myapp-5d4b6c8f9-xz2k4 --previous
# 4. Check events
kubectl get events --field-selector involvedObject.name=myapp-5d4b6c8f9-xz2k4
# 5. Test connectivity
kubectl exec myapp-5d4b6c8f9-xz2k4 -- wget -O- http://backend-service:8080
# 6. Check resources
kubectl top pod myapp-5d4b6c8f9-xz2k4
# 7. If needed, exec into pod
kubectl exec -it myapp-5d4b6c8f9-xz2k4 -- /bin/sh