Practical Kubernetes Debugging Techniques
When managing Kubernetes clusters in production, troubleshooting skills become invaluable. As an SRE, I’ve spent countless hours debugging Kubernetes issues. This post shares practical techniques that have saved me time and helped resolve complex problems.
Understanding the Debugging Process
Effective debugging in Kubernetes follows a structured approach:
- Observe: Gather information about the problem
- Hypothesize: Form a theory about what might be happening
- Test: Perform actions to validate your hypothesis
- Act: Implement a fix based on your findings
Essential kubectl Commands
Here are some commands I use daily:
Checking Pod Status and Details
# Get detailed information about a pod
kubectl describe pod <pod-name> -n <namespace>
# Check pod logs
kubectl logs <pod-name> -n <namespace>
# Check previous container logs if the container has restarted
kubectl logs <pod-name> -n <namespace> --previous
# Stream logs live
kubectl logs -f <pod-name> -n <namespace>
Inspecting Resources
# Check events sorted by timestamp
kubectl get events --sort-by=.metadata.creationTimestamp -n <namespace>
# Check resource usage
kubectl top pods -n <namespace>
kubectl top nodes
Interactive Debugging
Sometimes you need to get inside a container to debug:
# Start a shell in a running container
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
# Run a temporary debugging pod
kubectl run debug-pod --rm -it --image=busybox -- /bin/sh
Common Issues and Solutions
Image Pull Errors
If pods are stuck in ImagePullBackOff
:
- Check image name and tag are correct
- Verify registry credentials with
kubectl get secret
- Test pulling the image manually
Resource Constraints
For pods in Pending
state:
# Check if nodes have enough resources
kubectl describe nodes | grep -A 5 "Allocated resources"
# Look for events showing resource issues
kubectl get events | grep -i "insufficient"
Network Connectivity Issues
# Debug network policies using a test pod
kubectl run network-test --rm -it --image=nicolaka/netshoot -- /bin/bash
Advanced Debugging Techniques
Using Ephemeral Debug Containers
Kubernetes 1.18+ supports ephemeral containers for debugging:
kubectl debug -it <pod-name> --image=busybox --target=<container-name>
Analyzing Cluster-wide Issues
# Check control plane components
kubectl get componentstatuses
# Check API server logs (if running on-prem)
kubectl logs -n kube-system kube-apiserver-<node-name>
Conclusion
Debugging Kubernetes issues effectively requires a combination of systematic approach and deep knowledge of how Kubernetes works. The techniques shared here should help you resolve common issues faster.
In future posts, I’ll cover more advanced debugging techniques including network packet capture, custom resource debugging, and operator troubleshooting.
What’s the most challenging Kubernetes issue you’ve had to debug? Share your experience in the comments.