Practical Kubernetes Debugging Techniques

Troubleshooting approaches for Kubernetes workloads

Practical Kubernetes Debugging Techniques

When managing Kubernetes clusters in production, troubleshooting skills become invaluable. As an SRE, I’ve spent countless hours debugging Kubernetes issues. This post shares practical techniques that have saved me time and helped resolve complex problems.

Understanding the Debugging Process

Effective debugging in Kubernetes follows a structured approach:

  1. Observe: Gather information about the problem
  2. Hypothesize: Form a theory about what might be happening
  3. Test: Perform actions to validate your hypothesis
  4. Act: Implement a fix based on your findings

Essential kubectl Commands

Here are some commands I use daily:

Checking Pod Status and Details

# Get detailed information about a pod
kubectl describe pod <pod-name> -n <namespace>

# Check pod logs
kubectl logs <pod-name> -n <namespace>

# Check previous container logs if the container has restarted
kubectl logs <pod-name> -n <namespace> --previous

# Stream logs live
kubectl logs -f <pod-name> -n <namespace>

Inspecting Resources

# Check events sorted by timestamp
kubectl get events --sort-by=.metadata.creationTimestamp -n <namespace>

# Check resource usage
kubectl top pods -n <namespace>
kubectl top nodes

Interactive Debugging

Sometimes you need to get inside a container to debug:

# Start a shell in a running container
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Run a temporary debugging pod
kubectl run debug-pod --rm -it --image=busybox -- /bin/sh

Common Issues and Solutions

Image Pull Errors

If pods are stuck in ImagePullBackOff:

  1. Check image name and tag are correct
  2. Verify registry credentials with kubectl get secret
  3. Test pulling the image manually

Resource Constraints

For pods in Pending state:

# Check if nodes have enough resources
kubectl describe nodes | grep -A 5 "Allocated resources"

# Look for events showing resource issues
kubectl get events | grep -i "insufficient"

Network Connectivity Issues

# Debug network policies using a test pod
kubectl run network-test --rm -it --image=nicolaka/netshoot -- /bin/bash

Advanced Debugging Techniques

Using Ephemeral Debug Containers

Kubernetes 1.18+ supports ephemeral containers for debugging:

kubectl debug -it <pod-name> --image=busybox --target=<container-name>

Analyzing Cluster-wide Issues

# Check control plane components
kubectl get componentstatuses

# Check API server logs (if running on-prem)
kubectl logs -n kube-system kube-apiserver-<node-name>

Conclusion

Debugging Kubernetes issues effectively requires a combination of systematic approach and deep knowledge of how Kubernetes works. The techniques shared here should help you resolve common issues faster.

In future posts, I’ll cover more advanced debugging techniques including network packet capture, custom resource debugging, and operator troubleshooting.

What’s the most challenging Kubernetes issue you’ve had to debug? Share your experience in the comments.