Troubleshooting Kubernetes Services and Networking
Introduction
This tutorial walks through troubleshooting common networking and service-related problems in Kubernetes, focusing on DNS resolution, service discovery, and connectivity issues between pods. It assumes a basic understanding of Kubernetes concepts like Pods, Services, and Deployments, and that you have kubectl configured to interact with a cluster.
Prerequisites
- A running Kubernetes cluster (e.g., Minikube, Kind, or a cloud provider cluster).
- `kubectl` configured to interact with the cluster.
Task 1: Verifying Basic Pod Connectivity
Before diving into service-related issues, ensure that your pods can communicate with each other directly using their IP addresses.
```mermaid
sequenceDiagram
    participant Admin as Admin/CLI
    participant K8s as Kubernetes Cluster
    participant BB_A as busybox-a Pod
    participant BB_B as busybox-b Pod
    Note over Admin, K8s: Step 1 & 2: Deploy and Apply
    Admin->>K8s: kubectl apply (two busybox pods)
    activate K8s
    K8s-->>Admin: pods created
    deactivate K8s
    Note over Admin, K8s: Step 3: Get Pod IPs
    Admin->>K8s: kubectl get pods -l app=busybox -o wide
    activate K8s
    K8s-->>Admin: busybox-a IP (10.244.0.5), busybox-b IP (10.244.0.6)
    deactivate K8s
    Note over Admin, BB_B: Step 4: Verify Connectivity (Execute Ping)
    Admin->>BB_A: kubectl exec (ping 10.244.0.6)
    activate BB_A
    BB_A->>BB_B: PING (ICMP request)
    activate BB_B
    BB_B-->>BB_A: PONG (ICMP response)
    deactivate BB_B
    BB_A-->>Admin: PING statistics (0% loss)
    deactivate BB_A
    Note right of BB_B: (Optional, if failure)
    Admin-xBB_A: If ping fails, check CNI or network policies.
```
- Deploy two simple busybox pods:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-a
  labels:
    app: busybox
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command: ['sh', '-c', 'while true; do sleep 3600; done']
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox-b
  labels:
    app: busybox
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command: ['sh', '-c', 'while true; do sleep 3600; done']
```
- Apply the manifest:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox-a
  labels:
    app: busybox
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command: ['sh', '-c', 'while true; do sleep 3600; done']
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox-b
  labels:
    app: busybox
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command: ['sh', '-c', 'while true; do sleep 3600; done']
EOF
```
- Get the IP addresses of the pods:

```bash
kubectl get pods -l app=busybox -o wide
```

```
NAME        READY   STATUS    RESTARTS   AGE   IP           NODE     NOMINATED NODE   READINESS GATES
busybox-a   1/1     Running   0          1m    10.244.0.5   node-1   <none>           <none>
busybox-b   1/1     Running   0          1m    10.244.0.6   node-2   <none>           <none>
```
- Exec into `busybox-a` and try to ping `busybox-b`'s IP address:

```bash
kubectl exec -it busybox-a -- ping -c 3 10.244.0.6
```

If you cannot ping the other pod, this indicates a fundamental networking issue within your cluster, potentially related to the CNI plugin or network policies. Investigate your cluster's network configuration.

Expected successful output:

```
PING 10.244.0.6 (10.244.0.6): 56 data bytes
64 bytes from 10.244.0.6: seq=0 ttl=63 time=0.079 ms
64 bytes from 10.244.0.6: seq=1 ttl=63 time=0.062 ms
64 bytes from 10.244.0.6: seq=2 ttl=63 time=0.059 ms

--- 10.244.0.6 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.059/0.066/0.079 ms
```
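If pings between pods fail, a quick first check is whether the CNI plugin's own pods are healthy. The commands below are a sketch: the pod names depend on which CNI you run (for example `calico-node-*`, `kube-flannel-*`, or `cilium-*`, usually in the `kube-system` namespace), and `<cni-pod-name>` is a placeholder for whichever pod looks unhealthy.

```shell
# List everything in kube-system and look for CNI pods in a non-Running state.
kubectl get pods -n kube-system -o wide

# Describe a failing CNI pod to see recent events (image pull errors, crash loops).
# <cni-pod-name> is a placeholder for the pod identified above.
kubectl describe pod -n kube-system <cni-pod-name>
```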
Task 2: Troubleshooting Service Discovery (DNS)
Kubernetes uses DNS for service discovery. If your pods cannot resolve service names, they cannot connect to other services in the cluster.
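Short names like `nginx-service` resolve because the pod's resolver expands them using the `search` domains the kubelet writes into the pod's `/etc/resolv.conf`. A minimal sketch of that expansion, assuming a pod in the `default` namespace of a cluster with the conventional `cluster.local` domain:

```shell
# Simulate resolver "search" expansion for a short service name.
# Assumption: the pod's /etc/resolv.conf carries the typical search list
# for the "default" namespace in a cluster.local cluster.
short_name="nginx-service"
search="default.svc.cluster.local svc.cluster.local cluster.local"

# Print the candidate FQDNs the resolver tries, in order.
for domain in $search; do
  echo "${short_name}.${domain}"
done
```

The first candidate, `nginx-service.default.svc.cluster.local`, is the fully qualified name you will see in successful `nslookup` output.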
```mermaid
sequenceDiagram
    autonumber
    participant Admin as Admin/CLI
    participant BB as busybox-a Pod
    participant DNS as CoreDNS / kube-dns
    participant K8s as K8s API / Service
    Note over Admin, K8s: Step 1 & 2: Infrastructure Setup
    Admin->>K8s: kubectl apply (Deployment & Service)
    K8s-->>Admin: nginx-deployment & nginx-service created
    Note over Admin, K8s: Step 3: DNS Resolution Test
    Admin->>BB: kubectl exec (nslookup nginx-service)
    activate BB
    BB->>DNS: DNS query: nginx-service.default.svc.cluster.local
    activate DNS
    alt DNS Working
        DNS-->>BB: Success: IP 10.100.186.153
        BB-->>Admin: Output: Name & Address
    else DNS Failing
        DNS-->>BB: NXDOMAIN / Timeout
        BB-->>Admin: Error: nslookup failed
        Note right of Admin: Trigger troubleshooting
    end
    deactivate DNS
    deactivate BB
    Note over Admin, K8s: Step 4: Diagnostics (If Failed)
    Admin->>K8s: kubectl get pods -n kube-system -l k8s-app=kube-dns
    K8s-->>Admin: DNS pod status (Running/Pending/CrashLoop)
    Admin->>K8s: kubectl logs -n kube-system (DNS pod)
    K8s-->>Admin: DNS error logs (if any)
```
- Deploy a simple service and deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
```
- Apply the manifest:

```bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
EOF
```
- Exec into `busybox-a` and try to resolve the service name:

```bash
kubectl exec -it busybox-a -- nslookup nginx-service
```

If `nslookup` fails to resolve the service name, it indicates a DNS configuration issue. Ensure the kube-dns or CoreDNS pods are running correctly in the `kube-system` namespace, and check their logs for errors.

Expected successful output:

```
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      nginx-service.default.svc.cluster.local
Address 1: 10.100.186.153
```
- If DNS resolution fails, check the status of the DNS pods:

```bash
kubectl get pods -n kube-system -l k8s-app=kube-dns
```

```bash
kubectl logs -n kube-system <kube-dns-pod-name> -c kubedns
```

(On clusters running CoreDNS, the pods carry the same `k8s-app=kube-dns` label, but omit the `-c kubedns` flag since the pod has a single container.)
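Most current clusters run CoreDNS rather than kube-dns. If the DNS pods are running but resolution still fails, inspecting CoreDNS's configuration can help. This sketch assumes the ConfigMap is named `coredns`, which is the kubeadm-style default; your distribution may differ.

```shell
# Show which pods serve cluster DNS (CoreDNS also carries the k8s-app=kube-dns label).
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Print the Corefile to verify the cluster domain and upstream forwarding.
# Assumption: the ConfigMap is named "coredns" (kubeadm default).
kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
```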
Task 3: Troubleshooting Service Connectivity
Even if DNS resolution works, pods might still fail to connect to a service due to network policies or misconfigured selectors.
```mermaid
sequenceDiagram
    autonumber
    participant Admin as Admin/CLI
    participant BB as busybox-a Pod
    participant SVC as nginx-service (ClusterIP)
    participant Pod as nginx Pod (Endpoint)
    Note over Admin, Pod: Step 1: Connectivity Test
    Admin->>BB: kubectl exec (wget nginx-service)
    activate BB
    BB->>SVC: HTTP Request (Port 80)
    alt Connection Success
        SVC->>Pod: Forward to Endpoint IP
        activate Pod
        Pod-->>SVC: HTTP 200 OK (HTML)
        deactivate Pod
        SVC-->>BB: Return HTML Content
        BB-->>Admin: Output: "Welcome to nginx!"
    else Connection Fails / Hangs
        Note right of SVC: Potential block or missing endpoint
        SVC--xBB: Connection Timeout / Refused
        BB-->>Admin: Error: wget failed
        Note over Admin, Pod: Step 2 & 3: Troubleshooting Logic
        rect rgba(46, 70, 255, 0.1)
            Note right of Admin: Check network policies
            Admin->>Admin: kubectl get networkpolicy
            Note right of Admin: Check selectors & endpoints
            Admin->>Admin: kubectl describe service nginx-service
            Admin->>Admin: kubectl get pods -l app=nginx
        end
    end
    deactivate BB
```
- Exec into `busybox-a` and try to connect to the nginx service using `wget`:

```bash
kubectl exec -it busybox-a -- wget -O- nginx-service
```

If the `wget` command hangs or fails, there is a connectivity issue between the pod and the service. Check whether network policies are blocking traffic to the service's pods, and verify that the service's selector matches the labels on the target pods.

Expected successful output (showing the default nginx page):

```
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
- If the connection fails, check network policies:

```bash
kubectl get networkpolicy
```

If any network policies are present, examine them to ensure they are not blocking traffic from `busybox-a` to the `nginx-service`.
- Verify the service's selector:

```bash
kubectl describe service nginx-service
```

Check the `Selector` field. Then verify that the pods targeted by the service have the corresponding labels:

```bash
kubectl get pods -l app=nginx -o yaml
```

Ensure the labels in the `metadata.labels` section of the pod definition match the service's selector.
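If a network policy turns out to be the blocker, the fix is usually an explicit allow rule. The manifest below is a hypothetical example (it is not part of this tutorial's setup) that admits traffic from the busybox pods to the nginx pods on port 80:

```yaml
# Hypothetical policy: allow app=busybox pods to reach app=nginx pods on TCP/80.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-busybox-to-nginx
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: busybox
    ports:
    - protocol: TCP
      port: 80
```

Also check the service's Endpoints object with `kubectl get endpoints nginx-service`: if it lists no addresses, the selector matches no ready pods, and no policy change will help until the selector or labels are fixed.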
Task 4: Inspecting Logs
Logs are your best friend when troubleshooting.
```mermaid
sequenceDiagram
    autonumber
    participant Admin as Admin/CLI
    participant K8s as K8s API Server
    participant Pod as nginx Pod
    participant Proxy as kube-proxy Pod (Node Level)
    Note over Admin, Pod: Level 1: Application Diagnostics
    Admin->>K8s: kubectl get pods (identify the pod name)
    K8s-->>Admin: nginx-deployment-76bf4969df-xxxxx
    Admin->>K8s: kubectl logs nginx-deployment-76bf4969df-xxxxx
    K8s->>Pod: Fetch Container Logs (stdout/stderr)
    Pod-->>K8s: Log Stream
    K8s-->>Admin: Display: 200/404/500 Errors
    Note over Admin, Proxy: Level 2: Infrastructure Diagnostics
    Admin->>K8s: kubectl get pods -n kube-system -l k8s-app=kube-proxy
    K8s-->>Admin: List of kube-proxy pods
    Admin->>K8s: kubectl logs -n kube-system (kube-proxy pod)
    K8s->>Proxy: Fetch Proxy/IPtables Logs
    Proxy-->>K8s: Log Stream (service rules, endpoint sync)
    K8s-->>Admin: Display: IPtables/IPVS updates or sync errors
```
- Check the logs of the nginx pod:

```bash
kubectl logs nginx-deployment-<hash>
```

Replace `<hash>` with the actual hash in your deployment's pod name. Look for errors or unusual activity.
- Check the logs of the kube-proxy pods:

```bash
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system <kube-proxy-pod-name>
```

kube-proxy is responsible for implementing service proxying rules on each node. Its logs may contain information about service endpoint configuration or errors.
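kube-proxy programs Services using either iptables or IPVS, and knowing which mode is active tells you what to inspect next. As a sketch, kube-proxy typically logs its mode at startup (for example, a line mentioning the iptables proxier), though the exact wording varies by version, so treat the grep pattern below as an assumption:

```shell
# Look for the proxy mode kube-proxy reported at startup.
# Assumption: the startup log line mentions "proxier" (wording varies by version).
kubectl logs -n kube-system <kube-proxy-pod-name> | grep -i proxier
```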
Task 5: Port Forwarding for Local Access
Sometimes, directly accessing a service from your local machine can help isolate issues.
```mermaid
sequenceDiagram
    autonumber
    participant User as Browser (Localhost:8080)
    participant CLI as kubectl port-forward
    participant K8s as K8s API Server (Tunnel)
    participant SVC as nginx-service
    participant Pod as nginx Pod
    Note over User, CLI: Step 1: Establish Tunnel
    User->>CLI: kubectl port-forward service/nginx-service 8080:80
    CLI->>K8s: Open SPDY/HTTP2 Stream
    K8s-->>CLI: Tunnel Established
    Note over User, Pod: Step 2: Verification
    User->>CLI: GET http://localhost:8080
    CLI->>K8s: Encapsulate Data
    K8s->>SVC: Forward to Service Port 80
    SVC->>Pod: Route to TargetPort 80
    activate Pod
    Pod-->>SVC: HTTP 200 OK (HTML)
    deactivate Pod
    SVC-->>K8s: Return Data
    K8s-->>CLI: Decapsulate Data
    CLI-->>User: Render "Welcome to nginx!"
    Note over User, Pod: Diagnostic Conclusion
    rect rgba(46, 70, 255, 0.1)
        Note right of User: If the browser works, internal cluster networking (CNI/policies) is the likely culprit.
    end
```
- Use `kubectl port-forward` to forward a local port to the nginx service:

```bash
kubectl port-forward service/nginx-service 8080:80
```
- Open your web browser and navigate to `http://localhost:8080`. If you can access the nginx default page, it confirms that the service itself is working correctly, and the problem likely lies in the connectivity between pods within the cluster.
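If you prefer the terminal over a browser, you can verify the forwarded port with `curl` from a second shell while the `kubectl port-forward` command keeps running in the first:

```shell
# Run in a second terminal while the port-forward is active.
# A healthy service returns the page containing "Welcome to nginx!".
curl -s http://localhost:8080
```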
Task 6: Using kubectl debug (Ephemeral Containers)
Kubernetes 1.18 introduced kubectl debug (ephemeral containers became generally available in 1.25), which allows you to easily add an ephemeral container to a running pod for troubleshooting. This is useful when you need to run additional tools within the pod's network namespace without modifying the pod's original spec.
```mermaid
sequenceDiagram
    autonumber
    participant Admin as Admin/CLI (Terminal)
    participant K8s as K8s API Server (Control Plane)
    participant Pod as busybox-a Pod (Target)
    participant Debug as ephemeral-container (Injected)
    participant DNS as CoreDNS
    participant SVC as nginx-service
    Note over Admin, Debug: Step 1: Establish Debug Session
    Admin->>K8s: kubectl debug pod/busybox-a -i --image=busybox:1.28 --target=busybox
    activate K8s
    K8s->>Pod: Inject Ephemeral Container Spec
    K8s->>Pod: Attach Terminal Stream (SPDY/HTTP2)
    Note right of Pod: Debug container joins the Pod's network namespace.
    activate Debug
    Pod->>Debug: Share Network Interface (eth0)
    Debug-->>K8s: Container Running & Attached
    K8s-->>Admin: Connected (Interactive Shell)
    deactivate K8s
    Note over Admin, Debug: Step 2: Diagnostic Commands
    rect rgba(46, 70, 255, 0.1)
        Admin->>Debug: Execute nslookup nginx-service
        Debug->>DNS: Query CoreDNS via Pod's interface
        DNS-->>Debug: Success (10.100.186.153)
        Debug-->>Admin: Display DNS Info
    end
    rect rgba(46, 70, 255, 0.1)
        Admin->>Debug: Execute wget -O- nginx-service
        Debug->>SVC: Fetch HTML via Pod's interface
        SVC-->>Debug: HTTP 200 OK
        Debug-->>Admin: Display HTML Output
    end
    deactivate Debug
```
- Debug the `busybox-a` pod using `kubectl debug`:

```bash
kubectl debug pod/busybox-a -i --image=busybox:1.28 --target=busybox
```

This command injects a new ephemeral container based on the `busybox:1.28` image into the `busybox-a` pod and attaches your terminal to it. The `--target=busybox` flag targets the process namespace of the existing `busybox` container; because all containers in a pod share the same network namespace, the debug container automatically sees the pod's network environment.
- Inside the debug session, run troubleshooting commands:

```bash
nslookup nginx-service
wget -O- nginx-service
```

These run directly within the pod's network namespace, providing a more accurate view of the network environment the application actually sees.
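Two more checks are often useful inside the debug session: confirm the resolver configuration the pod actually received, and bypass DNS by targeting a pod IP directly. Here `<nginx-pod-ip>` is a placeholder; substitute the IP reported by `kubectl get pods -l app=nginx -o wide`.

```shell
# Inspect the resolver config the kubelet injected into the pod.
cat /etc/resolv.conf

# Bypass DNS entirely: if this succeeds while "wget -O- nginx-service" fails,
# the problem is name resolution, not routing. (<nginx-pod-ip> is a placeholder.)
wget -O- <nginx-pod-ip>
```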
Conclusion
This tutorial covered several techniques for troubleshooting Kubernetes services and networking. Key takeaways include verifying basic pod connectivity, troubleshooting DNS resolution, checking service selectors, and inspecting logs. Remember that a systematic approach is essential when diagnosing network issues in Kubernetes. Start with the basics and work your way up to more complex scenarios.