
Troubleshooting Kubernetes Services and Networking

Introduction

This tutorial guides you through troubleshooting common networking and service-related problems in Kubernetes. It assumes a basic understanding of Kubernetes concepts such as Pods, Services, and Deployments, and that you have kubectl configured to interact with a cluster. The focus is on common problems related to DNS resolution, service discovery, and connectivity between pods.

Prerequisites

  • A running Kubernetes cluster (e.g., Minikube, Kind, or a cloud provider cluster).
  • kubectl configured to interact with the cluster.

Task 1: Verifying Basic Pod Connectivity

Before diving into service-related issues, ensure that your pods can communicate with each other directly using their IP addresses.

sequenceDiagram
    participant Admin as Admin/CLI
    participant K8s as Kubernetes Cluster
    participant BB_A as busybox-a Pod
    participant BB_B as busybox-b Pod

    Note over Admin, K8s: Step 1 & 2: Deploy and Apply
    Admin->>K8s: kubectl apply (two busybox pods)
    activate K8s
    K8s-->>Admin: pods created
    deactivate K8s

    Note over Admin, K8s: Step 3: Get Pod IPs
    Admin->>K8s: kubectl get pods -l app=busybox -o wide
    activate K8s
    K8s-->>Admin: busybox-a IP (10.244.0.5), busybox-b IP (10.244.0.6)
    deactivate K8s

    Note over Admin, BB_B: Step 4: Verify Connectivity (Execute Ping)
    Admin->>BB_A: kubectl exec (ping 10.244.0.6)
    activate BB_A
    BB_A->>BB_B: PING (ICMP request)
    activate BB_B
    BB_B-->>BB_A: PONG (ICMP response)
    deactivate BB_B
    BB_A-->>Admin: PING Statistics (0% loss)
    deactivate BB_A

    Note right of BB_B: (Optional, if failure)
    Admin-xBB_A: If ping fails, check CNI or network policies.
  1. Deploy two simple busybox pods:

    NODE_TYPE // yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox-a
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox:1.28
        command: ['sh', '-c', 'while true; do sleep 3600; done']
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: busybox-b
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: busybox:1.28
        command: ['sh', '-c', 'while true; do sleep 3600; done']
  2. Save the manifest to a file (e.g., busybox-pods.yaml) and apply it:

    NODE_TYPE // bash
    kubectl apply -f busybox-pods.yaml
  3. Get the IP addresses of the pods:

    NODE_TYPE // bash
    kubectl get pods -l app=busybox -o wide
    NODE_TYPE // output
    NAME        READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
    busybox-a   1/1     Running   0          1m    10.244.0.5   node-1           <none>           <none>
    busybox-b   1/1     Running   0          1m    10.244.0.6   node-2           <none>           <none>
  4. Exec into busybox-a and try to ping busybox-b’s IP address:

    NODE_TYPE // bash
    kubectl exec -it busybox-a -- ping -c 3 10.244.0.6

    If you cannot ping the other pod, it indicates a fundamental networking issue within your cluster, potentially related to the CNI plugin or network policies. Investigate your cluster’s network configuration.

    Expected successful output:

    NODE_TYPE // output
    PING 10.244.0.6 (10.244.0.6): 56 data bytes
    64 bytes from 10.244.0.6: seq=0 ttl=63 time=0.079 ms
    64 bytes from 10.244.0.6: seq=1 ttl=63 time=0.062 ms
    64 bytes from 10.244.0.6: seq=2 ttl=63 time=0.059 ms
    
    --- 10.244.0.6 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0.059/0.066/0.079 ms
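
The pod IPs above were read manually from the table; in scripts it helps to extract them programmatically. A minimal sketch, parsing the captured sample output with awk (column layout as shown above); on a live cluster you could instead use kubectl get pod busybox-b -o jsonpath='{.status.podIP}':

```shell
# Extract busybox-b's IP from captured `kubectl get pods -o wide` output.
# Live alternative: kubectl get pod busybox-b -o jsonpath='{.status.podIP}'
sample='NAME        READY   STATUS    RESTARTS   AGE   IP           NODE
busybox-a   1/1     Running   0          1m    10.244.0.5   node-1
busybox-b   1/1     Running   0          1m    10.244.0.6   node-2'

ip=$(printf '%s\n' "$sample" | awk '$1 == "busybox-b" { print $6 }')
echo "$ip"   # 10.244.0.6
```

The jsonpath form is preferable in automation because it does not depend on column positions.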

Task 2: Troubleshooting Service Discovery (DNS)

Kubernetes uses DNS for service discovery. If your pods cannot resolve service names, they cannot connect to other services in the cluster.
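
Short names like nginx-service resolve via the pod's DNS search path; the fully qualified name follows a fixed pattern. A quick sketch of how the FQDN is assembled (cluster.local is the default cluster domain and may differ on your cluster):

```shell
# In-cluster DNS name for a Service: <service>.<namespace>.svc.<cluster-domain>
service=nginx-service
namespace=default
cluster_domain=cluster.local   # default; clusters can override this

fqdn="${service}.${namespace}.svc.${cluster_domain}"
echo "$fqdn"   # nginx-service.default.svc.cluster.local
```

Trying the fully qualified name directly is a useful test: if the FQDN resolves but the short name does not, the problem is in the pod's search path rather than in the DNS server.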

sequenceDiagram
    autonumber
    participant Admin as Admin/CLI
    participant BB as busybox-a Pod
    participant DNS as CoreDNS / kube-dns
    participant K8s as K8s API / Service

    Note over Admin, K8s: Step 1 & 2: Infrastructure Setup
    Admin->>K8s: kubectl apply (Deployment & Service)
    K8s-->>Admin: nginx-deployment & nginx-service created

    Note over Admin, K8s: Step 3: DNS Resolution Test
    Admin->>BB: kubectl exec (nslookup nginx-service)
    activate BB
    BB->>DNS: DNS Query: nginx-service.default.svc.cluster.local
    activate DNS
    
    alt DNS Working
        DNS-->>BB: Success: IP 10.100.186.153
        BB-->>Admin: Output: Name & Address
    else DNS Failing
        DNS-->>BB: NXDOMAIN / Timeout
        deactivate DNS
        BB-->>Admin: Error: nslookup failed
        Note right of Admin: TRIGGER TROUBLESHOOTING
    end
    deactivate BB

    Note over Admin, K8s: Step 4: Diagnostic (If Failed)
    Admin->>K8s: kubectl get pods -n kube-system -l k8s-app=kube-dns
    K8s-->>Admin: DNS Pod Status (Running/Pending/CrashLoop)
    Admin->>K8s: kubectl logs -n kube-system [dns-pod-name]
    K8s-->>Admin: DNS Error Logs (if any)
  1. Deploy a simple service and deployment:

    NODE_TYPE // yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:latest
            ports:
            - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-service
    spec:
      selector:
        app: nginx
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80
  2. Save the manifest to a file (e.g., nginx.yaml) and apply it:

    NODE_TYPE // bash
    kubectl apply -f nginx.yaml
  3. Exec into busybox-a and try to resolve the service name:

    NODE_TYPE // bash
    kubectl exec -it busybox-a -- nslookup nginx-service

    If nslookup fails to resolve the service name, it indicates a DNS configuration issue. Ensure the kube-dns or CoreDNS pods are running correctly in the kube-system namespace. Check their logs for errors.

    Expected successful output:

    NODE_TYPE // output
    Server:    10.96.0.10
    Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
    
    Name:      nginx-service.default.svc.cluster.local
    Address 1: 10.100.186.153
  4. If DNS resolution fails, check the status of the DNS pods:

    NODE_TYPE // bash
    kubectl get pods -n kube-system -l k8s-app=kube-dns
    NODE_TYPE // bash
    kubectl logs -n kube-system <dns-pod-name>

    On clusters running CoreDNS (the default since Kubernetes 1.13), the DNS pods still carry the k8s-app=kube-dns label; if the pod has more than one container, select it with -c coredns (or -c kubedns on legacy kube-dns deployments).
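
If the DNS pods look healthy, also check which resolver the client pod is actually using with kubectl exec -it busybox-a -- cat /etc/resolv.conf. A typical pod resolv.conf looks like the following (the nameserver is the kube-dns Service IP, matching the nslookup output above; cluster.local assumes the default cluster domain):

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

The search list is what lets the short name nginx-service resolve; ndots:5 means unqualified names are tried against the search domains first. A missing or wrong nameserver entry points at kubelet DNS configuration rather than at CoreDNS itself.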

Task 3: Troubleshooting Service Connectivity

Even if DNS resolution works, pods might still fail to connect to a service due to network policies or misconfigured selectors.

sequenceDiagram
    autonumber
    participant Admin as Admin/CLI
    participant BB as busybox-a Pod
    participant SVC as nginx-service (ClusterIP)
    participant Pod as nginx Pod (Endpoint)

    Note over Admin, Pod: Step 1: Connectivity Test
    Admin->>BB: kubectl exec (wget nginx-service)
    activate BB
    BB->>SVC: HTTP Request (Port 80)
    
    alt Connection Success
        SVC->>Pod: Forward to Endpoint IP
        activate Pod
        Pod-->>SVC: HTTP 200 OK (HTML)
        deactivate Pod
        SVC-->>BB: Return HTML Content
        BB-->>Admin: Output: "Welcome to nginx!"
    else Connection Fails / Hangs
        Note right of SVC: Potential Block or Missing Endpoint
        SVC--xBB: Connection Timeout / Refused
        BB-->>Admin: Error: wget failed
        deactivate BB
        
        Note over Admin, Pod: Step 2 & 3: Troubleshooting Logic
        
        rect rgba(46, 70, 255, 0.1)
            Note right of Admin: Check Network Policies
            Admin->>Admin: kubectl get networkpolicy
            Note right of Admin: Check Selectors & Endpoints
            Admin->>Admin: kubectl describe service nginx-service
            Admin->>Admin: kubectl get pods -l app=nginx
        end
    end
  1. Exec into busybox-a and try to connect to the nginx service using wget:

    NODE_TYPE // bash
    kubectl exec -it busybox-a -- wget -O- nginx-service

    If the wget command hangs or fails, it indicates a connectivity issue to the service. Check if network policies are blocking traffic to the service’s pods. Also, verify that the service’s selector matches the labels on the target pods.

    Expected successful output (showing the default nginx page):

    NODE_TYPE // output
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
        body {
            width: 35em;
            margin: 0 auto;
            font-family: Tahoma, Verdana, Arial, sans-serif;
        }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
  2. If the connection fails, check network policies:

    NODE_TYPE // bash
    kubectl get networkpolicy

    If any network policies are present, examine them to ensure they are not blocking traffic from busybox-a to the nginx-service.

  3. Verify the service’s selector:

    NODE_TYPE // bash
    kubectl describe service nginx-service

    Check the Selector field. Then, verify that the pods targeted by the service have the corresponding labels:

    NODE_TYPE // bash
    kubectl get pods -l app=nginx -o yaml

    Ensure the labels in the metadata.labels section of the pod definition match the service’s selector.
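
A selector mismatch shows up most directly in the service’s Endpoints: if no ready pods match, the list is empty. You can check this live with kubectl get endpoints nginx-service. As a sketch, here is the same check applied to a captured kubectl describe service sample (the endpoint IP is illustrative):

```shell
# An empty Endpoints line means the selector matches no ready pods.
# Live check: kubectl describe service nginx-service | grep '^Endpoints:'
sample='Name:              nginx-service
Selector:          app=nginx
Type:              ClusterIP
Endpoints:         10.244.0.7:80'

endpoints=$(printf '%s\n' "$sample" | grep '^Endpoints:' | sed 's/^Endpoints: *//')
if [ -n "$endpoints" ]; then
  echo "service has endpoints: $endpoints"
else
  echo "no endpoints: selector matches no ready pods"
fi
```

An empty Endpoints list with a Running pod usually means either a label/selector mismatch or a failing readiness probe.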

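If a policy is blocking traffic, an explicit allow rule can restore connectivity. A hedged sketch of a NetworkPolicy that admits ingress to the nginx pods from the busybox pods on port 80 (the policy name is hypothetical; adjust labels and namespace to your cluster):

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-busybox-to-nginx   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: nginx          # targets the nginx pods
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: busybox    # allow traffic from the busybox pods
    ports:
    - protocol: TCP
      port: 80
```

Note that NetworkPolicies are additive: if no policy selects a pod, all traffic to it is allowed, but as soon as any policy selects it, only explicitly allowed traffic gets through.
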
Task 4: Inspecting Logs

Logs are your best friend when troubleshooting.

sequenceDiagram
    autonumber
    participant Admin as Admin/CLI
    participant K8s as K8s API Server
    participant Pod as nginx Pod
    participant Proxy as kube-proxy Pod (Node Level)

    Note over Admin, Pod: Level 1: Application Diagnostics
    Admin->>K8s: kubectl get pods (identify pod name)
    K8s-->>Admin: nginx-deployment-76bf4969df-xxxxx
    Admin->>K8s: kubectl logs nginx-deployment-76bf4969df-xxxxx
    K8s->>Pod: Fetch Container Logs (stdout/stderr)
    Pod-->>K8s: Log Stream
    K8s-->>Admin: Display: 200/404/500 Errors
    
    Note over Admin, Proxy: Level 2: Infrastructure Diagnostics
    Admin->>K8s: kubectl get pods -n kube-system -l k8s-app=kube-proxy
    K8s-->>Admin: list of kube-proxy pods
    Admin->>K8s: kubectl logs -n kube-system [kube-proxy-pod-name]
    K8s->>Proxy: Fetch Proxy/IPtables Logs
    Proxy-->>K8s: Log Stream (Service rules, endpoint sync)
    K8s-->>Admin: Display: IPtables/IPVS updates or sync errors
  1. Check the logs of the nginx pod:

    NODE_TYPE // bash
    kubectl logs deployment/nginx-deployment

    kubectl logs deployment/... selects one pod from the Deployment for you. To target a specific pod, use the full pod name from kubectl get pods (e.g., kubectl logs nginx-deployment-76bf4969df-xxxxx). Look for errors or unusual activity.

  2. Check the logs of the kube-proxy pods:

    NODE_TYPE // bash
    kubectl get pods -n kube-system -l k8s-app=kube-proxy
    kubectl logs -n kube-system <kube-proxy-pod-name>

    kube-proxy is responsible for implementing service proxying rules. Its logs may contain information about service endpoint configuration or errors.
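
Kubernetes components log in the klog format, where each line starts with a severity letter (I, W, E, F), so error lines can be filtered mechanically. A minimal sketch over a captured sample (the log messages are illustrative); live, you would pipe kubectl logs -n kube-system <kube-proxy-pod-name> into the same filter:

```shell
# klog lines start with a severity letter: I(nfo), W(arning), E(rror), F(atal).
# Live: kubectl logs -n kube-system <kube-proxy-pod-name> | grep -c '^E'
sample='I0101 12:00:00.000001       1 proxier.go:800] "Syncing iptables rules"
E0101 12:00:01.000002       1 proxier.go:850] "Failed to execute iptables-restore" err="exit status 1"
I0101 12:00:02.000003       1 proxier.go:800] "Syncing iptables rules"'

errors=$(printf '%s\n' "$sample" | grep -c '^E')
echo "klog error lines: $errors"   # klog error lines: 1
```

A nonzero error count in kube-proxy logs (e.g., failed iptables-restore runs) is a strong hint that service routing rules are not being synced.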

Task 5: Port Forwarding for Local Access

Sometimes, directly accessing a service from your local machine can help isolate issues.

sequenceDiagram
    autonumber
    participant User as Browser (Localhost:8080)
    participant CLI as kubectl port-forward
    participant K8s as K8s API Server (Tunnel)
    participant SVC as nginx-service
    participant Pod as nginx Pod

    Note over User, CLI: Step 1: Establish Tunnel
    User->>CLI: kubectl port-forward service/nginx-service 8080:80
    CLI->>K8s: Open SPDY/HTTP2 Stream
    K8s-->>CLI: Tunnel Established

    Note over User, Pod: Step 2: Verification
    User->>CLI: GET http://localhost:8080
    CLI->>K8s: Encapsulate Data
    K8s->>SVC: Forward to Service Port 80
    SVC->>Pod: Route to TargetPort 80
    activate Pod
    Pod-->>SVC: HTTP 200 OK (HTML)
    deactivate Pod
    SVC-->>K8s: Return Data
    K8s-->>CLI: Decapsulate Data
    CLI-->>User: Render "Welcome to nginx!"

    Note over User, CLI: Diagnostic Conclusion
    rect rgba(46, 70, 255, 0.1)
        Note right of User: If the browser works, internal cluster networking (CNI/policies) is the culprit.
    end
  1. Use kubectl port-forward to forward a local port to the nginx service:

    NODE_TYPE // bash
    kubectl port-forward service/nginx-service 8080:80
  2. Open your web browser and navigate to http://localhost:8080. If you can see the nginx default page, the pod behind the service is serving traffic correctly. Because port-forward tunnels through the API server and kubelet rather than the cluster network, a working page here combined with failing in-cluster requests points to cluster-internal networking (CNI, kube-proxy rules, or network policies).

Task 6: Using kubectl debug (Ephemeral Containers)

Kubernetes introduced kubectl debug in v1.18 (ephemeral containers became generally available in v1.25). It lets you add an ephemeral container to a running pod for troubleshooting, which is useful when you need to run additional tools within the pod’s network namespace without modifying the pod’s original spec.

sequenceDiagram
    autonumber
    participant Admin as Admin/CLI (Terminal)
    participant K8s as K8s API Server (Control Plane)
    participant Pod as busybox-a Pod (Target)
    participant Debug as ephemeral-container (Injected)
    participant DNS as CoreDNS
    participant SVC as nginx-service

    Note over Admin, Debug: Step 1: Establish Debug Session
    Admin->>K8s: kubectl debug -it pod/busybox-a --image=busybox:1.28 --target=busybox
    activate K8s
    K8s->>Pod: Inject Ephemeral Container Spec
    K8s->>Pod: Attach Terminal Stream (SPDY/HTTP2)
    Note right of Pod: Debug container joins Pod's Network Namespace.
    activate Debug
    Pod->>Debug: Share Network Interface (eth0)
    Debug-->>K8s: Container Running & Attached
    K8s-->>Admin: Connected (Interactive Shell)
    deactivate K8s
    
    Note over Admin, Debug: Step 2: Diagnostic Commands
    rect rgba(46, 70, 255, 0.1)
        Admin->>Debug: Execute nslookup nginx-service
        Debug->>DNS: Query CoreDNS via Pod's Interface
        DNS-->>Debug: Success (10.100.186.153)
        Debug-->>Admin: Display DNS Info
    end

    rect rgba(46, 70, 255, 0.1)
        Admin->>Debug: Execute wget -O- nginx-service
        Debug->>SVC: Fetch HTML via Pod's Interface
        SVC-->>Debug: HTTP 200 OK
        Debug-->>Admin: Display HTML Output
    end
  1. Debug the busybox-a pod using kubectl debug:

    NODE_TYPE // bash
    kubectl debug -it pod/busybox-a --image=busybox:1.28 --target=busybox

    This command injects a new container based on the busybox:1.28 image into the busybox-a pod and attaches your terminal to it. The ephemeral container automatically shares the pod’s network namespace; the --target=busybox flag additionally shares the process namespace of the existing busybox container, so tools in the debug container can see its processes.

  2. Inside the debug session, run troubleshooting commands:

    NODE_TYPE // bash
    nslookup nginx-service
    wget -O- nginx-service

    This allows you to run commands directly from within the pod’s network namespace, providing a more accurate view of the network environment.

Conclusion

This tutorial covered several techniques for troubleshooting Kubernetes services and networking. Key takeaways include verifying basic pod connectivity, troubleshooting DNS resolution, checking service selectors, and inspecting logs. Remember that a systematic approach is essential when diagnosing network issues in Kubernetes. Start with the basics and work your way up to more complex scenarios.
