Istio service mesh troubleshooting

Written by Airwalk Reply Senior Consultant James Mak
 

Below is a list of suggestions that may save you time in solving Istio problems.

Istio, when used hand in hand with Kubernetes (whether self-managed or managed), is commonly employed to realise a self-healing and highly available microservice architecture. Istio acts as an infrastructure layer on top of your application. It allows you to transparently add capabilities like observability, traffic management, and security, without adding them to your own code. We call this infrastructure layer the service mesh.

In a microservice architecture, a large number of microservices hosted on different systems need to talk to each other, so they need to locate each other's service endpoints. Traffic management is therefore the heart of such a microservice world. The problems that I find in everyday operation usually relate to the Istio gateway, virtual service, destination rule or the Envoy proxy container.

In the following sections, I will share some of my experience in dealing with Istio problems, and I hope it helps make your life easier when working with this service mesh technology. Please note that I am not going to go through all the Istio basics; I assume that you have some experience in managing Istio.

In my experience, if a microservice is not reachable and you have traced the problem to the Istio layer, it is worth looking into the Envoy proxy that sits next to the application container in the same pod.

Suggestion 1. 

First of all, verify that your Envoy proxy is in sync with the Istio control plane by running the command "istioctl proxy-status". It should show "SYNCED" for your app, which means that the proxy has acknowledged the last configuration Istiod sent to it.
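
When a mesh has many proxies, scanning the table by eye gets tedious. The following is a minimal Python sketch, assuming you have captured the text output of "istioctl proxy-status" and that the proxy name is the first column of each row. The function name stale_proxies is hypothetical, and the exact column layout varies between Istio versions, so treat this as a starting point rather than a definitive parser. It only flags STALE, which means Istiod sent an update that the proxy has not acknowledged; NOT SENT usually just means Istiod had nothing to send.

```python
def stale_proxies(proxy_status_output: str) -> list[str]:
    """Return the names of proxies whose status row contains STALE.

    Assumes the first row is a header and the first column is the proxy name.
    """
    stale = []
    lines = proxy_status_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if fields and "STALE" in fields:
            stale.append(fields[0])
    return stale
```

An empty result means no proxy reported a stale configuration at the time the output was captured.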

Suggestion 2. 

Check whether there are any out-of-memory (OOM) errors in your Envoy proxy container. Istio stores the service mesh routing information in the Envoy container's memory to control the pod's ingress and egress traffic, so Envoy's memory requirement grows with the size of your service mesh. In other words, if you have a number of namespaces and implement cross-namespace traffic controls, you have to make sure you have allocated sufficient memory to the Envoy container. I have seen a case where over 30 cross-namespace constraints were set in a namespace Sidecar, and the Envoy proxy container kept restarting due to OOM, causing the pod to crash.

The following Sidecar configuration shows an example of cross-namespace constraints:

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: ratings
  namespace: prod-us1
spec:
  workloadSelector:
    labels:
      app: ratings
  ingress:
  - port:
      number: 9080
      protocol: HTTP
      name: somename
    defaultEndpoint: unix:///var/run/someuds.sock
  egress:
  - port:
      number: 9080
      protocol: HTTP
      name: egresshttp
    hosts:
    - "prod-us1/*"
    - "prod-us2/*"
    - "prod-us3/*"
    - "prod-us4/*"
    - "prod-us5/*"
    - "prod-us6/*"
    - "prod-us7/*"
    - "prod-us8/*"
    - "prod-us9/*"
    - "prod-us10/*"
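
To confirm that the sidecar really was killed for running out of memory, you can inspect the container statuses that Kubernetes records on the pod. The sketch below is a hedged example in Python: it assumes you pass in the pod object as parsed JSON (for instance from "kubectl get pod <pod-name> -o json"), and the function name sidecar_oom_killed is hypothetical. It relies on Kubernetes reporting the termination reason "OOMKilled" in the container's state or lastState.

```python
def sidecar_oom_killed(pod: dict, container: str = "istio-proxy") -> bool:
    """Return True if the named container's current or last state is OOMKilled."""
    status = pod.get("status", {})
    # Sidecars run as native sidecars are reported under initContainerStatuses.
    statuses = status.get("containerStatuses", []) + status.get(
        "initContainerStatuses", []
    )
    for entry in statuses:
        if entry.get("name") != container:
            continue
        # Check both the current state and the previous (restarted) state.
        for key in ("state", "lastState"):
            terminated = (entry.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                return True
    return False
```

If this returns True, the usual options are to give the sidecar more memory (for example through the sidecar.istio.io/proxyMemoryLimit pod annotation) or to reduce the number of hosts listed in the Sidecar egress section.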
  
Suggestion 3. 

Virtual service routing. Another possible cause of an unreachable microservice is that the Istio virtual service has not been configured with the correct destinations. Have all the destinations been listed in the virtual service? Is the destination weighting consistent with your intended setup?

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
  - reviews.prod.svc.cluster.local
  http:
  - headers:
      request:
        set:
          test: "true"
    route:
    - destination:
        host: reviews.prod.svc.cluster.local
        subset: v2
      weight: 25
    - destination:
        host: reviews.prod.svc.cluster.local
        subset: v1
      headers:
        response:
          remove:
          - foo
      weight: 75
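
The two questions above can also be checked mechanically. The following Python sketch assumes the VirtualService and its DestinationRule have already been loaded into dictionaries (for example from "kubectl get ... -o json"). The function names route_shares and missing_subsets are hypothetical. The first reports the share of traffic each destination receives, computed as weight divided by the sum of weights in that route; the second lists subsets that the virtual service refers to but the destination rule does not define.

```python
def route_shares(virtual_service: dict) -> list[dict[str, float]]:
    """For each http route, map 'host/subset' to its share of traffic (0 to 1)."""
    shares = []
    for http_route in virtual_service.get("spec", {}).get("http", []):
        destinations = http_route.get("route", [])
        if len(destinations) == 1:
            weights = [1]  # a lone destination receives all the traffic
        else:
            weights = [d.get("weight", 0) for d in destinations]
        total = sum(weights)
        route = {}
        for dest_entry, weight in zip(destinations, weights):
            dest = dest_entry.get("destination", {})
            key = f'{dest.get("host")}/{dest.get("subset", "")}'
            route[key] = weight / total if total else 0.0
        shares.append(route)
    return shares


def missing_subsets(virtual_service: dict, destination_rule: dict) -> set[str]:
    """Return subsets used by the virtual service but absent from the rule."""
    subsets = destination_rule.get("spec", {}).get("subsets", [])
    defined = {s["name"] for s in subsets}
    used = set()
    for http_route in virtual_service.get("spec", {}).get("http", []):
        for dest_entry in http_route.get("route", []):
            subset = dest_entry.get("destination", {}).get("subset")
            if subset:
                used.add(subset)
    return used - defined
```

For the reviews-route example above, route_shares gives v2 a share of 0.25 and v1 a share of 0.75, which you can compare against the split you intended.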

Suggestion 4. 

Errors accessing external services. Make sure you have set meshConfig.outboundTrafficPolicy.mode in your Istio installation to your desired mode of operation. If you set this option to REGISTRY_ONLY, Istio blocks outbound access to any host that is not in its service registry, so an external service is only reachable once a service entry has been defined for it. The other value, ALLOW_ANY, is the default and lets traffic to unknown destinations pass through.
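
To make the difference between the two modes concrete, here is a deliberately simplified Python model of the decision. It is only an illustration of the rule described above and not how Envoy or Istiod implement it: the function names are hypothetical, registry_hosts stands for the hosts known to the mesh (Kubernetes services plus any service entries), and only a leading "*." wildcard is handled.

```python
def _host_matches(host: str, known: str) -> bool:
    """Match an exact host, or a '*.example.com' style wildcard entry."""
    if known.startswith("*."):
        return host.endswith(known[1:])  # keeps the leading dot
    return host == known


def outbound_allowed(mode: str, host: str, registry_hosts: set[str]) -> bool:
    """Simplified model of meshConfig.outboundTrafficPolicy.mode."""
    if mode == "ALLOW_ANY":
        return True  # unknown destinations are passed through
    if mode == "REGISTRY_ONLY":
        return any(_host_matches(host, known) for known in registry_hosts)
    raise ValueError(f"unknown outboundTrafficPolicy mode: {mode}")
```

Under this model, a call to an external API fails in REGISTRY_ONLY mode until a matching host appears in the registry, which is the situation to suspect when only external calls break.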

Suggestion 5. 

Examine the Envoy proxy container log with the command "kubectl logs <pod-name> -c istio-proxy". When access logging is enabled, every inbound and outbound connection via the Envoy proxy is recorded in the log. Dig into the log and look for the %RESPONSE_FLAGS% value of the problem connection to understand what went wrong.
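
The response flags are short codes, and it helps to have their meanings to hand. The Python sketch below decodes a %RESPONSE_FLAGS% value into readable text. The table is only a subset of the flags listed in the Envoy access log documentation, and the names RESPONSE_FLAGS and explain_flags are hypothetical. It assumes you have already picked the flags field out of the log line, since the position of that field depends on your access log format.

```python
# A subset of Envoy response flags; see the Envoy access log docs for the rest.
RESPONSE_FLAGS = {
    "UH": "no healthy upstream hosts in the upstream cluster",
    "UF": "upstream connection failure",
    "UO": "upstream overflow (circuit breaking)",
    "NR": "no route configured for the request",
    "URX": "upstream retry limit or maximum connect attempts reached",
    "NC": "upstream cluster not found",
    "DC": "downstream connection termination",
    "UC": "upstream connection termination",
    "UT": "upstream request timeout",
    "LR": "connection local reset",
    "UR": "upstream remote reset",
}


def explain_flags(flags: str) -> list[str]:
    """Explain a comma-separated %RESPONSE_FLAGS% value; '-' means no flags."""
    explanations = []
    for flag in flags.split(","):
        flag = flag.strip()
        if not flag or flag == "-":
            continue
        meaning = RESPONSE_FLAGS.get(flag, "unknown flag, check the Envoy docs")
        explanations.append(f"{flag}: {meaning}")
    return explanations
```

For example, NR on an outbound request often points back to the virtual service routing in Suggestion 3, while UH or UF suggests the destination pods themselves are unhealthy or unreachable.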



Istio is still evolving. Bugs are inevitable as more and more new features are added to the product. But given its versatility and open source nature, I think Istio is a valuable piece of software for enabling distributed microservices-based applications. If you have any questions, please get in touch.
