2022-01-25 00:00:00

Chaos Testing with Istio

Written by Consultant James Mak, at Airwalk Reply.

The combination of Kubernetes and Istio has greatly reduced the number of emergency phone calls at 2am by providing a flexible way to tackle infrastructure-level and application-level failures.

Kubernetes' Cluster Autoscaler can add or remove nodes in the cluster based on workload demand. Kubernetes also continuously monitors the health of application pods and restarts them when necessary, while Istio adds granular control over ingress and egress traffic at both cluster and pod level. Together they have greatly improved the user experience, with fewer interruptions and higher service availability.

However, even though Kubernetes and Istio offer many useful features, it is always better to prepare for the worst before the worst comes. This is where Chaos testing comes in: we deliberately break different components of our system and explore what happens. Of course, this is carried out in a controlled environment, using failure scenarios we devise in advance, for example reducing infrastructure capacity, generating high load on compute resources, causing a network outage or triggering an application failure. Any common or uncommon outage scenario you can think of can be included in your Destroyer plan.

On the other hand, we also need a Savior repair strategy to restore service once Doomsday occurs. We need to exercise this plan and assess whether it returns our configuration to the stable state we expect. In doing so, we build confidence that the service mesh can tolerate failing nodes and prevent localised failures from cascading to other nodes.

It is becoming popular for enterprise IT teams to hold a Game Day, where their engineers 'rehearse' how to respond to such situations.

Technically speaking, Envoy, an open-source, lightweight proxy, is the building block of Istio. Istio injects Envoy as a sidecar container into each Kubernetes workload pod, where it sits between the application and the rest of the service mesh and intercepts all inbound and outbound traffic to and from the app workload. Because every request passes through Envoy, we can manipulate that traffic using Envoy's versatile routing features.
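Sidecar injection is usually switched on per namespace. As a minimal sketch, assuming Istio's automatic sidecar injection is installed and using a hypothetical namespace name `chaos-demo`, labelling the namespace as below causes Envoy to be injected into pods created in it from then on (existing pods need a restart to pick up the sidecar):

```yaml
# Hypothetical namespace used for the chaos experiments.
apiVersion: v1
kind: Namespace
metadata:
  name: chaos-demo
  labels:
    # Tells Istio's injection webhook to add the Envoy sidecar
    # to pods created in this namespace.
    istio-injection: enabled
```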

In the rest of this post, I will focus on using Istio to carry out Chaos testing, introducing network delays and HTTP error responses to emulate network issues in microservice-based applications.

Prerequisites

Basic knowledge in Kubernetes
Basic knowledge in Istio

A client request first reaches the Istio Ingress Gateway, where it is matched against the Virtual Service and the Destination Rule (if any). Based on that routing configuration, the request is then dispatched to the backend.
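The Virtual Service examples in this post attach to a gateway named `ingress-gateway`, which is not shown. A minimal sketch of that Gateway might look like the following, assuming Istio's default ingress gateway deployment (labelled `istio: ingressgateway`), plain HTTP on port 80, and a Gateway in the same namespace as the Virtual Service. Since the Virtual Service host is `backend`, clients would also need to send `Host: backend` for the route to match.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: ingress-gateway    # name referenced by the Virtual Services' `gateways` field
spec:
  selector:
    istio: ingressgateway  # binds to Istio's default ingress gateway pods (assumption)
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"                  # accept any host; the Virtual Service narrows this to `backend`
```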

Istio provides two kinds of HTTP fault injection at the Virtual Service level:

HTTP delay fault
HTTP abort fault

We can use the HTTP delay fault to introduce network latency when a request reaches the Ingress Gateway. Envoy sets the response flag DI on such requests, indicating that request processing was delayed for a period specified via fault injection. For more granular control, you can specify what percentage of traffic to delay. The following example YAML creates a Virtual Service that injects a five-second delay into ALL matched traffic.

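```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: test-vs
spec:
  hosts:
  - backend
  http:
  - fault:
      delay:
        percentage:
          value: 100     # delay 100% of matched requests
        fixedDelay: 5s   # hold each request for five seconds before forwarding it
    route:
    - destination:
        host: backend
  gateways:
  - ingress-gateway      # apply this Virtual Service at the ingress gateway
```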
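In practice you may not want every user to be delayed. As a sketch of the more granular control mentioned above, the variant below injects a three-second delay into only 10% of requests, and only for requests carrying a hypothetical `x-chaos-test: "true"` header; all other traffic falls through to a second, fault-free route. The header name and the numbers are illustrative assumptions, not values Istio defines.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: test-vs
spec:
  hosts:
  - backend
  gateways:
  - ingress-gateway
  http:
  # Rule 1: only requests with the (hypothetical) header x-chaos-test: "true"
  # are eligible; 10% of them are delayed by 3 seconds.
  - match:
    - headers:
        x-chaos-test:
          exact: "true"
    fault:
      delay:
        percentage:
          value: 10
        fixedDelay: 3s
    route:
    - destination:
        host: backend
  # Rule 2: all other traffic is routed normally, with no fault injected.
  - route:
    - destination:
        host: backend
```

Istio evaluates HTTP routes in order and uses the first match, so the fault-free catch-all route has to come last.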

Next comes the HTTP abort fault. In the following example YAML, the HTTP response code 500 (Internal Server Error) is returned to the client for all matched traffic. Envoy sets the response flag FI, indicating that the request was aborted with the response code specified via fault injection.

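```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: test-vs
spec:
  hosts:
  - backend
  http:
  - fault:
      abort:
        httpStatus: 500  # status code returned to the client instead of the real response
        percentage:
          value: 100     # abort 100% of matched requests
    route:
    - destination:
        host: backend
  gateways:
  - ingress-gateway      # apply this Virtual Service at the ingress gateway
```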
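To actually see the DI and FI response flags, Envoy access logging needs to be enabled. A minimal sketch, assuming Istio was installed with `istioctl`, is to set `meshConfig.accessLogFile` in your IstioOperator configuration (merged into whatever you already use) and reapply it with `istioctl install -f <file>`. The flags should then appear in the logs of the Envoy proxy that applied the fault, here the ingress gateway pod.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    # Write Envoy access logs to the proxy container's stdout. Istio's default
    # log format includes the %RESPONSE_FLAGS% field, where DI and FI show up.
    accessLogFile: /dev/stdout
```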

You can use an Istio Virtual Service to perform Chaos testing at the application layer transparently, injecting delays or HTTP errors into your services without changing your application code. Testing the system under distress to ensure its resilience is extremely important for modern microservice applications, which have little tolerance for downtime.
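Delay and abort faults can also be combined in a single rule, which is closer to a real degraded dependency that is both slow and occasionally failing. A minimal sketch follows; the percentages and the 503 status are illustrative choices. Istio treats the two faults independently, so a given request may be delayed, aborted, both, or neither.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: test-vs
spec:
  hosts:
  - backend
  gateways:
  - ingress-gateway
  http:
  - fault:
      # Delay 50% of matched requests by 2 seconds...
      delay:
        percentage:
          value: 50
        fixedDelay: 2s
      # ...and, independently, abort 10% of matched requests with a 503.
      abort:
        httpStatus: 503
        percentage:
          value: 10
    route:
    - destination:
        host: backend
```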

If you need a more fully orchestrated Chaos Engineering platform, Chaos Mesh is worth considering. As well as Network Chaos, it supports Pod Chaos, DNS Chaos, IO Chaos and more, and it provides a dashboard for visualising experiments.
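For comparison, a sketch of a similar five-second delay expressed as a Chaos Mesh `NetworkChaos` experiment is shown below, assuming Chaos Mesh 2.x is installed and the backend pods carry a hypothetical `app: backend` label in a hypothetical `chaos-demo` namespace. Unlike the Istio approach, Chaos Mesh injects the delay at the network layer of the selected pods rather than in the HTTP proxy, and the experiment stops on its own after the given duration.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: backend-delay
  namespace: chaos-demo
spec:
  action: delay        # inject network latency
  mode: all            # affect every pod matched by the selector
  selector:
    namespaces:
      - chaos-demo
    labelSelectors:
      app: backend     # hypothetical label on the backend pods
  delay:
    latency: "5s"
  duration: "60s"      # end the experiment after one minute
```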
