Service to Service call patterns – Multi-cluster using Anthos Service Mesh

Biju Kunjummen, January 9th, 2022 (Last Updated: January 2nd, 2022)

This is the third blog post in a series exploring service to service call patterns in different application runtimes in Google Cloud.

The first post explored the Service to Service call pattern in a GKE runtime using a Kubernetes Service abstraction.

The second post explored the Service to Service call pattern in a GKE runtime with Anthos Service Mesh.

This post explores the call pattern across multiple GKE runtimes, with Anthos Service Mesh providing a simple way to make secure calls across clusters.

Target Architecture

A target architecture that I am aiming for is the following:

Here, two different applications are hosted on two separate Kubernetes clusters in different zones, and the Service in one cluster (called “Caller”) invokes the Service in the other cluster (called “Producer”).

Creating the Cluster and configuring Anthos Service Mesh

I have hosted a gist here with the steps to:

Bring up 2 GKE clusters, in the us-west1-a and us-central1-a zones
Install Anthos Service Mesh on each of the clusters
Register the clusters to be centrally managed

Services Installation

Assuming that the 2 GKE clusters are now available, the first cluster holds the Caller and an Ingress Gateway that makes the Caller's UI accessible to the user. The Caller is deployed through a deployment descriptor which looks something like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-caller-v1
  labels:
    app: sample-caller
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-caller
      version: v1
  template:
    metadata:
      labels:
        app: sample-caller
        version: v1
    spec:
      serviceAccountName: sample-caller-sa
      containers:
        - name: sample-caller
          image: us-docker.pkg.dev/sample/docker-repo/sample-caller:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          securityContext:
            runAsUser: 1000
          resources:
            requests:
              memory: "256Mi"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 3
            periodSeconds: 3
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080

I have reproduced the entire YAML just for demonstration; nothing in the file should stand out.

Along the same lines, the Producer application is deployed to the second cluster.
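
As a rough sketch of what the Producer side might look like in cluster 2, assuming it simply mirrors the Caller descriptor: the image path and service account name below are assumptions made for illustration, and the probes and resource settings are left out for brevity. The "app: sample-producer" label is what a Kubernetes Service selecting the Producer pods would match on.

```yaml
# Sketch of a Producer Deployment for cluster 2, modelled on the Caller descriptor.
# The image path and service account name are assumptions, not taken from the original setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-producer-v1
  namespace: istio-apps
  labels:
    app: sample-producer
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-producer
      version: v1
  template:
    metadata:
      labels:
        app: sample-producer   # label that a sample-producer Service selector would match
        version: v1
    spec:
      serviceAccountName: sample-producer-sa   # assumed name
      containers:
        - name: sample-producer
          image: us-docker.pkg.dev/sample/docker-repo/sample-producer:latest   # assumed image path
          ports:
            - containerPort: 8080
```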

Caller to Producer call – Hacky approach

Now that the 2 services are in place, how do they call each other? Let’s start with “Caller” attempting to reach “Producer” using the host name “sample-producer” (sample-producer happens to be the name of the Kubernetes service that the Producer application is installed with in the second cluster). This fails with a message which looks like this:

This is reasonable, as the service “sample-producer” does not exist in cluster 1 and exists only in cluster 2.

My next thought was to check the behavior if I were to artificially add a Kubernetes service called “sample-producer” to cluster 1:

apiVersion: v1
kind: Service
metadata:
  name: sample-producer
  labels:
    app: sample-producer
    service: sample-producer
spec:
  ports:
    - port: 8080
      name: http
  selector:
    app: sample-producer

Surprisingly, adding such a service in “cluster 1” does work and sample-producer resolves cleanly, even though the pods running “Producer” are in cluster 2! Additionally, after adding a virtual service and destination rule for the call from “Caller” to “Producer”, the timeout and circuit breaker work cleanly as well:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: sample-producer-dl
  namespace: istio-apps
spec:
  host: sample-producer.istio-apps.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 15s
      baseEjectionTime: 15s
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: sample-producer-route
  namespace: istio-apps
spec:
  hosts:
    - "sample-producer.istio-apps.svc.cluster.local"
  http:
    - timeout: 5s
      route:
        - destination:
            host: sample-producer
            port:
              number: 8080

Why this works is still a mystery to me. Let me now get to the right way of making this work.

Caller to Producer call – Right Approach

The right approach to getting a service to service call working across clusters is to use a feature of Anthos called Multi-cluster Services, which is described in detail in this blog post and this how-to post.

The short of it is that if a “ServiceExport” resource is defined in cluster 2, and if the same namespace exists in cluster 1, then the Service is resolved using a host name of the form “service-name.namespace.svc.clusterset.local”. In my case this maps to “sample-producer.istio-apps.svc.clusterset.local”. The ServiceExport resource looks something like this:

kind: ServiceExport
apiVersion: net.gke.io/v1
metadata:
  namespace: istio-apps
  name: sample-producer

This is the only change that I have to make to the caller: instead of calling Producer using “sample-producer”, it now uses the host name “sample-producer.istio-apps.svc.clusterset.local”. Everything resolves cleanly and the call continues to work across clusters.
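
If the Caller reads its target from configuration rather than from user input, the hostname change could be captured in the Caller's deployment descriptor, roughly as sketched below. This is only an illustration: PRODUCER_BASE_URL is a hypothetical variable name, and port 8080 is assumed from the sample-producer Service definition shown earlier.

```yaml
# Fragment of the Caller container spec; only the env entry is new.
containers:
  - name: sample-caller
    image: us-docker.pkg.dev/sample/docker-repo/sample-caller:latest
    env:
      # PRODUCER_BASE_URL is a hypothetical name; the application would need to read it.
      - name: PRODUCER_BASE_URL
        # clusterset.local host name exposed through the ServiceExport in cluster 2
        value: "http://sample-producer.istio-apps.svc.clusterset.local:8080"
```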

View from the caller:

View from the Producer:

Conclusion

I hope this clarifies to some extent how service to service calls can be enabled across multiple clusters, even across regions. Anthos Service Mesh handles all the underlying mechanics of getting this to work, and from a caller's perspective all that changes is the hostname used to invoke the service.

There are a few small catches. For example, to get Mutual TLS to work across clusters, specific resources had to be created, which I will leave as an exercise. Please do reach out if it feels difficult to handle.
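
For readers attempting that exercise, one plausible starting point is sketched below using standard Istio resources. This is an assumption about what such resources might look like, not a reproduction of the ones actually used for this setup; the DestinationRule name is hypothetical, and the namespace and host are carried over from the examples above.

```yaml
# Assumption: require mutual TLS for workloads in the istio-apps namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-apps
spec:
  mtls:
    mode: STRICT
---
# Assumption: have client sidecars use Istio mutual TLS when calling the clusterset host.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: sample-producer-clusterset-dl   # hypothetical name
  namespace: istio-apps
spec:
  host: sample-producer.istio-apps.svc.clusterset.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
```

Whether both resources are needed, and in which cluster each belongs, depends on the mesh configuration, so treat this as something to verify rather than apply as-is.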

Published on Java Code Geeks with permission by Biju Kunjummen, partner at our JCG program. See the original article here: Service to Service call patterns – Multi-cluster using Anthos Service Mesh

Opinions expressed by Java Code Geeks contributors are their own.