Service to Service call patterns – Multi-cluster using Anthos Service Mesh
This is the third blog post in a series exploring service to service call patterns in different application runtimes in Google Cloud.
The first post explored the service to service call pattern in a GKE runtime using a Kubernetes Service abstraction.
The second post explored the service to service call pattern in a GKE runtime with Anthos Service Mesh.
This post explores the call pattern across multiple GKE runtimes, with Anthos Service Mesh providing a simple way to make secure calls across clusters.
The target architecture that I am aiming for is the following:
Here two different applications are hosted on two separate Kubernetes clusters in different availability zones, and the Service (called “Caller”) in one cluster invokes the Service (called “Producer”) in the other cluster.
Creating the Cluster and configuring Anthos Service Mesh
I have hosted a gist here with the steps to:
- Bring up 2 GKE clusters in the us-west1-a and us-central1-a zones
- Install Anthos Service Mesh on each of the clusters
- Register the clusters to be centrally managed
Assuming that the 2 GKE clusters are now available, the first cluster holds the Caller and an Ingress Gateway that makes the Caller's UI accessible to the user. The deployment descriptor for the Caller looks something like this:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-caller-v1
      labels:
        app: sample-caller
        version: v1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sample-caller
          version: v1
      template:
        metadata:
          labels:
            app: sample-caller
            version: v1
        spec:
          serviceAccountName: sample-caller-sa
          containers:
            - name: sample-caller
              image: us-docker.pkg.dev/sample/docker-repo/sample-caller:latest
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 8080
              securityContext:
                runAsUser: 1000
              resources:
                requests:
                  memory: "256Mi"
              livenessProbe:
                httpGet:
                  path: /actuator/health/liveness
                  port: 8080
                initialDelaySeconds: 3
                periodSeconds: 3
              readinessProbe:
                httpGet:
                  path: /actuator/health/readiness
                  port: 8080
I have reproduced the entire YAML just for demonstration; nothing in the file should stand out.
Along the same lines, the Producer application is deployed to the second cluster.
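As a rough sketch, the Producer's deployment descriptor in the second cluster might mirror the Caller's. The resource name, service account, and image path below are assumptions made for illustration and do not come from the actual project; only the app label, the istio-apps namespace, and port 8080 are taken from the manifests shown in this post.

```yaml
# Hypothetical Producer Deployment for cluster 2, modeled on the Caller manifest.
# Assumed values: metadata.name, serviceAccountName, and the image path.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-producer-v1                    # assumed name
  namespace: istio-apps
  labels:
    app: sample-producer
    version: v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-producer
      version: v1
  template:
    metadata:
      labels:
        app: sample-producer                  # label selected by the sample-producer Service
        version: v1
    spec:
      serviceAccountName: sample-producer-sa  # assumed service account
      containers:
        - name: sample-producer
          image: us-docker.pkg.dev/sample/docker-repo/sample-producer:latest  # assumed image path
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080             # matches the Service port used in this post
```

Probes, resource requests, and the security context are left out here for brevity; they would presumably follow the Caller's settings.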
Caller to Producer call – Hacky approach
Now that the 2 services are in place, how do they call each other? Let’s start with “Caller” attempting to reach “Producer” using the host name “sample-producer” (sample-producer happens to be the name of the Kubernetes Service that the Producer application is installed with in the second cluster). This fails with a message which looks like this:
This is reasonable as the service “sample-producer” does not exist in cluster 1 and exists only in cluster 2.
My next thought was to check the behavior if I were to artificially add a Kubernetes Service called “sample-producer” to cluster 1:
    apiVersion: v1
    kind: Service
    metadata:
      name: sample-producer
      labels:
        app: sample-producer
        service: sample-producer
    spec:
      ports:
        - port: 8080
          name: http
      selector:
        app: sample-producer
Surprisingly, adding such a Service in “cluster 1” does work and sample-producer resolves cleanly, even though the pods running “Producer” are in cluster 2! Additionally, after adding a VirtualService and a DestinationRule for the call from “Caller” to “Producer”, the timeout and circuit breaker also work cleanly:
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: sample-producer-dl
      namespace: istio-apps
    spec:
      host: sample-producer.istio-apps.svc.cluster.local
      trafficPolicy:
        tls:
          mode: ISTIO_MUTUAL
        outlierDetection:
          consecutive5xxErrors: 3
          interval: 15s
          baseEjectionTime: 15s
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: sample-producer-route
      namespace: istio-apps
    spec:
      hosts:
        - "sample-producer.istio-apps.svc.cluster.local"
      http:
        - timeout: 5s
          route:
            - destination:
                host: sample-producer
                port:
                  number: 8080
Why this works is still a mystery to me. Let me now get to the right way of making this work.
Caller to Producer call – Right Approach
The right approach to getting service to service calls working across clusters is to use a feature of Anthos called Multi-cluster Services, which is described in detail in this blog post and this how-to post.
The short of it is that if a “ServiceExport” resource is defined in cluster 2 and the same namespace exists in cluster 1, then the Service can be resolved from cluster 1 using a host name of the form “service-name.namespace.svc.clusterset.local”. In my case this maps to “sample-producer.istio-apps.svc.clusterset.local”! The ServiceExport resource looks something like this:
    kind: ServiceExport
    apiVersion: net.gke.io/v1
    metadata:
      namespace: istio-apps
      name: sample-producer
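The other half of the requirement is that the same namespace exists in cluster 1. A minimal sketch of that namespace is below; it assumes nothing beyond the istio-apps name used throughout this post, and it deliberately omits any sidecar injection labels since those depend on how the mesh was installed.

```yaml
# Namespace to apply in cluster 1 so that the exported Service from cluster 2
# has a matching namespace on the calling side.
# Mesh-specific injection labels are intentionally left out of this sketch.
apiVersion: v1
kind: Namespace
metadata:
  name: istio-apps
```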
After that, the only change I have to make is in the Caller: instead of calling Producer using “sample-producer”, it now uses the host name “sample-producer.istio-apps.svc.clusterset.local”. Everything resolves cleanly and the call continues to work across clusters.
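One way to keep that host name out of the Caller's code is to externalize it as configuration. The following is only a sketch: the ConfigMap name and the PRODUCER_BASE_URL key are hypothetical, and how the Caller actually receives the target URL is not covered in this post. The host name comes from above, and port 8080 with plain http matches the port named "http" in the Service definition shown earlier.

```yaml
# Hypothetical ConfigMap for cluster 1 holding the Producer's multi-cluster address.
# The ConfigMap name and the PRODUCER_BASE_URL key are illustrative assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-caller-config
  namespace: istio-apps
data:
  # Form: http://<service-name>.<namespace>.svc.clusterset.local:<port>
  PRODUCER_BASE_URL: "http://sample-producer.istio-apps.svc.clusterset.local:8080"
```

The Caller's Deployment could then reference this ConfigMap through envFrom, so switching between the single-cluster and multi-cluster host names becomes a configuration change rather than a code change.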
View from the caller:
View from the Producer:
I hope this clarifies to some extent how service to service calls can be enabled across multiple clusters, even across regions. Anthos Service Mesh handles all the underlying mechanics of getting this to work, and from a caller's perspective all that changes is the host name used to invoke the service.
There are a few small catches. For example, to get mutual TLS to work across clusters, specific resources had to be created, which I will leave as an exercise. Please do reach out if it feels difficult to handle.
Published on Java Code Geeks with permission by Biju Kunjummen, partner at our JCG program. See the original article here: Service to Service call patterns – Multi-cluster using Anthos Service Mesh