
Ambient Mesh & Sidecar-less Istio

The service mesh was supposed to simplify microservice communication. Then it gave every pod a second process to babysit. Ambient mode finally fixes that — and the numbers are hard to argue with.

If you’ve tried to introduce Istio into a Kubernetes cluster, you’ve probably had the conversation. The one where someone asks why every single pod now has a second container sitting next to it consuming memory, injecting latency, and complicating every debug session. The sidecar proxy model — specifically Envoy injected as a co-located container — has been the backbone of Istio since its inception in 2017. It’s powerful. It’s also genuinely painful at scale.

Ambient mesh is Istio’s answer to that pain. Announced in September 2022, it reached Beta in Istio 1.22 and General Availability (GA) in Istio 1.24, both in 2024. It separates the mesh data plane from the application pod entirely: no injection, no sidecar, no per-pod Envoy. Here’s how it actually works, what you give up, and what you gain.

1. The Problem With Sidecars

To understand why ambient mode matters, you need to feel the weight of the sidecar model first. In traditional Istio, when a pod joins the mesh, an Envoy proxy process gets injected into it automatically via a mutating admission webhook. This proxy intercepts all inbound and outbound traffic using iptables rules and handles mTLS, observability, and traffic policies on behalf of your application.
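For contrast with ambient enrollment later on, here is roughly what opting a namespace into sidecar mode looks like, plus a quick way to spot the injected proxy. The two kubectl commands are standard sidecar-mode usage. The `has_sidecar` helper is hypothetical and only checks a container-name list for `istio-proxy`, the name the injector conventionally gives the Envoy container. On clusters where Istio injects Envoy as a Kubernetes native sidecar, that container may be listed under `initContainers` instead.

```shell
#!/usr/bin/env bash
# Sidecar-mode enrollment (needs a cluster, so shown as comments):
#   kubectl label namespace default istio-injection=enabled
#   kubectl rollout restart deployment -n default   # existing pods only get Envoy once recreated
#
# Hypothetical helper: succeed if a space-separated list of container names
# contains istio-proxy, the conventional name of the injected Envoy container.
has_sidecar() {
  local name
  for name in $1; do              # unquoted on purpose: split the list on spaces
    [ "$name" = "istio-proxy" ] && return 0
  done
  return 1
}

# Usage against a live pod (my-pod is a placeholder):
#   has_sidecar "$(kubectl get pod my-pod -n default -o jsonpath='{.spec.containers[*].name}')" \
#     && echo "sidecar injected"
```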

“The sidecar model required users to understand and manage a lot of complexity — especially when it came to resource requests, permissions, and the operational overhead of running a proxy process next to every single workload.” (Istio Blog, “Introducing Ambient Mesh”, September 2022)

In practice this means several compounding problems as clusters grow:

Pain Point | Root Cause | At-Scale Impact
--- | --- | ---
Memory overhead per pod | Envoy baseline ~50–100 MB each | 500 pods = 25–50 GB extra RAM
Slow pod startup | iptables rules + proxy init container | Noticeable in burst scaling scenarios
Debugging complexity | Traffic passes through an invisible proxy | Long incident investigation times
Mesh upgrades | Every pod must be restarted to update Envoy | Cluster-wide rolling restarts required
Opt-out is fragile | Annotation-based exclusions are error-prone | Partial mesh = inconsistent policy
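The memory row is plain multiplication, so it is easy to redo for your own cluster. A minimal sketch, assuming the table’s per-proxy figures (real Envoy usage varies with configuration size and traffic). `sidecar_overhead_gb` is a hypothetical helper, not part of any Istio tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: aggregate sidecar memory in GB for a pod count and a
# per-proxy footprint in MB. Uses 1 GB = 1000 MB to match the table's rounding;
# shell integer arithmetic truncates any fractional GB.
sidecar_overhead_gb() {
  local pods=$1 per_proxy_mb=$2
  echo $(( pods * per_proxy_mb / 1000 ))
}

sidecar_overhead_gb 500 50    # low end of the table's range
sidecar_overhead_gb 500 100   # high end of the table's range
```

The two calls print 25 and 50, reproducing the 25–50 GB range for 500 pods.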

2. How Ambient Mode Works

Ambient mesh moves the proxy function out of pods entirely and into the node-level infrastructure. It introduces two new components that replace the per-pod sidecar: the ztunnel and the waypoint proxy.

→ ztunnel: the always-on security layer

The ztunnel (zero-trust tunnel) runs as a DaemonSet — one instance per node. It handles the Layer 4 concerns: mutual TLS, secure tunneling between nodes via HBONE (HTTP-Based Overlay Network Encapsulation), and basic telemetry. It’s lightweight, always present on every node, and your pods talk to it transparently through network redirection at the kernel level.
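Because ztunnel is a DaemonSet, “one instance per node” can be checked directly from the DaemonSet status: every node the DaemonSet targets should report a ready pod. A small sketch; `ztunnel_ready` is a hypothetical helper, and it assumes ztunnel runs in `istio-system` as in the install steps later in this article. `desiredNumberScheduled` and `numberReady` are standard Kubernetes DaemonSet status fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ztunnel_ready DESIRED READY succeeds when at least one
# node is targeted and every targeted node reports a ready ztunnel pod.
ztunnel_ready() {
  local desired=$1 ready=$2
  [ "$desired" -gt 0 ] && [ "$ready" -eq "$desired" ]
}

# Usage against a live cluster (word splitting passes the two numbers as $1 and $2):
#   ztunnel_ready $(kubectl get daemonset ztunnel -n istio-system \
#     -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}') \
#     && echo "ztunnel ready on every node"
```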

→ waypoint proxy: on-demand Layer 7 policies

The waypoint proxy is optional. It is usually deployed per namespace, and it can also be attached to individual services or workloads. You deploy it only when you need Layer 7 capabilities: HTTP routing, request retries, header manipulation, JWT authentication. This is the key insight: L4 security for everyone, L7 power only where you actually need it.
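The L4/L7 split shows up directly in policy. A rule that matches on HTTP methods is L7, so in ambient mode a waypoint has to enforce it; ztunnel only sees connections. The sketch below prints such an AuthorizationPolicy. Assumptions: `l7_policy` is a hypothetical helper; `default`, `productpage`, and `sleep` are placeholder names; the trust domain is `cluster.local`; and the field layout follows the `security.istio.io/v1` API with `targetRefs` as shown in recent Istio docs, so verify it against your release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: l7_policy NAMESPACE SERVICE CLIENT_SA prints an
# AuthorizationPolicy that allows only HTTP GET from CLIENT_SA to SERVICE.
# Method-level matching is L7, so a waypoint (not ztunnel) enforces it.
l7_policy() {
  local ns=$1 svc=$2 client_sa=$3
  cat <<EOF
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ${svc}-get-only
  namespace: ${ns}
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: ${svc}
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/${ns}/sa/${client_sa}"]
    to:
    - operation:
        methods: ["GET"]
EOF
}

# Usage (needs a waypoint serving the namespace or service):
#   l7_policy default productpage sleep | kubectl apply -f -
```

Generating the manifest from a function keeps the example checkable offline; in practice most teams would keep the YAML in version control instead.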

Key insight

In sidecar mode, you get L4 and L7 capabilities in every pod whether you use them or not. In ambient mode, L4 is always on through the shared per-node ztunnel, and L7 is opt-in (waypoint). For a cluster where 80% of services only need mTLS and basic observability, this translates directly into hardware savings and simpler operations.
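One way to see the saving is to count proxy instances. A back-of-the-envelope sketch, assuming one Envoy per meshed pod in sidecar mode and, in ambient mode, one ztunnel per node plus a single-replica waypoint for each namespace that needs L7 (real waypoints can run several replicas). `proxy_counts` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: proxy_counts PODS NODES L7_NAMESPACES
#   sidecar mode: one Envoy per meshed pod
#   ambient mode: one ztunnel per node + one waypoint per namespace needing L7
proxy_counts() {
  local pods=$1 nodes=$2 l7_namespaces=$3
  echo "sidecar=$pods ambient=$(( nodes + l7_namespaces ))"
}

proxy_counts 500 3 2
```

For 500 pods on 3 nodes with 2 namespaces needing L7, this prints `sidecar=500 ambient=5`.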

3. The Performance Numbers

The Istio team published benchmark comparisons when ambient mode hit GA. The results across a representative workload on a 3-node GKE cluster running 500 pods tell a clear story:

[Figure: Memory & CPU overhead, sidecar vs. ambient mode. Per-node overhead measured at 500 pods, no waypoint proxy deployed (L4 only). Source: Istio Performance Benchmarks 2024, istio.io/latest/docs/ops/deployment/performance-and-scalability]

[Figure: Latency comparison, p50, p90, p99 (ms). HTTP request latency added by the mesh; lower is better. Measured with Fortio at 1,000 RPS. Source: Istio ambient mode benchmarks, istio.io/blog/2022/introducing-ambient-mesh, adjusted for the 1.22 release]

The latency numbers deserve a note: at p50 and p90, ambient mode is competitive with sidecar. At p99 — the tail latency that matters most in service-to-service communication under real load — ambient mode shows a measurable improvement because there’s no inter-process communication hop between the app container and the proxy container within the same pod.
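Fortio reports p50/p90/p99 for you, but if you collect raw per-request latencies yourself it helps to know what those percentiles mean. A minimal sketch using the nearest-rank definition (sort the samples, take the value at rank ⌈P/100 × N⌉); other tools may interpolate and report slightly different values. `percentile` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentile P reads one latency value per line on stdin
# and prints the nearest-rank P-th percentile: the value at rank ceil(P/100 * N)
# after sorting numerically. Exits non-zero on empty input.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      x = p * NR / 100
      rank = int(x)
      if (rank < x) rank++      # ceiling
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Usage: percentile 99 < latencies_ms.txt   (file name is a placeholder)
```

With only a handful of samples, p99 is simply the slowest request, which is one reason tail-latency comparisons need long runs at steady load.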

4. Getting Started: Enabling Ambient Mode

Ambient mode is available as Beta from Istio 1.22 and as GA from Istio 1.24, on a Kubernetes version supported by the Istio release you install. The simplest path is via istioctl. These commands work against any running Kubernetes cluster where you have cluster-admin permissions:

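# Step 1: Install Istio in ambient mode
istioctl install --set profile=ambient -y

# Step 2: Verify the ztunnel DaemonSet is running on all nodes
kubectl get daemonset ztunnel -n istio-system

# Step 3: Enroll a namespace into the ambient mesh
# (existing pods are not restarted; ztunnel starts handling their traffic)
kubectl label namespace default istio.io/dataplane-mode=ambient

# Step 4 (optional): Deploy a waypoint proxy for L7 policies.
# Waypoints are Kubernetes Gateway API resources, so the Gateway API CRDs must be
# installed first; the Istio docs give the install command for each release.
istioctl waypoint apply --namespace default
kubectl label namespace default istio.io/use-waypoint=waypoint
# (istioctl waypoint apply --namespace default --enroll-namespace combines both steps)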

The key difference from sidecar mode is in step 3: enrolling a namespace doesn’t require restarting any existing pods. The ztunnel picks them up live. This makes phased adoption genuinely smooth — you can migrate namespace by namespace without service disruption.
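You can confirm the “no restarts” claim yourself: a recreated pod gets a new UID, so the set of pod UIDs should be identical before and after labeling the namespace. A sketch with a hypothetical `same_pod_set` helper; the kubectl lines are standard, but they need a live cluster, so they appear as comments. Note that a container restarting in place keeps its pod UID, so this check detects recreated pods, not container restarts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same_pod_set BEFORE AFTER succeeds if two space-separated
# lists of pod UIDs contain the same UIDs, ignoring order.
same_pod_set() {
  # $1 and $2 are left unquoted on purpose so each UID becomes its own line.
  [ "$(printf '%s\n' $1 | sort)" = "$(printf '%s\n' $2 | sort)" ]
}

# Usage against a live cluster:
#   before=$(kubectl get pods -n default -o jsonpath='{.items[*].metadata.uid}')
#   kubectl label namespace default istio.io/dataplane-mode=ambient
#   after=$(kubectl get pods -n default -o jsonpath='{.items[*].metadata.uid}')
#   same_pod_set "$before" "$after" && echo "no pods were recreated"
```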

5. Sidecar vs. Ambient: Feature Parity Today

Ambient mode reached Beta in Istio 1.22 and GA in Istio 1.24, with the GA release covering both ztunnel (L4) and waypoint (L7) features. Some areas are still maturing. Here’s where things stand:

Feature | Sidecar Mode | Ambient Mode (1.24+) | Notes
--- | --- | --- | ---
Mutual TLS (mTLS) | Stable | Stable | Via ztunnel; no waypoint needed
Basic telemetry (L4) | Stable | Stable | TCP connection metrics, no HTTP details
HTTP traffic management | Stable | Stable | Requires waypoint proxy
AuthorizationPolicy (L7) | Stable | Stable | Requires waypoint proxy
gRPC streaming | Stable | Beta | Some edge cases in active resolution
Multi-cluster support | Stable | Not yet stable | Outside the 1.24 GA scope; still being built out
No pod restart on enroll | No | Yes | Ambient’s most practical advantage
Mesh upgrade without restart | No | Yes | ztunnel upgrades independently of pods

6. Should You Migrate Now?

The honest answer depends on your situation. If you’re starting a new cluster with no existing Istio investment, ambient mode is the right default today: it is simpler to operate, less resource-intensive, and the direction the project is heading. If you have a large, stable production cluster with extensive L7 policies and custom Envoy filters (WASM plugins), waiting a few more releases gives the ecosystem time to mature around ambient edge cases.

Situation | Recommendation
--- | ---
New cluster, no existing Istio | Start with ambient mode
Existing Istio, mostly L4 policies | Migrate namespace by namespace
Heavy WASM filter usage | Wait; WASM + waypoint support is still maturing
Multi-cluster mesh today | Wait until multi-cluster ambient reaches a stable release
Resource-constrained nodes | Migrate; this is the biggest win

The official ambient getting started guide walks through a complete migration and is kept up to date with each release. The ambient GitHub label is the best place to track open issues before committing a production migration.

7. What We’ve Learned

Ambient mesh isn’t a rewrite of Istio; it’s a smarter architecture for the same job. We saw why sidecars became painful at scale: per-pod Envoy processes consuming memory on every node, slow upgrades requiring cluster-wide pod restarts, and debugging complexity that masked rather than illuminated problems. Ambient mode resolves this by splitting the data plane into two layers: the ztunnel DaemonSet for always-on L4 security and telemetry, and the optional waypoint proxy for teams that actually need L7 control. The benchmark numbers show roughly 70% memory savings and competitive or better latency. Feature parity for standard use cases arrived with the Istio 1.24 GA release; multi-cluster and WASM filter support are the remaining gaps. For new clusters and resource-constrained environments, ambient mode is ready today. For complex production meshes, a measured, namespace-by-namespace migration is the sensible path forward.

Eleftheria Drosopoulou

Eleftheria is an experienced business analyst with a robust background in the computer software industry. Proficient in computer software training, digital marketing, HTML scripting, and Microsoft Office, she brings a wealth of technical skills to the table. She also loves writing articles on various tech subjects and has a talent for translating complex concepts into accessible content.