How to Read a Flame Graph: A Practical Profiling Guide for JVM Developers

Eleftheria Drosopoulou | June 4th, 2026 | Last Updated: May 30th, 2026


Stop guessing where your Java app is spending its time. Here is everything you need to go from zero to actionable insight with async-profiler.

You have a Java service that is slower than it should be. Latency is spiking, CPU is higher than expected, and the thread dump tells you nothing useful. So you reach for a profiler — and suddenly you are staring at a towering wall of colourful stacked rectangles that looks equal parts fascinating and completely impenetrable. Sound familiar?

Flame graphs are, once you understand how to read them, the single most information-dense way to understand where your JVM is spending its time. They were invented by Brendan Gregg in 2011, while he was working at Joyent (he later joined Netflix), because traditional profiler output such as long lists of percentages, tree views, and drill-down menus was too slow to navigate in a fast-moving engineering environment. A flame graph lets you find the hot path in seconds, not minutes.

In this guide, we will walk through how flame graphs are constructed, how to read them correctly (including some counterintuitive traps), and how to generate them on a live JVM using async-profiler — the tool that has become the standard choice for production JVM profiling. By the end, you will know not just what you are looking at, but what to do about it.

Why async-profiler — Not Just Any Profiler

Before diving into flame graphs themselves, it is worth understanding why async-profiler has become the go-to choice and what makes it fundamentally different from older JVM profilers.

Most traditional Java profilers — including many commercial tools — are built on the JVMTI API (GetAllStackTraces). The problem with this approach is that it can only capture stack traces at safepoints: specific points in the JVM’s execution where threads are paused for garbage collection or deoptimisation. This introduces what is known as safepoint bias — your profiler systematically under-reports code that spends time in tight loops or other code paths that do not frequently reach a safepoint. In other words, it lies to you about your most performance-critical code.

The safepoint bias problem

Research into JVM profiling bias shows that a 5ms safepoint every 100ms represents roughly 5% steady-state overhead — and that overhead gets worse under load, precisely when you most need accurate profiling data. Deep call stacks from frameworks like Spring compound the issue significantly.

async-profiler sidesteps this entirely. Instead of relying on JVMTI, it uses the AsyncGetCallTrace API provided by HotSpot, combined with Linux perf_events, to interrupt threads at arbitrary points — not just safepoints. As a result, it captures an accurate picture of what code is actually running, including native code, JVM internals, and kernel functions that JVMTI-based profilers never show you at all.

Getting Started: Install and Run in Under 5 Minutes

async-profiler ships as a pre-compiled archive (a tarball for Linux, a zip for macOS), so no compilation or build tooling is required. Download the latest release from the GitHub releases page, extract it, and you are ready to go.

# Download and extract (Linux x64 example — check releases for your platform)
wget https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz
tar -xzf async-profiler-3.0-linux-x64.tar.gz
cd async-profiler-3.0-linux-x64

# Profile a running JVM for 30 seconds, output an interactive HTML flame graph
# Replace 12345 with your actual Java process PID (use: ps aux | grep java)
# In the 3.0 release the asprof launcher lives in the bin/ subdirectory
./bin/asprof -d 30 -f /tmp/cpu-flame.html 12345

# Open the result in your browser (macOS; on Linux use xdg-open instead)
open /tmp/cpu-flame.html

That single command is often all you need for a first pass. However, for more accurate results from the JIT compiler, add these two flags when starting your Java process. They tell HotSpot to generate more detailed debug information at non-safepoints:

java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -jar myapp.jar
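If you are unsure whether a running service was actually started with those flags, one option is to inspect the JVM's start-up arguments from inside the application. The following is a minimal sketch using the standard RuntimeMXBean; the class name DebugFlagCheck and its helper method are hypothetical, and it only inspects the arguments the JVM reports, not flags changed later at runtime.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of async-profiler.
final class DebugFlagCheck {

    // True when both flags from the command above appear in the argument list.
    static boolean hasDebugNonSafepoints(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UnlockDiagnosticVMOptions")
                && jvmArgs.contains("-XX:+DebugNonSafepoints");
    }

    public static void main(String[] args) {
        // Arguments the current JVM was launched with (main-class arguments are excluded).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("DebugNonSafepoints requested: " + hasDebugNonSafepoints(jvmArgs));
    }
}
```

Logging this once at start-up makes it easy to tell later whether a given profile was collected with the more precise debug information.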

On Linux, you may also need to allow perf events for non-root users. If you see a permission error, run:

# Allow perf events (add to /etc/sysctl.conf for persistence)
echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict

Profiling inside Docker

If your application runs in a container — which, today, it almost certainly does — profiling requires a couple of extra steps. The container needs SYS_PTRACE capability and access to perf events:

# Copy async-profiler into your running container
# (<container> is the container name or ID, as shown by: docker ps)
docker cp async-profiler-3.0-linux-x64 <container>:/tmp/asprof

# Run the profiler inside the container (attach to PID 1 or your Java process)
docker exec <container> /tmp/asprof/bin/asprof -d 30 -f /tmp/profile.html 1

# Copy the result out
docker cp <container>:/tmp/profile.html ./profile.html

How to Actually Read a Flame Graph

This is where most tutorials either oversimplify or confuse. Let us be precise, because the flame graph layout has a few properties that catch developers off guard the first time.

The axes are not what you might expect

The most common misreading of a flame graph is treating the X axis as time. It is not. The X axis represents an alphabetically sorted aggregation of all collected stack traces. Width is proportional to the number of samples that included that frame — not the order in which things happened. Two methods running at different times but equally often will appear equally wide, even if one ran at 9am and the other at 9pm.

The Y axis is call depth. Frames at the bottom of the graph are closest to main() or a thread’s entry point. Frames at the very top are the innermost — the code that was actually running on the CPU when the sample was taken.
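To make the aggregation concrete, here is a minimal sketch of how individual stack samples can be folded into the sorted form a flame graph is drawn from. It is an illustration, not async-profiler's implementation: the class name StackFolder and the sample data are hypothetical, and sorting the joined string is a simplification of sorting frame by frame.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of stack folding.
final class StackFolder {

    // Each sample is one captured stack, outermost frame first.
    // Returns "frame;frame;frame" -> number of samples, sorted alphabetically by key.
    static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> folded = new TreeMap<>();
        for (List<String> stack : samples) {
            // Identical stacks are merged no matter when they were captured.
            folded.merge(String.join(";", stack), 1, Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
                List.of("main", "handle", "parseJson"),
                List.of("main", "handle", "queryDb"),
                List.of("main", "handle", "parseJson"),
                List.of("main", "gc"));
        fold(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

The two parseJson samples collapse into one entry with a count of 2 even though another sample was taken between them, which is why position along the X axis says nothing about when the code ran.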

The golden rule: look at the top, and look at the wide
The widest frames at the top of the graph (or plateau regions) are your hotspots — the code spending the most CPU time without calling anything else. That is where your optimisation effort should start. A wide frame at the bottom just means it is a common ancestor for many different code paths, not necessarily that it is slow.
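The difference between a wide frame at the bottom and a wide frame at the top is the difference between total samples and self samples. The sketch below computes both from a list of stacks; the class name FrameStats and the sample data are hypothetical, and it aggregates by method name, whereas a real flame graph keeps the same method in different call paths as separate rectangles.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of total versus self sample counts.
final class FrameStats {

    // Samples in which the frame appears anywhere in the stack (its width).
    static Map<String, Integer> totalSamples(List<List<String>> samples) {
        Map<String, Integer> total = new HashMap<>();
        for (List<String> stack : samples) {
            // A set avoids double-counting recursive frames within one sample.
            for (String frame : new HashSet<>(stack)) {
                total.merge(frame, 1, Integer::sum);
            }
        }
        return total;
    }

    // Samples in which the frame is the innermost one (its exposed top edge).
    static Map<String, Integer> selfSamples(List<List<String>> samples) {
        Map<String, Integer> self = new HashMap<>();
        for (List<String> stack : samples) {
            if (!stack.isEmpty()) {
                self.merge(stack.get(stack.size() - 1), 1, Integer::sum);
            }
        }
        return self;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
                List.of("main", "handle", "parseJson"),
                List.of("main", "handle", "parseJson"),
                List.of("main", "handle", "queryDb"),
                List.of("main", "handle"));
        System.out.println("total: " + totalSamples(samples));
        System.out.println("self:  " + selfSamples(samples));
    }
}
```

In this data main appears in every sample but never as the innermost frame, so it is wide yet not a hotspot, while parseJson accounts for half of the self samples and is the place to look first.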

Colour in async-profiler’s flame graphs

async-profiler uses colour to encode origin, not importance. This is critical to know so that you read the graph correctly rather than assuming the most visually striking colour is the problem:

Colour	What it means	Examples
🟩 Green	Java methods	Your application code, Spring Framework, Hibernate
🟨 Yellow	JVM code (C++)	JIT compilation, GC internals, class loading
🟥 Red	Native code	JNI calls, libc, other native libraries
🟧 Orange	Linux kernel	System calls, I/O, network stack, context switches

In practice, a healthy Spring Boot application will show a sea of green at the bottom (the framework bootstrap and request handling scaffolding) with narrower columns of application-specific green rising above it. Wide red or orange regions near the top of a stack often indicate expensive I/O or native library calls worth investigating.

An annotated flame graph

The interactive diagram below shows a representative flame graph from a Spring Boot application handling HTTP requests. Hover over any frame for details, and click any frame to zoom into that subtree — just as you would in a real async-profiler HTML output.

Figure: Annotated CPU flame graph, Spring Boot API (simulated). X axis = alphabetically sorted sample aggregation (not time). Width = proportion of total samples. Y axis = call depth. Top frames = actual CPU work.

Choosing the Right Profiling Mode

One of the most common mistakes developers make when starting with async-profiler is defaulting to CPU mode for every investigation. In reality, choosing the wrong mode can lead you to spend hours optimising something that is not actually the bottleneck. Consequently, understanding when to use each mode is just as important as knowing how to run the tool.

CPU mode (-e cpu)

Samples threads that are actively consuming CPU cycles. Use this when CPU utilisation is genuinely high. Will miss time spent blocked on I/O or locks.

Best for: compute-heavy work

Wall-clock mode (-e wall)

Samples all threads regardless of CPU state — including those blocked on I/O, sleeping, or waiting on locks. Best starting point for latency issues in distributed systems.

Best for: latency, I/O, DB calls

Allocation mode (-e alloc)

Samples stack traces at object allocation sites rather than recording every single allocation, which keeps the overhead low. Invaluable for finding allocation hot paths that stress the GC. The top frame is the allocated class.

Best for: GC pressure, memory leaks

Lock mode (-e lock)

Profiles contended monitor/lock attempts. The top frame shows the lock class; the counter is nanoseconds spent waiting. Essential for diagnosing thread contention.

Best for: thread contention
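To see the kind of event lock mode is built to surface, the following sketch forces one thread to wait for a monitor held by another and then reads the JVM's own blocked counter for that thread. The class ContentionDemo is hypothetical and uses only the standard ThreadMXBean; it demonstrates contention itself, not how async-profiler records it.

```java
import java.lang.management.ManagementFactory;

// Hypothetical demo of a contended monitor.
final class ContentionDemo {

    // Returns how many times the worker thread had to block to enter a monitor.
    static long blockedCountUnderContention() {
        final Object lock = new Object();
        final long[] blocked = new long[1];
        Thread worker = new Thread(() -> {
            synchronized (lock) { // blocks here until the main thread lets go
                blocked[0] = ManagementFactory.getThreadMXBean()
                        .getThreadInfo(Thread.currentThread().getId())
                        .getBlockedCount();
            }
        });
        try {
            synchronized (lock) {
                worker.start();
                // Hold the monitor until the worker is verifiably stuck waiting.
                while (worker.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1);
                }
            }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return blocked[0];
    }

    public static void main(String[] args) {
        System.out.println("worker blocked " + blockedCountUnderContention() + " time(s)");
    }
}
```

While the worker is blocked it consumes no CPU, so a CPU profile would not show this wait at all.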

Start with wall-clock mode for distributed systems

If your application talks to databases, message queues, or external APIs, most of its latency will not show up in CPU mode at all — because threads are blocked, not running. Wall-clock mode captures everything: time waiting on the database, time blocked on a lock, time in the GC. For most modern Spring Boot applications, this is the right first choice. Switch to CPU mode once you have confirmed that CPU consumption itself is the problem.
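The gap between the two modes is easy to reproduce. The sketch below measures one blocking call both ways, using Thread.sleep as a stand-in for a database round trip; the class name CpuVsWall and its methods are hypothetical, and it assumes the JVM supports per-thread CPU time measurement, which HotSpot normally does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical demo comparing elapsed time with CPU time for one task.
final class CpuVsWall {

    // Runs the task on the current thread; returns {wallNanos, cpuNanos}.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] {wall, cpu};
    }

    // Stand-in for a blocking call such as a database query.
    static void blockFor(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(() -> blockFor(200));
        System.out.printf("wall=%d ms, cpu=%d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

The wall figure reflects the full wait while the CPU figure stays close to zero, which is the time a CPU-only profile leaves out.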

Profiling modes at a glance

Figure: When to use each async-profiler mode. Illustrative comparison of time captured per mode for a typical REST API with DB calls.

Five Flame Graph Patterns and What They Mean

Once you can read a flame graph, the next skill is pattern recognition — knowing what to look for. Over time, certain shapes and structures start to jump out as clear signals.

1. The plateau: your number-one target

A wide, flat top — a frame that spans a large horizontal segment with no children above it — is a method spending CPU time without delegating to anything else. This is almost always the hotspot you want to fix first. Wide plateaus at the top of a CPU flame graph are the unmissable signal that async-profiler was designed to surface.

2. Tall narrow towers: deep call chains

A very tall, narrow column indicates a deep call chain where each frame calls exactly one other thing. By itself this is not alarming — it might just be a Spring interceptor chain or a deeply nested framework initialisation path. However, if the tower is wide and tall, it means a substantial portion of your samples are going through that entire chain, which is worth tracing back to root.

3. Unexpected yellow or orange regions

Significant amounts of JVM-internal (yellow) or kernel (orange) frames at the top of your stacks often indicate you are hitting infrastructure limits: excessive GC pressure (yellow/JVM frames), lots of system calls for I/O (orange), or memory pressure causing page faults. If your application logic looks lean but performance is poor, look here.

4. Wide bases of framework code

In a Spring Boot application you will typically see a wide base of green framework frames — DispatcherServlet, ThreadPoolExecutor, AbstractHandlerMethodAdapter — spanning most of the width. This is normal and expected. The interesting work is the application-specific green frames rising above the framework base. Only worry about framework frames if they are also wide at the top.

5. Allocation mode: the serialisation surprise

In allocation mode, JSON serialisation libraries (Jackson, Gson) and string formatting operations almost always appear prominently. This is because they create many short-lived objects. Most of the time this is fine — the GC handles it efficiently. However, if you are seeing frequent GC pauses, allocation mode will show you exactly which code paths are responsible, and you can decide whether object pooling, more efficient serialisers, or structural code changes are warranted.
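The effect of string formatting on allocation volume can be checked without a profiler. This sketch compares two ways of building the same text using the HotSpot-specific com.sun.management.ThreadMXBean allocation counter; the class AllocationProbe and its methods are hypothetical, and it assumes a HotSpot-based JVM with thread allocation measurement enabled (the default).

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe comparing allocation volume of two code paths.
final class AllocationProbe {

    // Bytes allocated by the current thread while running the task.
    static long allocatedBytes(Runnable task) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    // Formats each number separately; every call creates short-lived objects.
    static int formatEach(int n) {
        int chars = 0;
        for (int i = 0; i < n; i++) {
            chars += String.format("id=%d;", i).length();
        }
        return chars;
    }

    // Produces the same text through one reused builder.
    static int appendEach(int n) {
        StringBuilder sb = new StringBuilder();
        int chars = 0;
        for (int i = 0; i < n; i++) {
            sb.setLength(0);
            sb.append("id=").append(i).append(';');
            chars += sb.length();
        }
        return chars;
    }

    public static void main(String[] args) {
        System.out.println("String.format: " + allocatedBytes(() -> formatEach(10_000)) + " bytes");
        System.out.println("StringBuilder: " + allocatedBytes(() -> appendEach(10_000)) + " bytes");
    }
}
```

A difference here does not by itself justify a rewrite; it only matters if an allocation profile shows the code path is hot and GC pauses are actually a problem.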

Figure: async-profiler CPU overhead by sampling frequency. Approximate overhead on a Spring Boot app with 50 active threads, measured at different Hz settings.

A Practical Investigation Workflow

Knowing the theory is one thing — having a repeatable workflow for an actual production incident is another. Here is a battle-tested sequence that experienced JVM engineers tend to follow.

Start with wall-clock mode for 30–60 seconds under realistic load. This gives you the broadest view of where time is actually going, regardless of whether the culprit is CPU, I/O, or concurrency. Look at the resulting flame graph and ask: is the wide top-level work happening in framework code, application code, or native/kernel layers?

If application code is dominant, switch to CPU mode for a more focused picture of computational hotspots. If you are seeing unexpected yellow (JVM internals), consider running allocation mode to check for GC pressure. And if your application under load shows inconsistent latency even when CPU is not high, lock mode is your next stop — contended synchronisation is a classic source of latency spikes that CPU mode will completely miss.

# Run these from the extracted async-profiler directory (the launcher is bin/asprof in 3.0)

# Step 1: Wall-clock overview (works for all workloads)
./bin/asprof -e wall -t -d 60 -f /tmp/wall.html <pid>

# Step 2: CPU hotspot analysis (only if CPU is genuinely elevated)
./bin/asprof -e cpu -d 30 -f /tmp/cpu.html <pid>

# Step 3: Allocation pressure (if GC pauses are frequent)
./bin/asprof -e alloc -d 30 -f /tmp/alloc.html <pid>

# Step 4: Lock contention (if latency is high but CPU is not)
./bin/asprof -e lock -d 30 -f /tmp/lock.html <pid>

# Optional: output JFR format for IntelliJ or Java Mission Control
./bin/asprof -e cpu -o jfr -f /tmp/profile.jfr <pid>
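If you run this sequence often, it can be scripted. The sketch below only assembles the command lines from the flags used above (-e, -d, -f); the class AsprofCommand, its build method, and the launcher path /opt/async-profiler/bin/asprof are hypothetical, so adjust them to wherever you extracted the release.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds asprof command lines; it does not run them.
final class AsprofCommand {

    // asprofPath is an assumption: the location of the asprof launcher on your host.
    static List<String> build(String asprofPath, String event, int seconds,
                              String outputFile, long pid) {
        List<String> cmd = new ArrayList<>();
        cmd.add(asprofPath);
        cmd.add("-e");
        cmd.add(event);
        cmd.add("-d");
        cmd.add(Integer.toString(seconds));
        cmd.add("-f");
        cmd.add(outputFile);
        cmd.add(Long.toString(pid));
        return cmd;
    }

    public static void main(String[] args) {
        // Uses this JVM's own PID as a placeholder; substitute the target process.
        long pid = ProcessHandle.current().pid();
        for (String event : List.of("wall", "cpu", "alloc", "lock")) {
            List<String> cmd = build("/opt/async-profiler/bin/asprof",
                    event, 30, "/tmp/" + event + ".html", pid);
            System.out.println(String.join(" ", cmd));
        }
        // To execute a step: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the builder separate from execution makes it easy to review the exact commands before pointing them at a production process.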

IntelliJ IDEA Ultimate (2023.1+) bundles async-profiler natively. You can profile any run or debug configuration directly from the IDE: open Run → Edit Configurations → Profiler tab → select CPU or allocation mode → click “Run with Profiler”. The flame graph opens inline with source-level navigation — clicking any frame jumps straight to the corresponding code. This is by far the most convenient way to profile during development.

Common Pitfalls to Avoid

Mistake	What it looks like	Fix
Using CPU mode for I/O-bound apps	Flame graph looks thin; everything seems fast, but latency is high	Switch to wall-clock mode first
Profiling without -XX:+DebugNonSafepoints	Samples in JIT-compiled code are attributed to the nearest safepoint location, so simple inlined methods can be missing or misattributed	Add the flag when starting the JVM
Profiling without realistic load	Results show idle threads and GC; nothing meaningful	Run a load test (k6, wrk, JMeter) before starting the profiler
Treating the X axis as time	Thinking left-side frames ran first and right-side frames ran later	Remember: X is alphabetical aggregation, not temporal order
Assuming GC-heavy frames mean slow application code	Large yellow JVM GC frames near the top	Switch to allocation mode to find the root cause in application code

Always profile under production-like load
A flame graph from an idle application is nearly useless. The JIT compiler will not have warmed up the hot paths, many threads will be parked, and the profiles you collect will not reflect real usage patterns at all. Always run your profiling session while a realistic load test is in progress — otherwise you risk optimising code that is never a bottleneck in practice.

What We Learned

Flame graphs are an aggregated visualisation of stack trace samples, where the X axis is an alphabetically sorted aggregation of stacks (not time), width is proportional to sample count, and the Y axis represents call depth. The widest frames at the very top of a flame graph are your actual hotspots: the code spending the most CPU time without delegating elsewhere. async-profiler is the right tool to generate these on JVM applications because, unlike JVMTI-based profilers, it uses AsyncGetCallTrace and Linux perf_events to sample threads at arbitrary points and thereby avoids safepoint bias entirely.

Colour encodes origin: green is Java code, yellow is JVM internals, red is native, and orange is Linux kernel. Choosing the right profiling mode matters as much as reading the output: wall-clock mode captures all thread time regardless of blocking state and is the correct starting point for latency investigations in distributed systems, while CPU mode is only suitable when CPU utilisation is genuinely elevated. Running the profiler without production-realistic load, without -XX:+DebugNonSafepoints, or misreading the X axis as time are the most common ways developers get misleading results.
