About Pierre-Hugues Charbonneau

Pierre-Hugues Charbonneau (nickname P-H) has been working for CGI Inc. Canada for the last 10 years as a senior IT consultant. His primary area of expertise is Java EE, middleware & JVM technologies. He is a specialist in production system troubleshooting, root cause analysis, middleware, JVM tuning, scalability and capacity improvement, including internal process improvement for IT support teams. P-H is the principal author at Java EE Support Patterns.

JVM: How to analyze Thread Dump

This article will teach you how to analyze a JVM Thread Dump and pinpoint the root cause of your problem(s). From my perspective, Thread Dump analysis is the most important skill set to master for any individual involved in Java EE production support. The amount of information that you can derive from Thread Dump snapshots often goes well beyond what you might expect.

My goal is to share with you the Thread Dump analysis knowledge that I have accumulated over the last 10 years, i.e. hundreds of Thread Dump analysis cycles with dozens of common problem patterns across many JVM versions and JVM vendors.

Please bookmark this page and stay tuned for weekly articles.
Please also feel free to share this Thread Dump training plan with your work colleagues and friends.

Sounds good, I really need to improve my Thread Dump skills… so where do we start?

What I’m proposing to you is a complete Thread Dump training plan. The following items will be covered. I will also provide you with real-life Thread Dump examples that you can study and understand.

1)  Thread Dump overview & fundamentals
2)  Thread Dump generation techniques and available tools
3)  Thread Dump format differences between Sun HotSpot, IBM JRE and Oracle JRockit
4)  Thread Stack Trace explanation and interpretation
5)  Thread Dump analysis and correlation techniques
6)  Thread Dump common problem patterns (Thread race, deadlock, hanging IO calls, garbage collection / OutOfMemoryError problems, infinite looping etc.)
7)  Thread Dump examples via real life case studies

I really hope this Thread Dump analysis training plan will be beneficial for you, so please stay tuned for weekly updates and articles!

But what if I still have questions or am still struggling to understand these training articles?

Don’t worry, and please consider me your trainer. I strongly encourage you to ask me any question on Thread Dumps (remember, there are no stupid questions), so I propose the following options to you for free; simply choose the communication model that you are most comfortable with:

1)  Submit your Thread Dump related question(s) by posting your comment(s) below the article (please feel free to remain Anonymous)
2)  Submit your Thread Dump data to the Root Cause Analysis forum
3)  Email me your Thread Dump related question(s) at phcharbonneau@hotmail.com

Can I send you my Thread Dump data from my production environment / servers?

Yes, please feel free to send me your generated Thread Dump data via email or the Root Cause Analysis forum if you wish to discuss the root cause of your problem(s). Real-life Thread Dump analysis is always the best way to learn.

I really hope that you will enjoy and share this Thread Dump analysis training plan. I will do my very best to provide you with quality material and answers to any question.

Before going deeper into Thread Dump analysis and problem patterns, it is very important that you understand the fundamentals. This post covers the basics and will help you better understand the interaction between the JVM, the middleware and your Java EE container.

Java VM overview

The Java virtual machine is really the foundation of any Java EE platform. This is where your middleware and applications are deployed and active.

The JVM provides the middleware software and your Java / Java EE program with:

- A runtime environment for your Java / Java EE program (bytecode format)
- Several program features and utilities (IO facilities, data structures, Thread management, security, monitoring etc.)
- Dynamic memory allocation and management via the garbage collector

Your JVM can run on many operating systems (Solaris, AIX, Windows etc.) and, depending on your physical server specifications, you can install 1…n JVM processes per physical / virtual server.

JVM and Middleware software interactions

Find below a diagram showing you a high level interaction view between the JVM, middleware and application(s).

This diagram shows a typical and simple interaction between the JVM, middleware and application. As you can see, Thread allocation for a standard Java EE application is done mainly between the middleware kernel itself and the JVM (there are some exceptions where the application itself or some APIs create Threads directly, but this is not common and must be done very carefully).

Also, please note that certain Threads, such as GC (garbage collection) Threads, are managed internally by the JVM itself in order to handle concurrent garbage collections.

Since most of the Thread allocations are done by the Java EE container, it is important that you understand and recognize the Thread Stack Trace and identify it properly in the Thread Dump data. This will allow you to quickly understand the type of request that the Java EE container is attempting to execute.

From a Thread Dump analysis perspective, you will learn how to differentiate between the different Thread Pools found in the JVM and identify the request type.
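One practical reason Thread Pools can be told apart in a dump is that containers give their worker Threads descriptive names. The following is a minimal sketch, assuming a plain java.util.concurrent executor rather than any real middleware: the hypothetical NamedThreadFactory prefixes each worker name with a made-up pool name (my-app-pool) and marks the Thread as daemon, so its workers would be recognizable by name in a Thread Dump.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedPoolDemo {

    // Hypothetical factory: names each Thread "<pool>-worker-<n>" and marks it as daemon,
    // so the pool can be recognized by name in a Thread Dump.
    static final class NamedThreadFactory implements ThreadFactory {
        private final String poolName;
        private final AtomicInteger counter = new AtomicInteger(1);

        NamedThreadFactory(String poolName) {
            this.poolName = poolName;
        }

        @Override
        public Thread newThread(Runnable task) {
            Thread t = new Thread(task, poolName + "-worker-" + counter.getAndIncrement());
            t.setDaemon(true); // daemon Threads do not keep the JVM alive on shutdown
            return t;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, new NamedThreadFactory("my-app-pool"));
        // The task reports the name and daemon flag of the worker Thread that runs it.
        String info = pool.submit(() -> {
            Thread current = Thread.currentThread();
            return current.getName() + " daemon=" + current.isDaemon();
        }).get();
        System.out.println(info); // prints my-app-pool-worker-1 daemon=true
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Real containers use their own naming schemes (the WebLogic ExecuteThread names in the sample dump are one example); the point of the sketch is only that a consistent prefix makes a pool easy to group when scanning a dump.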

This last section will provide you with an overview of what a JVM Thread Dump is for the HotSpot VM and the different Threads that you will find. Details on the IBM VM Thread Dump format will be provided in part 4.

Please note that the Thread Dump sample used for this article is available on the Root Cause Analysis forum.

JVM Thread Dump – what is it?

A JVM Thread Dump is a snapshot taken at a given time which provides you with a complete listing of all the Java Threads alive in the JVM at that moment.

Each individual Java Thread found gives you information such as:

- Thread name: often used by middleware vendors to identify the Thread Id along with its associated Thread Pool name and state (running, stuck etc.)

- Thread type & priority, ex: daemon prio=3. Middleware software typically creates its Threads as daemon Threads, meaning they run in the background (and do not prevent the JVM from exiting), providing services to their user, e.g. your Java EE application.

- Java Thread ID, ex: tid=0x000000011e52a800. In HotSpot, this hexadecimal value is the address of the JVM's internal native structure for the Thread; despite the label, it is not the value returned by java.lang.Thread.getId().

- Native Thread ID, ex: nid=0x251c. This is crucial information, as the native Thread Id allows you to correlate, for example, which Threads from an OS perspective are using the most CPU within your JVM.

- Java Thread State and detail, ex: waiting for monitor entry [0xfffffffea5afb000] java.lang.Thread.State: BLOCKED (on object monitor). This allows you to quickly learn about the Thread state and its potential current blocking condition.

- Java Thread Stack Trace: this is by far the most important data that you will find in the Thread Dump. This is also where you will spend most of your analysis time, since the Java Stack Trace provides you with 90% of the information that you need in order to pinpoint the root cause of many problem pattern types, as you will learn later in the training sessions.

- Java Heap breakdown (reported once for the whole JVM rather than per Thread): starting with HotSpot VM 1.6, you will also find at the bottom of the Thread Dump snapshot a breakdown of the HotSpot memory space utilization, such as your Java Heap (YoungGen, OldGen) & PermGen space. This is quite useful when excessive GC is suspected as a possible root cause, since you can correlate it directly with the Thread data / patterns found.

Heap
PSYoungGen      total 466944K, used 178734K [0xffffffff45c00000, 0xffffffff70800000, 0xffffffff70800000)
eden space 233472K, 76% used [0xffffffff45c00000,0xffffffff50ab7c50,0xffffffff54000000)
from space 233472K, 0% used [0xffffffff62400000,0xffffffff62400000,0xffffffff70800000)
to   space 233472K, 0% used [0xffffffff54000000,0xffffffff54000000,0xffffffff62400000)
PSOldGen        total 1400832K, used 1400831K [0xfffffffef0400000, 0xffffffff45c00000, 0xffffffff45c00000)
object space 1400832K, 99% used [0xfffffffef0400000,0xffffffff45bfffb8,0xffffffff45c00000)
PSPermGen       total 262144K, used 248475K [0xfffffffed0400000, 0xfffffffee0400000, 0xfffffffef0400000)
object space 262144K, 94% used [0xfffffffed0400000,0xfffffffedf6a6f08,0xfffffffee0400000)
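To make the per-Thread header fields above concrete, here is a sketch of a parser for a single Thread header line, such as the WebLogic ExecuteThread line from the sample dump. It assumes the JDK 6-era HotSpot layout shown in this article (quoted name, optional daemon, prio, tid, nid, then the state detail); newer JDKs print additional fields (for example os_prio) that this regular expression does not handle. The ThreadHeader record and its field names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadHeaderParser {

    // Hypothetical holder for the fields of one HotSpot Thread header line.
    record ThreadHeader(String name, boolean daemon, int priority,
                        String tid, String nid, String detail) {}

    // Assumed JDK 6-era layout: "name" [daemon] prio=N tid=0x... nid=0x... <state detail>
    private static final Pattern HEADER = Pattern.compile(
            "^\"([^\"]*)\"\\s+(daemon\\s+)?prio=(\\d+)\\s+tid=(0x[0-9a-f]+)\\s+nid=(0x[0-9a-f]+)\\s*(.*)$");

    // Returns null when the line is not a Thread header (e.g. a stack frame line).
    static ThreadHeader parse(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new ThreadHeader(
                m.group(1),                   // Thread name
                m.group(2) != null,           // daemon keyword present?
                Integer.parseInt(m.group(3)), // priority
                m.group(4),                   // tid, kept exactly as printed
                m.group(5),                   // nid (native Thread id, hex)
                m.group(6));                  // state detail, e.g. "in Object.wait() [...]"
    }

    public static void main(String[] args) {
        String line = "\"[STANDBY] ExecuteThread: '414' for queue: 'weblogic.kernel.Default (self-tuning)'\" "
                + "daemon prio=3 tid=0x000000010916a800 nid=0x2613 in Object.wait() [0xfffffffe9edff000]";
        System.out.println(parse(line));
    }
}
```

Keeping tid and nid as the printed hex strings avoids any assumption about their meaning; converting nid to a number is only needed when correlating with OS tools.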

Thread Dump breakdown overview

In order for you to better understand, find below a diagram showing you a visual breakdown of a HotSpot VM Thread Dump and its common Thread Pools found:

As you can see, there are several pieces of information that you can find in a HotSpot VM Thread Dump. Some of these pieces will be more important than others depending on your problem pattern (problem patterns will be simulated and explained in future articles).

For now, find below a detailed explanation for each Thread Dump section as per our sample HotSpot Thread Dump:

# Full thread dump identifier
This is basically the unique keyword that you will find in your middleware / standalone Java standard output log once you generate a Thread Dump (ex: via kill -3 <PID> on UNIX). This is the beginning of the Thread Dump snapshot data.

Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode):
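Besides kill -3, a Thread snapshot can also be captured from inside the JVM. The sketch below is not how the kill -3 output is produced; it uses the standard java.lang.management API (ThreadMXBean.dumpAllThreads) to build a simplified text dump. The output layout is an assumption of this example: it shows the Java Thread id and state but no tid/nid values, and it only covers Java Threads, not internal VM Threads such as the GC task threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class InProcessThreadDump {

    // Builds a simplified text dump of all live Java Threads via the standard management API.
    // This layout is specific to this sketch; it is not the kill -3 / HotSpot format.
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Request monitor / synchronizer details only where the JVM supports them.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(), bean.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId())
              .append(' ').append(info.getThreadState()).append('\n');
            // Print every frame explicitly; ThreadInfo.toString() truncates long stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("        at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

For production troubleshooting, the native dump (kill -3 or the JDK's jstack tool) remains the richer source, since it includes the nid values needed for OS-level correlation.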

# Java EE middleware, third party & custom application Threads
This portion is the core of the Thread Dump and where you will typically spend most of your analysis time. The number of Threads found will depend on the middleware software that you use, third-party libraries (which might have their own Threads) and your application (if it creates any custom Threads, which is generally not a best practice).

In our sample Thread Dump, WebLogic is the middleware used. Starting with WebLogic 9.2, a self-tuning Thread Pool is used, with the unique identifier 'weblogic.kernel.Default (self-tuning)':

"[STANDBY] ExecuteThread: '414' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=3 tid=0x000000010916a800 nid=0x2613 in Object.wait() [0xfffffffe9edff000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xffffffff27d44de0> (a weblogic.work.ExecuteThread)
        at java.lang.Object.wait(Object.java:485)
        at weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:160)
        - locked <0xffffffff27d44de0> (a weblogic.work.ExecuteThread)
        at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
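The STANDBY ExecuteThread above is an idle worker parked in Object.wait() until a new request arrives. The sketch below reproduces that shape with a hypothetical IdleWorkerDemo class (not WebLogic code): the worker takes the object monitor, finds no work and calls wait(), which releases the monitor. A Thread Dump taken at that point should show it in Object.wait() with java.lang.Thread.State: WAITING (on object monitor), similar to the sample.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class IdleWorkerDemo {

    private final Queue<Runnable> requests = new ArrayDeque<>();

    // Hypothetical worker loop, loosely modeled on the waitForRequest() pattern above:
    // the Thread takes the monitor, then calls wait(), which releases it until notified.
    void runWorker() {
        while (true) {
            Runnable next;
            synchronized (this) {
                while (requests.isEmpty()) {
                    try {
                        wait(); // appears as "in Object.wait()" / WAITING (on object monitor)
                    } catch (InterruptedException e) {
                        return; // interruption is used here as a simple shutdown signal
                    }
                }
                next = requests.poll();
            }
            next.run(); // run the request outside the monitor
        }
    }

    // Hands a request to the worker and wakes it up.
    synchronized void submit(Runnable request) {
        requests.add(request);
        notifyAll();
    }

    // Polls the Thread state until it matches or the timeout expires.
    static boolean awaitState(Thread t, Thread.State expected, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (t.getState() != expected) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            Thread.onSpinWait();
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        IdleWorkerDemo demo = new IdleWorkerDemo();
        Thread worker = new Thread(demo::runWorker, "demo-idle-worker");
        worker.setDaemon(true);
        worker.start();
        if (awaitState(worker, Thread.State.WAITING, 5_000)) {
            // Take a Thread Dump now (kill -3 or jstack) to see this Thread parked in Object.wait().
            System.out.println(worker.getName() + " is " + worker.getState());
        }
        worker.interrupt();
        worker.join();
    }
}
```

A Thread in this state is idle, not stuck: it holds no monitor while waiting, so a large number of such STANDBY workers usually just means spare pool capacity.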

# HotSpot VM Thread
This is an internal Thread managed by the HotSpot VM in order to perform internal native operations. Typically you should not worry about this one unless you see high CPU (via Thread Dump & prstat / native Thread id correlation).

"VM Periodic Task Thread" prio=3 tid=0x0000000101238800 nid=0x19 waiting on condition

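The native Thread id correlation mentioned above usually requires a base conversion: OS tools such as prstat -L on Solaris or top -H on Linux list Thread (LWP) ids in decimal, while HotSpot prints nid in hexadecimal. A minimal sketch, assuming that the nid value matches the OS-level Thread id your tool reports (the case for HotSpot on Linux; verify for your platform). NidConverter is a hypothetical helper name.

```java
import java.util.Locale;

public class NidConverter {

    // Decimal OS Thread id (e.g. from top -H or prstat -L) -> nid string as printed by HotSpot.
    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    // nid string from a Thread Dump (e.g. "0x251c") -> decimal OS Thread id.
    static long fromNid(String nid) {
        String hex = nid.toLowerCase(Locale.ROOT).startsWith("0x") ? nid.substring(2) : nid;
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(toNid(9500));    // prints 0x251c
        System.out.println(fromNid("0x3")); // prints 3
    }
}
```

For example, the WebLogic ExecuteThread header in the sample dump shows nid=0x2613, which corresponds to OS Thread id 9747.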
# HotSpot GC Thread
When using HotSpot parallel GC (quite common these days on multi-core hardware), the HotSpot VM creates, by default or as per your JVM tuning, a certain number of GC Threads. These GC Threads allow the VM to perform its periodic GC cleanups in parallel, leading to an overall reduction of the GC time at the expense of increased CPU utilization.

"GC task thread#0 (ParallelGC)" prio=3 tid=0x0000000100120000 nid=0x3 runnable
"GC task thread#1 (ParallelGC)" prio=3 tid=0x0000000100131000 nid=0x4 runnable
………………………………………………………………………………………………………………………………………………………………

This is crucial data as well since, when facing GC-related problems such as excessive GC, memory leaks etc., you will be able to correlate any high CPU observed from the OS / Java process(es) with these Threads using their native id value (nid=0x3). You will learn how to identify and confirm this problem in future articles.
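A Thread Dump shows which GC Threads exist, but not how much time the collectors are taking. As a complementary check, here is a sketch, assuming that sampling from inside the JVM is acceptable in your environment, that uses the standard GarbageCollectorMXBean API to estimate the share of wall-clock time spent in GC between two samples. getCollectionTime() returns an approximate accumulated elapsed time in milliseconds (or -1 if undefined), and its accounting varies by collector, so treat the percentage as a rough indicator, not a CPU measurement. GcSampler is a hypothetical helper.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivity {

    // Sum of accumulated collection time (ms) across all collectors in this JVM.
    // getCollectionTime() returns -1 when undefined, so such values are skipped.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Hypothetical helper: share of wall-clock time spent in GC since the previous sample.
    static final class GcSampler {
        private long lastGcMillis = totalGcTimeMillis();
        private long lastNanos = System.nanoTime();

        double sample() {
            long gcNow = totalGcTimeMillis();
            long now = System.nanoTime();
            long elapsedMillis = (now - lastNanos) / 1_000_000;
            long gcDelta = gcNow - lastGcMillis;
            lastGcMillis = gcNow;
            lastNanos = now;
            return elapsedMillis <= 0 ? 0.0 : 100.0 * gcDelta / elapsedMillis;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GcSampler sampler = new GcSampler();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1_000);
            System.out.printf("GC time over last interval: %.1f%%%n", sampler.sample());
        }
        // Cumulative figures per collector, useful to compare against -verbose:gc output.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

If the GC share rises at the same time as the CPU spikes attributed (via nid) to the GC task threads, that supports the excessive GC hypothesis; if not, the CPU consumer is more likely elsewhere.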

# JNI global references count
JNI (Java Native Interface) global references are basically Object references from the native code to a Java object managed by the Java garbage collector. Its role is to prevent collection of an object that is still in use by native code but technically with no “live” references in the Java code.

It is also important to keep an eye on JNI references in order to detect JNI-related leaks. This can happen if your program uses JNI directly or relies on third-party tools, such as monitoring tools, which are prone to native memory leaks.

JNI global references: 1925

# Java Heap utilization view
This data was added in JDK 1.6 and provides you with a short and fast view of your HotSpot Heap. I find it quite useful when troubleshooting GC-related problems along with high CPU, since you get both the Thread Dump & Java Heap in a single snapshot, allowing you to determine (or rule out) any pressure point in a particular Java Heap memory space along with the Thread computations in progress at that time. As you can see in our sample Thread Dump, the Java Heap OldGen is maxed out!

Heap
 PSYoungGen      total 466944K, used 178734K [0xffffffff45c00000, 0xffffffff70800000, 0xffffffff70800000)
  eden space 233472K, 76% used [0xffffffff45c00000,0xffffffff50ab7c50,0xffffffff54000000)
  from space 233472K, 0% used [0xffffffff62400000,0xffffffff62400000,0xffffffff70800000)
  to   space 233472K, 0% used [0xffffffff54000000,0xffffffff54000000,0xffffffff62400000)
 PSOldGen        total 1400832K, used 1400831K [0xfffffffef0400000, 0xffffffff45c00000, 0xffffffff45c00000)
  object space 1400832K, 99% used [0xfffffffef0400000,0xffffffff45bfffb8,0xffffffff45c00000)
 PSPermGen       total 262144K, used 248475K [0xfffffffed0400000, 0xfffffffee0400000, 0xfffffffef0400000)
  object space 262144K, 94% used [0xfffffffed0400000,0xfffffffedf6a6f08,0xfffffffee0400000)
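The utilization of each space is simple arithmetic on these lines: used / total. For the PSOldGen line, 1400831K / 1400832K is essentially 100%, which is why the OldGen is described as maxed out, while PSPermGen sits at about 94.8%. A small sketch that extracts these percentages, assuming the JDK 6 Parallel GC line format shown above (other collectors and newer JDKs print different space names and layouts); HeapSummaryParser is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeapSummaryParser {

    // Matches space summary lines such as
    // " PSOldGen        total 1400832K, used 1400831K [...]"
    private static final Pattern SPACE =
            Pattern.compile("^\\s*(\\w+)\\s+total (\\d+)K, used (\\d+)K");

    // Returns space name -> utilization percentage for every matching line.
    static Map<String, Double> utilization(String heapSection) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : heapSection.split("\\R")) {
            Matcher m = SPACE.matcher(line);
            if (m.find()) {
                long total = Long.parseLong(m.group(2));
                long used = Long.parseLong(m.group(3));
                result.put(m.group(1), total == 0 ? 0.0 : 100.0 * used / total);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String heap = String.join("\n",
                " PSYoungGen      total 466944K, used 178734K [0xffffffff45c00000, 0xffffffff70800000, 0xffffffff70800000)",
                "  eden space 233472K, 76% used [0xffffffff45c00000,0xffffffff50ab7c50,0xffffffff54000000)",
                " PSOldGen        total 1400832K, used 1400831K [0xfffffffef0400000, 0xffffffff45c00000, 0xffffffff45c00000)",
                " PSPermGen       total 262144K, used 248475K [0xfffffffed0400000, 0xfffffffee0400000, 0xfffffffef0400000)");
        // Lines like "eden space ..." do not match the pattern and are skipped.
        utilization(heap).forEach((space, pct) ->
                System.out.printf("%-10s %6.2f%%%n", space, pct));
    }
}
```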

I hope this article has helped you understand the basic view of a HotSpot VM Thread Dump. The next article will provide you with this same Thread Dump overview and breakdown for the IBM VM.

Please feel free to post any comment or question.

Reference: How to analyze Thread Dump – part 1, How to analyze Thread Dump – Part 2: JVM Overview and How to analyze Thread Dump – Part 3: HotSpot VM, from our JCG partner Pierre-Hugues Charbonneau at the Java EE Support Patterns & Java Tutorial blog.


9 Responses to "JVM: How to analyze Thread Dump"

  1. Hoàng Hưng says:

    thanks

    • Pierre-Hugues Charbonneau says:

      Hi,

      Please note that I will shortly release the next series, which will include a Thread Dump Stack Trace analysis approach and common problem patterns, and which will be available to share on Java Code Geeks. Thread Dump analysis complexity is often due to the many different problem patterns, so this should really help any Java / Java EE individual involved in either application support or development (load testing etc.) to identify such patterns more quickly. This series will also include data correlation techniques, especially for problems related to high CPU, excessive garbage collection etc.

      I’m also looking forward to any feedback and “wish list” of what aspect of Thread Dump analysis you want more detail on.

      Regards,

      Pierre-Hugues

  2. This “JVM: How to analyze Thread Dump” article of yours is ridiculously similar to “How to Analyze Java Thread Dumps” http://www.cubrid.org/blog/dev-platform/how-to-analyze-java-thread-dumps/ published a couple of months ago by CUBRID developers on their official blog.
    I strongly believe these two should go together hand in hand, as they are mutually supportive. Great article, by the way!

  3. C Jacob says:

    My application is running in Tomcat.
    If the server runs for many days without restarting Tomcat, we face a problem: the server CPU usage goes above 100% very frequently and comes back to normal immediately. By the time we take a thread dump, it has come back to normal. However, on continuous observation, we found that the high-CPU nid points to this line in the thread dump:

    “Concurrent Mark-Sweep GC Thread” prio=10 tid=0x00007fb3581e3800 nid=0x74f3 runnable

    A few more lines are given below:

    “VM Thread” prio=10 tid=0x00007fb35826e800 nid=0x74f4 runnable

    “Gang worker#0 (Parallel GC Threads)” prio=10 tid=0x00007fb358012800 nid=0x74d7 runnable

    “Gang worker#1 (Parallel GC Threads)” prio=10 tid=0x00007fb358014000 nid=0x74d8 runnable

    “Gang worker#2 (Parallel GC Threads)” prio=10 tid=0x00007fb358016000 nid=0x74d9 runnable

    “Gang worker#3 (Parallel GC Threads)” prio=10 tid=0x00007fb358018000 nid=0x74da runnable

    “Gang worker#4 (Parallel GC Threads)” prio=10 tid=0x00007fb358019800 nid=0x74dc runnable

    “Gang worker#5 (Parallel GC Threads)” prio=10 tid=0x00007fb35801b800 nid=0x74dd runnable

    “Gang worker#6 (Parallel GC Threads)” prio=10 tid=0x00007fb35801d800 nid=0x74de runnable

    “Gang worker#7 (Parallel GC Threads)” prio=10 tid=0x00007fb35801f000 nid=0x74df runnable

    “Gang worker#8 (Parallel GC Threads)” prio=10 tid=0x00007fb358021000 nid=0x74e0 runnable

    “Gang worker#9 (Parallel GC Threads)” prio=10 tid=0x00007fb358023000 nid=0x74e1 runnable

    “Gang worker#10 (Parallel GC Threads)” prio=10 tid=0x00007fb358024800 nid=0x74e2 runnable

    “Gang worker#11 (Parallel GC Threads)” prio=10 tid=0x00007fb358026800 nid=0x74e3 runnable

    “Gang worker#12 (Parallel GC Threads)” prio=10 tid=0x00007fb358028800 nid=0x74e4 runnable

    “Gang worker#13 (Parallel GC Threads)” prio=10 tid=0x00007fb35802a000 nid=0x74e5 runnable

    “Gang worker#14 (Parallel GC Threads)” prio=10 tid=0x00007fb35802c000 nid=0x74e6 runnable

    “Gang worker#15 (Parallel GC Threads)” prio=10 tid=0x00007fb35802e000 nid=0x74e7 runnable

    “Concurrent Mark-Sweep GC Thread” prio=10 tid=0x00007fb3581e3800 nid=0x74f3 runnable

    “Gang worker#0 (Parallel CMS Threads)” prio=10 tid=0x00007fb3581df000 nid=0x74f1 runnable

    “Gang worker#1 (Parallel CMS Threads)” prio=10 tid=0x00007fb3581e1000 nid=0x74f2 runnable

    “VM Periodic Task Thread” prio=10 tid=0x00007fb35849c800 nid=0x7501 waiting on condition

    JNI global references: 6075

    Any help….

    • Hi C Jacob,

      It is likely that your problem is related to a major collection. You have quite a few GC threads created; I recommend verifying whether the increased CPU is due to GC threads performing excessive garbage collection, which is typical during a major collection. With 16 GC threads running concurrently, this can significantly impact CPU utilization until the memory is fully reclaimed.

      The CMS collector tends to fragment the old generation over time, at some point requiring a major collection, which can trigger quite a large surge in your CPU utilization.

      One more recommendation: enable -verbose:gc or use JVisualVM to monitor your heap utilization. This will help you correlate the GC process with your CPU utilization spikes.

      Regards,
      P-H

  4. Wallace Roberto says:

    Hi, thanks very much.

    I’ve started working with WebLogic support, and I have some trouble identifying the problems that can be discovered by analyzing a thread dump.

    Regards

  5. James says:

    Why do you write that creating custom threads is not best practice?

  6. Hank says:

    The system was built on:
    J2EE/Struts
    Tomcat Catalina
    IIS

    We frequently get Tomcat crash problems, even after recently adding enough RAM.


Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
