One key concept to understand for a Java VM running on the Linux OS is that Java threads are implemented as native Threads, which results in each thread being a separate Linux process.
Ok thanks for the info but why is this related to prstat?
Well this key concept means that you don’t need a prstat command for Linux. Since each Java VM Thread is implemented as a native Thread, each Thread CPU % can simply be extracted out-of-the-box using the top command.
You still need to generate Thread Dump data of your JVM process in order to correlate with the Linux top command output.
Thanks for this explanation. Now please show me how to do this
Please simply follow the instructions below:
1) Execute the top command (press SHIFT-H to get the Threads toggle view) or use -H option (to show all Threads) and find the PID associated with your affected / high CPU WLS process(es) (remember, many may show up since each Java Thread is implemented as a separate Linux process)
2) Immediately after, generate a few Thread Dump snapshots using kill –3 <PID> of the parent WLS process. Thread Dump provides you with the complete list with associated Stack Trace of each Java Thread within your JVM process
3) Now, convert the PID(s) extracted from the top command output to HEX format
4) The next step is to search from the Thread Dump data for a match nid=<HEX PID>
5) The final step is to analyze the affected Thread(s) and analyze the Stack Trace so you can determine where in the code is the problem (application code, middleware itself, JDK etc.)
Example: top command captured of a Weblogic Server Java Thread running at 40% CPU utilization along with Thread Dump data via kill -3 <PID>
## top output sample
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
...........
22111 userWLS 9 0 86616 84M 26780 S 0.0 40.1 0:00 java
- Decimal to HEX conversion of Java Thread (native Thread) 22111 >> 0x565F
- Now using the HEX value, we can search within the Thread Dump for the following keyword: nid=0x565F
## Thread Dump output sample Thread as per the above search criteria nid=0x565F
"ExecuteThread: '0' for queue: 'default'" daemon prio=1 tid=0x83da550 nid=0x565F waiting on monitor [0x56138000..0x56138870]
at java.util.zip.ZipFile.getEntry(Native Method)
at java.util.zip.ZipFile.getEntry(ZipFile.java:172)
at java.util.jar.JarFile.getEntry(JarFile.java:269)
at java.util.jar.JarFile.getJarEntry(JarFile.java:252)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:989)
at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:967)
at sun.misc.URLClassPath.findResource(URLClassPath.java:262)
at java.net.URLClassLoader$4.run(URLClassLoader.java:763)
at java.security.AccessController.doPrivileged(AccessController.java:224)
at java.net.URLClassLoader.findResource(URLClassLoader.java:760)
at java.lang.ClassLoader.getResource(ClassLoader.java:444)
at java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:504)
............................................
In the above example, the problem was related to an excessive class loading / IO problem.
As you can see, the approach did allow us to quickly pinpoint high CPU Thread contributor but you will need to spend additional time to analyze the root cause which is now your job.
Need any additional help?
I hope this short tutorial has helped you understand how you can pinpoint high CPU Thread contributors for your JVM running on the Linux OS.
For any question or additional help please simply post a comment or question below this article. You can also email me directly @phcharbonneau@hotmail.com.
12 comments:
This can be very handy while troubleshooting deadlock in Java or debugging infinite loops on java program. Its much easier in Linux and take less time to analyze thread dumps. Thanks for sharing these nice tips.
Javin
How to find if Thread holds lock in Java
Thanks for the great article.
I have a query to you.
As per the instructions given..I have run the top command...noted down few PIDs which are having high CPU %...then I pressed Ctrl+C in my putty window.. Then I typed command...kill -3 ..I am getting following message...
-bash: kill: (29632) - Operation not permitted
Am I getting the above because of lack of permissions on Linux box?
Can I skip the second instruction move to third to identify the troublesome java code?
Thanks in advance for your reply.
Thanks Javatechie,
This is because your Linux user does not have permission to run the kill -3 command againts the Java process. This is critical command to run as it will generate a JVM Thread Dump. Please ask your sys Admin to give you such right or ask him to run the command using the user owner of the Java process.
Thanks.
P-H
Thanks for the quick response.
When u say user owner of the java process, are you referring to the user who had deployed and started the jboss(our application server) server?
thanks
Siva
Exactly, you can also see the owner of the Java process via the Top command. For example, in this article, the user owning the Java process is "userWLS".
Regards,
P-H
Where can I find the "Thread Dump output"? Is it in a file? Java console?
Hi SSH,
Thread Dump output will depend of the Java VM vendor.
JRockit & Sun/Oracle HotSpot: found in the server standard output log (look for keyword Full thread dump...). You can also use JDK jstack utility to generate thread dump. Output will be visible from your SSH/telnet screen/session.
IBM Java VM: Thread Dump generated as separate javacorexyz file.
Regards,
P-H
Hi P-H,
I am using UNIX and having this difficulty while using the steps mentioned above.
1) top -H does not work. So I am getting the PID from "top" command. Converting it into Hex.
2) Generating the Thread Dump using jstack, as I don't want to kill the process.
3) I can't find the HEX PID in the Thread Dump.
How should I analyze the thread dump then?
Thanks,
DB
Hi Divs,
You need to search for "nid" in the thread Dump.
Hi , It is very use full for me ,but your explain little bit briefly means i can understand very well.Thank you.
awesome article
Hi
Currently my application is facing high CPU load. i went through your article and as per that it points to following nid. how can i come to a conclusion why this is happening.
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fa0a0021800 nid=0x18cc runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fa0a0023800 nid=0x18cd runnable
"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fa0a0025000 nid=0x18ce runnable
"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fa0a0027000 nid=0x18cf runnable
"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fa0a0029000 nid=0x18d0 runnable
"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fa0a002b000 nid=0x18d1 runnable
"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fa0a002d000 nid=0x18d2 runnable
"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fa0a002f000 nid=0x18d3 runnable
Post a Comment