/ HashMap.get High CPU – Case Study ~ Java EE Support Patterns

2.29.2012

HashMap.get High CPU – Case Study

This article will describe the complete root cause analysis and solution of a HashMap High CPU problem (infinite looping) affecting a Weblogic 10.0 environment running on the Java HotSpot VM 1.5.

This case study will again demonstrate this importance of mastering Thread Dump analysis skill and CPU correlation techniques such as Solaris prstat.

Environment specifications

-        Java EE server: Oracle Weblogic Portal 10.0
-        Middleware OS: Solaris 10
-        Java VM: Java HotSpot VM 1.5
-        Platform type: Portal application

Monitoring and troubleshooting tools

-        JVM Thread Dump (HotSpot format)
-        Solaris prstat (CPU contributors analysis)

Problem overview

-        Problem type: High CPU observed from our Weblogic production environment

A high CPU problem was observed from our Solaris physical servers hosting a Weblogic Portal 10 environment. Users also reporting major slowdown of the portal application.

Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering of technical and non-technical facts so we can either derived other facts and/or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:

·        What is the client impact? HIGH
·        Recent change of the affected platform? No
·        Any recent traffic increase to the affected platform? Yes
·        How does this high CPU manifest itself?  A sudden CPU increase was observed and is not going down; even after load goes down e.g. near zero level.
·        Did an Oracle OSB recycle resolve the problem? Yes, but problem is returning after few hours or few days (unpredictable pattern)

-        Conclusion #1: The high CPU problem appears to be intermittent vs. pure correlation with load
-        Conclusion #2: Since high CPU remains after load goes down, this typically indicates either the presence of some infinite looping or heavy processing Threads

Solaris CPU analysis using prstat

Solaris prstat is a powerful OS command allowing you to obtain the CPU per process but more importantly CPU per Thread within a process. As you can see below from our case study, the CPU utilization was confirmed to go up as high as 100% utilization (saturation level).

## PRSTAT (CPU per Java Thread analysis)
prstat -L -p 8223 1 1

PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME      CPU  PROCESS/LWPID
8223 bea10      2809M 2592M sleep   59    0  14:52:59        38.6% java/494
8223 bea10      2809M 2592M sleep   57    0   12:28:05       22.3% java/325
8223 bea10      2809M 2592M sleep   59    0   11:52:02       28.3% java/412
8223 bea10      2809M 2592M sleep   59    0   5:50:00         0.3% java/84
8223 bea10      2809M 2592M sleep   58    0   2:27:20         0.2% java/43
8223 bea10      2809M 2592M sleep   59    0   1:39:42         0.2% java/41287
8223 bea10      2809M 2592M sleep   59    0   4:41:44         0.2% java/30503
8223 bea10      2809M 2592M sleep   59    0   5:58:32         0.2% java/36116
……………………………………………………………………………………

As you can see from above data, 3 Java Threads were found using together close to 100% of the CPU utilization.

For our root cause analysis, we did focus on Thread #494 (decimal format) corresponding to 0x1ee (HEXA format).

Thread Dump analysis and PRSTAT correlation

Once the culprit Threads were identified, the next step was to correlate this data with the Thread Dump data (which was captured exactly at the same time as prstat).

A quick search within the generated Thread Dump file did reveal the Thread Stack Trace (Weblogic Stuck Thread #125) for 0x1ee as per below.

"[STUCK] ExecuteThread: '125' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=1 tid=0x014c5030 nid=0x1ee runnable [0x536fb000..0x536ffc70]
       at java.util.HashMap.get(HashMap.java:346)
       at org.apache.axis.encoding.TypeMappingImpl.getClassForQName(TypeMappingImpl.java:715)
       at org.apache.axis.encoding.TypeMappingDelegate.getClassForQName(TypeMappingDelegate.java:170)
       at org.apache.axis.encoding.TypeMappingDelegate.getClassForQName(TypeMappingDelegate.java:160)
       at org.apache.axis.encoding.TypeMappingImpl.getDeserializer(TypeMappingImpl.java:454)
       at org.apache.axis.encoding.TypeMappingDelegate.getDeserializer(TypeMappingDelegate.java:108)
       at org.apache.axis.encoding.TypeMappingDelegate.getDeserializer(TypeMappingDelegate.java:102)
       at org.apache.axis.encoding.DeserializationContext.getDeserializer(DeserializationContext.java:457)
       at org.apache.axis.encoding.DeserializationContext.getDeserializerForType(DeserializationContext.java:547)
       at org.apache.axis.encoding.ser.BeanDeserializer.getDeserializer(BeanDeserializer.java:514)
       at org.apache.axis.encoding.ser.BeanDeserializer.onStartChild(BeanDeserializer.java:286)
       at org.apache.axis.encoding.DeserializationContext.startElement(DeserializationContext.java:1035)
       at org.apache.axis.message.SAX2EventRecorder.replay(SAX2EventRecorder.java:165)
       at org.apache.axis.message.MessageElement.publishToHandler(MessageElement.java:1141)
       at org.apache.axis.message.RPCElement.deserialize(RPCElement.java:236)
       at org.apache.axis.message.RPCElement.getParams(RPCElement.java:384)
       at org.apache.axis.client.Call.invoke(Call.java:2467)
       at org.apache.axis.client.Call.invoke(Call.java:2366)
       at org.apache.axis.client.Call.invoke(Call.java:1812)
…………………………………………………………………………………………………………………………………………

Thread Dump analysis – HashMap.get() infinite loop condition!

As you can see from the above Thread Stack Trace, the Thread is currently stuck in an infinite loop over a java.util.HashMap that originates from the Apache Axis TypeMappingImpl Java class.

This finding was quite revealing. The 2 others Threads using high CPU also did reveal infinite looping condition within the same Apache Axis HashMap Object.

Root cause: non Thread safe HashMap in Apache Axis 1.4

Additional research did reveal this known defect affecting Apache Axis 1.4; which is the version that our application was using.

As you may already know, usage of non Thread safe / non synchronized HashMap under concurrent Threads condition is very dangerous and can easily lead to internal HashMap index corruption and / or infinite looping. This is also a golden rule for any middleware software such as Oracle Weblogic, IBM WAS, Red Hat JBoss which rely heavily on HashMap data structures from various Java EE and caching services.

Such best practice is also applicable for any Open Source third party API such as Apache Axis.

The most common solution is to use the ConcurrentHashMap data structure which is designed for that type of concurrent Thread execution context.

Solution

Our team did apply the proposed patch from Apache (synchronize the non Thread safe HashMap) which did resolve the problem. We are also currently looking at upgrading our application to a  newer version of Apache Axis.

Conclusion

I hope this case study has helped you understand how to pinpoint the root cause of high CPU Threads and the importance of proper Thread safe data structure for high concurrent Thread / processing applications.

Please don’t hesitate to post any comment or question.

Find office supplies promo codes to save money for your department's bottom line, so you can spend more on the latest software.

4 comments:

Have you ever tried JVisualVM or YourKit? They will enable the same analysis with more ease and are cross-platform.

Hi Christia and thanks for your comments,

Yes, JVisualVM with CPU sampling will provide you with similar CPU breakdown. I'm actually planning to release a few articles on JVisualVM.

However, I always highly encourage technical individuals to learn how to do the same using native approaches. There are scenarios where JVisualVM will hang (remote JMX connection will hang under some scenarios such as excessive GC or very high CPU etc.), especially for big production systems with large Java Heaps & # of threads so in those scenarios you will need to rely on native Thread Dump and native CPU per Thread OS commands for analysis.

Bottom line, my recommendation is:
Use your preferred tool when you can such as JVisualVM.
Learn how to fall back on native approaches when required.

Regards,
P-H

Hi Pierre,

How do you say the thread 494 is stuck in an infinite loop ?

Thanks,
Shan

Hi Shan,

The idea is to monitor the thread and generate a few Thread Dump snapshots. After about 15 minutes or less, you will then notice that the offending thread is always executing the same execution path, along with the PRSTAT showing the same sustained CPU utilization from that offending thread.

Regards,
P-H

Post a Comment