Java EE Support Patterns

7.26.2013

Plumbr 3.0 – An evolutionary approach to Java memory leaks

Most Java developers are familiar with the concept of “memory leaks”. In the Java world, it essentially means the constant expansion of a particular memory space such as the Java heap, the PermGen & Metaspace (Java 8) or the native memory. Such leaks are often the result of code problems in your own applications, in third-party APIs or even inside the Java EE container and JDK code base. The level of complexity involved in pinpointing and resolving these types of problems can vary from medium to very high.

The good news is that the JVM has evolved into a great piece of engineering. The GC algorithms, troubleshooting capabilities and runtime data exposure of the JVM are now quite mature. We have access to most critical runtime data via the JMX API. All JVM vendors also provide JVM thread dump & heap dump generation techniques. Finally, we have access to a great arsenal of tools and Java profilers to monitor the JVM internals and analyze memory dumps, such as JVisualVM, Eclipse MAT or other commercial products.
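As a quick illustration of the runtime data exposed through JMX, the standard MemoryMXBean from java.lang.management lets you read the current heap usage programmatically. This small class is only an example and is unrelated to Plumbr itself:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapUsagePrinter {

    public static void main(String[] args) {
        // Platform MBean exposing heap and non-heap memory usage
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memoryBean.getHeapMemoryUsage();

        // Report used vs. committed vs. max heap, in MB
        System.out.println("Heap used: " + (heap.getUsed() / (1024 * 1024)) + " MB"
                + ", committed: " + (heap.getCommitted() / (1024 * 1024)) + " MB"
                + ", max: " + (heap.getMax() / (1024 * 1024)) + " MB");
    }
}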

That being said, even with the help of all these tools, analyzing Java memory leaks remains quite challenging. The skill set required of individuals performing such analysis is also high, since it requires proper knowledge of the Java heap, troubleshooting and runtime analysis techniques.

I recently had the chance to experiment with the latest version of a Java memory leak analyzer tool that I’m sure you have heard about: Plumbr. Plumbr’s approach can be summarized as follows: instead of analyzing memory dumps “after the fact”, following an OutOfMemoryError condition, why not write a program that keeps track of Java objects and precisely detects memory leak suspects at runtime…with live production traffic?

This article will share the positive experience I had with this product while experimenting with and “engineering” a Java heap memory leak on the latest version of WildFly (formerly known as JBoss AS). I will also share my personal tips on how to improve your troubleshooting experience with Plumbr by combining it with additional tools or “synergists”. A complete Plumbr tutorial video will also be available on my YouTube channel in the near future.

Plumbr also has the capability to identify class loader-related memory leaks affecting the PermGen space of the HotSpot JVM. I may publish more articles in the future addressing this particular type of leak with the help of Plumbr.

Memory leak simulator and environment specifications

In order to give Plumbr some challenges, the following memory leak was engineered:

  • A JAX-RS (REST) Web Service was created and exposed via the jvmleak URI with the following attributes:
    @GET
    @Path("/jvmleak")
    @Produces(MediaType.APPLICATION_JSON)
    public Integer jvmLeak() {}

Each invocation of jvmleak performs the following logic:

1. Allocate a large number of short-lived objects (no references retained).
2. Allocate a small number of long-lived objects (strong references) into a static ConcurrentHashMap data structure.
3. Return the current count of the “leaking” ConcurrentHashMap.

We also created 3 extra Java classes (a combined sketch follows this list):

  • JVMMemoryAllocator. This class is responsible for performing the short-lived and long-lived memory allocations.
  • ShortLivedObj. This class represents a short-lived object with no reference retained.
  • LongLivedObj. This class represents a long-lived object held via hard references.
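Below is a minimal sketch of how these pieces could fit together, based on the logic described above. The method names, field names and allocation counts are illustrative assumptions; only the class names (JVMMemoryAllocator, ShortLivedObj, LongLivedObj) and the JAX-RS annotations come from the simulator itself.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

class ShortLivedObj {
    // Short-lived: allocated and immediately dropped (no reference retained)
    private final byte[] payload = new byte[1024];
}

class LongLivedObj {
    // Long-lived: retained via a hard reference in the static map below
    private final byte[] payload = new byte[1024];
}

class JVMMemoryAllocator {

    // The "leaking" data structure: entries are added but never removed
    private static final ConcurrentHashMap<Long, LongLivedObj> LEAKING_MAP =
            new ConcurrentHashMap<Long, LongLivedObj>();
    private static final AtomicLong KEY_SEQUENCE = new AtomicLong();

    void allocateShortLivedObjects(int count) {
        for (int i = 0; i < count; i++) {
            new ShortLivedObj(); // eligible for GC right away
        }
    }

    void allocateLongLivedObjects(int count) {
        for (int i = 0; i < count; i++) {
            LEAKING_MAP.put(KEY_SEQUENCE.incrementAndGet(), new LongLivedObj());
        }
    }

    int leakingMapSize() {
        return LEAKING_MAP.size();
    }
}

@Path("/")
public class JVMLeakService {

    private final JVMMemoryAllocator allocator = new JVMMemoryAllocator();

    @GET
    @Path("/jvmleak")
    @Produces(MediaType.APPLICATION_JSON)
    public Integer jvmLeak() {
        allocator.allocateShortLivedObjects(100000); // short-lived, no reference
        allocator.allocateLongLivedObjects(100);     // long-lived, retained forever
        return allocator.leakingMapSize();
    }
}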
You will find below the environment specifications and software used to create and execute this Java heap memory leak simulator.

Plumbr download and installation

Plumbr is packaged as “trialware”. This means that you can download the full version for free and determine whether your application(s) contain memory leak(s). Obtaining the location of the memory leak (at code level) will require you or your client to purchase the Plumbr license plan that matches your needs.

I recommend that you first install Plumbr in a development environment prior to production. Plumbr is a lightweight agent, but extra due diligence is always recommended in order to reduce risk to your production environment.

  1. Log in / register at https://portal.plumbr.eu/. This will allow you to download the Plumbr client.

  2. Unzip the archive at a location of your choice (local environment, development server etc.).
  3. If your Java application is running on a Windows OS, you can simply double-click the plumbr.jar located under the Plumbr root directory or use the following command: java -jar plumbr.jar. This will launch the Plumbr client and allow you to “attach” it to a running JVM process. Another approach is to simply edit your application JVM start-up settings and add the following argument: -javaagent:<PLUMBR ROOT DIRECTORY>plumbr.jar. I personally prefer the second approach.
Now back to our memory leak simulator: you will notice from the snapshot below that we added the Plumbr agent reference directly to the WildFly JVM start-up arguments inside the IDE.
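For reference, if you start WildFly from the command line rather than from an IDE, the same agent reference can typically be appended to the JVM options; the installation path below is an assumption and must point to your own Plumbr directory:

JAVA_OPTS="$JAVA_OPTS -javaagent:/opt/plumbr/plumbr.jar"      (e.g. in WildFly's bin/standalone.conf)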


After the restart of your JVM process, you will notice a message similar to the following at the beginning of the standard output log:

************************************************************
* Plumbr (B1422) is attached.                              *
* Running with JRE from …\jdk1.7.0_09\jre                  *
* Your license is valid                                    *
*   for 20 more days                                       *
*   for Pierre-Hugues Charbonneau.                         *
* Plumbr agent is connected to the Plumbr Portal.          *
* Open up https://portal.plumbr.eu to follow its progress. *
************************************************************

This is telling us that Plumbr is now active and connected to the Plumbr Portal or reporting server. Now go to the Plumbr Portal and verify that it is properly receiving data from your JVM process.




Memory leak simulator warm-up

Now that the Plumbr agent is connected, it is time to fire up our memory leak simulator. In order to simulate an interesting memory leak, we will use the following load testing specifications:

  • JMeter will be configured to execute our REST Web Service URI jvmleak.
  • Our JMeter thread group will be configured to run “forever” with 20 concurrent threads.
  • JVisualVM and the Plumbr Portal will both be used to monitor the Java heap utilization along with the internal Plumbr MBeans.

Please note that launching JVisualVM is optional, but I definitely recommend that you use it in conjunction with the MBeans browser plugin (add-on for JVisualVM). This will allow you to track the status and progress of the Plumbr agent directly from JVisualVM. It is important to note that you will need to generate load and go through several Full GC iterations before Plumbr is able to identify true memory leaks. The elapsed time will also depend on the activity (load) and nature of your application.



Since we have not fired JMeter at this point, you can see that Plumbr is still in its WARMUP phase at 10% completion.

Memory leak simulator execution

Generating an adequate amount of load against your application will allow Plumbr to complete its WARMUP phase fairly quickly. We can now see from our simulator that Plumbr is in the RUNNING state and already tracking several thousand objects. However, no Full GC has occurred at this point, so we will need to wait until a few iterations are performed. This will allow the internal Plumbr computing engine to do its assessment and potentially narrow down the location of a memory leak.




You can notice that the number of objects tracked by Plumbr varies between Full GC iterations. This is because Plumbr typically only focuses on objects that are “suspect”, such as objects able to survive major collections (Full GC).

After a few major collections, Plumbr was able to detect a potential memory leak with 98% confidence. We are almost there… It is important to note that no OutOfMemoryError was thrown. The leak can also be observed from JVisualVM, but it is not that obvious this early in the load test life cycle.




Finally, after more major collections, Plumbr was able to detect a memory leak with a confidence level of 100%. As you can see from the snapshots, JVisualVM allowed us to easily monitor the Java heap and the progression of the Plumbr MBeans status.




You will also notice the following message in the standard output logs.
15:22:41,270 INFO  [stdout] (Plumbr thread - 176) ******************************************************************************
15:22:41,271 INFO  [stdout] (Plumbr thread - 176) * Plumbr has found a memory leak.                                             *
15:22:41,271 INFO  [stdout] (Plumbr thread - 176) * Plumbr will now stop tracking to leave more resources for your application. *
15:22:41,271 INFO  [stdout] (Plumbr thread - 176) * You can find the detailed memory leak report by opening this url:           *
15:22:41,272 INFO  [stdout] (Plumbr thread - 176) * https://portal.plumbr.eu/report/5515                                        *
15:22:41,272 INFO  [stdout] (Plumbr thread - 176) ******************************************************************************

At this point we have to wait for Plumbr to analyze the location and type of the leak found. Once the analysis process is completed, a report is generated and made available on the Plumbr Portal.








Plumbr memory leak report

Plumbr 3.0 provides centralized memory leak reporting capabilities. Each time Plumbr detects a new memory leak, a report is created and uploaded to the reporting Portal. We can see below that a new report was generated from our fresh memory leak simulator execution.



The report is the ultimate outcome and deliverable of this exercise: a precise, code-level indication of where the memory leak is located.




We can see from the above report that Plumbr was perfectly able to identify our memory leak. You can notice that the leak report is split into 4 main sections:

  • Header: the number of leaks found along with details on the memory footprint occupied by the leaking objects vs. the total Java heap capacity.
  • Leaking object type: the object type of the instances accumulating over time between major collections.
  • Leak creation location: the caller and Java class where the leaking objects are created.
  • Memory references: the object reference tree through which the leaking objects are still referenced or held.

In our case, Plumbr was able to identify the exact location of our engineered memory leak.

  • LongLivedObj is indeed the expected leaking object type.
  • JVMMemoryAllocator is definitely the Java class where the leak is created.
  • ConcurrentHashMap is the implemented "referencer" or container.

Interestingly, Plumbr was also able to identify another potential leak inside WildFly 8 Alpha 3 itself…

Now, please keep in mind that following the creation of the report, Plumbr will not magically resolve the memory leak for you. You will need to spend some time reviewing the affected code, especially the location where the leak is created, and understand why the objects are still referenced. Plumbr will do 50%+ of the work. You will need to take care of the other half and determine the proper code fix, upgrade or patch of the offending API(s).
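For our engineered leak, the remediation is simple and is shown below as an illustration only (the class name and methods are hypothetical, not Plumbr output): the entries retained in the static ConcurrentHashMap need an explicit removal path, or the map should be replaced by a bounded cache or weak/soft references when the entries can be recomputed.

import java.util.concurrent.ConcurrentHashMap;

public class LeakFreeHolder {

    // Before the fix, entries were only ever added to this static map,
    // so they accumulated for the lifetime of the class loader.
    private static final ConcurrentHashMap<Long, Object> CACHE =
            new ConcurrentHashMap<Long, Object>();

    public static void put(Long key, Object value) {
        CACHE.put(key, value);
    }

    // The fix: give callers an explicit removal / cleanup path so that
    // entries no longer survive forever between major collections.
    public static void remove(Long key) {
        CACHE.remove(key);
    }

    public static void clearAll() {
        CACHE.clear();
    }
}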

Final words

I hope that you appreciated this tutorial and review of Plumbr 3.0 while simulating a true Java heap memory leak. This product looks quite promising and I definitely recommend that you give Plumbr a try in your environment, especially if you suspect that you are dealing with memory leak problems affecting your production environment's stability. Thumbs up to the Plumbr core team!

I’m looking forward to your comments; please share your experience with Plumbr.

7.09.2013

Java Just-In-Time compilation: more than just a buzzword

A recent Java production performance problem forced me to revisit and truly appreciate the Java VM Just-In-Time (JIT) compiler. Most Java developers and support individuals have heard of this JVM runtime performance optimization, but how many truly understand and appreciate its benefits?

This article will share with you a troubleshooting exercise I was involved with following the addition of a new virtual server (capacity improvement and horizontal scaling project).

For a more in-depth coverage of JIT, I recommend the following articles:

  • Just-in-time compilation
  • The Java HotSpot Performance Engine Architecture
  • Understanding Just-In-Time Compilation and Optimization
  • How the JIT compiler optimizes code

JIT compilation overview

JIT compilation is essentially a process that improves the performance of your Java applications at run time.

The diagram below illustrates the different JVM layers and their interaction. It describes the following high-level process:

  1. Java source files are compiled by the Java compiler into platform-independent bytecode (Java class files).
  2. After you fire up your Java application, the JVM loads the compiled classes at run time and executes the proper computation semantics via the Java interpreter.
  3. When JIT is enabled, the JVM analyzes the Java application method calls and compiles the bytecode (after some internal thresholds are reached) into native, more efficient machine code. The JIT process is normally prioritized toward the busiest method calls first.
  4. Once such a method call is compiled into machine code, the JVM executes it directly instead of “interpreting” it.
  5. The above process leads to improved run time performance over time (see the small experiment below).
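If you want to observe this process yourself on a HotSpot JVM (the case study below involves the IBM JVM, whose tracing options differ), a quick experiment is to run a small busy loop with the -XX:+PrintCompilation flag; the class below is purely illustrative.

public class JitDemo {

    // A "hot" method: invoked often enough to cross the JIT compilation threshold
    static long square(long value) {
        return value * value;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += square(i);
        }
        System.out.println("Sum: " + sum);
    }
}

Running it with java -XX:+PrintCompilation JitDemo prints a line for each method compiled to native code; JitDemo.square should show up once the interpreter has executed it enough times.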


Case study

Now here is the background on the project I was referring to earlier. The primary goal was to add a new IBM P7 AIX virtual server (LPAR) to the production environment in order to improve the platform’s capacity. Find below the specifications of the platform itself:

  • Java EE server: IBM WAS 6.1.0.37 & IBM WCC 7.0.1
  • OS: AIX 6.1
  • JDK: IBM J2RE 1.5.0 (SR12 FP3 +IZ94331) @64-bit
  • RDBMS: Oracle 10g
  • Platform type: Middle tier and batch processing

In order to achieve the existing application performance levels, the exact same hardware specifications were purchased. The AIX OS version and other IBM software were also installed at the same versions as the existing production servers.

The following items (check list) were all verified in order to guarantee the same performance level of the application:

  • Hardware specifications (# CPU cores, physical RAM, SAN…).
  • OS version and patch level; including AIX kernel parameters.
  • IBM WAS & IBM WCC version, patch level; including tuning parameters.
  • IBM JRE version, patch level and tuning parameters (start-up arguments, Java heap size…).
  • The network connectivity and performance were also assessed properly.

After the new production server build was completed, functional testing was performed, which also confirmed proper behaviour of the online and batch applications.

However, a major performance problem was detected on the very first day of its production operation. You will find below a summary matrix of the performance problems observed.

Production server | Operation elapsed time | Volume processed (# orders) | CPU % (average) | Middleware health
Existing server   | 10 hours               | 250 000 (baseline)          | 20%             | Healthy
*New* server      | 10 hours               | 50 000 (5x fewer)           | 80% (4x higher) | High thread utilization

As you can see from the above table, the performance results were quite disastrous on the first production day. Not only were far fewer orders processed by the new production server, but its physical resource utilization, such as CPU %, was also much higher compared with the existing production servers.

The situation was quite puzzling given the amount of time spent ensuring that the new server was built exactly like the existing ones. At that point, another core team was engaged in order to perform extra troubleshooting and identify the source of the performance problem.

Troubleshooting: searching for the culprit...

The troubleshooting team was split in 2 in order to focus on the items below:

  • Identify the source of CPU % from the IBM WAS container and compare the CPU footprint with the existing production server.
  • Perform more data and file compares between the existing and new production server.

In order to understand the source of the CPU %, we performed an AIX CPU-per-thread analysis against the IBM JVM running IBM WAS and IBM WCC. As you can see from the screenshot below, many threads were found using between 5-20% CPU each. The same analysis performed on the existing production server revealed fewer threads, each with a CPU footprint of around 5%.



Conclusion: the same type of business process was using 3-4 times more CPU than on the existing production server.

In order to understand the type of processing performed, JVM thread dumps were captured at the same time as the CPU-per-thread data. The first thing we realized after reviewing the JVM thread dump (Java core) is that JIT was indeed disabled! The problem was also confirmed by running the java -version command against the running JVM processes.


This finding was quite major, especially given that JIT was enabled on the existing production servers. Around the same time, the other team, responsible for comparing the servers, finally found differences between the environment variables of the AIX user used to start the application. This comparison exercise had been missed in the earlier gap analysis. What they found is that the new AIX production server had the following extra entry:

JAVA_COMPILER=NONE

As per the IBM documentation, setting this environment variable is one of the ways to disable the JIT compiler.
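To double-check this on the affected server, a couple of generic shell commands are enough (these are standard UNIX shell usage, not IBM-specific tooling); the affected JVM processes must then be restarted for any change to take effect:

echo $JAVA_COMPILER    # prints NONE when the JIT-disabling entry is present
unset JAVA_COMPILER    # removes it from the current session; also remove it from the user profile so it does not come back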

Complex root cause analysis, simple solution

In order to understand the impact of disabling JIT in our environment, you have to understand its implications. Disabling JIT essentially means that the entire JVM is now running in interpreted mode. For our application, running in fully interpreted mode not only reduced the application throughput significantly, but also increased the pressure on the server CPU utilization, since each request/thread took 3-4 times more CPU than a request executed with JIT (remember, when JIT is enabled, the JVM executes compiled machine/native code directly).

As expected, the removal of this environment variable, along with a restart of the affected JVM processes, resolved the problem and restored the performance level.

Assessing the JIT benefits for your application

I hope you appreciated this case study and short revisit of the JVM JIT compilation process. In order to understand the impact of not using JIT for your Java application, I recommend that you perform the following experiment:

  • Generate load against your application with JIT enabled and capture some baseline data such as CPU %, response time, # requests etc.
  • Disable JIT (see the example after this list).
  • Redo the same testing and compare the results.

I’m looking forward to your comments; please share any experience you may have with JIT.

6.17.2013

Java GC tool from IBM: Tutorial on JCG – Part 1

This post is to inform you that we just released on Java Code Geeks the first article of a new series dedicated to the free IBM Garbage Collection and Memory Visualizer tool.

GCMV is a great and free tool, developed by IBM, which allows you to “visualize” and analyze the internal JVM memory usage and GC activity of your Java application(s).

Part 1 can be found here.

As usual, please feel free to comment and share your experience with GCMV.

Best regards,
P-H
Senior IT Consultant
Java EE Support Patterns

6.12.2013

IT Cloud Providers: A definitive guide from DZone



This post is to inform you that DZone has released a comprehensive guide that can help you and your clients choose the right Cloud provider for your business needs. This reference guide includes:

  • 100 pages of cloud product analysis.
  • 9 categories of cloud providers analyzed and compared.
  • 35+ of the most active cloud solutions and their feature sets.
  • Surveys of 400+ IT professionals on their cloud preferences.

I highly recommend that you download your own copy today!

Please feel free to also comment and share your experience with your current Cloud provider.


Best regards,
P-H
Senior IT Consultant
Java EE Support Patterns

5.02.2013

HotSpot 32-bit to 64-bit upgrade: what to look for

This short post will test your knowledge of the JVM and your project delivery skills, especially regarding JVM upgrades. I’m looking forward to your comments and answers on how to approach this type of project in order to stay away from performance problems.

Background

I was recently involved in a problem case affecting a production environment running Weblogic 10 on the HotSpot JVM 1.6 @32-bit. Given recent challenges and the forecast load increase, the decision was taken to upgrade the HotSpot JVM 1.6 from 32-bit to 64-bit.

Please note that no change was applied to the JVM arguments.

After a few weeks of functional testing and planning, the upgrade was deployed successfully to the production environment. However, the very next day the support team observed major performance degradation, including thread lock contention, forcing the deployment team to roll back the upgrade.

The root cause was eventually found and the upgrade will be re-attempted in the near future.

Question:

Based on the above background, provide a list of possible root causes that may explain this performance degradation.

Propose a list of improvements to the project delivery and recommendations on how to properly manage and de-risk this type of upgrade.

Answer:

I often hear the assumption that switching from a 32-bit JVM to a 64-bit JVM will automatically bring a performance improvement. This is only partially true. Performance improvements will only be observed if you were dealing with existing memory footprint problem(s) prior to the upgrade, such as excessive GC or java.lang.OutOfMemoryError: Java heap space conditions, and if you perform proper tuning & Java heap sizing.

Unfortunately, we often overlook the fact that for a 64-bit JVM process, native pointers in the system take up 8 bytes instead of 4. This can result in an increased memory footprint for your application, leading to more frequent GC and performance degradation.
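As a rough, purely illustrative calculation: an application holding 50 million live object references would see its pointer overhead alone grow from about 200 MB (50 million x 4 bytes) to about 400 MB (50 million x 8 bytes), before even considering the larger object headers of a 64-bit JVM.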

Here is the official explanation from Oracle:

What are the performance characteristics of 64-bit versus 32-bit VMs?

"Generally, the benefits of being able to address larger amounts of memory come with a small performance loss in 64-bit VMs versus running the same application on a 32-bit VM.  This is due to the fact that every native pointer in the system takes up 8 bytes instead of 4.  The loading of this extra data has an impact on memory usage which translates to slightly slower execution depending on how many pointers get loaded during the execution of your Java program.  The good news is that with AMD64 and EM64T platforms running in 64-bit mode, the Java VM gets some additional registers which it can use to generate more efficient native instruction sequences.  These extra registers increase performance to the point where there is often no performance loss at all when comparing 32 to 64-bit execution speed.  
The performance difference comparing an application running on a 64-bit platform versus a 32-bit platform on SPARC is on the order of 10-20% degradation when you move to a 64-bit VM.  On AMD64 and EM64T platforms this difference ranges from 0-15% depending on the amount of pointer accessing your application performs."   

Now back to our original problem case: this memory footprint increase was found to be significant and at the root of our problem. Depending on the GC policy that you are using, an increase in GC major collections will lead to higher JVM & thread pause times, opening the door to thread lock contention and other problems. As you can see below, the upgrade to the 64-bit JVM increased our existing application's static memory footprint (tenured space) by 45%.

Java Heap footprint ~900 MB after major collections (32-bit)



 Java Heap footprint ~1.3 GB after major collections (64-bit)



** 45% increase of the application Java heap memory footprint (retained tenured space). Again, this is due to the expanded size of managed native pointers.

The other part of the problem is that no performance or load testing was performed prior to the production implementation, only functional testing. Also, since no change or tuning was applied to the JVM settings, this automatically triggered an increased frequency of major collections and longer JVM pause times. The final solution involved increasing the Java heap capacity from 2 GB to 2.5 GB and using the Compressed Oops option available with HotSpot JDK 6u23.

Now that you understand the root cause, find below my recommendations when performing this type of upgrade:

  • Execute performance and load testing cycles and compare your application memory footprint and GC behaviour before (32-bit) and after (64-bit) the upgrade.
  • Ensure you spend enough time tuning your GC settings and the Java heap size in order to minimize the JVM pause time.
  • If you are using HotSpot JVM 1.6 6u23 or later, take advantage of the Compressed Oops tuning parameter (see the example after this list). Compressed oops allow the HotSpot JVM to represent many (not all) managed pointers as 32-bit object offsets from the 64-bit Java heap base address, resulting in a reduced memory footprint following the upgrade.
  • Perform proper capacity planning of the hardware hosting your JVM processes. Ensure that you have enough physical RAM and CPU capacity to handle the extra memory and CPU footprint associated with this upgrade.
  • Develop a low-risk implementation strategy by upgrading only a certain percentage of your production environment to the 64-bit JVM, e.g. 25%-50%. This will also allow you to compare the behavior with the existing 32-bit JVM processes and make sure performance is aligned with your performance & load testing results.
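As a hedged illustration only (the exact heap sizes must come from your own performance & load testing, and the arguments would be added to the WebLogic managed server JVM settings), the tuned 64-bit start-up arguments for the case described above could look like this; -XX:+UseCompressedOops is a genuine HotSpot flag and is enabled by default as of JDK 6u23:

-d64 -Xms2560m -Xmx2560m -XX:+UseCompressedOops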

4.16.2013

java.lang.NullPointerException Minecraft


I have been receiving many comments and emails recently related to Minecraft Java errors such as java.lang.NullPointerException.

I recommend looking at the following YouTube video:




While this Blog is dedicated to general Java and Java EE troubleshooting, I’m still willing to help non-Java programmers and Minecraft gamers who are facing these problems, such as java.lang.NullPointerException when developing Mods, and to understand what null means in Java.

If you are in fact facing such problem, please follow the steps below:

  • Provide me with the Java error and complete stack trace from Minecraft and/or your code.
  • Post the complete error as a comment on this post (Post a Comment section below).

I also encourage you to search and post your problems on the Minecraft community forum.

4.15.2013

HotSpot GC Thread CPU footprint on Linux


The following question will test your knowledge of garbage collection and high-CPU troubleshooting for Java applications running on Linux. This troubleshooting technique is especially crucial when investigating excessive GC and/or CPU utilization.

It is assumed that you do not have access to advanced monitoring tools such as Compuware dynaTrace or even JVisualVM. Future tutorials using such tools will be presented, but please ensure that you first master the basic troubleshooting principles.

Question:

How can you monitor and calculate how much CPU % each of the Oracle HotSpot or JRockit JVM garbage collection (GC) threads is using at runtime on Linux OS?

Answer:

On Linux, Java threads are implemented as native threads, and each thread shows up as a separate lightweight process with its own id. This means that you are able to monitor the CPU % of any Java thread created by the HotSpot JVM using the top -H command (threads toggle view).

That said, depending on the GC policy that you are using and your server specifications, the HotSpot & JRockit JVMs will create a certain number of GC threads that perform the young and old space collections. Such threads can easily be identified by generating a JVM thread dump. As you can see below in our example, the Oracle JRockit JVM created 4 GC threads identified as "(GC Worker Thread X)".

===== FULL THREAD DUMP ===============
Fri Nov 16 19:58:36 2012
BEA JRockit(R) R27.5.0-110-94909-1.5.0_14-20080204-1558-linux-ia32

"Main Thread" id=1 idx=0x4 tid=14911 prio=5 alive, in native, waiting
    -- Waiting for notification on: weblogic/t3/srvr/T3Srvr@0xfd0a4b0[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:474)
    at weblogic/t3/srvr/T3Srvr.waitForDeath(T3Srvr.java:730)
    ^-- Lock released while waiting: weblogic/t3/srvr/T3Srvr@0xfd0a4b0[fat lock]
    at weblogic/t3/srvr/T3Srvr.run(T3Srvr.java:380)
    at weblogic/Server.main(Server.java:67)
    at jrockit/vm/RNI.c2java(IIIII)V(Native Method)
    -- end of trace

"(Signal Handler)" id=2 idx=0x8 tid=14920 prio=5 alive, in native, daemon

"(GC Main Thread)" id=3 idx=0xc tid=14921 prio=5 alive, in native, native_waiting, daemon

"(GC Worker Thread 1)" id=? idx=0x10 tid=14922 prio=5 alive, in native, daemon

"(GC Worker Thread 2)" id=? idx=0x14 tid=14923 prio=5 alive, in native, daemon

"(GC Worker Thread 3)" id=? idx=0x18 tid=14924 prio=5 alive, in native, daemon

"(GC Worker Thread 4)" id=? idx=0x1c tid=14925 prio=5 alive, in native, daemon
………………………

Now let’s put all of these principles together via a simple example.

Step #1 - Monitor the GC thread CPU utilization

The first step of the investigation is to:

  • Identify the native thread id (tid) for each GC worker thread shown via the Linux top -H command.
  • Identify the CPU % for each GC worker thread.



Step #2 – Generate and analyze JVM Thread Dumps

At the same time as the Linux top -H capture, generate 2 or 3 JVM thread dump snapshots via kill -3 <Java PID>.

  • Open the JVM thread dump and locate the JVM GC worker threads.
  • Now correlate the "top -H" output data with the JVM thread dump data by looking at the native thread id (tid attribute), as per the example after this list.
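As a hedged illustration of this correlation step (assuming the JVM process id is 14911, the Main Thread tid from the JRockit dump above; the actual top output is not reproduced here):

top -H -p 14911        # display the individual threads of the JVM process along with their CPU %
printf "%x\n" 14922    # HotSpot only: convert a decimal thread id from top into hex to match the nid attribute of the thread dump

For a JRockit thread dump, the tid attribute is already in decimal form, so it can be matched directly against the PID column of top -H.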



As you can see in our example, this analysis allowed us to determine that all our GC worker threads were using around 20% CPU each. This was due to major collections happening at that time. Please note that it is also very useful to enable verbose GC (-verbose:gc), as it will allow you to correlate such CPU spikes with minor and major collections and determine how much your JVM GC process is contributing to the overall server CPU utilization.