/ 2013 ~ Java EE Support Patterns


Java VM – Beware of the YoungGen space

As you may have seen from our previous performance oriented articles, a healthy JVM is one of the most important goals to achieve for optimal application performance and stability. Such health assessment is very often only focusing on the frequency (avoidance) of major collections or detecting the presence of memory leaks. What about the sizing and footprint of the Young Generation space or short lived objects?

This article is based on a true story and a recent troubleshooting experience we had with one of our IT clients. It will demonstrate, through a sample application, that an excessive memory footprint and frequency of minor collections can trigger performance problems as severe as its bigger brother, the Old Generation or tenured space.

JVM health diagnostic

If you are new to the JVM tuning world, you will soon realize that there are no universal solutions applicable to all applications. Regardless of the high quality material that you may find on the Web from various sources, you still need to do your due diligence and properly understand the type of application that you are dealing with, including its sensitivity to JVM GC pauses (some applications require a very low JVM pause time of < 1%).

Java profiling (including memory leak detection) along with performance & load testing are good examples of extra work that you will need to perform in order to gather all proper data and facts about your application memory footprint and JVM runtime health.

That being said, what do we mean by a “healthy” JVM? Please answer the following questions to the best of your knowledge.

** If you answer NO, please assume a confidence level of 90%+, otherwise answer I DON’T KNOW.

  • Is your Java heap or OldGen space leaking over time (after major collections)?
  • Is your application currently affected by large and/or frequent JVM GC pauses?
  • Is your JVM overall pause time higher than 5% or higher than an established ideal baseline?
  • Is your application response time currently affected by the JVM GC activity on a regular basis and beyond the application tolerance point?
  • Did you observe over the last 3 months occurrences of java.lang.OutOfMemoryError errors?
  • Did you observe over the last 3 months occurrences of JVM crashes (sudden JVM failure with core dump & crash report)?
  • Do you believe that your JVM is currently unstable and/or requires too much human intervention (regular restart etc.)?
If you answered YES or I DON’T KNOW to any of those, this means that you or your production performance tuning team have some work to do here, including a review of the current JVM GC policy.

If you answered NO to all of those with a high confidence level, it means that you have likely achieved solid application and JVM stability, congratulations. I still recommend that you re-assess the situation in between major releases and incremental load forecasts.
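
The first question of the assessment (heap usage after collections) can also be sampled from inside the JVM. The following is a minimal sketch, not a replacement for verbose:gc analysis: it uses the standard java.lang.management API to sum the memory still in use after the most recent collection of each heap pool. The class and method names are our own, and pool names and support vary per JVM and GC policy.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class PostGcUsage {

    /** Sums the bytes still in use after the most recent collection of each heap pool. */
    static long usedAfterLastGcBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Heap used after last GC (bytes): " + usedAfterLastGcBytes());
    }
}
```

Sampling this value at regular intervals and watching for a steady upward trend is one simple way to spot a leaking heap before an OutOfMemoryError occurs.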

Young Generation: Stop the world, really?

As we saw from the quick JVM health assessment exercise, one of the points refers to the JVM overall pause time. This essentially means how much time the JVM is spending during the “stop the world” events. During such periods, application threads are suspended and not performing any work, increasing response time of your application. This metric is crucial since large JVM pauses will trigger unstable and unpredictable response times.

One common misperception that I have seen over the last few years is that YoungGen or minor collections are fully transparent and do not affect the application response time. This statement could almost be true if your Java heap size is small (YG space < 1 GB) and you are dealing with a moderate short-lived object footprint or allocation rate. In this scenario, if the minor collections are executed very fast (< 20 ms) and not too frequently (every 30 seconds+), the overall JVM pause time contributed by the YoungGen space will remain small (<< 1%). However, the situation can change very quickly if the YG memory allocation rate increases (increased footprint per request, traffic surge etc.).
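
To put numbers on this, here is a small worked calculation. The first case uses the figures above (a 20 ms minor collection every 30 seconds). The second case is an assumed scenario for illustration only: the same 20 ms pause, but occurring 5 times per second after an allocation rate surge.

```java
final class PauseTimeEstimate {

    /** Percentage of elapsed time spent in stop-the-world pauses. */
    static double pausePercent(double pauseMillis, double intervalMillis) {
        return 100.0 * pauseMillis / intervalMillis;
    }

    public static void main(String[] args) {
        // Light load: one 20 ms minor collection every 30 seconds
        System.out.println("light load pause %: " + pausePercent(20, 30_000));
        // Assumed surge: 5 collections of 20 ms each within every second
        System.out.println("surge pause %: " + pausePercent(20 * 5, 1_000));
    }
}
```

The light load case works out to roughly 0.07% of pause time, while the assumed surge reaches 10% even though each individual collection is just as fast as before.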

I recommend the following articles for more details about the YoungGen space and concurrent collectors available for the HotSpot JVM.

# Oracle HotSpot mostly concurrent collectors: CMS vs. G1

# Oracle HotSpot minor collections exhaustive coverage

Regardless of the HotSpot GC policy that you are using, including the mostly concurrent collectors such as CMS or G1, the YoungGen space collection remains a “stop the world” event. To our knowledge, Azul Zing C4 is the only JVM collector advertised as a true continuously concurrent compacting collector. We did not have a chance to experiment with this collector at this point. I welcome anyone with C4 tuning experience to share their observations, especially real-world figures vs. mostly concurrent collectors such as G1.

Now that we covered some of the theory, let’s deep dive into our sample application and review the performance testing results against various YoungGen footprint and allocation rates.

Sample application specifications

In order to compare the responsiveness and JVM pause time % between various YG allocation rates, we created a sample application as per below specifications:

  • A JAX-RS (REST) Web Service was created and exposed via the jvm URI as per below attributes.
    public Integer jvm() {}

Each invocation of jvm is performing the following logic:

1.     Allocate a pre-determined size of short-lived objects (eligible for fast YG GC).

In addition, an initial memory footprint of 1 GB of long-lived objects (not eligible for GC) is allocated at class loading time in order to create some noise for the CMS collector.

The YG short lived objects memory allocation and OldGen static retention was simply achieved through the creation of a static array of primitive byte values as per below code snippet. The true memory footprint can be observed as per the JVM heap dump analysis using MAT.

private final static int LONG_LIVED_OBJ_FOOTPRINT = (1024 * 1024 * 1024);
private final static int SHORT_LIVED_OBJ_FOOTPRINT = (100 * 1024 * 1024);

// 1 GB static memory footprint
private final static byte byteArrayLongLivedObj[] = new byte[LONG_LIVED_OBJ_FOOTPRINT];

// 100 MB memory allocation (waste) created per execution
public void generateShortLivedObj(String objId) {
  // No reference survives the method call, so the array is eligible for GC right away
  byte byteArrayShortLivedObj[] = new byte[SHORT_LIVED_OBJ_FOOTPRINT];
}
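
For readers who want to observe the effect outside of a Java EE container, here is a standalone sketch of the same idea. It is not the original test harness: the class name, the scaled-down 1 MB allocation size and the iteration count are assumptions chosen so that it runs on a small default heap. It counts the collections reported by the standard GarbageCollectorMXBean API before and after the allocation loop.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class YoungGenChurn {

    // Scaled down from the article's 100 MB per execution (assumption for a small heap)
    static final int SHORT_LIVED_OBJ_FOOTPRINT = 1024 * 1024;

    // Written on every allocation so the JIT is less likely to remove the allocation
    static volatile int sink;

    static void generateShortLivedObj() {
        byte[] shortLived = new byte[SHORT_LIVED_OBJ_FOOTPRINT];
        shortLived[0] = 1;
        sink = shortLived[0] + shortLived.length;
    }

    /** Total number of collections reported so far by all collectors of this JVM. */
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long before = totalGcCount();
        for (int i = 0; i < 2_000; i++) {
            generateShortLivedObj();
        }
        System.out.println("Collections triggered by the loop: " + (totalGcCount() - before));
    }
}
```

Running it with different allocation sizes and the verbose:gc flag enabled gives a feel for how quickly the collection frequency grows with the footprint per execution.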

Finally, find below the environment specifications and software used to create, execute and monitor this YG comparison performance testing.

Performance testing results and observations

The following performance testing simulated a real life application that was dealing with high JVM pause time and severe degradation under peak load. 3 runs were executed, one for the baseline, and 2 runs after simulating improvements (reduction) of the application memory footprint per request.

Baseline


  • 10 concurrent threads
  • 100 MB of short lived objects created per execution per JVM process
The short lived objects memory footprint may look extreme but this is indeed what we were dealing with initially.


  • Average response time: 140 ms
  • Throughput: 68 req / sec
  • JVM overall pause time: 25.8%
  • YG collection frequency: 7 collections per second
  • Rate of GC: 308 909 MB per minute

As per JVisualVM, it looks like the JVM is healthy (no memory leak, stable & low OldGen etc.). However, when you deep dive further into the verbose:gc logs, you realize that the overall JVM pause time is 25.8%, all of it due to the excessive frequency of YG collections. This clearly demonstrates the need to properly analyze the verbose:gc logs instead of focusing only on the JVM tenured space trends.

Testing & tuning #1

  • 10 concurrent threads
  • 50 MB of short lived objects created per execution per JVM process
This run simulates an initial improvement of the application footprint and memory allocation rate from 100 MB to 50 MB per allocation. We can clearly see an improvement to all figures, especially the throughput by simply reducing the application memory footprint per request.


  • Average response time: 119 ms  -21
  • Throughput: 79 req / sec  +11
  • JVM overall pause time: 15.59%  -10.21
  • YG collection frequency: 3-4 collections per second  -3
  • Rate of GC: 164 950 MB per minute  -143 959

Testing & tuning #2

  • 10 concurrent threads
  • 5 MB of short lived objects created per execution per JVM process
This run simulates a much reduced application footprint and memory allocation rate from 100 MB to only 5 MB per allocation.


  • Average response time: 107 ms  -33
  • Throughput: 90 req / sec  +22
  • JVM overall pause time: 1.9%  -23.9
  • YG collection frequency: 1 collection every 2-3 seconds * significant reduction
  • Rate of GC: 15 841 MB per minute  -293 068

As you can see, the final improvement to the application footprint and memory allocation did significantly decrease the JVM pause time to a more acceptable 1.9%. It is important to note that throughout these 3 tests, the OldGen footprint and CMS activity did not have any substantial impact on the JVM pause time. The performance problem was due to the excessive activity and high volume of stop the world events associated with the YG collections.

Solutions and recommendations

Our problem case demonstrated that we can reduce the JVM pause time associated with the excessive YG collection activity by tuning and reducing the memory footprint per application request, thus reducing the allocation rate and YG GC frequency.

However, when such a tuning strategy is not possible in the short term, it is worth exploring other solutions. Similar results can potentially be achieved through the following capacity improvement strategies:

  • Horizontal and vertical scaling: split the traffic via an increased number of JVM processes, at the expense of the available hardware, thus reducing the allocation rate and frequency of YG collections per JVM. This essentially means throwing hardware at the problem. My recommendation is always to fine tune your application memory footprint first, and then explore other scaling options in parallel.
  • Java heap size & YG ratio tuning: increasing the size of the YG space will definitely help reduce the frequency of stop the world YG collections. Be careful not to “starve” the OldGen space, otherwise you will simply move the problem, with even more severe consequences such as JVM thrashing and OOM events.
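
The relationship between allocation rate, YG size and collection frequency can be approximated with simple arithmetic: a minor collection is triggered roughly each time the allocation space fills up. The sketch below applies this approximation to the rates measured in the 3 test runs. Please note that the 735 MB figure is an assumption back-calculated from the baseline run (308 909 MB per minute at 7 collections per second), not a configured value taken from the test environment.

```java
final class YgFrequencyEstimate {

    /** Approximate minor collections per second for a given allocation rate and allocation space size. */
    static double collectionsPerSecond(double allocMbPerMinute, double edenMb) {
        return (allocMbPerMinute / 60.0) / edenMb;
    }

    public static void main(String[] args) {
        double edenMb = 735; // assumed, back-calculated from the baseline figures

        System.out.println("baseline: " + collectionsPerSecond(308_909, edenMb) + " per second");
        System.out.println("tuning 1: " + collectionsPerSecond(164_950, edenMb) + " per second");
        System.out.println("tuning 2: one every " + (1.0 / collectionsPerSecond(15_841, edenMb)) + " seconds");

        // Doubling the allocation space halves the estimated frequency at the same rate
        System.out.println("baseline, doubled space: " + collectionsPerSecond(308_909, 2 * edenMb) + " per second");
    }
}
```

With this single assumed size, the estimate lands close to the observed frequencies of all 3 runs (about 7 per second, 3-4 per second and one every 2-3 seconds). It also shows why a larger YG space reduces the frequency without reducing the amount of garbage created.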

Final words

I hope that you enjoyed this article and now have a better understanding of the potential performance impact of excessive JVM YG collections.

I recommend that you do the following exercise after reading this article:

  • Pick one of your busiest applications.
  • Review the verbose:gc log and determine the JVM pause time via GCMV.
  • Determine the frequency and impact of the YG collections and identify tuning opportunities.
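
As a rough complement to the GCMV analysis, the accumulated GC time can also be read from inside the JVM. This is a minimal sketch using the standard java.lang.management API, with a class name of our own. The value returned by getCollectionTime() is an approximate accumulated elapsed time and, depending on the collector, may not match the stop the world pause time computed from the verbose:gc log, so treat the result as an indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {

    /** Approximate percentage of JVM uptime reported as collection time by all collectors. */
    static double gcTimePercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.println("Approximate GC time %: " + gcTimePercent());
    }
}
```
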
I’m looking forward to your comments and please share your JVM tuning experience.


Oracle Weblogic stuck thread detection

The following question will again test your knowledge of the Oracle Weblogic threading model. I’m looking forward to your comments and experience on the same.

If you are a Weblogic administrator, I’m certain that you have heard of this common problem: stuck threads. This is one of the most common problems you will face when supporting a Weblogic production environment.

A Weblogic stuck thread simply means a thread performing the same request for a very long time and more than the configurable Stuck Thread Max Time.
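
The definition above can be illustrated with a small conceptual sketch. This is not how Weblogic implements its detection internally. The class and method names are hypothetical, and it only shows the principle: remember when each thread started its current request and flag the ones that exceed a configurable maximum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class StuckRequestTracker {

    private final long maxBusyMillis;
    private final Map<String, Long> startedAt = new ConcurrentHashMap<>();

    StuckRequestTracker(long maxBusyMillis) {
        this.maxBusyMillis = maxBusyMillis;
    }

    void requestStarted(String threadName, long nowMillis) {
        startedAt.put(threadName, nowMillis);
    }

    void requestFinished(String threadName) {
        startedAt.remove(threadName);
    }

    /** Names of the threads busy with the same request for longer than the maximum. */
    List<String> stuckThreads(long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : startedAt.entrySet()) {
            if (nowMillis - e.getValue() > maxBusyMillis) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        StuckRequestTracker tracker = new StuckRequestTracker(600_000L); // 600 seconds
        tracker.requestStarted("ExecuteThread: '35'", 0L);
        tracker.requestStarted("ExecuteThread: '11'", 500_000L);
        System.out.println(tracker.stuckThreads(608_213L));
    }
}
```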


How can you detect the presence of STUCK threads during and following a production incident?


* An Oracle Weblogic 12c Thread Monitoring YouTube video is now available.

As we saw from our last article “Weblogic Thread Monitoring Tips”, Weblogic provides functionalities allowing us to closely monitor its internal self-tuning thread pool. It will also highlight the presence of any stuck thread.

This monitoring view is very useful when you do a live analysis but what about after a production incident? The good news is that Oracle Weblogic will also log any detected stuck thread to the server log. Such information includes details on the request and more importantly, the thread stack trace. This data is crucial and will allow you to potentially better understand the root cause of any slowdown condition that occurred at a certain time.

<Sep 25, 2013 7:23:02 AM EST> <Error> <WebLogicServer> <Server1> <App1>
<[ACTIVE] ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'>
<BEA-000337> <[STUCK] ExecuteThread: '35' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "608" seconds working on the request
"Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 608213 ms

POST /App1/jsp/test.jsp HTTP/1.1
Accept: application/x-ms-application...
Referer: http://..
Accept-Language: en-US
User-Agent: Mozilla/4.0 ..
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Content-Length: 539
Connection: Keep-Alive
Cache-Control: no-cache

]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
    <Application Execution Stack Trace>
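
When reviewing server logs after an incident, it can help to extract these entries automatically. The sketch below is one possible approach, assuming the message layout shown in the sample above. The regular expression and class names are our own, and the exact layout may differ between Weblogic versions, so validate it against your own logs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StuckThreadLogParser {

    /** Thread number and busy time extracted from a stuck thread log line. */
    static final class StuckEvent {
        final int threadNumber;
        final int busySeconds;

        StuckEvent(int threadNumber, int busySeconds) {
            this.threadNumber = threadNumber;
            this.busySeconds = busySeconds;
        }
    }

    // Based on the sample message above; not an official format definition
    private static final Pattern STUCK = Pattern.compile(
            "\\[STUCK\\] ExecuteThread: '(\\d+)' for queue: '([^']+)' has been busy for \"(\\d+)\" seconds");

    /** Returns the parsed event, or null when the line is not a stuck thread message. */
    static StuckEvent parse(String line) {
        Matcher m = STUCK.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new StuckEvent(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(3)));
    }

    public static void main(String[] args) {
        String line = "<BEA-000337> <[STUCK] ExecuteThread: '35' for queue: "
                + "'weblogic.kernel.Default (self-tuning)' has been busy for \"608\" seconds working on the request";
        StuckEvent event = parse(line);
        System.out.println("thread " + event.threadNumber + " busy for " + event.busySeconds + " seconds");
    }
}
```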

Here is one more tip: the generation and analysis of a JVM thread dump will also highlight stuck threads. As we can see from the snapshot below, the Weblogic thread state is now updated to STUCK, which means that this particular request has been executing for at least 600 seconds or 10 minutes.

This is very useful information since the native thread state will typically remain RUNNABLE. The native thread state will only get updated when dealing with BLOCKED threads etc. You have to keep in mind that RUNNABLE simply means that this thread is healthy from a JVM perspective. However, it does not mean that it truly is from a middleware or Java EE container perspective. This is why Oracle Weblogic has its own internal ExecuteThread state.

Finally, if your organization or client is using any commercial monitoring tool, I recommend that you enable some alerting around both hogging thread and stuck thread. This will allow your support team to take some pro-active actions before the affected Weblogic managed server(s) become fully unresponsive.


Weblogic Thread Monitoring Tips

If you are working as a middleware administrator or application support individual, you may have realized by now how crucial it is to have proper knowledge of the JVM along with a good understanding of the Java concurrency principles (yes you have to learn how to analyze thread dumps).

There is one principle I’m sure about: it is never too late to improve our knowledge and troubleshooting skills. Reaching a skill “plateau” is quite common and typically not due to our ability to learn but because of our fear and lack of willingness to embrace the challenges.

One such challenge is possibly your ability to understand and assess the health of the JVM & middleware threads of the Java EE container you are responsible for, such as Oracle Weblogic Server. If this is your situation then this post is for you.


How can you monitor the JVM threads in an efficient manner using the Weblogic admin console? Also, please elaborate on how you can differentiate between healthy threads and slow running threads. Finally, what other tools can help you achieve this task?


Please note that Weblogic Server 10.3.5 was used for the following example.

* An Oracle Weblogic 12c Thread Monitoring YouTube video is now available.

Oracle Weblogic Server is always installed with an admin console that provides you with out-of-the-box monitoring functions of the various Java EE resources exposed via the JMX API. Weblogic threads (created and assigned by the WLS kernel to the default self-tuning thread pool) are also fully exposed.

This monitoring page allows you to:

  • Monitor the full list of all Java threads under Weblogic control.
  • Correlate any slow running thread with your application, request and assigned Work Manager, if any.
  • Generate a JVM Thread Dump of the Weblogic managed server directly from the page via the Dump Thread Stacks button.

Thread states - summary view

This section provides a summary of all different Weblogic threads and states.

Thread states - detailed view

The detailed view is much more interesting. This is where you will be spending most of your analysis time. Make sure that you add all proper columns including the associated Work Manager, application name etc.

The live Weblogic thread monitoring analysis process I usually follow is as per below. This approach is very useful for production environments when you are trying to determine the source of a performance slowdown or just to give you an idea of the health of the Weblogic threads.

  • Refresh the page every 3-5 seconds.
  • In between the refresh actions, identify the threads that are still executing the same request (slow running threads). This can be determined if you see the same Weblogic thread “Name” executing the same “Current Request” with the same “Total requests” value. Another criterion is whether Weblogic “promotes” the affected thread(s) to Hogger or STUCK.
  • Continue until you are done with your monitoring activity.
  • As soon as one or a few slow running threads are found, identify the affected request(s) and application(s).
  • Immediately after, generate a JVM Thread Dump using the Dump Thread Stacks button and copy/paste the output to a text editor for live or future analysis.
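
The comparison performed between two refreshes can be sketched in code. This is only an illustration of the manual process described above: the Row class is a hypothetical holder for the “Name”, “Current Request” and “Total requests” console columns, and how you obtain the two snapshots is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ThreadSnapshotDiff {

    /** One row of the thread monitoring page (hypothetical holder class). */
    static final class Row {
        final String name;
        final String currentRequest;
        final long totalRequests;

        Row(String name, String currentRequest, long totalRequests) {
            this.name = name;
            this.currentRequest = currentRequest;
            this.totalRequests = totalRequests;
        }
    }

    /** Threads still executing the same request, with the same total, in both snapshots. */
    static List<String> slowThreads(List<Row> earlier, List<Row> later) {
        Map<String, Row> byName = new HashMap<>();
        for (Row r : earlier) {
            byName.put(r.name, r);
        }
        List<String> slow = new ArrayList<>();
        for (Row r : later) {
            Row prev = byName.get(r.name);
            boolean busy = r.currentRequest != null && !r.currentRequest.isEmpty();
            if (prev != null && busy
                    && r.currentRequest.equals(prev.currentRequest)
                    && r.totalRequests == prev.totalRequests) {
                slow.add(r.name);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        List<Row> first = new ArrayList<>();
        first.add(new Row("ExecuteThread: '35'", "POST /App1/jsp/test.jsp", 120));
        first.add(new Row("ExecuteThread: '11'", "GET /App1/index.jsp", 300));
        List<Row> second = new ArrayList<>();
        second.add(new Row("ExecuteThread: '35'", "POST /App1/jsp/test.jsp", 120));
        second.add(new Row("ExecuteThread: '11'", "GET /App1/index.jsp", 305));
        System.out.println(slowThreads(first, second));
    }
}
```

In this example only thread 35 is reported, since thread 11 completed 5 more requests between the two snapshots even though it happens to show the same request again.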

I also recommend that you use other tools to monitor the JVM and threads such as JVisualVM. JVisualVM will give a full view of all the threads, including GC related threads. It will also allow you to monitor the Java heap and correlate any finding with the activity of the garbage collector.

Finally, if you suspect that you are dealing with a deeper thread concurrency problem such as thread lock contention or Java-level deadlock, you will need to generate a native thread dump (JVisualVM, kill -3 PID, jstack etc.) which will allow you to review the different monitor locks and locked ownable synchronizers.
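
The same information is also exposed programmatically through the standard ThreadMXBean API, which can be handy for custom monitoring. Here is a minimal sketch with a class name of our own. It assumes a JVM that supports monitoring of ownable synchronizers, which is the case for HotSpot; other JVMs may throw UnsupportedOperationException.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class InProcessThreadDump {

    /** Number of threads currently involved in a Java-level deadlock (0 when none). */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true: include locked monitors and locked ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.println(info.getThreadName() + " state=" + info.getThreadState()
                    + " waitingOn=" + info.getLockName());
        }
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```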


Plumbr 3.0 – An evolutive approach for Java memory leaks

Most Java developers are familiar with the concept of “memory leaks”. In the Java world, it essentially means the constant expansion of a particular memory space such as the Java heap, PermGen & Metaspace (Java 8) or the native memory. Such leaks are often the result of Java code problems from your applications, third-party APIs or even inside the Java EE container and JDK code base. The level of complexity to pinpoint and resolve these types of problems can vary from medium to very high.

The good news is that the JVM has evolved into a great piece of engineering. The GC algorithms, troubleshooting and runtime data exposure capabilities of the JVM are now quite mature. We have access to most critical runtime data via the JMX API. All JVM vendors also provide JVM Thread Dump & Heap Dump generation techniques. Finally, we also have access to a great arsenal of tools and Java profilers to monitor the JVM internals and analyze memory dumps such as JVisualVM, Eclipse MAT or other commercial products.

That being said, even with the help of all these tools, the analysis process of Java memory leaks remains quite challenging. The skillset requirement for individuals performing such analysis is also high since it requires proper knowledge of the Java heap, troubleshooting and runtime analysis techniques.

I recently had the chance to experiment with the latest version of a Java memory leak analyzer tool that I’m sure you have heard about: Plumbr. Plumbr’s approach can be summarized as follows: instead of analyzing the memory dumps “after the fact” or following an OutOfMemoryError condition, why not write a program that keeps track of the Java objects and precisely detects memory leak suspects at runtime…and with live production traffic?

This article will share the positive experience I had with this product while experimenting and “engineering” a Java heap memory leak with the latest version of WildFly (formerly known as JBoss AS). I will also share my personal tips on how to improve your troubleshooting experience with Plumbr by combining additional tools or “synergists”. A complete Plumbr tutorial video will also be available from my YouTube channel in the near future.

Plumbr also has the capability to identify class loader related memory leaks affecting the PermGen space of the HotSpot JVM. I may publish more articles in the future addressing this particular type of leak with the help of Plumbr.

Memory leak simulator and environment specifications

In order to give Plumbr some challenges, the following memory leak was engineered:

  • A JAX-RS (REST) Web Service was created and exposed via the jvmleak URI as per below attributes.
    public Integer jvmLeak() {}

Each invocation of jvmleak is performing the following logic:

1.     Allocate a high amount of short-lived objects (no reference).
2.     Allocate a small amount of long-lived objects (normal or hard references) to a static ConcurrentHashMap data structure.
3.     Return the current count of the “leaking” ConcurrentHashMap.

We also created 3 extra Java classes:

  • JVMMemoryAllocator. This class is responsible for performing the short-lived and long-lived memory allocations.
  • ShortLivedObj. This class represents a short-lived object with no reference.
  • LongLivedObj. This class represents a long-lived object with hard references.
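
The original source code of the simulator is not reproduced here, so the following is only a hedged reconstruction of the logic described above. The class names come from the list, while the object sizes, loop count, map key type and method names are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Short-lived object: never referenced after creation (payload size is an assumption). */
final class ShortLivedObj {
    final byte[] payload = new byte[64 * 1024];
}

/** Long-lived object: kept alive by a hard reference from a static map (payload size is an assumption). */
final class LongLivedObj {
    final byte[] payload = new byte[1024];
}

final class JVMMemoryAllocator {

    // Entries are added on every call and never removed: this is the engineered leak
    private static final Map<Long, LongLivedObj> LEAKING_MAP = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_KEY = new AtomicLong();

    /** Performs one round of allocations and returns the current size of the leaking map. */
    static int allocate() {
        for (int i = 0; i < 100; i++) {
            new ShortLivedObj(); // garbage immediately, no reference kept
        }
        LEAKING_MAP.put(NEXT_KEY.incrementAndGet(), new LongLivedObj());
        return LEAKING_MAP.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("Leaking map size: " + allocate());
        }
    }
}
```

In the simulator, the equivalent of allocate() would be invoked by each jvmleak request, so the map grows by one entry per call for as long as the load test runs.
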
You will find below the environment specifications and software used to create and execute this Java heap memory leak simulator.

Plumbr download and installation

Plumbr is packaged as “trialware”. This means that you can download the full version for free and determine if your application(s) contain memory leak(s). Obtaining the location of the memory leak (code level) will require you or your client to purchase a Plumbr license plan as per your needs.

I recommend that you first install Plumbr to a development environment prior to production. Plumbr is a lightweight agent but extra due diligence is always recommended in order to reduce risk for your production environment.

  1. Login / register at https://portal.plumbr.eu/. This will allow you to download the Plumbr client.
  2. Unzip the archive at a location of your choice (local environment, development server etc.).
  3. If your Java application is running on Windows OS, you can simply double click the plumbr.jar located under the Plumbr root directory or use the following command: java -jar plumbr.jar. This will launch the Plumbr client and allow you to “attach” it to a running JVM process. Another approach is to simply edit your application JVM start-up settings and add the following argument: -javaagent:<PLUMBR ROOT DIRECTORY>plumbr.jar. I personally prefer the second approach.
Now back to our memory leak simulator, you will notice from the snapshot below that we added the Plumbr agent reference to the WildFly JVM start-up arguments inside the IDE directly.

After the restart of your JVM process, you will notice a similar message at the beginning of the standard output log:

* Plumbr (B1422) is attached.                              *
* Running with JRE from …\jdk1.7.0_09\jre                  *
* Your license is valid                                    *
*   for 20 more days                                       *
*   for Pierre-Hugues Charbonneau.                         *
* Plumbr agent is connected to the Plumbr Portal.          *
* Open up https://portal.plumbr.eu to follow its progress. *

This is telling us that Plumbr is now active and connected to the Plumbr Portal or reporting server. Now go to Plumbr portal and verify if it is properly receiving data from your JVM process.
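
If the banner does not show up, one quick sanity check is to confirm that the -javaagent argument was really passed to the JVM. The sketch below uses the standard RuntimeMXBean API to list the start-up arguments. The class and method names are our own, and the check only looks at the argument string, not at whether the agent initialized successfully.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class AgentArgumentCheck {

    /** True when one of the JVM arguments is a -javaagent option referring to the given jar name. */
    static boolean hasJavaAgent(List<String> jvmArgs, String jarName) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-javaagent:") && arg.contains(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("plumbr.jar agent argument present: " + hasJavaAgent(jvmArgs, "plumbr.jar"));
    }
}
```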

Memory leak simulator warm-up

Now that the Plumbr agent is connected, it is now time to fire our memory leak simulator. In order to simulate an interesting memory leak, we will use the following load testing specifications:

  • JMeter will be configured to execute our REST Web Service URI jvmleak.
  • Our JMeter thread group will be configured to run “forever” with 20 concurrent threads.
  • JVisualVM and the Plumbr Portal will both be used to monitor the Java heap utilization along with the internal Plumbr MBeans.

Please note that launching JVisualVM is optional but I definitely recommend that you use it in conjunction with the MBeans browser plugin (add-on for JVisualVM). This will allow you to track the status and progress of the Plumbr agent directly from JVisualVM. It is important to note that you will need to generate load and get several Full GC iterations before Plumbr is able to identify true memory leaks. The elapsed time will also depend on the activity (load) and nature of your application.

Since we have not fired JMeter at this point, you can see that Plumbr is still in its WARMUP phase with 10% completion.

Memory leak simulator execution

Generating an adequate amount of load to your application will allow Plumbr to complete its WARMUP phase fairly quickly. We can now see from our simulator that Plumbr is in RUNNING state and tracking several thousands of objects already. However, we had no Full GC at this point so we will need to wait until a few iterations are performed. This will allow the internal Plumbr computing engine to do its assessment and potentially narrow down the location of a memory leak.

You will notice that the number of objects tracked by Plumbr varies in between Full GC iterations. This is because Plumbr will typically only focus on objects that are “suspect”, such as objects able to survive major collections (Full GC).

After a few major collections, Plumbr was able to detect a potential memory leak with 98% confidence. We are almost there… It is important to note that no OutOfMemoryError was thrown. The leak can also be observed from JVisualVM but it is not that obvious this early in the load test life cycle.

Finally, after more major collections, Plumbr was able to detect a memory leak with a confidence level of 100%. As you can see from the snapshots, JVisualVM did allow us to easily monitor the Java heap and Plumbr MBeans status progressions.

You will also notice the following message in the standard output logs.
15:22:41,270 INFO  [stdout] (Plumbr thread - 176) ******************************************************************************
15:22:41,271 INFO  [stdout] (Plumbr thread - 176) * Plumbr has found a memory leak.                                             *
15:22:41,271 INFO  [stdout] (Plumbr thread - 176) * Plumbr will now stop tracking to leave more resources for your application. *
15:22:41,271 INFO  [stdout] (Plumbr thread - 176) * You can find the detailed memory leak report by opening this url:           *
15:22:41,272 INFO  [stdout] (Plumbr thread - 176) * https://portal.plumbr.eu/report/5515                                        *
15:22:41,272 INFO  [stdout] (Plumbr thread - 176) ******************************************************************************

At this point we have to wait for Plumbr to analyze the location and type of leak found. Once the analysis process is completed, a report will be generated and available from the Plumbr Portal.

Plumbr memory leak report

Plumbr 3.0 provides centralized memory leak report capabilities. Each time Plumbr is able to detect a new memory leak, a report is created and uploaded to the reporting Portal. We can see below that a new report was generated from our fresh memory leak simulator execution.

The report is the ultimate outcome and deliverable of this exercise: a precise location, at code level, of where the memory leak is located.

We can see from the above report that Plumbr was perfectly able to identify our memory leak. You can notice that the leak report is split into 4 main sections:

  • The header contains the number of leaks found along with the detail on the memory footprint occupied by the leaking objects vs. the total Java heap capacity.
  • Leaking object type: This represents the object type of the instances accumulating over time in between major collections.
  • Leak creation location: This represents the caller and Java class where the leaking objects are created.
  • Memory references: This represents the object reference tree where the leaking objects are still referenced or held.

In our case, Plumbr was able to identify the exact location of our engineered memory leak.

  • LongLivedObj is indeed the expected leaking object type.
  • JVMMemoryAllocator is definitely the Java class where the leak is created.
  • ConcurrentHashMap is the implemented "referencer" or container.
Interestingly, Plumbr was also able to identify another potential leak inside WildFly 8 Alpha 3 itself…

Now please keep in mind that following the creation of the report, Plumbr will not magically resolve the memory leak for you. You will need to spend some time and review the affected code, especially the location where the leak is created and understand why the objects are still referenced. Plumbr will do 50%+ of the work. You will need to take care of the other half and determine the proper code fix required, upgrade or patch of the offending API(s).

Final words

I hope that you appreciated this tutorial and review of Plumbr 3.0 while simulating a true Java heap memory leak. This product is looking quite promising and I definitely recommend that you give Plumbr a try in your environment, especially if you suspect that you are dealing with memory leak problems affecting your production environment stability. Thumbs up to the Plumbr core team!

I’m looking forward to your comments and please share your experience with Plumbr.