/ Java EE Support Patterns

3.13.2013

OpenJPA: Memory Leak Case Study

This article will provide the complete root cause analysis details and resolution of a Java heap memory leak (Apache OpenJPA leak) affecting an Oracle Weblogic server 10.0 production environment.

This post will also demonstrate the importance to follow the Java Persistence API best practices when managing the javax.persistence.EntityManagerFactory lifecycle.

Environment specifications

  • Java EE server: Oracle Weblogic Portal 10.0
  • OS: Solaris 10
  • JDK: Oracle/Sun HotSpot JVM 1.5 32-bit @2 GB capacity
  • Java Persistence API: Apache OpenJPA 1.0.x (JPA 1.0 specifications)
  • RDBMS: Oracle 10g
  • Platform type: Web Portal

Troubleshooting tools


Problem description & observations

The problem was initially reported by our Weblogic production support team following production outages. An initial root cause analysis exercise did reveal the following facts and observations:

  • Production outages were observed on regular basis after ~2 weeks of traffic.
  • The failures were due to Java heap (OldGen) depletion e.g. OutOfMemoryError: Java heap space error found in the Weblogic logs.
  • A Java heap memory leak was confirmed after reviewing the Java heap OldGen space utilization over time from Foglight monitoring tool along with the Java verbose GC historical data.



Following the discovery of the above problems, the decision was taken to move to the next phase of the RCA and perform a JVM heap dump analysis of the affected Weblogic (JVM) instances.

JVM heap dump analysis

** A video explaining the following JVM Heap Dump analysis is now available here.

In order to generate a JVM heap dump, the supported team did use the HotSpot 1.5 jmap utility which generated a heap dump file (heap.bin) of about ~1.5 GB. The heap dump file was then analyzed using the Eclipse Memory Analyzer Tool. Now let’s review the heap dump analysis so we can understand the source of the OldGen memory leak.

MAT provides an initial Leak Suspects report which can be very useful to highlight your high memory contributors. For our problem case, MAT was able to identify a leak suspect contributing to almost 600 MB or 40% of the total OldGen space capacity.




At this point we found one instance of java.util.LinkedList using almost 600 MB of memory and loaded to one of our application parent class loader (@ 0x7e12b708). The next step was to understand the leaking objects along with the source of retention.

MAT allows you to inspect any class loader instance of your application, providing you with capabilities to inspect the loaded classes & instances. Simply search for the desired object by providing the address e.g. 0x7e12b708 and then inspect the loaded classes & instances by selecting List Objects > with outgoing references.




As you can see from the above snapshot, the analysis was quite revealing. What we found was one instance of org.apache.openjpa.enhance.PCRegistry at the source of the memory retention; more precisely the culprit was the _listeners field implemented as a LinkedList.

For your reference, the Apache OpenJPA PCRegistry is used internally to track the registered persistence-capable classes. Find below a snippet of the PCRegistry source code from Apache OpenJPA version 1.0.4 exposing the _listeners field.


/**
 * Tracks registered persistence-capable classes.
 *
 * @since 0.4.0
 * @author Abe White
 */
public class PCRegistry {
    // DO NOT ADD ADDITIONAL DEPENDENCIES TO THIS CLASS

    private static final Localizer _loc = Localizer.forPackage
        (PCRegistry.class);

    // map of pc classes to meta structs; weak so the VM can GC classes
    private static final Map _metas = new ConcurrentReferenceHashMap
        (ReferenceMap.WEAK, ReferenceMap.HARD);

    // register class listeners
    private static final Collection _listeners = new LinkedList();
……………………………………………………………………………………

Now the question is why is the memory footprint of this internal data structure so big and potentially leaking over time? The next step was to deep dive into the _listeners LinkedLink instance in order to review the leaking objects.




We finally found that the leaking objects were actually the JDBC & SQL mapping definitions (metadata) used by our application in order to execute various queries against our Oracle database. A review of the JPA specifications, OpenJPA documentation and source did confirm that the root cause was associated with a wrong usage of the  javax.persistence.EntityManagerFactory such of lack of closure of a newly created EntityManagerFactory instance.




If you look closely at the above code snapshot, you will realize that the close() method is indeed responsible to cleanup any recently used metadata repository instance. It did also raise another concern, why are we creating such Factory instances over and over…

The next step of the investigation was to perform a code walkthrough of our application code, especially around the life cycle management of the JPA EntityManagerFactory and EntityManager objects.

Root cause and solution

A code walkthrough of the application code did reveal that the application was creating a new instance of EntityManagerFactory on each single request and not closing it properly.

public class Application {
      
       @Resource
       private UserTransaction utx = null;
       
       // Initialized on each application request and not closed!
       @PersistenceUnit(unitName = "UnitName")
       private EntityManagerFactory emf = Persistence.createEntityManagerFactory("PersistenceUnit"); 

       public EntityManager getEntityManager() {
             return this.emf.createEntityManager();
       }
      
       public void businessMethod() {
            
             // Create a new EntityManager instance via from the newly created EntityManagerFactory instance
             // Do something...
             // Close the EntityManager instance
       }
}

This code defect and improver use of JPA EntityManagerFactory was causing a leak or accumulation of metadata repository instances within the OpenJPA _listeners data structure demonstrated from the earlier JVM heap dump analysis.

The solution of the problem was to centralize the management & life cycle of the thread safe javax.persistence.EntityManagerFactory via the Singleton pattern. The final solution was implemented as per below:

  • Create and maintain only one static instance of javax.persistence.EntityManagerFactory per application class loader and implemented via the Singleton Pattern.
  • Create and dispose new instances of EntityManager for each application request.
Please review this discussion from Stackoverflow as the solution we implemented is quite similar.

Following the implementation of the solution to our production environment, no more Java heap OldGen memory leak is observed.

Please feel free to provide your comments and share your experience on the same.

3.06.2013

Web and Java learning application


You may have noticed that we started to release training and tutorial videos in 2013.

In order to further improve and simplify your learning process, we will be releasing a free Java and Java EE troubleshooting learning platform (application) in the following weeks.

This application will be platform independent (Tomcat, JBoss, Weblogic, WAS etc.) and allow you to:

  • Improve your Java & Java EE troubleshooting skills such as JVM monitoring & tuning, thread dump analysis, common problem patterns etc.
  • Improve your knowledge on Java programming and troubleshooting.
  • Improve your knowledge on some key design patterns and emerging Web & language technologies.
  • Simulate common performance & stability problems.
  • Practice and learn via programming puzzles.
Some of the future articles and videos available from this Blog will be using this new learning platform for an enhanced learning experience.

We will also leverage this application along with other Web load testing and profiling technologies in order to conduct some future product reviews such as the memory leak analysis tool Plumbr.

I’m looking forward for your comments and suggestions.

Thank you.
Pierre-Hugues Charbonneau
Your author and teacher on Java EE Support Patterns.

2.21.2013

ClassCastException and ClassLoader Puzzle

The following question and puzzle will test your knowledge on Java class loaders and more precisely on one of the Java language specifications. It will also help you better troubleshoot problems such as java.lang.NoClassDefFoundError.

I highly suggest that you do not look at the explanation and solution until you review the code and come up with your own explanation.

You can download the Java program source code and binaries (compiled with JDK 1.7) here. In order to run the program, simply use the following command:

<JDK 1.7 HOME>\bin\java -classpath MainProgram.jar org.ph.javaee.training9.ChildFirstCLPuzzle

** Make sure that you also download the 3 JAR files below before you run the program.

  • MainProgram.jar contains the main program along with super class ProgrammingLanguage.
  • ProgrammingLanguage.jar contains the super class ProgrammingLanguage.
  • JavaLanguage.jar contains the implementation class JavaLanguage, which extends ProgrammingLanguage.

Question (puzzle)

Review closely the program source and packaging along with the diagram below reflecting the class loader delegation model used for this program.



Why can’t we cast (ChildFirstCLPuzzle.java @line 53) the Object javaLanguageInstance of type JavaLanguage, into ProgrammingLanguage?

...............................                   
// Finally, cast the object instance into ProgrammingLanguage
/** Question: why is the following code failing with ClassCastException given the fact JavaLanguage is indeed a ProgrammingLanguage?? **/
ProgrammingLanguage programmingLanguage = (ProgrammingLanguage)javaLanguageInstance;

............................... 

Propose a solution to allow the above cast to succeed without changing the original source code. Hint: look again at the class loader delegation model, packaging and diagram.

Answer & solution

The Java program is attempting to demonstrate a Java language specification rule related to class loaders: two classes loaded by different class loaders are considered to be distinct and hence incompatible.

If you review closely the program source, packaging and diagram, you will realize the following facts:

  • Our main program is loaded to the parent class loader e.g. $AppClassLoader.
  • The super class ProgrammingLanguage is also loaded to the parent class loader since it is referenced by our main program at line 53.
  • The implementation class JavaLanguage is loaded to our child class loader e.g. ChildFirstClassLoader which is following a “child first” delegation model.
  • Finally, the super class ProgrammingLanguage is also loaded to our child class loader.

The key point to understand is that the super class is loaded by 2 different class loaders. As per Java language specification, two classes loaded by different class loaders are considered to be distinct and hence incompatible. This means that ProgrammingLanguage loaded from the “child firs” class loader is different and not compatible with ProgrammingLanguage loaded from the parent classloader. This is why the cast attempted at line 53 failed with the error below:

ChildFirstCLPuzzle execution failed with ERROR: java.lang.ClassCastException: org.ph.javaee.training9.JavaLanguage cannot be cast to org.ph.javaee.training9.ProgrammingLanguage

Now how can we fix this problem without actually changing the source code? Well please keep in mind that we are using a “child first” delegation model here. This is why we end up with 2 versions of the same ProgrammingLanguage class. The solution can be visualized as per below.


In order to fix this problem, we can simply ensure that we have only one version of ProgrammingLanguage loaded. Since our main program is referencing ProgrammingLanguage directly, the solution is to remove the ProgrammingLanguage.jar file from the child class loader. This will force the child class loader to look for the class from the parent class loader, problem solved! In order to test the solution, simply remove the ProgrammingLanguage.jar from your testing folder and re-run the program.

I hope you appreciated this puzzle related to “child first” class loader delegation model and class loader rules. This understanding is especially important when you are dealing with complex Java EE deployments involving many class loaders, exposing you to this type of problems at runtime.

Please do not hesitate to post any comment or question on this puzzle.

2.01.2013

Java 8: From PermGen to Metaspace

As you may be aware, the JDK 8 Early Access is now available for download. This allows Java developers to experiment with some of the new language and runtime features of Java 8.

One of these features is the complete removal of the Permanent Generation (PermGen) space which has been announced by Oracle since the release of JDK 7. Interned strings, for example, have already been removed from the PermGen space since JDK 7. The JDK 8 release finalizes its decommissioning.

This article will share the information that we found so far on the PermGen successor: Metaspace. We will also compare the runtime behavior of the HotSpot 1.7 vs. HotSpot 1.8 (b75) when executing a Java program “leaking” class metadata objects.

The final specifications, tuning flags and documentation around Metaspace should be available once Java 8 is officially released.

Metaspace: A new memory space is born

The JDK 8 HotSpot JVM is now using native memory for the representation of class metadata and is called Metaspace; similar to the Oracle JRockit and IBM JVM's.

The good news is that it means no more java.lang.OutOfMemoryError: PermGen space problems and no need for you to tune and monitor this memory space anymore…not so fast. While this change is invisible by default, we will show you next that you will still need to worry about the class metadata memory footprint. Please also keep in mind that this new feature does not magically eliminate class and classloader memory leaks. You will need to track down these problems using a different approach and by learning the new naming convention.

I recommend that you read the PermGen removal summary and comments from Jon on this subject. 

In summary:

PermGen space situation

  • This memory space is completely removed.
  • The PermSize and MaxPermSize JVM arguments are ignored and a warning is issued if present at start-up.

Metaspace memory allocation model

  • Most allocations for the class metadata are now allocated out of native memory.
  • The klasses that were used to describe class metadata have been removed.

Metaspace capacity

  • By default class metadata allocation is limited by the amount of available native memory (capacity will of course depend if you use a 32-bit JVM vs. 64-bit along with OS virtual memory availability).
  • A new flag is available (MaxMetaspaceSize), allowing you to limit the amount of native memory used for class metadata. If you don’t specify this flag, the Metaspace will dynamically re-size depending of the application demand at runtime.

Metaspace garbage collection

  • Garbage collection of the dead classes and classloaders is triggered once the class metadata usage reaches the “MaxMetaspaceSize”.
  • Proper monitoring & tuning of the Metaspace will obviously be required in order to limit the frequency or delay of such garbage collections. Excessive Metaspace garbage collections may be a symptom of classes, classloaders memory leak or inadequate sizing for your application.

Java heap space impact

  • Some miscellaneous data has been moved to the Java heap space. This means you may observe an increase of the Java heap space following a future JDK 8 upgrade.

Metaspace monitoring

  • Metaspace usage is available from the HotSpot 1.8 verbose GC log output.
  • Jstat & JVisualVM have not been updated at this point based on our testing with b75 and the old PermGen space references are still present.

Enough theory now, let’s see this new memory space in action via our leaking Java program…

PermGen vs. Metaspace runtime comparison

In order to better understand the runtime behavior of the new Metaspace memory space, we created a class metadata leaking Java program. You can download the source here.

The following scenarios will be tested:

  • Run the Java program using JDK 1.7 in order to monitor & deplete the PermGen memory space set at 128 MB.
  • Run the Java program using JDK 1.8 (b75) in order to monitor the dynamic increase and garbage collection of the new Metaspace memory space.
  • Run the Java program using JDK 1.8 (b75) in order to simulate the depletion of the Metaspace by setting the MaxMetaspaceSize value at 128 MB.

JDK 1.7 @64-bit – PermGen depletion

  • Java program with 50K configured iterations
  • Java heap space of 1024 MB
  • Java PermGen space of 128 MB (-XX:MaxPermSize=128m)


As you can see form JVisualVM, the PermGen depletion was reached after loading about 30K+ classes. We can also see this depletion from the program and GC output.

Class metadata leak simulator
Author: Pierre-Hugues Charbonneau
http://javaeesupportpatterns.blogspot.com
ERROR: java.lang.OutOfMemoryError: PermGen space


Now let’s execute the program using the HotSpot JDK 1.8 JRE.

JDK 1.8 @64-bit – Metaspace dynamic re-size

  • Java program with 50K configured iterations
  • Java heap space of 1024 MB
  • Java Metaspace space: unbounded (default)





As you can see from the verbose GC output, the JVM Metaspace did expand dynamically from 20 MB up to 328 MB of reserved native memory in order to honor the increased class metadata memory footprint from our Java program. We could also observe garbage collection events in the attempt by the JVM to destroy any dead class or classloader object. Since our Java program is leaking, the JVM had no choice but to dynamically expand the Metaspace memory space.

The program was able to run its 50K of iterations with no OOM event and loaded 50K+ Classes.

Let's move to our last testing scenario.

JDK 1.8 @64-bit – Metaspace depletion

  • Java program with 50K configured iterations
  • Java heap space of 1024 MB
  • Java Metaspace space: 128 MB (-XX:MaxMetaspaceSize=128m)





As you can see form JVisualVM, the Metaspace depletion was reached after loading about 30K+ classes; very similar to the run with the JDK 1.7. We can also see this from the program and GC output. Another interesting observation is that the native memory footprint reserved was twice as much as the maximum size specified. This may indicate some opportunities to fine tune the Metaspace re-size policy, if possible, in order to avoid native memory waste.

Now find below the Exception we got from the Java program output.

Class metadata leak simulator
Author: Pierre-Hugues Charbonneau
http://javaeesupportpatterns.blogspot.com
ERROR: java.lang.OutOfMemoryError: Metadata space
Done!

As expected, capping the Metaspace at 128 MB like we did for the baseline run with JDK 1.7 did not allow us to complete the 50K iterations of our program. A new OOM error was thrown by the JVM. The above OOM event was thrown by the JVM from the Metaspace following a memory allocation failure.

#metaspace.cpp



Final words

I hope you appreciated this early analysis and experiment with the new Java 8 Metaspace. The current observations definitely indicate that proper monitoring & tuning will be required in order to stay away from problems such as excessive Metaspace GC or OOM conditions triggered from our last testing scenario. Future articles may include performance comparisons in order to identify potential performance improvements associated with this new feature.

Please feel free to provide any comment.

1.25.2013

Java concurrency: the hidden thread deadlocks

Most Java programmers are familiar with the Java thread deadlock concept. It essentially involves 2 threads waiting forever for each other. This condition is often the result of flat (synchronized) or ReentrantLock (read or write) lock-ordering problems.

Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting to lock monitor 0x0237ada4 (object 0x272200e8, a java.lang.Object),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x0237aa64 (object 0x272200f0, a java.lang.Object),
  which is held by "pool-1-thread-2"

The good news is that the HotSpot JVM is always able to detect this condition for you…or is it?

A recent thread deadlock problem affecting an Oracle Service Bus production environment has forced us to revisit this classic problem and identify the existence of “hidden” deadlock situations.

This article will demonstrate and replicate via a simple Java program a very special lock-ordering deadlock condition which is not detected by the latest HotSpot JVM 1.7. You will also find a video at the end of the article explaining you the Java sample program and the troubleshooting approach used.

The crime scene

I usually like to compare major Java concurrency problems to a crime scene where you play the lead investigator role. In this context, the “crime” is an actual production outage of your client IT environment. Your job is to:

  • Collect all the evidences, hints & facts (thread dump, logs, business impact, load figures…)
  • Interrogate the witnesses & domain experts (support team, delivery team, vendor, client…)
The next step of your investigation is to analyze the collected information and establish a potential list of one or many “suspects” along with clear proofs. Eventually, you want to narrow it down to a primary suspect or root cause. Obviously the law “innocent until proven guilty” does not apply here, exactly the opposite.

Lack of evidence can prevent you to achieve the above goal. What you will see next is that the lack of deadlock detection by the Hotspot JVM does not necessary prove that you are not dealing with this problem.

The suspect

In this troubleshooting context, the “suspect” is defined as the application or middleware code with the following problematic execution pattern.

  • Usage of FLAT lock followed by the usage of ReentrantLock WRITE lock (execution path #1)
  • Usage of ReentrantLock READ lock followed by the usage of FLAT lock (execution path #2)
  • Concurrent execution performed by 2 Java threads but via a reversed execution order
The above lock-ordering deadlock criteria’s can be visualized as per below:


Now let’s replicate this problem via our sample Java program and look at the JVM thread dump output.

Sample Java program

This above deadlock conditions was first identified from our Oracle OSB problem case. We then re-created it via a simple Java program. You can download the entire source code of our program here.

The program is simply creating and firing 2 worker threads. Each of them execute a different execution path and attempt to acquire locks on shared objects but in different orders. We also created a deadlock detector thread for monitoring and logging purposes.

For now, find below the Java class implementing the 2 different execution paths.

package org.ph.javaee.training8;

import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * A simple thread task representation
 * @author Pierre-Hugues Charbonneau
 *
 */
public class Task {
      
       // Object used for FLAT lock
       private final Object sharedObject = new Object();
       // ReentrantReadWriteLock used for WRITE & READ locks
       private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
      
       /**
        *  Execution pattern #1
        */
       public void executeTask1() {
            
             // 1. Attempt to acquire a ReentrantReadWriteLock READ lock
             lock.readLock().lock();
            
             // Wait 2 seconds to simulate some work...
             try { Thread.sleep(2000);}catch (Throwable any) {}
            
             try {              
                    // 2. Attempt to acquire a Flat lock...
                    synchronized (sharedObject) {}
             }
             // Remove the READ lock
             finally {
                    lock.readLock().unlock();
             }           
            
             System.out.println("executeTask1() :: Work Done!");
       }
      
       /**
        *  Execution pattern #2
        */
       public void executeTask2() {
            
             // 1. Attempt to acquire a Flat lock
             synchronized (sharedObject) {                 
                   
                    // Wait 2 seconds to simulate some work...
                    try { Thread.sleep(2000);}catch (Throwable any) {}
                   
                    // 2. Attempt to acquire a WRITE lock                   
                    lock.writeLock().lock();
                   
                    try {
                           // Do nothing
                    }
                   
                    // Remove the WRITE lock
                    finally {
                           lock.writeLock().unlock();
                    }
             }
            
             System.out.println("executeTask2() :: Work Done!");
       }
      
       public ReentrantReadWriteLock getReentrantReadWriteLock() {
             return lock;
       }
}

As soon ad the deadlock situation was triggered, a JVM thread dump was generated using JVisualVM.


As you can see from the Java thread dump sample. The JVM did not detect this deadlock condition (e.g. no presence of Found one Java-level deadlock) but it is clear these 2 threads are in deadlock state.

Root cause: ReetrantLock READ lock behavior

The main explanation we found at this point is associated with the usage of the ReetrantLock READ lock. The read locks are normally not designed to have a notion of ownership. Since there is not a record of which thread holds a read lock, this appears to prevent the HotSpot JVM deadlock detector logic to detect deadlock involving read locks.

Some improvements were implemented since then but we can see that the JVM still cannot detect this special deadlock scenario.

Now if we replace the read lock (execution pattern #1) in our program by a write lock, the JVM will finally detect the deadlock condition but why?

Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting for ownable synchronizer 0x272239c0, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x025cad3c (object 0x272236d0, a java.lang.Object),
  which is held by "pool-1-thread-2"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-2":
       at sun.misc.Unsafe.park(Native Method)
       - parking to wait for  <0x272239c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
       at org.ph.javaee.training8.Task.executeTask2(Task.java:54)
       - locked <0x272236d0> (a java.lang.Object)
       at org.ph.javaee.training8.WorkerThread2.run(WorkerThread2.java:29)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:722)
"pool-1-thread-1":
       at org.ph.javaee.training8.Task.executeTask1(Task.java:31)
       - waiting to lock <0x272236d0> (a java.lang.Object)
       at org.ph.javaee.training8.WorkerThread1.run(WorkerThread1.java:29)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:722)

This is because write locks are tracked by the JVM similar to flat locks. This means the HotSpot JVM deadlock detector appears to be currently designed to detect:

  • Deadlock on Object monitors involving FLAT locks
  • Deadlock involving Locked ownable synchronizers associated with WRITE locks
The lack of read lock per-thread tracking appears to prevent deadlock detection for this scenario and significantly increase the troubleshooting complexity.

I suggest that you read Doug Lea’s comments on this whole issue since concerns were raised back in 2005 regarding the possibility to add per-thread read-hold tracking due to some potential lock overhead.

Find below my troubleshooting recommendations if you suspect a hidden deadlock condition involving read locks:

  • Analyze closely the thread call stack trace, it may reveal some code potentially acquiring read locks and preventing other threads to acquire write locks.
  • If you are the owner of the code, keep track of the read lock count via the usage of the lock.getReadLockCount() method
I’m looking forward for your feedback, especially from individuals with experience on this type of deadlock involving read locks.

Finally, find below a video explaining such findings via the execution and monitoring of our sample Java program.