This article provides the complete root cause analysis and resolution of a Java heap memory leak (Apache OpenJPA leak) affecting an Oracle Weblogic Server 10.0 production environment.
This post will also demonstrate the importance of following Java Persistence API best practices when managing the javax.persistence.EntityManagerFactory lifecycle.
Environment specifications
- Java EE server: Oracle Weblogic Portal 10.0
- OS: Solaris 10
- JDK: Oracle/Sun HotSpot JVM 1.5 32-bit @ 2 GB capacity
- Java Persistence API: Apache OpenJPA 1.0.x (JPA 1.0 specifications)
- RDBMS: Oracle 10g
- Platform type: Web Portal
Troubleshooting tools
- Quest Foglight for Java (Java heap monitoring)
- MAT (Java heap dump analysis)
Problem description & observations
The problem was initially reported by our Weblogic production support team following production outages. An initial root cause analysis exercise revealed the following facts and observations:
- Production outages were observed on a regular basis after ~2 weeks of traffic.
- The failures were due to Java heap (OldGen) depletion, e.g. OutOfMemoryError: Java heap space errors found in the Weblogic logs.
- A Java heap memory leak was confirmed after reviewing the Java heap OldGen space utilization over time from the Foglight monitoring tool along with the Java verbose GC historical data (sample GC logging flags are shown below).
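For readers who want to capture similar verbose GC historical data, here is a minimal sketch of standard HotSpot 1.5 GC logging arguments; the JAVA_OPTIONS variable and the log file path are placeholders for your own Weblogic startup script and environment, not the actual production configuration.

# Example only: append GC logging flags to the Weblogic JVM arguments
# /path/to/gc.log is a placeholder, not the actual production log location
JAVA_OPTIONS="${JAVA_OPTIONS} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc.log"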
Following the discovery of the above problems, the decision was taken to move to the next phase of the RCA and perform a JVM heap dump analysis of the affected Weblogic (JVM) instances.
JVM heap dump analysis
In order to generate a JVM heap dump, the support team used the HotSpot 1.5 jmap utility, which generated a heap dump file (heap.bin) of about ~1.5 GB. The heap dump file was then analyzed using the Eclipse Memory Analyzer Tool (MAT). Now let's review the heap dump analysis so we can understand the source of the OldGen memory leak.
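For reference, the JDK 1.5 jmap binary heap dump syntax is sketched below; the process id is a placeholder, and the utility writes its heap.bin file to the current working directory.

# Example only: <pid> is the process id of the affected Weblogic JVM
jmap -heap:format=b <pid>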
MAT provides an initial Leak Suspects report which can be very useful to highlight your high memory contributors. For our problem case, MAT was able to identify a leak suspect contributing to almost 600 MB, or 40% of the total OldGen space capacity.
At this point we found one instance of java.util.LinkedList using almost 600 MB of memory and loaded in one of our application parent class loaders (@ 0x7e12b708). The next step was to understand the leaking objects along with the source of retention.
MAT allows you to inspect any class loader instance of your application, providing you with capabilities to inspect the loaded classes & instances. Simply search for the desired object by providing its address, e.g. 0x7e12b708, and then inspect the loaded classes & instances by selecting List Objects > with outgoing references.
As you can see from the above snapshot, the analysis was quite revealing. What we found was one instance of org.apache.openjpa.enhance.PCRegistry at the source of the memory retention; more precisely, the culprit was the _listeners field, implemented as a LinkedList.
For your reference, the Apache OpenJPA PCRegistry is used internally to track the registered persistence-capable classes. Find below a snippet of the PCRegistry source code from Apache OpenJPA version 1.0.4 exposing the _listeners field.
/**
 * Tracks registered persistence-capable classes.
 *
 * @since 0.4.0
 * @author Abe White
 */
public class PCRegistry {

    // DO NOT ADD ADDITIONAL DEPENDENCIES TO THIS CLASS
    private static final Localizer _loc = Localizer.forPackage(PCRegistry.class);

    // map of pc classes to meta structs; weak so the VM can GC classes
    private static final Map _metas = new ConcurrentReferenceHashMap(ReferenceMap.WEAK, ReferenceMap.HARD);

    // register class listeners
    private static final Collection _listeners = new LinkedList();

    ...
Now the question is: why is the memory footprint of this internal data structure so big, and potentially leaking over time? The next step was to deep dive into the _listeners LinkedList instance in order to review the leaking objects.
We finally found that the leaking objects were actually the JDBC & SQL mapping definitions (metadata) used by our application in order to execute various queries against our Oracle database. A review of the JPA specifications, OpenJPA documentation and source code confirmed that the root cause was associated with a wrong usage of the javax.persistence.EntityManagerFactory, such as a lack of closure of a newly created EntityManagerFactory instance.
If you look closely at the above code snapshot, you will realize that the close() method is indeed responsible for cleaning up any recently used metadata repository instance. This also raised another concern: why are we creating such Factory instances over and over…
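As a reference point, here is a minimal sketch, using only the standard JPA 1.0 API, of how a manually created EntityManagerFactory should be closed once it is no longer needed; the class name and persistence unit name are illustrative and not taken from the affected application.

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class EntityManagerFactoryCleanup {

    public static void main(String[] args) {
        // "PersistenceUnit" is an example persistence unit name
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("PersistenceUnit");
        EntityManager em = emf.createEntityManager();
        try {
            // Execute queries, transactions, etc.
        } finally {
            em.close();
            // Per the findings above, closing the factory is what allows OpenJPA
            // to clean up the associated metadata repository instance
            emf.close();
        }
    }
}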
The next step of the investigation was to perform a code walkthrough of our application code, especially around the life cycle management of the JPA EntityManagerFactory and EntityManager objects.
Root cause and solution
A code walkthrough of the application code revealed that the application was creating a new instance of EntityManagerFactory on each single request and not closing it properly.
import javax.annotation.Resource;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.PersistenceUnit;
import javax.transaction.UserTransaction;

public class Application {

    @Resource
    private UserTransaction utx = null;

    // Initialized on each application request and not closed!
    @PersistenceUnit(unitName = "UnitName")
    private EntityManagerFactory emf = Persistence.createEntityManagerFactory("PersistenceUnit");

    public EntityManager getEntityManager() {
        return this.emf.createEntityManager();
    }

    public void businessMethod() {
        // Create a new EntityManager instance from the newly created EntityManagerFactory instance
        EntityManager em = getEntityManager();
        try {
            // Do something...
        } finally {
            // Close the EntityManager instance
            em.close();
        }
    }
}
This code defect and improper use of the JPA EntityManagerFactory was causing a leak, or accumulation, of metadata repository instances within the OpenJPA _listeners data structure, as demonstrated by the earlier JVM heap dump analysis.
The solution to the problem was to centralize the management & life cycle of the thread safe javax.persistence.EntityManagerFactory via the Singleton pattern. The final solution was implemented as per below, with a code sketch following the list:
- Create and maintain only one static instance of javax.persistence.EntityManagerFactory per application class loader, implemented via the Singleton pattern.
- Create and dispose of new instances of EntityManager for each application request.
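Here is a minimal sketch of such a Singleton-based approach, using only the standard JPA 1.0 API; the class name and persistence unit name are illustrative and not taken from the actual application code.

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

// Illustrative example only; not the actual application implementation
public final class JPAFactorySingleton {

    // Single EntityManagerFactory instance per application class loader
    private static final EntityManagerFactory EMF =
        Persistence.createEntityManagerFactory("PersistenceUnit"); // example unit name

    private JPAFactorySingleton() {
    }

    public static EntityManagerFactory getEntityManagerFactory() {
        return EMF;
    }

    // EntityManager instances remain short-lived: create one per request and close it when done
    public static EntityManager createEntityManager() {
        return EMF.createEntityManager();
    }
}

Each application request then obtains its own EntityManager from the shared factory and closes it in a finally block once the database work is complete, while the factory itself lives for the duration of the application.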
Following the implementation of the solution in our production environment, no more Java heap OldGen memory leaks have been observed.
Please feel free to provide your comments and share your experience on the same.
6 comments:
Thanks for the excellent overview. I've become a bit lazy in this field ever since we bought Plumbr (http://plumbr.eu).
Out of curiosity - how much time did it take you (MAT analysis and code walk through)?
Thanks John for your comments and very good question.
Just to give you and my other readers an idea of the effort...
The root cause analysis and code walk-through was performed by myself for my IT client:
- 1 week elapsed time for JVM heap dump generation prep work, risk mitigation, planning etc.
- 4 hours for Heap Dump generation process via jmap
- 2-3 hours Heap Dump analysis via MAT tool
- 1 day of JPA / OpenJPA specs + doc review + code walkthrough
- 2 days of code resolution (implemented by the delivery team)
You can expect a longer RCA process for individuals with minimal exposure to this type of troubleshooting.
I'm planning to release future articles comparing the MAT analysis route vs. tools like Plumbr via our crash test application.
More to come...
Regards,
P-H
Hello Pierre, do you know if this problem occurs on OpenJPA 1.2? I'm facing this problem in my environment (WAS 7 + OpenJPA 1.2).
Thanks.
Elvis Oliveira.
Hi Elvis,
This version is also exposed to a potential memory leak if the client implementation is not following the best practices. Did you confirm that you are leaking memory from the JPA internals?
Regards,
P-H
Pierre, we have analysed the JVM and, of the objects in memory, 80% are from OpenJPA; it's about 20,000 objects from the same class. So our best shot is OpenJPA. We will open a ticket with IBM for this.
P.S.: our version of OpenJPA is 1.2.3
Thanks again.
Thanks Elvis,
Looks like you did your due diligence. That being said, I would also recommend that you review your client implementation and ensure that it is following best practices, unless it is managed by some IBM framework/code internally.
Regardless, it is a good action item to share this problem with IBM support; including snapshots of your JVM heap dump analysis.
Thanks.
P-H