This case study describes the complete root cause analysis, tuning and solution steps implemented to overcome a java.lang.OutOfMemoryError problem in a Red Hat JBoss production environment.
Environment specifications
· Java EE server: Red Hat JBoss [EAP] 5.0.0.GA
· Hardware: Sun-Fire-15000
· OS: SunOS 5.10, sparc
· JDK: Java 1.6.0_17, Sun Microsystems Inc.
· JVM settings: -Xms1024m, -Xmx1024m
· RDBMS: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
· Platform type: Middle tier - Web Service provider
Monitoring and troubleshooting tools
- JVM Thread Dump (Sun JDK 1.6 format)
- JConsole (from Sun 1.6 JDK)
- Eclipse Memory Analyser 0.6.0.2 (via IBM Support Assistant 4.1)
- Apache JMeter (load testing tool)
Problem overview
· Problem type: Java Heap memory problem
java.lang.OutOfMemoryError was observed in all JBoss production server logs, causing timeouts for our upstream applications (the client applications invoking our JBoss Web Services).
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at sun.security.util.ManifestDigester.<init>(ManifestDigester.java:117)
at java.util.jar.JarVerifier.processEntry(JarVerifier.java:250)
at java.util.jar.JarVerifier.update(JarVerifier.java:188)
at java.util.jar.JarFile.initializeVerifier(JarFile.java:321)
at java.util.jar.JarFile.getInputStream(JarFile.java:386)
at org.jboss.virtual.plugins.context.zip.ZipFileWrapper.openStream(ZipFileWrapper.java:215)
at org.jboss.virtual.plugins.context.zip.ZipEntryContext.openStream(ZipEntryContext.java:1084)
at org.jboss.virtual.plugins.context.zip.ZipEntryHandler.openStream(ZipEntryHandler.java:154)
at org.jboss.virtual.VirtualFile.openStream(VirtualFile.java:241)
Gathering and validation of facts
As usual, a Java EE problem investigation requires gathering technical and non-technical facts so we can either derive other facts and/or conclude on the root cause. Before applying any corrective measure, the facts below were verified in order to conclude on the root cause:
· Recent change of the platform? Yes, a deployment was performed a few days before the incident which increased the load on the affected JBoss environment by 50%
· java.lang.OutOfMemoryError: GC overhead limit exceeded description? This error is thrown when the JVM spends an excessive proportion of its time in garbage collection while reclaiming very little Heap space; it normally indicates that Java Heap utilization is too high and is causing too many full garbage collection iterations (see the JVM options noted after this list)
· Any upstream and/or downstream system problem reported? Yes, a problem with the primary downstream system was reported prior to the JBoss incident, and HTTP timeouts were observed within the JBoss logs.
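As a side note, the following standard HotSpot 1.6 JVM options can help confirm this pattern in future incidents by exposing garbage collection activity and capturing a Heap Dump automatically on failure; they were not part of the original production settings and the HeapDumpPath location below is illustrative only:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/jboss_heapdump.hprof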
Several Thread Dump snapshots were captured during the OutOfMemoryError condition and were analysed. Additional findings and facts were derived from thread dump analysis:
· Total JBoss Thread Pool current count was: 200
· Thread pool free count was: 0
· ~50 of the threads were in an IO socket.read() blocking state, waiting for a response from the primary downstream system (another Web Service provider), as per the sample stack trace below
"http-0.0.0.0-8443-102" daemon prio=3 tid=0x022a6400 nid=0x1bd runnable [0x78efb000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:789)
- locked <0xdd0ed968> (a java.lang.Object)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:746)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
- locked <0xdd0eda88> (a com.sun.net.ssl.internal.ssl.AppInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked <0xddb1f6d0> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
- locked <0xdd0e5698> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
at org.jboss.remoting.transport.http.HTTPClientInvoker.getResponseCode(HTTPClientInvoker.java:1269)
at org.jboss.remoting.transport.http.HTTPClientInvoker.useHttpURLConnection(HTTPClientInvoker.java:351)
at org.jboss.remoting.transport.http.HTTPClientInvoker.makeInvocation(HTTPClientInvoker.java:232)
at org.jboss.remoting.transport.http.HTTPClientInvoker.transport(HTTPClientInvoker.java:162)
at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:167)
at org.jboss.remoting.Client.invoke(Client.java:1917)
at org.jboss.remoting.Client.invoke(Client.java:768)
at org.jboss.ws.core.client.HTTPRemotingConnection.invoke(HTTPRemotingConnection.java:232)
at org.jboss.ws.core.client.SOAPProtocolConnectionHTTP.invoke(SOAPProtocolConnectionHTTP.java:71)
at org.jboss.ws.core.CommonClient.invoke(CommonClient.java:339)
at org.jboss.ws.core.jaxws.client.ClientImpl.invoke(ClientImpl.java:231)
at org.jboss.ws.core.jaxws.client.ClientProxy.invoke(ClientProxy.java:162)
at org.jboss.ws.core.jaxws.client.ClientProxy.invoke(ClientProxy.java:148)
…………………………………………………………………….
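These socket reads can block indefinitely because, at the time of the incident, no read timeout was configured on the HTTP connections to the downstream provider. The minimal JDK-level sketch below shows how such reads are normally bounded; the URL and timeout values are illustrative only, and the actual fix was applied at the JBossWS client layer as described in the Solution and tuning section:
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutExample {
    public static void main(String[] args) throws Exception {
        // Illustrative downstream endpoint only
        URL url = new URL("https://downstream-provider.example.com/ws/endpoint");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(10000); // give up on the TCP/SSL connect after 10 seconds
        conn.setReadTimeout(30000);    // unblock a hung socket.read() (as seen above) after 30 seconds
        System.out.println("HTTP response code: " + conn.getResponseCode());
    }
}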
· ~50 of the threads were in a BLOCKED state, waiting to get a free bean from the JBoss EJB3 Stateless pool.
"http-0.0.0.0-8443-103" daemon prio=3 tid=0x06cc3400 nid=0x1be waiting for monitor entry [0x7a1fd000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.jboss.ejb3.pool.ThreadlocalPool.get(ThreadlocalPool.java:84)
- waiting to lock <0x9f84d370> (a org.jboss.ejb3.InfinitePool)
at org.jboss.ejb3.stateless.StatelessInstanceInterceptor.invoke(StatelessInstanceInterceptor.java:58)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.ejb3.tx.NullInterceptor.invoke(NullInterceptor.java:42)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.aspects.tx.TxPropagationInterceptor.invoke(TxPropagationInterceptor.java:76)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.ejb3.tx.NullInterceptor.invoke(NullInterceptor.java:42)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.ejb3.security.RoleBasedAuthorizationInterceptorv2.invoke(RoleBasedAuthorizationInterceptorv2.java:201)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.ejb3.security.Ejb3AuthenticationInterceptorv2.invoke(Ejb3AuthenticationInterceptorv2.java:186)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.ejb3.ENCPropagationInterceptor.invoke(ENCPropagationInterceptor.java:41)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.ejb3.BlockContainerShutdownInterceptor.invoke(BlockContainerShutdownInterceptor.java:67)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.aspects.currentinvocation.CurrentInvocationInterceptor.invoke(CurrentInvocationInterceptor.java:67)
at org.jboss.aop.joinpoint.MethodInvocation.invokeNext(MethodInvocation.java:102)
at org.jboss.ejb3.stateless.StatelessContainer.localInvoke(StatelessContainer.java:306)
at org.jboss.ejb3.stateless.StatelessContainer.invokeEndpoint(StatelessContainer.java:662)
at org.jboss.wsf.container.jboss50.invocation.InvocationHandlerEJB3.invoke(InvocationHandlerEJB3.java:96)
at org.jboss.ws.core.server.ServiceEndpointInvoker.invoke(ServiceEndpointInvoker.java:221)
at org.jboss.wsf.stack.jbws.RequestHandlerImpl.processRequest(RequestHandlerImpl.java:468)
at org.jboss.wsf.stack.jbws.RequestHandlerImpl.handleRequest(RequestHandlerImpl.java:293)
at org.jboss.wsf.stack.jbws.RequestHandlerImpl.doPost(RequestHandlerImpl.java:203)
at org.jboss.wsf.stack.jbws.RequestHandlerImpl.handleHttpRequest(RequestHandlerImpl.java:129)
at org.jboss.wsf.common.servlet.AbstractEndpointServlet.service(AbstractEndpointServlet.java:85)
· The other threads did not show any significant or interesting pattern
Thread Dump analysis, facts and conclusions
Thread Dump analysis confirmed the additional facts below:
· The problem with our primary downstream system was confirmed, given the high number of Threads waiting on blocking IO / socket.read() for a very long time
· The maximum pool size of the JBoss EJB3 Stateless beans backing our Web Services was reached, potentially explaining why Java Heap utilization was so high at that time, ultimately failing with an OutOfMemoryError condition
· Conclusion #1: The instability and slowdown of our main downstream system was the trigger of the incident / OutOfMemoryError condition
· Conclusion #2: The incident revealed a possible lack of HTTP timeout between the JBoss application and the affected downstream system (Web Service provider)
· Conclusion #3: The incident revealed a possible memory footprint problem in our JBoss application
Problem replication & performance testing
Such a problematic scenario (downstream system slowdown) was easily replicated in our performance test environment using JMeter, simulating and reproducing almost exactly what happened in production.
Additional Thread Dump snapshots were captured and compared with the production environment.
JVM Heap Dump files were also captured using JConsole and analysed using the Eclipse Memory Analyser.
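For reference, Heap Dumps are typically triggered from JConsole via the com.sun.management:type=HotSpotDiagnostic MBean (MBeans tab, dumpHeap operation) on the Sun 1.6 JVM. The same operation can be invoked programmatically; the sketch below is a minimal local example and the output path is illustrative only:
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean diagBean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
        // Write an HPROF binary Heap Dump of live objects only
        diagBean.dumpHeap("/tmp/jboss_heapdump.hprof", true);
    }
}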
Heap dump analysis
The Heap Dump was analysed using the Eclipse Memory Analyser in order to better understand the high memory footprint under heavy load and during the negative scenario.
The analysis revealed:
· A high number of retained Web Service EJB3 Stateless beans and JBoss ClientEndPointMetaData instances
· A memory footprint of 5 MB+ for each ClientEndPointMetaData instance
· The Heap Dump findings correlated with the Thread Dump analysis and the facts gathered above
Root cause
The problem replication and further analysis of the Heap dump and Thread dump captured during performance testing revealed:
· Our application's Web Service footprint for a single call is quite high and can take up to 5.5-6 MB of Java Heap
· Lack of a Web Service HTTP timeout between our JBoss Web Service and one of our downstream systems
· Lack of JBoss EJB3 pool tuning (the EJB pool was allowed to grow to a size our system could not handle given the application footprint and the current JVM memory settings)
Given the above problems, any slowdown of our primary downstream system (non happy path) leads to a rapid Thread surge (Threads waiting for the downstream system response). The combination of the missing HTTP timeout and the lack of EJB pool tuning/cap caused a sharp increase in Java Heap memory, since each request can take up to 5 MB+ of space on the Java Heap, leading to its rapid depletion under heavy load.
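As a rough back-of-the-envelope check using the figures above (illustrative only, not a measurement taken during the incident): 200 Threads (maximum JBoss Thread pool) x ~5.5 MB per in-flight Web Service call ≈ 1,100 MB, which already exceeds the configured 1,024 MB Java Heap (-Xmx1024m) before accounting for the baseline application footprint.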
Solution and tuning
Three areas were addressed during the performance testing and tuning exercise:
· Revisit the Web Service / application memory footprint and identify opportunities to reduce it
· Implement an HTTP timeout of 30 seconds between JBoss and the downstream system Web Services, preventing JBoss Threads from remaining in a hung state for too long and consuming too much Java Heap memory
· Implement a wait timeout and a cap of 30 beans on the EJB3 pool. Such tuning allowed for a more predictable memory footprint during negative scenarios / non happy paths
· Tuning #2 was achieved programmatically via StubExt.PROPERTY_CLIENT_TIMEOUT (see the first sketch after this list)
· Tuning #3 was achieved programmatically and later externalized via the following EJB3 annotation: @Pool(value=PoolDefaults.POOL_IMPLEMENTATION_STRICTMAX, maxSize=30, timeout=10000) (see the second sketch after this list)
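For illustration, a minimal client-side sketch of tuning #2, assuming the JBossWS native client stack bundled with JBoss EAP 5; the DownstreamService and DownstreamPort names are hypothetical placeholders for the generated JAX-WS artifacts of the downstream provider:
import javax.xml.ws.BindingProvider;
import org.jboss.ws.core.StubExt;

public class DownstreamClientFactory {
    public static DownstreamPort createPortWithTimeout() {
        // Hypothetical JAX-WS proxy for the downstream Web Service provider
        DownstreamPort port = new DownstreamService().getDownstreamPort();
        // JBossWS native client timeout, in milliseconds (the value is passed as a String)
        ((BindingProvider) port).getRequestContext().put(StubExt.PROPERTY_CLIENT_TIMEOUT, "30000");
        return port;
    }
}
And a sketch of tuning #3, applied on the Stateless session bean exposing the Web Service endpoint; the bean and interface names are hypothetical and the exact annotation packages may differ between JBoss EJB3 releases:
import javax.ejb.Stateless;
import javax.jws.WebService;
import org.jboss.ejb3.annotation.Pool;
import org.jboss.ejb3.annotation.defaults.PoolDefaults;

@Stateless
@WebService
// Cap the EJB3 pool at 30 bean instances with a 10 second acquisition timeout (StrictMaxPool)
@Pool(value = PoolDefaults.POOL_IMPLEMENTATION_STRICTMAX, maxSize = 30, timeout = 10000)
public class OrderProcessingBean implements OrderProcessing {
    // business methods exposed as Web Service operations
}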
Conclusion
This tuning has also been deployed to our production environment for a few weeks now and is so far showing a very healthy Java Heap footprint, with no re-occurrence of the problem.
The performance testing done before and after tuning items #2 and #3 revealed a great improvement of the JBoss environment and an overall reduction of the Java Heap memory footprint during negative testing scenarios.
The application memory footprint will also be revisited in future releases.