Pages

9.25.2013

Oracle Weblogic stuck thread detection

The following question will again test your knowledge of the Oracle Weblogic threading model. I’m looking forward for your comments and experience on the same.

If you are a Weblogic administrator, I’m certain that you heard of this common problem: stuck threads. This is one of the most common problems you will face when supporting a Weblogic production environment.

A Weblogic stuck thread simply means a thread performing the same request for a very long time and more than the configurable Stuck Thread Max Time.





Question:

How can you detect the presence of STUCK threads during and following a production incident?

Answer:

* An Oracle Weblogic 12c Thread Monitoring YouTube video is now available.

As we saw from our last article “Weblogic Thread Monitoring Tips”, Weblogic provides functionalities allowing us to closely monitor its internal self-tuning thread pool. It will also highlight you the presence of any stuck thread.





This monitoring view is very useful when you do a live analysis but what about after a production incident? The good news is that Oracle Weblogic will also log any detected stuck thread to the server log. Such information includes details on the request and more importantly, the thread stack trace. This data is crucial and will allow you to potentially better understand the root cause of any slowdown condition that occurred at a certain time.

<Sep 25, 2013 7:23:02 AM EST> <Error> <WebLogicServer> <Server1> <App1>
<[ACTIVE] ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'>
<BEA-000337> <[STUCK] ExecuteThread: '35' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "608" seconds working on the request
"Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 608213 ms

POST /App1/jsp/test.jsp HTTP/1.1
Accept: application/x-ms-application...
Referer: http://..
Accept-Language: en-US
User-Agent: Mozilla/4.0 ..
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Content-Length: 539
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=

]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
    <Application Execution Stack Trace>
    ...................................
               javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
               javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
               weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
               weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
               weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:301)
               weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:184)
               weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction....
               weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run()
               weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
               weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
               weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2281)
               weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2180)
               weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1491)
               weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
               weblogic.work.ExecuteThread.run(ExecuteThread.java:221)


Here is one more tip: the generation and analysis of a JVM thread dump will also highlight you stuck threads. As we can see from the snapshot below, the Weblogic thread state is now updated to STUCK, which means that this particular request is being executed since at least 600 seconds or 10 minutes.




This is very useful information since the native thread state will typically remain to RUNNABLE. The native thread state will only get updated when dealing with BLOCKED threads etc. You have to keep in mind that RUNNABLE simply means that this thread is healthy from a JVM perspective. However, it does not mean that it truly is from a middleware or Java EE container perspective. This is why Oracle Weblogic has its own internal ExecuteThread state.

Finally, if your organization or client is using any commercial monitoring tool, I recommend that you enable some alerting around both hogging thread and stuck thread. This will allow your support team to take some pro-active actions before the affected Weblogic managed server(s) become fully unresponsive.

12 comments:

  1. Very good article. Very useful for me

    ReplyDelete
  2. Dear PH,

    Would be very helpful if you could share any article related to STUCK threads in JRockit version R28 JVM

    ReplyDelete
  3. Please share a sample problem and we could open a discussion thread on JRockit R28 JVM/Threads issues.

    Thanks.
    P-H

    ReplyDelete
  4. <[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "600" seconds working on the request "weblogic.jms.frontend.FEConnectionFactoryImpl", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:

    weblogic.messaging.dispatcher.DispatcherManager.dispatcherFind(DispatcherManager.java:145)

    weblogic.messaging.dispatcher.DispatcherManager.addDispatcherReference(DispatcherManager.java:107)

    weblogic.jms.dispatcher.JMSDispatcherManager.addDispatcherReference(JMSDispatcherManager.java:187)

    weblogic.jms.frontend.FEConnectionFactory.connectionCreateInternal(FEConnectionFactory.java:368)

    weblogic.jms.frontend.FEConnectionFactoryImpl.connectionCreateInternal(FEConnectionFactoryImpl.java:79)

    weblogic.jms.frontend.FEConnectionFactoryImpl.connectionCreateRequest(FEConnectionFactoryImpl.java:62)

    weblogic.jms.frontend.FEConnectionFactoryImpl_WLSkel.invoke(Unknown Source)

    weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:667)

    weblogic.rmi.cluster.ClusterableServerRef.invoke(ClusterableServerRef.java:230)

    weblogic.rmi.internal.BasicServerRef$1.run(BasicServerRef.java:522)

    weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)

    weblogic.security.service.SecurityManager.runAs(SecurityManager.java:146)

    weblogic.rmi.internal.BasicServerRef.handleRequest(BasicServerRef.java:518)

    weblogic.rmi.internal.wls.WLSExecuteRequest.run(WLSExecuteRequest.java:118)

    weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)

    weblogic.work.ExecuteThread.run(ExecuteThread.java:178)

    >

    ReplyDelete
  5. Hi P-H,
    we faced stuck thread issue on our production env and upon debugging we could see most of them are similar to below .

    Thread 2 locked 0x00002aaad8ca1df8 and thread 3 is waiting to aquire lock on the same. Can you please explain what are these threads in weblogic 10.3.2 .

    *********
    "ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aab58a85000 nid=0x6171 waiting for monitor entry [0x0000000041b5b000]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:93)
    - waiting to lock <0x00002aaad8ca1df8> (a weblogic.socket.PosixSocketMuxer$1)
    at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)

    "ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aab588f5800 nid=0x6170 runnable [0x0000000041a5a000]
    java.lang.Thread.State: RUNNABLE
    at weblogic.socket.PosixSocketMuxer.poll(Native Method)
    at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:102)
    - locked <0x00002aaad8ca1df8> (a weblogic.socket.PosixSocketMuxer$1)
    at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
    at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
    at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
    at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)

    ********************************

    ReplyDelete
  6. Hi Arun,

    These BLOCKED threads are normal and due to SocketMuxer. Did you see any other threads in STUCK state?

    Thanks.
    P-H

    ReplyDelete
  7. hi P-H,
    thank you. i can see some of the below threads are in STUCK state (0x0000000049fac800), which every time (every thread dump) it locked with different ID.
    And also ACTIVE thread id (0x00002aab90d9a000), which every time (every thread dump) it locked with different ID. PFB



    *********************
    "[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x0000000049fac800 nid=0x61a2 runnable [0x0000000042bbf000]
    java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.(String.java:215)
    at java.lang.StringBuffer.toString(StringBuffer.java:585)
    - locked <0x00002aab08039768> (a java.lang.StringBuffer)
    at java.util.regex.Matcher.replaceAll(Matcher.java:817)
    at java.lang.String.replaceAll(String.java:2189)
    ****************************************

    "[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x0000000049fac800 nid=0x61a2 runnable [0x0000000042bbf000]
    java.lang.Thread.State: RUNNABLE
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    - locked <0x00002aab08154920> (a java.lang.StringBuffer)
    at java.util.regex.Matcher.appendTail(Matcher.java:769)
    at java.util.regex.Matcher.replaceAll(Matcher.java:816)
    at java.lang.String.replaceAll(String.java:2189)
    **********************************
    "[ACTIVE] ExecuteThread: '49' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aab90d9a000 nid=0x7ef5 runnable [0x0000000052922000]
    java.lang.Thread.State: RUNNABLE
    at oracle.jdbc.driver.T2CStatement.t2cParseExecuteDescribe(Native Method)
    at oracle.jdbc.driver.T2CStatement.executeForDescribe(T2CStatement.java:695)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1145)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
    at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1469)
    - locked <0x00002aaae97f3a28> (a oracle.jdbc.driver.T2CConnection)
    at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:389)
    at weblogic.jdbc.wrapper.Statement.executeQuery(Statement.java:479)

    ***********************************
    "[ACTIVE] ExecuteThread: '49' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aab90d9a000 nid=0x7ef5 runnable [0x0000000052921000]
    java.lang.Thread.State: RUNNABLE
    at oracle.jdbc.driver.T2CConnection.t2cCommit(Native Method)
    at oracle.jdbc.driver.T2CConnection.doCommit(T2CConnection.java:536)
    at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:3674)
    - locked <0x00002aaaf5f9f3f0> (a oracle.jdbc.driver.T2CConnection)
    at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:3680)
    at weblogic.jdbc.wrapper.JTSConnection.internalCommit(JTSConnection.java:448)
    at weblogic.jdbc.wrapper.JTSXAResourceImpl.commit(JTSXAResourceImpl.java:52)

    ********************************

    ReplyDelete
  8. Hi Arun,

    The STUCK thread stack trace is exposing the problem. It appears that you are dealing with either infinite processing of String replacement or very large payload. This problem will not only trigger STUCK thread but consume excessive CPU from your production servers.

    Please review the entire stack trace and determine what code is responsible to execute the String.replaceAll() operation.

    Thx
    P-H

    ReplyDelete
  9. Hi P-H,

    Can you please look into below stuck thread exception and tell me how can I deal with it:

    <[STUCK] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "645" seconds working on the request "Workmanager: weblogic.kernel.Default, Version: 0, Scheduled=false, Started=true, Started time: 645635 ms
    ", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
    java.lang.Object.wait(Native Method)
    weblogic.rjvm.ResponseImpl.waitForData(ResponseImpl.java:90)
    weblogic.rjvm.ResponseImpl.getTxContext(ResponseImpl.java:130)
    weblogic.rjvm.BasicOutboundRequest.sendReceive(BasicOutboundRequest.java:110)
    weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:345)
    weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:259)

    ReplyDelete
  10. Hi Vijay,

    This STUCK thread indicates a RMI call is not responding. Could you please share more piece of the stack trace? you can simply remove/rename any reference to the application names/package.

    Thanks.
    P-H

    ReplyDelete
  11. PH, great article as usual. What is the difference between hogging thread and stuck thread? When is hogging thread a problem and what can be done to prevent hogging threads?
    Once we have stuck thread, is there a way to avoid impact or is recycle the only option to release all stuck therads?
    Thanks in advance.

    ReplyDelete
  12. What does this stuck thread say?

    "[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" id=21 idx=0x58 tid=5932 prio=1 alive, native_blocked, daemon
    at com/bharosa/uio/util/UIOSessionData.instance(UIOSessionData.java:197)[optimized]
    at jsp_servlet/__shiloginpage._jspService(__shiloginpage.java:89)
    at weblogic/servlet/jsp/JspBase.service(JspBase.java:34)
    at weblogic/servlet/internal/StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
    at weblogic/servlet/internal/StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
    at weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:301)
    at weblogic/servlet/internal/TailFilter.doFilter(TailFilter.java:26)
    at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
    at oracle/security/jps/ee/http/JpsAbsFilter$1.run(JpsAbsFilter.java:119)
    at jrockit/vm/AccessController.doPrivileged(AccessController.java:254)[optimized]
    at oracle/security/jps/util/JpsSubject.doAsPrivileged(JpsSubject.java:315)
    at oracle/security/jps/ee/util/JpsPlatformUtil.runJaasMode(JpsPlatformUtil.java:442)
    at oracle/security/jps/ee/http/JpsAbsFilter.runJaasMode(JpsAbsFilter.java:103)
    at oracle/security/jps/ee/http/JpsAbsFilter.doFilter(JpsAbsFilter.java:171)
    at oracle/security/jps/ee/http/JpsFilter.doFilter(JpsFilter.java:71)
    at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
    at oracle/security/wls/filter/SSOSessionSynchronizationFilter.doFilter(SSOSessionSynchronizationFilter.java:321)
    at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
    at oracle/security/am/agent/wls/filters/OAMServletAuthenticationFilter.doFilter(OAMServletAuthenticationFilter.java:265)
    at oracle/security/am/agent/wls/filters/OAMValidationSystemFilter.doFilter(OAMValidationSystemFilter.java:133)
    at oracle/security/wls/oamagent/OAMAgentWrapperFilter.doFilter(OAMAgentWrapperFilter.java:120)
    at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
    at oracle/dms/servlet/DMSServletFilter.doFilter(DMSServletFilter.java:139)
    at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
    at weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3748)
    at weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3714)
    at weblogic/security/acl/internal/AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)[inlined]
    at weblogic/security/service/SecurityManager.runAs(SecurityManager.java:120)[optimized]
    at weblogic/servlet/internal/WebAppServletContext.securedExecute(WebAppServletContext.java:2283)
    at weblogic/servlet/internal/WebAppServletContext.execute(WebAppServletContext.java:2182)
    at weblogic/servlet/internal/ServletRequestImpl.run(ServletRequestImpl.java:1491)
    at weblogic/work/ExecuteThread.execute(ExecuteThread.java:263)[optimized]
    at weblogic/work/ExecuteThread.run(ExecuteThread.java:221)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

    ReplyDelete