The
following question will again test your knowledge of the Oracle Weblogic
threading model. I’m looking forward for your comments and experience on the
same.
If you are
a Weblogic administrator, I’m certain that you heard of this common problem: stuck
threads. This is one of the most common problems you will face when
supporting a Weblogic production environment.
A Weblogic
stuck thread simply means a thread performing the same request for a
very long time and more than the configurable Stuck
Thread Max Time.
Question:
How can
you detect the presence of STUCK threads during and following a production
incident?
Answer:
As we saw
from our last article “Weblogic
Thread Monitoring Tips”, Weblogic provides functionalities allowing us to
closely monitor its internal self-tuning thread pool. It will also highlight
you the presence of any stuck thread.
This
monitoring view is very useful when you do a live analysis but what about after
a production incident? The good news is that Oracle Weblogic will also log any detected
stuck thread to the server log. Such information includes details on the
request and more importantly, the thread stack trace. This data is crucial and
will allow you to potentially better understand the root cause of any slowdown
condition that occurred at a certain time.
<Sep
25, 2013 7:23:02 AM EST> <Error> <WebLogicServer>
<Server1> <App1>
<[ACTIVE]
ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'>
<BEA-000337>
<[STUCK] ExecuteThread: '35' for queue: 'weblogic.kernel.Default
(self-tuning)' has been busy for "608" seconds working on the request
"Workmanager:
default, Version: 0, Scheduled=true, Started=true, Started time: 608213 ms
POST
/App1/jsp/test.jsp HTTP/1.1
Accept:
application/x-ms-application...
Referer:
http://..
Accept-Language:
en-US
User-Agent:
Mozilla/4.0 ..
Content-Type:
application/x-www-form-urlencoded
Accept-Encoding:
gzip, deflate
Content-Length:
539
Connection:
Keep-Alive
Cache-Control:
no-cache
Cookie:
JSESSIONID=
]",
which is more than the configured time (StuckThreadMaxTime) of "600"
seconds. Stack trace:
<Application Execution Stack Trace>
...................................
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:301)
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:184)
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction....
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run()
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2281)
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2180)
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1491)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
weblogic.work.ExecuteThread.run(ExecuteThread.java:221)
Here is
one more tip: the generation and analysis of a JVM thread dump will also
highlight you stuck threads. As we can see from the snapshot below, the
Weblogic thread state is now updated to STUCK, which means that this particular
request is being executed since at least 600 seconds or 10 minutes.
This is
very useful information since the native thread state will typically remain to
RUNNABLE. The native thread state will only get updated when dealing with
BLOCKED threads etc. You have to keep in mind that RUNNABLE simply means that
this thread is healthy from a JVM perspective. However, it does not mean that it truly is from a middleware or Java EE container perspective. This is why
Oracle Weblogic has its own internal ExecuteThread state.
Finally,
if your organization or client is using any commercial monitoring tool, I recommend
that you enable some alerting around both hogging thread and stuck thread. This
will allow your support team to take some pro-active actions before the
affected Weblogic managed server(s) become fully unresponsive.
Very good article. Very useful for me
ReplyDeleteDear PH,
ReplyDeleteWould be very helpful if you could share any article related to STUCK threads in JRockit version R28 JVM
Please share a sample problem and we could open a discussion thread on JRockit R28 JVM/Threads issues.
ReplyDeleteThanks.
P-H
<[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "600" seconds working on the request "weblogic.jms.frontend.FEConnectionFactoryImpl", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
ReplyDeleteweblogic.messaging.dispatcher.DispatcherManager.dispatcherFind(DispatcherManager.java:145)
weblogic.messaging.dispatcher.DispatcherManager.addDispatcherReference(DispatcherManager.java:107)
weblogic.jms.dispatcher.JMSDispatcherManager.addDispatcherReference(JMSDispatcherManager.java:187)
weblogic.jms.frontend.FEConnectionFactory.connectionCreateInternal(FEConnectionFactory.java:368)
weblogic.jms.frontend.FEConnectionFactoryImpl.connectionCreateInternal(FEConnectionFactoryImpl.java:79)
weblogic.jms.frontend.FEConnectionFactoryImpl.connectionCreateRequest(FEConnectionFactoryImpl.java:62)
weblogic.jms.frontend.FEConnectionFactoryImpl_WLSkel.invoke(Unknown Source)
weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:667)
weblogic.rmi.cluster.ClusterableServerRef.invoke(ClusterableServerRef.java:230)
weblogic.rmi.internal.BasicServerRef$1.run(BasicServerRef.java:522)
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
weblogic.security.service.SecurityManager.runAs(SecurityManager.java:146)
weblogic.rmi.internal.BasicServerRef.handleRequest(BasicServerRef.java:518)
weblogic.rmi.internal.wls.WLSExecuteRequest.run(WLSExecuteRequest.java:118)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
>
Hi P-H,
ReplyDeletewe faced stuck thread issue on our production env and upon debugging we could see most of them are similar to below .
Thread 2 locked 0x00002aaad8ca1df8 and thread 3 is waiting to aquire lock on the same. Can you please explain what are these threads in weblogic 10.3.2 .
*********
"ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aab58a85000 nid=0x6171 waiting for monitor entry [0x0000000041b5b000]
java.lang.Thread.State: BLOCKED (on object monitor)
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:93)
- waiting to lock <0x00002aaad8ca1df8> (a weblogic.socket.PosixSocketMuxer$1)
at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00002aab588f5800 nid=0x6170 runnable [0x0000000041a5a000]
java.lang.Thread.State: RUNNABLE
at weblogic.socket.PosixSocketMuxer.poll(Native Method)
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:102)
- locked <0x00002aaad8ca1df8> (a weblogic.socket.PosixSocketMuxer$1)
at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)
********************************
Hi Arun,
ReplyDeleteThese BLOCKED threads are normal and due to SocketMuxer. Did you see any other threads in STUCK state?
Thanks.
P-H
hi P-H,
ReplyDeletethank you. i can see some of the below threads are in STUCK state (0x0000000049fac800), which every time (every thread dump) it locked with different ID.
And also ACTIVE thread id (0x00002aab90d9a000), which every time (every thread dump) it locked with different ID. PFB
*********************
"[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x0000000049fac800 nid=0x61a2 runnable [0x0000000042bbf000]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.(String.java:215)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
- locked <0x00002aab08039768> (a java.lang.StringBuffer)
at java.util.regex.Matcher.replaceAll(Matcher.java:817)
at java.lang.String.replaceAll(String.java:2189)
****************************************
"[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x0000000049fac800 nid=0x61a2 runnable [0x0000000042bbf000]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuffer.append(StringBuffer.java:224)
- locked <0x00002aab08154920> (a java.lang.StringBuffer)
at java.util.regex.Matcher.appendTail(Matcher.java:769)
at java.util.regex.Matcher.replaceAll(Matcher.java:816)
at java.lang.String.replaceAll(String.java:2189)
**********************************
"[ACTIVE] ExecuteThread: '49' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aab90d9a000 nid=0x7ef5 runnable [0x0000000052922000]
java.lang.Thread.State: RUNNABLE
at oracle.jdbc.driver.T2CStatement.t2cParseExecuteDescribe(Native Method)
at oracle.jdbc.driver.T2CStatement.executeForDescribe(T2CStatement.java:695)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1145)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1469)
- locked <0x00002aaae97f3a28> (a oracle.jdbc.driver.T2CConnection)
at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:389)
at weblogic.jdbc.wrapper.Statement.executeQuery(Statement.java:479)
***********************************
"[ACTIVE] ExecuteThread: '49' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00002aab90d9a000 nid=0x7ef5 runnable [0x0000000052921000]
java.lang.Thread.State: RUNNABLE
at oracle.jdbc.driver.T2CConnection.t2cCommit(Native Method)
at oracle.jdbc.driver.T2CConnection.doCommit(T2CConnection.java:536)
at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:3674)
- locked <0x00002aaaf5f9f3f0> (a oracle.jdbc.driver.T2CConnection)
at oracle.jdbc.driver.PhysicalConnection.commit(PhysicalConnection.java:3680)
at weblogic.jdbc.wrapper.JTSConnection.internalCommit(JTSConnection.java:448)
at weblogic.jdbc.wrapper.JTSXAResourceImpl.commit(JTSXAResourceImpl.java:52)
********************************
Hi Arun,
ReplyDeleteThe STUCK thread stack trace is exposing the problem. It appears that you are dealing with either infinite processing of String replacement or very large payload. This problem will not only trigger STUCK thread but consume excessive CPU from your production servers.
Please review the entire stack trace and determine what code is responsible to execute the String.replaceAll() operation.
Thx
P-H
Hi P-H,
ReplyDeleteCan you please look into below stuck thread exception and tell me how can I deal with it:
<[STUCK] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "645" seconds working on the request "Workmanager: weblogic.kernel.Default, Version: 0, Scheduled=false, Started=true, Started time: 645635 ms
", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
java.lang.Object.wait(Native Method)
weblogic.rjvm.ResponseImpl.waitForData(ResponseImpl.java:90)
weblogic.rjvm.ResponseImpl.getTxContext(ResponseImpl.java:130)
weblogic.rjvm.BasicOutboundRequest.sendReceive(BasicOutboundRequest.java:110)
weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:345)
weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:259)
Hi Vijay,
ReplyDeleteThis STUCK thread indicates a RMI call is not responding. Could you please share more piece of the stack trace? you can simply remove/rename any reference to the application names/package.
Thanks.
P-H
PH, great article as usual. What is the difference between hogging thread and stuck thread? When is hogging thread a problem and what can be done to prevent hogging threads?
ReplyDeleteOnce we have stuck thread, is there a way to avoid impact or is recycle the only option to release all stuck therads?
Thanks in advance.
What does this stuck thread say?
ReplyDelete"[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" id=21 idx=0x58 tid=5932 prio=1 alive, native_blocked, daemon
at com/bharosa/uio/util/UIOSessionData.instance(UIOSessionData.java:197)[optimized]
at jsp_servlet/__shiloginpage._jspService(__shiloginpage.java:89)
at weblogic/servlet/jsp/JspBase.service(JspBase.java:34)
at weblogic/servlet/internal/StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
at weblogic/servlet/internal/StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
at weblogic/servlet/internal/ServletStubImpl.execute(ServletStubImpl.java:301)
at weblogic/servlet/internal/TailFilter.doFilter(TailFilter.java:26)
at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle/security/jps/ee/http/JpsAbsFilter$1.run(JpsAbsFilter.java:119)
at jrockit/vm/AccessController.doPrivileged(AccessController.java:254)[optimized]
at oracle/security/jps/util/JpsSubject.doAsPrivileged(JpsSubject.java:315)
at oracle/security/jps/ee/util/JpsPlatformUtil.runJaasMode(JpsPlatformUtil.java:442)
at oracle/security/jps/ee/http/JpsAbsFilter.runJaasMode(JpsAbsFilter.java:103)
at oracle/security/jps/ee/http/JpsAbsFilter.doFilter(JpsAbsFilter.java:171)
at oracle/security/jps/ee/http/JpsFilter.doFilter(JpsFilter.java:71)
at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle/security/wls/filter/SSOSessionSynchronizationFilter.doFilter(SSOSessionSynchronizationFilter.java:321)
at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle/security/am/agent/wls/filters/OAMServletAuthenticationFilter.doFilter(OAMServletAuthenticationFilter.java:265)
at oracle/security/am/agent/wls/filters/OAMValidationSystemFilter.doFilter(OAMValidationSystemFilter.java:133)
at oracle/security/wls/oamagent/OAMAgentWrapperFilter.doFilter(OAMAgentWrapperFilter.java:120)
at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle/dms/servlet/DMSServletFilter.doFilter(DMSServletFilter.java:139)
at weblogic/servlet/internal/FilterChainImpl.doFilter(FilterChainImpl.java:60)
at weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3748)
at weblogic/servlet/internal/WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3714)
at weblogic/security/acl/internal/AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)[inlined]
at weblogic/security/service/SecurityManager.runAs(SecurityManager.java:120)[optimized]
at weblogic/servlet/internal/WebAppServletContext.securedExecute(WebAppServletContext.java:2283)
at weblogic/servlet/internal/WebAppServletContext.execute(WebAppServletContext.java:2182)
at weblogic/servlet/internal/ServletRequestImpl.run(ServletRequestImpl.java:1491)
at weblogic/work/ExecuteThread.execute(ExecuteThread.java:263)[optimized]
at weblogic/work/ExecuteThread.run(ExecuteThread.java:221)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace