/ September 2013 ~ Java EE Support Patterns


Oracle Weblogic stuck thread detection

The following question will again test your knowledge of the Oracle Weblogic threading model. I’m looking forward for your comments and experience on the same.

If you are a Weblogic administrator, I’m certain that you heard of this common problem: stuck threads. This is one of the most common problems you will face when supporting a Weblogic production environment.

A Weblogic stuck thread simply means a thread performing the same request for a very long time and more than the configurable Stuck Thread Max Time.


How can you detect the presence of STUCK threads during and following a production incident?


* An Oracle Weblogic 12c Thread Monitoring YouTube video is now available.

As we saw from our last article “Weblogic Thread Monitoring Tips”, Weblogic provides functionalities allowing us to closely monitor its internal self-tuning thread pool. It will also highlight you the presence of any stuck thread.

This monitoring view is very useful when you do a live analysis but what about after a production incident? The good news is that Oracle Weblogic will also log any detected stuck thread to the server log. Such information includes details on the request and more importantly, the thread stack trace. This data is crucial and will allow you to potentially better understand the root cause of any slowdown condition that occurred at a certain time.

<Sep 25, 2013 7:23:02 AM EST> <Error> <WebLogicServer> <Server1> <App1>
<[ACTIVE] ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'>
<BEA-000337> <[STUCK] ExecuteThread: '35' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "608" seconds working on the request
"Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 608213 ms

POST /App1/jsp/test.jsp HTTP/1.1
Accept: application/x-ms-application...
Referer: http://..
Accept-Language: en-US
User-Agent: Mozilla/4.0 ..
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Content-Length: 539
Connection: Keep-Alive
Cache-Control: no-cache

]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
    <Application Execution Stack Trace>

Here is one more tip: the generation and analysis of a JVM thread dump will also highlight you stuck threads. As we can see from the snapshot below, the Weblogic thread state is now updated to STUCK, which means that this particular request is being executed since at least 600 seconds or 10 minutes.

This is very useful information since the native thread state will typically remain to RUNNABLE. The native thread state will only get updated when dealing with BLOCKED threads etc. You have to keep in mind that RUNNABLE simply means that this thread is healthy from a JVM perspective. However, it does not mean that it truly is from a middleware or Java EE container perspective. This is why Oracle Weblogic has its own internal ExecuteThread state.

Finally, if your organization or client is using any commercial monitoring tool, I recommend that you enable some alerting around both hogging thread and stuck thread. This will allow your support team to take some pro-active actions before the affected Weblogic managed server(s) become fully unresponsive.