This article provides the complete root cause analysis and resolution of a Java performance problem involving Socket Muxers in a legacy Oracle WebLogic 11g production environment.
The problem was identified while performing a workload migration and performance assessment of a WebLogic 11g environment to the Red Hat OpenShift container and PaaS platform.
While “Muxers” are an old concept, this post demonstrates the importance of adequate knowledge of native I/O configuration and runtime behavior within a Java EE container such as Oracle WebLogic.
Environment Specifications
- Workload location: On-premises data center
- Business domain: Telecommunications
- NLB & Web server: F5 NLB & Apache
- Java EE container: Oracle WebLogic 11g
- JDK/JRE: Oracle HotSpot JVM 1.7 64-bit
- OS: Solaris 11
APM & Troubleshooting Tools
- Cisco AppDynamics
- WebLogic 11g Admin console & logs
- JVM Thread Dump analysis
Problem & Observations
The problem was first communicated by our production Ops team following recent performance degradation complaints from end-users under peak load. An initial root cause analysis exercise revealed the following facts and observations:
- Response time spikes were observed on a regular basis, especially under peak load.
- An analysis of AppDynamics data exposed unexpected delays for inbound traffic via HTTPS.
- Processing time of the application web requests (after the body/payload was received) was found to be optimal, at under 1 second.
- An initial review of the WebLogic threads and JVM thread dump did not expose any bottleneck or contention within the application code.
- Network packet analysis did not expose any network latency, but isolated the response time delay within the WebLogic server tier.
JVM Thread Dump analysis – second pass
Another analysis iteration was performed on the captured JVM thread dump data, which revealed the following findings:
As we can see from the above image, it was identified that “Java Muxer” threads were being used for all of the WebLogic network I/O. In general, enabling the Java Muxers is not recommended, since they offer poor scalability and suboptimal performance compared with the native Muxers or the more recent NIO Muxers. Java Muxers block on “reads” until there is data to be read from a socket, and they do not scale well when dealing with a large influx of inbound web requests.
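The contrast is easiest to see with plain Java NIO, the multiplexing model that the native and NIO Muxers are built on: a single selector thread services many sockets and only reads when data is actually ready, instead of parking one thread in a blocking read per socket. The sketch below is a minimal, self-contained illustration (the class name `MuxerSketch` and the localhost setup are ours, not WebLogic code):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class MuxerSketch {

    // One selector thread services all registered sockets, reading only when
    // data is ready -- the multiplexing idea behind the native/NIO Muxers.
    // A Java-Muxer-style design would instead tie up a thread in a blocking
    // read() per socket until data arrives.
    static String demo() throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("localhost", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Simulate one inbound client that sends a small request payload.
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("localhost", server.socket().getLocalPort()));
        client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

        String received = null;
        while (received == null) {
            selector.select(); // blocks until ANY registered socket is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    if (((SocketChannel) key.channel()).read(buf) > 0) {
                        buf.flip();
                        received = StandardCharsets.UTF_8.decode(buf).toString();
                    }
                }
            }
            selector.selectedKeys().clear();
        }
        client.close();
        server.close();
        selector.close();
        return received;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("received=" + demo());
    }
}
```

With many slow HTTPS clients, a selector-based design keeps the muxer thread count small and constant, whereas blocking Java Muxers force a thread to wait on each socket, which is exactly the scalability ceiling described above.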
For comparison, the following thread stacktrace can be found in a thread dump when using the NIO Muxers (Oracle WebLogic 12.2.x).
Following the above finding, a review of the WebLogic 11g configuration was performed but did not reveal any problem (native I/O was enabled in the configuration). The next phase of the RCA was to determine why the Java Muxers were nonetheless being used by WebLogic on start-up.
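For reference, the setting reviewed at this step is the server's NativeIOEnabled attribute. In config.xml it appears as shown in the sketch below (the server name is hypothetical, and since the attribute defaults to true the element is often simply absent):

```xml
<server>
  <name>ManagedServer1</name>
  <!-- Enables the native (performance pack) muxer; defaults to true -->
  <native-io-enabled>true</native-io-enabled>
</server>
```

The same attribute can be inspected from the Admin console under the server's Tuning configuration, so a value of true here, combined with Java Muxer threads in the thread dump, is what pointed the investigation toward the start-up sequence.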
Root Cause and Solution
The root cause was finally identified following a review of the WebLogic start-up logs.
<[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1248787500274> <BEA-000447> <Native IO Disabled. Using Java IO.>
As per the above, it was found that native I/O was disabled on start-up due to a problem with the “Performance Pack”, which includes the native Muxers; WebLogic fell back on Java I/O but was still able to start properly.
Furthermore, it was identified that the JVM 1.7 start-up parameters did not include the “-d64” flag, which prevented WebLogic from loading the proper 64-bit Performance Pack library, thus disabling native I/O and falling back on the Java Muxers.
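The fix amounts to selecting the 64-bit HotSpot VM and making sure the 64-bit performance pack library is resolvable. A sketch of the relevant domain start-up script fragment is shown below (variable names follow the usual WebLogic domain layout; the exact native library directory depends on the installation and should be verified against your WL_HOME):

```shell
# setDomainEnv.sh (fragment, illustrative)
# Select the 64-bit HotSpot VM on Solaris; without -d64 the 32-bit JVM
# cannot load the 64-bit performance pack (libmuxer.so), and WebLogic
# silently falls back on the Java Muxers.
JAVA_OPTIONS="-d64 ${JAVA_OPTIONS}"
export JAVA_OPTIONS

# Ensure the 64-bit native muxer library is on the library path
# (directory name assumed here; confirm it under ${WL_HOME}/server/native).
LD_LIBRARY_PATH="${WL_HOME}/server/native/solaris/sparc64:${LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

After this change, the start-up log should no longer report BEA-000447 (“Native IO Disabled”), and the thread dump should show native muxer threads instead of the Java Muxers.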
Following the implementation of the solution (restoration of the native Muxers) in the production environment, we observed a significant improvement in application performance and scalability.