/ 2018 ~ Java EE Support Patterns

4.11.2018

Oracle WebLogic Native IO & Java Muxers

This article will provide the complete root cause analysis details and resolution of a Java performance problem affecting a legacy Oracle WebLogic 11g production environment and involving Socket Muxers.

This performance problem was identified while performing a workload migration and performance assessment of a WLS11g environment to RedHat OpenShift container and PaaS platform.

While “Muxers” is an old concept, this post will demonstrate the importance of adequate knowledge of native IO configuration and runtime behavior within a Java EE container such as Oracle WebLogic.

Environment Specifications

  • Workload location: On-Premises Data Center
  • Business domain: Telecommunications
  • NLB & Web server: F5 NLB & Apache
  • Java EE container: Oracle WebLogic 11g
  • JDK/JRE: Oracle HotSpot JVM 1.7 64-bit
  • OS: Solaris 11

APM & Troubleshooting Tools

  • Cisco AppDynamics
  • WebLogic 11g Admin console & logs
  • JVM Thread Dump analysis

References


Problem & Observations

The problem was first communicated by our production Ops team following recent performance degradation complaints by the end-users under peak load. An initial root cause analysis exercise did reveal the following facts and observations:

  • Response time spikes were observed on regular basis and especially under peak load.
  • An analysis of AppDynamics data did expose unexpected delay for inbound traffic via HTTPS.
  • Processing time of the application web requests (after body/payload received) was found to be optimal and < 1 sec.
  • An initial review of the WebLogic Threads and JVM Thread Dump did not expose any bottleneck or contention within the application code.
  • Network packet analysis did not expose any network latency but isolated the response time delay within the WebLogic server tier.

JVM Thread Dump analysis – second pass

Another analysis iteration was performed of the JVM Thread Dump data captured which did reveal the following findings:
 

As we can see from the above image, it was identified that “Java Muxers” threads were being used for the overall WebLogic Network I/O. In general, it is not recommended enabling the Java Muxers since they offer poor scalability and suboptimal performance vs. native Muxers or more recent NIO Muxers. Java Muxers block on “reads” until there is data to be read from a socket and does not scale well when dealing with a large influx of inbound web requests.

The following Thread stacktrace can be found from the thread dump when using NIO (Oracle WebLogic 12.2.x).




Following the above finding, a review of the WebLogic 11g configuration was performed but did not reveal any problem (native IO enabled). The next phase of the RCA was now to determine why Java Muxers were enabled by WebLogic on start-up.

Root Cause and Solution

The root cause was finally identified following a review of the WebLogic start-up logs.

<[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1248787500274> <BEA-000447> <Native IO Disabled. Using Java IO.>

As per above, it was found that Native IO was disabled on start-up due to a problem with the “Performance Pack”, which includes the Native Muxers, falling back on Java IO but still allowing the WebLogic server to start properly.

Furthermore, it was identified that the JVM 1.7 start-up parameters did not include the “-d64” which was confusing & preventing WebLogic from loading the proper 64-bit Performance Pack library, thus disabling Native IO and falling back on the Java Muxers.

Following the implementation of the solution (restoration of the Native Muxers) to the production environment, we could observe a significant improvement of the application performance and improved scalability.