This short article describes the most common problem patterns behind Java Threads hanging at java.net.SocketInputStream.socketRead0.
For more detail and troubleshooting approaches for this type of problem, please visit my original post on this subject.
Problem overview
java.net.SocketInputStream.socketRead0() – why is it hanging?
java.net.SocketInputStream.socketRead0() – what is the solution?
Final recommendation – timeout implementation is critical!
Problem overview
Any communication protocol such as HTTP / HTTPS, JDBC, RMI etc. ultimately relies on the JDK java.net layer to perform the lower-level TCP-IP / Socket operations. A java.net.Socket must be created in order for your application & JVM to connect to an external source (Web Service, Oracle database etc.) and to send and receive data.
SocketInputStream.socketRead0 is the actual native blocking IO operation executed by the JDK in order to read and receive the data from the remote source. This is why Thread hanging problems on this operation are so common in the Java EE world.
java.net.SocketInputStream.socketRead0() – why is it hanging?
There are a few common scenarios which can lead your application and Java EE server Threads to hang for some time or even forever at java.net.SocketInputStream.socketRead0.
# Problem pattern #1
Slowdown or instability of a remote service provider invoked by your application such as:
- A Web Service provider (via HTTP/HTTPS)
- A RDBMS (Oracle) database
- A RMI server etc.
- Other remote service providers (FTP, pure TCP-IP etc.)
This is by far the most common problem pattern (~90%+ of cases). In the Thread Dump, the affected Threads appear stuck at socketRead0 while waiting for the response from the unstable remote service provider, e.g. a remote Web Service.
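A Thread in this state can be recognised by the SocketInputStream.socketRead0 frame near the top of its stack. The following is a minimal sketch, not a replacement for a proper Thread Dump analysis: it scans the stack traces of the current JVM for that frame. The class and method names (SocketReadScanner, isInSocketRead, threadsInSocketRead) are hypothetical, and newer JDKs may show a different frame for a blocking socket read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: find Threads whose current stack contains SocketInputStream.socketRead0. */
final class SocketReadScanner {

    // Frame named in this article; assumption: the JDK in use still reports it.
    private static final String CLASS_NAME = "java.net.SocketInputStream";
    private static final String METHOD_NAME = "socketRead0";

    /** Returns true if any frame of the given stack is SocketInputStream.socketRead0. */
    static boolean isInSocketRead(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (CLASS_NAME.equals(frame.getClassName())
                    && METHOD_NAME.equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    /** Returns the names of the Threads in the snapshot that are inside socketRead0. */
    static List<String> threadsInSocketRead(Map<Thread, StackTraceElement[]> stacks) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : stacks.entrySet()) {
            if (isInSocketRead(entry.getValue())) {
                names.add(entry.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Snapshot of all live Threads in this JVM.
        System.out.println(threadsInSocketRead(Thread.getAllStackTraces()));
    }
}
```

A single snapshot only shows that a Thread is reading from a socket at that instant. A Thread that appears in several snapshots taken a few seconds apart, with the same stack, is a better candidate for a hang.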
# Problem pattern #2
Functional problem causing long running transaction(s) from your remote service provider
This is quite similar to problem pattern #1. The difference is that the remote service provider is healthy but is taking more time to process certain requests from your application because of faulty functional behaviour.
A good example is a long running Oracle database SQL query (lack of indexes, execution plan issue etc.), which shows up in the Thread Dump as a Thread waiting at socketRead0 underneath the Oracle JDBC driver frames.
# Problem pattern #3
Intermittent or consistent network slowness or latency.
Severe network latency will cause the data transfer between the client and server to slow down. The Socket write() and read() operations then take more time to complete, and the Thread hangs at SocketInputStream.socketRead0 until the bytes are received from the network.
java.net.SocketInputStream.socketRead0() – what is the solution?
# Problem pattern #1
The solution for this problem pattern is to contact the support team of the affected remote service provider and share your observations from your application, Threads etc. so they can investigate and resolve their global system problem.
# Problem pattern #2
The solution for this problem pattern will depend on the technology involved. A root cause analysis must be performed in order to identify and fix the culprit (missing table indexes, too much data returned from the Web Service etc.).
# Problem pattern #3
The solution for this problem pattern will require the engagement of your network support team so they can conduct a “sniffing” activity of the TCP-IP packets between your application server(s) and your affected remote service provider(s).
You should also attempt to replicate the problem using OS commands such as ping and traceroute to provide some guidance to your network support team.
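Alongside ping and traceroute, it can help to time the TCP connection from the application server itself, since that follows the same path as your application traffic. The sketch below is only an illustration under stated assumptions: the class name ConnectLatencyProbe, the default host "localhost", port 80 and the 3000 ms timeout are all placeholders to be replaced with your remote service provider's details.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Sketch: measure how long it takes to open a TCP connection to a remote endpoint. */
final class ConnectLatencyProbe {

    /**
     * Returns the time in milliseconds taken to establish the connection.
     * Throws an IOException if the connection fails or the timeout expires.
     */
    static long connectMillis(String host, int port, int timeoutMillis) throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder target; pass the real host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 80;
        for (int i = 1; i <= 5; i++) {
            System.out.println("connect #" + i + ": " + connectMillis(host, port, 3000) + " ms");
        }
    }
}
```

Connect times that vary widely between runs, or that are much higher than from another network segment, are worth sharing with your network support team along with the ping and traceroute output.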
Final recommendation – timeout implementation is critical!
Proper timeouts should be implemented, when possible, in order to prevent a domino effect situation on your Java EE application server. The timeout value should be low enough to prevent your Threads from hanging for too long, and it should be tested properly (negative testing).
Socket timeouts (connect, read & write operations) for Web Services via HTTP / HTTPS are quite easy to implement and can be achieved by following the documentation of the API you are using (JAX-WS, Weblogic, WAS, JBoss WS, Apache AXIS etc.).
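At the lowest level, these settings come down to a connect timeout and a read timeout on the underlying java.net.Socket. The following is a minimal sketch at the plain Socket level, not the configuration of any particular Web Service stack. The class name SocketTimeoutSketch, the method readFirstByte and the timeout values are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Sketch: bound both the connect and the read so a Thread cannot wait forever. */
final class SocketTimeoutSketch {

    /**
     * Connects and reads one byte, giving up after the given timeouts.
     * Returns the byte read, or -1 on end of stream. Throws SocketTimeoutException
     * if the connect or the read exceeds its timeout.
     */
    static int readFirstByte(String host, int port, int connectTimeoutMillis, int readTimeoutMillis)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Connect timeout: limits how long the connection attempt may take.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            // Read timeout: limits how long each blocking read may wait for data.
            socket.setSoTimeout(readTimeoutMillis);
            InputStream in = socket.getInputStream();
            return in.read();
        }
    }

    public static void main(String[] args) {
        try {
            // Placeholder endpoint and timeouts, for illustration only.
            System.out.println(readFirstByte("localhost", 8080, 2000, 5000));
        } catch (SocketTimeoutException e) {
            System.out.println("Gave up waiting: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("IO failure: " + e.getMessage());
        }
    }
}
```

Without the setSoTimeout call, the read would block for as long as the remote side stays silent. For plain HTTP / HTTPS calls through java.net.HttpURLConnection, the equivalent settings are setConnectTimeout and setReadTimeout.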
12 comments:
Hi P-H,
Thanks for this article. If we are getting timeouts for Web Service / Oracle etc. calls and, because of this, Threads are in a stuck state, what would be the best option to fix the issue?
1. Need to check with downstream (if the downstream systems are fine, what would be the next step?)
2. Need to recycle to release the hung Threads (if we recycle, there is a chance of losing the transactions which are already in and processing).
Is there any way to release hung Threads without recycling the server?
Thanks,
Sujit
Thanks Sujit,
Stuck Threads cannot be released manually as there is no Java specification allowing such an operation. Thread.stop() etc. are deprecated Java operations due to their risk of data corruption etc.
The best solution is problem prevention, such as timeout implementation etc.
Thanks.
P-H
Hi P-H,
Thank you for this article.
Can problem pattern #1 be caused by problem pattern #3?
Thanks
KRR
Hi KRR,
Yes, intermittent or sudden network slowness will also generate similar symptoms as per problem pattern #1 e.g. you will see thread stuck in either socket.write() and / or socket.read().
Thanks.
P-H
Hi P-H,
Could you please provide the suggestion on fixing the below exception.
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at org.apache.axis.transport.http.HTTPSender.readHeadersFromSocket(HTTPSender.java:583)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSe...
Thanks,
KRR
Hi KRR,
This exception is an actual IO timeout triggered from the Apache Axis/JDK SSL layer. As per this article, read timed out means that your application (client) stopped waiting for the response from the remote side. This is a typical Exception to get when your remote system is taking too much time to reply.
Please do and share the following:
1) Identify from the above stacktrace which remote Web Service system your application is trying to invoke
2) Engage the support team of this remote system provider and investigate any high response time problem pattern
3) Revisit the timeout value configured from your Apache Axis client application, verify if the value is set too low vs. average response time of your remote WS provider
Thanks.
P-H
Hi P-H,
Could you please take a look at below stack trace and suggest where the threads are hung:
There is/are 8 thread(s) in total in the server that may be hung.
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:155)
at java.io.FilterInputStream.read(FilterInputStream.java:134)
at com.wily.introscope.agent.probe.net.ManagedSocketInputStream.read(ManagedSocketInputStream.java:214)
at java.io.FilterInputStream.read(FilterInputStream.java:134)
at com.siebel.analytics.web.sawconnect.SAWConnection$NotifyInputStream.read(SAWConnection.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
at java.io.BufferedInputStream.read(BufferedInputStream.java:246)
at com.siebel.analytics.web.sawconnect.sawprotocol.SAWProtocol.readInt(SAWProtocol.java:162)
at com.siebel.analytics.web.sawconnect.sawprotocol.SAWProtocolInputStreamImpl.readChunkHeader(SAWProtocolInputStreamImpl.java:232)
at com.siebel.analytics.web.sawconnect.sawprotocol.SAWProtocolInputStreamImpl.startReadingNewMessage(SAWProtocolInputStreamImpl.java:46)
at com.siebel.analytics.web.sawconnect.SAWServletHttpBinding.forwardResponse(SAWServletHttpBinding.java:105)
at com.siebel.analytics.web.SAWBridge.processRequest(SAWBridge.java:307)
at com.siebel.analytics.web.SAWBridge.doGet(SAWBridge.java:325)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:743)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1213)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:658)
at com.ibm.ws.wswebcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:526)
at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:90)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:764)
at com.ibm.ws.wswebcontainer.WebContainer.handleRequest(WebContainer.java:1478)
at com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:133)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:457)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:515)
at com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:300)
at com.ibm.ws.http.channel.inbound.impl.HttpICLReadCallback.complete(HttpICLReadCallback.java:102)
at com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
at com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
at com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:136)
at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:196)
at com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:751)
at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:881)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1551)
Can you provide a suggestion for the stack trace below:
"Receiver-146" daemon prio=10 tid=0x00007fb3fc010000 nid=0x7642 runnable [0x00007fb5906c5000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0x00000007688f1ff0> (a java.io.BufferedInputStream)
at org.smpp.TCPIPConnection.receive(TCPIPConnection.java:413)
at org.smpp.ReceiverBase.receivePDUFromConnection(ReceiverBase.java:197)
at org.smpp.Receiver.receiveAsync(Receiver.java:351)
at org.smpp.ReceiverBase.process(ReceiverBase.java:96)
at org.smpp.util.ProcessingThread.run(ProcessingThread.java:199)
at java.lang.Thread.run(Thread.java:722)
- SP
Hi P-H,
Thank you for this helpful article. Is it possible for the hanging to happen due to out of memory issues on the server that our application is running on? I know normally we should get out of memory exceptions in the application, but are there cases that no exceptions are thrown and the thread just hangs?
- E
Hi - E,
Yes, it is indeed possible. This can happen if the response payload is very large, thus triggering excessive garbage collection.
In order to prove this, look at the history of your Java Heap / GC logs when Thread hangs are observed and determine if this could be a source of your problem.
Thanks.
P-H
Hi P-H:
I have run into an issue like problem pattern #2.
Will this stuck Thread cause the whole server to hang? My application is a J2EE one using Weblogic and an Oracle database.
Are there any other reasons which may cause the server to hang?
The stack trace is below:
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at oracle.net.ns.Packet.receive(Packet.java:300)
at oracle.net.ns.DataPacket.receive(DataPacket.java:106)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:315)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:260)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:185)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:102)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:124)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:80)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1137)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:290)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:192)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:884)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1167)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1289)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3593)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3637)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1495)
at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:130)
...
Thanks
Hi Felix,
This problem will not cause your full server to hang unless you have too many Threads stuck in this condition.
From the stacktrace, it is clear that you are getting high DB response time and/or large data fetch. Were you able to identify the problem via Oracle AWR report?
Thanks.