Most Java programmers are familiar with the Java thread deadlock concept. It essentially involves 2 threads waiting forever for each other. This condition is often the result of lock-ordering problems involving flat (synchronized) locks or ReentrantReadWriteLock (read or write) locks.
Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting to lock monitor 0x0237ada4 (object 0x272200e8, a java.lang.Object),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x0237aa64 (object 0x272200f0, a java.lang.Object),
  which is held by "pool-1-thread-2"
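A report like the one above can be provoked with a minimal sketch such as the following. It is not the program used in this article: the class, method and thread names are hypothetical, and it assumes Java 8+ lambda syntax. Two daemon threads take two monitors in opposite order, then the JVM detector is polled through the standard ThreadMXBean API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: classic lock-ordering deadlock on two object monitors
class ClassicDeadlockSketch {

    /**
     * Provokes the deadlock and polls the JVM detector.
     * Returns the number of deadlocked threads reported,
     * or 0 if nothing was reported before the timeout.
     */
    static int provokeAndDetect(long timeoutMillis) {
        final Object lockA = new Object();
        final Object lockB = new Object();
        // Released once both threads hold their first monitor
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, firstLocksHeld), "sketch-thread-1");
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, firstLocksHeld), "sketch-thread-2");
        // Daemon threads so that the JVM can still exit despite the deadlock
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds this monitor
            }
        }
    }
}
```

Calling ClassicDeadlockSketch.provokeAndDetect(5000) from a main method leaves both threads blocked on each other's monitor, which is the situation the thread dump report describes.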
The good
news is that the HotSpot JVM is always able to detect this condition for you…or
is it?
A recent
thread deadlock problem affecting an Oracle Service Bus production environment
has forced us to revisit this classic problem and identify the existence of “hidden”
deadlock situations.
This article will demonstrate and replicate, via a simple Java program, a very special lock-ordering deadlock condition which is not detected by the latest HotSpot JVM 1.7. You will also find a video at the end of the article explaining the Java sample program and the troubleshooting approach used.
The crime scene
I usually like to compare major Java concurrency problems to a crime scene where you play the lead investigator role. In this context, the “crime” is an actual production outage of your client's IT environment. Your job is to:
- Collect all the evidence, hints & facts (thread dump, logs, business impact, load figures…)
- Interrogate the witnesses & domain experts (support team, delivery team, vendor, client…)
The next step of your investigation is to analyze the collected information and establish a list of one or more potential “suspects” along with clear proof. Eventually, you want to narrow it down to a primary suspect or root cause. Obviously, the law “innocent until proven guilty” does not apply here; it is exactly the opposite.
Lack of evidence can prevent you from achieving the above goal. What you will see next is that the lack of deadlock detection by the HotSpot JVM does not necessarily prove that you are not dealing with this problem.
The suspect
In this troubleshooting context, the “suspect” is defined as the application or middleware code with the following problematic execution pattern.
- Usage of ReentrantReadWriteLock READ lock followed by the usage of FLAT lock (execution path #1)
- Usage of FLAT lock followed by the usage of ReentrantReadWriteLock WRITE lock (execution path #2)
- Concurrent execution performed by 2 Java threads but via a reversed execution order
The above lock-ordering deadlock criteria can be visualized as per below:
Now let’s replicate
this problem via our sample Java program and look at the JVM thread dump output.
Sample Java program
The above deadlock condition was first identified from our Oracle OSB problem case. We then re-created it via a simple Java program. You can download the entire source code of our program here.
The program simply creates and fires 2 worker threads. Each of them executes a different execution path and attempts to acquire locks on shared objects, but in a different order. We also created a deadlock detector thread for monitoring and logging purposes.
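The detector thread's source is not reproduced in this article, so the following is only a sketch of what such a thread could look like, assuming it relies on the standard ThreadMXBean.findDeadlockedThreads() API. The class name, the polling interval and the report format are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch of a polling deadlock detector thread
class DeadlockDetectorSketch implements Runnable {

    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    private final long intervalMillis;

    DeadlockDetectorSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Runs one detection pass and returns a readable report. */
    String checkOnce() {
        return describe(threads.findDeadlockedThreads());
    }

    /** Formats the thread ids returned by the JVM detector (null means no deadlock). */
    String describe(long[] ids) {
        if (ids == null || ids.length == 0) {
            return "No deadlock detected";
        }
        StringBuilder report = new StringBuilder("Deadlocked threads:");
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) { // null when the thread is no longer alive
                report.append(' ').append(info.getThreadName())
                      .append(" (waiting on ").append(info.getLockName()).append(')');
            }
        }
        return report.toString();
    }

    @Override
    public void run() {
        // Poll until interrupted, logging the result of each pass
        while (!Thread.currentThread().isInterrupted()) {
            System.out.println(checkOnce());
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Such a detector would typically be started as a daemon thread, for example new Thread(new DeadlockDetectorSketch(5000)), before firing the worker threads.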
For now,
find below the Java class implementing the 2 different execution paths.
package org.ph.javaee.training8;

import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * A simple thread task representation
 * @author Pierre-Hugues Charbonneau
 *
 */
public class Task {

    // Object used for FLAT lock
    private final Object sharedObject = new Object();

    // ReentrantReadWriteLock used for WRITE & READ locks
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Execution pattern #1
     */
    public void executeTask1() {

        // 1. Attempt to acquire a ReentrantReadWriteLock READ lock
        lock.readLock().lock();

        // Wait 2 seconds to simulate some work...
        try { Thread.sleep(2000); } catch (Throwable any) {}

        try {
            // 2. Attempt to acquire a Flat lock...
            synchronized (sharedObject) {}
        }
        // Remove the READ lock
        finally {
            lock.readLock().unlock();
        }

        System.out.println("executeTask1() :: Work Done!");
    }

    /**
     * Execution pattern #2
     */
    public void executeTask2() {

        // 1. Attempt to acquire a Flat lock
        synchronized (sharedObject) {

            // Wait 2 seconds to simulate some work...
            try { Thread.sleep(2000); } catch (Throwable any) {}

            // 2. Attempt to acquire a WRITE lock
            lock.writeLock().lock();

            try {
                // Do nothing
            }
            // Remove the WRITE lock
            finally {
                lock.writeLock().unlock();
            }
        }

        System.out.println("executeTask2() :: Work Done!");
    }

    public ReentrantReadWriteLock getReentrantReadWriteLock() {
        return lock;
    }
}
As soon as the deadlock situation was triggered, a JVM thread dump was generated using JVisualVM.
Root cause: ReentrantReadWriteLock READ lock behavior
The main explanation we found at this point is associated with the usage of the ReentrantReadWriteLock READ lock. Read locks are normally not designed to have a notion of ownership. Since there is no record of which thread holds a read lock, this appears to prevent the HotSpot JVM deadlock detector logic from detecting deadlocks involving read locks.
Some improvements have been implemented since then, but we can see that the JVM still cannot detect this special deadlock scenario.
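The behavior described above can be checked programmatically with a sketch along these lines. It is a condensed, hypothetical variant of the Task class (a latch replaces the 2-second sleeps so that the timing is deterministic), it assumes Java 8+ lambda syntax, and the class and thread names are made up for illustration. It waits until both threads are stuck, then asks the JVM detector whether it reports either of them.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: READ lock + FLAT lock vs FLAT lock + WRITE lock
class HiddenDeadlockSketch {

    /** Returns true if both threads are stuck and the JVM detector reports neither. */
    static boolean stuckButUndetected(long timeoutMillis) {
        final Object sharedObject = new Object();
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Execution pattern #1: READ lock, then FLAT lock
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                if (!arriveAndWait(firstLocksHeld)) return;
                synchronized (sharedObject) { }
            } finally {
                lock.readLock().unlock();
            }
        }, "sketch-reader");

        // Execution pattern #2: FLAT lock, then WRITE lock
        Thread writer = new Thread(() -> {
            synchronized (sharedObject) {
                if (!arriveAndWait(firstLocksHeld)) return;
                lock.writeLock().lock();
                lock.writeLock().unlock();
            }
        }, "sketch-writer");

        reader.setDaemon(true); // daemon threads: the JVM can still exit
        writer.setDaemon(true);
        reader.start();
        writer.start();

        // Wait until the reader is blocked on the monitor and the writer is parked on the lock
        long deadline = System.currentTimeMillis() + timeoutMillis;
        boolean bothStuck = false;
        while (!bothStuck && System.currentTimeMillis() < deadline) {
            bothStuck = reader.getState() == Thread.State.BLOCKED
                    && writer.getState() == Thread.State.WAITING
                    && lock.hasQueuedThreads();
            if (!bothStuck) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }

        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        boolean reported = contains(ids, reader.getId()) || contains(ids, writer.getId());
        return bothStuck && !reported;
    }

    private static boolean arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static boolean contains(long[] ids, long id) {
        if (ids == null) return false;
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }
}
```

The only structural difference from a detectable deadlock is the first lock taken by the reader thread, which is the point the rest of this section builds on.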
Now if we replace the read lock (execution pattern #1) in our program with a write lock, the JVM will finally detect the deadlock condition. But why?
Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting for ownable synchronizer 0x272239c0, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x025cad3c (object 0x272236d0, a java.lang.Object),
  which is held by "pool-1-thread-2"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-2":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x272239c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
        at org.ph.javaee.training8.Task.executeTask2(Task.java:54)
        - locked <0x272236d0> (a java.lang.Object)
        at org.ph.javaee.training8.WorkerThread2.run(WorkerThread2.java:29)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
"pool-1-thread-1":
        at org.ph.javaee.training8.Task.executeTask1(Task.java:31)
        - waiting to lock <0x272236d0> (a java.lang.Object)
        at org.ph.javaee.training8.WorkerThread1.run(WorkerThread1.java:29)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
This is because write locks are tracked by the JVM in a similar way to flat locks. This means the HotSpot JVM deadlock detector appears to be currently designed to detect:
- Deadlock on Object monitors involving FLAT locks
- Deadlock involving Locked ownable synchronizers associated with WRITE locks
The lack of per-thread read lock tracking appears to prevent deadlock detection for this scenario and significantly increases the troubleshooting complexity.
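The difference in tracking can be observed through the standard management API. The sketch below is an illustration only, with hypothetical class and method names, and it assumes a JVM that supports ownable synchronizer monitoring (as HotSpot does) and Java 8+ lambda syntax. A fresh thread acquires either the write lock or the read lock, then asks ThreadMXBean how many "locked ownable synchronizers" the JVM attributes to it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: what the JVM records for a WRITE lock vs a READ lock
class OwnershipTrackingSketch {

    /**
     * Acquires the chosen lock in a fresh thread and returns the number of
     * ownable synchronizers the JVM reports as locked by that thread
     * (-1 if no thread information could be obtained).
     */
    static int ownedSynchronizers(boolean useWriteLock) {
        final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        final Lock chosen = useWriteLock ? rw.writeLock() : rw.readLock();
        final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        final int[] owned = { -1 };

        Thread holder = new Thread(() -> {
            chosen.lock();
            try {
                long self = Thread.currentThread().getId();
                // lockedMonitors = false, lockedSynchronizers = true
                ThreadInfo info = threads.getThreadInfo(new long[] { self }, false, true)[0];
                if (info != null) {
                    owned[0] = info.getLockedSynchronizers().length;
                }
            } finally {
                chosen.unlock();
            }
        }, "sketch-holder");

        holder.start();
        try {
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return owned[0];
    }
}
```

Comparing ownedSynchronizers(true) with ownedSynchronizers(false) shows whether the JVM attributes the held lock to the thread, which is the information the deadlock detector needs to close the cycle.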
I suggest that you read Doug Lea’s comments on this whole issue, since concerns were raised back in 2005 regarding the possibility of adding per-thread read-hold tracking due to some potential lock overhead.
Find below my troubleshooting recommendations if you suspect a hidden deadlock condition involving read locks:
- Closely analyze the thread call stack traces; they may reveal some code potentially acquiring read locks and preventing other threads from acquiring write locks.
- If you are the owner of the code, keep track of the read lock count via the lock.getReadLockCount() method.
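As an illustration of the second recommendation, the sketch below logs a few counters exposed by ReentrantReadWriteLock. The helper class, its method names and the report format are hypothetical; the lock methods themselves (getReadLockCount(), isWriteLocked(), getQueueLength(), hasQueuedThreads()) are part of the standard API.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper: periodic logging of the state of a read/write lock
class ReadLockSnapshotSketch {

    /** Builds a one-line summary of the lock state, suitable for a log entry. */
    static String snapshot(ReentrantReadWriteLock lock) {
        return "readLocks=" + lock.getReadLockCount()
                + ", writeLocked=" + lock.isWriteLocked()
                + ", queuedThreads=" + lock.getQueueLength();
    }

    /**
     * True when at least one read lock is held while other threads are queued
     * on the lock. If this stays true across several snapshots, readers may be
     * preventing a writer from ever acquiring the lock.
     */
    static boolean readersBlockingWaiters(ReentrantReadWriteLock lock) {
        return lock.getReadLockCount() > 0 && lock.hasQueuedThreads();
    }
}
```

With the Task class above, the lock instance could be obtained via getReentrantReadWriteLock() and passed to snapshot() from a monitoring thread. Note that getQueueLength() is documented as an estimate, so the output is meant for monitoring rather than for synchronization decisions.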
I’m looking forward to your feedback, especially from individuals with experience with this type of deadlock involving read locks.
Finally, find below a video explaining such findings via the execution
and monitoring of our sample Java program.
"Now if we replace the read lock (execution pattern #2) in our program by a write lock" wasn't this supposed to be "[...] (execution pattern #1) [...]"?
Thanks anonymous for pointing that out, clear typo. I meant to replace the READ lock by a WRITE lock from execution pattern #1.
I just updated the article. Glad to see that you were following it closely.
Regards,
P-H