11.01.2012

Java deadlock troubleshooting and resolution

One of the great things about the annual JavaOne conferences is the number of technical and troubleshooting labs given by subject matter experts. One of these labs especially captured my attention this year: “HOL6500 - Finding And Solving Java Deadlocks”, presented by Java Champion Heinz Kabutz. This is one of the best presentations I have seen on this subject. I recommend that you download, run and study the labs yourself.

This article will revisit this classic thread problem and summarize the key troubleshooting and resolution techniques presented. I will also expand the subject based on my own multi-threading troubleshooting experience.

Java deadlock: what is it?

A true Java deadlock can essentially be described as a situation where two or more threads are blocked forever, waiting for each other. This situation is very different from other, more common “day-to-day” thread problem patterns such as lock contention, thread races or threads waiting on blocking IO calls. Such a lock-ordering deadlock situation can be visualized as per below:



In the above visual example, the attempt by Thread A & Thread B to acquire 2 locks in different orders is fatal. Once the threads reach the deadlocked state, they can never recover, forcing you to restart the affected JVM process.

Heinz also describes another type of deadlock: resource deadlock. This is by far the most common thread problem pattern I have seen in my experience with Java EE enterprise system troubleshooting. A resource deadlock is essentially a scenario where one or multiple threads are waiting to acquire a resource that will never become available, for example when a JDBC pool is depleted.

Lock-ordering deadlocks

You should know by now that I am a big fan of JVM thread dump analysis; it is a crucial skill to acquire for anyone involved in Java/Java EE development or production support. The good news is that Java-level deadlocks can be easily identified out-of-the-box in most JVM thread dump formats (HotSpot, IBM VM…) since these JVMs contain a native deadlock detection mechanism which will show you the threads involved in a true Java-level deadlock scenario along with their execution stack traces. A JVM thread dump can be captured via the tool of your choice such as JVisualVM or jstack, or natively via kill -3 <PID> on Unix-based OS. Find below the JVM Java-level deadlock detection section after running lab 1:


Now this is the easy part… The core of the root cause analysis effort is to understand why such threads are involved in a deadlock situation in the first place. Lock-ordering deadlocks could be triggered from your application code, but unless you are involved in high concurrency programming, chances are that the culprit code is a third-party API or framework that you are using, or the actual Java EE container itself, when applicable.
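
The same built-in detection can also be queried from code through the standard ThreadMXBean API. The sketch below is my own illustration and not part of Heinz's labs: the class name, the helper methods and the thread names are hypothetical. It deliberately deadlocks two daemon threads by taking two monitors in opposite orders, then polls findDeadlockedThreads() until the JVM reports them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetectionSketch {

    // Starts a daemon thread that takes 'first', waits until both threads
    // hold their first lock, then blocks forever trying to take 'second'.
    static Thread startLocker(String name, Object first, Object second,
                              CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached once the deadlock has formed
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not prevent the JVM from exiting
        t.start();
        return t;
    }

    // Polls the JVM's deadlock detector until it reports something
    // or the timeout expires (an empty array means nothing was found).
    static long[] waitForDeadlock(long timeoutMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return new long[0];
    }

    public static void main(String[] args) throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        startLocker("Thread-A", lockA, lockB, latch);
        startLocker("Thread-B", lockB, lockA, latch); // opposite order

        long[] ids = waitForDeadlock(5000);
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                System.out.println(info.getThreadName() + " waits for "
                        + info.getLockName() + " held by " + info.getLockOwnerName());
            }
        }
    }
}
```

This only tells you which threads are deadlocked and on which locks; understanding why they got there still requires reading the stack traces.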

Now let’s review below the lock-ordering deadlock resolution strategies presented by Heinz:

# Deadlock resolution by global ordering (see lab1 solution)

  • Essentially involves the definition of a global ordering for the locks that would always prevent deadlock (please see lab1 solution)
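
To make the idea concrete, here is a minimal sketch of global ordering. It is not the lab1 solution itself; the Account class and the use of a numeric id as the lock rank are assumptions I made for illustration. Whatever the direction of the transfer, both threads always lock the account with the lower id first, so the circular wait can never form.

```java
public class GlobalOrderingSketch {

    // Hypothetical domain object; 'id' is unique and doubles as the global lock rank.
    static final class Account {
        final int id;
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Always locks the lower-id account first, regardless of transfer direction.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);

        // Two threads transfer in opposite directions; without the ordering
        // rule in transfer() this is the classic lock-ordering deadlock setup.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        System.out.println(a.balance + " " + b.balance); // prints "1000 1000"
    }
}
```

The ordering key can be anything that is unique and stable for the lifetime of the lock; an id is simply the easiest one to show.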

# Deadlock resolution by TryLock (see lab2 solution)

  • Lock the first lock
  • Then try to lock the second lock
  • If you can lock it, you are good to go
  • If you cannot, wait and try again
The above strategy can be implemented using Java Lock & ReentrantLock, which also give you the flexibility to set up a wait timeout in order to prevent thread starvation in the event the first lock is held for too long.

public interface Lock {
    void lock();
    void lockInterruptibly() throws InterruptedException;
    boolean tryLock();
    boolean tryLock(long timeout, TimeUnit unit)
        throws InterruptedException;
    void unlock();
    Condition newCondition();
}
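
The steps listed earlier could be sketched as follows with ReentrantLock and the timed tryLock(). This is my own minimal illustration rather than the lab2 solution: the method name runWithBothLocks, the 10 ms timeout, the random back-off and the maxAttempts limit are all assumptions. Note how both locks are released in finally{} blocks before each retry.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockSketch {

    // Runs 'action' while holding both locks.
    // Returns false if the second lock could not be obtained within maxAttempts.
    static boolean runWithBothLocks(Lock first, Lock second, Runnable action,
                                    int maxAttempts) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            first.lock();                                        // lock the first lock
            try {
                if (second.tryLock(10, TimeUnit.MILLISECONDS)) { // try the second one
                    try {
                        action.run();                            // good to go
                        return true;
                    } finally {
                        second.unlock();
                    }
                }
            } finally {
                first.unlock(); // always give the first lock back before retrying
            }
            // Short random pause so two competing threads do not retry in lockstep.
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 10));
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        Lock lockA = new ReentrantLock();
        Lock lockB = new ReentrantLock();
        int[] counter = {0}; // only modified while both locks are held

        Runnable increment = () -> counter[0]++;
        Thread t1 = new Thread(() -> {
            try {
                for (int i = 0; i < 1000; i++) runWithBothLocks(lockA, lockB, increment, 1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread t2 = new Thread(() -> {
            try {
                // Opposite lock order on purpose
                for (int i = 0; i < 1000; i++) runWithBothLocks(lockB, lockA, increment, 1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("increments performed: " + counter[0]);
    }
}
```

The attempt limit is a design choice: the caller gets a false return value to handle instead of a thread that retries forever.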

If you look at the JBoss AS7 implementation, you will notice that Lock & ReentrantLock are widely used in core implementation layers such as:

  • Deployment service
  • EJB3 implementation (widely used)
  • Clustering and session management
  • Internal cache & data structures (LRU, ConcurrentReferenceHashMap…)
        
As per Heinz’s point, deadlock resolution strategy #2 can be quite efficient, but proper care is also required, such as releasing all held locks via a finally{} block before retrying; otherwise you can transform your deadlock scenario into a livelock.

Resource deadlocks

Now let’s move to resource deadlock scenarios. I’m glad that Heinz's lab #3 covered this since from my experience this is by far the most common “deadlock” scenario that you will see, especially if you are developing and supporting large distributed Java EE production systems.

Now let’s get the facts right.

  • Resource deadlocks are not true Java-level deadlocks
  • The JVM Thread Dump will not magically show you these types of deadlocks. This means more work for you to analyze and understand this problem as a starting point.
  • Thread dump analysis can be especially confusing when you are just starting to learn how to read a Thread Dump, since threads will often show up in the RUNNABLE state vs. the BLOCKED state seen in Java-level deadlocks. For now, it is important to keep in mind that thread state is not that important for this type of problem, e.g. RUNNABLE state != healthy state.
  • The analysis approach is very different from that of Java-level deadlocks. You must take multiple thread dump snapshots and identify thread problem/wait patterns between each snapshot. You will be able to see threads not moving, e.g. threads waiting to acquire a resource from a pool and other threads that already acquired such a resource and are hanging…
  • Thread Dump analysis is not the only data point/fact important here. You will need to collect other facts such as statistics on the resource(s) the threads are waiting for, overall middleware or environment health etc. The combination of all these facts will allow you to conclude on the root cause along with a resolution strategy, which may or may not involve a code change.
I will get back to you with more thread dump problem patterns but first please ensure that you are comfortable with the basic principles of JVM thread dump as a starting point.
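
To illustrate the first two facts, here is a small sketch that simulates a depleted pool. Everything in it is a hypothetical stand-in: a Semaphore with a single permit plays the role of a JDBC pool with one connection, and the class and helper names are mine. The waiting thread is stuck forever, yet it is not in the BLOCKED state and the JVM's deadlock detector does not report it.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ResourceDeadlockSketch {

    // Starts a daemon thread that waits for a permit ("connection") from the pool.
    static Thread startWaiter(Semaphore pool) {
        Thread t = new Thread(() -> {
            try {
                pool.acquire(); // waits until some other thread returns a permit
                pool.release();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "pool-waiter");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // True only if the JVM's built-in detector lists this thread as deadlocked.
    static boolean reportedAsDeadlocked(Thread t) {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        if (ids == null) {
            return false;
        }
        for (long id : ids) {
            if (id == t.getId()) {
                return true;
            }
        }
        return false;
    }

    // Polls until the thread reaches the given state or the timeout expires.
    static boolean awaitState(Thread t, Thread.State state, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (t.getState() == state) {
                return true;
            }
            Thread.sleep(20);
        }
        return t.getState() == state;
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore pool = new Semaphore(1); // "pool" with a single connection
        pool.acquire();                    // checked out and never returned

        Thread waiter = startWaiter(pool);
        awaitState(waiter, Thread.State.WAITING, 2000);

        System.out.println("waiter state: " + waiter.getState());
        System.out.println("reported as deadlocked: " + reportedAsDeadlocked(waiter));

        // A bounded wait turns the silent hang into a failure you can handle and log.
        boolean acquired = pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        System.out.println("bounded acquire succeeded: " + acquired);
    }
}
```

In a real system the stuck threads may show other states depending on what they wait on (a thread stuck on socket IO typically shows as RUNNABLE), which is exactly why comparing several snapshots matters more than the state itself.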

Conclusion

I hope you had the chance to review, run and enjoy the labs from Heinz's presentation as much as I did. Concurrency programming and troubleshooting can be quite challenging but I still recommend that you spend some time trying to understand some of these principles since I’m confident you will face a situation in the near future that will force you to perform this deep dive and acquire those skills.
