Mysterious Hanging Java Process
I recently had a Java (Bamboo Dev Instance) app just hanging when it started up. Since it was a dev setup, which sometimes has other issues, I did the usual things, like:
Turn it on and off again, aka restart it.
Clean up the working directory, like a
mvn clean
,make clean
or what ever.Check out a stable branch.
Run against different sets of dependency versions
The app kept on being stuck. So, I was forced to debug it =).
First I took a few thread dumps
jstack -p {pid}
of the hanging process. Then I looked at the stack to see if I can see a thread which seems to being stuck. One thread stuck out, doing IO:
"localhost-startStop-1@15993" daemon prio=5 tid=0x1a nid=NA runnable java.lang.Thread.State: RUNNABLE at sun.nio.fs.UnixNativeDispatcher.stat0(UnixNativeDispatcher.java:-1) at sun.nio.fs.UnixNativeDispatcher.stat(UnixNativeDispatcher.java:291) at sun.nio.fs.UnixFileAttributes.get(UnixFileAttributes.java:70) at sun.nio.fs.UnixFileStore.devFor(UnixFileStore.java:55) at sun.nio.fs.UnixFileStore.<init>(UnixFileStore.java:70) at sun.nio.fs.LinuxFileStore.<init>(LinuxFileStore.java:48) at sun.nio.fs.LinuxFileSystem.getFileStore(LinuxFileSystem.java:128) at sun.nio.fs.UnixFileSystem$FileStoreIterator.readNext(UnixFileSystem.java:213) at sun.nio.fs.UnixFileSystem$FileStoreIterator.hasNext(UnixFileSystem.java:224) - locked <0x3e8a> (a sun.nio.fs.UnixFileSystem$FileStoreIterator) at org.apache.lucene.util.IOUtils.getFileStore(IOUtils.java:584) at org.apache.lucene.util.IOUtils.spinsLinux(IOUtils.java:528) at org.apache.lucene.util.IOUtils.spins(IOUtils.java:517) at org.apache.lucene.util.IOUtils.spins(IOUtils.java:492)
Ok, that looked like this thread iterates over files, indicated by the
hasNext
andreadNext
. It might iterate over a gigantic list of files?To confirm this theory, I added a break point to the
UnixNativeDispatcher.stat
method. However, the break point didn’t trigger. So, it seems stuck on thestat
syscall.Unfortunately, the thread dump doesn’t give you the variable values involved. So, I looked at the
UnixNativeDispatcher.stat
code and saw that the code takes a UnixPath as the main variable:
static void stat(UnixPath path, UnixFileAttributes attrs) throws UnixException { NativeBuffer buffer = copyToNativeBuffer(path); try { stat0(buffer.address(), attrs); } finally { buffer.release(); } }
So, I took a heap dump of the process. Since it is local, I used VisualVM.
Then I searched for the
UnixPath
instances. Luckily, there were only a few.There I saw immediately: A path to
~/storage/Music
:
Well, this immediately explained the issue. This path is a mounted path to data on a NAS, and the NAS was down. So, the syscall was hanging waiting for response of the offline NAS.
I unmounted the directories, restarted the process and it started as expected.