Troubleshooting deadlock in an Apache opensource library

Apache PDFBox is a popular open-source library that facilitates java applications to work with PDF documents. Recently we encountered a Deadlock that surfaced in this library. In this post we have shared how we troubleshooted and identified the root cause of the problem.

What is Deadlock?

First let’s try to understand what ‘Deadlock’ means. Several technical definitions aren’t clear. ‘Deadlock’ definition is one among them :-). Deadlock’s definition goes like this: “Deadlock is a situation where a set of processes are blocked because each process is holding a resource and waiting for another resource acquired by some other process.”

How to troubleshoot Deadlock in an Apache opensource library

It’s always easier to learn something new through examples and pictures. Let’s look at the below practical example, which may help you to understand Deadlock better.

Fig1: Trains starting in the same track

Fig2: Trains experiencing Deadlock

Let’s say there is only one train track, and this train track has six parts(part-1, part-2, part-3, part-4, part-5, part-6). Train-A starts at part-1 and Train-B starts at Part-6 on the same train track at the same time. Both trains travel at the same speed. Under this circumstance, Train-A and Train-B will reach a Deadlock state when they reach part-3 and part-4 of the train track. Because when Train-A is in part-3 of the train track, it will be stuck waiting for part-4 of the track, which Train-B holds. On the other hand, when Train-B is in part-4, it will be stuck waiting for part-3, which Train-A holds. Thus, both the trains can’t move forward. This is a classic Deadlock situation. Similarly, once a Deadlock happens in the application, it cannot be recovered. The only way to recover from Deadlock is to restart the application. To learn more about Deadlock basics & troubleshooting, you may refer to this blog post.

Troubleshooting Deadlock

Now let’s discuss about the Deadlock problem we faced in the application. From the above explanation you can understand that Deadlock is caused due to threads. Thus, to analyze Deadlock, you need to capture thread dump from the application. Thread dump is basically a snapshot of all threads that are running in your application. It contains information such as: stack trace, thread state, thread priority, … You can capture thread dump using one of the approaches given here.

Note: Most of the time, you will not know whether the actual problem in your application is deadlock or not. What you will notice is unresponsiveness from the application. Thus, it’s safe to capture all the prominent artifacts that are essential for troubleshooting such as: Garbage Collection log, thread dump, heap dump, netstat, iostat,… you may use yCrash open source script, which would capture 360-degree data (GC log, 3 snapshots of thread dump, heap dump, netstat, iostat, vmstat, top, top -H,…) from your application stack within a minute and generate a bundle zip file.

We uploaded the captured thread dump to fastThread – a thread dump analysis tool. Tool immediately pointed out that the two threads caused the deadlock. Below is the excerpt from the fastThread report.

Fig: Deadlock pointed by fastThread

Tool pointed out the stack trace of the two threads that were in deadlock. Below is the stack trace of those two threads:

"APP_Thread_50_WorkerTask_pool-5-thread-6" #1899 prio=5 os_prio=0 tid=0x00007f894c405555 nid=0x44d1 waiting for monitor entry [0x00007f88e9c44000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.fontbox.ttf.TrueTypeFont.getTable(TrueTypeFont.java:147)
	- waiting to lock <0x00000002d216fec8> (a org.apache.fontbox.ttf.TrueTypeFont)
	at org.apache.fontbox.ttf.TrueTypeFont.getHorizontalMetrics(TrueTypeFont.java:229)
	at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:210)
	at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:191)
	- locked <0x00000002d218ca28> (a org.apache.fontbox.ttf.RAFDataStream)
	at org.apache.fontbox.ttf.TrueTypeFont.getPath(TrueTypeFont.java:676)
	at org.apache.pdfbox.pdmodel.font.PDType1Font.getPath(PDType1Font.java:638)
	at org.apache.pdfbox.rendering.Type1Glyph2D.getPathForCharacterCode(Type1Glyph2D.java:83)
	at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:495)
	at org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:476)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:787)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:805)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:743)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:606)
	at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:56)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
	at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:347)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:268)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:228)
	at xxxxxxx.export.service.thumbnail.PDFSlideGeneratorServiceImpl$1.call(PDFSlideGeneratorServiceImpl.java:60)
	at xxxxxxx.ops.executor.util.ParallelExecutorTask.doCall(ParallelExecutorTask.java:43)
	at xxxxxxx.ops.executor.util.ParallelExecutor.executeTaskFromQueue(ParallelExecutor.java:154)
	at xxxxxxx.ops.executor.util.ParallelExecutor.access$100(ParallelExecutor.java:26)
	at xxxxxxx.ops.executor.util.ParallelExecutor$WorkerTask.doCall(ParallelExecutor.java:164)
	at xxxxxxx.ops.executor.util.UserContextAwareCallable.call(UserContextAwareCallable.java:89)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
"APP_Thread_WorkerTask_APP-pool-5-thread-5" #1898 prio=5 os_prio=0 tid=0x00007f894c898900 nid=0x44d0 waiting for monitor entry [0x00007f88e9d45000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:356)
	- waiting to lock <0x00000002d218ca28> (a org.apache.fontbox.ttf.RAFDataStream)
	at org.apache.fontbox.ttf.TrueTypeFont.getTable(TrueTypeFont.java:150)
	- locked <0x00000002d216fec8> (a org.apache.fontbox.ttf.TrueTypeFont)
	at org.apache.fontbox.ttf.TrueTypeFont.getCmap(TrueTypeFont.java:262)
	at org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapImpl(TrueTypeFont.java:556)
	at org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapLookup(TrueTypeFont.java:541)
	at org.apache.fontbox.ttf.TrueTypeFont.nameToGID(TrueTypeFont.java:629)
	at org.apache.fontbox.ttf.TrueTypeFont.hasGlyph(TrueTypeFont.java:698)
	at org.apache.pdfbox.pdmodel.font.PDType1Font.getNameInFont(PDType1Font.java:601)
	at org.apache.pdfbox.pdmodel.font.PDType1Font.hasGlyph(PDType1Font.java:645)
	at org.apache.pdfbox.rendering.Type1Glyph2D.getPathForCharacterCode(Type1Glyph2D.java:59)
	at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D(PageDrawer.java:495)
	at org.apache.pdfbox.rendering.PageDrawer.showFontGlyph(PageDrawer.java:476)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:787)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showGlyph(PDFStreamEngine.java:805)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:743)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:606)
	at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:56)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
	at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:347)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:268)
	at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:228)
	at xxxxxxx.export.service.thumbnail.PDFSlideGeneratorServiceImpl$1.call(PDFSlideGeneratorServiceImpl.java:60)
	at xxxxxxx.ops.executor.util.ParallelExecutorTask.doCall(ParallelExecutorTask.java:43)
	at xxxxxxx.ops.executor.util.ParallelExecutor.executeTaskFromQueue(ParallelExecutor.java:154)
	at xxxxxxx.ops.executor.util.ParallelExecutor.access$100(ParallelExecutor.java:26)
	at xxxxxxx.ops.executor.util.ParallelExecutor$WorkerTask.doCall(ParallelExecutor.java:164)
	at xxxxxxx.ops.executor.util.UserContextAwareCallable.call(UserContextAwareCallable.java:89)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

You can see the ‘APP_Thread_50_WorkerTask_pool-5-thread-6’ thread is in Deadlock with ‘APP_Thread_50_WorkerTask_pool-5-thread-5’ thread. From the stacktrace you can observe following two things:

  1. ‘APP_Thread_50_WorkerTask_pool-5-thread-6’ has acquired the lock ‘0x00000002d218ca28’ of the ‘org.apache.fontbox.ttf.RAFDataStream’ object and waiting to acquire the lock ‘0x00000002d216fec8’ of the ‘org.apache.fontbox.ttf.TrueTypeFont’ object.
  2. On the other hand, ‘APP_Thread_50_WorkerTask_pool-5-thread-5’ thread is trying to do the exact opposite, it acquired the lock ‘0x00000002d216fec8’ of ‘org.apache.fontbox.ttf.TrueTypeFont’ object and waiting to acquire the lock ‘0x00000002d218ca28’ of the ‘org.apache.fontbox.ttf.RAFDataStream’ object.

Indeed, it’s a classic Deadlock condition.

Deadlock in Apache PDFBox

If you notice two objects which are causing Deadlock are ‘org.apache.fontbox.ttf.TrueTypeFont’ and ‘org.apache.fontbox.ttf.RAFDataStream’ . Both of these objects are originating from the open source Apache PDFBox library.

Once seeing this bug, we searched in the Apache PDFBox bug database to see whether this problem was already reported or not. We couldn’t see this problem reported earlier. Thus, we went ahead and filed a new ticket in the Apache bug database with the details. Here is the ticket that we filed, for your reference. The Apache PDFBox development team was highly responsive. They started acting on it right away and issued a fix within 2 – 3 days. Great job from the PDFBox team. Truly enjoyed the open source community collaboration.