Multi-scale network architecture

Document Image Binarization with Fully Convolutional Neural Networks


Binarization of degraded historical manuscript images is an important pre-processing step for many document processing tasks. We formulate binarization as a pixel classification learning task and apply a novel Fully Convolutional Network (FCN) architecture that operates at multiple image scales, including full resolution. The FCN is trained to optimize a continuous version of the Pseudo F-measure metric. An ensemble of FCNs outperforms the competition winners on 4 of 7 DIBCO competitions. The same binarization technique can also be applied to other domains, such as Palm Leaf Manuscripts, with good performance. We analyze the performance of the proposed model w.r.t. the architectural hyperparameters, the size and diversity of the training data, and the input features chosen.

International Conference on Document Analysis and Recognition (ICDAR), IEEE.

I submitted this binarization algorithm to the 2017 Document Image Binarization Contest (DIBCO 2017) and have made the code and model files for evaluating the network available on GitHub. The repo contains two versions. One uses a densely connected CRF for post-processing, which slightly improves results. The other uses only the network output.

The exact models used in this paper are available in a separate repo and have an associated Docker image. These include the models trained on DIBCO 2009-2014 as well as those trained on the Palm Leaf Manuscripts. The models I submitted to DIBCO 2017 were trained on DIBCO 2009-2016.

An earlier version of this algorithm placed first in the binarization task of the 2016 Competition on the Analysis of Handwritten Text in Images of Balinese Palm Leaf Manuscripts.

For training the network, I used my fork of the popular deep learning library Caffe. The fork contains my implementation of the loss function presented in this paper in /src/caffe/layers/weighted_fmeasure_loss_layer.cpp (a .cu GPU implementation is included as well).
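To give a rough idea of what a continuous, weighted F-measure loss can look like, here is a minimal NumPy sketch. It is not the Caffe layer itself and is not taken from the paper: the function names, the separate recall and precision weight maps, and the eps smoothing term are all assumptions made for illustration. The idea is simply to replace hard 0/1 predictions with foreground probabilities so that precision, recall, and their harmonic mean become differentiable in the network output.

```python
import numpy as np


def soft_weighted_fmeasure(prob, target, recall_w=None, precision_w=None, eps=1e-8):
    """Continuous F-measure computed on probabilities (illustrative sketch).

    prob        -- predicted foreground probabilities in [0, 1]
    target      -- binary ground truth, 1 = foreground (ink)
    recall_w    -- hypothetical per-pixel weights for recall (defaults to 1)
    precision_w -- hypothetical per-pixel weights for precision (defaults to 1)
    """
    prob = np.asarray(prob, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if recall_w is None:
        recall_w = np.ones_like(prob)
    if precision_w is None:
        precision_w = np.ones_like(prob)

    # Weighted recall: share of the weighted foreground mass that is predicted.
    recall = (recall_w * prob * target).sum() / ((recall_w * target).sum() + eps)

    # Weighted precision: share of the weighted predicted mass that is foreground.
    precision = (precision_w * prob * target).sum() / ((precision_w * prob).sum() + eps)

    # Harmonic mean of the two; eps guards against division by zero.
    return 2.0 * precision * recall / (precision + recall + eps)


def soft_fmeasure_loss(prob, target, recall_w=None, precision_w=None):
    """Loss to minimize: 1 minus the continuous F-measure."""
    return 1.0 - soft_weighted_fmeasure(prob, target, recall_w, precision_w)


if __name__ == "__main__":
    gt = np.array([[1, 0], [0, 1]])
    print(soft_fmeasure_loss(gt, gt))                     # close to 0 for a perfect prediction
    print(soft_fmeasure_loss(np.full((2, 2), 0.5), gt))   # larger for an uncertain prediction
```

With uniform weights this reduces to a soft version of the ordinary F-measure. Passing different weight maps for recall and precision is what would make it resemble a Pseudo F-measure; how those maps are derived is left out here.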