Selected Publications

In this work, we present PageNet, a deep-learning-based system that identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network produces a pixel-wise segmentation, which is post-processed into an output quadrilateral region.
HIP (in submission)
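The post-processing step above can be illustrated with a minimal NumPy sketch. This is only a heuristic stand-in for the paper's actual quadrilateral fitting: it picks the four foreground pixels that are extreme along x+y and x−y as corner estimates.

```python
import numpy as np

def mask_to_quad(mask):
    """Approximate a binary page mask with a quadrilateral.

    Heuristic sketch (not PageNet's exact post-processing): use the
    foreground points extreme along x+y and x-y as the four corners,
    ordered top-left, top-right, bottom-right, bottom-left.
    """
    ys, xs = np.nonzero(mask)
    s = xs + ys  # small at top-left, large at bottom-right
    d = xs - ys  # large at top-right, small at bottom-left

    def pt(i):
        return (int(xs[i]), int(ys[i]))

    tl, br = pt(s.argmin()), pt(s.argmax())
    tr, bl = pt(d.argmax()), pt(d.argmin())
    return [tl, tr, br, bl]

# toy example: an axis-aligned page region inside a 10x10 image
mask = np.zeros((10, 10), dtype=bool)
mask[2:8, 3:9] = True
print(mask_to_quad(mask))  # [(3, 2), (8, 2), (8, 7), (3, 7)]
```

For skewed pages a real system would fit lines to the mask boundary instead; this extreme-point trick only works well when the page is roughly axis-aligned.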

Convolutional Neural Networks (CNNs) are state-of-the-art models for document image classification tasks. However, many of these approaches rely on parameters and architectures designed for classifying natural images, which differ from document images. We question whether this is appropriate and conduct a large empirical study to find what aspects of CNNs most affect performance on document images.

Classifying pages or text lines into font categories aids transcription because single font Optical Character Recognition (OCR) is generally more accurate than omni-font OCR. We present a simple framework based on Convolutional Neural Networks (CNNs), where a CNN is trained to classify small patches of text into predefined font classes.
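One natural way to turn per-patch predictions into a page-level font label is to average the patch posteriors and take the argmax. The sketch below uses stand-in probability values rather than a real CNN, and the averaging scheme is an assumption, not necessarily the paper's exact aggregation.

```python
import numpy as np

def classify_page(patch_probs):
    """Aggregate per-patch CNN font predictions into a page label.

    patch_probs: (num_patches, num_fonts) softmax outputs from the
    patch classifier. Averaging the patch posteriors and taking the
    argmax is one simple voting scheme.
    """
    return int(np.mean(patch_probs, axis=0).argmax())

# three patches, two font classes: the votes lean toward font 1
probs = np.array([[0.2, 0.8],
                  [0.4, 0.6],
                  [0.6, 0.4]])
print(classify_page(probs))  # 1
```

Averaging probabilities rather than taking a hard majority vote lets confident patches outweigh ambiguous ones.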

Binarization of degraded historical manuscript images is an important pre-processing step for many document processing tasks. We formulate binarization as a pixel classification learning task and apply a novel Fully Convolutional Network (FCN) architecture that operates at multiple image scales, including full resolution. The FCN is trained to optimize a continuous version of the Pseudo F-measure metric, and an ensemble of FCNs outperforms the competition winners on 4 of 7 DIBCO competitions.
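A continuous relaxation of the F-measure can be sketched by treating predicted foreground probabilities as soft true positives. Note this is the plain soft F-measure; the Pseudo F-measure used in the paper additionally weights recall per pixel (e.g. by distance to the nearest text), which the optional `weights` argument only gestures at.

```python
import numpy as np

def soft_fmeasure(pred, target, weights=None, eps=1e-8):
    """Continuous (differentiable) relaxation of the F-measure.

    pred:    per-pixel foreground probabilities in [0, 1]
    target:  binary ground-truth ink mask
    weights: optional per-pixel recall/precision weights; a stand-in
             for the Pseudo F-measure's distance-based weighting.
    """
    if weights is None:
        weights = np.ones_like(target, dtype=float)
    tp = np.sum(pred * target * weights)           # soft true positives
    precision = tp / (np.sum(pred * weights) + eps)
    recall = tp / (np.sum(target * weights) + eps)
    return 2 * precision * recall / (precision + recall + eps)

target = np.array([1.0, 1.0, 0.0, 0.0])
pred = np.array([0.9, 0.8, 0.1, 0.2])
print(round(soft_fmeasure(pred, target), 3))
```

Because every operation is differentiable in `pred`, this quantity (negated) can serve directly as a training loss for the FCN.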

Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Contrary to the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm, CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches type-set text and rule lines between forms to create domain-specific features, which we show outperform the Bag of Visual Words (BoVW) features employed by the current state-of-the-art.
MS Thesis
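The clustering side of this idea can be sketched with a union-find pass over a pairwise similarity matrix. The matrix here is a hand-written stand-in; CONFIRM itself derives similarities from matched type-set text and rule lines, and its actual clustering procedure may differ from this threshold-based sketch.

```python
import numpy as np

def cluster_by_similarity(sim, threshold):
    """Group items whose pairwise similarity exceeds a threshold.

    A simple union-find sketch of similarity-based clustering; the
    returned list gives a representative id per item, so two items
    share a cluster iff their ids are equal.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # union the two clusters

    return [find(i) for i in range(n)]

# four forms: 0 & 1 match each other strongly, as do 2 & 3
sim = np.array([[1.0, 0.9, 0.2, 0.1],
                [0.9, 1.0, 0.3, 0.2],
                [0.2, 0.3, 1.0, 0.8],
                [0.1, 0.2, 0.8, 1.0]])
labels = cluster_by_similarity(sim, threshold=0.5)
print(labels)  # forms 0-1 share one type, forms 2-3 another
```

Transitive merging is what lets very similar form types end up apart: they only join if some pair actually crosses the matching threshold.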


Recent & Upcoming Talks

Often text lines must be localized and segmented before transcription. Detecting baselines is one way of localizing and deskewing text lines. We present a system for detecting baselines based on a Fully Convolutional Network (FCN) followed by post-processing. We entered our system into the cBAD and HisDB competitions organized in conjunction with ICDAR 2017. We placed 3rd and 2nd on the simple and complex layout tracks of cBAD and 2nd on the HisDB baseline detection task.
Family History Tech Workshop 2018
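One small piece of such post-processing can be sketched as tracing a polyline through an FCN's per-pixel baseline probability map. This is a minimal single-line illustration, not the full competition system: in each column whose peak response clears a threshold, it places a point at the probability-weighted mean row.

```python
import numpy as np

def trace_baseline(prob_map, thresh=0.5):
    """Extract one baseline polyline from an FCN probability map.

    Sketch of a single post-processing step: for each column whose
    maximum response is at least `thresh`, emit a point (x, y) where
    y is the probability-weighted mean row index of that column.
    """
    points = []
    rows = np.arange(prob_map.shape[0])
    for x in range(prob_map.shape[1]):
        col = prob_map[:, x]
        if col.max() >= thresh:
            y = float(np.sum(rows * col) / col.sum())
            points.append((x, y))
    return points

# toy map: a horizontal baseline response around row 2, columns 1-3
pm = np.zeros((5, 5))
pm[2, 1:4] = 0.9
print(trace_baseline(pm))  # [(1, 2.0), (2, 2.0), (3, 2.0)]
```

A real system would additionally split the map into connected components first, so that multiple text lines yield separate polylines rather than one averaged curve.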

The ability to automatically cluster large collections of noisy form images according to form type would improve the efficiency of organizations that currently do this by hand. Some noisy form collections contain form types that are structurally very similar yet should be assigned to separate clusters. To address this issue, we propose CONFIRM - Clustering Of Noisy Form Images using Robust Metrics.
Family History Tech Workshop 2015

Recent Posts


I taught CS478 (Machine Learning and Data Mining) at Brigham Young University in the Fall 2015 semester. We covered topics such as neural networks, decision trees, K-nearest neighbor, and clustering algorithms.

I have also served as a teaching assistant several times:

  • CS236: Discrete Structures
  • CS312: Algorithm Design & Analysis
  • CS478: Machine Learning and Data Mining (x2)