Selected Publications

We present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network that finds the start position of text lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) text into dewarped images, and a CNN-LSTM network that recognizes the text in those images. SFR achieves state-of-the-art results on the ICDAR2017 handwriting recognition competition dataset, even without using the provided region annotations.
ECCV 2018

We address the problem of training handwriting recognition (HWR) models for low-resource languages by leveraging data from high-resource languages with similar scripts through transfer learning. A language model in the target language is used to refine a model trained on a source language. Using this approach, we demonstrate improved transferability among French, English, and Spanish using both historical and modern handwriting datasets.
ICFHR 2018

In this work, we present a deep learning based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation which is post-processed into the output quadrilateral region.
HIP 2017

Convolutional Neural Networks (CNNs) are state-of-the-art models for document image classification tasks. However, many of these approaches rely on parameters and architectures designed for classifying natural images, which differ from document images. We question whether this is appropriate and conduct a large empirical study to find what aspects of CNNs most affect performance on document images.
ICDAR 2017

Classifying pages or text lines into font categories aids transcription because single font Optical Character Recognition (OCR) is generally more accurate than omni-font OCR. We present a simple framework based on Convolutional Neural Networks (CNNs), where a CNN is trained to classify small patches of text into predefined font classes.
ICDAR 2017
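
As an illustration of how patch-level font predictions could be turned into a single label for a text line or page, here is a minimal sketch. It assumes the patch classifier outputs one probability vector per patch and that the vectors are simply averaged; the function name `aggregate_patch_predictions` and the averaging rule are illustrative assumptions, not necessarily the aggregation used in the paper.

```python
import numpy as np


def aggregate_patch_predictions(patch_probs):
    """Combine per-patch class probabilities into one label (hypothetical helper).

    patch_probs: array-like of shape (num_patches, num_classes), where each row
    is the classifier's probability vector for one patch.
    Returns (predicted_class_index, mean_probability_vector).
    """
    probs = np.asarray(patch_probs, dtype=np.float64)
    if probs.ndim != 2 or probs.shape[0] == 0:
        raise ValueError("expected a non-empty (num_patches, num_classes) array")
    # Assumption: average the patch-level probabilities, then take the argmax.
    mean_probs = probs.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs


# Example: three patches from one text line, two candidate font classes.
label, mean_probs = aggregate_patch_predictions(
    [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
)
```

Averaging lets many weakly confident patches outvote a single confidently wrong one; a majority vote over per-patch labels would be another reasonable choice.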

Binarization of degraded historical manuscript images is an important pre-processing step for many document processing tasks. We formulate binarization as a pixel classification learning task and apply a novel Fully Convolutional Network (FCN) architecture that operates at multiple image scales, including full resolution. The FCN is trained to optimize a continuous version of the Pseudo F-measure metric, and an ensemble of FCNs outperforms the competition winners on 4 of 7 DIBCO competitions.
ICDAR 2017
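
To give a sense of what a "continuous version" of an F-measure can look like, here is a minimal sketch in which hard binary predictions are replaced by per-pixel foreground probabilities, so the score varies smoothly with the network output. The function name `soft_f_measure` and the optional weight maps are illustrative assumptions; the Pseudo F-measure uses specific per-pixel weight maps that are not reproduced here, and this is not necessarily the exact loss used in the paper.

```python
import numpy as np


def soft_f_measure(pred, target, recall_weights=None, precision_weights=None,
                   eps=1e-8):
    """Continuous F-measure between soft predictions and a binary ground truth.

    pred:   per-pixel foreground probabilities in [0, 1].
    target: binary ground truth (1 = foreground, 0 = background).
    recall_weights / precision_weights: optional per-pixel weight maps
        (hypothetical stand-ins for the Pseudo F-measure weights); default 1.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    rw = (np.ones_like(target) if recall_weights is None
          else np.asarray(recall_weights, dtype=np.float64))
    pw = (np.ones_like(target) if precision_weights is None
          else np.asarray(precision_weights, dtype=np.float64))

    # Soft recall: weighted share of true foreground mass that is predicted.
    recall = np.sum(rw * pred * target) / (np.sum(rw * target) + eps)
    # Soft precision: weighted share of predicted mass that is true foreground.
    precision = np.sum(pw * pred * target) / (np.sum(pw * pred) + eps)
    return 2.0 * precision * recall / (precision + recall + eps)


# Example: a 2x2 image with two foreground pixels.
truth = np.array([[1, 1], [0, 0]])
perfect = soft_f_measure(truth, truth)
uncertain = soft_f_measure(np.full((2, 2), 0.5), truth)
```

A training loss could then be taken as one minus this score; because every operation is differentiable with respect to `pred`, the same formula can be written in an automatic differentiation framework.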

Identifying the type of a scanned form greatly facilitates processing, including automated field segmentation and field recognition. Contrary to the majority of existing techniques, we focus on unsupervised type identification, where the set of form types is not known a priori, and on noisy collections that contain very similar document types. This work presents a novel algorithm, CONFIRM (Clustering Of Noisy Form Images using Robust Matching), which simultaneously discovers the types in a collection of forms and assigns each form to a type. CONFIRM matches type-set text and rule lines between forms to create domain-specific features, which we show outperform the Bag of Visual Words (BoVW) features employed by the current state of the art.
MS Thesis 2016

Recent & Upcoming Talks

Text lines often must be localized and segmented before transcription. Detecting baselines is one way of localizing and deskewing text lines. We present a system for detecting baselines based on a Fully Convolutional Network (FCN) followed by post-processing. We entered our system into the cBAD and HisDB competitions organized in conjunction with ICDAR 2017. We placed 3rd and 2nd on the simple and complex layout tracks of cBAD, respectively, and 2nd on the HisDB baseline detection task.
Family History Tech Workshop 2018

The ability to automatically cluster large collections of noisy form images according to form type would improve the efficiency of organizations that currently do this by hand. Some noisy form collections contain form types that are structurally very similar, but should cluster apart. To address this issue, we propose CONFIRM - Clustering Of Noisy Form Images using Robust Metrics.
Family History Tech Workshop 2015

Teaching

I taught CS478 (Machine Learning and Data Mining) at Brigham Young University in the Fall 2015 semester. We covered topics such as neural networks, decision trees, K-nearest neighbor, and clustering algorithms.

I have also been a teaching assistant a few times:

  • CS236: Discrete Structures
  • CS312: Algorithm Design & Analysis
  • CS478: Machine Learning and Data Mining (x2)

Contact