Page layout analysis and classification in complex scanned documents

Show full item record

Title: Page layout analysis and classification in complex scanned documents
Author: Erkilinc, Mustafa
Abstract: Page layout analysis has been extensively studied since the 1980`s, particularly after computers began to be used for document storage or database units. For efficient document storage and retrieval from a database, a paper document would be transformed into its electronic version. Algorithms and methodologies are used for document image analysis in order to segment a scanned document into different regions such as text, image or line regions. To contribute a novel approach in the field of page layout analysis and classification, this algorithm is developed for both RGB space and grey-scale scanned documents without requiring any specific document types, and scanning techniques. In this thesis, a page classification algorithm is proposed which mainly applies wavelet transform, Markov random field (MRF) and Hough transform to segment text, photo and strong edge/ line regions in both color and gray-scale scanned documents. The algorithm is developed to handle both simple and complex page layout structures and contents (text only vs. book cover that includes text, lines and/or photos). The methodology consists of five modules. In the first module, called pre-processing, image enhancements techniques such as image scaling, filtering, color space conversion or gamma correction are applied in order to reduce computation time and enhance the scanned document. The techniques, used to perform the classification, are employed on the one-fourth resolution input image in the CIEL*a*b* color space. In the second module, the text detection module uses wavelet analysis to generate a text-region candidate map which is enhanced by applying a Run Length Encoding (RLE) technique for verification purposes. The third module, photo detection, initially uses block-wise segmentation which is based on basis vector projection technique. Then, MRF with maximum a-posteriori (MAP) optimization framework is utilized to generate photo map. Next, Hough transform is applied to locate lines in the fourth module. Techniques for edge detection, edge linkages, and line-segment fitting are used to detect strong-edges in the module as well. After those three classification maps are obtained, in the last module a final page layout map is generated by using K-Means. Features are extracted to classify the intersection regions and merge into one classification map with K-Means clustering. The proposed technique is tested on several hundred images and its performance is validated by utilizing Confusion Matrix (CM). It shows that the technique achieves an average of 85% classification accuracy rate in text, photo, and background regions on a variety of scanned documents like articles, magazines, business-cards, dictionaries or newsletters etc. More importantly, it performs independently from a scanning process and an input scanned document (RGB or gray-scale) with comparable classification quality.
Record URI: http://hdl.handle.net/1850/14330
Date: 2011-07

Files in this item

Files Size Format View
MErkilincThesis7-2011.pdf 14.05Mb PDF View/Open

The following license files are associated with this item:

This item appears in the following Collection(s)

Show full item record

Search RIT DML


Advanced Search

Browse