Volume 22, Issue 2, April 2016, Pages 304–311
M. Manimaraboopathy1, M. Anto Bennet2, M. Kalpana3, S. Premalatha4, and G. Gayathri5
1 Assistant Professor, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
2 Professor, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
3 UG Student, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
4 UG Student, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
5 UG Student, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
Original language: English
Copyright © 2016 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
This paper proposes an OCR-based algorithm for retrieving text from scanned document images. The text detection algorithm is based on two machine learning classifiers: one generates candidate word regions and the other filters out non-text regions. Connected components (CCs) are extracted from the image using the maximally stable extremal regions (MSER) algorithm. In CC clustering, AdaBoost classifiers are used to determine whether a region contains text. A binarization method then converts the gray image into a binary image. The binarization outcomes are passed to OCR, and the corresponding result is evaluated with respect to character and word accuracy. As more and more text documents are scanned, there is a growing need for fast and accurate text retrieval. Additional performance metrics are the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging. The effectiveness of the proposed method is also confirmed by tests carried out on realistic document images. The proposed algorithm is implemented in MATLAB version 13.
Author Keywords: Maximally Stable Extremal Regions (MSER), Optical Character Recognition (OCR).
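The abstract does not name the binarization method, so the following is only a minimal sketch of the gray-to-binary conversion step, assuming a global Otsu threshold as a stand-in. The function names `otsu_threshold` and `binarize`, the output convention (0 for dark text pixels, 1 for bright background), and the sample pixel values are hypothetical. Python is used here for illustration, although the authors report using MATLAB.

```python
def otsu_threshold(gray):
    """Return the 8-bit threshold that maximizes between-class variance.

    `gray` is a list of rows of integer intensities in 0..255.
    This is a stand-in technique; the source does not specify its method.
    """
    # Build the intensity histogram.
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the dark class yet
        if w1 == 0:
            break     # no pixels left in the bright class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold=None):
    """Map pixels above the threshold to 1 (background) and the rest to 0 (text)."""
    if threshold is None:
        threshold = otsu_threshold(gray)
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# Hypothetical 3x3 gray patch: dark strokes on a bright background.
sample = [
    [30, 35, 200],
    [32, 210, 205],
    [28, 31, 198],
]
binary = binarize(sample)
```

In a pipeline like the one the abstract describes, a binary image of this kind would then be handed to the OCR stage. A degraded document with uneven illumination may call for a local (adaptive) threshold instead of the single global value assumed here.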
How to Cite this Article
M. Manimaraboopathy, M. Anto Bennet, M. Kalpana, S. Premalatha, and G. Gayathri, “Degraded Document Image Binarization Using Optical Character Recognition,” International Journal of Innovation and Scientific Research, vol. 22, no. 2, pp. 304–311, April 2016.