April 7, 2019
Available online at www.sciencedirect.com
AASRI Procedia 4 (2013) 306 – 312
2013 AASRI Conference on Intelligent Systems and Control
Off-Line Handwritten Character Recognition using
Features Extracted from Binarization Technique
Amit Choudhary^a,*, Rahul Rishi^b, Savita Ahlawat^c
^a Maharaja Surajmal Institute, New Delhi, India
^b UIET, Maharshi Dayanand University, Rohtak, India
^c Maharaja Surajmal Institute of Technology, New Delhi, India
The choice of pattern classifier and the technique used to extract features are the main factors that determine the recognition accuracy and capability of an Optical Character Recognition (OCR) system. The main focus of this work is to extract features obtained by a binarization technique for the recognition of handwritten characters of the English language. The handwritten character images are recognized using a multilayer feed-forward artificial neural network as the classifier. Preprocessing techniques such as thinning, foreground and background noise removal, cropping and size normalization are also applied to the character images before classification. Very promising results are achieved when binarization features and the multilayer feed-forward neural network classifier are used to recognize off-line cursive handwritten characters.
Keywords: OCR; Binarization; Feature Extraction; Character Recognition; Backpropagation Algorithm; Neural Network.
The significance of a piece of paper in improving people's memory cannot be overlooked. It is used for both private (letters, notes, addresses, reminders, lists, diaries etc.) and official correspondence (bank cheques, tax forms, admission forms etc.). Paper is important in our daily life because it is cheap, reliable, easily available, flexible to fill in, secure for future reference and easy to keep. A huge amount of important historical data is also written on paper. There is therefore a great demand to digitize all these paper documents so that people all over the world can access these important sources of knowledge. For this purpose, the image of handwritten text is preprocessed and segmented into individual characters, which are then recognized by a neural network classifier.
The process of reading handwritten text from static surfaces is termed off-line cursive handwriting recognition. Simulating the behaviour of the human brain in a machine (for the task of reading handwritten or printed text) has opened innovative prospects for improving the man-machine interface. For the last four decades, the classification of cursive and unconstrained handwritten characters has been a major issue in this field of research.
Off-line character recognition is an active area of research. Compared with machine-printed character recognition, the work done by researchers in the area of handwritten character recognition is very limited, as mentioned by Apurva A. Desai. In 2002, Kundu & Chen used an HMM to recognize 100 postal words and reported 88.2% recognition accuracy. In 2007, Tomoyuki et al. used 1646 city names of European countries in a recognition experiment and achieved an accuracy of 80.2%. In 2006, Gatos et al. used a K-NN classifier to recognize 3799 words from the IAM database and reported 81% accuracy.
3. Handwritten Character Database Preparation
The handwritten character images are captured with a digital camera; they can also be scanned with a scanner. This process is known as Image Acquisition. All the handwritten character images are converted to a uniform image format such as .bmp or .jpg so that they are ready for the next processing step. The characters may be written on a pure white background or on a colored (noisy) background, and with different pens of various ink colors. Character image samples are collected from 10 different people (aged 15-50 years), each of whom writes 5 samples of the complete English alphabet (a-z). In this way, 1300 (10 × 5 × 26 = 1300) character image samples are collected for the proposed experiment.
Preprocessing is done to remove the variability that is present in off-line handwritten characters.
In this phase of preprocessing, the input handwritten character image in .bmp format from the local database, shown in Fig. 1(a), is converted to grayscale using the “rgb2gray” function of MATLAB. The resulting handwritten character image is shown in Fig. 1(b).
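The grayscale conversion can be illustrated with a minimal Python sketch. It assumes the image is held as nested lists of 8-bit (R, G, B) tuples and applies the luminance weights documented for MATLAB's rgb2gray (0.2989, 0.5870, 0.1140). The function name rgb_to_gray and the list-based image layout are illustrative assumptions, not part of the original experiment.

```python
# Luminance weights documented for MATLAB's rgb2gray.
R_WEIGHT, G_WEIGHT, B_WEIGHT = 0.2989, 0.5870, 0.1140


def rgb_to_gray(image):
    """Convert rows of (R, G, B) tuples (0-255) to rows of gray levels (0-255).

    The nested-list image layout is an assumption made for this sketch.
    """
    return [
        [
            # Weighted sum of the three channels, rounded and clamped to 8 bits.
            min(255, round(R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b))
            for (r, g, b) in row
        ]
        for row in image
    ]


if __name__ == "__main__":
    # A 1x3 toy image: white background, black ink, and a blue-ink pixel.
    sample = [[(255, 255, 255), (0, 0, 0), (0, 0, 255)]]
    print(rgb_to_gray(sample))
```

After this step, dark ink of any color maps to a low gray level and a light background maps to a high one.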
Binarization is an important image processing step in which the pixel values are separated into two groups: white as background and black as foreground. Only two colors, white and black, can be present in a binary image. The goal of binarization is to minimize the unwanted information present in the image while protecting the useful information. It must preserve as much of the useful information and detail in the image as possible while efficiently eliminating the background noise associated with the image.
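A minimal sketch of binarization by global thresholding is given below, assuming a grayscale image stored as nested lists of 8-bit values. The fixed threshold of 128 and the function name binarize are hypothetical choices made for illustration and are not necessarily the thresholding rule used in this work. Following the convention above, dark pixels become black foreground (0) and light pixels become white background (1).

```python
def binarize(gray, threshold=128):
    """Map each gray level to 0 (black, foreground) or 1 (white, background).

    The threshold value is a hypothetical default; in practice it would be
    chosen per image or per dataset.
    """
    return [
        [0 if pixel < threshold else 1 for pixel in row]
        for row in gray
    ]


if __name__ == "__main__":
    # Toy 2x3 grayscale patch: dark stroke pixels on a light background.
    patch = [
        [250, 30, 245],
        [240, 25, 128],
    ]
    for row in binarize(patch):
        print(row)
```

A single global threshold is the simplest option. It works when the ink is clearly darker than the paper, but it may need tuning for colored or noisy backgrounds.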
It is assumed that the intensity of the text is less than that of backgr