Inventing a Recognition System to Rotate, Scale and Translate Invariant Characters

Research Paper (undergraduate), 2018, 47 pages

Engineering - Computer Engineering


Table of Contents

1.2.1 First Phase: Propagation
1.2.2 Second Phase: Weight Update


3.6.1. Text/Characters Segmentation and Training:





Extraction of text from document images finds application in most document-related work in offices. Among the most popular applications are public or college libraries, where entries for large numbers of books are made by manually typing the title of each book along with other credentials such as the author's name. The complete process can be made effortless with a suitable algorithm or application software that extracts the documented part from the cover and other parts of the book, thereby reducing manual work such as typing; the user's job is reduced to arranging the book titles and formatting the material. [1]

The goal of document image analysis is to convert the documented information hidden in digitized images into a well-structured symbolic representation. In most applications the main information carriers are the textual parts. Hence it is important to locate text blocks within the images, recognize the text, and extract the hidden documents. Documents consist of text, graphics and images, which may overlap, and since text is not always horizontally aligned, finding the documented regions and segmenting lines, words and characters is not a trivial task. Because of the large reduction in the memory required for the resulting representation, it is fruitful to generate, transmit and store the document in this processed form. The extracted regions can then be handled by further steps depending on their type, e.g. OCR for text blocks and compression for graphical and halftone images. Several strategies have been tried so far to solve the segmentation problem. [2]

The proposed system consists of two parts: in the first, the feature set is presented to the neural network for training. Once the system is trained for all the characters, it can be used for character recognition using the adjusted, or trained, weights. The weights are essentially the neural-network intelligence gained during training. [5]


Optical Character Recognition (OCR) is the process of turning a scanned document into editable, machine-readable text. At Docparser, we automatically apply OCR whenever we detect that your uploaded document is a scanned image. OCR accuracy is usually close to 100% if your document is scanned at professional quality. There are, however, various situations in which OCR can yield less accurate results, including:

- The font size of the document is very small
- The scanned image contains scanning artifacts (pixel noise, black paper borders)
- The text is not surrounded by a white background
- The scanned image has low black-and-white contrast
- The document was not well aligned during scanning and the image is skewed

A high quality scan has the following attributes:

- A resolution of 200-300 DPI
- Good alignment, with no skewing
- High black-and-white contrast
- No scanning artifacts (pixel noise, black paper borders)
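As an illustration of these quality criteria (this is not Docparser's actual implementation; the pixel grid, threshold choice and function names are assumptions for the sketch), a minimal contrast check and global-threshold binarization can be written in plain Python:

```python
# Illustrative sketch: model a grayscale scan as a list of rows of 0-255 values,
# check its black/white contrast, and binarize it before OCR.

def contrast_range(image):
    """Return (min, max) grayscale values; a small spread means low contrast."""
    pixels = [p for row in image for p in row]
    return min(pixels), max(pixels)

def binarize(image, threshold=None):
    """Binarize with a global threshold (mean by default): 0 = ink, 255 = paper."""
    pixels = [p for row in image for p in row]
    if threshold is None:
        threshold = sum(pixels) / len(pixels)
    return [[0 if p < threshold else 255 for p in row] for row in image]

scan = [
    [240, 235, 30, 238],   # mostly bright paper with a dark "ink" pixel
    [236, 25, 241, 237],
]
lo, hi = contrast_range(scan)
print(hi - lo)             # 216 -- a large spread, i.e. good contrast
print(binarize(scan))
```

A real pipeline would also estimate skew and remove border noise, but the same idea applies: measure the scan against the quality attributes above before feeding it to the recognizer.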

Capture Center captures data stored in paper documents and in other types of content such as emails and faxes with attached documents, forms, images and complex multi-part documents. Thus, Capture Center curtails manual keying and paper handling, improves data quality, accelerates business processing and saves money. [6]

OpenText Capture Center efficiently and instantly digitizes and captures documents, forms and faxes by pulling them from sources such as Multi-Function Peripherals (MFPs), scanning devices, file-system folders, email servers and FTP sites. To classify the type of document, Capture Center first applies document-recognition functionality (to determine whether it is an invoice, a bill, an insurance claim or some other defined document type). Then, using Intelligent Character Recognition (ICR), Intelligent Document Recognition (IDR) and Optical Character Recognition (OCR), Capture Center extracts data from the digital images.

With the help of manual data entry for unrecognizable data and for exception handling, Capture Center verifies the documents and delivers the digitized documents to OpenText Content Server, a Microsoft SharePoint environment or an OpenText transactional content processing application.

In the past few years the major improvement in OCR has been in how the algorithm understands the whole structure of a document rather than recognizing it character by character; this is document analysis. Theoretically, if we compare two algorithms with similar character recognition accuracy, one of which performs document analysis while the other does not, the former wins. [7]

Document analysis is about how the algorithm breaks apart the contents of a document: paragraphs, columns, lines, words, graphics and so on. Without document analysis the algorithm is blind with respect to OCR and will assume every object in the content is text, which leads to lines being clustered together. The next aspect of document analysis concerns delivering formatting in the output that matches the formatting in the document, which can include font style and size.

For conventional documents, the result with document analysis will be spot on. This is an important aspect, not just for editing but also for maintaining the readability of the documents. Another important aspect of document analysis is finding the reading order: e.g., if there are multiple columns and paragraphs in the document, the algorithm has to decide in which order the reading flows. This is useful during recognition, but when a formatted document is converted to a text file there can be confusion. [8]

Text recognition can be done with the help of the roman text module. This module assumes the entire image consists only of text, which means non-text regions must be removed beforehand. Here we first describe how the roman text module is used from a user's point of view, and then show how the page segmentation can be replaced with a custom segmentation algorithm, as is needed when developing a new OCR toolkit.

While it is quite possible to override the segment method directly, it is more desirable to override one of the few functions called by segment, because in most applications only some specific segmentation steps need to be replaced. The page-to-lines step splits the page into line segments; the algorithm is based on the bounding-box merging used in the base class Page. The order-lines step sorts the line segments into reading order.

The lines-to-chars step separates each line into characters. The outcome is saved in the variable textlines, which is a list of Textline objects. In the base class Page, the character segmentation algorithm uses connected-component segmentation, followed by merging of diacritical signs into their main characters.

The chars-to-words step groups the characters in every line into words and saves them in the various text lines in Textline.words. The word grouping is done in the base class. [9]
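The connected-component step described above can be sketched as follows; this is a simplified stand-alone version with assumed names, not the toolkit's base-class code. Each 4-connected group of foreground pixels becomes one character candidate:

```python
from collections import deque

def connected_components(grid):
    """Label 4-connected groups of foreground (1) pixels in a binary image.
    Returns a list of components, each a list of (row, col) coordinates."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:                       # breadth-first flood fill
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                           and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components

# Two separate ink blobs on one text line -> two character candidates
line = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
print(len(connected_components(line)))   # 2
```

Merging diacritical signs would then be a post-pass that joins components whose bounding boxes overlap vertically, which is the role the base class plays in the toolkit.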


The back-propagation learning algorithm can be separated into two phases: the first phase is propagation and the second is weight update.

1.2.1 First Phase: Propagation

Every propagation includes the following steps:

1. Forward propagation: the training pattern's input is fed forward through the neural network to generate the output activations.
2. Backward propagation: the output activations are propagated backwards through the neural network, using the training pattern's target, to generate the deltas of all output and hidden neurons.

1.2.2 Second Phase: Weight Update

Every weight-synapse update includes the following steps:

1. The gradient of the weight is obtained by multiplying its input activation by its output delta.
2. The weight is moved in the direction opposite to the gradient by subtracting a fraction of the gradient from the weight.

This fraction affects the speed and quality of learning and is known as the learning rate. The sign of the gradient of a weight indicates the direction in which the error is increasing, which is why the weight must be updated in the opposite direction. [10]

Repeating phases one and two a sufficient number of times makes the performance of the network satisfactory.
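The two phases can be illustrated for a single weight-synapse; the numbers, learning rate and sigmoid activation here are toy assumptions for the sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w, b = 0.5, 0.0          # weight and bias
x, target = 1.0, 1.0     # training pattern and its desired output
lr = 0.5                 # learning rate

# Phase 1: propagation
y = sigmoid(w * x + b)               # forward: output activation
delta = (y - target) * y * (1 - y)   # backward: output delta

# Phase 2: weight update
grad = x * delta                     # gradient = input activation * output delta
w -= lr * grad                       # move against the gradient

print(round(w, 4))                   # 0.5444 -- weight nudged toward the target
```

Since the target (1.0) is above the current output, the delta is negative and the update increases the weight, exactly the "opposite direction to the gradient" rule described above.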

The back-propagation algorithm states:

1. The back-propagation algorithm works in two stages. First, the training phase, in which sample training data are provided as input at the input layer to train the network on a predefined collection of data classes. Second, the testing phase, in which random test data are provided at the input layer for classifying the applied patterns.

2. Since this algorithm is based on the supervised learning approach, the desired final output is already known to the network. In case of inconsistency between the desired result and the current output, the deviation between the two is back-propagated towards the input layer, and the weights of the perceptrons are calibrated so as to pull the error within the error tolerance range.

3. This back-propagation algorithm can be operated in two modes: incremental mode and batch mode. In incremental mode each propagation is followed immediately by a weight adjustment. In batch mode the weights are updated only after a number of consecutive propagations. Batch mode is normally preferred over incremental mode because it consumes less time and needs fewer propagation iterations. In batch mode, a pattern is presented at the input layer. The neurons at the input layer pass the pattern to the next layer of neurons, which in this case is the hidden layer. The outputs of the hidden-layer neurons are produced by applying a threshold function to their activations, which are determined from the weights and inputs. The threshold function is 1 / (1 + exp(-x)), where x is the activation value, obtained by multiplying the weight vector with the input pattern vector.

4. The outputs of the hidden-layer neurons become inputs for the neurons of the output layer, which process them with the same saturation function.

5. The final result, the output of the network, is calculated from the activations of the output-layer neurons.

6. The computed pattern is compared with the target pattern, and in case of any discrepancy an error value is determined for each component of the pattern. On this basis the weight adjustments between the output layer and the hidden layer are computed. A similar computation, based on the output error, is made for the connection weights between the hidden-layer neurons and the input-layer neurons. This process is repeated a number of times until the error function falls within the error tolerance range set by the user.
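The steps above can be sketched as a minimal batch-mode training loop. This is a toy illustration, not the thesis implementation: the 2-2-1 network size, the AND function as training data, and the learning rate are all assumptions (the network of Figure 1.1 is larger).

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy 2-2-1 network trained in batch mode on the AND function.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # hidden weights
w_o = [random.uniform(-1, 1) for _ in range(2)]                      # output weights
lr = 0.5

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_h]
    return h, sigmoid(sum(w * hi for w, hi in zip(w_o, h)))

def total_error():
    return sum((forward(x)[1] - d) ** 2 for x, d in data) / 2

before = total_error()
for _ in range(2000):
    # Batch mode: accumulate gradients over all patterns, then update once.
    g_o = [0.0, 0.0]
    g_h = [[0.0, 0.0], [0.0, 0.0]]
    for x, d in data:
        h, y = forward(x)
        delta_o = (y - d) * y * (1 - y)
        for j in range(2):
            g_o[j] += delta_o * h[j]
            delta_h = delta_o * w_o[j] * h[j] * (1 - h[j])
            for i in range(2):
                g_h[j][i] += delta_h * x[i]
    for j in range(2):
        w_o[j] -= lr * g_o[j]
        for i in range(2):
            w_h[j][i] -= lr * g_h[j][i]

print(total_error() < before)   # the batch-mode updates should have reduced the error
```

An incremental-mode variant would move the weight-update lines inside the per-pattern loop, adjusting the weights after every single propagation.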

[Figure not included in this excerpt]

Figure 1.1: Proposed Back-Propagation Neural Network Structure

The input neurons are Entropy, Energy, Contrast, Homogeneity, Correlation, Area, Perimeter, Mean Radius, Standard Deviation and Threshold. The structure of the back-propagation network is shown above in Figure 1.1. [11]


In delta-rule training, the input patterns are presented sequentially during back-propagation training. When a pattern is submitted and the outcome of the association or classification is inaccurate, the synaptic weights together with the thresholds are adjusted so that the least-mean-square classification error is decreased. The input/output mapping, the comparison of the actual value to the target value, and the adjustment (if required) continue until all the mapping examples of the training set are brought within an acceptable error range. Normally the mapping error is cumulative and is computed over the full cycle of the training set. During the association and classification phases, the trained neural network operates in a feed-forward manner; however, the weight adjustments enforced by the learning propagate backwards from the output layer through the hidden layer towards the input layer.

Step-1: Compute the total weighted input x_j:

x_j = Σ_i y_i w_ij

Here y_i is the activity level of the ith unit in the previous layer and w_ij is the weight of the connection between the ith unit and the jth unit.

Step-2: Calculate the activity y_j as a function of the total weighted input:

y_j = 1 / (1 + exp(-x_j))

Once the activities of all the output units have been determined, the network computes the error E, which is defined by the expression:

E = (1/2) Σ_j (y_j - d_j)^2

where y_j is the activity level of the jth unit in the top layer and d_j is the desired output of the jth unit. The back-propagation algorithm then proceeds in the following four steps:

1. Compute how fast the error changes as the activity of an output unit is changed. This error derivative (EA) is the difference between the actual activity and the desired activity:

EA_j = ∂E/∂y_j = y_j - d_j

2. Compute how fast the error changes as the total input received by an output unit is changed. This quantity (EI) is EA_j from step 1 multiplied by the rate at which the output of the unit changes as its total input changes:

EI_j = ∂E/∂x_j = EA_j · y_j (1 - y_j)

3. Compute how fast the error changes as a weight on a connection into an output unit changes. This quantity (EW) is EI_j from step 2 multiplied by the activity level of the unit from which the connection emanates:

EW_ij = ∂E/∂w_ij = EI_j · y_i

4. Compute how fast the error changes as the activity of a unit in the previous layer is changed. This crucial step allows back-propagation to be applied to multilayer networks. When the activity of a unit in the previous layer changes, it affects the activities of all the output units it is connected to, so to compute the overall effect on the error all these separate effects are added together. Each effect is simple to calculate: it is EI_j from step 2 multiplied by the weight on the connection to that output unit:

EA_i = ∂E/∂y_i = Σ_j EI_j w_ij
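The four quantities can be computed directly for a single sigmoid output layer. The sketch below uses toy numbers (one previous-layer unit feeding two output units; all values are assumptions for illustration, not the thesis code) and verifies EW against a finite-difference estimate of the error derivative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

y_prev = 0.8                # activity of the single previous-layer unit
w = [0.3, -0.4]             # weights from that unit to the two output units
d = [1.0, 0.0]              # desired outputs

def outputs(weights):
    return [sigmoid(wi * y_prev) for wi in weights]

def error(weights):
    return 0.5 * sum((yj - dj) ** 2 for yj, dj in zip(outputs(weights), d))

y = outputs(w)
EA = [yj - dj for yj, dj in zip(y, d)]              # step 1: dE/dy_j
EI = [ea * yj * (1 - yj) for ea, yj in zip(EA, y)]  # step 2: dE/dx_j
EW = [ei * y_prev for ei in EI]                     # step 3: dE/dw_ij
EA_prev = sum(ei * wi for ei, wi in zip(EI, w))     # step 4: dE/dy_prev

# Finite-difference check of EW for the first weight
eps = 1e-6
numeric = (error([w[0] + eps, w[1]]) - error([w[0] - eps, w[1]])) / (2 * eps)
print(abs(numeric - EW[0]) < 1e-6)   # True: analytic gradient matches
```

The same pattern extended over all layers yields the full gradient of E, which the weight-update rule then follows in the opposite direction.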

The main asset of this algorithm is that it is very simple to implement and applies well to complicated patterns. Furthermore, depending on the magnitude of the input-output data present in the layers, it is fast and efficient to implement. [12]


Desaim et al. (1994) show that identification of images irrespective of their position, size or orientation is one of the critical tasks in pattern analysis. Use of global moment features has been one of the most common techniques for this purpose. They present a simple and effective method for image representation and identification which uses local radial moments of image segments as features, rather than global features, together with a simple classifier such as the nearest-neighbour classifier. The technique does not require translation, scaling or rotation of the image. Moreover, it is suitable for parallel implementation and hence useful for real-time applications. The classification capability of the technique is demonstrated by experiments on scaled, rotated and noisy images of upper- and lower-case characters and digits of the English alphabet. [15]

Cheung et al. (1998) state that optical character recognition systems enhance human-machine interaction and are widely used in many government and business offices. After a number of years of intensive research, OCR systems for most scripts are well developed, but not for the Arabic script. Since Arabic is a widespread script, Arabic OCR systems should have great commercial value. Accordingly, a recognition-based Arabic OCR system is presented in this paper. It comprises image acquisition, preprocessing, segmentation, character fragmentation, combination of character fragments, feature extraction and classification. A signal is fed back to improve and determine the segmentation-recognition result. [17]

Tan (1998) concerns the extraction of rotation-invariant texture features and the use of such features in script identification from document images. Rotation-invariant texture features are computed based on an extension of the popular multi-channel Gabor filtering technique, and their effectiveness is tested with 300 randomly rotated samples of 15 Brodatz textures. These features are then used in an attempt to address a practical but hitherto largely neglected problem in document image processing: identifying the script of a machine-printed document. Automatic script and language recognition is an important front-end process for the efficient and correct use of OCR and language-translation products in a multilingual environment. Six languages (Russian, Persian, Chinese, English, Malayalam and Greek) are used to demonstrate the feasibility of such a texture-based approach to script identification. [16]

Vehtari et al. (2000) exhibit the benefits of performing image analysis with Bayesian multi-layer perceptron (MLP) neural networks. The Bayesian approach provides a consistent way to do inference by combining the evidence from the data with prior information about the problem. A practical problem with MLPs is choosing the right complexity for the model, i.e. the appropriate number of hidden units or the correct parameters. The Bayesian approach offers efficient tools for avoiding overfitting even with very complex models, and it also facilitates estimation of confidence intervals for the final results. The authors review the Bayesian methods and present results from two case studies. In the first, MLPs were used to solve the inverse problem in electrical impedance tomography; the Bayesian MLP delivered consistently better results than other approaches. In the second case study, the objective was to locate trunks of trees in a forest. Using a Bayesian MLP it became possible to use a large number of potentially suitable features, which allowed the significance of the features to be determined automatically. [2]

Frias-Martinez et al. (2001) note that signature verification is better studied than automatic signature recognition, despite the fact that automatic signature recognition is a potential application for processing historical and legal documents and for accessing security-sensitive facilities. This paper covers an efficient off-line human signature recognition system which is based on Support Vector Machines (SVM) and compares its performance with a conventional classification technique, the Multi-Layer Perceptron (MLP). In both cases there are two ways to approach the problem: (1) construct each feature vector using a set of global geometric and moment-based characteristics from each signature, or (2) build the feature vector using the bitmap of the corresponding signature. They also present a strategy to capture the variability of each user using only one original signature. Their results empirically show that SVM, which achieves up to a 71% correct recognition rate, outperforms MLP. [5]



Institution / College: University of the Punjab – Guru Nanak Dev Engineering College, Ludhiana
Keywords: rotation scale translation invariant character recognition system using neural network



