IBM Journal of Research and Development
Volume 19, Number 4, Page 398 (1975)
Nontopical Issue

Multifont OCR Postprocessing System

by W. S. Rosenbaum, J. J. Hilliard
A series of techniques is being developed to postprocess noisy, multifont, nonformatted OCR data on a word basis in order to 1) determine whether a field is alphabetic or numeric; 2) verify that an alphabetic word is legitimate; 3) fetch from a dictionary a set of potential entries using a garbled word as a key; and 4) error-correct the garbled word by selecting the most likely dictionary word. Four algorithms were developed using a technique called vector processing (representing alphabetic words as numeric vectors) together with Bayes maximum likelihood solutions for correcting the OCR output. The result was a software simulator that processed sequential fields generated by the Advanced Optical Character Reader (in use by the U.S. Postal Service in New York City), performed the four functions listed above, and selected the correct alphabetic word from a dictionary of 62,000 entries.
Related Subjects: Analytical models; Character recognition; Codes and coding; Pattern Recognition
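
Illustrative sketch (not from the paper): the abstract does not specify how words are mapped to vectors or how the likelihoods are estimated, so the following Python sketch only shows the general shape of the four functions under assumed definitions. The vector (word length, sum of letter positions), the tolerance values, the majority-vote field test, and the confusion probabilities are all hypothetical choices made for illustration, and every function name is invented here.

```python
import math

def is_alphabetic_field(field):
    """Function 1 (assumed rule): majority vote between letters and digits."""
    letters = sum(c.isalpha() for c in field)
    digits = sum(c.isdigit() for c in field)
    return letters >= digits

def word_vector(word):
    """Hypothetical vector: (length, sum of letter positions with a=1..z=26)."""
    w = word.lower()
    return (len(w), sum(ord(c) - ord("a") + 1 for c in w if "a" <= c <= "z"))

def fetch_candidates(garbled, dictionary, max_len_diff=0, max_sum_diff=30):
    """Function 3: use the garbled word's vector as a key and return the
    dictionary entries whose vectors fall within the assumed tolerances."""
    glen, gsum = word_vector(garbled)
    result = []
    for entry in dictionary:
        elen, esum = word_vector(entry)
        if abs(elen - glen) <= max_len_diff and abs(esum - gsum) <= max_sum_diff:
            result.append(entry)
    return result

# Assumed channel model P(observed | true); a real system would estimate
# these values from measured OCR confusions.
P_CORRECT = 0.9
P_OTHER = 0.001
CONFUSION = {("c", "o"): 0.05, ("o", "c"): 0.05, ("l", "i"): 0.05, ("i", "l"): 0.05}

def char_likelihood(observed, true):
    if observed == true:
        return P_CORRECT
    return CONFUSION.get((observed, true), P_OTHER)

def log_likelihood(garbled, candidate):
    """Log P(garbled | candidate), assuming independent character errors
    and equal word lengths (guaranteed by max_len_diff=0 above)."""
    return sum(math.log(char_likelihood(o, t)) for o, t in zip(garbled, candidate))

def correct_word(garbled, dictionary, priors=None):
    """Functions 2 and 4: accept an exact dictionary match as legitimate,
    otherwise pick the candidate with the highest posterior score.
    `priors` maps words to positive prior weights (uniform if omitted)."""
    garbled = garbled.lower()
    if garbled in dictionary:
        return garbled
    candidates = fetch_candidates(garbled, dictionary)
    if not candidates:
        return None

    def score(candidate):
        prior = priors.get(candidate, 1.0) if priors else 1.0
        return math.log(prior) + log_likelihood(garbled, candidate)

    return max(candidates, key=score)

STREETS = ["broadway", "brooklyn", "bronx", "boardman"]  # toy dictionary
print(correct_word("brcadway", STREETS))
```

In this toy run the vector test narrows the four entries to "broadway" and "boardman", and the likelihood step then prefers "broadway" because only one character differs and that difference is a listed confusion. With a dictionary of 62,000 entries, the point of the vector key would be to avoid scoring every entry; the linear scan above is kept only for brevity.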