IBM Journal of Research and Development
IBM Skip to main content
  Home     Products & services     Support & downloads     My account  

  Select a country  
Journals Home  
  Systems Journal  
Journal of Research
and Development
    Current Issue  
    Recent Issues  
    Papers in Progress  
    Search/Index  
    Orders  
    Description  
    Patents  
    Recent publications  
    Author's Guide  
  Staff  
  Contact Us  
  Related links:  
     IBM Research  

IBM Journal of Research and Development  
Volume 1, Number 4, Page 309 (1957)
Nontopical Issue
  Full article: arrowPDF   arrowCopyright info





   

A Statistical Approach to Mechanized Encoding and Searching of Literary Information

by H. P. Luhn
Written communication of ideas is carried out on the basis of statistical probability in that a writer chooses that level of subject specificity and that combination of words which he feels will convey the most meaning. Since this process varies among individuals and since similar ideas are therefore relayed at different levels of specificity and by means of different words, the problem of literature searching by machines still presents major difficulties. A statistical approach to this problem will be outlined and the various steps of a system based on this approach will be described. Steps include the statistical analysis of a collection of documents in a field of interest, the establishment of a set of "notions" and the vocabulary by which they are expressed, the compilation of a thesaurus-type dictionary and index, the automatic encoding of documents by machine with the aid of such a dictionary, the encoding of topological notations (such as branched structures), the recording of the coded information, the establishment of a searching pattern for finding pertinent information, and the programming of appropriate machines to carry out a search.
Related Subjects: Communication, scientific; Data, structures and accessing