
American Journal of Pathology, Vol 115, 36-41, Copyright © 1984 by American Society for Investigative Pathology


REGULAR ARTICLES

Strategies for searching medical natural language text. Distribution of words in the anatomic diagnoses of 7000 autopsy subjects

GW Moore, GM Hutchins and RE Miller

Computerized indexing and retrieval of medical records is increasingly important; but the use of natural language versus coded languages (SNOP, SNOMED) for this purpose remains controversial. In an effort to develop search strategies for natural language text, the authors examined the anatomic diagnosis reports by computer for 7000 consecutive autopsy subjects spanning a 13-year period at The Johns Hopkins Hospital. There were 923,657 words, 11,642 of them distinct. The authors observed an average of 1052 keystrokes, 28 lines, and 131 words per autopsy report, with an average 4.6 words per line and 7.0 letters per word. The entire text file represented 921 hours of secretarial effort. Words ranged in frequency from 33,959 occurrences of "and" to one occurrence for each of 3398 different words. Searches for rare diseases with unique names or for representative examples of common diseases were most readily performed with the use of computer-printed key word in context (KWIC) books. For uncommon diseases designated by commonly used terms (such as "cystic fibrosis"), needs were best served by a computerized search for logical combinations of key words. In an unbalanced word distribution, each conjunction (logical and) search should be performed in ascending order of word frequency; but each alternation (logical inclusive or) search should be performed in descending order of word frequency. Natural language text searches will assume a larger role in medical records analysis as the labor-intensive procedure of translation into a coded language becomes more costly, compared with the computer-intensive procedure of text searching.
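
The ordering rule for conjunction and alternation searches can be illustrated with a minimal Python sketch. It is not the authors' program: the function names (word_frequencies, order_terms, run_search), the whitespace tokenization, and the five toy reports are assumptions introduced here for illustration. The sketch counts how many word tests each search performs when evaluation stops as soon as the outcome for a report is known.

```python
from collections import Counter


def word_frequencies(reports):
    """Count occurrences of each lowercase, whitespace-separated word."""
    counts = Counter()
    for text in reports:
        counts.update(text.lower().split())
    return counts


def order_terms(terms, freq, mode):
    """Order terms for a short-circuit search.

    "and": ascending frequency, so the rarest word rejects most reports early.
    "or": descending frequency, so the commonest word accepts most reports early.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    return sorted(terms, key=lambda t: freq[t], reverse=(mode == "or"))


def run_search(word_sets, ordered_terms, mode):
    """Test terms in the given order; return (matching indices, tests made)."""
    hits, tests = [], 0
    for idx, words in enumerate(word_sets):
        matched = (mode == "and")
        for term in ordered_terms:
            tests += 1
            present = term in words
            if mode == "and" and not present:
                matched = False  # one missing word settles a conjunction
                break
            if mode == "or" and present:
                matched = True  # one present word settles an alternation
                break
        if matched:
            hits.append(idx)
    return hits, tests


# Toy data invented for this example; not taken from the autopsy file.
reports = [
    "cystic fibrosis of pancreas",
    "pulmonary fibrosis",
    "cystic kidney",
    "myocardial fibrosis",
    "fibrosis of liver",
]
freq = word_frequencies(reports)
word_sets = [set(r.lower().split()) for r in reports]
terms = ["fibrosis", "cystic"]

and_hits, and_tests = run_search(word_sets, order_terms(terms, freq, "and"), "and")
or_hits, or_tests = run_search(word_sets, order_terms(terms, freq, "or"), "or")
print("AND:", and_hits, and_tests, "tests")
print("OR:", or_hits, or_tests, "tests")
```

Passing the terms to run_search in the opposite order returns the same reports but performs more word tests on this toy data, which is the effect the ordering rule is meant to avoid when word frequencies are highly unbalanced.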




